Python support for linear TSV files
- Free software: MIT license
In contrast to Excel's TSV dialect, linear TSV is line-based.
"But hey", I hear you say, "isn't TSV always line-based?" Well, the issue arises when a cell contains a tab or newline character. In Excel's TSV format, that cell is surrounded by quotes, and an embedded newline makes the entry continue on the next line. Now you have:
- entries spanning several lines
- quotes that need to be ignored (")
- quotes that are escaped by doubling them ("")
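To see these effects concretely, here is a small sketch that uses the `excel-tab` dialect of Python's standard `csv` module as a stand-in for Excel's TSV output. The sample names and cell contents are made up for illustration.

```python
import csv
import io

# Write two entries with the Excel-style TSV dialect from the standard library.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect="excel-tab")
writer.writerow(["Anna", "first line\nsecond line"])  # cell with a newline
writer.writerow(["Ben", 'says "hi"'])                 # cell with quotes

# Split the output the way line-based tools (head, grep, sort) would see it.
physical_lines = buffer.getvalue().splitlines()
print(len(physical_lines))  # two entries, but three physical lines
for line in physical_lines:
    print(repr(line))
```

The first entry is split across two physical lines, and the quotes in the second entry come out doubled and wrapped in an extra pair of quotes.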
Since entries can span several lines, many naïve line-based file manipulations no longer work reliably:
- Taking the first 50 entries of a dataset: head -n 50 customers.tsv
- Filtering entries: grep "Zürich" customers.tsv
- Sorting the entries alphabetically: sort customers.tsv
All of this can be prevented if you simply:
- escape tabs: \t
- escape newlines: \n
- escape carriage returns: \r
- escape backslashes: \\
Lastly, linear TSV can also encode a missing value (None in Python) as \N.
That's [linear TSV](http://dataprotocols.org/linear-tsv/) in a nutshell.
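The escaping rules above can be sketched in a few lines of Python. This is only an illustration of the format, not the tsv2dict API: the function names `escape_field`, `unescape_field`, `encode_row` and `decode_row` are hypothetical.

```python
import re

# Mapping between raw characters and their linear TSV escape sequences.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {escaped: raw for raw, escaped in _ESCAPES.items()}


def escape_field(value):
    """Encode one cell; None becomes the two-character marker \\N."""
    if value is None:
        return "\\N"
    # Replace all special characters in a single pass.
    return re.sub(r"[\\\t\n\r]", lambda m: _ESCAPES[m.group()], value)


def unescape_field(text):
    """Decode one cell; the marker \\N becomes None."""
    if text == "\\N":
        return None
    # A single pass keeps an escaped backslash followed by 't' from being
    # misread as an escaped tab.
    return re.sub(r"\\[\\tnr]", lambda m: _UNESCAPES[m.group()], text)


def encode_row(values):
    """Join escaped cells with tabs; the result never contains a newline."""
    return "\t".join(escape_field(v) for v in values)


def decode_row(line):
    """Split one physical line on tabs and unescape each cell."""
    return [unescape_field(cell) for cell in line.rstrip("\n").split("\t")]


row = ["Zürich", "first line\nsecond line", "a\tb", None]
line = encode_row(row)
print(line)
print(decode_row(line) == row)
```

Because every tab or newline inside a cell is escaped, each entry occupies exactly one physical line, so tools such as head, grep and sort operate on whole entries.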
To install the released version, run:
pip install tsv2dict
You can also install the in-development version with:
pip install https://github.com/nkurmann/tsv2dict/archive/master.zip
The documentation is available at https://tsv2dict.readthedocs.io/
To run all the tests run:
tox
Note: to combine the coverage data from all the tox environments, run:
Platform | Commands |
---|---|
Windows | set PYTEST_ADDOPTS=--cov-append, then tox as a separate command |
Other | PYTEST_ADDOPTS=--cov-append tox |