nkurmann/tsv2dict

Overview


Python support for linear TSV files

  • Free software: MIT license

What is Linear TSV

In contrast to Excel's TSV dialect, linear TSV is strictly line-based: every entry occupies exactly one line.

"But hey", I hear you say, "isn't TSV always line-based?" Well, the issue arises when a cell contains a tab or newline character. In Excel's TSV format, that cell is surrounded by quotes and the entry continues on the next line. Now you have:

  • entries spanning several lines
  • quotes that need to be ignored (")
  • quotes that are escaped by doubling them ("")

Since entries can span several lines, many simple line-oriented file manipulations no longer work reliably:

  • Taking the first 50 entries of a dataset: head -n 50 customers.tsv
  • Filtering entries: grep "Zürich" customers.tsv
  • Sorting the entries alphabetically: sort customers.tsv
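The problem described above can be reproduced with Python's standard `csv` module, whose `excel-tab` dialect follows Excel's quoting rules. This is only a minimal sketch with made-up sample data; it does not use tsv2dict.

```python
import csv
import io

# Write two entries using Excel's TSV dialect (standard library only).
buffer = io.StringIO()
writer = csv.writer(buffer, dialect="excel-tab")
writer.writerow(["Anna", 'He said "hi"\nand left'])  # cell with quotes and a newline
writer.writerow(["Bob", "plain"])

text = buffer.getvalue()
lines = text.splitlines()

# Two entries were written, but the output occupies three physical lines,
# and the embedded quotes were doubled.
print(len(lines))
print(lines)
```

Line-based tools such as `head`, `grep`, and `sort` see three lines here, not two entries.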

All of this can be avoided if you simply replace the problematic characters with escape sequences:

  • escape tabs: \t
  • escape newlines: \n
  • escape carriage returns: \r
  • escape backslashes: \\

Lastly, linear TSV can also encode a missing value (None in Python) as \N.

That's [linear TSV](http://dataprotocols.org/linear-tsv/) in a nutshell.
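The escaping rules listed above can be sketched in a few lines of Python. The helper names `escape_field`, `unescape_field`, `encode_row`, and `decode_row` are hypothetical and chosen for illustration; they are not the tsv2dict API (see the documentation for that). The sketch assumes every value is either a `str` or `None`.

```python
import re

# Mapping between raw characters and their linear TSV escape sequences.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {escaped: raw for raw, escaped in _ESCAPES.items()}


def escape_field(value):
    """Escape one value (str or None) so it fits in a single TSV cell."""
    if value is None:
        return "\\N"
    return re.sub(r"[\\\t\n\r]", lambda m: _ESCAPES[m.group()], value)


def unescape_field(field):
    """Reverse escape_field; a bare \\N cell decodes to None."""
    if field == "\\N":
        return None
    # Scanning left to right keeps an escaped backslash followed by "t"
    # from being mistaken for an escaped tab.
    return re.sub(r"\\[\\tnr]", lambda m: _UNESCAPES[m.group()], field)


def encode_row(values):
    """Join escaped values with tabs; the result never contains a newline."""
    return "\t".join(escape_field(v) for v in values)


def decode_row(line):
    """Split one line on tabs and unescape each cell."""
    return [unescape_field(f) for f in line.rstrip("\n").split("\t")]


row = ["Zürich", "a\tb\nc", None, "C:\\temp"]
line = encode_row(row)
print(line)
print(decode_row(line) == row)
```

Because each encoded entry is exactly one line and tabs only appear as separators, splitting on newlines and then on tabs is enough to recover the cells.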

Installation

pip install tsv2dict

You can also install the in-development version with:

pip install https://github.com/nkurmann/tsv2dict/archive/master.zip

Documentation

https://tsv2dict.readthedocs.io/

Development

To run all the tests, run:

tox

To combine the coverage data from all the tox environments, run:

Windows:

set PYTEST_ADDOPTS=--cov-append
tox

Other:

PYTEST_ADDOPTS=--cov-append tox
