py-csv

why

it should be possible to process well formed csv in pure python with decent performance.

what

a template of boilerplate code to copy/paste, then modify the row handling code.

how

python doesn't have inlining or macros, so for maximum performance we have to manually inline code into the body of the parsing loop.
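as a rough illustration of what manual inlining means here, the sketch below finds comma offsets twice: once through a helper function called per byte, and once with the comparison written directly in the loop body. both function names are hypothetical and are not taken from the files in this repo.

```python
COMMA = 44  # ord(',')

def find_commas_with_helper(line: bytes) -> list:
    # one python function call per byte: readable, but the call overhead adds up
    def is_comma(b: int) -> bool:
        return b == COMMA
    offsets = []
    for i in range(len(line)):
        if is_comma(line[i]):
            offsets.append(i)
    return offsets

def find_commas_inlined(line: bytes) -> list:
    # the same check written directly into the loop body, with no call per byte
    offsets = []
    for i in range(len(line)):
        if line[i] == COMMA:
            offsets.append(i)
    return offsets
```

both return the same offsets; the inlined form simply avoids a function call on every iteration of the hot loop.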

for input we minimize allocations and syscalls by reading large chunks into a buffer and only allocating metadata about the start and end offsets of each column in a row.
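a minimal sketch of this input strategy, assuming a binary stream with a `read` method. the name `iter_rows` and the default chunk size are hypothetical, and this version allocates fresh offset lists per row for clarity, which a tuned implementation would likely avoid.

```python
import io

COMMA = 44    # ord(',')
NEWLINE = 10  # ord('\n')

def iter_rows(stream, chunk_size=1 << 20):
    """yield (buf, starts, ends) per row; column i is buf[starts[i]:ends[i]]."""
    leftover = b""
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)  # one large read instead of many small ones
        if not chunk:
            eof = True
            if not leftover:
                break
            chunk = b"\n"  # terminate a final row that lacks a trailing newline
        buf = leftover + chunk
        row_start = 0
        starts = [0]
        ends = []
        for i in range(len(buf)):
            b = buf[i]
            if b == COMMA:
                ends.append(i)
                starts.append(i + 1)
            elif b == NEWLINE:
                ends.append(i)
                yield buf, starts, ends
                row_start = i + 1
                starts = [row_start]
                ends = []
        # keep the incomplete trailing row so it can be completed by the next chunk
        leftover = buf[row_start:]

def select_columns(stream, indices, chunk_size=1 << 20):
    # usage example: slice out only the requested columns of each row
    return [[buf[starts[i]:ends[i]] for i in indices]
            for buf, starts, ends in iter_rows(stream, chunk_size)]
```

column bytes are only sliced out of the buffer when a row handler actually asks for them; the scan itself records nothing but integer offsets.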

for output we mutate a byte buffer and only call write when the buffer would overflow.
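the output side can be sketched in the same spirit. this assumes a binary sink with a `write` method; `write_rows` and its `capacity` parameter are hypothetical names for illustration, not the repo's actual code.

```python
import io

def write_rows(rows, out, capacity=1 << 20):
    """copy each row into a preallocated buffer; call out.write only when full."""
    buf = bytearray(capacity)  # allocated once, then mutated in place
    offset = 0
    for row in rows:
        n = len(row)
        if offset + n > capacity:
            # the buffer would overflow: flush what we have in a single write
            if offset:
                out.write(memoryview(buf)[:offset])
                offset = 0
            if n > capacity:
                # a row larger than the whole buffer is written directly
                out.write(row)
                continue
        buf[offset:offset + n] = row  # same-length slice assignment, no resize
        offset += n
    if offset:
        out.write(memoryview(buf)[:offset])  # final flush
```

with a 1 MiB buffer and short rows, this turns many thousands of small writes into one write per megabyte of output.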

this parsing is only for well formed csv, and is only aware of comma and newline. if you need to parse csv that is not well formed, use the standard library parser.

if you need to go faster than this, look into using similar techniques to minimize allocations and syscalls with native code.
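to see why the distinction matters, here is a small example of input that a comma-and-newline-only scan would split incorrectly, handled with the standard library `csv` module. the sample data is made up for illustration.

```python
import csv
import io

# a quoted field containing a comma, and a quoted field with escaped quotes
text = 'name,quote\n"Smith, J","said ""hi"""\n'

# the standard library parser understands quoting and escaping
rows = list(csv.reader(io.StringIO(text)))

# a naive split on comma breaks the first quoted field in two
naive = text.splitlines()[1].split(",")
```

here `rows[1]` has two fields while `naive` has three, so any input that may contain quoted commas or embedded newlines belongs with `csv.reader`.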

demo

>> time pypy3 gen_csv.py 8 15000000 > /tmp/large.csv
real    0m9.075s

>> ls -lh /tmp/large.csv
-rw-r--r-- 1 nathants nathants 1.1G Jul  1 13:24 /tmp/large.csv

>> time pypy3 csv_stdlib.py 3,7 </tmp/large.csv >/dev/null
real    0m17.199s

>> time pypy3 csv_faster.py 3,7 </tmp/large.csv >/dev/null
real    0m6.636s

>> time pypy3 csv_fastest.py 3,7 </tmp/large.csv >/dev/null
real    0m4.259s

>> time python3 csv_stdlib.py 3,7 </tmp/large.csv >/dev/null
real    0m21.720s

>> time python3 csv_faster.py 3,7 </tmp/large.csv >/dev/null
real    0m11.531s

>> pypy3 csv_stdlib.py 3,7 </tmp/large.csv | md5sum
b64f540b7a1713bcc9f509ff3f9062a5  -

>> pypy3 csv_faster.py 3,7 </tmp/large.csv | md5sum
b64f540b7a1713bcc9f509ff3f9062a5  -

>> pypy3 csv_fastest.py 3,7 </tmp/large.csv | md5sum
b64f540b7a1713bcc9f509ff3f9062a5  -
