1 Billion Row Challenge in Python. No libs, just std.
This repository exists for a few reasons:
- I want to challenge myself to see how efficiently I can process large datasets using only the Python standard library.
- I want to practice profiling Python code.
- I just want to have fun with Python :)
I'll post my thoughts and findings in this README as I progress through the challenge. I think a cool blog post could come out of it at the end.
OK, we now have a straightforward implementation: read the file row by row and keep min, max, sum, and count per station in a dict. On 10 million rows it takes 7.19 seconds. Not great — at 1 billion rows it will be painfully slow. Let's get our hands dirty and find out where the time goes.
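The baseline described above can be sketched roughly like this. Note the assumptions: the standard 1BRC input format of one `station;temperature` pair per line, and a file named `measurements.txt` — neither name is fixed by this README.

```python
# Baseline sketch: stream the file line by line, keep per-station
# [min, max, sum, count] in a plain dict. Stdlib only.
# Assumes 1BRC-style input: "station;temperature" per line (an assumption).

def process(path: str) -> dict[str, tuple[float, float, float, int]]:
    stats: dict[str, list[float]] = {}  # station -> [min, max, sum, count]
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            station, _, value = line.partition(";")
            temp = float(value)
            entry = stats.get(station)
            if entry is None:
                stats[station] = [temp, temp, temp, 1]
            else:
                if temp < entry[0]:
                    entry[0] = temp  # new minimum
                if temp > entry[1]:
                    entry[1] = temp  # new maximum
                entry[2] += temp    # running sum (mean = sum / count)
                entry[3] += 1       # observation count

    return {s: (e[0], e[1], e[2], int(e[3])) for s, e in stats.items()}


if __name__ == "__main__":
    for station, (mn, mx, total, n) in sorted(process("measurements.txt").items()):
        print(f"{station}: min={mn:.1f} mean={total / n:.1f} max={mx:.1f}")
```

To see where the time goes, a first pass with the stdlib profiler is enough, e.g. `python -m cProfile -s cumulative main.py` (assuming the script is called `main.py`).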