DOC: Timings/space of datatypes in the docs #3871
Comments
my 2c:
(Not sure why I took out a mention of R, it was there, this was the main one I had in mind :) ).
See https://groups.google.com/forum/m/#!topic/pydata/G6Z-SN9SJnY for a conversation about this.
can prob add this 2
@jreback I kind of think these should be distinct, but not sure what a good name would be.
ok...sure..maybe a new top-level section (or maybe part of FAQ or something)
related #696, and perf of read_csv from Wes's blog http://wesmckinney.com/blog/?p=543
Reproducing my answer here (from the above link): You have to do this in reverse.
Technically memory is about this (which includes the indexes)
So that's 160MB in memory versus a 400MB file, for 1M rows of 20 float columns
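As a rough check on the 160MB figure, here is a minimal sketch of the arithmetic, assuming 8-byte float64 values and an 8-byte-per-row integer index (the helper name `frame_nbytes` is hypothetical, not a pandas function). On a real frame, `df.memory_usage(index=True).sum()` should give a comparable number.

```python
def frame_nbytes(n_rows, n_cols, itemsize=8, index_itemsize=8):
    """Estimate bytes held by a dense numeric frame.

    Assumes every column is a fixed-width dtype (float64 by default)
    and that the index is materialised as one integer per row.
    """
    data = n_rows * n_cols * itemsize
    index = n_rows * index_itemsize
    return data, index


# 1M rows of 20 float columns, as in the comment above
data, index = frame_nbytes(1_000_000, 20)
print(f"data:  {data / 1e6:.0f} MB")   # data:  160 MB
print(f"index: {index / 1e6:.0f} MB")  # index: 8 MB
```

The index adds only a few percent on top of the data here, so the column data dominates the total.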
MUCH more compact when written as a binary HDF5 file
Data is not that compressible though, as it's random. With strings (same string, so maybe a little bogus) the file is about 1/2 the size of the floats!
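The size and compressibility points above can be illustrated with the standard library alone. This is only a sketch under stated assumptions: it is not the original benchmark, it uses a small 1,000 x 20 frame, and `zlib` stands in for whatever compression the HDF5 file used.

```python
import csv
import io
import random
import zlib
from array import array

random.seed(0)
n_rows, n_cols = 1_000, 20
rows = [[random.random() for _ in range(n_cols)] for _ in range(n_rows)]

# Text form: each float is written out as a decimal string plus a separator.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_bytes = len(buf.getvalue().encode("ascii"))

# Binary form: 8 bytes per float64 value, no separators.
raw = array("d", (v for row in rows for v in row)).tobytes()
binary_bytes = len(raw)

# Random floats barely compress; a repeated string compresses very well.
float_ratio = len(zlib.compress(raw)) / binary_bytes
text = ("same string," * (n_rows * n_cols)).encode("ascii")
string_ratio = len(zlib.compress(text)) / len(text)

print(f"csv: {csv_bytes} bytes, binary: {binary_bytes} bytes")
print(f"compressed/raw, random floats: {float_ratio:.2f}")
print(f"compressed/raw, repeated string: {string_ratio:.4f}")
```

The text form is larger because a full-precision float takes far more than 8 characters to print, which is consistent with the CSV being well over twice the in-memory size.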
Just put this in for perf comparison of IO methods: 0d79ff8 so partial progress for this
closing due to lack of activity, and it's not really clear what's needed anymore at this point
Would it be useful to have a section in the docs discussing:
Probably distinct from comparing functionality (although that may also be interesting, e.g. like numpy do it for features against matlab here: http://wiki.scipy.org/NumPy_for_Matlab_Users), e.g. #3980