Optionally use sys.getsizeof in DataFrame.memory_usage #11595
Comments
Yup, probably shouldn't be the default. I'd be quite happy to opt in. On the other side of the comparison, 500ms is very small compared to serialization and communication time if we mistakenly decide that it's a good idea to communicate this dataframe to another machine.
Going to give you a cythonized comparison in a sec.
I am giving back the original nbytes plus the actual overhead (e.g. storing an object costs you the pointer in the ndarray plus the actual storage of the object itself).
Seems like a good idea. The speedup there is nice.
Yep, OK, easy enough (we would then turn off the '+' if you opt in).
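For readers following along, here is a minimal sketch of the accounting described above, with a hypothetical `deep_nbytes` helper (not the actual pandas internals): the column's ndarray buffer plus the storage of each Python object it points to.

```python
import sys

import pandas as pd

def deep_nbytes(series: pd.Series) -> int:
    """Hypothetical helper: the ndarray buffer (8-byte pointers for
    object dtype) plus the storage of each boxed object it references."""
    base = series.values.nbytes  # the pointers stored in the ndarray
    if series.dtype == object:
        base += sum(sys.getsizeof(obj) for obj in series.values)
    return base

s = pd.Series(["a", "bb", "ccc"])
print(s.memory_usage(index=False))  # shallow: 3 pointers * 8 bytes = 24
print(deep_nbytes(s))               # pointers plus the string objects
```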
What do you think about overriding `__sizeof__`?
Ahh, so `sys.getsizeof(df)` would then just work.
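That connection is standard Python: `sys.getsizeof(obj)` dispatches to `obj.__sizeof__()` (plus garbage-collector overhead), so a container that overrides it reports its deep size for free. A hypothetical sketch, not the pandas implementation:

```python
import sys

class SizedContainer:
    """Hypothetical container: reporting deep memory through
    __sizeof__ makes a plain sys.getsizeof() call return it."""

    def __init__(self, data):
        self._data = list(data)

    def __sizeof__(self):
        # the object's own header plus the deep size of its contents
        return object.__sizeof__(self) + sum(sys.getsizeof(x) for x in self._data)

c = SizedContainer(["a", "bb", "ccc"])
print(sys.getsizeof(c))  # dispatches to SizedContainer.__sizeof__
```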
PERF/DOC: Option to .info() and .memory_usage() to provide for deep introspection of memory consumption #11595
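For reference, the option that shipped is exposed as `deep=True` on `memory_usage` and `memory_usage="deep"` on `info`; a quick usage example:

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "bb", "ccc"], "y": [1, 2, 3]})

print(df.memory_usage())           # shallow: 8 bytes per object pointer
print(df.memory_usage(deep=True))  # also counts the boxed Python strings

df.info(memory_usage="deep")       # .info() grew a matching option
```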
Glad to finally see #8578 implemented. 👍 It appears that when a Continuum co-worker complains of a pandas wart it gets fixed in 60 minutes instead of being repeatedly deflected with excuses over the course of 3 days until the user runs away screaming in exasperation. Good work!
@jickersville that's not a very nice comment. What issue has: ????
Since the @jickersville account was created today, I suspect you are actually @kay1793, who was banned for egregious behavior. Prove me wrong here.
sigh
I would like to know how many bytes my dataframe takes up in memory. The standard way to do this is the `memory_usage` method. For object dtype columns this measures 8 bytes per element, the size of the reference rather than the size of the full object. In some cases this significantly underestimates the size of the dataframe.
It might be nice to optionally map `sys.getsizeof` over object dtype columns to get a better estimate of the size. If this ends up being expensive then it might be good to have this as an optional keyword argument.
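A sketch of that proposal, assuming the straightforward map-and-sum approach:

```python
import sys

import pandas as pd

s = pd.Series(["x" * 1000, "y" * 1000])
# The proposal: add sys.getsizeof of each element to the reference
# storage that memory_usage already counts.
deep = s.memory_usage(index=False) + s.map(sys.getsizeof).sum()
print(deep)  # roughly 2 KB instead of 16 bytes
```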