Optionally use sys.getsizeof in DataFrame.memory_usage #11595
Comments
mrocklin commented Nov 13, 2015

Yup, probably shouldn't be default. I'd be quite happy to opt-in. On the other side of the comparison, 500ms is very small compared to serialization and communication time if we mistakenly decide that it's a good idea to communicate this dataframe to another machine.

jreback commented Nov 13, 2015

going to give you a cythonized comparison in a sec

I am giving back the original nbytes + the actual overhead (e.g. to store it costs you the pointer in the ndarray + the actual storage)
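The "nbytes + the actual overhead" accounting can be sketched in plain Python (the `deep_object_usage` helper is hypothetical, not a pandas API): the ndarray itself stores one pointer per element, and each referenced Python object costs its own storage on top of that.

```python
import sys
import numpy as np

def deep_object_usage(arr):
    # Hypothetical helper sketching "nbytes + the actual overhead":
    # arr.nbytes counts the per-element pointer stored in the ndarray,
    # and sys.getsizeof adds the storage of each referenced object.
    return arr.nbytes + sum(sys.getsizeof(x) for x in arr)

col = np.array(["foo", "bar", "baz"], dtype=object)
print(col.nbytes)              # bare pointer storage (8 bytes each on 64-bit)
print(deep_object_usage(col))  # pointers plus the per-string storage
```

Note this counts duplicated objects once per reference, which overestimates when many cells share the same object.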
mrocklin commented Nov 13, 2015

Seems like a good idea. The speedup there is nice.

jreback commented Nov 13, 2015

yep, ok, easy enough (then would turn off the '+') if you opt-in

What do you think about overriding

ahh, so
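For context on the opt-in being discussed: this shipped as the `deep=True` keyword on `memory_usage` and the `memory_usage='deep'` option on `info()`, and the shallow estimate is what `info()` flags with the trailing '+'. A minimal sketch:

```python
import io
import pandas as pd

df = pd.DataFrame({"words": ["one", "two", "three"]})

# Shallow (default): 8 bytes per reference for the object column.
print(df.memory_usage(index=False))

# Opt-in deep introspection: adds sys.getsizeof of each Python object.
print(df.memory_usage(index=False, deep=True))

# info() marks the shallow estimate with a trailing '+';
# memory_usage='deep' turns the '+' off and reports the real total.
buf = io.StringIO()
df.info(memory_usage="deep", buf=buf)
print(buf.getvalue())
```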
jreback added the Output-Formatting and API Design labels on Nov 13, 2015

jreback added this to the 0.17.1 milestone on Nov 13, 2015
jreback referenced this issue on Nov 13, 2015:
Merged: PERF/DOC: Option to .info() and .memory_usage() to provide for deep introspection of memory consumption #11595 (#11596)
jreback added a commit to jreback/pandas that referenced this issue on Nov 13, 2015 (89cad6b)

jreback closed this in #11596 on Nov 13, 2015
jreback added a commit that referenced this issue on Nov 13, 2015 (ddd0372)

jreback referenced this issue on Dec 29, 2015:
Closed: index is included in memory usage by default #11867
jickersville commented Mar 30, 2016

Glad to finally see #8578 implemented. Good work!

jreback commented Mar 30, 2016

@jickersville that's not a very nice comment. What issue has:
????

since the @jickersville account was created today, I suspect you are actually @kay1793, who was banned for egregious behavior. prove me wrong here.

sigh

> On Wed, Mar 30, 2016 at 10:23 AM, Jeff Reback notifications@github.com
mrocklin commented Nov 13, 2015

I would like to know how many bytes my dataframe takes up in memory. The standard way to do this is the memory_usage method. For object dtype columns this measures 8 bytes per element, the size of the reference, not the size of the full object. In some cases this significantly underestimates the size of the dataframe.

It might be nice to optionally map sys.getsizeof over object dtype columns to get a better estimate of the size. If this ends up being expensive then it might be good to have this as an optional keyword argument.
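The underestimate described above is easy to demonstrate. A small sketch (the column name and string contents are illustrative) comparing the default 8-bytes-per-reference figure against the suggested sys.getsizeof mapping:

```python
import sys
import pandas as pd

df = pd.DataFrame({"text": ["a reasonably long python string"] * 1000})

# Default: counts only the 8-byte references in the object column.
shallow = df.memory_usage(index=False)["text"]

# Mapping sys.getsizeof over the column adds the objects themselves.
deep = shallow + df["text"].map(sys.getsizeof).sum()

print(shallow)  # 8 bytes per element on a 64-bit build
print(deep)     # several times larger once the strings are counted
```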