
index is included in memory usage by default #11867

Closed
wants to merge 1 commit into from

Conversation

max-sixty
Contributor

...and ``sys.getsizeof`` returns the correct value. Closes #11597

Is this the best implementation for a common method across classes?

@@ -544,6 +544,8 @@ def memory_usage(self, deep=False):
v += lib.memory_usage_of_objects(self.values)
return v

__sizeof__ = com._sizeof
Contributor

you can put this in PandasObject I think
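The suggestion above can be sketched with a toy base class (the names here are hypothetical stand-ins, not pandas' actual internals, which live in ``pandas/core/base.py``): defining ``__sizeof__`` once on a shared base lets every subclass that provides ``memory_usage`` get a correct ``sys.getsizeof`` for free.

```python
import sys


class PandasObjectLike:
    """Toy stand-in for a shared base class such as PandasObject."""

    def __sizeof__(self):
        # Delegate to memory_usage when a subclass defines it,
        # otherwise fall back to the plain object size.
        if hasattr(self, "memory_usage"):
            mem = self.memory_usage(deep=True)
            # memory_usage may return per-column values; total them
            if not isinstance(mem, int):
                mem = int(sum(mem))
            return mem
        return super().__sizeof__()


class FakeFrame(PandasObjectLike):
    def memory_usage(self, deep=False):
        # pretend: a 160-byte index plus two 80-byte columns
        return [160, 80, 80]


# sys.getsizeof calls __sizeof__ and adds a little GC overhead on top
print(sys.getsizeof(FakeFrame()))
```

``sys.getsizeof`` always reports at least the ``__sizeof__`` result, plus CPython's garbage-collector header for tracked objects, which is why the tests below compare with a small delta rather than exact equality.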

@jreback jreback added this to the 0.18.0 milestone Dec 19, 2015
@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Dec 19, 2015
@@ -7693,6 +7693,10 @@ def test_info_memory_usage(self):
DataFrame(1, index=pd.MultiIndex.from_product([['a'], range(1000)]), columns=['A']).index.nbytes
DataFrame(1, index=pd.MultiIndex.from_product([['a'], range(1000)]), columns=['A']).index.values.nbytes

# sys.getsizeof works the same as memory_usage with defaults, albeit with some
# GC overhead
self.assertAlmostEqual(sys.getsizeof(df), df.memory_usage().sum(), delta=100)
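The relationship the test checks can be seen directly (a sketch assuming a pandas release that includes this PR's behavior): for a purely numeric frame, deep and shallow usage coincide, so ``sys.getsizeof`` lands within a small bookkeeping margin of ``memory_usage().sum()``.

```python
import sys

import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((1000, 3)))

# No object columns, so deep == shallow usage; getsizeof only adds a
# small per-object/GC overhead on top of the data itself
shallow = df.memory_usage().sum()
print(shallow, sys.getsizeof(df))
```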
Contributor

test this on series/Index/Categorical as well. (for Index put a test in test_index on the Base which will run for all indexes).

Contributor Author

I added onto the existing tests in IndexOps - lmk if that's not OK

@max-sixty max-sixty force-pushed the get-size-of branch 3 times, most recently from c462b19 to c99cda7 Compare December 21, 2015 18:45
@max-sixty
Contributor Author

@jreback Python 2.6 error with delta argument on assertAlmostEqual - should I skip the test on 2.6 or find an alternate way of doing this?

xref #7718

self.assertEqual(o.memory_usage(index=False) + o.index.memory_usage(),
o.memory_usage(index=True))

# sys.getsizeof will call the .memory_usage defaults, and add on some GC overhead
self.assertAlmostEqual(res, sys.getsizeof(o), delta=100)
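The identity asserted above can be checked interactively (assuming a recent pandas; the additivity has held since this change): the values-only usage plus the index's own usage equals the usage with the index included.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000.0))

values_only = s.memory_usage(index=False)
with_index = s.memory_usage(index=True)  # index=True is the default

# memory_usage is additive: values plus index
print(values_only, s.index.memory_usage(), with_index)
```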

Contributor

we don't use this

use tm.assert_almost_equal

Contributor

I thought we disabled the test routines that one shouldn't use in tm.TestCase - but maybe that's an open issue (or if not, can you create one?)

we are dropping 2.6 soon - but we'd still like there to be one way of doing testing

Contributor Author

OK - although the functionality is materially worse than the nosetests one (and actually somewhat peculiar, with a bool argument selecting 3 vs. 5 digits):

check_less_precise : bool, default False
Specify comparison precision.
5 digits (False) or 3 digits (True) after decimal points are compared.

@max-sixty max-sixty force-pushed the get-size-of branch 2 times, most recently from 51de6c4 to 36d75d8 Compare December 21, 2015 19:59
@max-sixty
Contributor Author

@jreback assertLess isn't supported either on 2.6?
But we're green

@@ -108,6 +108,9 @@ Other enhancements
- A simple version of ``Panel.round()`` is now implemented (:issue:`11763`)
- For Python 3.x, ``round(DataFrame)``, ``round(Series)``, ``round(Panel)`` will work (:issue:`11763`)
- ``Dataframe`` has gained a ``_repr_latex_`` method in order to allow for automatic conversion to latex in a ipython/jupyter notebook using nbconvert. Options ``display.latex.escape`` and ``display.latex.longtable`` have been added to the configuration and are used automatically by the ``to_latex`` method.(:issue:`11778`)
- ``sys.getsizeof(obj)`` returns the memory usage of a pandas object including the (non-object)
Contributor

actually I think this should call with deep=True.

Contributor Author

I don't have enough context to know whether deep should be True or False by default, but I can see the logic behind the default for .memory_usage being the same as that for sys.getsizeof, given the similar use cases.

Or why are the use cases different? Because a user calling from the system wants accuracy vs. speed but a user calling from pandas wants speed vs. accuracy?

Contributor

using __sizeof__ should have deep=True, which gives the 'best' report of memory used. This is not the default for .memory_usage() because it's (potentially) expensive to compute. see #11595 where it can be somewhat slow (though the cython impl helps).
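The cost difference is easy to see with object dtype (a sketch assuming a recent pandas): the shallow report counts only the 8-byte pointers in the backing array, while ``deep=True`` additionally walks and sizes every Python object.

```python
import pandas as pd

s = pd.Series(["a fairly long python string"] * 1000)

shallow = s.memory_usage()        # index plus 8-byte object pointers
deep = s.memory_usage(deep=True)  # additionally sizes each string
print(shallow, deep)
```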

Contributor

e.g. if someone is calling sys.getsizeof(df) then I think it's appropriate to give the most accurate (if maybe somewhat non-performant) answer.

Contributor Author

OK, if it's a material performance drag and a different enough set of use cases, so be it

@max-sixty max-sixty force-pushed the get-size-of branch 6 times, most recently from 3c02600 to 6045ae7 Compare January 2, 2016 00:08
@max-sixty
Contributor Author

@jreback updated & green

# we could check the kwarg with inspect module, but different methods for Py versions
try:
mem = self.memory_usage(deep=True)
except TypeError:
Contributor

nothing should raise a TypeError - what does? rather than catching this error, we need to fix the underlying issue.
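For context on the exchange above: the ``try``/``except`` guarded against older ``memory_usage`` signatures that did not accept a ``deep`` keyword, in which case the call itself raises ``TypeError``. A minimal illustration with a hypothetical function:

```python
def memory_usage(index=True):
    # hypothetical older signature with no `deep` keyword
    return 100


try:
    memory_usage(deep=True)
except TypeError:
    # lands here: "got an unexpected keyword argument 'deep'"
    print("no deep= support")
```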

@max-sixty
Contributor Author

@jreback updated & green

@jreback
Contributor

jreback commented Jan 2, 2016

@MaximilianR lgtm.

can you run ``git diff master | flake8 --diff`` and fix those issues. we are going to be enforcing this shortly, so might as well not add to it :)

ping when green.

@max-sixty
Contributor Author

Done (phew...)

import pandas.util.testing as tm
from pandas import Categorical, Index, Series, DataFrame, \
Contributor

ok, though we usually do this with parens

@jreback
Contributor

jreback commented Jan 3, 2016

lgtm. ping when green.


# sys.getsizeof will call the .memory_usage with
# deep=True, and add on some GC overhead
diff = df.memory_usage().sum() - sys.getsizeof(df)
Contributor

how did this pass before?

this will never be true, as deep=True is >> non-deep with object dtype.
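The reviewer's point, concretely (assuming the merged behavior, where ``__sizeof__`` calls ``memory_usage(deep=True)``): with object columns, ``sys.getsizeof`` reflects deep usage and far exceeds the pointer-only shallow sum, so an "almost equal" comparison between the two cannot pass.

```python
import sys

import pandas as pd

df = pd.DataFrame({"A": ["x" * 50] * 500})  # object dtype column

# getsizeof reflects deep usage, which dwarfs the shallow sum here
print(sys.getsizeof(df), df.memory_usage().sum())
```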

Contributor Author

Given that df isn't an object dtype, I don't think this affected the results - I will fix regardless

@jreback
Contributor

jreback commented Jan 3, 2016

merged via 60cacab

thanks!
