
index is included in memory usage by default #11867

Closed
wants to merge 1 commit into from

Conversation

max-sixty
Contributor

...and ``sys.getsizeof`` returns the correct value. Closes #11597

Is this the best implementation for a common method across classes?

@@ -544,6 +544,8 @@ def memory_usage(self, deep=False):
v += lib.memory_usage_of_objects(self.values)
return v

__sizeof__ = com._sizeof
Contributor

you can put this in PandasObject I think
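The suggestion above can be sketched with a toy base class (the names here are hypothetical stand-ins, not pandas' actual internals, which live in ``pandas/core/base.py``): defining ``__sizeof__`` once on a shared base lets every subclass that provides ``memory_usage`` get a correct ``sys.getsizeof`` for free.

```python
import sys


class PandasObjectLike:
    """Toy stand-in for a shared base class such as PandasObject."""

    def __sizeof__(self):
        # Delegate to memory_usage when a subclass defines it,
        # otherwise fall back to the plain object size.
        if hasattr(self, "memory_usage"):
            mem = self.memory_usage(deep=True)
            # memory_usage may return per-column values; total them
            if not isinstance(mem, int):
                mem = int(sum(mem))
            return mem
        return super().__sizeof__()


class FakeFrame(PandasObjectLike):
    def memory_usage(self, deep=False):
        # pretend: a 160-byte index plus two 80-byte columns
        return [160, 80, 80]


# sys.getsizeof calls __sizeof__ and adds a little GC overhead on top
print(sys.getsizeof(FakeFrame()))
```

``sys.getsizeof`` always reports at least the ``__sizeof__`` result, plus CPython's garbage-collector header for tracked objects, which is why the tests below compare with a small delta rather than exact equality.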

@jreback jreback added this to the 0.18.0 milestone Dec 19, 2015
@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Dec 19, 2015
@@ -7693,6 +7693,10 @@ def test_info_memory_usage(self):
DataFrame(1, index=pd.MultiIndex.from_product([['a'], range(1000)]), columns=['A']).index.nbytes
DataFrame(1, index=pd.MultiIndex.from_product([['a'], range(1000)]), columns=['A']).index.values.nbytes

# sys.getsizeof works the same as memory_usage with defaults, albeit with some
# GC overhead
self.assertAlmostEqual(sys.getsizeof(df), df.memory_usage().sum(), delta=100)
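The relationship the test checks can be seen directly (a sketch assuming a pandas release that includes this PR's behavior): for a purely numeric frame, deep and shallow usage coincide, so ``sys.getsizeof`` lands within a small bookkeeping margin of ``memory_usage().sum()``.

```python
import sys

import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((1000, 3)))

# No object columns, so deep == shallow usage; getsizeof only adds a
# small per-object/GC overhead on top of the data itself
shallow = df.memory_usage().sum()
print(shallow, sys.getsizeof(df))
```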
Contributor

test this on series/Index/Categorical as well. (for Index put a test in test_index on the Base which will run for all indexes).

Contributor Author

I added onto the existing tests in IndexOps - lmk if that's not OK

@max-sixty max-sixty force-pushed the get-size-of branch 3 times, most recently from c462b19 to c99cda7 Compare December 21, 2015 18:45
@max-sixty
Contributor Author

@jreback Python 2.6 error with delta argument on assertAlmostEqual - should I skip the test on 2.6 or find an alternate way of doing this?

xref #7718

self.assertEqual(o.memory_usage(index=False) + o.index.memory_usage(),
o.memory_usage(index=True))

# sys.getsizeof will call the .memory_usage defaults, and add on some GC overhead
self.assertAlmostEqual(res, sys.getsizeof(o), delta=100)
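The identity asserted above can be checked interactively (assuming a recent pandas; the additivity has held since this change): the values-only usage plus the index's own usage equals the usage with the index included.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000.0))

values_only = s.memory_usage(index=False)
with_index = s.memory_usage(index=True)  # index=True is the default

# memory_usage is additive: values plus index
print(values_only, s.index.memory_usage(), with_index)
```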

Contributor

we don't use this

use tm.assert_almost_equal

Contributor

I thought we disabled the test routines that one shouldn't use in tm.TestCase - but maybe that's an open issue (or if not, can you create one?)

we are dropping 2.6 soon - but we'd still like there to be one way of doing testing

Contributor Author

OK - although the functionality is materially worse than the nosetests one (and actually somewhat peculiar, with a bool argument selecting 3 vs. 5 digits):

check_less_precise : bool, default False
Specify comparison precision.
5 digits (False) or 3 digits (True) after decimal points are compared.

@max-sixty max-sixty force-pushed the get-size-of branch 2 times, most recently from 51de6c4 to 36d75d8 Compare December 21, 2015 19:59
@max-sixty
Contributor Author

@jreback assertLess isn't supported either on 2.6?
But we're green

@@ -108,6 +108,9 @@ Other enhancements
- A simple version of ``Panel.round()`` is now implemented (:issue:`11763`)
- For Python 3.x, ``round(DataFrame)``, ``round(Series)``, ``round(Panel)`` will work (:issue:`11763`)
- ``Dataframe`` has gained a ``_repr_latex_`` method in order to allow for automatic conversion to latex in a ipython/jupyter notebook using nbconvert. Options ``display.latex.escape`` and ``display.latex.longtable`` have been added to the configuration and are used automatically by the ``to_latex`` method.(:issue:`11778`)
- ``sys.getsizeof(obj)`` returns the memory usage of a pandas object including the (non-object)
Contributor

actually I think this should call with deep=True.

Contributor Author

I don't have enough context to know whether deep should be True or False by default, but I can see the logic behind the default for .memory_usage being the same as that for sys.getsizeof, given the similar use cases.

Or why are the use cases different? Because a user calling from the system wants accuracy vs. speed but a user calling from pandas wants speed vs. accuracy?

Contributor

using __sizeof__ should have deep=True, which gives the 'best' report of memory used. This is not the default for .memory_usage() because it's (potentially) expensive to compute. see #11595 where it can be somewhat slow (though the cython impl helps).
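The cost difference is easy to see with object dtype (a sketch assuming a recent pandas): the shallow report counts only the 8-byte pointers in the backing array, while ``deep=True`` additionally walks and sizes every Python object.

```python
import pandas as pd

s = pd.Series(["a fairly long python string"] * 1000)

shallow = s.memory_usage()        # index plus 8-byte object pointers
deep = s.memory_usage(deep=True)  # additionally sizes each string
print(shallow, deep)
```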

Contributor

e.g. if someone is calling sys.getsizeof(df) then I think it's appropriate to give the most accurate (if maybe somewhat non-performant) answer.

Contributor Author

OK, if it's a material performance drag and a different enough set of use cases, so be it

@max-sixty max-sixty force-pushed the get-size-of branch 6 times, most recently from 3c02600 to 6045ae7 Compare January 2, 2016 00:08
@max-sixty
Contributor Author

@jreback updated & green

# we could check the kwarg with inspect module, but different methods for Py versions
try:
mem = self.memory_usage(deep=True)
except TypeError:
Contributor

nothing should raise a TypeError - what does? rather than catching this error, we need to fix the underlying issue.
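For context on the exchange above: the ``try``/``except`` guarded against older ``memory_usage`` signatures that did not accept a ``deep`` keyword, in which case the call itself raises ``TypeError``. A minimal illustration with a hypothetical function:

```python
def memory_usage(index=True):
    # hypothetical older signature with no `deep` keyword
    return 100


try:
    memory_usage(deep=True)
except TypeError:
    # lands here: "got an unexpected keyword argument 'deep'"
    print("no deep= support")
```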

@max-sixty
Contributor Author

@jreback updated & green

@jreback
Contributor

jreback commented Jan 2, 2016

@MaximilianR lgtm.

can you run ``git diff master | flake8 --diff`` and fix those issues. we are going to be enforcing this shortly, so might as well not add to it :)

ping when green.

@max-sixty
Contributor Author

Done (phew...)

import pandas.util.testing as tm
from pandas import Categorical, Index, Series, DataFrame, \
Contributor

ok, though we usually do this with parens

@jreback
Contributor

jreback commented Jan 3, 2016

lgtm. ping when green.


# sys.getsizeof will call the .memory_usage with
# deep=True, and add on some GC overhead
diff = df.memory_usage().sum() - sys.getsizeof(df)
Contributor

how did this pass before?

this will never be true, as deep=True is >> non-deep with object dtype.
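The reviewer's point, concretely (assuming the merged behavior, where ``__sizeof__`` calls ``memory_usage(deep=True)``): with object columns, ``sys.getsizeof`` reflects deep usage and far exceeds the pointer-only shallow sum, so an "almost equal" comparison between the two cannot pass.

```python
import sys

import pandas as pd

df = pd.DataFrame({"A": ["x" * 50] * 500})  # object dtype column

# getsizeof reflects deep usage, which dwarfs the shallow sum here
print(sys.getsizeof(df), df.memory_usage().sum())
```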

Contributor Author

Given that df isn't an object dtype, I don't think this affected the results - I will fix regardless

@jreback
Contributor

jreback commented Jan 3, 2016

merged via 60cacab

thanks!
