minimal DataFrame.apply call causes RecursionError #25196

Closed
moonshoes87 opened this issue Feb 6, 2019 · 2 comments

@moonshoes87
commented Feb 6, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
df = pd.DataFrame(np.zeros((100, 15)))
df.T.apply(dict)
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-153-4987478fba35> in <module>()
      3 #df = pd.DataFrame(np.zeros((2454, 15)))
      4 df = pd.DataFrame(np.zeros((100, 15)))
----> 5 df.T.apply(dict)

~/anaconda3/envs/pmagpy/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6485                          args=args,
   6486                          kwds=kwds)
-> 6487         return op.get_result()
   6488 
   6489     def applymap(self, func):

~/anaconda3/envs/pmagpy/lib/python3.6/site-packages/pandas/core/apply.py in get_result(self)
    113         if is_list_like(self.f) or is_dict_like(self.f):
    114             return self.obj.aggregate(self.f, axis=self.axis,
--> 115                                       *self.args, **self.kwds)
    116 
    117         # all empty

~/anaconda3/envs/pmagpy/lib/python3.6/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
   6286             pass
   6287         if result is None:
-> 6288             return self.apply(func, axis=axis, args=args, **kwargs)
   6289         return result
   6290 

... last 3 frames repeated, from the frame below ...

~/anaconda3/envs/pmagpy/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6485                          args=args,
   6486                          kwds=kwds)
-> 6487         return op.get_result()
   6488 
   6489     def applymap(self, func):

RecursionError: maximum recursion depth exceeded

Problem description

This code snippet runs without error in pandas 0.23.4 and earlier. The RecursionError occurs with pandas 0.24.0 and 0.24.1.

This is possibly similar to #23568.

Expected Output

Pandas Series:

0     {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0....
1     {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0....
2     {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0....
...

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: 3.8.0
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@jschendel

Member

commented Feb 7, 2019

Thanks, I can confirm that this is broken on master:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+77.g51fca4cc9'

In [2]: df = pd.DataFrame([[0, 0], [0, 0]])

In [3]: df.apply(dict)
---------------------------------------------------------------------------
RecursionError: maximum recursion depth exceeded while calling a Python object

And it worked on 0.23.4:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.23.4'

In [2]: df = pd.DataFrame([[0, 0], [0, 0]])

In [3]: df.apply(dict)
Out[3]:
0    {0: 0, 1: 0}
1    {0: 0, 1: 0}
dtype: object

@jschendel

Member

commented Feb 7, 2019

The issue appears to be caused by this line in get_result (pandas/core/apply.py):

if is_list_like(self.f) or is_dict_like(self.f):

Specifically, is_dict_like considers dict to be dict-like, so apply dispatches to aggregate, which falls back to apply, and the cycle repeats until the recursion limit is hit. I think the intention is that is_dict_like should only match initialized dict-like structures, not the constructors themselves. Note that this is inconsistent with is_list_like, which does not consider list to be list-like:

In [1]: from pandas.core.dtypes.common import is_dict_like, is_list_like

In [2]: is_dict_like(dict)
Out[2]: True

In [3]: is_list_like(list)
Out[3]: False
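
One plausible reason for the asymmetry, sketched without pandas: a check built on hasattr also matches the class itself, because methods such as keys are class attributes, while a check built on isinstance(obj, Iterable) looks at the type of the object, and the type of list is type, which is not iterable. The two helper names below are hypothetical stand-ins written for illustration, not the pandas implementations.

```python
from collections.abc import Iterable


def attr_based_dict_like(obj):
    # Hypothetical stand-in: duck typing via attribute lookup.
    # hasattr also succeeds on a class, since methods live on the class.
    return all(hasattr(obj, name) for name in ("__getitem__", "keys", "__contains__"))


def isinstance_based_list_like(obj):
    # Hypothetical stand-in: ABC check, excluding strings and bytes.
    # isinstance inspects type(obj), so the class `list` itself does not match.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


print(attr_based_dict_like({}))          # True: a dict instance
print(attr_based_dict_like(dict))        # True: the class passes the attribute check
print(isinstance_based_list_like([]))    # True: a list instance
print(isinstance_based_list_like(list))  # False: the class fails the isinstance check
```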

So the fix is to modify is_dict_like so that it does not treat dict (or other constructors) as dict-like. Excluding dict alone is not enough, because similar classes, such as defaultdict, should not be considered dict-like either:

In [4]: from collections import defaultdict

In [5]: is_dict_like(defaultdict)
Out[5]: True
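
A minimal sketch of such a check, assuming is_dict_like is attribute-based: in addition to the attribute lookup, reject anything that is itself a class. The names naive_dict_like, is_dict_like_sketch and ToyFrame are hypothetical. ToyFrame only imitates the apply, aggregate, apply fallback visible in the traceback and is not pandas code.

```python
from collections import defaultdict


def naive_dict_like(obj):
    # Assumed attribute-based check; it also matches classes such as dict.
    return all(hasattr(obj, name) for name in ("__getitem__", "keys", "__contains__"))


def is_dict_like_sketch(obj):
    # Same check, but reject classes (dict, defaultdict, ...).
    return naive_dict_like(obj) and not isinstance(obj, type)


class ToyFrame:
    """Imitates the dispatch loop from the traceback; not pandas code."""

    def __init__(self, columns, dict_like):
        self.columns = columns      # mapping: column label -> {row label: value}
        self.dict_like = dict_like  # predicate that picks the dispatch branch

    def apply(self, func):
        if self.dict_like(func):
            return self.aggregate(func)
        return {label: func(col) for label, col in self.columns.items()}

    def aggregate(self, func):
        # The toy models only the fallback path: straight back to apply.
        return self.apply(func)


columns = {0: {0: 0, 1: 0}, 1: {0: 0, 1: 0}}

# With the naive predicate, dict is routed to aggregate and the calls never end.
try:
    ToyFrame(columns, naive_dict_like).apply(dict)
    outcome = "no error"
except RecursionError:
    outcome = "RecursionError"

# With classes excluded, dict is called on each column as an ordinary function.
fixed = ToyFrame(columns, is_dict_like_sketch).apply(dict)

print(outcome)
print(fixed)
```

Checking isinstance(obj, type) covers dict, defaultdict and any other class in one test, while instances such as defaultdict(list) still count as dict-like.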