Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Series.map(m: Mapping) can fail with "object is not callable" #29733

Closed
ivirshup opened this issue Nov 20, 2019 · 4 comments · Fixed by #29788
Closed

Series.map(m: Mapping) can fail with "object is not callable" #29733

ivirshup opened this issue Nov 20, 2019 · 4 comments · Fixed by #29788
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement Series Series data structure

Comments

@ivirshup
Copy link
Contributor

ivirshup commented Nov 20, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
from collections import UserDict, Counter

class MissingDict(UserDict):
    def __missing__(self, key):
        return key

s = pd.Series(list("aab"))
s.map(Counter({"a": 1}))
# 0    1
# 1    1
# 2    0
# dtype: int64

s.map(MissingDict({"a": "c"}))
# <ipython-input-48-6cd4b0d364c9> in <module>
# ----> 1 s.map(MissingDict({"a": "c"}))

# /usr/local/lib/python3.7/site-packages/pandas/core/series.py in map(self, arg, na_action)
#    3826         dtype: object
#    3827         """
# -> 3828         new_values = super()._map_values(arg, na_action=na_action)
#    3829         return self._constructor(new_values, index=self.index).__finalize__(self)
#    3830 

# /usr/local/lib/python3.7/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
#    1298 
#    1299         # mapper is a function
# -> 1300         new_values = map_f(values, mapper)
#    1301 
#    1302         return new_values

# pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

# TypeError: 'MissingDict' object is not callable

Problem description

Series.map can sometimes take Mapping subclasses other than dict, but not always. I think it would make sense if any object that satisfied isinstance(x, collections.abc.Mapping) could be used as a mapping. I came across this while trying to use map where I only wanted values present as keys in the mapping to change (I went for s.map(lambda x: {"a": "b"}.get(x, x)) as a workaround).

collections.Counter and collections.defaultdict work fine, which made me expect other Mappings would as well.

Expected Output

I expected s.map(MissingDict({"a": "c"})) to do something like:

s.map(lambda x: MissingDict({"a": "c"})[x])
# 0    c
# 1    c
# 2    b
# dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.7.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.3
numpy            : 1.17.4
pytz             : 2018.9
dateutil         : 2.7.5
pip              : 19.3.1
setuptools       : 41.6.0
Cython           : 0.29.14
pytest           : 5.2.2
hypothesis       : 4.44.1
sphinx           : 2.2.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.4.1
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.9.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : 0.3.2
gcsfs            : None
lxml.etree       : 4.4.1
matplotlib       : 3.0.3
numexpr          : 2.7.0
odfpy            : None
openpyxl         : 2.6.2
pandas_gbq       : None
pyarrow          : 0.15.1
pytables         : None
s3fs             : None
scipy            : 1.3.2
sqlalchemy       : None
tables           : 3.6.1
xarray           : 0.14.0
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None
@ohad83
Copy link
Contributor

ohad83 commented Nov 22, 2019

Hey @ivirshup,

UserDict is pretty much deprecated by the ability to subclass dict since python 2.2.
The reason Counter and defaultdict work fine is exactly for this reason - they are dict subclasses.

Funnily enough, while UserDict is a subclass of collections.abc.Mapping, it is not enough for your example. UserDict's implementation contains the special case of __missing__ which isn't a part of collections.abc.Mapping.

Anyway, I see no harm in checking whether the object satisfies isinstance(x, collections.abc.Mapping) instead of isinstance(x, dict), and it might come handy for anyone wanting to create a mapping object which isn't a dict subclass (for whatever reason). However, as I noted, it won't work as expected for your example since unless we have a specific case for UserDict, which I don't think we should, the __missing__ property will only work on dict subclasses.

I'll create the pull request and let the maintainers decide whether it should go in.

@gfyoung gfyoung added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Series Series data structure Enhancement labels Nov 22, 2019
@gfyoung
Copy link
Member

gfyoung commented Nov 22, 2019

Can someone explain the benefit of allowing UserDict directly when you can just pass in UserDict.data, which is a dict object?

https://docs.python.org/3.8/library/collections.html#userdict-objects

@ohad83
Copy link
Contributor

ohad83 commented Nov 22, 2019

As I said I don't believe UserDict should have a specific case. On the other hand, collections.abc.Mapping does seem suitable for this case.

@ivirshup
Copy link
Contributor Author

@gfyoung, the use case above was for when I wanted something like: series.map(lambda x: {"a": "b"}.get(x, x)) without the lambda. More broadly, there are many objects that inherit from Mapping or MutableMapping which I'd expected to work. I only used UserDict here since it seemed more concise than the typical subclass of MutableMapping.

@jreback jreback modified the milestones: No action, Contributions Welcome Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement Series Series data structure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants