KeyError when using str.cat and index was changed #7857

Closed
toobaz opened this issue Jul 28, 2014 · 5 comments

@toobaz
Member

commented Jul 28, 2014

from pandas import DataFrame, MultiIndex

df = DataFrame(index=MultiIndex.from_product([[2011, 2012], [1, 2, 3]],
                                             names=['year', 'month']))

df = df.reset_index()

str_year = df.year.astype('str')
str_month = df.month.astype('str')
str_both = str_year.str.cat(str_month, sep=' ')

... so far, everything is fine. Now filter the rows (which changes the index) and retry:

df = df[df.month > 1]

str_year = df.year.astype('str')
str_month = df.month.astype('str')
str_both = str_year.str.cat(str_month, sep=' ')

... you will get a KeyError (tested against git, commit 90fa87e):

KeyError                                  Traceback (most recent call last)
<ipython-input-12-9d3f1fbb70fc> in <module>()
     11 str_year = df.year.astype('str')
     12 str_month = df.month.astype('str')
---> 13 str_both = str_year.str.cat(str_month, sep=' ')

/home/pietro/nobackup/repo/pandas/pandas/core/strings.py in cat(self, others, sep, na_rep)
    933     @copy(str_cat)
    934     def cat(self, others=None, sep=None, na_rep=None):
--> 935         result = str_cat(self.series, others=others, sep=sep, na_rep=na_rep)
    936         return self._wrap_result(result)
    937 

/home/pietro/nobackup/repo/pandas/pandas/core/strings.py in str_cat(arr, others, sep, na_rep)
     41 
     42     if others is not None:
---> 43         arrays = _get_array_list(arr, others)
     44 
     45         n = _length_check(arrays)

/home/pietro/nobackup/repo/pandas/pandas/core/strings.py in _get_array_list(arr, others)
     13 
     14 def _get_array_list(arr, others):
---> 15     if len(others) and isinstance(others[0], (list, np.ndarray)):
     16         arrays = [arr] + list(others)
     17     else:

/home/pietro/nobackup/repo/pandas/pandas/core/series.py in __getitem__(self, key)
    491     def __getitem__(self, key):
    492         try:
--> 493             result = self.index.get_value(self, key)
    494 
    495             if not np.isscalar(result):

/home/pietro/nobackup/repo/pandas/pandas/core/index.py in get_value(self, series, key)
   1194 
   1195         try:
-> 1196             return self._engine.get_value(s, k)
   1197         except KeyError as e1:
   1198             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

/home/pietro/nobackup/repo/pandas/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2991)()

/home/pietro/nobackup/repo/pandas/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2806)()

/home/pietro/nobackup/repo/pandas/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3532)()

/home/pietro/nobackup/repo/pandas/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:7033)()

/home/pietro/nobackup/repo/pandas/pandas/hashtable.so in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6974)()

KeyError: 0

jreback added the Bug label Jul 28, 2014

jreback added this to the 0.15.0 milestone Jul 28, 2014

@jreback

Contributor

commented Jul 28, 2014

This looks like a bug in the implementation of _get_array_list in core/strings.py, which checks whether the 0th element of others is a list/ndarray. It should probably use others.values[0].

care to do a PR?
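
To make the difference concrete, here is a minimal sketch (not the pandas source) that rebuilds a filtered column like the one in the report and compares the two ways of reading its first element. The names months and first_by_label are made up for illustration.

```python
import pandas as pd

# Rebuild a filtered 'month' column like the one in the report:
# rows with labels 0 and 3 are dropped, so the index becomes [1, 2, 4, 5].
months = pd.Series([1, 2, 3, 1, 2, 3])
others = months[months > 1].astype(str)


def first_by_label(series):
    """Mimic `others[0]`: on an integer index this is a label lookup."""
    try:
        return series[0]
    except KeyError:
        return None  # label 0 no longer exists after filtering


first_label = first_by_label(others)
# Positional access on the underlying ndarray ignores the index labels.
first_positional = others.values[0]

print(list(others.index), first_label, first_positional)
```

The label lookup fails because the index no longer contains 0, while the positional read on .values still returns the first remaining element.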

@toobaz

Member Author

commented Jul 28, 2014

Indeed, this fixes the issue (although I must admit I didn't exert much effort in understanding the code apart from that line).

@jreback

Contributor

commented Jul 28, 2014

Hah... the problem is that since others is a Series, others[0] looks up the label 0 rather than the first position, and the filtered index no longer contains that label, hence the KeyError. The code is trying to figure out whether the 0th element is itself a list or ndarray (rather than a scalar).

actually maybe this should be

_values_from_object(others)[0] because then others could be an ndarray OR a Series (I don't know if that's possible, but this guards against it).
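
A rough sketch of what such a guard could look like, assuming the only goal is to make the first-element check positional. The helper name first_element_is_array_like is hypothetical, and unwrapping a Series via .values only stands in for _values_from_object, whose exact behavior is not reproduced here.

```python
import numpy as np
import pandas as pd


def first_element_is_array_like(others):
    """Hypothetical helper: is `others` a collection of arrays, not one array?"""
    if len(others) == 0:
        return False
    # Unwrap a Series to its ndarray so that [0] is positional,
    # never a label lookup (stand-in for _values_from_object).
    values = others.values if isinstance(others, pd.Series) else others
    return isinstance(values[0], (list, np.ndarray))


# A Series whose index does not contain the label 0, as in the report.
filtered = pd.Series(['2', '3', '2', '3'], index=[1, 2, 4, 5])

series_result = first_element_is_array_like(filtered)
nested_result = first_element_is_array_like([np.array(['a']), np.array(['b'])])
flat_result = first_element_is_array_like(['a', 'b'])
empty_result = first_element_is_array_like([])
```

With this check a filtered Series is treated as a single array of scalars instead of raising, and a list of arrays is still detected as such.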

@toobaz

Member Author

commented Jul 28, 2014

I am a bit lost (I understood the diagnosis, not the cure)... but I will take another look in the next few days.

@toobaz

Member Author

commented Aug 2, 2014

This seems to work in all cases I can conceive...

jreback closed this in #7902 Aug 2, 2014
