
String methods that return boolean arrays should return bool (instead of NaN) when the Series contains NaN #1689

Closed
bmu opened this issue Jul 27, 2012 · 2 comments

@bmu

bmu commented Jul 27, 2012

In [32]:  s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [33]: s.str.startswith('A')
Out[33]: 
0     True
1    False
2    False
3     True
4    False
5      NaN
6    False
7    False
8    False

In [34]: s[s.str.startswith('A')]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-f0d46b8c76ff> in <module>()
----> 1 s[s.str.startswith('A')]

/net/home4/bmueller/.virtualenvs/myenv/lib/python2.6/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    447         # special handling of boolean data with NAs stored in object
    448         # arrays. Since we can't represent NA with dtype=bool
--> 449         if _is_bool_indexer(key):
    450             key = self._check_bool_indexer(key)
    451             key = np.asarray(key, dtype=bool)

/net/home4/bmueller/.virtualenvs/myenv/lib/python2.6/site-packages/pandas/core/common.pyc in _is_bool_indexer(key)
    499         if not lib.is_bool_array(key):
    500             if isnull(key).any():
--> 501                 raise ValueError('cannot index with vector containing '
    502                                  'NA / NaN values')
    503             return False

ValueError: cannot index with vector containing NA / NaN values
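The comment in the traceback notes that NA cannot be represented with dtype=bool, so the mask comes back as an object array. A minimal sketch (assuming only NumPy, with the mask values copied from Out[33]) of why pandas refuses to simply cast such a mask: NaN is truthy, so a plain cast to bool would silently turn the missing entry into True and select the NaN row.

```python
import numpy as np

# The mask from Out[33]: booleans mixed with NaN, so NumPy stores it as object.
mask = np.array([True, False, False, True, False, np.nan, False, False, False],
                dtype=object)

# Each element is converted with bool(); bool(nan) is True, so the
# missing entry at position 5 would be selected by a naive cast.
naive = mask.astype(bool)
print(naive.tolist())
# [True, False, False, True, False, True, False, False, False]
```

This is why the explicit fill in the replies below is needed: the caller has to decide whether "missing" means match or no match.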

@wesm
Member

wesm commented Jul 27, 2012

I think it may be better to be explicit here, e.g.:

s[s.str.startswith('A').fillna(False)]
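A self-contained sketch of the fillna(False) approach, reusing the Series from the report. The explicit dtype=object is an assumption added so the mask keeps the NaN-bearing object dtype shown above; newer pandas versions may otherwise infer a dedicated string dtype with different missing-value handling.

```python
import numpy as np
import pandas as pd

# dtype=object reproduces the object-backed Series from the report.
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'],
              dtype=object)

mask = s.str.startswith('A')      # True/False, plus NaN for the missing entry
selected = s[mask.fillna(False)]  # treat "missing" as "does not match"
print(selected.tolist())          # ['A', 'Aaba']
```

Filling with False makes the choice explicit at the call site instead of baking a policy into startswith itself.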

or maybe add an option to startswith:

s[s.str.startswith('A', na=False)]
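For reference, later pandas releases do accept an na argument on Series.str.startswith that sets the value used for missing elements. A minimal sketch under the same assumptions as above (object-dtype Series, values from the report):

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'],
              dtype=object)

# na=False is used as the result for missing elements, so the mask has no NaN
# and can be used directly for boolean indexing.
mask = s.str.startswith('A', na=False)
print(s[mask].tolist())  # ['A', 'Aaba']
```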

What do you think?

@bmu
Author

bmu commented Jul 27, 2012

I wasn't aware of fillna(False) for this case, so I think this is a good solution. Maybe it could be included as an example in the docs; then everything would be fine.
