unexpected behavior of pd.core.algorithms._ensure_data() #22160

Closed
realead opened this issue Aug 1, 2018 · 2 comments · Fixed by #22161
Labels: Algos, Bug, Missing-data

Comments

@realead
Contributor

realead commented Aug 1, 2018

Examples

Example 1:

import pandas.core.algorithms as algos
comps = ['ss', 42]
values = ['42']
algos.isin(comps, values)

results in array([False, True], dtype=bool)

Example 2:

import numpy as np
import pandas.core.algorithms as algos
comps = ['ss', np.nan]
values = [np.nan]
algos.isin(comps, values)

results in array([False, False], dtype=bool)

Expected Output

The results should be [False, False] for example 1 and [False, True] for example 2.
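To make the expected semantics concrete, here is a minimal pure-Python sketch of element-wise membership with no dtype coercion, where NaN is treated as matching NaN. The helper name `expected_isin` is hypothetical and is not part of pandas; it only illustrates the results asked for above.

```python
import math


def expected_isin(comps, values):
    """Hypothetical reference: membership test without coercion; NaN matches NaN."""
    def is_nan(x):
        return isinstance(x, float) and math.isnan(x)

    has_nan = any(is_nan(v) for v in values)
    non_nan = [v for v in values if not is_nan(v)]
    # NaN elements match only if values contains a NaN; everything else
    # uses ordinary Python equality, so 42 does not match '42'.
    return [has_nan if is_nan(c) else c in non_nan for c in comps]


example_1 = expected_isin(['ss', 42], ['42'])
example_2 = expected_isin(['ss', float('nan')], [float('nan')])
print(example_1)  # [False, False]
print(example_2)  # [False, True]
```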

Problem description

The problem can be tracked down to this line: https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L137

    # we have failed, return object
    values = np.asarray(values)
    return ensure_object(values), 'object', 'object'

np.asarray(values) tries to do smart things: it coerces a mixed list to a single common dtype (for example, converting 42 to the string '42'). This should probably be avoided by passing dtype=np.object to it.
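The coercion can be seen with NumPy alone. The sketch below uses the lists from the two examples and compares the default conversion with an object-dtype conversion. It uses the builtin `object` rather than `np.object`, since that alias has been removed from recent NumPy releases; the variable names are only illustrative.

```python
import numpy as np

comps = ['ss', 42]

# Default conversion: NumPy picks one common dtype for the whole list,
# here a unicode string type, so the integer 42 becomes the string '42'.
coerced = np.asarray(comps)
print(coerced.dtype.kind)   # 'U'
print(coerced[1] == '42')   # True

# The same happens to NaN: it is stored as the string 'nan'.
coerced_nan = np.asarray(['ss', np.nan])
print(coerced_nan[1] == 'nan')  # True

# With dtype=object the original Python objects are kept unchanged.
preserved = np.asarray(comps, dtype=object)
print(preserved[1] == 42)   # True
```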

Output of pd.show_versions()

pandas 0.22.0

@realead
Contributor Author

realead commented Aug 1, 2018

This is an aspect of #22119 and probably related to #22148.

@gfyoung added the Bug, Missing-data, and Algos labels Aug 3, 2018
@gfyoung
Member

gfyoung commented Aug 3, 2018

@realead : Agreed. Feel free to investigate the (unrelated) portion of this bug / issue.
