ENH: Make corrwith ignore string columns when finding correlation with a Series #18570

Closed
tdpetrou opened this Issue Nov 30, 2017 · 2 comments

@tdpetrou
Contributor

tdpetrou commented Nov 30, 2017

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': np.random.rand(5),
...                    'b': np.random.rand(5),
...                    'string_col': 'some string'})
>>> df

          a         b   string_col
0  0.376004  0.761471  some string
1  0.402352  0.865937  some string
2  0.450365  0.715527  some string
3  0.445317  0.017645  some string
4  0.687363  0.903298  some string

>>> s = pd.Series(np.random.rand(100))

>>> df.corrwith(s)
TypeError: ("unsupported operand type(s) for /: 'str' and 'int'", 'occurred at index string_col')

Problem description

corrwith should silently drop string columns when computing the correlation with a Series. For now, you must select the numeric columns manually before calling it:

Expected Output

>>> df.select_dtypes('number').corrwith(s)
a    0.161006
b   -0.000233
dtype: float64
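
For anyone who wants to reproduce the workaround outside the REPL, here is a minimal, self-contained sketch. It assumes a seeded NumPy generator (not part of the original report) so the run is repeatable; the variable names `rng`, `numeric`, and `result` are illustrative only. It relies solely on `select_dtypes` and `corrwith`, and the Series is aligned to the frame on the index, so only the first five values of `s` take part in the correlation.

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption added here for repeatability.
rng = np.random.default_rng(0)

df = pd.DataFrame({
    "a": rng.random(5),
    "b": rng.random(5),
    "string_col": "some string",  # scalar is broadcast to every row
})
s = pd.Series(rng.random(100))

# Keep only numeric columns, then correlate each one with the Series.
numeric = df.select_dtypes(include="number")
result = numeric.corrwith(s)

print(result)
```

The result has one entry per numeric column (`a` and `b`); `string_col` is never passed to the correlation, so the `TypeError` above is avoided.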

@gfyoung gfyoung added the API Design label Dec 1, 2017

@gfyoung

Member

gfyoung commented Dec 1, 2017

@tdpetrou : Thanks for reporting this! While your workaround isn't that hard to do, I don't see why we shouldn't just drop columns that have non-numeric dtypes. We do that for DataFrame.describe.

@jreback @jorisvandenbossche : Thoughts?

@jreback

Contributor

jreback commented Dec 2, 2017

Sure, this could be done; .corrwith is a numeric-only operation. It would take a community pull request.

@jreback jreback added this to the Next Major Release milestone Dec 2, 2017

@tdpetrou tdpetrou referenced this issue Dec 6, 2017

Merged

selected numeric data before correlation #18651

2 of 4 tasks complete

@jreback jreback modified the milestones: Next Major Release, 0.21.1, 0.22.0 Dec 6, 2017
