plot(df) and plot_correlation(df) fail when data has 'list' columns #48

Closed
jnwang opened this issue Dec 21, 2019 · 11 comments
Labels
module: EDA type: bug Something isn't working

Comments

@jnwang

jnwang commented Dec 21, 2019

When running plot(df) and plot_correlation(df) on the following dataframe, both calls fail because the author column contains lists.

For plot(df), the reported error is TypeError: unhashable type: 'list'

For plot_correlation(df), the reported error is AssertionError: No numerical columns found

Screen Shot 2019-12-21 at 1 36 30 PM
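
For readers without access to the screenshot, here is a minimal sketch of how a list column can be detected before plotting. The data is made up, and `is_hashable` and `find_unhashable_columns` are hypothetical helpers, not part of the library. The sketch assumes the TypeError comes from a hash-based operation hitting a list value, which is what the error message suggests.

```python
import pandas as pd


def is_hashable(value) -> bool:
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def find_unhashable_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns with at least one unhashable value."""
    return [
        name
        for name in df.columns
        if not all(is_hashable(value) for value in df[name])
    ]


# Made-up data standing in for the dataframe in the screenshot.
df = pd.DataFrame(
    {
        "title": ["paper A", "paper B"],
        "author": [["Ann", "Bob"], ["Cid"]],  # each cell is a Python list
    }
)

print(find_unhashable_columns(df))  # ['author']
```

Columns flagged this way are the ones likely to trip any step that hashes cell values, such as counting distinct values.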

@jnwang jnwang added the type: bug Something isn't working label Dec 21, 2019
@brandonlockhart

This has been fixed for plot()
image

@Waterpine
Contributor

@jnwang I think the reason is that plot_correlation() only supports numerical data. A correlation value cannot be computed between list or categorical columns. I believe we can solve this problem in the next version. Thanks!

@jnwang
Author

jnwang commented Jan 3, 2020

Can you please check how df.corr() handles this? It would be better if we could make plot_correlation() consistent with df.corr().

@Waterpine
Contributor

@jnwang @dovahcrow df.corr() does not handle this either. It outputs nothing when called on this data.
Screen Shot 2020-01-03 at 4 32 52 PM

@jnwang
Author

jnwang commented Jan 3, 2020

It looks like df.corr() automatically ignores non-numerical columns. Can we do the same rather than raise an error?

https://www.geeksforgeeks.org/python-pandas-dataframe-corr/amp/
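
A minimal sketch of the "ignore non-numerical columns" behaviour, using made-up data. It selects the numerical columns explicitly with `select_dtypes` before calling `corr()`, so it does not rely on whatever `df.corr()` does by default with non-numerical columns, which may differ between pandas versions. This only illustrates the idea and is not the library's implementation.

```python
import pandas as pd

# Made-up data: two numerical columns plus a column of lists.
df = pd.DataFrame(
    {
        "year": [2017, 2018, 2019],
        "citations": [30, 20, 10],
        "author": [["Ann", "Bob"], ["Cid"], ["Dee"]],
    }
)

# Keep only numerical columns; the list column "author" is dropped here.
numeric = df.select_dtypes(include="number")

# Pairwise Pearson correlation over the remaining columns.
corr = numeric.corr()
print(corr)
```

In this data "citations" falls linearly as "year" rises, so the off-diagonal entries of the result are -1.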

@Waterpine
Contributor

@jnwang @dovahcrow @jinglinpeng I have fixed this bug. plot_correlation() only supports numerical data. It now ignores non-numerical columns and outputs nothing when the input has no numerical columns. If you have any questions, please let me know. Thanks! #57
Screenshot from 2020-01-04 03-32-59

@dovahcrow
Member

@jnwang why is an empty output good? I feel a hard error is more explicit than an implicit empty return, and explicit is better than implicit.

@jnwang
Author

jnwang commented Jan 6, 2020

We should output an empty plot if all cols are non-numerical. I think we did it for plot_correlation(k).

@Waterpine
Contributor

@jnwang @dovahcrow I agree with young. However, an empty plot is also a good option.

@dovahcrow
Member

Let's return an empty plot for now; we can make it an error later if users complain.
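
The trade-off discussed here could be sketched roughly as follows. `correlation_or_empty` and its `strict` flag are hypothetical names for illustration only, not the actual change in #57. By default the function returns an empty result when no numerical columns exist, and the flag shows how the hard error could be brought back later.

```python
import pandas as pd


def correlation_or_empty(df: pd.DataFrame, strict: bool = False) -> pd.DataFrame:
    """Hypothetical sketch: correlate numerical columns, or return an empty frame."""
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        if strict:
            # The explicit alternative: fail loudly instead of returning nothing.
            raise ValueError("No numerical columns found")
        # The behaviour agreed on in this thread: empty output, no error.
        return pd.DataFrame()
    return numeric.corr()


# Made-up data whose only column holds lists.
only_lists = pd.DataFrame({"author": [["Ann", "Bob"], ["Cid"]]})

result = correlation_or_empty(only_lists)
print(result.empty)  # True
```

Calling `correlation_or_empty(only_lists, strict=True)` raises a ValueError instead, which matches the more explicit behaviour argued for above.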

@Waterpine
Contributor

Okay, I have fixed this bug. Thanks! #57
