plot(df) and plot_correlation(df) fail when data has 'list' columns #48

jnwang · 2019-12-21T21:39:26Z

When I run plot(df) and plot_correlation(df) on the following dataframe, both fail because the author column contains lists.

For plot(), the reported error is TypeError: unhashable type: 'list'

For plot_correlation(df), the reported error is AssertionError: No numerical columns found


brandonlockhart · 2019-12-23T17:43:22Z

This has been fixed for plot()
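The thread does not say how plot() was fixed. One possible workaround, sketched here as an assumption: convert list cells to tuples, which are hashable, before counting or grouping. `hashable_view` is a hypothetical helper name, not part of dataprep.

```python
import pandas as pd

def hashable_view(col: pd.Series) -> pd.Series:
    """Return a copy of col with list cells turned into tuples (hypothetical helper).

    Tuples are hashable, so value counting and grouping work afterwards.
    Non-list cells are passed through unchanged.
    """
    return col.map(lambda v: tuple(v) if isinstance(v, list) else v)

authors = pd.Series([["Alice", "Bob"], ["Carol"], ["Alice", "Bob"]])
counts = hashable_view(authors).value_counts()
print(counts)
```

Tuples keep element order, so ["Alice", "Bob"] and ["Bob", "Alice"] are counted separately; whether that is the right notion of equality depends on the data.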

Waterpine · 2020-01-03T07:55:04Z

@jnwang I think the reason is that the plot_correlation() function only supports numerical data. We cannot calculate a correlation value between list or categorical data. So I believe we can address this in the next version. Thanks!

jnwang · 2020-01-03T08:12:55Z

Can you please check how df.corr() handles this? It would be better if we could make plot_correlation() consistent with df.corr().

Waterpine · 2020-01-03T08:35:30Z

@jnwang @dovahcrow df.corr() does not handle this either. It produces no output when we call df.corr() on this data.

jnwang · 2020-01-03T08:56:09Z

It looks like df.corr() automatically ignores non-numerical columns. Can we do the same rather than raise an error?
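A sketch of the "ignore non-numerical columns" behavior with plain pandas, using hypothetical data. Selecting numeric columns explicitly with `select_dtypes` gives the same result across pandas versions. Note that newer pandas (2.0 and later) defaults `DataFrame.corr` to `numeric_only=False`, so non-numeric columns are no longer dropped silently; `df.corr(numeric_only=True)` (available since pandas 1.5) is the built-in equivalent.

```python
import pandas as pd

# Hypothetical data: two numeric columns plus a list-valued column.
df = pd.DataFrame({
    "year": [2017, 2018, 2019, 2020],
    "citations": [3, 10, 4, 8],
    "author": [["Alice"], ["Bob", "Carol"], ["Alice"], ["Dan"]],
})

# Drop non-numeric columns explicitly, then correlate what remains.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```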

https://www.geeksforgeeks.org/python-pandas-dataframe-corr/amp/

Waterpine · 2020-01-04T11:36:31Z

@jnwang @dovahcrow @jinglinpeng I have fixed this bug. Since plot_correlation() only supports numerical data, it now ignores non-numerical columns and outputs nothing when the input contains no numerical data. If you have any questions, please let me know. Thanks! #57

dovahcrow · 2020-01-06T00:36:54Z

@jnwang why is an empty output good? I feel like a hard error is more explicit than an implicit empty return, and explicit is better than implicit.

jnwang · 2020-01-06T00:42:53Z

We should output an empty plot if all columns are non-numerical. I think we already do this for plot_correlation(k).

Waterpine · 2020-01-06T02:21:33Z

@jnwang @dovahcrow I agree with young. However, an empty plot is also a good option.

dovahcrow · 2020-01-06T02:22:43Z

Let's go with an empty output for now; we can turn it into an error later if users complain.
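The two options discussed above (empty output vs. a hard error) can be sketched as a single helper with a flag. `numeric_correlation` and its `strict` parameter are hypothetical names for illustration, not dataprep's API: by default it returns an empty frame when no numerical columns remain, and with `strict=True` it raises instead.

```python
import pandas as pd

def numeric_correlation(df: pd.DataFrame, strict: bool = False) -> pd.DataFrame:
    """Correlation matrix over numeric columns only (hypothetical helper).

    Non-numeric columns (lists, strings, ...) are ignored. If none remain,
    return an empty DataFrame, or raise ValueError when strict=True.
    """
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        if strict:
            # Explicit alternative: fail loudly instead of returning nothing.
            raise ValueError("No numerical columns found")
        return pd.DataFrame()
    return numeric.corr()

only_lists = pd.DataFrame({"author": [["Alice"], ["Bob"]]})
print(numeric_correlation(only_lists).empty)  # True
```

Keeping the empty result as the default and the error behind a flag makes it cheap to switch later if users prefer the explicit failure.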

Waterpine · 2020-01-07T02:12:03Z

Okay, I have fixed this bug. Thanks! #57

jnwang added the type: bug Something isn't working label Dec 21, 2019

jnwang assigned dovahcrow, jinglinpeng, brandonlockhart and Waterpine Dec 21, 2019

Waterpine mentioned this issue Jan 14, 2020

fix(eda.correlation): it works for the columns with missing values #57

Merged


jinglinpeng closed this as completed Jan 30, 2020

dovahcrow added the module: EDA label Jun 15, 2020
