df.convert_dtypes() raising new error in Pandas 1.2.0 #247

Closed
westernguy2 opened this issue Jan 27, 2021 · 5 comments · Fixed by #250
Labels: bug (Something isn't working), test (Adding or Modifying test classes)

Comments

@westernguy2
Contributor

After updating to Pandas 1.2.0, the test below in test_pandas is now failing:

df = pytest.college_df
cdf = df.convert_dtypes()
cdf._repr_html_()
assert list(cdf.recommendation.keys()) == ["Correlation", "Distribution", "Occurrence"]

It raises TypeError: data type not understood when cdf is rendered with _repr_html_().

This test passes in Pandas 1.1 but fails in Pandas 1.2.

In the documentation for convert_dtypes, we see:

Changed in version 1.2: Starting with pandas 1.2, this method also converts float columns to the nullable floating extension type.

This change may be what is causing the bug.
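The documented change can be seen in isolation with a minimal sketch, assuming pandas 1.2 or later. The column name and values are made up for illustration and are not taken from the college dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a float column with one missing value.
df = pd.DataFrame({"score": [1.5, np.nan, 3.25]})
cdf = df.convert_dtypes()

before = str(df["score"].dtype)  # NumPy-backed float column
after = str(cdf["score"].dtype)  # nullable extension dtype on pandas >= 1.2
print(before, "->", after)
```

Note that float columns holding only whole numbers are converted to the nullable integer type instead, so the sketch uses fractional values.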

@dorisjlee added the bug and test labels Jan 28, 2021
@westernguy2 self-assigned this Jan 29, 2021
@westernguy2
Contributor Author

Pandas 1.2.0 added a new data type, Float64Dtype. It is a nullable dtype for float columns, with missing values represented by pd.NA. More information is available in the pandas documentation.

When we run convert_dtypes on the college DataFrame above, some of the columns are converted to Float64Dtype. When Altair then runs sanitize_dataframe, it calls np.dtype on this new extension dtype, which NumPy does not recognize. That call is what raises the data type not understood error.
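The failing call can be reproduced on its own, without Altair. This is only a sketch of the same np.dtype call in isolation, not Altair's actual code, and the exact wording of the error message depends on the installed NumPy version:

```python
import numpy as np
import pandas as pd

ext_dtype = pd.Float64Dtype()

try:
    # NumPy cannot map a pandas extension dtype to one of its own dtypes.
    np.dtype(ext_dtype)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```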

Altair currently has an open issue for a similar problem in sanitize_dataframe with the Period[M] type.

We can either sanitize the data ourselves before sending it to Altair, or wait for a patch, most likely from Altair.

@dorisjlee
Member

Thanks for looking into this @westernguy2!

Let's sanitize the data ourselves before sending it to Altair by converting the Float64Dtype to something that Altair can understand. We can do this inside AltairChart:sanitize_dataframe, similar to how we handle column names that cannot be visualized by Altair.
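One possible shape for that conversion is sketched below. The helper name downcast_nullable_floats and the sample data are hypothetical, the sketch assumes unique column names, and it is not the code that was merged for this issue. It touches only Float64 extension columns and leaves every other dtype alone:

```python
import numpy as np
import pandas as pd


def downcast_nullable_floats(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where Float64 extension columns become NumPy float64."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.Float64Dtype):
            # pd.NA has no NumPy equivalent, so map it to NaN.
            out[col] = out[col].to_numpy(dtype="float64", na_value=np.nan)
    return out


# Hypothetical input: one nullable float column and one string column.
raw = pd.DataFrame({"a": [1.5, None, 3.25], "b": ["x", "y", "z"]}).convert_dtypes()
clean = downcast_nullable_floats(raw)
print(clean.dtypes)
```

After the conversion, np.dtype(clean["a"].dtype) succeeds because the column is backed by a plain NumPy float64 array.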

Could we also check that this Float64Dtype problem does not cause an issue for matplotlib?

@dorisjlee
Member

dorisjlee commented Feb 20, 2021

The fix inside sanitize_dataframe was being applied to every float type rather than only to the unsupported Float64. I narrowed the condition to Float64 only, because in the employee demo the newly created %WorkingYearsAtCompany column triggers a warning when it is overwritten with its astype(float) copy.

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/employee.csv")
df["%WorkingYearsAtCompany"] = df["YearsAtCompany"] / df["TotalWorkingYears"]

import traceback
import warnings
import sys

def warn_with_traceback(message, category, filename, lineno, file=None, line=None):
    # Print the call stack alongside each warning to locate where it is raised.
    log = file if hasattr(file, "write") else sys.stderr
    traceback.print_stack(file=log)
    log.write(warnings.formatwarning(message, category, filename, lineno, line))

# Route all warnings through the traceback-printing hook.
warnings.showwarning = warn_with_traceback

from lux.vis.Vis import Vis
Vis(["%WorkingYearsAtCompany", "Age"], df)


@dorisjlee
Member

dorisjlee commented Mar 7, 2021

Float64 and float64 still seem to be causing issues. In particular, the problem appears inside pd.cut when nanmin is called. Here is a test dataset that reproduces the error: test2.csv.zip


import pandas.core.nanops as nanops
x = df["col_13"]
rng = (nanops.nanmin(x), nanops.nanmax(x))
nanops.nanmin(x)
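As a hedged illustration of one way around this, the column can be converted to a plain NumPy float array before computing the range or binning. The Series below is a made-up stand-in for col_13 (the attached dataset is not reproduced here), and this is an assumption about a possible workaround, not the fix that closed the issue:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the failing column, with one missing value.
x = pd.Series([0.5, None, 2.5, 4.0], dtype="Float64")

# Convert to NumPy float64, mapping pd.NA to NaN, before any NaN-aware reductions.
values = x.to_numpy(dtype="float64", na_value=np.nan)
rng = (np.nanmin(values), np.nanmax(values))

# Bin the plain float array; missing entries get the category code -1.
binned = pd.cut(values, bins=2)
print(rng, list(binned.codes))
```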

@dorisjlee reopened this Mar 7, 2021
@dorisjlee
Member

This issue is resolved now, thanks @westernguy2!
