Unsupported Category dtype as_type #79

Closed
dorisjlee opened this issue Aug 27, 2020 · 3 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@dorisjlee
Member

There is a bug when using Lux with the following example from Datashader:

import pandas as pd
import numpy as np
import lux  # needed so that df.exported and df.intent are available
from collections import OrderedDict as odict

num=10000
np.random.seed(1)

dists = {cat: pd.DataFrame(odict([('x',np.random.normal(x,s,num)), 
                                  ('y',np.random.normal(y,s,num)), 
                                  ('val',val), 
                                  ('cat',cat)]))      
         for x,  y,  s,  val, cat in 
         [(  2,  2, 0.03, 10, "d1"), 
          (  2, -2, 0.10, 20, "d2"), 
          ( -2, -2, 0.50, 30, "d3"), 
          ( -2,  2, 1.00, 40, "d4"), 
          (  0,  0, 3.00, 50, "d5")] }

df = pd.concat(dists, ignore_index=True)
df["cat"] = df["cat"].astype("category")  # If this line is commented out, the df.intent = vis line below doesn't break

df  # In the Lux widget, select and export the scatterplot vis (the big circular blob)

vis = df.exported[0]

df.intent = vis
df  # This errors on the key 'cat', which has the categorical dtype

[Screenshot of the error]

We should look into supporting the pandas Categorical data type. More importantly, we should check whether similar bugs show up when astype converts columns to other data types.
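One way to act on the second point is a quick survey of which dtypes astype actually produces. The sketch below is a hypothetical test harness that uses only pandas (no Lux calls); the toy values and the list of target dtypes are assumptions chosen for illustration, and the column names simply mirror the example above.

```python
import pandas as pd

# Toy frame modeled loosely on the Datashader example (values are illustrative only).
df = pd.DataFrame({"val": [10, 20, 30], "cat": ["d1", "d2", "d3"]})

# Hypothetical astype() targets per column; each resulting dtype is one that
# a type-inference step (such as Lux's) would have to recognize.
targets = {
    "val": ["int32", "float32", "category", "string", "bool"],
    "cat": ["category", "string", "object"],
}

results = {}
for col, dtypes in targets.items():
    for target in dtypes:
        converted = df[col].astype(target)
        # Record the dtype name and whether it is a CategoricalDtype.
        results[(col, target)] = (
            str(converted.dtype),
            isinstance(converted.dtype, pd.CategoricalDtype),
        )

for (col, target), (dtype_name, is_cat) in results.items():
    print(f"{col}.astype({target!r}) -> {dtype_name} (categorical: {is_cat})")
```

Each printed dtype could then be fed through Lux on a small DataFrame to see which ones break.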

dorisjlee added the labels bug (Something isn't working) and help wanted (Extra attention is needed) on Aug 27, 2020
@westernguy2
Contributor

Here is another bug where Lux doesn't support CategoricalDtype: both pd.cut and pd.qcut return a column of CategoricalDtype.

import pandas as pd
import numpy as np
import lux

url = 'https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true'
df = pd.read_csv(url)
df['Year'] = pd.to_datetime(df['Year'], format='%Y')

df["Weight"] = pd.qcut(df["Weight"], q=3)

import pandas as pd
import numpy as np
import lux

url = 'https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true'
df = pd.read_csv(url)
df['Year'] = pd.to_datetime(df['Year'], format='%Y')

df["Weight"] = pd.cut(df["Weight"], bins=[0, 2500, 7500, 10000], labels=["small", "medium", "large"])

Both code blocks have the same error:
[Screenshot of the error, taken 2020-09-13]
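To confirm without Lux or network access that both binning calls yield a categorical column, here is a minimal pandas-only sketch. The Weight values are made up as a stand-in for cars.csv. The astype(str) step at the end is only an assumed stopgap (it removes the categorical dtype before Lux sees the column); it is not the fix that was eventually merged.

```python
import pandas as pd

# Stand-in for the cars.csv "Weight" column (values are made up for illustration).
weight = pd.Series([1800, 2400, 3100, 3600, 4200, 5100], name="Weight")

binned_q = pd.qcut(weight, q=3)
binned_c = pd.cut(weight, bins=[0, 2500, 7500, 10000], labels=["small", "medium", "large"])

# Both results carry a CategoricalDtype, the dtype Lux failed on.
print(isinstance(binned_q.dtype, pd.CategoricalDtype))  # True
print(isinstance(binned_c.dtype, pd.CategoricalDtype))  # True

# qcut categories are Interval objects; cut with labels= uses the given strings.
print(binned_q.cat.categories)
print(list(binned_c.cat.categories))  # ['small', 'medium', 'large']

# Assumed stopgap (not the merged fix): cast the bins to plain strings
# so the column is no longer categorical.
as_text = binned_c.astype(str)
print(as_text.dtype)
```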

@westernguy2
Contributor

Another Category dtype bug shows up in this example:

import pandas as pd
import numpy as np
import lux

url = 'https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true'
df = pd.read_csv(url)
df['Year'] = pd.to_datetime(df['Year'], format='%Y')

new_df = df.drop([0, 1, 2], axis="rows")
pd.merge(df, new_df, how="left", indicator=True)

Setting indicator=True adds a column named _merge with dtype category, which triggers this bug:
[Screenshot of the error, taken 2020-09-21]

This happens because Lux doesn't handle the category dtype, so it never assigns the column a data type.
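The kind of check that was missing can be sketched as a dtype-to-type mapping that handles CategoricalDtype explicitly and falls back to a default instead of leaving a column untyped. The function infer_lux_like_type below is hypothetical: its labels ("nominal", "quantitative", "temporal") are modeled on the names Lux uses, but it is not Lux's implementation, and the small left/right frames are made-up stand-ins for the cars merge above.

```python
import pandas as pd


def infer_lux_like_type(series: pd.Series) -> str:
    """Hypothetical sketch: map a pandas dtype to a Lux-style data type label."""
    dtype = series.dtype
    if isinstance(dtype, pd.CategoricalDtype):
        # astype("category"), cut/qcut, and merge(indicator=True) all land here.
        return "nominal"
    if pd.api.types.is_datetime64_any_dtype(dtype):
        return "temporal"
    if pd.api.types.is_bool_dtype(dtype):
        # Checked before numeric, since bool also counts as numeric in pandas.
        return "nominal"
    if pd.api.types.is_numeric_dtype(dtype):
        return "quantitative"
    # Fallback so that no column is left without a type.
    return "nominal"


# Made-up stand-in for the cars merge: a left join with indicator=True.
left = pd.DataFrame({"Name": ["a", "b", "c"], "Weight": [1800, 2400, 3100]})
right = left.drop([0], axis="rows")
merged = pd.merge(left, right, how="left", indicator=True)

types = {col: infer_lux_like_type(merged[col]) for col in merged.columns}
print(merged["_merge"].dtype)  # category
print(types)  # {'Name': 'nominal', 'Weight': 'quantitative', '_merge': 'nominal'}
```

Handling CategoricalDtype before the generic checks, plus a final fallback, makes an unrecognized dtype degrade to a default label rather than leave a missing entry.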

@dorisjlee
Member Author

This is resolved with @westernguy2's new PRs!
