Categoricals with NaNs #3678

Closed
jseabold opened this issue May 21, 2013 · 8 comments · Fixed by #8007
Labels
API Design Bug Categorical Categorical Data Type Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Comments

@jseabold
Contributor

Not sure how to handle this yet. It looks like NaNs do not become a level. Should they? Maybe so. Also, describe fails if NaNs are present:

pandas.Categorical([np.nan, np.nan, 1, 1, 1, 2, 3, 4, 5, 5, 4, 3, 3]).describe()
@cpcloud
Member

cpcloud commented May 21, 2013

Their labels become -1, so maybe it's being used as a sentinel here. I vote for NaN levels, but I don't really use this much.

@jreback
Contributor

jreback commented May 21, 2013

Yes, the -1 means the value was not found among the levels, so you could then do something with it or ignore it.
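
To make the sentinel concrete, here is a minimal sketch using the values from the original report. It assumes a recent pandas release, where the attributes this thread calls `levels` and `labels` are exposed as `categories` and `codes`; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

values = [np.nan, np.nan, 1, 1, 1, 2, 3, 4, 5, 5, 4, 3, 3]
cat = pd.Categorical(values)

# NaN is not added to the categories ("levels" in this thread).
categories = list(cat.categories)

# Each NaN position is encoded with the sentinel code -1 ("labels" in this thread).
codes = cat.codes.tolist()
n_missing = int((cat.codes == -1).sum())

print(categories)
print(codes)
print(n_missing)
```

Counting the -1 codes is one way to "do something with it", for example reporting how many entries were not matched to any level.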

@jankatins
Contributor

Right now, you can use np.nan as a level: pd.Categorical(values, levels=[1,2,3,np.nan]). By default, NaN is not used as a level but is encoded as missing (label -1). AFAIK there is no way to "rename" NaN other than first adding np.nan to the levels and then reassigning (cat[isnull(cat)] = np.nan).

@jseabold is that enough? Or do you need an explicit "use NaN as a level" keyword?
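
For readers on a recent pandas release, where (as far as I know) NaN is no longer accepted as a category and `levels` is spelled `categories`, a rough equivalent of "renaming" NaN is to add an explicit placeholder category and fill the missing entries with it. This is only a sketch under that assumption; the label `"missing"` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c", np.nan], dtype="category")

# "missing" is a hypothetical placeholder label, not a pandas convention.
# The category has to exist before values can be assigned to it.
filled = s.cat.add_categories(["missing"]).fillna("missing")

print(filled.tolist())
print(list(filled.cat.categories))
```

Registering the category first mirrors the "add to the levels, then reassign" order described above.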

@jreback jreback added the Bug label Jul 16, 2014
@jreback
Contributor

jreback commented Jul 16, 2014

@JanSchulz I think this is actually a bug. When you construct this categorical, the levels are float64, as if np.nan is being intermixed somehow, when it should be treated specially. What should this look like? (I think the describe failure is a symptom, not the problem.)

@jreback jreback modified the milestones: 0.15.0, 0.15.1 Jul 16, 2014
@jankatins
Contributor

I have a fix in #7768, but this doesn't work yet:

cat = pd.Categorical([1,2,3, np.nan], levels=[1,2,3])
cat.levels = [1,2,3,np.nan]
cat[pd.isnull(cat)] = np.nan
Traceback (most recent call last):
  File "C:\portabel\miniconda\envs\pandas_dev\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-33-e0614b20be67>", line 1, in <module>
    cat[pd.isnull(cat)] = np.nan
  File "c:\data\external\pandas\pandas\core\categorical.py", line 771, in __setitem__
    # require identical level set
ValueError: cannot setitem on a Categorical with a new level, set the levels first
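
For comparison, a small sketch of how the same kind of assignment behaves on a recent pandas release (an assumption; the `levels` keyword above is spelled `categories` there). Assigning np.nan simply marks the entry as missing, while assigning a value outside the categories is still rejected. The exception type has varied between versions, so the sketch catches both candidates.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical([1, 2, 3, np.nan], categories=[1, 2, 3])

# Assigning NaN marks the entry as missing; no new category is required.
cat[0] = np.nan
missing_mask = pd.isna(cat)

# Assigning a value that is not among the categories is still refused.
try:
    cat[1] = 4
    rejected = False
except (TypeError, ValueError):
    rejected = True

print(missing_mask.tolist())
print(rejected)
```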

@jankatins
Contributor

The describe call also still fails... I'll investigate.

@jankatins jankatins mentioned this issue Jul 16, 2014
5 tasks
@jankatins
Contributor

describe is fixed as well: jankatins@1334684

@jankatins
Contributor

All three issues should be fixed in #7768. If one wants np.nan in the levels, one has to specify the levels explicitly: Categorical(values, levels=[1,2,3,np.nan]).
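
As a quick check of the describe part, the snippet from the original report can be rerun on a recent pandas release (an assumption; it says nothing about the exact behaviour at the time of #7768). The sketch only confirms that the call completes and that the counts account for every value, NaNs included.

```python
import numpy as np
import pandas as pd

values = [np.nan, np.nan, 1, 1, 1, 2, 3, 4, 5, 5, 4, 3, 3]

# describe() summarises counts and frequencies per category;
# the NaN entries are reported alongside the real categories.
summary = pd.Categorical(values).describe()
total = int(summary["counts"].sum())

print(summary)
print(total)
```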
