BUG in remove_unused_categories with NaNs in values #11599

Closed
jorisvandenbossche opened this Issue Nov 13, 2015 · 0 comments

From SO: http://stackoverflow.com/questions/33693601/removing-unused-categories-in-series-results-in-duplicated-categories

remove_unused_categories gives a wrong result when there is a NaN in the values:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["A", "B", np.nan]).astype("category")
>>> s.cat.remove_unused_categories()
0      A
1      B
2    NaN
dtype: category
Categories (3, object): [B, A, B]

I think the -1 in the codes (which marks NaN) is being used as an index, so it duplicates the last item in the categories.
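
A minimal NumPy sketch of that suspicion, not the actual pandas implementation: the array names (`categories`, `codes`, `used`) are made up for illustration, and the assumption is that the unique codes are passed straight to a positional `take`. With -1 among them, `take` wraps around to the last category; filtering out negative codes first avoids the duplicate.

```python
import numpy as np

# Hypothetical stand-ins for the categorical's internals:
# two categories, and codes where -1 marks the NaN entry.
categories = np.array(["A", "B"], dtype=object)
codes = np.array([0, 1, -1])

# Unique codes, sorted: the NaN marker -1 comes first.
used = np.unique(codes)

# Taking with -1 wraps around to the last category, giving a duplicate.
buggy = categories.take(used)

# Dropping the negative (NaN) code before taking keeps only real categories.
fixed = categories.take(used[used >= 0])

print(buggy.tolist())
print(fixed.tolist())
```

The first list reproduces the `[B, A, B]` shown in the output above; the second is what one would expect from remove_unused_categories.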

jorisvandenbossche added this to the 0.17.1 milestone Nov 13, 2015

jreback modified the milestone: Next Major Release, 0.17.1 Nov 13, 2015

jreback modified the milestone: 0.17.1, Next Major Release Nov 18, 2015
