BUG: method .nunique on categorical series in v0.21 with only NaNs gives ValueError #18051

Closed
topper-123 opened this Issue Oct 31, 2017 · 6 comments

topper-123 commented Oct 31, 2017

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> ser = pd.Series(pd.Categorical([np.nan]))
>>> ser.nunique()
ValueError: buffer source array is read-only

Problem description

The above code gave 0 in v0.20.3 and is expected to give 0 in v0.21 as well. The problem is independent of whether I set some categories.

EDIT: Actually, this doesn't raise an error if I set categories, so it only happens when no categories are set. My use case for having no categories is programmatically reading in data where some columns are empty and of categorical dtype.

Expected Output

0 (zero)
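
As a quick check of the expected behaviour, the two cases described above (no categories vs. categories set explicitly) can be compared side by side. This is only a sketch and assumes a pandas release that already contains the fix for this issue; on v0.21.0 the first `nunique()` call would raise the `ValueError` shown in the code sample. The category labels `"a"` and `"b"` are made up for illustration.

```python
import numpy as np
import pandas as pd

# No categories set: the case reported in this issue.
no_categories = pd.Series(pd.Categorical([np.nan]))

# Categories set explicitly: the case that did not raise, per the EDIT above.
# The labels "a" and "b" are placeholders, not taken from the report.
with_categories = pd.Series(pd.Categorical([np.nan], categories=["a", "b"]))

counts = {
    "no_categories": no_categories.nunique(),
    "with_categories": with_categories.nunique(),
}
print(counts)
```

Both counts should be 0, since `nunique()` ignores NaN by default.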

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 8137209
python: 3.5.4.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.5.0.post20170922
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.4.8
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@topper-123 topper-123 changed the title from BUG: method .nunique on categoricals in v0.21 with only NaNs gives ValueError to BUG: method .nunique on categorical series in v0.21 with only NaNs gives ValueError Oct 31, 2017

jreback commented Oct 31, 2017

This is a variation of #10043, fixed by #10070
and duplicated by #17192, though this example is simple, so it should be fixed.

PRs welcome!

topper-123 commented Oct 31, 2017

This individual issue can be fixed with a simple `if not len(self.cat.categories): return 0`, but that feels like bypassing the issue in the .unique method. Is that OK, or is something more involved required?

If this is something in Cython or requires larger refactoring, it will be beyond my ability, I'm sorry.
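
For anyone stuck on v0.21.0, the guard suggested above could also be applied on the user side rather than inside pandas. The sketch below is only an illustration of that idea: `safe_nunique` is a hypothetical helper name, not a pandas function, and it sidesteps the underlying problem in the same way the one-liner does.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype


def safe_nunique(ser: pd.Series) -> int:
    """Hypothetical user-side workaround, not part of pandas.

    Returns 0 for a categorical Series that has no categories at all
    (so every value must be NaN); otherwise defers to Series.nunique().
    """
    if isinstance(ser.dtype, CategoricalDtype) and len(ser.cat.categories) == 0:
        return 0
    return ser.nunique()


print(safe_nunique(pd.Series(pd.Categorical([np.nan]))))
print(safe_nunique(pd.Series(pd.Categorical(["a", "b", "a"]))))
```

The early return never calls `.unique()` for the empty-categories case, which is why it avoids the error without fixing its cause.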

jreback commented Oct 31, 2017

No, this should be fixed in Cython.

topper-123 commented Oct 31, 2017

This is a regression from v0.20.3. Maybe this has something to do with the new CategoricalDtype, @TomAugspurger?

TomAugspurger commented Oct 31, 2017

I think it's more likely to be the changes to take_nd, but I may be wrong.

@topper-123 topper-123 referenced this issue Nov 13, 2017

Closed

RLS 0.21.1 #18244

52 of 58 tasks complete

ghasemnaddaf pushed a commit to ghasemnaddaf/pandas that referenced this issue Nov 14, 2017

Fariba Aalamifar
Copy categorical codes if empty (fixes pandas-dev#18051)
If `old_categories` is empty (all nan categories) then `_recode_for_categories`
should return `codes.copy()` so that the writable flag is True.
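
The idea in this commit message can be illustrated with plain NumPy. The sketch below is a simplified stand-in, not the pandas implementation: `recode_sketch` is a hypothetical function, and its remapping logic for the non-empty case is an assumption made only so the example is complete. The point it shows is that a copy of a read-only codes array is writable again.

```python
import numpy as np


def recode_sketch(codes, old_categories, new_categories):
    """Simplified, hypothetical stand-in for recoding categorical codes."""
    if len(old_categories) == 0:
        # Every code is -1 (NaN) anyway; return a copy so the result
        # is writable even when the input array is read-only.
        return codes.copy()
    # Map each old category position to its position in new_categories
    # (-1 if the category no longer exists).
    lookup = {cat: i for i, cat in enumerate(new_categories)}
    mapping = np.array(
        [lookup.get(cat, -1) for cat in old_categories], dtype=codes.dtype
    )
    # Keep missing values (-1) as -1; remap everything else.
    return np.where(codes == -1, -1, mapping[codes]).astype(codes.dtype)


# Simulate the read-only codes of an all-NaN categorical.
read_only_codes = np.array([-1, -1], dtype=np.int8)
read_only_codes.setflags(write=False)

recoded = recode_sketch(read_only_codes, [], [])
print(read_only_codes.flags.writeable, recoded.flags.writeable)
```

Returning the input array itself in the empty case would hand the read-only buffer on to whatever consumes the codes next, which matches the error message in the report.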

ghasemnaddaf commented Nov 14, 2017

ghasemnaddaf pushed a commit to ghasemnaddaf/pandas that referenced this issue Nov 15, 2017

Ghasem Naddaf
BUG: Copy categorical codes if empty (fixes pandas-dev#18051)
rebased to remove conflicts in whats new

topper-123 pushed a commit to topper-123/pandas that referenced this issue Nov 22, 2017

topper-123 pushed a commit to topper-123/pandas that referenced this issue Nov 22, 2017

jreback added a commit that referenced this issue Nov 23, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017

TomAugspurger added a commit that referenced this issue Dec 11, 2017
