API: Add dtype parameter to Categorical.from_codes #24398

Merged
merged 10 commits into from Jan 8, 2019

7 participants
@topper-123
Contributor

commented Dec 22, 2018

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Added parameter dtype to Categorical.from_codes.

@topper-123 topper-123 force-pushed the topper-123:Categorical.from_codes branch from b026e50 to e2543df Dec 22, 2018

@codecov

commented Dec 22, 2018

Codecov Report

Merging #24398 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24398      +/-   ##
==========================================
+ Coverage   92.37%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52315    52323       +8     
==========================================
+ Hits        48327    48337      +10     
+ Misses       3988     3986       -2
Flag        Coverage Δ
#multiple   90.8% <100%> (ø) ⬆️
#single     43.06% <55.55%> (-0.01%) ⬇️

Impacted Files                        Coverage Δ
pandas/core/arrays/categorical.py     95.67% <100%> (+0.24%) ⬆️
pandas/core/indexes/category.py       98.61% <100%> (ø) ⬆️
pandas/core/arrays/datetimelike.py    97.67% <0%> (-0.19%) ⬇️
pandas/io/formats/html.py             99.34% <0%> (ø) ⬆️
pandas/io/formats/format.py           97.98% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py      96.27% <0%> (+0.01%) ⬆️
pandas/core/arrays/datetimes.py       97.71% <0%> (+0.03%) ⬆️
pandas/util/testing.py                88.09% <0%> (+0.09%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be406f3...0459ad0. Read the comment docs.

@codecov

commented Dec 22, 2018

Codecov Report

Merging #24398 into master will increase coverage by <.01%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master   #24398      +/-   ##
==========================================
+ Coverage    92.3%    92.3%   +<.01%     
==========================================
  Files         162      162              
  Lines       51875    51874       -1     
==========================================
+ Hits        47883    47884       +1     
+ Misses       3992     3990       -2
Flag        Coverage Δ
#multiple   90.71% <88.88%> (ø) ⬆️
#single     42.99% <77.77%> (-0.01%) ⬇️

Impacted Files                        Coverage Δ
pandas/core/indexes/category.py       98.65% <100%> (ø) ⬆️
pandas/core/arrays/categorical.py     95.44% <87.5%> (+0.12%) ⬆️
pandas/util/testing.py                87.84% <0%> (+0.09%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e0358d...e2543df. Read the comment docs.

@jschendel jschendel added this to the 0.24.0 milestone Dec 22, 2018

if dtype is not None:
    if categories is not None or ordered is not None:
        raise ValueError("Cannot specify both `dtype` and `categories`"
                         " or `ordered`.")

@gfyoung

gfyoung Dec 23, 2018

Member

The error message confuses me. Are you saying: both "dtype" and ("categories" / "ordered")?

I think this will need to be reworded for clarity.

@topper-123

topper-123 Dec 23, 2018

Author Contributor

I copied that message from Categorical.__init__, but I agree, and have changed it in both locations.

@gfyoung

gfyoung Dec 23, 2018

Member

Fantastic. Thanks for doing that!
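
For readers following this thread, the mutual-exclusion check under review can be sketched in isolation. This is a minimal illustration, not the code merged in the PR: `check_dtype_args` is a hypothetical helper, and the message text is only one possible clearer wording, since the thread does not quote the final version.

```python
def check_dtype_args(categories=None, ordered=None, dtype=None):
    """Hypothetical helper: reject `dtype` combined with `categories`/`ordered`."""
    if dtype is not None and (categories is not None or ordered is not None):
        # Assumed wording: name the conflicting group first, then `dtype`,
        # so the grouping of the three parameters is unambiguous.
        raise ValueError(
            "Cannot specify `categories` or `ordered` together with `dtype`."
        )
    return True


# Either style on its own is accepted.
ok_legacy = check_dtype_args(categories=["a", "b"], ordered=True)
ok_dtype = check_dtype_args(dtype="category")
```

Passing both `dtype` and either of the other two arguments raises `ValueError` in this sketch.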

@topper-123 topper-123 force-pushed the topper-123:Categorical.from_codes branch 2 times, most recently from fcf731b to c790639 Dec 23, 2018

@pep8speaks

commented Dec 23, 2018

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 08, 2019 at 14:54 Hours UTC

@topper-123 topper-123 force-pushed the topper-123:Categorical.from_codes branch from 6997fd8 to 31974af Dec 23, 2018

@jsexauer jsexauer referenced this pull request Dec 23, 2018

Open

DEPR: deprecations from prior versions #6581

0 of 100 tasks complete

@topper-123 topper-123 force-pushed the topper-123:Categorical.from_codes branch 5 times, most recently from 096c7a5 to 229b474 Dec 23, 2018

@@ -59,6 +59,7 @@
Categorical, CategoricalIndex, DataFrame, DatetimeIndex, Float64Index,
Index, Int64Index, Interval, IntervalIndex, MultiIndex, NaT, Panel, Period,
PeriodIndex, RangeIndex, Series, TimedeltaIndex, Timestamp)
from pandas.api.types import CategoricalDtype as CDT

@jreback

jreback Dec 24, 2018

Contributor

same style

@@ -30,6 +30,7 @@
DataFrame, DatetimeIndex, Index, Int64Index, MultiIndex, Panel,
PeriodIndex, Series, SparseDataFrame, SparseSeries, TimedeltaIndex, compat,
concat, isna, to_datetime)
from pandas.api.types import CategoricalDtype

@jreback

jreback Dec 24, 2018

Contributor

same

@jreback

Contributor

commented Dec 26, 2018

merge master

@jreback

Contributor

commented Dec 28, 2018

@topper-123 can you merge master and update

@topper-123 topper-123 force-pushed the topper-123:Categorical.from_codes branch 2 times, most recently from d6d3f81 to 8a6ec5d Dec 30, 2018

@topper-123 topper-123 force-pushed the topper-123:Categorical.from_codes branch from a4cf7a2 to 7002235 Jan 8, 2019

@topper-123

Contributor Author

commented Jan 8, 2019

Ok, I’ve reverted the deprecation.

@TomAugspurger
Contributor

left a comment

I may be missing it, but did you add a test for Categorical.from_codes(codes, categories, dtype=dtype) raising?

Categorical.from_codes([0, 1], Categorical(['a', 'b', 'a']))
codes = np.random.choice([0, 1], 5, p=[0.9, 0.1])
dtype = CategoricalDtype(categories=["train", "test"])
Categorical.from_codes(codes, dtype=dtype)

@TomAugspurger

TomAugspurger Jan 8, 2019

Contributor

Yeah, this test is duplicative with earlier ones (even on master). I'd be OK with removing it.

@jorisvandenbossche

Member

commented Jan 8, 2019

Yes, +1 on not deprecating categories and ordered

@jreback

Contributor

commented Jan 8, 2019

so ok with adding dtype in .from_codes as that promotes consistency, but why are folks not in favor of deprecating categories and ordered? this is just moving code away from the single point of using CDT for all operations.

@jreback
Contributor

left a comment

see comments

@TomAugspurger

Contributor

commented Jan 8, 2019

@jorisvandenbossche

Member

commented Jan 8, 2019

so ok with adding dtype in .from_codes as that promotes consistency, but why are folks not in favor of deprecating categories and ordered?

We have exactly the same pattern in the main Categorical constructor as well: users have the option to just pass categories or ordered, or a full-fledged dtype.
Deprecating it there of course has a much larger impact than for from_codes, but this means we have the machinery to handle the combination, so I don't really see the need to deprecate it here (certainly given that it is much more convenient to use).
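
To make the two call styles concrete, here is a minimal sketch. It assumes a pandas version that includes the `dtype` parameter added in this PR (0.24.0 or later); the variable names and sample codes are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

codes = [0, 1, 0, -1]  # -1 marks a missing value

# Style 1: pass categories (and optionally ordered) directly.
via_categories = pd.Categorical.from_codes(
    codes, categories=["train", "test"], ordered=False
)

# Style 2: pass a full CategoricalDtype instead.
dtype = CategoricalDtype(categories=["train", "test"], ordered=False)
via_dtype = pd.Categorical.from_codes(codes, dtype=dtype)

# Both styles should describe the same categorical.
same = via_categories.equals(via_dtype)
```

The first style is shorter when the categories are only needed once; the second is convenient when the same `CategoricalDtype` is shared across several arrays.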

@jreback

Contributor

commented Jan 8, 2019

We have exactly the same pattern in the main Categorical constructor as well: users have the option to just pass categories or ordered, or a full fledged dtype.

ok I can see the argument for this then. But this is a tad confusing, maybe let's enhance the doc-strings slightly on the constructor & from_codes to make this even more clear that you should pass (categories, ordered) or dtype (yes it errors, but a doc-string note will help).

@topper-123 can you raise an issue / PR for this.

@jreback

jreback approved these changes Jan 8, 2019

@jreback
Contributor

left a comment

small comments and @TomAugspurger has some corrections

@TomAugspurger

Contributor

commented Jan 8, 2019

6008c08 has some changes

  • added a test for raising when both categories / ordered & dtype are passed
  • updated whatsnew
  • updated versionchanged / added docs
@TomAugspurger

Contributor

commented Jan 8, 2019

Buglet when neither categories nor dtype is provided.

In [1]: import pandas as pd

In [2]: pd.Categorical.from_codes([0, 1])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e8a6a967ddf5> in <module>
----> 1 pd.Categorical.from_codes([0, 1])

~/sandbox/pandas/pandas/core/arrays/categorical.py in from_codes(cls, codes, categories, ordered, dtype)
    661
    662         if len(codes) and (
--> 663                 codes.max() >= len(dtype.categories) or codes.min() < -1):
    664             raise ValueError("codes need to be between -1 and "
    665                              "len(categories)-1")

TypeError: object of type 'NoneType' has no len()

fixing now.

Fixups
* Bug in test not using from_codes
* Raise when neither provided
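
The failure mode in the traceback above, and the "raise when neither provided" fix, can be sketched without pandas. This is an illustration only: `validate_codes` is a hypothetical stand-in, and using `ValueError` with this message for the missing-categories case is an assumption, since the thread does not show the final code.

```python
def validate_codes(codes, categories=None):
    """Hypothetical stand-in for the checks discussed above (not pandas code)."""
    if categories is None:
        # Guard first, so the caller gets a clear error instead of
        # "TypeError: object of type 'NoneType' has no len()".
        raise ValueError(
            "categories must be provided via `categories` or `dtype`"
        )
    codes = list(codes)
    # Same range rule as in the traceback: -1 is allowed (missing value),
    # anything else must index into the categories.
    if codes and (max(codes) >= len(categories) or min(codes) < -1):
        raise ValueError("codes need to be between -1 and len(categories)-1")
    return codes


valid = validate_codes([0, 1, -1], categories=["a", "b"])
```

Calling `validate_codes([0, 1])` with no categories raises `ValueError` in this sketch, rather than failing later inside the range check.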
@TomAugspurger

Contributor

commented Jan 8, 2019

6008c08 also had a bug with the tests for raising when both categories and dtype were used. The test I added used Categorical() instead of Categorical.from_codes. That's fixed now.

@topper-123

Contributor Author

commented Jan 8, 2019

+1 to the changes made by @TomAugspurger

@TomAugspurger

Contributor

commented Jan 8, 2019

Thanks. Merging in a few hours if no objections.

@jreback

jreback approved these changes Jan 8, 2019

@jreback

Contributor

commented Jan 8, 2019

yeah this is ok.

@TomAugspurger TomAugspurger merged commit 2897fca into pandas-dev:master Jan 8, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
pandas-dev.pandas Build #20190108.18 succeeded

@topper-123 topper-123 deleted the topper-123:Categorical.from_codes branch Jan 8, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
