CategoricalDtype construction: actually use fastpath #17891

Merged

Conversation

jorisvandenbossche
Member

cc @TomAugspurger As far as I can see, the current CategoricalDtype._from_fastpath(..) is no different from CategoricalDtype(..), because the fastpath argument in _finalize is not used. This seems the logical fix.
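
To make the description above concrete, here is a minimal sketch of the pattern. It is not the pandas source: `ToyCategoricalDtype`, `_validate_categories`, the `validation_calls` counter, and the specific checks are hypothetical stand-ins. Only the names `_from_fastpath`, `_finalize`, and the `fastpath` argument come from this PR.

```python
class ToyCategoricalDtype:
    """Hypothetical stand-in for illustration; not the pandas implementation."""

    # Counts how often the (assumed) validation step actually runs.
    validation_calls = 0

    def __init__(self, categories=None, ordered=False):
        # Regular construction: categories are validated.
        self._finalize(categories, ordered, fastpath=False)

    @classmethod
    def _from_fastpath(cls, categories=None, ordered=False):
        # Internal constructor for categories already known to be valid.
        self = cls.__new__(cls)
        self._finalize(categories, ordered, fastpath=True)
        return self

    def _finalize(self, categories, ordered, fastpath=False):
        if categories is not None:
            # The point of the fix: forward `fastpath` here. If it is
            # dropped, both constructors validate and behave identically.
            categories = self._validate_categories(categories, fastpath=fastpath)
        self.categories = categories
        self.ordered = ordered

    @classmethod
    def _validate_categories(cls, categories, fastpath=False):
        categories = tuple(categories)
        if not fastpath:
            cls.validation_calls += 1
            if any(c is None for c in categories):
                raise ValueError("categories cannot be null")
            if len(set(categories)) != len(categories):
                raise ValueError("categories must be unique")
        return categories


checked = ToyCategoricalDtype(["a", "b"])                    # validates
unchecked = ToyCategoricalDtype._from_fastpath(["a", "b"])  # skips validation
```

In this sketch only the first construction increments `validation_calls`. The trade-off is that the fast path trusts its caller: invalid categories passed to `_from_fastpath` are accepted silently, which is why such a method would be kept internal.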

@jorisvandenbossche added the Categorical (Categorical Data Type) and Internals (Related to non-user accessible pandas implementation) labels Oct 16, 2017
@jorisvandenbossche added this to the 0.21.0 milestone Oct 16, 2017
@jreback
Contributor

jreback commented Oct 16, 2017

passing dtype averts the need for fastpath

@jorisvandenbossche
Member Author

That may be, but I suppose CategoricalDtype._from_fastpath(cats) exists for a reason? Currently it does exactly the same as CategoricalDtype(cats) because the kwarg is not passed through.
Note that this is not the fastpath keyword in the Categorical constructor (#17562) but the internal _from_fastpath method.

@jreback
Contributor

jreback commented Oct 16, 2017

we shouldn’t have both (and deprecating fastpath=True)
so the constructor ._from_fastpath is a better usage generally

@jorisvandenbossche
Member Author

> so the constructor ._from_fastpath is a better usage generally

Sorry, Jeff, I really don't follow your comments. Please look at the diff / my explanation. It is exactly _from_fastpath that I am fixing, so what is the point you are trying to make?

@jreback
Contributor

jreback commented Oct 16, 2017

hah didn’t see that

better to pass in dtype then as it’s the fastpath anyhow

@jorisvandenbossche
Member Author

I think the point of _from_fastpath is to have a fastpath when starting from (known-to-be-valid) categories, not a dtype. To pass a dtype as the fastpath, you still have to create the dtype first :-)

@jreback
Contributor

jreback commented Oct 16, 2017

> I think the point of _from_fastpath is to have a fastpath when starting from (known-to-be-valid) categories, not a dtype. To pass a dtype as the fastpath, you still have to create the dtype first :-)

exactly the point. you should create the dtype and pass it (this is the general soln to removing the fastpath kw anyhow)

@jorisvandenbossche
Member Author

> exactly the point. you should create the dtype and pass it

So how does this relate to this PR? This PR deals exactly with the creation of the dtype.
Or do you mean: yes, this is something we should do more, so this PR is fine? If so, please say so :-) If not, please look at the diff (expand it a bit above too, so you see the actual _from_fastpath method where '_finalize' is used), and give a more specific comment.

@jreback
Contributor

jreback commented Oct 16, 2017

actually this is ok. sorry was looking at something else.

@codecov

codecov bot commented Oct 16, 2017

Codecov Report

Merging #17891 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17891      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       50105    50105              
==========================================
- Hits        45727    45718       -9     
- Misses       4378     4387       +9

Flag        Coverage Δ
#multiple   89.05% <100%> (ø) ⬆️
#single     40.31% <100%> (-0.06%) ⬇️

Impacted Files                 Coverage Δ
pandas/core/dtypes/dtypes.py   95.14% <100%> (ø) ⬆️
pandas/io/gbq.py               25% <0%> (-58.34%) ⬇️
pandas/core/frame.py           97.75% <0%> (-0.1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34978a7...e883b3b. Read the comment docs.

@TomAugspurger
Contributor

Thanks @jorisvandenbossche, sorry I missed that earlier.

@TomAugspurger merged commit 9092445 into pandas-dev:master Oct 16, 2017
@jorisvandenbossche deleted the cat-dtype-fastpath branch October 16, 2017 13:13
ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
* upstream/master: (76 commits)
  CategoricalDtype construction: actually use fastpath (pandas-dev#17891)
  DEPR: Deprecate tupleize_cols in to_csv (pandas-dev#17877)
  BUG: Fix wrong column selection in drop_duplicates when duplicate column names (pandas-dev#17879)
  DOC: Adding examples to update docstring (pandas-dev#16812) (pandas-dev#17859)
  TST: Skip if no openpyxl in test_excel (pandas-dev#17883)
  TST: Catch read_html slow test warning (pandas-dev#17874)
  flake8 cleanup (pandas-dev#17873)
  TST: remove moar warnings (pandas-dev#17872)
  ENH: tolerance now takes list-like argument for reindex and get_indexer. (pandas-dev#17367)
  ERR: Raise ValueError when week is passed in to_datetime format witho… (pandas-dev#17819)
  TST: remove some deprecation warnings (pandas-dev#17870)
  Refactor index-as-string groupby tests and fix spurious warning (Bug 17383) (pandas-dev#17843)
  BUG: merging with a boolean/int categorical column (pandas-dev#17841)
  DEPR: Deprecate read_csv arguments fully (pandas-dev#17865)
  BUG: to_json - prevent various segfault conditions (GH14256) (pandas-dev#17857)
  CLN: Use pandas.core.common for None checks (pandas-dev#17816)
  BUG: set tz on DTI from fixed format HDFStore (pandas-dev#17844)
  RLS: v0.21.0rc1
  Whatsnew cleanup (pandas-dev#17858)
  DEPR: Deprecate the convert parameter completely (pandas-dev#17831)
  ...
yeemey pushed a commit to yeemey/pandas that referenced this pull request Oct 20, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017