API: Always return DataFrame from get_dummies #24284

Merged
merged 5 commits into from Dec 15, 2018

@TomAugspurger
Contributor

commented Dec 14, 2018

xref #19239

In preparation for hopefully deprecating SparseDataFrame / SparseSeries. Right now, I've just made this an API breaking change.

If we want to do this via deprecation, we'd need a new keyword argument to control the result type. I opted against that, because to get a warning-free get_dummies people would need

pd.get_dummies(..., sparse=True, result_type=None/'sparse_dataframe'/'dataframe'):
    ...

IMO, that's too big a burden for the common case, but would be curious to hear others' thoughts here.
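To illustrate the proposed behavior, here is a minimal sketch, assuming a pandas version that includes this change. The frame `df` and its column names are made up for the example. With `sparse=True`, the result is a plain `DataFrame` whose dummy-encoded columns are backed by sparse arrays, while columns that were not encoded stay dense.

```python
import pandas as pd

# Hypothetical example data: "A" is numeric, "B" and "C" are categorical.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["a", "b", "a"],
    "C": ["x", "y", "x"],
})

# Only "B" and "C" are dummy-encoded; "A" is passed through unchanged.
result = pd.get_dummies(df, columns=["B", "C"], sparse=True)

# The container is a regular DataFrame, not a SparseDataFrame.
is_plain_frame = type(result) is pd.DataFrame

# The sparsity lives in the column dtypes instead of the container type.
sparse_cols = [
    col for col in result.columns
    if isinstance(result[col].dtype, pd.SparseDtype)
]

print(is_plain_frame)
print(sparse_cols)
```

Callers that previously checked for a `SparseDataFrame` would, under this assumption, inspect the column dtypes instead.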

@pep8speaks

commented Dec 14, 2018

Hello @TomAugspurger! Thanks for submitting the PR.

@codecov

commented Dec 14, 2018

Codecov Report

Merging #24284 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24284      +/-   ##
==========================================
- Coverage   92.22%   92.22%   -0.01%     
==========================================
  Files         162      162              
  Lines       51828    51821       -7     
==========================================
- Hits        47799    47791       -8     
- Misses       4029     4030       +1
| Flag | Coverage Δ |
|---|---|
| #multiple | 90.62% <100%> (-0.01%) ⬇️ |
| #single | 43% <0%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/reshape/reshape.py | 99.55% <100%> (-0.01%) ⬇️ |
| pandas/io/json/json.py | 92.61% <0%> (-0.48%) ⬇️ |
| pandas/core/arrays/period.py | 98.48% <0%> (-0.02%) ⬇️ |
| pandas/util/testing.py | 87.51% <0%> (+0.09%) ⬆️ |

Last update d564c42...f4fa09e.

@codecov

commented Dec 14, 2018

Codecov Report

Merging #24284 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24284      +/-   ##
==========================================
- Coverage   92.28%   92.28%   -0.01%     
==========================================
  Files         162      162              
  Lines       51830    51827       -3     
==========================================
- Hits        47831    47827       -4     
- Misses       3999     4000       +1
| Flag | Coverage Δ |
|---|---|
| #multiple | 90.68% <100%> (-0.01%) ⬇️ |
| #single | 43.01% <0%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/reshape/reshape.py | 99.55% <100%> (-0.01%) ⬇️ |
| pandas/util/testing.py | 87.48% <0%> (-0.1%) ⬇️ |

Last update e3b6683...b20609b.

@jreback
Contributor

left a comment

minor comments

is not dummy encoded. When just ``["B", "C"]`` are passed to ``get_dummies``,
then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was returned.

.. ipython:: python

jreback Dec 14, 2018

Contributor

code-block


     # if all NaN
     if not dummy_na and len(levels) == 0:
-        return get_empty_Frame(data, sparse)
+        return get_empty_Frame(data)

jreback Dec 14, 2018

Contributor

did you mean to capitalize? (or was that before)

@jreback

Contributor

commented Dec 14, 2018

Does this close any specific issues?

.. note::

There's no difference in memory usage between a :class:`SparseDataFrame`

jreback Dec 15, 2018

Contributor

you might need to update the existing docs slightly and/or change the usage in previous whatsnew notes.

@@ -865,19 +863,16 @@ def _get_dummies_1d(data, prefix, prefix_sep='_', dummy_na=False,
     if is_object_dtype(dtype):
         raise ValueError("dtype=object is not a valid dtype for get_dummies")

-    def get_empty_Frame(data, sparse):
+    def get_empty_Frame(data):

jreback Dec 15, 2018

Contributor

lowercase this, maybe make a module level function.
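As a rough sketch of what that suggestion could look like: the helper below is hypothetical (the name `_get_empty_frame` and its body are assumptions, not the code merged in this PR). It builds an empty, column-less `DataFrame` that keeps the input's index, which is the shape of result `get_dummies` returns when every value is NaN and `dummy_na` is left at `False`.

```python
import numpy as np
import pandas as pd


def _get_empty_frame(data):
    """Hypothetical module-level helper: an empty frame aligned to ``data``."""
    if isinstance(data, pd.Series):
        # Preserve the caller's index so the result still aligns with the input.
        index = data.index
    else:
        # Fall back to a positional index for array-like input.
        index = np.arange(len(data))
    return pd.DataFrame(index=index)


all_nan = pd.Series([np.nan, np.nan], index=["x", "y"])

# The sketch: no columns, same index as the input.
empty = _get_empty_frame(all_nan)

# The public API on the same all-NaN input, for comparison.
dummies = pd.get_dummies(all_nan, sparse=True)

print(empty.shape, dummies.shape)
```

Because the helper no longer takes a `sparse` argument, the all-NaN path returns the same type whether or not `sparse=True` is passed.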

@TomAugspurger

Contributor Author

commented Dec 15, 2018

No specific issue.

@jreback jreback merged commit 8e1a1a3 into pandas-dev:master Dec 15, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/travis-ci/pr The Travis CI build passed
pandas-dev.pandas Build #20181215.32 succeeded
@jreback

Contributor

commented Dec 15, 2018

thanks !

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019