API: Always return DataFrame from get_dummies #24284

Merged (5 commits) on Dec 15, 2018

TomAugspurger (Contributor)

xref #19239

This is in preparation for (hopefully) deprecating SparseDataFrame / SparseSeries. For now, I've just made this an API-breaking change.

If we want to do this via a deprecation instead, we'd need a new keyword argument to control the result type. I opted against that because, to call get_dummies without a warning, people would need to write something like

pd.get_dummies(..., sparse=True, result_type='dataframe')
# hypothetical keyword; it would accept None, 'sparse_dataframe', or 'dataframe'

IMO, that's too big a burden for the common case, but I'd be curious to hear others' thoughts here.
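A minimal sketch of the user-facing effect, assuming a recent pandas release. SparseDataFrame was removed in pandas 1.0, so only the post-change behaviour can be run today. The frame is made-up illustration data: `A` is numeric and is left as-is, while `B` and `C` are strings and get dummy-encoded. With `sparse=True`, both calls return a plain `DataFrame`, and the sparsity lives in the encoded columns' `SparseDtype`.

```python
import pandas as pd

# Hypothetical example data: "A" is numeric, "B" and "C" hold strings.
df = pd.DataFrame({"A": [1, 2], "B": ["a", "b"], "C": ["a", "a"]})

# Only the string columns are encoded; "A" is carried over unchanged (dense).
mixed = pd.get_dummies(df, sparse=True)

# Every column is encoded. Before this PR, this call returned a SparseDataFrame.
all_encoded = pd.get_dummies(df[["B", "C"]], sparse=True)

for name, result in [("mixed", mixed), ("all_encoded", all_encoded)]:
    # The container is always a DataFrame; sparsity is a per-column dtype.
    print(name, type(result).__name__)
    print(result.dtypes)
```

The default dummy dtype differs between pandas versions, so the snippet prints the dtypes rather than hard-coding them.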

TomAugspurger added the Reshaping, API Design, and Sparse labels on Dec 14, 2018
TomAugspurger added this to the 0.24.0 milestone on Dec 14, 2018
pep8speaks bot commented

Hello @TomAugspurger! Thanks for submitting the PR.

codecov bot commented Dec 14, 2018

Codecov Report

Merging #24284 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24284      +/-   ##
==========================================
- Coverage   92.22%   92.22%   -0.01%     
==========================================
  Files         162      162              
  Lines       51828    51821       -7     
==========================================
- Hits        47799    47791       -8     
- Misses       4029     4030       +1

Flag        Coverage        Δ
#multiple   90.62% <100%>   (-0.01%) ⬇️
#single     43% <0%>        (ø) ⬆️

Impacted Files                    Coverage        Δ
pandas/core/reshape/reshape.py    99.55% <100%>   (-0.01%) ⬇️
pandas/io/json/json.py            92.61% <0%>     (-0.48%) ⬇️
pandas/core/arrays/period.py      98.48% <0%>     (-0.02%) ⬇️
pandas/util/testing.py            87.51% <0%>     (+0.09%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d564c42...f4fa09e.

codecov bot commented Dec 14, 2018

Codecov Report

Merging #24284 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24284      +/-   ##
==========================================
- Coverage   92.28%   92.28%   -0.01%     
==========================================
  Files         162      162              
  Lines       51830    51827       -3     
==========================================
- Hits        47831    47827       -4     
- Misses       3999     4000       +1

Flag        Coverage        Δ
#multiple   90.68% <100%>   (-0.01%) ⬇️
#single     43.01% <0%>     (ø) ⬆️

Impacted Files                    Coverage        Δ
pandas/core/reshape/reshape.py    99.55% <100%>   (-0.01%) ⬇️
pandas/util/testing.py            87.48% <0%>     (-0.1%) ⬇️

Last update e3b6683...b20609b.

jreback (Contributor) left a comment

minor comments

is not dummy encoded. When just ``["B", "C"]`` are passed to ``get_dummies``,
then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was returned.

.. ipython:: python

Review comment (Contributor):

code-block


     # if all NaN
     if not dummy_na and len(levels) == 0:
-        return get_empty_Frame(data, sparse)
+        return get_empty_Frame(data)

Review comment (Contributor):

did you mean to capitalize? (or was it like that before?)
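The hunk above is the all-NaN short-circuit: when no category levels are found and `dummy_na` is false, `_get_dummies_1d` returns an empty frame immediately. A quick check of that path, assuming a recent pandas release and a made-up all-NaN Series, shows that after this change the result is an ordinary column-less `DataFrame` whether or not `sparse=True` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical input: every value is missing, so there are no levels to encode.
s = pd.Series([np.nan, np.nan, np.nan], index=["x", "y", "z"])

dense_result = pd.get_dummies(s)
sparse_result = pd.get_dummies(s, sparse=True)

# Both results keep the Series' index but have no columns.
for result in (dense_result, sparse_result):
    print(type(result).__name__, result.shape, list(result.index))
```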

jreback (Contributor) commented Dec 14, 2018

Are there specific issues that this closes?


.. note::

   There's no difference in memory usage between a :class:`SparseDataFrame`

Review comment (Contributor):

you might need to update the existing docs slightly and/or change the usage in previous whatsnew notes.
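The quoted note is cut off, and its `SparseDataFrame` side can no longer be run because the class was removed in pandas 1.0. As a related, hedged illustration (not a reproduction of the note's comparison), the sketch below compares the memory used by the dummy columns with and without `sparse=True` on a made-up Series in which each indicator column is only 5% ones. Byte counts depend on the pandas version and its default dummy dtype, so they are printed rather than hard-coded.

```python
import pandas as pd

# Hypothetical data: 1,000 rows spread evenly over 20 categories,
# so each dummy column is 95% zeros.
s = pd.Series(list("abcdefghijklmnopqrst") * 50)

dense = pd.get_dummies(s)
sparse_cols = pd.get_dummies(s, sparse=True)

# memory_usage reports bytes per column; index=False leaves out the shared index.
dense_bytes = int(dense.memory_usage(index=False).sum())
sparse_bytes = int(sparse_cols.memory_usage(index=False).sum())

# The sparse columns store only the non-fill values plus their positions.
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")
print(f"density of sparse frame: {sparse_cols.sparse.density:.2f}")
```

The `.sparse` DataFrame accessor used for `density` requires every column to be sparse, which holds here because every column is a dummy column.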

@@ -865,19 +863,16 @@ def _get_dummies_1d(data, prefix, prefix_sep='_', dummy_na=False,
     if is_object_dtype(dtype):
         raise ValueError("dtype=object is not a valid dtype for get_dummies")
 
-    def get_empty_Frame(data, sparse):
+    def get_empty_Frame(data):

Review comment (Contributor):

lowercase this; maybe make it a module-level function.
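A minimal sketch of what the suggested refactor might look like. The name `get_empty_frame` and its placement at module level are illustrative assumptions, not the code that was merged. The body is assumed to mirror the single-argument helper from the diff above: build an empty `DataFrame` on the input's row index, or on a default integer index when the input is not a `Series`.

```python
import numpy as np
import pandas as pd


def get_empty_frame(data) -> pd.DataFrame:
    """Return a DataFrame with no columns whose rows line up with ``data``.

    Hypothetical module-level stand-in for the nested ``get_empty_Frame``.
    It needs no ``sparse`` flag, because the result is always a plain DataFrame.
    """
    if isinstance(data, pd.Series):
        # Reuse the Series' own labels so the result aligns with the input.
        index = data.index
    else:
        # Plain sequences and arrays get a default 0..n-1 integer index.
        index = pd.RangeIndex(len(data))
    return pd.DataFrame(index=index)


print(get_empty_frame(pd.Series([np.nan, np.nan], index=["x", "y"])).shape)  # (2, 0)
print(get_empty_frame([np.nan] * 3).shape)  # (3, 0)
```

Hoisting the helper out of `_get_dummies_1d` would also make it testable on its own, which is one plausible reading of the review suggestion.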

TomAugspurger (Contributor, Author) commented

No specific issue.

jreback merged commit 8e1a1a3 into pandas-dev:master on Dec 15, 2018
jreback (Contributor) commented Dec 15, 2018

thanks!

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019