API: Always return DataFrame from get_dummies #24284

TomAugspurger · 2018-12-14T20:10:59Z

xref #19239

In preparation for hopefully deprecating SparseDataFrame / SparseSeries. Right now, I've just made this an API breaking change.

If we want to do this via deprecation, we'd need a new keyword argument to control the result type. I opted against that, because to get a warning-free get_dummies call, people would need to write something like

    pd.get_dummies(..., sparse=True, result_type=None/'sparse_dataframe'/'dataframe')

IMO, that's too big a burden for the common case, but would be curious to hear others' thoughts here.
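
For illustration, here is a minimal sketch of the behavior this PR proposes, assuming a pandas version that includes the change. The frame `df` and its column names are made up for the example; only `pd.get_dummies`, its `sparse` argument, and `pd.SparseDtype` come from pandas itself.

```python
import pandas as pd

# Hypothetical example data: one numeric column and two categorical ones.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "a"], "C": ["x", "y", "y"]})

# Pass only the columns that get dummy-encoded, so every output column is sparse.
result = pd.get_dummies(df[["B", "C"]], sparse=True)

# The container is a plain DataFrame; the sparsity lives in the column dtypes.
is_plain_frame = type(result) is pd.DataFrame
all_sparse = all(isinstance(dtype, pd.SparseDtype) for dtype in result.dtypes)
print(is_plain_frame, all_sparse)
```

The point is that callers no longer need to branch on the result type: sparsity becomes a per-column property rather than a property of the container.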

xref pandas-dev#19239

pep8speaks · 2018-12-14T20:11:03Z

Hello @TomAugspurger! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/reshape/reshape.py !
There are no PEP8 issues in the file pandas/tests/reshape/test_reshape.py !

codecov · 2018-12-14T23:08:39Z

Codecov Report

Merging #24284 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24284      +/-   ##
==========================================
- Coverage   92.22%   92.22%   -0.01%     
==========================================
  Files         162      162              
  Lines       51828    51821       -7     
==========================================
- Hits        47799    47791       -8     
- Misses       4029     4030       +1

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.62% <100%> (-0.01%)` | ⬇️ |
| #single | `43% <0%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/reshape/reshape.py | `99.55% <100%> (-0.01%)` | ⬇️ |
| pandas/io/json/json.py | `92.61% <0%> (-0.48%)` | ⬇️ |
| pandas/core/arrays/period.py | `98.48% <0%> (-0.02%)` | ⬇️ |
| pandas/util/testing.py | `87.51% <0%> (+0.09%)` | ⬆️ |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d564c42...f4fa09e. Read the comment docs.

codecov · 2018-12-14T23:08:47Z

Codecov Report

Merging #24284 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24284      +/-   ##
==========================================
- Coverage   92.28%   92.28%   -0.01%     
==========================================
  Files         162      162              
  Lines       51830    51827       -3     
==========================================
- Hits        47831    47827       -4     
- Misses       3999     4000       +1

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.68% <100%> (-0.01%)` | ⬇️ |
| #single | `43.01% <0%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/reshape/reshape.py | `99.55% <100%> (-0.01%)` | ⬇️ |
| pandas/util/testing.py | `87.48% <0%> (-0.1%)` | ⬇️ |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3b6683...b20609b. Read the comment docs.

jreback

minor comments

jreback · 2018-12-14T23:19:40Z

doc/source/whatsnew/v0.24.0.rst

+is not dummy encoded. When just ``["B", "C"]`` are passed to ``get_dummies``,
+then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was returned.
+
+.. ipython:: python


jreback · 2018-12-14T23:20:33Z

pandas/core/reshape/reshape.py


    # if all NaN
    if not dummy_na and len(levels) == 0:
-        return get_empty_Frame(data, sparse)
+        return get_empty_Frame(data)


did you mean to capitalize? (or was that before)

jreback · 2018-12-14T23:22:58Z

are there specific issues that this closes?

jreback · 2018-12-15T19:19:57Z

doc/source/whatsnew/v0.24.0.rst

+
+.. note::
+
+   There's no difference in memory usage between a :class:`SparseDataFrame`


you might need to update the existing docs slightly and/or change the usage in previous whatsnew notes.
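
A rough way to check the memory side of this, sketched under assumptions: the note quoted above is truncated, and `SparseDataFrame` is not available in later pandas releases, so this only compares a dense `get_dummies` result with the `sparse=True` result that is now a regular DataFrame. The data and sizes are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 10,000 rows drawn from 50 string categories.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 50, size=10_000).astype(str))

dense = pd.get_dummies(s)
sparse = pd.get_dummies(s, sparse=True)

# Total bytes per frame, including the index.
dense_bytes = dense.memory_usage(deep=True).sum()
sparse_bytes = sparse.memory_usage(deep=True).sum()
print(dense_bytes, sparse_bytes)
```

Each row has exactly one non-zero entry, so the sparse columns only store that entry and its position, which is why the sparse result should come out smaller here.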

jreback · 2018-12-15T19:21:11Z

pandas/core/reshape/reshape.py

@@ -865,19 +863,16 @@ def _get_dummies_1d(data, prefix, prefix_sep='_', dummy_na=False,
    if is_object_dtype(dtype):
        raise ValueError("dtype=object is not a valid dtype for get_dummies")

-    def get_empty_Frame(data, sparse):
+    def get_empty_Frame(data):


lowercase this, maybe make a module level function.

TomAugspurger · 2018-12-15T20:17:24Z

No specific issue.

jreback · 2018-12-15T21:13:54Z

thanks !

API: Always return DataFrame from get_dummies

bfb3dfb

xref pandas-dev#19239

TomAugspurger added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), API Design, and Sparse (Sparse Data Type) labels Dec 14, 2018

TomAugspurger added this to the 0.24.0 milestone Dec 14, 2018

Update docsting

f4fa09e

jreback requested changes Dec 14, 2018

jreback requested changes Dec 15, 2018

TomAugspurger added 3 commits December 15, 2018 14:14

Merge remote-tracking branch 'upstream/master' into get_dummies-sparse

4a09d1d

fixup

da8a6cb

added issue

b20609b

jreback approved these changes Dec 15, 2018

jreback merged commit 8e1a1a3 into pandas-dev:master Dec 15, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: Always return DataFrame from get_dummies (pandas-dev#24284)

7fa5775

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: Always return DataFrame from get_dummies (pandas-dev#24284)

cd3af7e
