BUG/PERF: Sparse get_dummies uses concat #24372

Merged
jreback merged 4 commits into pandas-dev:master from TomAugspurger:sparse-perf on Dec 21, 2018

@TomAugspurger
Contributor

commented Dec 20, 2018

Working around the DataFrame constructor perf issue in #24368

Fixes deprecation warnings in the ASV files so there's something to run.

Closes #24371
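
For readers following along, here is a minimal sketch of the approach the title describes: build one sparse Series per dummy column and combine them with `concat`, rather than handing everything to the DataFrame constructor. This is an illustration only, not the code from this PR. The variable names and the pinned `uint8` dtype are assumptions made so the comparison does not depend on the default dummy dtype of the installed pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series(list("abca"))

# Reference result from the public API, with the dtype pinned explicitly.
dummies = pd.get_dummies(s, sparse=True, dtype="uint8")

# Hand-rolled equivalent: one sparse Series per level, combined with concat.
columns = []
for level in sorted(s.unique()):
    # Indicator column for this level, stored as 0/1 integers.
    mask = (s == level).to_numpy().astype("uint8")
    sparse_col = pd.Series(
        pd.arrays.SparseArray(mask, fill_value=0), index=s.index, name=level
    )
    columns.append(sparse_col)

by_hand = pd.concat(columns, axis=1)

print(dummies.dtypes)
print(by_hand.dtypes)
```

Both frames should report sparse dtypes for every column if sparsity survives the `concat` call, which is the behavior this PR relies on.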

TomAugspurger added some commits Dec 19, 2018

Fixed warnings in asv files
(cherry picked from commit f566b46)
avoid series constructor
(cherry picked from commit eb219ac)
BUG: Fix concat(Series[sparse], axis=1)
* Preserve sparsity
* Preserve fill value
@pep8speaks

commented Dec 20, 2018

Hello @TomAugspurger! Thanks for submitting the PR.

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Dec 20, 2018

@@ -1613,6 +1613,7 @@ Sparse
- Bug in :meth:`SparseArray.unique` not returning the unique values (:issue:`19595`)
- Bug in :meth:`SparseArray.nonzero` and :meth:`SparseDataFrame.dropna` returning shifted/incorrect results (:issue:`21172`)
- Bug in :meth:`DataFrame.apply` where dtypes would lose sparseness (:issue:`23744`)
- Bug in :func:`concat` when concatenating a list of :class:`Series` with all-sparse values changing the ``fill_value`` and converting to a dense Series (:issue:`24371`)

@TomAugspurger

TomAugspurger Dec 20, 2018

Author Contributor

When the input to concat is a List[Series[Sparse]], we now return a DataFrame with sparse values. Previously this was a dense DataFrame (probably a bug), so it isn't API breaking.
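
A small sketch of the behavior described above, assuming a pandas version that includes this fix. The Series names and values are made up for illustration.

```python
import pandas as pd

# Two Series whose values are sparse, both with fill_value=0.
a = pd.Series(pd.arrays.SparseArray([0, 1, 0], fill_value=0), name="a")
b = pd.Series(pd.arrays.SparseArray([0, 0, 2], fill_value=0), name="b")

# Column-wise concat of all-sparse Series.
result = pd.concat([a, b], axis=1)

# Each column is expected to stay sparse and keep its fill_value.
for name in result.columns:
    dtype = result[name].dtype
    print(name, dtype, dtype.fill_value)
```

If a column came back with a plain NumPy dtype, the `dtype.fill_value` lookup would raise an `AttributeError`, which makes a regression to dense output easy to spot.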

@TomAugspurger TomAugspurger referenced this pull request Dec 20, 2018

Merged

ENH: Implemented lazy iteration #20796

4 of 4 tasks complete
@codecov

commented Dec 20, 2018

Codecov Report

Merging #24372 into master will decrease coverage by 49.31%.
The diff coverage is 9.09%.

@@             Coverage Diff             @@
##           master   #24372       +/-   ##
===========================================
- Coverage   92.29%   42.97%   -49.32%     
===========================================
  Files         162      162               
  Lines       51832    51836        +4     
===========================================
- Hits        47839    22279    -25560     
- Misses       3993    29557    +25564

Flag         Coverage Δ
#multiple    ?
#single      42.97% <9.09%> (-0.01%) ⬇️

Impacted Files                        Coverage Δ
pandas/core/reshape/reshape.py        13.31% <0%> (-86.24%) ⬇️
pandas/core/dtypes/concat.py          57.35% <50%> (-39.25%) ⬇️
pandas/io/formats/latex.py            0% <0%> (-100%) ⬇️
pandas/core/categorical.py            0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py        0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py            0% <0%> (-100%) ⬇️
pandas/tseries/converter.py           0% <0%> (-100%) ⬇️
pandas/io/formats/html.py             0% <0%> (-98.65%) ⬇️
pandas/core/groupby/categorical.py    0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py             0% <0%> (-91.17%) ⬇️
... and 122 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f6cf7d9...6a65cbc. Read the comment docs.

@codecov

commented Dec 20, 2018

Codecov Report

Merging #24372 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24372      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51832    51836       +4     
==========================================
+ Hits        47839    47843       +4     
  Misses       3993     3993

Flag         Coverage Δ
#multiple    90.7% <100%> (ø) ⬆️
#single      42.98% <9.09%> (ø) ⬆️

Impacted Files                    Coverage Δ
pandas/core/dtypes/concat.py      97.05% <100%> (+0.45%) ⬆️
pandas/core/reshape/reshape.py    99.56% <100%> (ø) ⬆️
pandas/util/testing.py            87.57% <0%> (-0.1%) ⬇️

@TomAugspurger

Contributor Author

commented Dec 20, 2018

N.B.: 6a65cbc has an API-breaking change for SparseSeries.unstack. With this PR, it returns a DataFrame of sparse values instead of a SparseDataFrame.

@@ -909,7 +910,15 @@ def _make_col_name(prefix, prefix_sep, level):
        index = None

    if sparse:
        sparse_series = {}

        if is_integer_dtype(dtype):

@jreback

jreback Dec 20, 2018

Contributor

we have a routine in pandas.core.dtypes.missing for this already

@TomAugspurger

TomAugspurger Dec 20, 2018

Author Contributor

na_value_for_dtype, or something else? We need something a little different, since we want the 0 value for each dtype.

@jreback

jreback Dec 20, 2018

Contributor

We have that too; let's try not to reinvent the wheel.

@TomAugspurger

TomAugspurger Dec 20, 2018

Author Contributor

I didn't find a function like this in any of the dtypes modules.
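
To make the distinction in this thread concrete: a missing-value lookup yields NaN-like sentinels, whereas the sparse fill value here needs to be the "zero" of each dtype. Below is a rough sketch of such a lookup. The helper name `zero_for_dtype` is hypothetical and is not a pandas function; only `is_bool_dtype` and `is_integer_dtype` from `pandas.api.types` are real pandas calls.

```python
import numpy as np
from pandas.api.types import is_bool_dtype, is_integer_dtype


def zero_for_dtype(dtype):
    """Hypothetical helper: return the 'zero' fill value for a NumPy dtype."""
    dtype = np.dtype(dtype)
    if is_bool_dtype(dtype):
        return False
    if is_integer_dtype(dtype):
        return 0
    # Assumption: anything else is treated as floating point.
    return 0.0


for dt in ("uint8", "int64", "bool", "float64"):
    print(dt, repr(zero_for_dtype(dt)))
```

Matching the fill value to the dtype keeps an integer or boolean column from being pushed to float just because the fill value is `0.0`.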

@TomAugspurger

Contributor Author

commented Dec 20, 2018

@TomAugspurger

Contributor Author

commented Dec 21, 2018

I think this should be merged soon if possible. The CI failures are blocking #20796 and the explode PR.

I haven't made too much progress on fixing #24368 properly. Too many edge cases in our constructors.

@jreback jreback merged commit 0bb3772 into pandas-dev:master Dec 21, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/travis-ci/pr The Travis CI build passed
pandas-dev.pandas Build #20181220.20 succeeded

@TomAugspurger TomAugspurger deleted the TomAugspurger:sparse-perf branch Jan 2, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
