DEPR: deprecate SparseArray.values #26421

Merged

Conversation

@jorisvandenbossche
Member

commented May 16, 2019

Having a .values attribute on SparseArray is confusing, as .values is typically used on Series/DataFrame/Index and not on the array classes.
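
A minimal sketch of the deprecation pattern described here, using a hypothetical `ToySparseArray` class rather than pandas' actual `SparseArray`. The warning text and the list-based `to_dense` stand-in are assumptions made for illustration only.

```python
import warnings


class ToySparseArray:
    """Hypothetical stand-in for an array class; not pandas' SparseArray."""

    def __init__(self, data):
        self._data = list(data)

    def to_dense(self):
        # Preferred, explicit conversion (here it simply copies the list).
        return list(self._data)

    @property
    def values(self):
        # Deprecated accessor: warn, then delegate to the explicit method.
        warnings.warn(
            "ToySparseArray.values is deprecated; use to_dense() instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.to_dense()


arr = ToySparseArray([0, 0, 1])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = arr.values  # still works, but emits a FutureWarning
new = arr.to_dense()  # explicit conversion, no warning
```

The deprecated property keeps returning the same data, so existing callers continue to work while being nudged toward the explicit conversion.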

@codecov

commented May 16, 2019

Codecov Report

Merging #26421 into master will decrease coverage by <.01%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master   #26421      +/-   ##
==========================================
- Coverage   91.69%   91.68%   -0.01%     
==========================================
  Files         174      174              
  Lines       50741    50743       +2     
==========================================
- Hits        46529    46526       -3     
- Misses       4212     4217       +5
Flag Coverage Δ
#multiple 90.19% <66.66%> (ø) ⬆️
#single 41.16% <0%> (-0.18%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/frame.py 95.63% <100%> (ø) ⬆️
pandas/core/ops.py 94.68% <100%> (ø) ⬆️
pandas/util/testing.py 90.6% <100%> (-0.11%) ⬇️
pandas/core/arrays/sparse.py 92.71% <40%> (+0.01%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97.02% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 421ae9d...1865863. Read the comment docs.

@codecov

commented May 16, 2019

Codecov Report

Merging #26421 into master will increase coverage by <.01%.
The diff coverage is 70%.

@@            Coverage Diff             @@
##           master   #26421      +/-   ##
==========================================
+ Coverage   91.74%   91.75%   +<.01%     
==========================================
  Files         174      174              
  Lines       50763    50754       -9     
==========================================
- Hits        46575    46567       -8     
+ Misses       4188     4187       -1
Flag Coverage Δ
#multiple 90.26% <70%> (ø) ⬆️
#single 41.71% <10%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/managers.py 93.93% <ø> (ø) ⬆️
pandas/core/sparse/frame.py 95.63% <100%> (ø) ⬆️
pandas/util/testing.py 90.7% <100%> (+0.1%) ⬆️
pandas/core/ops.py 94.68% <100%> (ø) ⬆️
pandas/core/arrays/sparse.py 93.08% <50%> (+0.38%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97.02% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44d5498...fb3aebe. Read the comment docs.

@@ -2272,10 +2272,10 @@ def _cast_sparse_series_op(left, right, opname):
     # TODO: This should be moved to the array?
     if is_integer_dtype(left) and is_integer_dtype(right):
         # series coerces to float64 if result should have NaN/inf
-        if opname in ('floordiv', 'mod') and (right.values == 0).any():
+        if opname in ('floordiv', 'mod') and (right.to_dense() == 0).any():

@jreback

jreback May 16, 2019

Contributor

Should we not generally be using np.asarray rather than .to_dense()?

@jorisvandenbossche

jorisvandenbossche May 16, 2019

Author Member

Both are equivalent (although to_dense actually does a bit less, as it specifies the dtype while asarray does some inference; I am not sure about that difference, though).
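
For reference, the two conversions being compared can be sketched as below. This assumes a recent pandas in which the class is exposed as `pd.arrays.SparseArray`; the sample data is made up.

```python
import numpy as np
import pandas as pd

# Small integer array where 0 is the fill value (sample data, not from the PR).
sparse = pd.arrays.SparseArray([0, 0, 1, 2], fill_value=0)

via_to_dense = sparse.to_dense()  # SparseArray's own conversion method
via_asarray = np.asarray(sparse)  # generic NumPy conversion

# Element-wise comparison of the two dense results.
same = np.array_equal(via_to_dense, via_asarray)
```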

@jorisvandenbossche

Member Author

commented May 16, 2019

cc @TomAugspurger since you are most familiar with Sparse nowadays .. (although reluctantly :-))

Removing this here also further entangles a bit the get_values / values mess, as SparseArray is still the only array with .values, and in some places we do hasattr or getattr on 'values', which then catches SparseArray ..
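
The interaction between the `hasattr` / `getattr` checks mentioned above and a deprecated attribute can be sketched in plain Python. `hasattr` works by attempting the attribute access, so probing for `values` runs a deprecated property's getter and emits its warning. The `Probed` class and the `extract` helper are hypothetical.

```python
import warnings


class Probed:
    """Hypothetical object whose `values` property is deprecated."""

    @property
    def values(self):
        warnings.warn("Probed.values is deprecated", FutureWarning, stacklevel=2)
        return [1, 2, 3]


def extract(obj):
    # Duck-typed unwrapping: anything exposing `values` is unwrapped.
    if hasattr(obj, "values"):
        return obj.values
    return obj


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = extract(Probed())

# Both the hasattr probe and the real access run the property getter.
n_warnings = len(caught)
```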

@TomAugspurger

Contributor

commented May 16, 2019

+1

Looks like a few warnings still https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=11542&view=logs&jobId=521b7dfd-2989-5ff8-bc8c-7481906480fa&taskId=07b8d9d4-6363-5e2d-bc2b-146a30521256&lineStart=154&lineEnd=154&colStart=109&colEnd=115

My other PR is adding

filterwarnings =
    error:Sparse:FutureWarning

to our setup.cfg. If you make the error message something like SparseArray.values, these warnings would be elevated to errors too (not sure if we want that or not).
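
As a rough illustration of how such a filter behaves, the standard-library equivalent can be sketched as below. The message field is a regular expression matched against the start of the warning text, so only messages beginning with "Sparse" are escalated. Both warning messages and the `is_escalated` helper are made up for the example.

```python
import warnings


def is_escalated(message):
    """Return True if a FutureWarning with `message` is raised as an error."""
    with warnings.catch_warnings():
        # Baseline: silently ignore everything not matched below.
        warnings.simplefilter("ignore")
        # Same three fields as the setup.cfg entry:
        # action, message prefix (regex), category.
        warnings.filterwarnings("error", message="Sparse", category=FutureWarning)
        try:
            warnings.warn(message, FutureWarning)
        except FutureWarning:
            return True
        return False


sparse_hit = is_escalated("SparseArray.values is deprecated")
other_hit = is_escalated("Series.get_values is deprecated")
```

Under this sketch, a deprecation message that starts with `SparseArray.values` would be caught by the filter, while one that starts with anything else would not.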

@jorisvandenbossche

Member Author

commented May 17, 2019

Ah, I missed the apply ones.
It's quite annoying that the output on our CI does not show which test is causing it ... (due to using xdist).

There is one (that I actually already knew about, but for now ignored) that is not that easy to solve: the json code (ujson/python/objToJSON.c) checks in C for a 'values' attribute to get the values out of dataframe / series / index etc.

@jorisvandenbossche

Member Author

commented May 20, 2019

@TomAugspurger @jreback can you take another look? I added some extra compat code in the Cython/C code.

@@ -28,6 +28,14 @@ cdef _get_result_array(object obj, Py_ssize_t size, Py_ssize_t cnt):
return np.empty(size, dtype='O')


cdef bint _is_sparse_array(object obj):

@TomAugspurger

TomAugspurger May 20, 2019

Contributor

Would this be better-suited for pandas._libs.util? Or keep here since this is the only file using it and it's temporary?

@jorisvandenbossche

jorisvandenbossche May 20, 2019

Author Member

Yes, exactly for those reasons (it's only used here, and should be removed again once we get rid of this deprecation), I would keep it here (it's not meant to be a general utility).

@jreback

jreback May 21, 2019

Contributor

This is not the right location; it should be in util.
Your argument is not correct: just because we will eventually remove it does not mean it shouldn't be with similar code.

@TomAugspurger

Contributor

commented May 20, 2019

@jorisvandenbossche jorisvandenbossche merged commit d3a1912 into pandas-dev:master May 21, 2019

11 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
pandas-dev.pandas Build #20190520.18 succeeded
pandas-dev.pandas (Checks_and_doc) Checks_and_doc succeeded
pandas-dev.pandas (Linux py35_compat) Linux py35_compat succeeded
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
pandas-dev.pandas (Linux py36_locale_slow_old_np) Linux py36_locale_slow_old_np succeeded
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
pandas-dev.pandas (Windows py36_np15) Windows py36_np15 succeeded
pandas-dev.pandas (Windows py37_np141) Windows py37_np141 succeeded
pandas-dev.pandas (macOS py35_macos) macOS py35_macos succeeded

@jorisvandenbossche jorisvandenbossche deleted the jorisvandenbossche:depr-sparse-values branch May 21, 2019

@jreback
Contributor

left a comment

not really sure of the urgency here @jorisvandenbossche

i have some comments - and will fully review at some point


@@ -28,6 +28,14 @@ cdef _get_result_array(object obj, Py_ssize_t size, Py_ssize_t cnt):
return np.empty(size, dtype='O')


cdef bint _is_sparse_array(object obj):
# TODO can be removed once SparseArray.values is removed (GH26421)
if hasattr(obj, '_subtyp'):

@jreback

jreback May 21, 2019

Contributor

this idiom should be getattr
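
The suggested idiom can be sketched in plain Python as follows. Only the `hasattr(obj, '_subtyp')` check appears in the diff above; the `'sparse_array'` value compared against, both helper names, and the two test classes are assumptions for illustration.

```python
def is_sparse_array_hasattr(obj):
    # Two-step form: probe for the attribute, then read it.
    if hasattr(obj, "_subtyp"):
        return obj._subtyp == "sparse_array"
    return False


def is_sparse_array_getattr(obj):
    # Single lookup with a default, in the spirit of the review comment.
    return getattr(obj, "_subtyp", None) == "sparse_array"


class FakeSparse:
    _subtyp = "sparse_array"  # assumed marker value


class Plain:
    pass
```

Both helpers give the same answer; the `getattr` form performs one attribute lookup instead of two and avoids the nested branch.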

@jorisvandenbossche

Member Author

commented May 21, 2019

Sorry, there was no urgency at all. I just thought for a moment that Tom's review was enough, and wanted to get this PR over with. I will wait for your full review before doing any fixup.

@jreback

Contributor

commented May 26, 2019

@jorisvandenbossche my main comment was that _is_sparse_array needs to be in util.pyx (it doesn't matter that we will eventually remove it); it's in the wrong place.

There is also slight confusion about whether we recommend .to_dense() or np.array() for conversions. We should try to be consistent (maybe just deprecate .to_dense()), but that is another issue (maybe create one).
