New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR/CLN: Remove pd.rolling_*, pd.expanding* and pd.ewm* #18723

Merged
merged 3 commits into from Feb 1, 2018

Conversation

Projects
None yet
6 participants
@topper-123
Contributor

topper-123 commented Dec 11, 2017

  • xref #18601, #18668 and #11603
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This PR removes pd.rolling_*, pd.expanding_* pd.ewm* and the related pd.stats subpackage from the code base.

I have some doubts about some stuff:

  • there are modules doc.plots.stats.* that I'm not sure of if they should be changed or removed. Currently they're not touched.
  • In asv_bench.benchmarks.gil.py there are calls to pd.rolling_* etc. I've changed this to be method-based.
  • pandas.tests.test_windows.py I felt was a bit complex to clean up.

I would appreciate inputs.

@jreback

you are removing a lot of tests that should remain

@@ -216,7 +216,9 @@ Removal of prior version deprecations/changes
- The ``pandas.io.wb`` and ``pandas.io.data`` stub modules have been removed (:issue:`13735`)
- ``Categorical.from_array`` has been removed (:issue:`13854`)
- The ``freq`` and ``how`` parameters have been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:18601 & :issue:18668)
and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:`18601`)
- The top-level functions ``pd.rolling_*``, ``pd.expanding_*`` and ``pd.ewm*`` have been removed (deprecated since v0.18).

This comment has been minimized.

@jreback

jreback Dec 11, 2017

Contributor

don’t remove any issue numbers

This comment has been minimized.

@topper-123

topper-123 Dec 11, 2017

Contributor

Hmm, this and some other stuff that’s unrelated to the pd.rolling stuff seems to be a rebase issue. I’ll fix it tomorrow.

If you want, you can limit the review to only the latest commit, the other commits are because of the rebase issue.

@@ -331,5 +333,5 @@ Categorical
Other
^^^^^
- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)

This comment has been minimized.

@jreback

jreback Dec 11, 2017

Contributor

don’t change other things

12.952, np.nan, np.nan])
with catch_warnings(record=True):
rs = mom.rolling_window(vals, 5, 'boxcar', center=True)

This comment has been minimized.

@jreback

jreback Dec 11, 2017

Contributor

don’t remove tests just because we are catching the warning

you may need to rewrite them to use the new syntax
unless they duplicate current tests

@codecov

This comment has been minimized.

codecov bot commented Dec 11, 2017

Codecov Report

Merging #18723 into master will increase coverage by 0.06%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18723      +/-   ##
==========================================
+ Coverage   91.61%   91.67%   +0.06%     
==========================================
  Files         153      151       -2     
  Lines       51339    51120     -219     
==========================================
- Hits        47034    46865     -169     
+ Misses       4305     4255      -50
Flag Coverage Δ
#multiple 89.53% <ø> (+0.07%) ⬆️
#single 40.69% <ø> (-0.17%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/category.py 97.2% <0%> (-0.31%) ⬇️
pandas/core/window.py 96.08% <0%> (-0.24%) ⬇️
pandas/util/testing.py 82.34% <0%> (-0.2%) ⬇️
pandas/io/formats/format.py 96.03% <0%> (-0.15%) ⬇️
pandas/core/dtypes/dtypes.py 95.14% <0%> (-0.14%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/core/indexes/timedeltas.py 91.21% <0%> (-0.06%) ⬇️
pandas/core/indexes/numeric.py 97.33% <0%> (-0.04%) ⬇️
pandas/core/indexes/period.py 92.9% <0%> (-0.04%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aa4aa9...e311f50. Read the comment docs.

@codecov

This comment has been minimized.

codecov bot commented Dec 11, 2017

Codecov Report

Merging #18723 into master will increase coverage by 0.07%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18723      +/-   ##
==========================================
+ Coverage   91.62%   91.69%   +0.07%     
==========================================
  Files         150      148       -2     
  Lines       48728    48542     -186     
==========================================
- Hits        44645    44513     -132     
+ Misses       4083     4029      -54
Flag Coverage Δ
#multiple 90.06% <ø> (+0.07%) ⬆️
#single 41.71% <ø> (-0.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca4ae4f...1443818. Read the comment docs.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Dec 11, 2017

The rebase issue has been solved now. I'll look at your comments now,. If you've got additional comments, I'd be thakful, and I'll look at them

@topper-123 topper-123 referenced this pull request Dec 12, 2017

Merged

CLN: remove doc/plots directory (dead code) #18751

1 of 2 tasks complete
@pep8speaks

This comment has been minimized.

pep8speaks commented Dec 14, 2017

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 01, 2018 at 06:53 Hours UTC
@topper-123

This comment has been minimized.

Contributor

topper-123 commented Dec 14, 2017

Hi @jreback. An issue is that dask apparently still uses pd.rolling_*. This means that pandas/tests/test_downstream.py currently fails. Should this PR be put on hold, or would notifying the dask guys about this removal be enough? This stuff has been deprecated since 0.18, so for quite a long time.

This PR doesn't touch too many files, so it would be ok to wait a bit, but given the simplification to the pandas API that this PR offers, it should IMO be included when 0.22 releases.

I've updated the PR, so be'd grateful for a review, so it'll be ready for an eventual pull.

@jreback

This comment has been minimized.

Contributor

jreback commented Dec 14, 2017

@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented Dec 14, 2017

The dask versions are deprecated, just like pandas. I believe the best solution would be to

  1. Use this PR to update all the test to the new syntax
  2. Update dask to remove the dd.rolling, etc. methods (@topper-123 could I convince you to work on that? :) It'd be nice to have you as a contributor if you haven't already.)
  3. Remove them from pandas once dask has a release

Depending on the timing of pandas 0.22, dask may not have a release out.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Dec 14, 2017

@TomAugspurger , the test_dask function doesn't explicitly call pd.rolling_*, so changes must be made in the dask code base, correct?

I've never used dask, and have a feeling I'd have to learn some intricancies of that lib to make such a PR, and I could easily go down a rabbit's hole that I never intended to enter... Could someone familiar with dask not take it upon themselves to do those changes? I can sit on this PR until dask is updated, and rebase when this is ready to merge.

TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Dec 14, 2017

@TomAugspurger TomAugspurger referenced this pull request Dec 14, 2017

Merged

Remove rolling #2995

3 of 3 tasks complete
@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented Dec 14, 2017

@topper-123 dask was importing them at the top level if dask.dataframe, which caused the failure. Next time dask wants to mirror a pandas deprecation, we'll want to put it in a try / except block.

dask/dask#2995

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Dec 15, 2017

@TomAugspurger , thanks for picking up on this, I will wait with my PR a while, until dask is updated.

My PR lowers the number of top-level functions of pandas by some 30, so it would IMO be very beneficial to pandas' users to have this merged by 0.22, as it would make pandas look much more focused. Could dask have a release before pandas 0.22, eventually just a minor release?

@gfyoung gfyoung added the Clean label Dec 16, 2017

@jreback jreback referenced this pull request Dec 16, 2017

Open

DEPR: deprecations log for removed issues #13777

115 of 115 tasks complete

jcrist added a commit to dask/dask that referenced this pull request Dec 21, 2017

Remove rolling (#2995)
* Remove rolling

* Remove deprecated rolling methods

xref pandas-dev/pandas#18723

* PEP8

* Added reference
@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 10, 2018

dask 0.16.1 has now been released, and all tests now pass locally with dask version 0.16.1. The travis failure has to do with IO/S3 which I didn't touch. That's the only failure in travis.

So, I think everything is ok with my PR. Is there anything I can do to make this pass that travis error, or is this travis failure unrelated to my PR?

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 13, 2018

All green.

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Jan 13, 2018

Can you add a whatsnew notice?

@jreback

small comments. looks pretty good.

@@ -1148,17 +1108,11 @@ def scoreatpercentile(a, per):
return retval
for q in qs:

This comment has been minimized.

@jreback

jreback Jan 13, 2018

Contributor

you can parametrize this test on qs

This comment has been minimized.

@topper-123

topper-123 Jan 13, 2018

Contributor

Good catch, changed.

result = mom.rolling_std(np.array([1., 2., 3., 4., 5.]),
1, min_periods=1)
expected = np.array([np.nan] * 5)
tm.assert_almost_equal(result, expected)

This comment has been minimized.

@jreback

jreback Jan 13, 2018

Contributor

there are 2 tests here (diff ddof)

This comment has been minimized.

@topper-123

topper-123 Jan 13, 2018

Contributor

Not sure I understand, I'm replicating what was there before.

This comment has been minimized.

@jreback

jreback Jan 13, 2018

Contributor

ahh I c, ok then.

def test_ewma_alpha_old_api(self):
# GH 10789
with catch_warnings(record=True):

This comment has been minimized.

@jreback

jreback Jan 13, 2018

Contributor

where is the test for this one?

This comment has been minimized.

@topper-123

topper-123 Jan 13, 2018

Contributor

There's a test test_ewm_alpha that is the same as this.

This comment has been minimized.

@jreback

jreback Jan 13, 2018

Contributor

ok great

@jreback

lgtm. ping on green.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 14, 2018

All green

@jorisvandenbossche

Question about one of the tests

assert result2.dtype == np.float_
def _check_ew_structures(self, func, name):
def _check_ew(self, name=None):

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Jan 14, 2018

Member

Previously this function consisted of _check_ew_ndarray and _check_ew_structures subfunctions to test both interfaces. You removed _check_ew_ndarray and only kept the content of _check_ew_structures as this is testing the new interface.
But, as far as I can see, the actual testing was done in _check_ew_ndarray (_check_ew_structures is only testing the output class type). Shouldn't this testing be kept? (changed to the new interface)

(might good be I am missing something, or that it is covered in other tests)

This comment has been minimized.

@topper-123

topper-123 Jan 14, 2018

Contributor

I think you are right: So rather than test this function call, test the ewm method call on a series. I'll look into it tomorrow.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 14, 2018

@jorisvandenbossche , I've now replicated the code from _check_ew_ndarray into check_ew. Could you review it?

@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Jan 15, 2018

@topper-123 that specific one looks good now I think.
Did you double check the other tests to see there were not similar problems?

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 15, 2018

I've added test_expanding_func (parametrized, was called test_parametrized_max previously) and test_expanding_apply. Everything is now included AFAICT.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 17, 2018

The tests have been running for 9 hours now and the Travis-CI tests have not yet finished. Same thing happened yesterday where after 20 hours one travis test-suite failed without console output (my guess is it timed out). Is anything wrong with the travis suite?

Everything passes locally and the errors that travis reported two days ago have been fixed (but the fixes not confirmed by travis, because I can't get travis to go through them...).

What should I do? Could a core dev run the travis tests locally and confirm they're ok, and merge them?

@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented Jan 17, 2018

TravisCI had an outage yesterday, so all the builds were canceled and rescheduled. Looks like yours is 4th on deck at the moment: https://travis-ci.org/pandas-dev/pandas/pull_requests

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 17, 2018

Ok, thanks. I'll wait a bit more then.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 17, 2018

All green now🤓

@jreback

This comment has been minimized.

Contributor

jreback commented Jan 18, 2018

when @jorisvandenbossche approves can merge.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 19, 2018

@jorisvandenbossche, I'd really appreciate if you could do this.

@jorisvandenbossche

sorry @topper-123, one more question
(it's a very complex file / diff (not your fault!) to review)

result = get_result(arr, 50, min_periods=30)
tm.assert_almost_equal(result[-1], static_comp(arr[10:-10]))
# min_periods is working correctly

This comment has been minimized.

@jorisvandenbossche

jorisvandenbossche Jan 19, 2018

Member

Do we need to move parts of _check_ndarray like this part on checking min_periods to _check_moments_func ?

From a superficial look, it seems that _check_ndarray did more than _check_structures (eg in _check_structures I don't see any assertion of NaNs when min_periods is specified like here)

I also don't know if test_stable and test_window make sense for the new interface

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 21, 2018

Changes suggested by @jorisvandenbossche implemented and all green.

r = obj.rolling(window=window, min_periods=min_periods,
center=center)
return getattr(r, name)(**kwargs)
self._check_series(name, static_comp=static_comp,

This comment has been minimized.

@jreback

jreback Jan 21, 2018

Contributor

I am not sure of the point of adding _check_series, it seems that you are already checking the series & frame results (line 1344), yes?

This comment has been minimized.

@jreback

jreback Jan 21, 2018

Contributor

if there IS a difference, IOW you are checking something different, then certainly don't make a new function, just in-line it here.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Jan 31, 2018

Hi @jreback. I uploaded a new PR 12 days ago.

There were a few issues that were checked in _check_ndarray but were not in _check_moments previously:

  • tests for whether Nans were excluded correctly
  • tests for whether center worked correctly
  • tests for what happens when the window is larger than the series

All of the above are inlined in _check_moments now (see 37abd25).

I've rebased a moments ago, without making code changes otherwise, so everything should turn up green & rebased when the tests are finished.

@jreback

looks fine. just that 1 question.

@@ -863,72 +848,55 @@ def test_centered_axis_validation(self):
.rolling(window=3, center=True, axis=2).mean())
def test_rolling_sum(self):
self._check_moment_func(mom.rolling_sum, np.nansum, name='sum',

This comment has been minimized.

@jreback

jreback Feb 1, 2018

Contributor

hmm, shouldn't this be nansum?

This comment has been minimized.

@topper-123

topper-123 Feb 1, 2018

Contributor

Yes. fixed.

@topper-123

This comment has been minimized.

Contributor

topper-123 commented Feb 1, 2018

all green.

@jorisvandenbossche jorisvandenbossche merged commit 3597de0 into pandas-dev:master Feb 1, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jorisvandenbossche

This comment has been minimized.

Member

jorisvandenbossche commented Feb 1, 2018

@topper-123 Thanks a lot for this!

@topper-123 topper-123 deleted the topper-123:remove_pd_stats branch Feb 1, 2018

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

DEPR/CLN: Remove pd.rolling_*, pd.expanding* and pd.ewm* (pandas-dev#…
…18723)

* remove pd.running_*, pd.expanding_* and pd.ewm* and related code

* added test_expanding_func and test_expanding_apply

* recreate _check_ndarray inline in _check_moment_func
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment