Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Series.first() and DataFrame.first() #53419

Merged
merged 10 commits into from
Jun 1, 2023

Conversation

DeaMariaLeon
Copy link
Member

I don't know which "whatsnew" I need to edit (which version)

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice one! just got some minor comments

also, we'll need a whatsnew note for 2.1.0

@@ -9162,6 +9162,12 @@ def first(self, offset) -> Self:
3 days observed in the dataset, and therefore data for 2018-04-13 was
not returned.
"""
# GH45908 & GH#52487
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the issue numbers here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -9162,6 +9162,12 @@ def first(self, offset) -> Self:
3 days observed in the dataset, and therefore data for 2018-04-13 was
not returned.
"""
# GH45908 & GH#52487
warnings.warn(
"first is deprecated and will be removed in a future version",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we advise users what to do instead? something like "please create a mask and filter using .loc instead"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


class TestFirst:
def test_first_subset(self, frame_or_series):
ts = tm.makeTimeDataFrame(freq="12h")
ts = tm.get_obj(ts, frame_or_series)
result = ts.first("10d")
assert len(result) == 20
with tm.assert_produces_warning(FutureWarning, match=deprecated_msg):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we perhaps just do something like

        ts = tm.makeTimeDataFrame(freq="12h")
        ts = tm.get_obj(ts, frame_or_series)
        with tm.assert_produces_warning(FutureWarning, match=deprecated_msg):
            result = ts.first("10d")

        ts = tm.makeTimeDataFrame(freq="D")
        ts = tm.get_obj(ts, frame_or_series)
        with tm.assert_produces_warning(FutureWarning, match=deprecated_msg):
            result = ts.first("10d")

so that within each assert_produces_warning, there is just a single statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure 🫣

Comment on lines 47 to 49
with tm.assert_produces_warning(FutureWarning, match=deprecated_msg):
with pytest.raises(TypeError, match=msg): # index is not a DatetimeIndex
obj.first("1D")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine, though an alternative would be

with (
    tm.assert_produces_warning(FutureWarning, match=deprecated_msg),
    pytest.raises(TypeError, match=msg)
):
    obj.first('1D')

There's a ruff check for this https://beta.ruff.rs/docs/rules/multiple-with-statements/ , in case you wanted to make a separate issue about it and we can enable it globally in a separate PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a new issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo this looks very ugly as soon as one of both statements gets longer. I would prefer keeping them separate for tests at least

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phofl Separate like this you mean?:

with tm.assert_produces_warning(FutureWarning, match=deprecated_msg):
            with pytest.raises(TypeError, match=msg):  # index is not a DatetimeIndex
                obj.first("1D")

Comment on lines 83 to 86
expected = frame_or_series(
[1] * periods, index=bdate_range(start, periods=periods)
)
tm.assert_equal(result, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's take this out of the with context

# GH#51032
df = DataFrame(index=pd.DatetimeIndex([]))
result = getattr(df, func)(offset=1)
result = getattr(df, "last")(offset=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're doing getattr with a literal, we may as well just write df.last

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks

Comment on lines 30 to 31
ts = tm.makeTimeDataFrame(freq="D")
ts = tm.get_obj(ts, frame_or_series)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't look like this was in the original, could we keep it the same please (except for the with context)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - copied/pasted more lines than needed. I wasn't adding more tests, but still an error.. :(

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thanks @DeaMariaLeon ! leaving open a bit in case anyone has a comment, then will merge

EDIT: sorry there's some failing unit tests - I should've waited for all jobs to finish before approving

doc/source/whatsnew/v2.1.0.rst Outdated Show resolved Hide resolved
pandas/core/generic.py Outdated Show resolved Hide resolved
@MarcoGorelli MarcoGorelli added Deprecate Functionality to remove in pandas Frequency DateOffsets labels May 29, 2023
@MarcoGorelli MarcoGorelli added this to the 2.1 milestone May 29, 2023
@DeaMariaLeon
Copy link
Member Author

@MarcoGorelli I think the last failing test is not related to these changes.

I separated the test for first() on test_finalize.py thinking that it will make it easy to remove it when it is time.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is close! we just have a test failure in the docstring validation CI job

================================== FAILURES ===================================
_________________ [doctest] pandas.core.generic.NDFrame.first __________________
9147         >>> ts
9148                     A
9149         2018-04-09  1
9150         2018-04-11  2
9151         2018-04-13  3
9152         2018-04-15  4
9153 
9154         Get the rows for the first 3 days:
9155 
9156         >>> ts.first('3D')
UNEXPECTED EXCEPTION: FutureWarning('first is deprecated and will be removed in a future version. Please create a mask and filter using `.loc` instead')
Traceback (most recent call last):
  File "/home/runner/micromamba/envs/test/lib/python3.10/doctest.py", line 1350, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pandas.core.generic.NDFrame.first[3]>", line 1, in <module>
  File "/home/runner/micromamba/envs/test/lib/python3.10/site-packages/pandas/core/generic.py", line 9165, in first
    warnings.warn(
FutureWarning: first is deprecated and will be removed in a future version. Please create a mask and filter using `.loc` instead

https://github.com/pandas-dev/pandas/actions/runs/5119510437/jobs/9204861758?pr=53419

Could you check what was done in the past to deal with docstrings of deprecated methods please? Then we can do something similar here

pd.DataFrame({"A": [1, 1, 1, 1]}, pd.date_range("2000", periods=4)),
],
)
def test_finalize_first(data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! good call to split this into a separate test

@DeaMariaLeon
Copy link
Member Author

Friendly ping again @MarcoGorelli - It looks green

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice one, thanks @DeaMariaLeon !

@MarcoGorelli
Copy link
Member

merging then, do you want to do a similar one for last?

@MarcoGorelli MarcoGorelli merged commit 3e40c3a into pandas-dev:main Jun 1, 2023
34 checks passed
@DeaMariaLeon
Copy link
Member Author

merging then, do you want to do a similar one for last?

Yes.. Thank you @MarcoGorelli.

@DeaMariaLeon DeaMariaLeon deleted the deprfirst branch June 2, 2023 07:38
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Jun 5, 2023
* Deprecating first()

* Added requested changes

* removed repeated ts

* Update doc/source/whatsnew/v2.1.0.rst

* Update pandas/core/generic.py

* pre-commit

* Separated test for first() on test_finalize.py

* Avoiding docstring on CI

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: MarcoGorelli <33491632+MarcoGorelli@users.noreply.github.com>
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
* Deprecating first()

* Added requested changes

* removed repeated ts

* Update doc/source/whatsnew/v2.1.0.rst

* Update pandas/core/generic.py

* pre-commit

* Separated test for first() on test_finalize.py

* Avoiding docstring on CI

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: MarcoGorelli <33491632+MarcoGorelli@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.first has unexpected behavior when passing a DateOffset
3 participants