Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement+test mean for datetimelike EA/Index/Series #24757

Merged
merged 44 commits into from Jun 10, 2019

Conversation

@jbrockmendel
Copy link
Member

commented Jan 13, 2019

This takes over from #23890, implementing+testing reductions one at a time.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 13, 2019

we are still treating datetimes as non-numeric yes? meaning that by default they will be exlcuded from say .mean() and you have to pass numeric_only=False?

@jreback
Copy link
Contributor

left a comment

can you see if we have coverage of this on mixed dtype frames as well? (see my comment about numeric_only). Also need to assert that other ops automatically exclude (e.g. std).

@codecov

This comment has been minimized.

Copy link

commented Jan 14, 2019

Codecov Report

Merging #24757 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24757      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52363    52378      +15     
==========================================
+ Hits        48376    48391      +15     
  Misses       3987     3987
Flag Coverage Δ
#multiple 90.81% <100%> (ø) ⬆️
#single 42.91% <20%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimelike.py 97.72% <100%> (+0.04%) ⬆️
pandas/core/indexes/datetimelike.py 98.52% <100%> (ø) ⬆️
pandas/core/series.py 93.69% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 453fa85...30eeb64. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Jan 14, 2019

Codecov Report

Merging #24757 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24757      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50708      +16     
==========================================
+ Hits        46576    46588      +12     
- Misses       4116     4120       +4
Flag Coverage Δ
#multiple 90.41% <100%> (ø) ⬆️
#single 41.79% <17.64%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.36% <ø> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.96% <ø> (ø) ⬆️
pandas/core/arrays/datetimelike.py 97.93% <100%> (+0.04%) ⬆️
pandas/core/indexes/datetimelike.py 98.14% <100%> (ø) ⬆️
pandas/core/series.py 93.62% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d124ea...111c345. Read the comment docs.

@jreback
Copy link
Contributor

left a comment

need the tests from the OP (e.g. a mixed DataFrame)

pandas/core/series.py Show resolved Hide resolved
@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jan 14, 2019

we are still treating datetimes as non-numeric yes? meaning that by default they will be exlcuded from say .mean() and you have to pass numeric_only=False?

I think that makes the most sense.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jan 14, 2019

Does this close #24752?

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Jan 14, 2019

Does this close #24752?

I don't think so, will need to double-check.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Jan 14, 2019

No, no it does not.

pandas/core/series.py Show resolved Hide resolved
@@ -47,6 +47,27 @@ def test_ops_properties_basic(self):
assert s.day == 10
pytest.raises(AttributeError, lambda: s.weekday)

# TODO: figure out where in tests.reductions this belongs
def test_mean(self, tz_naive_fixture):

This comment has been minimized.

Copy link
@jreback

jreback Jan 16, 2019

Contributor

seems very duplicative of the test below, is there a reason?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Jan 16, 2019

Author Member

bc at the time when this was written Index reduction tests had not been collected in tests.reductions. Will move/de-duplicate.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Jan 16, 2019

I guess these have to be uniform periods so mean is just another period, though I think this could easily be a non-sensical return values, e.g. what is the mean of pd.date_range('2012', freq='M', periods=4)

Eyeballing it at about 2012-03-14. Why would that be nonsensical?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 16, 2019

it’s no longer a M freq Period. my point is the mean if a span is an odd thing

@jorisvandenbossche
Copy link
Member

left a comment

To repeat what I said before: to move forward on this, I would suggest to simply remove the axis keyword in this PR. We can always add it later when if we actually want fix the numpy compatibility, or need it for another reason.

if is_period_dtype(self):
# See discussion in GH#24757
raise TypeError(
"mean is not implemented for {cls} since the meaning may be "

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Apr 5, 2019

Member
Suggested change
"mean is not implemented for {cls} since the meaning may be "
"mean is not implemented for {cls} since the meaning is "

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Apr 12, 2019

Author Member

The thing here is that it isn't ambiguous for Tick-frequencies, so "may be" is more accurate.

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Apr 12, 2019

Member

What is the unambiguous result of the mean of this?

In [12]: idx = pd.period_range('2012-01-01', periods=4, freq='1s')

In [13]: idx
Out[13]: 
PeriodIndex(['2012-01-01 00:00:00', '2012-01-01 00:00:01',
             '2012-01-01 00:00:02', '2012-01-01 00:00:03'],
            dtype='period[S]', freq='S')

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Apr 12, 2019

Author Member

Period('2012-01-01 00:00:01', freq='S'). We already discussed rounding error in this thread.

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Apr 13, 2019

Member

We mainly discussed that for datetime64, where I am fine with that.
But for Period, you implicitly used the start_times for taking the mean, while you could also have taken the end_times. So I still find this ambiguous.

(anyway, I think we agree to disallow it all-together for all Periods, so this is only about the error message)

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Apr 13, 2019

@jbrockmendel instead of keeping rebasing this, can you actually answer to my request for changes? (unless it is just lack of time that you didn't make any changes yet, that's no problem)

@jreback

This comment has been minimized.

Copy link
Contributor

commented Apr 20, 2019

can you merge master and fix the conflicts.

@jorisvandenbossche this has been outstanding a while, any remaining issues?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Apr 20, 2019

@jbrockmendel also pls respond to open questions from @jorisvandenbossche

@@ -1435,6 +1435,50 @@ def max(self, axis=None, skipna=True, *args, **kwargs):
# Don't have to worry about NA `result`, since no NA went in.
return self._box_func(result)

def mean(self, axis=None, skipna=True):

This comment has been minimized.

Copy link
@jreback

jreback Apr 20, 2019

Contributor

don'e we also need out=? meaning we should just accept kwargs like we do for Series.mean

Parameters
----------
axis : None
Dummy parameter to match NumPy signature

This comment has been minimized.

Copy link
@jreback

jreback Apr 20, 2019

Contributor

would just add kwargs for this purpose; we don't add them explicity anywhere else

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

can you merge master; this looks fine otherwise, though I think @jorisvandenbossche had a question.

@TomAugspurger
Copy link
Contributor

left a comment

This seems good for now. There's still an outstanding discussion about accepting additional arguments so that methods like np.mean work correctly. These are

  • axis
  • dtype
  • out
  • keepdims (optional)

but I'm fine with leaving that as a followup when someone wants it.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented May 14, 2019

Given the lack of response of @jbrockmendel, I added a commit that removes the axis keyword (single commit, so can be easily removed again if that is a problem).
Rationale: since we don't do any compat-handling of numpy keyword arguments, let's also not do it for axis. We can always add that later.

Fine with merging it like this.

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 14, 2019

ok, pls merge on green. @jorisvandenbossche I guess worthwile to create a followup issue.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented May 14, 2019

Given the lack of response of @jbrockmendel,

My bad. Starting next month I'm working FT on pandas, so until then I'm focusing on statsmodels and spending time outdoors. Will rebase momentarily.

@jreback

This comment has been minimized.

Copy link
Contributor

commented May 16, 2019

I think this will pass if you merge master?

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented May 16, 2019

until then I'm focusing on statsmodels and spending time outdoors.

Enjoy your time off!


Failures were actually real ones: pushed fixes for a problem with the axis (that I caused) and a docstring lint error

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

@jbrockmendel one more merge master pls.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Jun 10, 2019

gentle ping

@jreback jreback merged commit efc7f2f into pandas-dev:master Jun 10, 2019

11 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20190603.41 succeeded
Details
pandas-dev.pandas (Checks) Checks succeeded
Details
pandas-dev.pandas (Linux py35_compat) Linux py35_compat succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow_old_np) Linux py36_locale_slow_old_np succeeded
Details
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
Details
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
Details
pandas-dev.pandas (Windows py36_np15) Windows py36_np15 succeeded
Details
pandas-dev.pandas (Windows py37_np141) Windows py37_np141 succeeded
Details
pandas-dev.pandas (macOS py35_macos) macOS py35_macos succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 10, 2019

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the jbrockmendel:means branch Jun 27, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.