
ENH-19629: Adding numpy nansum/nanmean, etc. to _cython_table #19670

Conversation

@AaronCritchley (Contributor) commented Feb 13, 2018

As per the issue, here's proof of solution:

```
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([1, 2, 3, 4]).agg(np.sum)
10
>>> pd.Series([1, 2, 3, 4]).agg(np.nansum)
10
```

Where should I add tests for these changes? I wasn't sure on the best fit and how to most effectively test. Also, is a whatsnew entry needed here? If yes, any guidance on what it should be?
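For readers following along, the change itself amounts to extending the lookup table that maps numpy callables to pandas' internal op names. A minimal sketch of the idea, with illustrative entries and a hypothetical dispatch helper (the real table is `SelectionMixin._cython_table` in `pandas/core/base.py`, and this is not the PR's exact code):

```python
import numpy as np
import pandas as pd

# Illustrative sketch of the lookup this PR extends; entry names are
# examples, not the full mapping from pandas/core/base.py.
_cython_table = {
    np.sum: "sum",
    np.nansum: "sum",      # NaN-aware variant maps to the same internal op
    np.mean: "mean",
    np.nanmean: "mean",
}

def agg_via_table(obj, func):
    # Translate a known numpy callable into the pandas method of the
    # same name; otherwise fall back to applying the callable directly.
    name = _cython_table.get(func)
    if name is not None:
        return getattr(obj, name)()
    return func(obj)
```

With a mapping like this, `agg_via_table(pd.Series([1, 2, 3, 4]), np.nansum)` dispatches to `Series.sum` and returns the scalar `10`, matching the REPL session above.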

@TomAugspurger (Contributor)

Best is probably `pandas/tests/test_nanops.py`.

A whatsnew entry under bug fixes would be good, under the "Numeric" bug fixes section, saying that

```rst
:meth:`~DataFrame.agg` now correctly handles numpy NaN-aware methods like :meth:`numpy.nansum` (:issue:`19629`)
```

@TomAugspurger TomAugspurger added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations labels Feb 13, 2018
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Feb 13, 2018
@pep8speaks commented Feb 13, 2018

Hello @AaronCritchley! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 25, 2018 at 11:23 Hours UTC

@codecov bot commented Feb 13, 2018

Codecov Report

Merging #19670 into master will increase coverage by 0.06%.
The diff coverage is 100%.


```diff
@@            Coverage Diff             @@
##           master   #19670      +/-   ##
==========================================
+ Coverage   91.77%   91.84%   +0.06%
==========================================
  Files         153      153
  Lines       49257    49300      +43
==========================================
+ Hits        45207    45279      +72
+ Misses       4050     4021      -29
```

| Flag | Coverage Δ |
| --- | --- |
| #multiple | 90.23% <100%> (+0.06%) ⬆️ |
| #single | 41.9% <100%> (+0.02%) ⬆️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| pandas/core/base.py | 96.83% <100%> (+0.03%) ⬆️ |
| pandas/util/testing.py | 84.59% <0%> (-0.21%) ⬇️ |
| pandas/core/internals.py | 95.53% <0%> (-0.05%) ⬇️ |
| pandas/core/indexes/datetimes.py | 95.73% <0%> (-0.04%) ⬇️ |
| pandas/core/accessor.py | 98.7% <0%> (-0.02%) ⬇️ |
| pandas/core/dtypes/concat.py | 99.16% <0%> (-0.02%) ⬇️ |
| pandas/core/strings.py | 98.32% <0%> (-0.02%) ⬇️ |
| pandas/core/indexes/multi.py | 95.06% <0%> (-0.02%) ⬇️ |
| pandas/core/window.py | 96.29% <0%> (-0.01%) ⬇️ |
| pandas/core/frame.py | 97.16% <0%> (-0.01%) ⬇️ |
| ... and 13 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ec74e5...c9ddaff. Read the comment docs.

@AaronCritchley (Contributor, Author)

Hey @TomAugspurger, thanks so much for your help. Are the tests I've put in OK, or would you rather I be more exhaustive?

I considered adding some tests with np.nan values, but for some functions, like np.nancumsum, NaNs change the output, and I thought it would be messy to add np.nan tests for some functions but not all.
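For context, once NaNs are present the standard and NaN-aware cumulative functions genuinely disagree:

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

# np.cumsum propagates the NaN through the rest of the result...
print(np.cumsum(arr))      # [ 1. nan nan]
# ...while np.nancumsum treats the NaN as zero and keeps accumulating.
print(np.nancumsum(arr))   # [1. 1. 4.]
```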

```python
@@ -1004,6 +1004,95 @@ def prng(self):
        return np.random.RandomState(1234)


class TestNumpyNaNFunctions(object):
```
Contributor:

this can be a single function that is parameterized over all of the methods
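A sketch of that shape, assuming illustrative data and only a few of the PR's function pairs:

```python
import numpy as np
import pandas as pd
import pytest

# Sketch of the suggested single parametrized test; the pairs below are
# examples, not the full list from the PR.
@pytest.mark.parametrize("standard, nan_method", [
    (np.sum, np.nansum),
    (np.mean, np.nanmean),
    (np.max, np.nanmax),
])
def test_np_nan_functions(standard, nan_method):
    # On NaN-free data, the plain and NaN-aware variants must agree
    # once both go through .agg().
    data = pd.Series([1, 2, 3, 4])
    assert data.agg(nan_method) == data.agg(standard)
```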

Contributor Author:

Sure, will do this soon. Could you help me work out why the build is failing?
From looking at CircleCI, it seems it's not recognizing np.nanprod as valid; do I need to remove the nanprod case for compat or something? Sorry if I'm being dense here.

Contributor:

you might need to skip for older versions of numpy

not sure when certain ones were added

@jreback jreback removed this from the 0.23.0 milestone Feb 15, 2018
@topper-123 (Contributor) commented Feb 19, 2018

np.nanprod was added in numpy 1.10, while pandas supports numpy 1.9.

So in the parametrization you need to add a `skipif` mark for the `np.nanprod` case. See here for an example.
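In the parametrize list that could look something like the following sketch. The version check here is a local stand-in (pandas has its own private flag for this), and the pairs are illustrative:

```python
import numpy as np
import pytest

# Local stand-in for the numpy version check; parse the major/minor
# parts of np.__version__ rather than relying on pandas internals.
np_under_1p10 = tuple(int(p) for p in np.__version__.split(".")[:2]) < (1, 10)

# Only the nanprod pair gets skipped on numpy < 1.10; the rest always run.
nan_pairs = [
    (np.sum, np.nansum),
    (np.mean, np.nanmean),
    pytest.param(np.prod, np.nanprod,
                 marks=pytest.mark.skipif(
                     np_under_1p10,
                     reason="np.nanprod was added in numpy 1.10")),
]
```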

```python
}

# np.nanprod was added in np version 1.10.0, we currently support >= 1.9
try:
```
Contributor Author:

If you have a preferred implementation for this, let me know and I'll happily change it; explicitly checking the version seemed ugly 😄

Contributor:

you can just check `_np_version_under1p10` and add it conditionally
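i.e. build the pair list and append the nanprod case only when the running numpy has it. A sketch, stubbing pandas' private `_np_version_under1p10` flag with a local check:

```python
import numpy as np

# Stand-in for pandas' private compat flag (_np_version_under1p10);
# compare the major/minor parts of the numpy version against (1, 10).
_np_version_under1p10 = tuple(
    int(p) for p in np.__version__.split(".")[:2]) < (1, 10)

# Base parametrization list; pairs are illustrative.
nan_pairs = [
    (np.sum, np.nansum),
    (np.mean, np.nanmean),
]
if not _np_version_under1p10:
    # np.nanprod only exists on numpy >= 1.10
    nan_pairs.append((np.prod, np.nanprod))
```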

Contributor Author:

Awesome, didn't know this was a thing, thank you

@AaronCritchley (Contributor, Author) commented Mar 2, 2018

@topper-123 thank you so much. I ended up changing the implementation rather than just xfailing the test, as the failure in the build was being hit in a non-test scenario. If you have better suggestions, I'm happy to take them on!

```python
    (np.cumsum, np.nancumsum)
]

def test_np_nan_functions(self):
```
Contributor:

parametrize these

Contributor Author:

My bad, didn't realise you meant `pytest.mark.parametrize`; implemented your suggestion now 😄

```python
            data.agg(nan_method),
            check_exact=True)

@pytest.mark.parametrize("standard, nan_method", [
```
Contributor:

you can parametrize over the DataFrame and Series (or make them into fixtures),

then inline the compare function

Contributor Author:

I'm not sure of a good way to handle the np compat tests if we took this approach; I'd be open to suggestions, though.

Contributor:

```python
@pytest.fixture(params=[Series(....), DataFrame()])
def obj(request):
    return request.param
```
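Fleshed out with illustrative data (the values and the comparison fallback are assumptions, not the PR's exact code), that fixture-based shape might look like:

```python
import numpy as np
import pandas as pd
import pytest

@pytest.fixture(params=["series", "frame"])
def nan_test_object(request):
    # Parametrize the object under test over a Series and a DataFrame.
    if request.param == "series":
        return pd.Series([1.0, 2.0, 3.0, 4.0])
    return pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})

@pytest.mark.parametrize("standard, nan_method", [(np.sum, np.nansum)])
def test_np_nan_functions(standard, nan_method, nan_test_object):
    # Inline the comparison instead of a helper: .agg on a Series gives
    # a scalar, while on a DataFrame it gives a per-column Series.
    result = nan_test_object.agg(nan_method)
    expected = nan_test_object.agg(standard)
    if isinstance(expected, pd.Series):
        pd.testing.assert_series_equal(result, expected)
    else:
        assert result == expected
```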

@jreback (Contributor) commented Mar 7, 2018

can you rebase

@TomAugspurger (Contributor)

@AaronCritchley are these failures related to your changes? https://travis-ci.org/pandas-dev/pandas/jobs/351535340#L2351

@AaronCritchley (Contributor, Author)

Hey @TomAugspurger, yep, I believe they are. I need to figure out the issue, as everything passes locally. Open to suggestions if it's super obvious to you, but happy to dive further if not. It seems like np.nanmax behaves differently in that particular CI configuration.

Also trying to get in the suggestions made by Jeff around using fixtures for the series / df once I've figured out the above 😄

@AaronCritchley (Contributor, Author)

Refactored to make use of fixtures and rebased, still need to dig into the 3.5 build failure and figure out what's happening, help appreciated 😄

```python
])
def test_np_nan_functions(standard, nan_method, nan_test_object):
    _compare_nan_method_output(nan_test_object, standard, nan_method)
    _compare_nan_method_output(nan_test_object, standard, nan_method)
```
Contributor:

Are these two lines identical?

I think it'd be clearer to just write out `tm.assert_almost_equal(nan_test_object.agg(standard), nan_test_object.agg(nan_method))`.

Contributor Author:

Yep, this was me being silly, fixed!

@jreback (Contributor) commented Mar 22, 2018

looks fine. ping on green.

@jreback (Contributor) commented Apr 11, 2018

can you rebase

@jreback (Contributor) commented Apr 21, 2018

can you rebase

@AaronCritchley (Contributor, Author) commented Apr 25, 2018

Hey @jreback, I've rebased. The CI pipeline is failing a single test in a single env; I'm looking into it, but was having a hard time recreating the failed test. If it's anything obvious, help would be much appreciated. I'll continue to dig and try to figure it out.

EDIT: I can recreate the test now, which is good, but still haven't figured out the cause, will update if I find anything.

@AaronCritchley (Contributor, Author)

Going to close this in case any other contributors are able to pick it up; if it doesn't get picked up, I'll try to fix it again at a later date 😄


Successfully merging this pull request may close these issues.

series.agg(np.nansum) etc. returns a series, scalar expected
5 participants