BUG: DataFrame[td64].sum(skipna=False) #37148

Merged
merged 8 commits into from
Oct 24, 2020

Conversation

jbrockmendel
Member

The same underlying problem affects mean, so this fixes that too.
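For context, the behavior the title refers to can be illustrated with a small frame. This is a minimal sketch, assuming a pandas release that includes this fix; the column names and values are made up for illustration and are not taken from the PR's tests.

```python
import pandas as pd

# Hypothetical data: column "a" holds one missing timedelta (NaT).
df = pd.DataFrame(
    {
        "a": pd.to_timedelta(["1s", None]),
        "b": pd.to_timedelta(["2s", "3s"]),
    }
)

# With skipna=False, any NaT along the reduced axis should propagate.
col_sums = df.sum(skipna=False)
row_sums = df.sum(axis=1, skipna=False)

print(col_sums)
print(row_sums)
```

With the fix in place, the sum for column "a" and for the second row should come back as NaT rather than a finite timedelta.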

@jbrockmendel jbrockmendel added Bug Reduction Operations sum, mean, min, max, etc. labels Oct 16, 2020
Contributor

@jreback jreback left a comment

I think we need to respect min_count in sum
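As a reminder of what min_count does in sum, here is a small sketch using a float Series rather than timedelta64 data (an assumption made to keep the example simple; the parameter is the same one the comment refers to).

```python
import numpy as np
import pandas as pd

# A Series with no valid (non-missing) values.
all_missing = pd.Series([np.nan, np.nan])

# By default (min_count=0) the sum of zero valid values is 0.
default_sum = all_missing.sum()

# With min_count=1, fewer than one valid value yields a missing result.
guarded_sum = all_missing.sum(min_count=1)

print(default_sum, guarded_sum)
```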

return the_sum


def mask_datetimelike_result(result, axis, mask, orig_dtype):
Contributor

can you add type annotations

Member Author

will do

pandas/tests/test_nanops.py (resolved)
@jbrockmendel
Member Author

i think roughly the same fix will end up being used for #36907

return the_sum


def _mask_datetimelike_result(
Contributor

why don't you have this take the original values (rather than just the dtype) and compute the mask if needed (e.g. make it optional)? right now the caller is responsible for that in multiple places.

Member Author

if we get to here, we always need the mask

Contributor

you didn't answer the question. you are adding multiple code blocks which do the same thing; if you are going to consolidate to a function then it makes sense to avoid that yes?

Member Author

apparently I don't understand the question. IIUC the alternative you're suggesting looks like

def _mask_datetimelike_result(result, axis, mask, orig_values):
    if mask is None:
        mask = isna(orig_values)
    ...  # [what we have here now]

and removing the "if mask is None and not skipna: mask = isna(orig_values)" check on L516-517. This is 2 lines of code either way, so not a big deal. I'll change it if you really care.

longer-term this should probably go into _get_values, but i want to do that carefully since that may affect other functions

Contributor

ok, I guess. I assume you are going to refactor this anyhow, since this adds a fair amount of duplication
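The idea discussed in this thread, a helper that takes the original values and computes the mask itself when none is passed, can be sketched with plain NumPy. This is not the code merged in the PR: `mask_result_sketch` is a hypothetical name, `np.isnat` stands in for pandas' `isna`, and only 2D timedelta64 input is handled.

```python
import numpy as np


def mask_result_sketch(result, axis, mask, orig_values):
    """Hypothetical helper: set entries of a reduced result to NaT."""
    # Compute the mask here if the caller did not pass one.
    if mask is None:
        mask = np.isnat(orig_values)
    # True wherever the reduced axis contained at least one NaT.
    axis_mask = mask.any(axis=axis)
    out = result.copy()
    out[axis_mask] = np.timedelta64("NaT")
    return out


# Example 2x2 timedelta64 array with a single NaT in the second column.
values = np.array([[1, 2], [3, 4]], dtype="i8").view("m8[ns]")
values[0, 1] = np.timedelta64("NaT")

# Sum with NaT treated as zero, then restore NaT where any input was missing.
filled = np.where(np.isnat(values), np.timedelta64(0, "ns"), values)
raw = filled.sum(axis=0)
masked = mask_result_sketch(raw, 0, None, values)
print(masked)
```

Letting the helper compute the mask keeps the "if mask is None" check in one place instead of at each call site, which is the duplication the reviewer points out.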

return _wrap_results(ret, dtype)

# otherwise return a scalar value
return _wrap_results(get_median(values) if notempty else np.nan, dtype)


def get_empty_reduction_result(shape, axis: int, dtype, fill_value) -> np.ndarray:
Contributor

can you add type annotations for dtype and fill_value

"""
The result from a reduction on an empty ndarray.
"""
shp = np.array(shape)
Contributor

can you add a Parameters section to the docstring

Member Author

sure, just pushed
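For readers following along, a helper like the one under review might look roughly as follows once it has a Parameters section. Only the signature, the one-line docstring summary, and the first line of the body appear in the diff above; the rest of the body, the parameter descriptions, and the name `empty_reduction_result_sketch` are assumptions for illustration.

```python
import numpy as np


def empty_reduction_result_sketch(shape, axis: int, dtype, fill_value) -> np.ndarray:
    """
    The result from a reduction on an empty ndarray.

    Parameters
    ----------
    shape : tuple of int
        Shape of the array being reduced.
    axis : int
        Axis that the reduction removes.
    dtype : np.dtype or str
        dtype of the returned array.
    fill_value : scalar
        Value placed in every position of the result.

    Returns
    -------
    np.ndarray
    """
    shp = np.array(shape)
    dims = np.arange(len(shape))
    # Keep every dimension except the one being reduced over (assumed logic).
    ret = np.empty(tuple(shp[dims != axis]), dtype=dtype)
    ret.fill(fill_value)
    return ret


# Reducing a (0, 3) array over axis 0 leaves three filled entries.
res = empty_reduction_result_sketch((0, 3), 0, np.float64, np.nan)
print(res)
```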

@jreback jreback added this to the 1.2 milestone Oct 18, 2020
@jbrockmendel
Member Author

updated per requests + green. several followups in the pipeline

@jbrockmendel
Member Author

gentle ping; I'd like to reuse the helpers implemented here in PR(s) fixing other reductions

@jreback
Contributor

jreback commented Oct 24, 2020

I guess this needs a whatsnew note, but that can be a follow-up

@jreback jreback merged commit 8610299 into pandas-dev:master Oct 24, 2020
@jbrockmendel jbrockmendel deleted the bug-nanops branch October 24, 2020 03:08
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
* BUG: DataFrame[td64].sum(skipna=False)

* annotate, privatize

* annotate

* calculate mask in mask_datetimelike_result
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Successfully merging this pull request may close these issues.

BUG: Summation of NaT in a DataFrame with axis=1 does not return NaT