Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: quantile segfaults on invalid quantile values #27473

Closed
wants to merge 4 commits into from
Closed

BUG: quantile segfaults on invalid quantile values #27473

wants to merge 4 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Jul 19, 2019

@ghost ghost changed the title BUG: quantile segfaults on invalid quantil values BUG: quantile segfaults on invalid quantile values Jul 19, 2019
@jbrockmendel
Copy link
Member

fixes this here makes sense. is the underlying problem in numpy or in one of our cython functions?

@jbrockmendel jbrockmendel added the Segfault Non-Recoverable Error label Jul 19, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do a similar validation for NDFrame input via the _check_percentile method - if we were to move that to PandasObject to share across NDFrame and GroupBy would that be even simpler?

pandas/tests/groupby/test_transform.py Outdated Show resolved Hide resolved
@WillAyd WillAyd added this to the 0.25.1 milestone Jul 19, 2019
@ghost
Copy link
Author

ghost commented Jul 19, 2019

is the underlying problem in numpy or in one of our cython functions?

I think this line is the problem
https://github.com/pydata/pandas/blob/d48306e77c0a708e0ee33a2aa1da7e267df52ef6/pandas/_libs/groupby.pyx#L758-L759

it calculates an index by using the quantile value as a fraction of array length.

@ghost
Copy link
Author

ghost commented Jul 19, 2019

if we were to move that to PandasObject to share across NDFrame and GroupBy would that be even
simpler?

Yeah, _check_percentile is what this does. I don't think it fits in PandasObject, maybe in lib.pyx instead? it lives on NDFrame but is actually a standalone static method.

@jbrockmendel
Copy link
Member

Let's put the check in the cython function, so as to preclude any chance of the segfault showing up, e.g. if we call it from somewhere else

@ghost
Copy link
Author

ghost commented Jul 19, 2019

Logical, but thorny. Is the overall approach in pandas to put input validation in cython in the python caller? could also usea python wrapper which is the only cython caller.

@WillAyd suggests reusing NDFrame._check_percentiles instead. It's often inconsistent standards about where validation should happen that creates these sort of bugs, so maybe a bigger issue here.

Whatever there's agreement on is fine by me.

@jbrockmendel
Copy link
Member

but thorny

what is thorny about it? I haven't looked at this too specifically, just in principle would rather segfault-prevention happen as close to the fault as possible

pandas/core/groupby/groupby.py Outdated Show resolved Hide resolved
@ghost
Copy link
Author

ghost commented Jul 23, 2019

blocked by #27526

@ghost
Copy link
Author

ghost commented Jul 25, 2019

waiting for #27584

@jreback jreback mentioned this pull request Jul 25, 2019
5 tasks
@ghost
Copy link
Author

ghost commented Jul 31, 2019

This was a simple fix, but it now depends on too many other small things. I'm sure someone else will get it eventually.

@ghost ghost closed this Jul 31, 2019
@ghost ghost deleted the fix_quantile_Segfault branch July 31, 2019 14:18
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Segfault Non-Recoverable Error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Groupby.quantile segfaults for quantile > 1
3 participants