Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: interpolate_1d returns function to apply columnwise #34728

Closed

Conversation

simonjayhawkins
Copy link
Member

This doesn't provide a significant improvement for the existing asv due to the bulk of the time creating python sets which is in the function applied columnwise. see #34727

even without #34727 this could provide perf improvements for other index types ( and for unsorted indexes with creating a column-wise function for the numpy call.

will look at adding asvs for these cases.

@simonjayhawkins simonjayhawkins added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jun 12, 2020
@simonjayhawkins simonjayhawkins marked this pull request as draft June 12, 2020 08:32
order=order,
**kwargs,
)
def func(yvalues: np.ndarray) -> np.ndarray:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a module level function and give it a nice name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc-string as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs to be a closure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? can you pass any needed arguments, this makes it really hard to grok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code in this function is applied columnwise, the perf gains will come from doing some things once instead of 100x for the asv.

I prefer this functional approach to pre-validation as it keeps related code together. There is some more cleaning which could move more into the outer function once it's called less often

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but having this an outer function with passing args should not have any impact on performance. It simply a more understandable approach. ok with merging then refactoring to be simpler later though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its basically the processing of the index which only needs to be done once. However, with the bulk of the time on the preserve_nans set logic, there is no sig perf gains yet, hence draft for now.

originally grouped the validation see #34628. So can do it that way if preferred.

i.e. method, xvalues = missing.clean_interp_method(method, index, **kwargs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk sure, i think you have 1 PR pending that we should merge before this i think. but lmk what you prefer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, trying not to affect the work by @cchwala, I have an alternative implementation of limit_direction that does not re-use the preserve_nans set logic, see #34628

so i'm now happy #34727 need not affect that, but still need to look at the max_gap algo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry #34749 not #34628

@simonjayhawkins
Copy link
Member Author

I'll close this to clear the queue for now. Some deeper changes now planned so will reopen once complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants