Ensure that iq and all number feature preprocessing normalizations work on dask backends. #3297

Merged (3 commits) on Apr 4, 2023


justinxzhao (Collaborator)

Interquartile (iq) preprocessing for number features fails when using a Ray/Dask backend, with the following error:

RuntimeError: Caught exception during model preprocessing: 'numpy.float64' object has no attribute 'compute' (type: RayTaskError(RuntimeError), retryable: true)

The InterQuartileTransformer uses np.percentile, which computes the percentiles eagerly instead of deferring the computation to the Dask backend. Its result is already a numpy.float64, so the subsequent call to .compute() raises the error above. Computing the percentiles directly seems like the right approach: unlike column.mean(), Dask does not appear to have a parallelized implementation of np.percentile.
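The failure can be reproduced with NumPy alone. The sketch below is a simplified, hypothetical stand-in for InterQuartileTransformer.fit_transform_params(): the names `fit_iq_params` and `iq_transform` are illustrative rather than Ludwig's API, and the (x - q2) / (q3 - q1) scaling is an assumed median/IQR formula. It shows that np.percentile returns a numpy.float64 with no .compute() method, so the fitted quartiles have to be used as-is.

```python
import numpy as np


def fit_iq_params(column):
    """Hypothetical fit step: compute the quartiles eagerly with np.percentile.

    np.percentile returns NumPy values (numpy.float64), not lazy backend
    objects, so there is nothing left for a backend to .compute().
    """
    q1, q2, q3 = np.percentile(np.asarray(column, dtype=np.float64), [25, 50, 75])
    return {"q1": float(q1), "q2": float(q2), "q3": float(q3)}


def iq_transform(column, params):
    """Center on the median and scale by the interquartile range (assumed formula)."""
    iqr = params["q3"] - params["q1"]
    return (np.asarray(column, dtype=np.float64) - params["q2"]) / iqr


values = [1.0, 2.0, 3.0, 4.0, 5.0]

# The pre-fix pattern: calling .compute() on a percentile result fails.
scalar = np.percentile(values, 25)
try:
    scalar.compute()
except AttributeError as err:
    print(err)  # mirrors the error reported above

params = fit_iq_params(values)
print(params)
print(iq_transform(values, params))
```

Converting the quartiles to plain Python floats is a design choice in this sketch: it makes the parameters trivially serializable and makes it obvious that no deferred computation remains.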

This PR removes the call to .compute() in InterQuartileTransformer.fit_transform_params() and adds tests checking that every preprocessing option in the registry succeeds on both the pandas and Ray/Dask backends.
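The shape of such a registry-wide test can be sketched with plain NumPy. Everything here is hypothetical: `NORMALIZATION_REGISTRY`, the `_fit_*` functions, and `check_all_normalizations` are illustrative names, the z-score and min-max formulas are the standard definitions rather than Ludwig's code, and the sketch does not reproduce the pandas or Ray/Dask backends that the real tests exercise.

```python
import math

import numpy as np


# Hypothetical fit functions; each returns plain Python floats so that
# no backend compute step is needed afterwards.
def _fit_zscore(x):
    return {"mean": float(np.mean(x)), "std": float(np.std(x))}


def _fit_minmax(x):
    return {"min": float(np.min(x)), "max": float(np.max(x))}


def _fit_iq(x):
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return {"q1": float(q1), "q2": float(q2), "q3": float(q3)}


# Hypothetical registry: strategy name -> (fit, transform).
NORMALIZATION_REGISTRY = {
    "zscore": (_fit_zscore, lambda x, p: (x - p["mean"]) / p["std"]),
    "minmax": (_fit_minmax, lambda x, p: (x - p["min"]) / (p["max"] - p["min"])),
    "iq": (_fit_iq, lambda x, p: (x - p["q2"]) / (p["q3"] - p["q1"])),
}


def check_all_normalizations(values):
    """Fit and apply every registered strategy, failing on the first problem."""
    x = np.asarray(values, dtype=np.float64)
    results = {}
    for name, (fit, transform) in NORMALIZATION_REGISTRY.items():
        params = fit(x)
        # Fitted parameters must already be concrete, finite numbers.
        assert all(isinstance(v, float) and math.isfinite(v) for v in params.values()), name
        out = transform(x, params)
        assert out.shape == x.shape and np.all(np.isfinite(out)), name
        results[name] = out
    return results


results = check_all_normalizations([1.0, 2.0, 3.0, 4.0, 5.0])
print(sorted(results))
```

A real test would also parametrize over backends (for example with pytest.mark.parametrize) so the same loop runs on both the pandas and the Ray/Dask engines.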

…transformer about why we compute percentiles directly, without using the backend.
@github-actions

Unit Test Results

    6 files (±0), 6 suites (±0), duration 1h 50m 7s (-2m 23s)
    158 tests (+7): 141 passed (+3), 17 skipped (+4), 0 failed (±0)
    198 runs (+7): 173 passed (+3), 25 skipped (+4), 0 failed (±0)

Results for commit aba970b. ± Comparison against base commit 87a56fa.

justinxzhao merged commit db99e3c into master on Apr 4, 2023
justinxzhao deleted the fix_iq_dask branch on April 4, 2023 at 19:54