Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed Large memory requirements for SimpleImputer strategy median #4794 #4817

Merged
merged 10 commits into from Aug 11, 2022

Conversation

erikrene
Copy link
Contributor

@erikrene erikrene commented Jul 18, 2022

I have implemented a fix for [BUG] Large memory requirements for SimpleImputer strategy median #4794. I narrowed down the issue to _masked_column_median. As expected, the extra memory results from the unnecessary copy of the array (in the case where NaN is the masked value). However, in the other case (where NaN isn't the masked value) this copy is necessary. To fix this, I used in-place sorting. However, in both cases the memory usage goes from 3000 MiB (size of original array) to 13000. From my understanding, sorting should only take up an additional 3000 MiB. Is it possible to reduce memory usage further? Still, this fix still reduces the memory used by over 5000 MiB.

@erikrene erikrene requested a review from a team as a code owner July 18, 2022 12:58
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jul 18, 2022
@wphicks wphicks added bug Something isn't working non-breaking Non-breaking change labels Jul 18, 2022
@dantegd dantegd added this to PR-WIP in v22.08 Release via automation Jul 18, 2022
@wphicks
Copy link
Contributor

wphicks commented Jul 20, 2022

rerun tests

3 similar comments
@wphicks
Copy link
Contributor

wphicks commented Jul 20, 2022

rerun tests

@erikrene
Copy link
Contributor Author

erikrene commented Aug 1, 2022

rerun tests

@wphicks
Copy link
Contributor

wphicks commented Aug 3, 2022

rerun tests

@codecov-commenter
Copy link

Codecov Report

Merging #4817 (f23b72c) into branch-22.08 (97941e3) will increase coverage by 0.67%.
The diff coverage is 100.00%.

@@               Coverage Diff                @@
##           branch-22.08    #4817      +/-   ##
================================================
+ Coverage         77.62%   78.30%   +0.67%     
================================================
  Files               180      180              
  Lines             11384    11526     +142     
================================================
+ Hits               8837     9025     +188     
+ Misses             2547     2501      -46     
Flag Coverage Δ
dask 45.93% <0.00%> (+0.41%) ⬆️
non-dask 67.68% <100.00%> (+0.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/cuml/thirdparty_adapters/adapters.py 91.60% <100.00%> (+0.11%) ⬆️
python/cuml/common/import_utils.py 59.82% <0.00%> (+0.85%) ⬆️
python/cuml/feature_extraction/_vectorizers.py 93.36% <0.00%> (+3.80%) ⬆️
.../dask/extended/linear_model/logistic_regression.py 92.00% <0.00%> (+57.33%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

python/cuml/thirdparty_adapters/adapters.py Outdated Show resolved Hide resolved
python/cuml/thirdparty_adapters/adapters.py Outdated Show resolved Hide resolved
v22.08 Release automation moved this from PR-WIP to PR-Reviewer approved Aug 4, 2022
@erikrene
Copy link
Contributor Author

erikrene commented Aug 4, 2022

rerun tests

2 similar comments
@erikrene
Copy link
Contributor Author

erikrene commented Aug 5, 2022

rerun tests

@erikrene
Copy link
Contributor Author

erikrene commented Aug 6, 2022

rerun tests

@erikrene erikrene changed the base branch from branch-22.08 to branch-22.10 August 6, 2022 00:12
@erikrene
Copy link
Contributor Author

erikrene commented Aug 8, 2022

rerun tests

@caryr35 caryr35 added this to PR-WIP in v22.10 Release via automation Aug 9, 2022
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v22.10 Release Aug 9, 2022
@caryr35 caryr35 moved this from PR-Needs review to PR-Reviewer approved in v22.10 Release Aug 9, 2022
@caryr35 caryr35 removed this from PR-Reviewer approved in v22.08 Release Aug 9, 2022
@wphicks
Copy link
Contributor

wphicks commented Aug 11, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 9418e65 into rapidsai:branch-22.10 Aug 11, 2022
v22.10 Release automation moved this from PR-Reviewer approved to Done Aug 11, 2022
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
…idsai#4794 (rapidsai#4817)

I have implemented a fix for [BUG] Large memory requirements for SimpleImputer strategy median rapidsai#4794. I narrowed down the issue to _masked_column_median. As expected, the extra memory results from the unnecessary copy of the array (in the case where NaN is the masked value). However, in the other case (where NaN isn't the masked value) this copy is necessary. To fix this, I used in-place sorting. However, in both cases the memory usage goes from 3000 MiB (size of original array) to 13000. From my understanding, sorting should only take up an additional 3000 MiB. Is it possible to reduce memory usage further? Still, this fix still reduces the memory used by over 5000 MiB.

Authors:
  - https://github.com/erikrene

Approvers:
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4817
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Cython / Python Cython or Python issue non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants