Optimize dtype conversion for FIL #4070

dantegd · 2021-07-19T23:02:45Z

After a bit of profiling of the slowdown we observed in FIL when data had to be converted from int16 to float32, I found that ~98% of the time was being spent in the check for whether information would be lost to under/overflow, in

cuml/python/cuml/common/input_utils.py

Line 590 in 3c11ebd

return ((X < target_dtype_range.min) |

I created this proposal PR to add a parameter that skips that check in methods that are fast enough to benefit from fast dtype conversion, such as FIL, and to skip the check entirely when upcasting.
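A minimal sketch of the idea, written against NumPy so it runs without a GPU (the cuml code operates on device arrays). The names `convert_dtype`, `would_lose_information`, and `safe_check` are hypothetical and are not taken from the PR; only the shape of the range test follows the expression quoted above.

```python
import numpy as np


def _dtype_limits(dtype):
    """Return the finfo/iinfo object describing the representable range."""
    dtype = np.dtype(dtype)
    return np.finfo(dtype) if dtype.kind == "f" else np.iinfo(dtype)


def would_lose_information(X, target_dtype):
    """True if any element of X lies outside the range of target_dtype.

    This scans the whole array and builds boolean temporaries, which is
    the kind of work the profiling above identified as expensive.
    """
    limits = _dtype_limits(target_dtype)
    return bool(((X < limits.min) | (X > limits.max)).any())


def convert_dtype(X, target_dtype, safe_check=True):
    """Cast X to target_dtype, optionally guarding against range overflow.

    `safe_check` is a hypothetical parameter name used for illustration.
    """
    target_dtype = np.dtype(target_dtype)
    if X.dtype == target_dtype:
        return X
    # A "safe" cast in NumPy's sense (e.g. int16 -> float32) cannot
    # overflow, so the data scan is skipped for such upcasts.
    needs_scan = not np.can_cast(X.dtype, target_dtype, casting="safe")
    if safe_check and needs_scan and would_lose_information(X, target_dtype):
        raise TypeError(
            f"Converting {X.dtype} to {target_dtype} would lose information"
        )
    return X.astype(target_dtype)
```

With this sketch, an int16 to float32 conversion never touches the data before casting, while a narrowing cast such as int64 to int8 is still checked unless the caller opts out with `safe_check=False`.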

In my first quick benchmarks, dtype conversion was between 4x and 14x faster in some cases with the checks disabled. The PR is still in progress, but I figured I'd ping you in case you wanted to take a look.
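For readers who want to try a comparison of this kind, here is a rough timing harness. It is a CPU-only NumPy sketch under assumed array sizes, not the benchmark used for the numbers above, and its timings will not match GPU results. All function names are hypothetical.

```python
import timeit

import numpy as np


def convert_checked(X, target_dtype):
    """Range-check every element, then cast (float targets only)."""
    limits = np.finfo(target_dtype)
    if ((X < limits.min) | (X > limits.max)).any():
        raise TypeError("conversion would lose information")
    return X.astype(target_dtype)


def convert_unchecked(X, target_dtype):
    """Cast without inspecting the data."""
    return X.astype(target_dtype)


def make_input(n_rows, n_cols, seed=0):
    """Build a reproducible int16 matrix (sizes are arbitrary assumptions)."""
    rng = np.random.default_rng(seed)
    return rng.integers(-1000, 1000, size=(n_rows, n_cols), dtype=np.int16)


def time_conversions(n_rows=10_000, n_cols=32, repeats=5):
    """Return (checked_seconds, unchecked_seconds), best of `repeats` runs."""
    X = make_input(n_rows, n_cols)
    checked = min(timeit.repeat(
        lambda: convert_checked(X, np.float32), number=1, repeat=repeats))
    unchecked = min(timeit.repeat(
        lambda: convert_unchecked(X, np.float32), number=1, repeat=repeats))
    return checked, unchecked


if __name__ == "__main__":
    checked, unchecked = time_conversions()
    print(f"checked:   {checked:.6f} s")
    print(f"unchecked: {unchecked:.6f} s")
```

Taking the minimum over several repeats reduces noise from other processes; the ratio between the two numbers depends heavily on hardware and array shape.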

levsnv · 2021-07-20T02:17:01Z

nice! much better than current state

canonizer · 2021-07-22T16:13:28Z

Thanks for looking into this!

Thinking further: I think the conversion (implemented on the GPU) can be fast even with the accuracy check enabled.
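The comment does not say how the check could be kept fast. One possibility, offered purely as an assumption, is to replace the elementwise comparison masks with two reductions over the data. The sketch below uses NumPy and a hypothetical function name, `fits_in_dtype`; on a GPU the same reductions could be expressed with a device array library.

```python
import numpy as np


def fits_in_dtype(X, target_dtype):
    """True if every element of X is within the range of target_dtype.

    Uses min/max reductions instead of building two boolean arrays and
    OR-ing them. NaN values make the comparison fail, so an array
    containing NaN is reported as not fitting.
    """
    target_dtype = np.dtype(target_dtype)
    if target_dtype.kind == "f":
        limits = np.finfo(target_dtype)
    else:
        limits = np.iinfo(target_dtype)
    if X.size == 0:
        return True  # nothing to lose in an empty array
    return bool(X.min() >= limits.min and X.max() <= limits.max)
```

This is still a pass over the data, so it does not remove the cost, but it avoids allocating temporaries the size of the input.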

wphicks

Looks great! Very nice improvement. Just a grammatical issue in 2 places in the docstrings, but otherwise everything looks perfect to me.

python/cuml/fil/fil.pyx

Co-authored-by: William Hicks <wphicks@users.noreply.github.com>

dantegd · 2021-07-28T15:08:44Z

rerun tests

cjnolet

LGTM

dantegd · 2021-07-28T16:53:29Z

@gpucibot merge

codecov-commenter · 2021-07-28T18:24:23Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@3c11ebd).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #4070   +/-   ##
===============================================
  Coverage                ?   85.83%           
===============================================
  Files                   ?      231           
  Lines                   ?    18272           
  Branches                ?        0           
===============================================
  Hits                    ?    15684           
  Misses                  ?     2588           
  Partials                ?        0

| Flag | Coverage Δ |
|---|---|
| dask | `48.20% <0.00%> (?)` |
| non-dask | `78.31% <0.00%> (?)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3c11ebd...5ac3cb8.

cc @canonizer @levsnv @wphicks

After a bit of profiling of the slowdown we observed in FIL when data had to be converted from int16 to float32, I found that ~98% of the time was being spent in the check for whether information would be lost to under/overflow, in https://github.com/rapidsai/cuml/blob/c3c4376ab8a3af3e0c92aa2c9ae5d8b8fe116b8c/python/cuml/common/input_utils.py#L590

I created this proposal PR to add a parameter that skips that check in methods that are fast enough to benefit from fast dtype conversion, such as FIL, and to skip the check entirely when upcasting.

In my first quick benchmarks, dtype conversion was between **4x and 14x** faster in some cases with the checks disabled. The PR is still in progress, but I figured I'd ping you in case you wanted to take a look.

Authors:
- Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
- William Hicks (https://github.com/wphicks)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4070

ENH Improve typecasting information losing check to avoid slowdown

dfd00d0

dantegd added the "2 - In Progress" label (Currently a work in progress) Jul 19, 2021

dantegd added this to PR-WIP in v21.08 Release via automation Jul 19, 2021

github-actions bot added the "Cython / Python" label (Cython or Python issue) Jul 19, 2021

dantegd added the "improvement" (Improvement / enhancement to an existing function) and "non-breaking" (Non-breaking change) labels Jul 19, 2021

ENH Expose parameter to user API and improve name

ae13f35

dantegd changed the title ~~Optimize dtype conversion~~ Optimize dtype conversion for FIL Jul 28, 2021

dantegd marked this pull request as ready for review July 28, 2021 01:47

dantegd requested a review from a team as a code owner July 28, 2021 01:47

DOC docstring correction

3677251

wphicks approved these changes Jul 28, 2021

python/cuml/fil/fil.pyx (outdated, resolved)

python/cuml/fil/fil.pyx (outdated, resolved)

dantegd changed the title ~~Optimize dtype conversion for FIL~~ Optimize dtype conversion for FIL [skip-ci] Jul 28, 2021

dantegd and others added 2 commits July 28, 2021 10:04

Update python/cuml/fil/fil.pyx

426b145

Co-authored-by: William Hicks <wphicks@users.noreply.github.com>

DOC docstring improvements from PR review

5ac3cb8

dantegd changed the title ~~Optimize dtype conversion for FIL [skip-ci]~~ Optimize dtype conversion for FIL Jul 28, 2021

cjnolet approved these changes Jul 28, 2021

v21.08 Release automation moved this from PR-WIP to PR-Reviewer approved Jul 28, 2021

rapids-bot bot merged commit f2e4652 into rapidsai:branch-21.08 Jul 28, 2021

v21.08 Release automation moved this from PR-Reviewer approved to Done Jul 28, 2021
