Fix unnecessary f32/f64 conversions in t-SNE KL calc #4331

Merged (1 commit) on Nov 9, 2021

zbjornson
Contributor

The old code compiles to PTX that widens to f64, does the arithmetic in double precision, and narrows back to f32:

cvt.f64.f32     %fd1, %f1;
add.f64         %fd2, %fd1, 0d3FF0000000000000;
mul.f64         %fd3, %fd2, 0d3FE0000000000000;
cvt.rn.f32.f64  %f2, %fd3;

instead of just the single-precision form:

add.f32         %f2, %f1, 0f3F800000;
mul.f32         %f3, %f2, 0f3F000000;
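
For readers without the diff at hand, here is a minimal host-side sketch of the kind of source pattern that yields the two PTX sequences above. The PR page does not show the C++ source, so the expression shape `0.5 * (x + 1.0)` and every function name below are assumptions inferred from the PTX immediates (`0d3FF0000000000000` and `0f3F800000` are 1.0, `0d3FE0000000000000` and `0f3F000000` are 0.5), not the actual cuML code.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reinterpret raw IEEE-754 bit patterns, to read the PTX immediates.
inline float f32_from_bits(std::uint32_t bits) {
  float v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

inline double f64_from_bits(std::uint64_t bits) {
  double v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

// Hypothetical "old" form: unsuffixed literals are double, so the float
// operand is promoted, the add and mul happen in f64, and the result is
// narrowed back to float on return.
inline float half_plus_one_promoted(float x) {
  return static_cast<float>(0.5 * (x + 1.0));
}

// Hypothetical "new" form: f-suffixed literals keep everything in f32.
inline float half_plus_one_f32(float x) {
  return 0.5f * (x + 1.0f);
}

// The usual arithmetic conversions make the mixed expression a double.
static_assert(std::is_same<decltype(0.5 * (1.0f + 1.0)), double>::value,
              "double literals promote the float operand");
static_assert(std::is_same<decltype(0.5f * (1.0f + 1.0f)), float>::value,
              "float literals keep the expression in single precision");
```

The `static_assert`s only show the type promotion that forces the `cvt` instructions; they say nothing about performance, and the two functions are not claimed to be bit-identical for every input.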

@zbjornson requested a review from a team as a code owner November 8, 2021 07:05
@GPUtester
Contributor

Can one of the admins verify this patch?

Member

@dantegd left a comment

Thanks for the fix, @zbjornson! The CI issue will be solved in #4333, so I will merge your PR just after that is in.

@dantegd
Member

dantegd commented Nov 8, 2021

ok to test

@dantegd
Member

dantegd commented Nov 8, 2021

rerun tests

@dantegd added the improvement and non-breaking labels Nov 8, 2021
@dantegd added this to PR-WIP in v21.12 Release via automation Nov 8, 2021
@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@287df1a).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.12    #4331   +/-   ##
===============================================
  Coverage                ?   86.05%           
===============================================
  Files                   ?      231           
  Lines                   ?    18778           
  Branches                ?        0           
===============================================
  Hits                    ?    16159           
  Misses                  ?     2619           
  Partials                ?        0           
| Flag | Coverage Δ |
|---|---|
| dask | 47.17% <0.00%> (?) |
| non-dask | 78.77% <0.00%> (?) |

Flags with carried forward coverage won't be shown.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 287df1a...db66d02.

@dantegd
Member

dantegd commented Nov 9, 2021

@gpucibot merge

@rapids-bot merged commit b6c8bc8 into rapidsai:branch-21.12 Nov 9, 2021
v21.12 Release automation moved this from PR-WIP to Done Nov 9, 2021
@zbjornson deleted the bug-kl-cvt branch November 14, 2021 20:10
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023

Authors:
  - Zach Bjornson (https://github.com/zbjornson)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4331
Labels: CUDA/C++, improvement, non-breaking
4 participants