CUDA BFloat16 TopK #44755

Closed

zasdfgbnm wants to merge 5 commits into master from bfloat-topk

Conversation

zasdfgbnm
Collaborator

No description provided.

@dr-ci

dr-ci bot commented Sep 15, 2020

💊 CI failures summary and remediations

As of commit ce32659 (more details on the Dr. CI page):


Commit ce32659 was recently pushed. Waiting for builds...



@@ -39,6 +41,15 @@ __device__ __forceinline__ T doLdg(const T* p) {
#endif
}

template <>
__device__ __forceinline__ c10::BFloat16 doLdg<c10::BFloat16>(const c10::BFloat16* p) {
#if __CUDA_ARCH__ >= 350
Collaborator

do you need #if here? torch is only supported on CUDA_ARCH >= 350

Collaborator Author

I think this is actually equivalent to #ifndef __HIP_PLATFORM_HCC__?

Collaborator

Then it should say so? __ldg doesn't provide performance benefit these days, but I guess you still need to load short and construct bfloat16 from bits on cuda, and hip is able to handle it natively?

Collaborator Author

Yes, I will change this to #ifndef __HIP_PLATFORM_HCC__, and maybe remove it later (needs benchmark). On HIP, it is just *p, so it's OK.
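For reference, a minimal sketch of what the specialization under discussion could look like once the guard is switched to __HIP_PLATFORM_HCC__. This is illustrative only, not necessarily the exact code that landed; it assumes the c10::BFloat16(bits, from_bits_t) constructor and the unsigned short overload of __ldg. The from_bits call here is also what the internal build error later in this thread points at.

// Sketch only (not the exact PR code): load the raw 16-bit pattern through
// the read-only cache on CUDA and rebuild the value from its bits; on HIP a
// plain dereference is enough.
template <>
__device__ __forceinline__ c10::BFloat16 doLdg<c10::BFloat16>(const c10::BFloat16* p) {
#ifndef __HIP_PLATFORM_HCC__
  // __ldg has no c10::BFloat16 overload, so read the underlying bits.
  return c10::BFloat16(
      __ldg(reinterpret_cast<const unsigned short*>(p)), c10::BFloat16::from_bits());
#else
  return *p;
#endif
}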

Collaborator Author

fixed


Just to let you know, changing the #if does break the build for NVIDIA GRID K520 GPU. I understand that is not a supported CUDA architecture though.

Collaborator

Sorry about that, but as you note it is not a supported architecture.

@ngimel added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Sep 16, 2020
@codecov

codecov bot commented Sep 16, 2020

Codecov Report

Merging #44755 into master will increase coverage by 0.00%.
The diff coverage is 96.22%.


@@           Coverage Diff           @@
##           master   #44755   +/-   ##
=======================================
  Coverage   68.07%   68.08%           
=======================================
  Files         384      384           
  Lines       49765    49774    +9     
=======================================
+ Hits        33879    33890   +11     
+ Misses      15886    15884    -2     
Impacted Files                          Coverage Δ
torch/optim/lr_scheduler.py             88.73% <ø> (-0.05%) ⬇️
torch/fx/proxy.py                       92.66% <91.30%> (-0.45%) ⬇️
torch/fx/__init__.py                    100.00% <100.00%> (ø)
torch/fx/graph.py                       96.66% <100.00%> (+0.28%) ⬆️
torch/fx/graph_module.py                97.43% <100.00%> (ø)
torch/fx/symbolic_trace.py              95.34% <100.00%> (+1.65%) ⬆️
torch/testing/_internal/expecttest.py   78.57% <0.00%> (+1.02%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mcarilli
Collaborator

if you observe weird numerical behavior with bfloat16 topk, https://github.com/pytorch/pytorch/blame/b85568a54a9c60986235ad1e0cc5dffc71b9d5b1/aten/src/ATen/native/cuda/SortingRadixSelect.cuh#L147-L163 is the main suspect. @ngimel you remember our adventures with that for fp16. The same fix was also necessary for bfloat16, and @gchanan included the fix for bfloat16 in his PR, but we had no way to test bfloat16 on cuda at the time.
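For readers unfamiliar with those lines: they implement an order-preserving mapping from floating-point values to unsigned integer keys so the radix select can compare raw bits. Below is a rough sketch of that trick adapted to bfloat16; it is illustrative, not the actual SortingRadixSelect.cuh code, and it assumes c10::BFloat16 exposes its bit pattern as the .x member.

// Illustrative sketch (not the PR's code): map a bfloat16 value to a 16-bit
// key whose unsigned ordering matches the floating-point ordering. Flip all
// bits of negative values, flip only the sign bit of non-negative values,
// and send NaN to the largest key.
__device__ __forceinline__ unsigned short bfloat16ToSortableBits(c10::BFloat16 v) {
  unsigned short x = v.x;                               // raw bit pattern
  unsigned short mask = (x & 0x8000) ? 0xffff : 0x8000; // negative vs. non-negative
  return (v == v) ? (x ^ mask) : 0xffff;                // v != v detects NaN
}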

@zasdfgbnm
Collaborator Author

@mcarilli Tests on CI are passing, so it should be OK? Do you think we need more tests beyond the existing unit tests?

Contributor

@facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Collaborator

ngimel commented Sep 16, 2020

CI is enough, get_all_dtypes is testing bfloat16, right?

@zasdfgbnm
Collaborator Author

@ngimel Yes, by default it includes everything, unless you say include_bfloat=False:

>>> torch.testing.get_all_dtypes()
[torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64, torch.float32, torch.float64, torch.float16, torch.bfloat16, torch.bool, torch.complex64, torch.complex128]
>>> torch.testing.get_all_fp_dtypes()
[torch.float32, torch.float64, torch.float16, torch.bfloat16]

@ngimel
Collaborator

ngimel commented Sep 17, 2020

There are internal build failures:

stderr: caffe2/aten/src/THC/THCDeviceUtils.cuh(47): error: calling a constexpr __host__ function("from_bits") from a __device__ function("doLdg") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
caffe2/c10/util/TypeCast.h(27): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("apply") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
          detected during:
            instantiation of "decltype(auto) c10::maybe_real<true, src_t>::apply(src_t) [with src_t=c10::complex<double>]" 
(57): here
            instantiation of "uint8_t c10::static_cast_with_inter_type<uint8_t, src_t>::apply(src_t) [with src_t=c10::complex<double>]" 
(157): here
            instantiation of "To c10::convert<To,From>(From) [with To=uint8_t, From=c10::complex<double>]" 
(169): here
            instantiation of "To c10::checked_convert<To,From>(From, const char *) [with To=uint8_t, From=c10::complex<double>]" 

@zasdfgbnm
Collaborator Author

Let me benchmark and remove ldg

@ngimel
Collaborator

ngimel commented Sep 17, 2020

The problem is not __ldg, I believe, it's fromBits. I have no idea why --expt-relaxed-constexpr is not passed in internal builds and how it used to work. Maybe just bringing back the CUDA_ARCH guard is the way to go ;-)
Edit: oh, fromBits never worked, you've just added it.
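To spell out the failure mode nvcc is reporting above: a constexpr function that is (implicitly) __host__-only cannot be called from __device__ code unless --expt-relaxed-constexpr is passed. A minimal reproduction, with the usual remedy shown in a comment (illustrative only; not the change made in this PR or in #44925):

// Minimal reproduction, not PyTorch code.
struct from_bits_t {};

// Host-only constexpr helper: calling it from device code triggers
//   error: calling a constexpr __host__ function from a __device__ function
// unless nvcc is given --expt-relaxed-constexpr.
static constexpr from_bits_t from_bits() { return from_bits_t(); }

__device__ void device_caller() {
  from_bits();  // fails without --expt-relaxed-constexpr
}

// The usual remedy is to make the helper callable from both sides, e.g.
//   static constexpr __host__ __device__ from_bits_t from_bits() { ... }
// (c10 spells the attribute pair C10_HOST_DEVICE), so no experimental flag
// is needed.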

@zasdfgbnm
Collaborator Author

The solution for the __ldg issue is in #44925; I will rebase and fix this after that PR is merged.

Contributor

@facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in e1ff46b.

zasdfgbnm deleted the bfloat-topk branch October 5, 2020 08:17
Labels
Merged · open source · triaged