support for fp8 allgather FSDP #109654
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109654
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit dd4c868 with merge base cff8bf4. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```cpp
if (tensor.scalar_type() == ScalarType::Float8_e5m2 ||
    tensor.scalar_type() == ScalarType::Float8_e4m3fn) {
  AT_DISPATCH_FP8_TYPES(
      tensor.scalar_type(), "fill_empty_deterministic_", [&]() {
```
does fill_empty_deterministic not work for float8?
cc @malfet
One issue is that `tensor.is_floating_point() || tensor.is_complex()` is true for fp8, but `AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2` doesn't handle fp8, so it raises an error. I'm also not sure whether fp8 has a quiet_NaN.
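For what it's worth, both fp8 formats do reserve NaN encodings: e5m2 follows the usual IEEE rule (exponent field all ones, nonzero mantissa), while e4m3fn has no infinities and a single NaN pattern, S.1111.111. A standalone sketch with hypothetical helper names (not PyTorch code):

```cpp
#include <cstdint>

// e5m2 layout: 1 sign | 5 exponent | 2 mantissa.
// NaN when the exponent field is all ones and the mantissa is nonzero.
bool is_nan_e5m2(uint8_t bits) {
  return ((bits & 0x7C) == 0x7C) && ((bits & 0x03) != 0);
}

// e4m3fn layout: 1 sign | 4 exponent | 3 mantissa.
// "fn" = finite + NaN only: no infinities, and S.1111.111 is the sole NaN.
bool is_nan_e4m3fn(uint8_t bits) {
  return (bits & 0x7F) == 0x7F;
}
```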
```cpp
@@ -60,6 +60,10 @@ std::map<at::ScalarType, ncclDataType_t> ncclDataType = {
    {at::kLong, ncclInt64},
    {at::kHalf, ncclHalf},
    {at::kBool, ncclUint8},
    // TODO: need per collective handling
    // (e.g., fp8 allgather OK, reduce-scatter NO)
    {at::kFloat8_e5m2, ncclInt8},
```
For provenance, @awgu said that these look fine
Looks fine to me. For robustness we will probably need to add validity checks in the reduce ops to ensure these dtypes aren't being used, something like:

```cpp
void check_gpu_single_tensor(const at::Tensor& tensor) {
```

either that, or add another argument to `getNcclDataType` to pass in the collective and do the check there:

```cpp
ncclDataType_t getNcclDataType(at::ScalarType type) {
```
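A rough sketch of that second option, with the collective kind threaded through the dtype lookup so fp8 can be allowed for allgather but rejected for reductions. All names here (`CollectiveKind`, `get_nccl_data_type`, the stand-in enums) are illustrative, not the actual PyTorch/NCCL declarations:

```cpp
#include <map>
#include <stdexcept>

enum class ScalarType { Float, Half, Float8_e5m2, Float8_e4m3fn };
enum class CollectiveKind { AllGather, Broadcast, AllReduce, ReduceScatter };
using ncclDataType_t = int;  // stand-in for the real NCCL enum
constexpr ncclDataType_t ncclInt8 = 0, ncclHalf = 6, ncclFloat = 7;

// Reductions do arithmetic on the wire dtype, so byte-reinterpreted fp8
// would produce garbage there.
bool is_reduction(CollectiveKind c) {
  return c == CollectiveKind::AllReduce || c == CollectiveKind::ReduceScatter;
}

ncclDataType_t get_nccl_data_type(ScalarType t, CollectiveKind c) {
  static const std::map<ScalarType, ncclDataType_t> table = {
      {ScalarType::Float, ncclFloat},
      {ScalarType::Half, ncclHalf},
      // fp8 moves as raw bytes; only valid where no arithmetic happens
      {ScalarType::Float8_e5m2, ncclInt8},
      {ScalarType::Float8_e4m3fn, ncclInt8},
  };
  bool fp8 = (t == ScalarType::Float8_e5m2 || t == ScalarType::Float8_e4m3fn);
  if (fp8 && is_reduction(c)) {
    throw std::runtime_error("fp8 dtypes are not supported for reduction collectives");
  }
  return table.at(t);
}
```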
Force-pushed from 1ce229e to 2f89908 (Compare)
@jspark1105 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 2f89908 to dd4c868 (Compare)
@jspark1105 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
```cpp
} else if (tensor.scalar_type() == ScalarType::Float8_e4m3fn) {
  at::Float8_e4m3fn nan(FP8_NAN, at::Float8_e4m3fn::from_bits_t{});
  tensor.fill_(nan);
} else if (tensor.is_floating_point() || tensor.is_complex()) {
  AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(
      kBFloat16, kHalf, tensor.scalar_type(), "fill_empty_deterministic_", [&]() {
        tensor.fill_(std::numeric_limits<scalar_t>::quiet_NaN());
```
IIRC, for other types that aren't standard C++ types (e.g. float16, bfloat16) we just fill out the `std::numeric_limits` specialization, so this code works throughout the codebase and we don't need to special-case each place.
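To illustrate that approach with a toy stand-in (`MyFloat8` below is hypothetical, not the actual `at::Float8_e4m3fn` implementation): specializing `std::numeric_limits` for the custom type lets generic fill code run unchanged for builtin and custom floats alike.

```cpp
#include <cstdint>
#include <limits>

// Toy 8-bit float type; only the raw bits are modeled here.
struct MyFloat8 {
  uint8_t bits = 0;
  constexpr MyFloat8() = default;
  explicit constexpr MyFloat8(uint8_t b) : bits(b) {}
};

// Specializing std::numeric_limits for a user-defined type is allowed.
namespace std {
template <>
class numeric_limits<MyFloat8> {
 public:
  static constexpr bool has_quiet_NaN = true;
  // e4m3fn reserves S.1111.111 (0x7f) as its only NaN encoding
  static constexpr MyFloat8 quiet_NaN() { return MyFloat8(0x7f); }
};
}  // namespace std

// Generic fill helper: works for float, double, and MyFloat8 alike,
// with no per-type special casing at the call site.
template <typename T>
void fill_with_nan(T* data, int n) {
  for (int i = 0; i < n; ++i) data[i] = std::numeric_limits<T>::quiet_NaN();
}
```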
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Any update on this?
```cpp
    input.scalar_type() == at::kFloat8_e4m3fn) {
  nccl_dtype = ncclInt8;
} else {
  nccl_dtype = getNcclDataType(input.scalar_type());
```
Just curious: why not add the type mapping into `ncclDataType` above so that `getNcclDataType` returns it? Are we wanting to potentially have different behavior for the type depending on which collective we're in?
Yes, this is based on feedback from someone on the PyTorch team (I think it was in workplace chat).
Combo with facebookresearch/fairscale#1136