
Validations for 2.2 Release. Cherry Pick Validation and Manual #4855

Closed · 11 tasks done
atalman opened this issue Jan 4, 2024 · 3 comments

Comments

@atalman commented Jan 18, 2024

Manual Validations

atalman changed the title from "Cherry Pick Validation" to "Validations for 2.2 Release. Cherry Pick Validation and Manual" on Jan 18, 2024
@huydhn commented Jan 18, 2024

For pytorch/pytorch#115193, the issue with launching the distributed device mesh API: I followed https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md and ran the DTensor example on a devgpu host with torchrun --standalone --nnodes=1 --nproc-per-node=4 dtensor_example.py, and it works fine:

$ torchrun --standalone --nnodes=1 --nproc-per-node=4 dtensor_example.py

[2024-01-18 14:08:13,419] torch.distributed.run: [WARNING]
[2024-01-18 14:08:13,419] torch.distributed.run: [WARNING] *****************************************
[2024-01-18 14:08:13,419] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-01-18 14:08:13,419] torch.distributed.run: [WARNING] *****************************************
NCCL version 2.19.3+cuda12.3
...
DTensor(local_tensor=tensor([[-0.9938,  1.6568, -0.0712,  ..., -0.7047,  0.1956,  0.7011],
        [ 0.0633, -0.0818,  0.0865,  ...,  0.6208, -1.3616,  0.4402],
        [ 0.7410,  0.3713, -1.0218,  ..., -0.6000, -0.3061,  0.0240],
        ...,
        [-0.2041, -0.4914, -1.4949,  ..., -0.6163, -0.6493,  0.5180],
        [ 2.5286, -0.3243,  0.5991,  ...,  0.7855,  0.3508, -0.1411],
        [ 1.6220,  1.5745,  0.4140,  ...,  0.6092, -0.7156,  1.0645]],
       device='cuda:0'), device_mesh=DeviceMesh([0, 1, 2, 3]), placements=(Shard(dim=0),))
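
For reference, a minimal sketch of what dtensor_example.py might look like, based on the example in the DTensor README linked above (the exact script is not attached to this issue; the WORLD_SIZE handling and the tensor shape here are assumptions):

# Hypothetical reconstruction of dtensor_example.py from the DTensor README;
# run it under torchrun, which sets WORLD_SIZE and RANK for each process.
import os
import torch
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

world_size = int(os.environ["WORLD_SIZE"])

# A 1-D mesh over all ranks; with --nproc-per-node=4 this is DeviceMesh([0, 1, 2, 3]).
mesh = DeviceMesh("cuda", list(range(world_size)))

# Shard a large tensor along dim 0 across the mesh and print the result,
# which produces the DTensor(...) output shown above on each rank.
big_tensor = torch.randn(100_000, 88)
dtensor = distribute_tensor(big_tensor, mesh, placements=[Shard(dim=0)])
print(dtensor)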

For the purpose of the 2.2.0 release, I think that's good enough.

@atalman commented Jan 31, 2024

Post-release Poetry test:

$ curl -s https://pypi.org/pypi/torch/2.2.0/json | jq '.info.requires_dist'
[
  "filelock",
  "typing-extensions >=4.8.0",
  "sympy",
  "networkx",
  "jinja2",
  "fsspec",
  "nvidia-cuda-nvrtc-cu12 ==12.1.105 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-runtime-cu12 ==12.1.105 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-cupti-cu12 ==12.1.105 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cudnn-cu12 ==8.9.2.26 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cublas-cu12 ==12.1.3.1 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cufft-cu12 ==11.0.2.54 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-curand-cu12 ==10.3.2.106 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusolver-cu12 ==11.4.5.107 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusparse-cu12 ==12.1.0.106 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nccl-cu12 ==2.19.3 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nvtx-cu12 ==12.1.105 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "triton ==2.2.0 ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "opt-einsum >=3.3 ; extra == 'opt-einsum'",
  "optree >=0.9.1 ; extra == 'optree'"
]
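
The same check can be scripted. Here is a hedged Python equivalent of the curl | jq query above, an illustration rather than the actual release-validation tooling:

# Fetch the torch 2.2.0 metadata from PyPI and print requires_dist, mirroring:
#   curl -s https://pypi.org/pypi/torch/2.2.0/json | jq '.info.requires_dist'
import json
import urllib.request

with urllib.request.urlopen("https://pypi.org/pypi/torch/2.2.0/json") as resp:
    requires = json.load(resp)["info"]["requires_dist"]

for req in requires:
    print(req)

# Spot-check one pinned dependency that Poetry must be able to resolve.
assert any(r.startswith("nvidia-nccl-cu12") for r in requires)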

atalman closed this as completed on Jan 31, 2024