Description
This example is based on FSDP1, which is deprecated, but I would still like to see it working:
https://github.com/pytorch/examples/blob/main/mnist/main.py
But the MNIST download step fails, I believe here:
https://github.com/pytorch/examples/blob/acc295dc7b90714f1bf47f06004fc19a7fe235c4/mnist/main.py#L120C4-L121C44
With error output:
mp.spawn(fsdp_main,
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 364, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 320, in start_processes
while not context.join():
^^^^^^^^^^^^^^
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 220, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 95, in _wrap
fn(i, *args)
File "/home/nonroot/extdir/gg/git/codelab/gpu/ml/pytorch/distributed/tutorials/3-fsdp/ex1-fsdp.py", line 138, in fsdp_main
dataset1 = datasets.MNIST('../data', train=True, download=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torchvision/datasets/mnist.py", line 100, in __init__
self.download()
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torchvision/datasets/mnist.py", line 188, in download
download_and_extract_archive(url, download_root=self.raw_folder, filename=filename, md5=md5)
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torchvision/datasets/utils.py", line 388, in download_and_extract_archive
download_url(url, download_root, filename, md5)
File "/home/nonroot/condaforge_src/envs/torch/lib/python3.12/site-packages/torchvision/datasets/utils.py", line 137, in download_url
raise RuntimeError("File not found or corrupted.")
RuntimeError: File not found or corrupted.
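The last frame is in torchvision's `download_url`, which, as far as I can tell, raises this error when the file is missing or fails its MD5 check after the download attempt. A minimal sketch for checking a downloaded file by hand, using only the standard library. `file_md5` and `diagnose` are hypothetical helper names (not torchvision APIs), and the path and expected checksum must be supplied by the caller:

```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diagnose(path, expected_md5=None):
    """Hypothetical helper: classify a file as 'missing', 'empty',
    'md5 mismatch', or 'ok'. The checksum is only compared when
    `expected_md5` is given."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    if expected_md5 is not None and file_md5(path) != expected_md5:
        return "md5 mismatch"
    return "ok"
```

For example, calling `diagnose` on each archive under `../data/MNIST/raw` (assuming that is where the raw files land for `root='../data'`) would show whether a file never arrived, was truncated to zero bytes, or has different contents than expected. A "missing" result points toward a network or mirror problem, while a mismatch on an existing file suggests a partial or overwritten download.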