Merged
2 changes: 1 addition & 1 deletion .circleci/test.sh

@@ -19,7 +19,7 @@ echo "Running Python Tests"
 ./test/run_tests.sh
 
 # echo "Running MNIST Test"
-# python test/test_train_mnist.py --tidy
+# python test/test_train_mp_mnist.py --tidy
 # if [ -x "$(command -v nvidia-smi)" ]; then
 #   python test/test_train_mp_mnist_amp.py --fake_data
 # fi
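The commented-out block above gates the AMP MNIST test on `nvidia-smi` being present and executable. A minimal, standalone sketch of that guard pattern (the `run_if_available` helper is illustrative, not part of the repo):

```shell
#!/usr/bin/env bash
# Run a command only when a prerequisite tool is on PATH and executable,
# mirroring the `[ -x "$(command -v nvidia-smi)" ]` guard in test.sh.
run_if_available() {
  local tool="$1"; shift
  if [ -x "$(command -v "$tool")" ]; then
    "$@"
  else
    echo "skipping: $tool not found"
  fi
}

run_if_available sh echo "sh present"
run_if_available no_such_tool_xyz echo "never reached"
```

Because `command -v` returns an empty string for a missing tool, the `-x` test fails and the GPU-only test is simply skipped rather than erroring out on CPU-only machines.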
2 changes: 1 addition & 1 deletion README.md

@@ -200,7 +200,7 @@ Training on pods can be broken down to largely 3 different steps:
 If you prefer to not use an [instance group](#create-your-instance-group), you can decide to use a list of VM instances that you may have already created (or can create individually). Make sure that you create all the VM instances in the same zone as the TPU node, and also make sure that the VMs have the same configuration (datasets, VM size, disk size, etc.). Then you can [start distributed training](#start-distributed-training) after creating your TPU pod. The difference is in the `python -m torch_xla.distributed.xla_dist` command. For example, to use a list of VMs run the following command (ex. conda with v3-32):
 ```
 (torch-xla-1.7)$ cd /usr/share/torch-xla-1.7/pytorch/xla
-(torch-xla-1.7)$ python -m torch_xla.distributed.xla_dist --tpu=$TPU_POD_NAME --vm $VM1 --vm $VM2 --vm $VM3 --vm $VM4 --conda-env=torch-xla-1.7 --env=XLA_USE_BF16=1 -- python test/test_train_imagenet.py --fake_data
+(torch-xla-1.7)$ python -m torch_xla.distributed.xla_dist --tpu=$TPU_POD_NAME --vm $VM1 --vm $VM2 --vm $VM3 --vm $VM4 --conda-env=torch-xla-1.7 --env=XLA_USE_BF16=1 -- python test/test_train_mp_imagenet.py --fake_data
 ```
 
 ### Datasets for distributed training
2 changes: 1 addition & 1 deletion docker/common.sh

@@ -5,7 +5,7 @@ function run_deployment_tests() {
 export XRT_WORKERS="localservice:0;grpc://localhost:40934"
 export CC=clang-8 CXX=clang++-8
 
-time python /pytorch/xla/test/test_train_mnist.py
+time python /pytorch/xla/test/test_train_mp_mnist.py --fake_data
 time bash /pytorch/xla/test/run_tests.sh
 time bash /pytorch/xla/test/cpp/run_tests.sh
}
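`run_deployment_tests` exports the XRT worker configuration and then wraps each suite in `time`. A standalone sketch of that shape (the `run_timed` helper and the `true` placeholder are illustrative, not from the repo):

```shell
#!/usr/bin/env bash
set -e  # abort the deployment run on the first failing suite
# Same single-worker XRT configuration exported in docker/common.sh.
export XRT_WORKERS="localservice:0;grpc://localhost:40934"

run_timed() {
  # Time a command in whole seconds and propagate its exit status.
  local start end
  start=$(date +%s)
  "$@"
  end=$(date +%s)
  echo "$1 took $((end - start))s"
}

run_timed true  # stand-in for a real test-suite invocation
```

With `set -e`, a nonzero exit from any suite stops the whole deployment check, which is the behavior the CI script relies on.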
266 changes: 0 additions & 266 deletions test/test_train_cifar.py

This file was deleted.
