I want to run ResNet on two different machines. How should I run main.py?
I changed the code by adding the following:
`# on rank 0
dist.init_process_group(
    backend="gloo",
    init_method="tcp://172.16.8.196:8864",
    rank=0,
    world_size=2,
)
# on rank 1
dist.init_process_group(
    backend="gloo",
    init_method="tcp://172.16.8.196:8864",
    rank=1,
    world_size=2,
)`
On each machine (with the matching rank), I run `python main.py`.
Then an error occurs: `RuntimeError: Socket Timeout`.
How can I fix it?
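
One thing that may be worth checking first is whether the rank 1 machine can reach the rendezvous address at all. The sketch below assumes the timeout comes from an unreachable or blocked port, which the error message alone does not confirm. The helper name `can_connect` is hypothetical and uses only the Python standard library, not any PyTorch API.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: it only tests plain TCP reachability and
    says nothing about whether the process group itself is set up correctly.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("172.16.8.196", 8864)` on the second machine while rank 0 is already waiting inside `init_process_group` should return `True` if the port is reachable. A `False` result would point toward a firewall, a wrong IP address, or rank 0 not having been started yet, rather than toward the training code.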