# Getting Started with Intel® oneCCL Bindings for PyTorch
This code sample shows how to run a PyTorch Distributed Data Parallel (DDP) workload on both GPU and CPU using the oneAPI AI Analytics Toolkit.

## Simple PyTorch distributed workload on both GPU and CPU
***
This section shows how to run a simple PyTorch distributed workload on both GPU and CPU with a few code changes.

### Prerequisites

In [None]:
# ignore all warning messages
import warnings
warnings.filterwarnings('ignore')


Set the installation path of your oneAPI AI Analytics Toolkit.

In [None]:
%env ONEAPI_INSTALL=/opt/intel/oneapi

Download the simple demo.py sample from the torch-ccl GitHub repository.

In [None]:
!wget https://raw.githubusercontent.com/intel/torch-ccl/master/demo/demo.py

Check the PyTorch and IPEX versions in the current IPython kernel.

In [None]:
%run ../../version_check.py

### Run simple PyTorch distributed DDP workload on GPU and CPU

#### Run on CPU
The current AI Kit installation includes a **pytorch** conda environment with oneCCL Bindings installed for CPU.  
Users can run a PyTorch DDP workload on CPU in this conda environment.
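
For orientation, a script started by `mpirun` typically has to translate the launcher's per-process variables into the ones `torch.distributed` reads (`RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`) before it initializes the `ccl` backend. The sketch below shows only that mapping in plain Python. It assumes the launcher exports `PMI_RANK` and `PMI_SIZE`, as Intel MPI's Hydra launcher does; the helper name `ddp_env_from_mpi`, the loopback address, and port 29500 are illustrative choices, and demo.py may handle this differently.

```python
import os


def ddp_env_from_mpi(environ=None, master_addr="127.0.0.1", master_port=29500):
    """Build the variables read by torch.distributed's env:// rendezvous.

    Hypothetical helper: PMI_RANK / PMI_SIZE are assumed to be set by the
    MPI launcher; a single-process run falls back to rank 0 of 1.
    """
    env = os.environ if environ is None else environ
    return {
        "RANK": str(env.get("PMI_RANK", 0)),
        "WORLD_SIZE": str(env.get("PMI_SIZE", 1)),
        "MASTER_ADDR": master_addr,        # single-node assumption
        "MASTER_PORT": str(master_port),   # any free port works
    }


# Illustrative values for the second of two processes.
settings = ddp_env_from_mpi({"PMI_RANK": "1", "PMI_SIZE": "2"})
print(settings)
```

A training script would copy these values into `os.environ` before calling `torch.distributed.init_process_group`.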

In [None]:
%%writefile run.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh --force > /dev/null 2>&1
source activate pytorch
echo "########## Executing the run"
mpirun -n 2 -l python demo.py > cpu.csv
echo "########## Done with the run"

##### Submitting run.sh to the job queue

Now we can submit run.sh to the job queue.

NOTE: the commands can also be executed in a local environment.
To let users run their scripts either on the Intel DevCloud or locally, this notebook checks for the job submission command qsub. If the check fails, the run is assumed to be local.
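
The same decision can be sketched in Python, which may help when adapting the check outside a notebook. This is a minimal illustration, not part of the sample: `launch_command` and the `path/to/q` wrapper path are hypothetical names, and `shutil.which` plays the role of `command -v`.

```python
import shutil


def launch_command(script, submit_wrapper, which=shutil.which):
    """Return the argv to execute for `script`.

    If an executable `qsub` is on PATH, go through the submission wrapper;
    otherwise run the script directly on the local machine.
    """
    if which("qsub"):
        return [submit_wrapper, script]
    return [script]


# "path/to/q" is a placeholder for the submission wrapper used by the notebook.
print(launch_command("./run.sh", "path/to/q"))
```

Passing `which` as a parameter keeps the lookup replaceable, so the branch taken on a queue-enabled system can be exercised on a machine without qsub.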

In [None]:
! chmod 755 ../../q; chmod 755 run.sh;if [ -x "$(command -v qsub)" ];  then  ../../q run.sh; else ./run.sh; fi

#### Run on GPU
The current AI Kit installation includes a **pytorch-gpu** conda environment with oneCCL Bindings installed for GPU.  
Users can run a PyTorch DDP workload on GPU in this conda environment with a one-line code change.

The gpu.patch file under codes_for_ipynb contains the modifications needed to run the oneCCL Bindings sample on GPU.
The patch is shown below; users only need to change the device from cpu to xpu in their code.
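
Separately from the patch, a script that should run unchanged in both conda environments could pick the device at runtime instead of hard-coding it. The sketch below is one hedged way to do that. `select_device` is a hypothetical helper that is not part of gpu.patch, and it assumes that `torch.xpu` is exposed by the installed PyTorch build (in some setups only after `import intel_extension_for_pytorch`).

```python
def select_device(prefer_gpu=True):
    """Return "xpu" when PyTorch reports a usable Intel GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch in this environment; default to the CPU name

    # torch.xpu may be absent on CPU-only builds, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    if prefer_gpu and xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


device = select_device()
print(device)
```

The resulting string can be passed wherever the script currently names the device, for example in `model.to(device)`.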

In [None]:
!cat codes_for_ipynb/gpu.patch

In [None]:
!patch < ./codes_for_ipynb/gpu.patch

In [None]:
%%writefile run.sh
#!/bin/bash
source $ONEAPI_INSTALL/setvars.sh --force > /dev/null 2>&1
source activate pytorch-gpu
echo "########## Executing the run"
mpirun -n 2 -l python demo.py > gpu.csv
echo "########## Done with the run"

##### Submitting run.sh to the job queue

Now we can submit run.sh to the job queue.

NOTE: the commands can also be executed in a local environment.
To let users run their scripts either on the Intel DevCloud or locally, this notebook checks for the job submission command qsub. If the check fails, the run is assumed to be local.

In [None]:
! chmod 755 ../../q; chmod 755 run.sh;if [ -x "$(command -v qsub)" ];  then  ../../q run.sh; else ./run.sh; fi
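
Because run.sh passes `-l` to `mpirun`, each line written to cpu.csv and gpu.csv is expected to carry the rank of the process that printed it. The sketch below groups such lines by rank. It assumes the prefix has the form `[rank] `, which is how Intel MPI labels output; the sample lines are made up for illustration and are not demo.py output.

```python
import re
from collections import defaultdict

# Assumed label format: "[<rank>] <text>".
_RANK_PREFIX = re.compile(r"^\[(\d+)\]\s?(.*)$")


def split_by_rank(lines):
    """Group rank-labelled lines by rank; lines without a label are skipped."""
    by_rank = defaultdict(list)
    for line in lines:
        match = _RANK_PREFIX.match(line.rstrip("\n"))
        if match:
            by_rank[int(match.group(1))].append(match.group(2))
    return dict(by_rank)


# Hypothetical content standing in for a log file.
sample = ["[0] step 0\n", "[1] step 0\n", "[0] step 1\n", "launcher banner\n"]
print(split_by_rank(sample))
```

To inspect a real run, pass an open file instead of `sample`, for example `split_by_rank(open("cpu.csv"))`.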

In [None]:
print('[CODE_SAMPLE_COMPLETED_SUCCESFULLY]')