# Step 1 - Get Access to the RC Computing Cluster

Go to http://rc.rit.edu/. Click Apply in the top right corner, then click Apply again on the following page.
![RC Computing Page](images/rc.png "RC Computing Page")

# Step 2 - Copy your code to the cluster

![Copy to Linux](images/copy1.png "Copy to Linux")

![Copy to Mac](images/copy2.png "Copy to Mac")

![Copy from Windows](images/copy3.png "Copy from Windows")
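
The screenshots above show the copy step for each platform. As a text alternative, here is a minimal sketch that assumes `scp` is installed locally and that the login host used in Step 3 (`ion.main.ad.rit.edu`) accepts file copies over SSH. The username `abc1234` and the local directory `my_code` are placeholders, so substitute your own.

```shell
#!/bin/sh
# Placeholder values: substitute your own RIT username and code directory.
rit_user="abc1234"
src_dir="my_code"

# Assumption: the SSH login host also accepts scp; "~/" is your home directory there.
dest="${rit_user}@ion.main.ad.rit.edu:~/"

# Dry run by default: print the command so it can be checked first.
# Set DRY_RUN=0 to actually copy (you will be prompted for your password).
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "scp -r ${src_dir} ${dest}"
else
    scp -r "${src_dir}" "${dest}"
fi
```

The `-r` option copies the directory recursively, so the whole project arrives in one step.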

# Step 3 - SSH into the cluster

If you are using Windows, follow the directions at this link: https://wiki.rit.edu/pages/viewpage.action?pageId=126190121. If you are using a Unix-like system (Linux or macOS), open a terminal and run the following command, replacing `<username>` with your RIT username and entering your password when prompted.

```ssh <username>@ion.main.ad.rit.edu```

# Step 4 - Create a script

RIT's RC cluster is managed by SLURM, a workload manager that accepts jobs from users, queues them, and runs them on the cluster's compute nodes. Think of SLURM as an administrator to whom you submit the code you want run. ***NOTE*** that the cluster is not meant for debugging. Submit a job only once you are sure that your code runs locally. The cluster exists to run your code across many CPU and GPU cores, which you probably don't have on your local computer, and SLURM is what shares those resources among its users.
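
In practice you talk to SLURM through a few command-line tools: `sbatch` submits a job script, `squeue` lists queued and running jobs, `scancel` cancels a job, and `sinfo` shows partitions and nodes. The sketch below is not part of the original instructions. It only checks which of these tools are on your `PATH`, which is a quick way to confirm that you are logged in to the cluster and not working on your own machine.

```shell
#!/bin/sh
# Count how many of the common SLURM client commands were checked and found.
checked=0
found=0

for tool in sbatch squeue scancel sinfo; do
    checked=$((checked + 1))
    if command -v "$tool" >/dev/null 2>&1; then
        found=$((found + 1))
        echo "$tool: available"
    else
        echo "$tool: not found (are you logged in to the cluster?)"
    fi
done

echo "$found of $checked SLURM commands found"
```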

Open a text editor and enter the following script. Save it as `example.sh`, the name used in the submit command later in this guide.

```bash
#!/bin/bash -l
# NOTE the -l flag!
#

# This is an example job file for a Serial Multi-Process job.
# Note that all of the following statements below that begin
# with #SBATCH are actually commands to the SLURM scheduler.
# Please copy this file to your home directory and modify it
# to suit your needs.
#
# If you need any help, please email rc-help@rit.edu
#

# Name of the job - You'll probably want to customize this.
#SBATCH -J tensorflow_example

# Standard out and Standard Error output files
# (both streams are written to the same file here)
#SBATCH -o test_run_1.output
#SBATCH -e test_run_1.output

# To send emails, set the address below (replace it with your own)
#SBATCH --mail-user sxa1056@rit.edu

# notify on state change: BEGIN, END, FAIL or ALL
#SBATCH --mail-type=ALL

# Request 5 hours run time MAX, anything over will be KILLED
#SBATCH -t 5:0:0

# Put the job in the "work" partition and request one core
#  "work" is the default partition so it can be omitted without issue.
#  Note that no number of nodes is specified here.  We do that in the
#  other script.
#SBATCH -p work

# Job memory requirements in MB
#SBATCH --mem=6000

# Explicitly state you are a free user
#SBATCH --qos=free

#
# Your job script goes below this line.
#
# Since we have 4 cpus on each node, we want to tell
#  our program to use '4*the-number-of-nodes' cpus.
cores=$((SLURM_NNODES * 4))

echo "Here I need to put code to tell slurm job $SLURM_JOB_ID"
echo "  which has been allocated $SLURM_NNODES 'nodes', and"
echo "  which means it can make use $cores cores to actually"
echo "  execute my program specified to use only $cores cores."
echo ""

# Load the modules required for running your program
module load cuda/7.5
module load cudnn/6.5
module load module_future
module load python/2.7.12
module load singularity

# Change to the directory that holds your code (replace with your own path)
cd /home/sxa1056/my_code

# Run mnist.py with Python inside the TensorFlow Singularity image
singularity run /opt/singularity/images/tensorflow_0.0.1_82f00df1c07bc4a3ad242da2a272c116a4cbede3.img python mnist.py

echo ""
```
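
The `cores` line in the script multiplies the number of allocated nodes by the 4 CPUs per node mentioned in its comment. A standalone sketch of that arithmetic follows, using a made-up 2-node allocation. Inside a real job, SLURM sets `SLURM_NNODES` itself, so the manual assignment here is for illustration only.

```shell
#!/bin/sh
# Made-up allocation for illustration; inside a real job SLURM sets this itself.
SLURM_NNODES=2
cpus_per_node=4   # value taken from the comment in the job script

# Shell arithmetic expansion: nodes times CPUs per node.
cores=$((SLURM_NNODES * cpus_per_node))
echo "nodes=${SLURM_NNODES} cores=${cores}"   # prints: nodes=2 cores=8
```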

# Step 5 - Submit the job

Run the following command on the cluster, where `example.sh` is the script you wrote above. The `--gres=gpu` option requests a GPU for the job.

```sbatch --qos=free --gres=gpu example.sh```
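
On success, `sbatch` prints a line of the form `Submitted batch job <id>`. The sketch below is an addition to the original instructions and uses a made-up ID. It shows one way to pull the job ID out of that line so you can pass it to `squeue` or `scancel`. The follow-up commands are left as comments because they only work on the cluster.

```shell
#!/bin/sh
# Sample of the line sbatch prints on success; the ID 123456 is made up.
submit_output="Submitted batch job 123456"

# Keep only the text after the last space, which is the job ID.
job_id="${submit_output##* }"
echo "Job ID: ${job_id}"

# On the cluster you could then run, for example:
#   squeue -j "$job_id"        # show the job's state in the queue
#   scancel "$job_id"          # cancel the job
#   tail -f test_run_1.output  # follow the output file named in the script
```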

# References

1. https://wiki.rit.edu/display/rc/Getting+Started
2. https://wiki.rit.edu/display/kgcoeuserdocs/Singularity