Created by Mariam Miari on 4/20/2021

## This tutorial describes the steps needed to log in to and use the Alvis cluster
### All information about the cluster is available [here](https://www.c3se.chalmers.se/about/Alvis/).
GPU cost on Alvis:

| Type | VRAM | System memory per CPU | CPU cores per GPU | Cost |
|---|---|---|---|---|
| V100 | 32 GB | 96 or 192 GB | 8 | 8 |
| T4 | 16 GB | 72 or 192 GB | 4 | 2 |
| A100 | 40 GB | 192 GB | 8 | 16 |




### All information about getting access is available [here](https://www.c3se.chalmers.se/documentation/getting_access/).

## Steps required:
If you **have** a SUPR account and are already a member of a project, **skip** the next 4 steps:

- Join [SUPR](https://supr.snic.se)
- Log in using SWAMID
- Approve the [user agreement](https://supr.snic.se/person/)
- Ask your PI to add you to one of the existing projects.



### To request an account on Alvis (once you have a SUPR account, have signed the user agreement, and have been added to a project):

- Go to https://supr.snic.se/account/. Under "Account Requests" you will see the pending account requests for the projects you are a member of.
- Click on "Request account".
- The request takes about 1 working day to process.
- You will receive one email with your username and another with your one-time password.
- Change your password through the link provided in the email as soon as possible.





### There are two ways to log in to Alvis:
1. Through your computer's terminal
2. Through ThinLinc web: https://alvis1.c3se.chalmers.se:300

**Logging in is only permitted when you are connected to a network that can access the system (see [here](https://www.c3se.chalmers.se/documentation/intro-alvis/slides/#connecting)).** In practice, this means tunnelling your connection through a university network, i.e. using a VPN.

To set up the VPN at Lund University, see the [VPN guides](https://luservicedesk.service-now.com/support_en?id=kb_article_en&sys_id=aa073a82dbc520d020681ea605961987).



Since I use macOS, I followed the instructions on [this page](https://luservicedesk.service-now.com/support_en?id=kb_article_en&sys_id=6d1d7a71dbb0ac506452cd4d0b9619f7). In short, do the following:

- Open **System Preferences** and click **Network**.
- Click on the plus sign in the lower left corner to create a new service.
- In the window that appears, select the following:


1. Interface: select VPN
2. VPN type: select L2TP over IPSec
3. Name of the Service: type a name of your choice (e.g. LU VPN)
4. Click Create


- Server address: type vpn.lu.se
- Under Username, enter your LucatID or StudentID
- Click Advanced
- **Check** Send all traffic over VPN connection and **click OK**
- Click Authentication Settings

- Fill in the following information:


1. Password: enter your Lucat password or Student password
2. Shared Secret: type luvpn123


- Click OK
- Click Finish
- **Check** Show VPN status in the **menu bar**. That way you can connect and disconnect the VPN directly from the menu bar.
- Click the VPN symbol to connect to the VPN or disconnect the connection.


**You can now log in to Alvis with the username you were given (e.g. mariammi), either through ThinLinc web or from the terminal:** <br>

> ssh username@alvis1.c3se.chalmers.se
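
If the login hangs, the VPN is a likely culprit. As a minimal sketch, the helper below runs a single command on the login node and returns. The function name `alvis_check_login` is made up for this example and is not part of any C3SE tooling; `ConnectTimeout` is a standard OpenSSH option.

```shell
# Login node named in this tutorial.
ALVIS_LOGIN_NODE="alvis1.c3se.chalmers.se"

# Hypothetical helper: run `hostname` on the login node and return.
# ConnectTimeout makes ssh give up after 10 seconds instead of hanging
# when the VPN is not connected.
alvis_check_login() {
  ssh -o ConnectTimeout=10 "$1@${ALVIS_LOGIN_NODE}" hostname
}

# Usage, with your own username in place of mariammi:
#   alvis_check_login mariammi
```

If the call times out, check the VPN connection before trying again.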


**The current path to our storage folder is:** <br>
/cephyr/NOBACKUP/groups/snic2021-23-312/

# Submitting Jobs

### All information about job submission and memory usage can be found [here](https://www.c3se.chalmers.se/about/Alvis/), [here](https://www.c3se.chalmers.se/documentation/running_jobs/), and [here](https://www.c3se.chalmers.se/documentation/intro-alvis/presentation.html#/example-multi-node)

**On Alvis, you can choose between NVIDIA Tesla A100, V100 and T4 GPUs.** Request one of them with a single line in your job script:

        #SBATCH --gpus-per-node=V100:1   # allocates 1 V100 GPU (and 8 cores)
        #SBATCH --gpus-per-node=T4:1     # allocates 1 T4 GPU (and 4 cores, but you only pay for 2)
        #SBATCH --gpus-per-node=A100:1   # allocates 1 A100 GPU (and 8 cores)



**Note that on Alvis a V100 GPU is 4 times more expensive than a T4 GPU, and an A100 is 8 times more expensive than a T4, which reflects the cost of the hardware.**
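
To get a feeling for what these ratios mean for a whole job, the sketch below multiplies the relative cost from the table at the top of this page (T4 = 2, V100 = 8, A100 = 16) by the number of GPUs and the number of hours. The function names are made up for this example, and the formula (cost x GPUs x hours) is an assumption about how usage is charged; check the C3SE documentation for the actual accounting.

```shell
# Relative cost per GPU, copied from the table at the top of this page.
gpu_unit_cost() {
  case "$1" in
    T4)   echo 2 ;;
    V100) echo 8 ;;
    A100) echo 16 ;;
    *)    echo "unknown GPU type: $1" >&2; return 1 ;;
  esac
}

# ASSUMPTION: a job is charged (cost per GPU) x (number of GPUs) x (hours).
#   $1: GPU type, $2: number of GPUs, $3: hours
job_cost() {
  unit=$(gpu_unit_cost "$1") || return 1
  echo $(( unit * $2 * $3 ))
}

job_cost A100 2 12   # prints 384
job_cost T4 2 12     # prints 48
```

Under that assumption, the same 12-hour job on two T4 GPUs costs one eighth of what it costs on two A100 GPUs.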

Always load the modules your job needs before running it. <br>
Start by loading **fosscuda/2019b**.
To see which versions of a package such as TensorFlow are available, run:
> module spider TensorFlow<br>

This lists the available versions: <br>

        TensorFlow/1.15.2-Python-3.7.4 
        TensorFlow/2.1.0-Python-3.7.4
        TensorFlow/2.2.0-Python-3.7.4
        TensorFlow/2.3.1-Python-3.7.4
        TensorFlow/2.3.1-Python-3.8.2
        TensorFlow/2.4.1
        
Load the version you need with module load (or its shorthand, ml):
> module load TensorFlow/1.15.2-Python-3.7.4
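
A quick way to confirm that the modules work together is to load them and ask TensorFlow which GPUs it can see. This is only a sketch: the version below is one of those listed above, so adjust it to what `module spider` reports for you. The guard around `module` is there so the snippet does nothing harmful when run outside the cluster.

```shell
# Module versions taken from the listing above; adjust as needed.
TOOLCHAIN_MODULE="fosscuda/2019b"
TF_MODULE="TensorFlow/2.3.1-Python-3.7.4"

if command -v module >/dev/null 2>&1; then
  module load "$TOOLCHAIN_MODULE" "$TF_MODULE"
  module list    # show what is loaded now
  # Print the TensorFlow version and the GPUs it can see.
  python3 -c 'import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices("GPU"))'
else
  echo "The module command is not available here; run this on Alvis."
fi
```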

**Example of my bash script:**

        #!/bin/bash
        # Replace the account with your own AI/ML project.
        #SBATCH -A SNIC2021-7-54 -p alvis
        #SBATCH -n 4
        #SBATCH -J Model15_effnet
        # You can request V100:2 instead if your job is less computationally expensive.
        #SBATCH --gpus-per-node=A100:2
        #SBATCH --time=12:00:00
        #SBATCH --mail-user=ma8244mi-s@student.lu.se
        #SBATCH --mail-type=END
        #SBATCH -o train.out
        #SBATCH -e train.err

        module load fosscuda/2019b Python/3.7.4 matplotlib/3.1.1-Python-3.7.4 TensorFlow/2.3.1-Python-3.7.4 OpenCV/4.2.0-Python-3.7.4
        python3 eff_training.py > logFile_train15_effnet.txt
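
Before committing to a 12-hour A100 run, it can help to submit a short test job first. The sketch below writes such a script to a file. The helper name, the file name `test_job.sh`, the job name, the 10-minute limit and the choice of a single T4 are all assumptions made for this example; the account and module names are copied from the script above and should be replaced with your own.

```shell
# Hypothetical helper: write a short test job script for one T4 GPU.
#   $1: project account (e.g. SNIC2021-7-54)
#   $2: name of the file to write
write_test_job() {
  cat > "$2" <<EOF
#!/bin/bash
#SBATCH -A $1 -p alvis
#SBATCH -n 1
#SBATCH -J gpu_test
#SBATCH --gpus-per-node=T4:1
#SBATCH --time=00:10:00
#SBATCH -o test.out
#SBATCH -e test.err

module load fosscuda/2019b TensorFlow/2.3.1-Python-3.7.4
nvidia-smi
EOF
}

write_test_job SNIC2021-7-54 test_job.sh
cat test_job.sh   # inspect the result before submitting it
```

Note that every directive starts with `#SBATCH` at the beginning of the line and that `#!/bin/bash` is the very first line of the file.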

### The other flags available for a job script (if needed) are listed [here](https://slurm.schedmd.com/sbatch.html)

### If you want to install packages
Create a virtual environment. The **virtualenv command is included in the Python modules, so load your preferred version of Python (and everything else you need, e.g. SciPy-bundle) from the module system first**. The environment is created **only once**; after that you only need to activate it in each new session or job script. Do the following steps:
- Load a Python module, e.g. module load Python/3.7.4
- virtualenv --system-site-packages ~/your_env_name
- source ~/your_env_name/bin/activate (activates your environment)
- Install the packages you want (e.g. pip install imutils)
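
The steps above can be sketched as two small shell functions, one for the one-time setup and one for every later session. The function names are made up for this example, and `your_env_name` is the same placeholder as in the list above. Nothing runs until you call the functions on Alvis.

```shell
ENV_DIR="$HOME/your_env_name"   # placeholder name from the steps above

# One-time setup: create the environment and install a package into it.
create_env() {
  module load Python/3.7.4
  virtualenv --system-site-packages "$ENV_DIR"
  source "$ENV_DIR/bin/activate"
  pip install imutils
}

# Every later session or job script: load the same module, then activate.
use_env() {
  module load Python/3.7.4
  source "$ENV_DIR/bin/activate"
}

# On Alvis: run `create_env` once, then `use_env` in each new shell.
```

In a job script, the two lines inside `use_env` would go after the other `module load` line and before the `python3` call.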

### Run the script
> sbatch job_script.sh

### Check your submitted job status
> squeue -u your_user_name
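
When you submit several jobs, it is handy to keep the job ID that `sbatch` prints. A minimal sketch, assuming the usual "Submitted batch job <id>" message; the helper name is made up for this example:

```shell
# sbatch reports a successful submission as "Submitted batch job <id>".
# Hypothetical helper: take the last word of that message, i.e. the job ID.
job_id_from_sbatch() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

job_id_from_sbatch "Submitted batch job 123456"   # prints 123456

# On Alvis the pieces fit together like this:
#   job_id=$(job_id_from_sbatch "$(sbatch job_script.sh)")
#   squeue -j "$job_id"     # status of this job only
#   scancel "$job_id"       # cancel it if something is wrong
```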