### Introduction

This notebook covers connecting to and scheduling jobs on the university's high-performance computing facility, Viking. A useful reference is the official Viking documentation, found [here](https://vikingdocs.york.ac.uk/index.html). Some of the templates and information in this notebook were taken from it.

From the previous notebook you will be familiar with the Teaching and Research machines available at the University for computing jobs. For many applications these are enough, but large-scale or long-running jobs need larger computing resources.

The University's solution to this is Viking. It provides greater processing resources than a single node and is designed specifically for processing large volumes of data. To access Viking's computing resources, users must submit jobs to a queuing system, which allocates these jobs to compute nodes in a fair manner.

The Viking gateway can be accessed on campus using `USERNAME@viking` or off campus using `USERNAME@viking.york.ac.uk`. Another option is to connect to the York gateway using `USERNAME@ssh.york.ac.uk` and then select Viking as the machine.
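The addresses above are used with `ssh`. The following is a minimal sketch only: `abc123` is a placeholder username and `viking_host` is a hypothetical helper written for this illustration, not part of any Viking tooling. The command is printed rather than run, so it is safe to try on any machine.

```shell
#!/usr/bin/env bash
# Placeholder username (an assumption for illustration); replace with your own.
VIKING_USER="abc123"

# Hypothetical helper: print the login address for a given route.
viking_host() {
    case "$1" in
        on-campus)  printf '%s@viking\n' "$VIKING_USER" ;;
        off-campus) printf '%s@viking.york.ac.uk\n' "$VIKING_USER" ;;
        gateway)    printf '%s@ssh.york.ac.uk\n' "$VIKING_USER" ;;
        *)          echo "usage: viking_host on-campus|off-campus|gateway" >&2; return 1 ;;
    esac
}

# Print the command instead of opening a connection.
echo "ssh $(viking_host off-campus)"
```

Copying the printed command into a terminal, with your own username substituted, opens the connection.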

The gateway is used to write and submit submission scripts and is NOT designed to have full-scale jobs run on it.

In order to submit jobs you will need a Viking account set up and a project code to use. For this course this will be `pet-nucleartec-2025`.

As well as resource-intensive computing jobs, Viking can also be used for storage and management of large datasets. It has specialised high-performance filestores available to users for this purpose, with several filestore locations, each suited to particular applications.

### Slurm


Slurm, the **Simple Linux Utility for Resource Management**, is the software used to manage job submissions on systems like Viking. A range of Slurm commands is available to submit, manage and monitor jobs:

- **sbatch** - Submits a job to Slurm with options specified in the terminal or in a script
- **squeue** - View information about the jobs in the queue. Using `squeue -u USERNAME` will list all your queued jobs
- **scancel** - Signal jobs or job steps under the control of Slurm (e.g. `scancel JOBID` with no other options cancels that job)
- **scontrol** - View or modify Slurm job state
- **sinfo** - View information about Slurm nodes and partitions
- **srun** - Submit a job for execution in real time
- **salloc** - Allocate resources for a job in real time
- **sattach** - Attach standard input, output and error to a currently running job
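As a hedged illustration of the monitoring commands, the sketch below wraps `squeue` in a hypothetical helper, `list_my_jobs`, that falls back to a message when Slurm is not installed (for example on a personal laptop). The format string is one possible choice of columns, not a Viking convention.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: list the current user's jobs, or explain why it cannot.
list_my_jobs() {
    if command -v squeue >/dev/null 2>&1; then
        # %i = job ID, %j = job name, %T = job state, %M = time used
        squeue -u "$(id -un)" -o "%i %j %T %M"
    else
        echo "squeue not found: run this on a machine with Slurm installed"
    fi
}

list_my_jobs
```

A job ID shown in the first column can then be passed to `scancel` to cancel that job.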

As a basic requirement, a Slurm job must have the following:

- **account** - This is the Slurm account associated with your username, e.g. *pet-nucleartec-2025*
- **partition** - This specifies the partition the job will be run on. The correct one should be chosen for the job to ensure appropriate resource allocation and the quickest execution time. Usually *test* will be used for a small job or *nodes* for full-scale jobs. A detailed list can be found [here on the Viking wiki](https://vikingdocs.york.ac.uk/using_viking/resource_partitions.html)
- **Resources** - This includes the number of CPUs for a task, memory per job, etc. If needed a whole node can be requested
- **Time** - This is the time limit for the job. Each partition has an upper limit, but a realistic limit should be specified because jobs with shorter time limits are likely to be scheduled sooner

These parameters can be specified in a file and submitted using `sbatch`. This is seen in the example code below:

```shell
#!/usr/bin/env bash
#SBATCH --job-name=testScript           # Job name
#SBATCH --ntasks=1                      # Number of MPI tasks to request
#SBATCH --cpus-per-task=1               # Number of CPU cores per MPI task
#SBATCH --mem=1G                        # Total memory to request
#SBATCH --time=0-00:05:00               # Time limit (DD-HH:MM:SS)
#SBATCH --account=pet-nucleartec-2025   # Project account to use

set -e

touch vikingTest.txt

echo 'This is a test of using viking' >> vikingTest.txt
```

`set -e` is included to abort the script if any errors occur.

Saving this as `testScript.job` and submitting it to Viking using `sbatch testScript.job` queues the job for execution.

After execution, this script will have created a new file called `vikingTest.txt` containing the text.

Any output to the shell environment is written to a file named after the job ID; in this case it would be `slurm-23000250.out`.
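When `sbatch` accepts a job it replies with the job ID, typically in a line of the form `Submitted batch job 23000250`, and by default the output file is named `slurm-` followed by that ID. The sketch below recovers the file name from such a reply. Both helper functions are hypothetical names introduced for this illustration, and the reply string is an example built from the job ID quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the job ID (the last word) from sbatch's reply.
job_id_from_reply() {
    printf '%s\n' "${1##* }"
}

# Hypothetical helper: default Slurm output file name for a given job ID.
output_file_for() {
    printf 'slurm-%s.out\n' "$1"
}

reply="Submitted batch job 23000250"   # example reply, not captured from a real run
job_id="$(job_id_from_reply "$reply")"
echo "Output will be written to $(output_file_for "$job_id")"
```

Once the job has finished, `cat` on that file shows everything the script printed.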

#### Task 1

Create a job script using the example above for the `pet-nucleartec-2025` account to run on the test partition. Name the job `task_1_job` and request 1 task with 1 CPU per task and 1G of memory. Set the time to a reasonable amount, recalling that jobs with shorter time limits are likely to be scheduled sooner.

Include a shell script to execute that prints the current system time, sleeps for 5 seconds, then prints the system time again to the shell environment. Submit this to Viking and verify that the program's output is as expected.

- A useful command will be `date` which prints the current system time and date

### Transferring Files to Viking

There are two main methods used to transfer files to and from viking. One is to use file transfer protocols such as scp and rsync while connected to viking. This notebook covers this method of file transfer to the university data store.

The other is to use the Globus software; the setup information can be found [here](https://vikingdocs.york.ac.uk/globus/getting_started.html)


There are multiple filestores available at the University; the most commonly used are the shared filestores and the personal filestores.

- `sftp.york.ac.uk:/shared/storage/FILESTORE` is the address to access the shared filestore using sftp
- `sftp.york.ac.uk:/home/userfs/A/USERNAME` is the address to access the personal user filestore using sftp. The USERNAME needs to be replaced with your username and A replaced with the first letter of your username.

Below is an example carried out on a viking node accessing a personal filestore to retrieve a file.
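A sketch of what such a retrieval might look like is given here. The username `abc123`, the file name `results.txt` and the helper `personal_filestore` are all hypothetical, and the `scp` line assumes the same host accepts scp; check the remote systems notebook for the exact addresses. The commands are printed rather than run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the personal filestore path for a username,
# following the /home/userfs/A/USERNAME pattern (A = first letter of the username).
personal_filestore() {
    first_letter="$(printf '%s' "$1" | cut -c1)"
    printf '/home/userfs/%s/%s\n' "$first_letter" "$1"
}

user="abc123"   # placeholder username
remote="${user}@sftp.york.ac.uk:$(personal_filestore "$user")"

# Printed, not executed; results.txt is a hypothetical file name.
echo "sftp ${remote}"
echo "scp ${remote}/results.txt ."
```

Inside an `sftp` session, `get FILE` downloads a file and `put FILE` uploads one.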

Using this, files generated by Viking can be transferred to personal filestores, and scripts can be transferred from filestores to Viking for execution. This means you can create job scripts in a preferred editor instead of needing to create them from within the Viking terminal. The addresses for scp and rsync are the same as the ones found within the remote systems notebook.

#### Task 2

From Viking, try connecting to a filestore to transfer and retrieve a file of your choice. Try using scp, rsync and sftp to check you understand how each one works, as not all systems support all three as Viking does.

### Viking Storage

When logging in you are placed in your `home` directory. The home directory has a size limit of 100GB and a file limit of 400,000. From here you can access the `scratch` and `localtmp` directories. These, along with `longship`, are described below:

- `scratch` - This is a high performance filestore with a default size of 2TB and no limit on the number of files
- `localtmp` - This is storage space on the current Viking node. This area is not backed up, so it should only be used while processing data. No data can be stored here long term.
- `longship` - Data that is not needed within 90 days but needs to be kept on Viking can be stored here. It isn't high performance, but it can be used to save data and avoid copying datasets back and forth from campus. It is read/write on the login nodes and read-only on compute nodes.

Longship can be accessed using the following paths:

From Viking:
- `/mnt/longship/users`
- `/mnt/longship/projects`

From Campus (sftp/linux):
- `/shared/longship/users`
- `/shared/longship/projects`

From Campus (windows):
- `\\longship.york.ac.uk\users`
- `\\longship.york.ac.uk\projects`

Your storage quota can be checked using `myquota`
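`myquota` is the authoritative check. As a rough complement, the file count of a directory can be compared against the 400,000 file limit quoted above for home. This is a sketch only: `count_files` is a hypothetical helper, and the script counts files under the directory given as its first argument, or the current directory if none is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regular files below a directory.
count_files() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

HOME_FILE_LIMIT=400000   # file limit for home quoted in the text
target="${1:-.}"         # directory to inspect; defaults to the current directory
echo "Files under ${target}: $(count_files "$target") (home limit: ${HOME_FILE_LIMIT})"
```

Running it from your home directory gives an estimate of how close you are to the limit.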

None of the filestores on Viking are backed up, which means that files must be transferred elsewhere to keep copies. If a catastrophic failure occurs, all data could be lost. **For many grants keeping backups of data is vital, so ensure that files are transferred off Viking so they are not lost.**

Each filestore has its own deletion policy. Some of these are as follows:
- **Home** - Never deleted
- **Scratch** - Data not touched in 90 days is deleted (unless in scratch projects and discussed with the Viking team)
- **Longship** - Never Deleted
- **Localtmp** - Clean up period TBD
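To see which scratch files might be at risk under the 90-day policy, `find` can list files by age. This sketch rests on two assumptions: that "not touched" corresponds to the file's last access time, and that scratch is reachable as `~/scratch`. Check both against the Viking documentation. `stale_files` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory whose last access
# was more than the given number of days ago.
stale_files() {
    find "$1" -type f -atime +"$2"
}

# Assumption: scratch is reachable as ~/scratch; nothing happens if it is not.
if [ -d "$HOME/scratch" ]; then
    stale_files "$HOME/scratch" 90
fi
```

Any files listed that are still needed should be copied elsewhere.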

For long-term storage the Vault is also available. This is used for archival purposes where copies of data are needed but are unlikely to be accessed. Information on the Vault can be found [here](https://vikingdocs.york.ac.uk/data_management/vault.html)

### General Viking Workflow



The best practice for developing code on Viking is as follows:

- Develop and debug code locally before transferring it to Viking using a file transfer method.
- Run a small-scale test of the code on the Viking gateway to ensure that the script works on the Viking architecture.
- Create a small-scale job request to test the script on the test partition. Use a small number of events to ensure the output matches expectations and that the requested resources are accurate.
- Make any needed changes to the job script, change the partition to the most suitable one for the job, then submit the full-scale job to Viking.
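The last two steps can be sketched without editing the job script between runs, because options given on the `sbatch` command line take precedence over `#SBATCH` lines in the script. `submit_command` is a hypothetical helper, `my_job.sh` is a placeholder script name, and the five-minute limit for the test run is an illustrative value. Commands are printed rather than submitted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sbatch command for a given workflow stage.
submit_command() {
    case "$1" in
        test) printf 'sbatch --partition=test --time=00:05:00 %s\n' "$2" ;;
        full) printf 'sbatch --partition=nodes %s\n' "$2" ;;
        *)    echo "usage: submit_command test|full SCRIPT" >&2; return 1 ;;
    esac
}

# my_job.sh is a placeholder job script name.
submit_command test my_job.sh   # small trial on the test partition
submit_command full my_job.sh   # full-scale run once the trial looks right
```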

### Summary Task

Create a script to estimate pi using Monte Carlo methods. This involves generating random points within a square and determining how many fall within a circle inscribed in it. A simulation of this method can be found [here](https://mste.illinois.edu/activity/estpi/)
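The estimate follows from a ratio of areas. As a short worked sketch, assume $N$ points are drawn uniformly from a square of side $2r$ and $N_{\text{in}}$ of them land inside the inscribed circle of radius $r$ (these symbols are introduced here for illustration):

$$
\frac{N_{\text{in}}}{N} \approx \frac{\pi r^{2}}{(2r)^{2}} = \frac{\pi}{4}
\qquad\Longrightarrow\qquad
\pi \approx \frac{4\,N_{\text{in}}}{N}
$$

The approximation improves as $N$ grows, which is why the estimate is repeated with increasing numbers of events.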

Run this on Viking with 10,000, 1,000,000 and 10,000,000 events; for each, write the estimate of pi to a text file to act as the output. The script should take one input: the number of points to generate. The output file should also contain the number of events generated for that estimate.
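One way to launch the three runs is a loop over the event counts, relying on the fact that arguments placed after the script name in an `sbatch` command are passed through to the job script. This is a sketch under assumptions: `pi_job.sh` is a hypothetical job script that forwards its first argument to your program, and `submission_commands` is a hypothetical helper that prints the commands rather than submitting them.

```shell
#!/usr/bin/env bash
# Event counts from the task description.
EVENT_COUNTS="10000 1000000 10000000"

# Hypothetical helper: print one sbatch command per event count.
submission_commands() {
    for n in $EVENT_COUNTS; do
        printf 'sbatch %s %s\n' "$1" "$n"
    done
}

# pi_job.sh is a hypothetical job script name.
submission_commands pi_job.sh
```

Inside the job script, the event count is then available as `$1`.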

This can be written in a language of your choice, but ensure that the versions of any modules loaded are consistent between your local development environment and what you load on Viking.
