# Terminals

The ``terminal`` or command line is a text-based interface to your computer. It is a program that takes commands as input, and prints text as output. It is a very powerful tool for interacting with your computer, and *is the primary way to interact with remote computers*.

In particular, to use the Discovery cluster, you need to use a terminal. This notebook will introduce you to the basics of using a terminal and to some essential Discovery commands.

Some essential terminal commands are:
- ``ls``: list the contents of the current directory
- ``cd``: change directory
- ``mkdir``: make a new directory
- ``rm``: remove a file or directory
- ``cp``: copy a file or directory
- ``mv``: move a file or directory
- ``cat``: print the contents of a file

The first three commands are about navigating the file system, and the last four are about manipulating files and directories.
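Since this is a notebook, it can help to see what these commands do from Python. The sketch below is only an illustration, not part of my usual workflow: it uses the standard library (`pathlib`, `shutil`, `tempfile`) to perform rough equivalents of the commands above inside a throwaway directory. The names `tour`, `project`, `notes.txt`, and `old_notes.txt` are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path


def tour(root):
    """Run rough Python equivalents of the commands above inside `root`.

    Returns the final directory listing so the result can be inspected.
    """
    root = Path(root)

    # mkdir: make a new directory
    project = root / "project"
    project.mkdir()

    # cd: os.chdir(project) would change directory; this sketch uses
    # explicit paths instead so it never changes where the notebook runs.

    # Create a small file to work with (in a terminal you would use an editor).
    notes = project / "notes.txt"
    notes.write_text("hello Discovery\n")

    # cp: copy a file
    shutil.copy(notes, project / "notes_backup.txt")

    # mv: move (here, rename) a file
    shutil.move(str(project / "notes_backup.txt"), str(project / "old_notes.txt"))

    # cat: print the contents of a file
    print(notes.read_text(), end="")

    # rm: remove a file
    notes.unlink()

    # ls: list the contents of a directory
    return sorted(p.name for p in project.iterdir())


if __name__ == "__main__":
    # Work in a temporary directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        print(tour(tmp))
```

After the copy, rename, and removal, only the renamed backup is left in the directory.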

Also essential to manipulating files is a text editor. Some of the most popular are:
- ``vim``: a terminal-based editor that is very powerful, but has a steep learning curve
- ``emacs``: a terminal-based editor that is also very powerful and has a learning curve. Compared to ``vim``, some people think it is more intuitive, but it is also less efficient.
- ``nano``: for beginners, this is the place to start. We'll use ``nano`` today.

# Discovery

Northeastern's computing cluster is called Discovery. You have an account now, and I'd like to make sure you know the basics. Discovery uses SLURM to manage jobs. You can read more about SLURM [here](https://slurm.schedmd.com/overview.html).

Today I'll teach you some of the basic interactions that I have with Discovery. The more advanced among you might find my techniques suboptimal, but they work for me. I encourage you to explore and find your own way of interacting with Discovery.

### Logging In

``ssh jhhalverson@login.discovery.neu.edu``

### .bashrc

I'm going to begin in my home directory and run ``nano .bashrc`` to show you my ``.bashrc`` file. This file is run every time I open a terminal. It sets up my conda environment and, crucially, adds a folder called ``localbin`` to the ``$PATH``.

```
export PATH="/home/jhhalverson/localbin:$PATH"
```
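To see why that line matters: when you type a command, the shell searches the directories in ``$PATH`` in order and runs the first match. Here is a minimal sketch of that lookup using Python's `shutil.which`. The helper `find_command` and the script name `hello_localbin` are hypothetical, and a temporary directory stands in for `localbin`.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def find_command(name, extra_dir):
    """Look up `name` with `extra_dir` prepended to PATH, like the export line."""
    search_path = str(extra_dir) + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create a tiny executable script (hypothetical name) in the directory.
        script = Path(tmp) / "hello_localbin"
        script.write_text("#!/bin/bash\necho hello\n")
        script.chmod(script.stat().st_mode | stat.S_IXUSR)

        # The command is found only because its directory is on the search path.
        print(find_command("hello_localbin", tmp))
```

Because the extra directory comes first in the search path, a command there takes precedence over one with the same name elsewhere.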

### localbin

Let's check out my ``localbin`` directory.

![Alt text](image.png)

These are little commands I've written over the years to make interacting with Discovery easier. You could also do it with aliases, but I like having the commands in a folder. Let's look at some of them.

Some of my ``localbin`` commands are:
- ``short``: ``srun --pty -p short /bin/bash``, requests an interactive shell on the short partition
- ``me``: ``squeue -l -u jhhalverson``, shows my jobs
- ``killmyjobs``: ``scancel -u jhhalverson``, kills all my jobs
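A command in a folder like this can be any executable file, not just a shell script. As a sketch (this is not how mine are written), here is what a Python version of ``me`` might look like. It assumes the file is saved without an extension in a directory on your ``$PATH`` and made executable with ``chmod +x``. The function name `squeue_command` is hypothetical, and it looks up the current username instead of hard-coding one.

```python
#!/usr/bin/env python3
import getpass
import shutil
import subprocess


def squeue_command(user=None):
    """Build the same command as `me`, defaulting to the current user."""
    if user is None:
        user = getpass.getuser()
    return ["squeue", "-l", "-u", user]


if __name__ == "__main__":
    cmd = squeue_command()
    if shutil.which("squeue") is None:
        # Not on a SLURM machine: just show what would be run.
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=False)
```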

# Discovery Interaction Types

### Computing on a Node

Log in to a particular type of node, e.g. with ``short`` above. If you want to reserve the entire node because you're going to use all of the CPUs or all of the RAM, add the ``--exclusive`` flag.

To see the RAM and CPU info for the node you're on, run `lsmem` and `lscpu`.
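If you'd rather check from inside Python, here is a rough sketch. On Linux, `os.sched_getaffinity(0)` returns the set of CPUs the current process is allowed to run on, and `/proc/meminfo` reports the machine's total memory. Whether the CPU set matches what SLURM gave you depends on how the cluster is configured, so treat that as an assumption. The function names are made up for this example.

```python
import os


def usable_cpus():
    """Number of CPUs this process may run on (Linux), else the machine total."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def total_memory_gib():
    """Total RAM in GiB read from /proc/meminfo, or None if unavailable."""
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    kib = int(line.split()[1])  # the value is reported in kB
                    return kib / 1024**2
    except OSError:
        pass
    return None


if __name__ == "__main__":
    print("usable CPUs:", usable_cpus())
    print("total RAM (GiB):", total_memory_gib())
```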

### Submitting a Job

Here's an example of a job file:

```
#!/bin/bash
#SBATCH --job-name=my_job_20
#SBATCH --output=my_job_20.out
#SBATCH --error=my_job_20.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=28
#SBATCH --partition=long
#SBATCH --exclusive
#SBATCH --output=workdir/my_job_20.out
#SBATCH --error=workdir/my_job_20.err
#SBATCH --time=00:30:00

cd for_sneh
python to_call_many_times.py --arg-to-print 20
```

It was written by the submitter I describe below and is called ``run_script_arg_20.sh``.

To submit it to the cluster, I would do ``sbatch run_script_arg_20.sh``.
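When a submission succeeds, ``sbatch`` prints a line of the form ``Submitted batch job <id>``. If you submit from Python, it can be handy to capture that ID so you can check on or cancel the job later. A minimal sketch, where `parse_job_id` and `submit` are hypothetical helper names:

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(script_name):
    """Submit a job file with sbatch and return its job ID (requires SLURM)."""
    result = subprocess.run(
        ["sbatch", script_name], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

On the cluster, `submit("run_script_arg_20.sh")` would return the job ID as an integer, which you could then pass to ``scancel`` if you need to stop that one job.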

### Writing a Script that Submits Many Jobs


#### DO NOT MISUSE THIS!

Working with job files is easy, but sometimes you have *many* jobs you want to run, and it can be useful to write a script that writes and submits jobs.

This is my submitter.py:
```
import os
import subprocess

def generate_slurm_script(arg_value):
    script_name = f"run_script_arg_{arg_value}.sh"
    with open(script_name, 'w') as f:
        f.write(f"#!/bin/bash\n")
        f.write(f"#SBATCH --job-name=my_job_{arg_value}\n")
        f.write(f"#SBATCH --output=my_job_{arg_value}.out\n")
        f.write(f"#SBATCH --error=my_job_{arg_value}.err\n")
        f.write(f"#SBATCH --ntasks=1\n")
        f.write(f"#SBATCH --cpus-per-task=28\n")
        f.write(f"#SBATCH --partition=long\n")  # Replace with your desired partition
        f.write(f"#SBATCH --exclusive\n")  # newline needed so the next directive starts on its own line
        f.write(f"#SBATCH --output=workdir/my_job_{arg_value}.out\n")
        f.write(f"#SBATCH --error=workdir/my_job_{arg_value}.err\n")
        f.write(f"#SBATCH --time=00:30:00\n")
        f.write(f"\n")
        f.write(f"cd for_sneh\n")
        f.write(f"python to_call_many_times.py --arg-to-print {arg_value}\n")

    print(f"SLURM script '{script_name}' generated.")

    # Submit the SLURM script
    subprocess.run(['sbatch', script_name])

if __name__ == '__main__':
    arg_values = [10, 20, 30, 40]
    for value in arg_values:
        generate_slurm_script(value)
```
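Given the warning above, one cautious habit is to preview what a submitter would do before it touches the queue. The sketch below is a variant, not my actual script: it builds the job text in one function and adds a `dry_run` flag and a `max_jobs` cap. Both of those names, and the default cap of 10, are assumptions made up for this example.

```python
import subprocess


def render_job(arg_value):
    """Return the text of a job file for one argument value."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=my_job_{arg_value}",
        "#SBATCH --ntasks=1",
        "#SBATCH --cpus-per-task=28",
        "#SBATCH --partition=long",
        "#SBATCH --exclusive",
        f"#SBATCH --output=workdir/my_job_{arg_value}.out",
        f"#SBATCH --error=workdir/my_job_{arg_value}.err",
        "#SBATCH --time=00:30:00",
        "",
        "cd for_sneh",
        f"python to_call_many_times.py --arg-to-print {arg_value}",
    ]
    return "\n".join(lines) + "\n"


def submit_all(arg_values, dry_run=True, max_jobs=10):
    """Plan one job per value; write and submit only if dry_run is False."""
    arg_values = list(arg_values)
    if len(arg_values) > max_jobs:
        raise ValueError(
            f"refusing to submit {len(arg_values)} jobs (limit is {max_jobs})"
        )
    planned = []
    for value in arg_values:
        script_name = f"run_script_arg_{value}.sh"
        planned.append(script_name)
        if dry_run:
            print(f"[dry run] would write and submit {script_name}")
            continue
        with open(script_name, "w") as f:
            f.write(render_job(value))
        subprocess.run(["sbatch", script_name], check=True)
    return planned


if __name__ == "__main__":
    # Nothing is written or submitted unless dry_run=False is passed explicitly.
    submit_all([10, 20, 30, 40])
```

Making the dry run the default means a typo in the list of values costs you nothing; you have to opt in to actually submitting.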

and the associated file that it calls, ``to_call_many_times.py``:

```
import argparse
import multiprocessing as mp
import os
import uuid

def my_func(arg_to_print):
    unique_filename = str(uuid.uuid4())[:6]  # Generate a unique string of length 6
    output_file = os.path.join('./exp', f'unique_file_name_{unique_filename}_arg_{arg_to_print}.txt')
    
    with open(output_file, 'w') as f:
        f.write(str(arg_to_print))
    
    print(f"Result saved to {output_file}")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process an integer and save it to a unique file')
    parser.add_argument('--arg-to-print', type=int, help='The integer to print')
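    args = parser.parse_args()

    # Make sure the output directory exists before the workers write to it
    os.makedirs('./exp', exist_ok=True)

    pool = mp.Pool(processes=28)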
    pool.map(my_func, [args.arg_to_print] * 40)
    pool.close()
    pool.join()
```
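The pool size of 28 above has to be kept in sync with ``--cpus-per-task`` in the job file by hand. SLURM sets the environment variable ``SLURM_CPUS_PER_TASK`` inside a job when ``--cpus-per-task`` is specified, so one option is to read the pool size from it. A small sketch, where `worker_count` is a hypothetical helper:

```python
import os


def worker_count(default=1):
    """Pool size taken from SLURM_CPUS_PER_TASK, falling back to `default`."""
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    try:
        count = int(value) if value is not None else default
    except ValueError:
        # Variable is set but not a number: fall back rather than crash.
        count = default
    return max(1, count)
```

You would then create the pool with `mp.Pool(processes=worker_count())`. Outside of a SLURM job the variable is unset, so the code falls back to a single worker.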

# Data Storage

On the ``scratch`` space, you have a lot more room to save files than in your home directory.

Mine is in ``/scratch/jhhalverson``. Substitute your username for mine.
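In a script, you can build that path from the current username rather than typing it out. A small sketch with hypothetical helper names; note that `shutil.disk_usage` reports free space for the whole filesystem, not any per-user quota.

```python
import getpass
import shutil
from pathlib import Path


def scratch_dir(user=None, root="/scratch"):
    """Path to a user's scratch directory, defaulting to the current user."""
    user = user or getpass.getuser()
    return Path(root) / user


def free_gib(path):
    """Free space in GiB on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    print(scratch_dir())
    print(f"free here: {free_gib('.'):.1f} GiB")
```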

# Cluster Etiquette

Be careful: lots of people are trying to use these **free** resources. Be a good citizen.

A few ways to do that are: 
- Don't use the login node for computing.
- **Request reasonable resources** based on the job. For example, don't request 100 GB of RAM if you only need 1 GB, and don't request 100 CPUs if you only need 1 CPU. But if you're going to use either all of the RAM or all of the CPUs, then use ``--exclusive``.

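Note: when code that you submitted as a job finishes, the job exits and releases its resources. If you instead ran the code from an interactive session on a node you logged into, that session keeps running, and keeps holding the node, after the code finishes. So if you're going to run something for a long time, submit it as a job.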