# Getting Started with pyUIT

This notebook shows how to get up and running with pyUIT. It covers initial configuration and some of the most common commands.

## Configuration

Before you can use pyUIT to interact with the HPC, you first need to register a client application with UIT+ (see the [UIT+ documentation](https://www.uitplus.hpc.mil/files/README.pdf)). Be sure to save the client ID and the client secret. Create a UIT configuration file at `~/.uit` in your home directory and copy the client ID and client secret into it in the following format:

```
client_id: <YOUR_CLIENT_ID_HERE>
client_secret: <YOUR_CLIENT_SECRET_HERE>
```
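Before moving on, it can help to confirm that the file is readable and contains both keys. The sketch below is not part of pyUIT: `read_uit_config` and `check_uit_config` are hypothetical helpers, and they assume the file holds only simple `key: value` lines in the format shown above.

```python
from pathlib import Path


def read_uit_config(text):
    """Parse simple 'key: value' lines into a dict (hypothetical helper, not pyUIT API)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        key, _, value = line.partition(':')
        config[key.strip()] = value.strip()
    return config


def check_uit_config(path=None):
    """Return the names of the required keys that are missing or empty in the file."""
    path = Path(path) if path is not None else Path.home() / '.uit'
    config = read_uit_config(path.read_text())
    return [key for key in ('client_id', 'client_secret') if not config.get(key)]


# Parse a sample string rather than the real file so the example runs anywhere.
sample = "client_id: abc123\nclient_secret: s3cr3t\n"
print(read_uit_config(sample))
```

Calling `check_uit_config()` with no arguments reads `~/.uit`; an empty list means both keys were found.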

Once you have registered a client and set up the configuration file, you can proceed with this notebook.

In [1]:
import uit

## Authenticating and Connecting

The first step in using pyUIT is to create a `uit.Client` and authenticate a user to the UIT+ server. Users must have a pIE account to access the HPC. If your pIE account was created recently (roughly 2018 or later), you must request that your account be synced to the UIT+ server.

Note: If you pass `notebook=True` to the authentication call, the output is an IPython IFrame that displays the OAuth authentication page for UIT+. If you omit this argument, the page opens in a new tab in your system browser.

In [2]:
c = uit.Client()
c.authenticate(notebook=True)

127.0.0.1 - - [21/May/2019 14:07:03] "GET /save_token?state=e79c8d9961905ff0030d411a6d21cb7c&code=ffc44ee52a6df7751403fba3e21f400e HTTP/1.1" 200 -


Access Token request succeeded.


Next, we need to connect to a specific HPC system. Currently `topaz` is the only available system. `onyx` will be added soon, and other DSRC systems can be added if there is a need.

In [3]:
c.connect('topaz')

Connected successfully to topaz05 on topaz


We are now connected to a login node and can make calls, upload or retrieve files, and submit jobs to the queue.

## Basic Usage

By default, the `call` method executes the command in the user's `$HOME` directory. You can optionally pass a `working_dir` argument to specify a different directory.

In [4]:
c.call('pwd', working_dir=c.WORKDIR)

'/p/work2/rditlsc9\n'

Notice that we passed in `c.WORKDIR` as the value for the `working_dir` argument. The `uit.Client` object exposes a few common environment variables as properties, which are returned as `PosixPath` objects. Other environment variables that can be accessed as properties include:

In [5]:
c.HOME

PosixPath('/p/home/rditlsc9')

In [6]:
c.CENTER

PosixPath('/gpfs/cwfs/rditlsc9')
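Because these properties are path objects rather than strings, remote paths can be built with the `/` operator and dropped straight into command strings. The sketch below uses `pathlib.PurePosixPath` from the standard library as a stand-in for `c.HOME` so that it runs without a connection; the path value is the one shown in the output above.

```python
from pathlib import PurePosixPath

# Stand-in for c.HOME; no HPC connection is needed for this illustration.
home = PurePosixPath('/p/home/rditlsc9')

# Join path components with the / operator.
remote_file = home / 'pyuit_test'
print(remote_file)

# Path objects format as plain strings inside f-strings,
# which is convenient when building a shell command.
command = f'cat {remote_file}'
print(command)
```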

You can access other environment variables directly through the `uit.Client.env` attribute:

In [9]:
c.env.MODULEPATH

'/p/home/apps/modulefiles/devel:/p/home/apps/modulefiles/apps:/p/home/apps/modulefiles/unsupported:/usr/share/modules:/usr/share/Modules/$MODULE_VERSION/modulefiles:/usr/share/modules/modulefiles'

The `call` method, by default, returns a raw string of the `stdout` and `stderr` output from the HPC.

In [10]:
c.call('ls -la')

'total 69036\ndrwx------   10 rditlsc9 0089JENQ     4096 May 20 15:26 .\ndrwxr-xr-x 4237 root     root       237568 May 21 10:19 ..\n-rw-------    1 rditlsc9 0089JENQ      377 May 21 13:28 .bash_history\n-rw-------    1 rditlsc9 0089JENQ       18 Oct  3  2016 .bash_logout\n-rw-------    1 rditlsc9 0089JENQ      224 Apr 30 16:25 .bash_profile\n-rw-------    1 rditlsc9 0089JENQ     1749 Apr 30 16:20 .bashrc\ndrwxrwsr-x    2 rditlsc9 0089JENQ     4096 Apr 30 16:15 .conda\ndrwxr-----    3 rditlsc9 0089JENQ     4096 Apr 30 16:21 .config\n-rw-------    1 rditlsc9 0089JENQ      169 Apr 30 16:18 .cshrc\n-rw-------    1 rditlsc9 0089JENQ     1637 Oct  3  2016 .emacs\ndrwxr-----    3 rditlsc9 0089JENQ     4096 Apr 29 17:13 helios\n-rw-r--r--    1 rditlsc9 0089JENQ      311 May 20 15:26 hello_world.pbs\n-rw-------    1 rditlsc9 0089JENQ      178 Oct  3  2016 .kshrc\ndrwxr-----   15 rditlsc9 0089JENQ     4096 Apr 23 15:19 miniconda\n-rw-r-----    1 rditlsc9 0089JENQ 70348401 Apr 19 10:47 Miniconda

To make it a little easier to visually parse the output it is recommended to `print` it:

In [11]:
print(c.call('ls -la'))

total 69036
drwx------   10 rditlsc9 0089JENQ     4096 May 20 15:26 .
drwxr-xr-x 4237 root     root       237568 May 21 10:19 ..
-rw-------    1 rditlsc9 0089JENQ      377 May 21 13:28 .bash_history
-rw-------    1 rditlsc9 0089JENQ       18 Oct  3  2016 .bash_logout
-rw-------    1 rditlsc9 0089JENQ      224 Apr 30 16:25 .bash_profile
-rw-------    1 rditlsc9 0089JENQ     1749 Apr 30 16:20 .bashrc
drwxrwsr-x    2 rditlsc9 0089JENQ     4096 Apr 30 16:15 .conda
drwxr-----    3 rditlsc9 0089JENQ     4096 Apr 30 16:21 .config
-rw-------    1 rditlsc9 0089JENQ      169 Apr 30 16:18 .cshrc
-rw-------    1 rditlsc9 0089JENQ     1637 Oct  3  2016 .emacs
drwxr-----    3 rditlsc9 0089JENQ     4096 Apr 29 17:13 helios
-rw-r--r--    1 rditlsc9 0089JENQ      311 May 20 15:26 hello_world.pbs
-rw-------    1 rditlsc9 0089JENQ      178 Oct  3  2016 .kshrc
drwxr-----   15 rditlsc9 0089JENQ     4096 Apr 23 15:19 miniconda
-rw-r-----    1 rditlsc9 0089JENQ 70348401 Apr 19 10:47 Miniconda3-latest-Linux-x
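If you need structured data from a raw string like this, you can split it yourself. The following is a minimal sketch under the assumption that every entry has the nine-column `ls -la` layout shown above; `parse_ls_long` is a hypothetical helper and is not how pyUIT parses output.

```python
def parse_ls_long(output):
    """Split `ls -la` text into a list of dicts (illustrative only)."""
    entries = []
    for line in output.splitlines():
        # Split on whitespace at most 8 times so a name containing spaces stays whole.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # skips the leading 'total N' line and blank lines
        perms, _links, owner, group, size = fields[:5]
        entries.append({
            'name': fields[8],
            'owner': owner,
            'group': group,
            'size': int(size),
            'type': 'dir' if perms.startswith('d') else 'file',
        })
    return entries


# Two lines copied from the listing above, plus the 'total' header.
raw = (
    'total 69036\n'
    'drwx------   10 rditlsc9 0089JENQ     4096 May 20 15:26 .\n'
    '-rw-r--r--    1 rditlsc9 0089JENQ      311 May 20 15:26 hello_world.pbs\n'
)
for entry in parse_ls_long(raw):
    print(entry)
```

Hand-rolled parsing like this is fragile (device files and unusual names will break it), which is one reason to prefer a method that returns parsed results where one is available.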

Alternatively, for a few common commands pyUIT provides special methods that parse the output into a Python data structure. By default the return value is a `list` or `dict`, but if you have the `pandas` module installed you can pass `as_df=True` to get the result as a `pandas.DataFrame`:

In [14]:
c.list_dir(c.HOME)
# If you have Pandas installed, the following call returns the listing as a DataFrame.
c.list_dir(c.HOME, as_df=True)

|   | group | lastmodified | name | owner | path | perms | size | type |
|---|-------|--------------|------|-------|------|-------|------|------|
| 0 | 0089JENQ | 2019-04-29T22:13:56Z | helios | rditlsc9 | /p/home/rditlsc9/helios | rwxr----- | 4096 | dir |
| 1 | 0089JENQ | 2019-04-23T20:19:29Z | miniconda | rditlsc9 | /p/home/rditlsc9/miniconda | rwxr----- | 4096 | dir |
| 2 | 0089JENQ | 2019-04-19T17:05:04Z | .oracle_jre_usage | rditlsc9 | /p/home/rditlsc9/.oracle_jre_usage | rwxr----- | 4096 | dir |
| 3 | 0089JENQ | 2019-04-30T21:21:39Z | .config | rditlsc9 | /p/home/rditlsc9/.config | rwxr----- | 4096 | dir |
| 4 | 0089JENQ | 2019-05-20T15:56:23Z | two | rditlsc9 | /p/home/rditlsc9/two | rwxr-xr-x | 4096 | dir |
| 5 | 0089JENQ | 2019-04-30T21:15:50Z | .conda | rditlsc9 | /p/home/rditlsc9/.conda | rwxrwxr-x | 4096 | dir |
| 6 | 0089JENQ | 2019-05-20T16:34:03Z | .uit-plus | rditlsc9 | /p/home/rditlsc9/.uit-plus | rwxr----- | 4096 | dir |
| 7 | 0089JENQ | 2019-01-25T16:08:08Z | .ssh | rditlsc9 | /p/home/rditlsc9/.ssh | rwx------ | 4096 | dir |
| 8 | 0089JENQ | 2019-05-21T18:28:40Z | .bash_history | rditlsc9 | /p/home/rditlsc9/.bash_history | rw------- | 377 | file |
| 9 | 0089JENQ | 2016-10-03T21:31:46Z | .bash_logout | rditlsc9 | /p/home/rditlsc9/.bash_logout | rw------- | 18 | file |


Other methods that have special parsing include `show_usage` and `status`. These methods are useful when submitting jobs to the queue.

## Uploading and Retrieving Files

You can copy files to and from an HPC system by using the `put_file` and `get_file` methods.

In [15]:
local_file = './data/hello_world.pbs'
remote_file = c.HOME/'pyuit_test'
c.put_file(local_path=local_file, remote_path=remote_file)

{'owner': 'rditlsc9',
 'path': '/p/home/rditlsc9/pyuit_test',
 'size': 311,
 'lastmodified': '2019-05-21T19:09:47Z',
 'name': 'pyuit_test',
 'perms': 'rw-r--r--',
 'type': 'file',
 'group': '0089JENQ'}

In [16]:
local_file = './data/pyuit_test.pbs'
c.get_file(remote_path=remote_file, local_path=local_file)

'./data/pyuit_test.pbs'

## Submitting Jobs to the Queue

The `show_usage` method can be used to access the subproject id, which is needed when submitting jobs to the HPC queuing system.

In [20]:
subproject = c.show_usage()[0]['subproject']
subproject

'ERDCV00898ENQ'

The `uit.Client.submit` method accepts a PBS script as one of the following types:
 * file path
 * string
 * `uit.PbsScript` object
 
So if you already have a PBS script file then you can use the `uit.Client` directly to submit it. In our case we have a template PBS script, but it is missing the job name and the subproject, so we will read it in as a string and then render it with a name and your subproject ID.

In [None]:
pbs_script_file = './data/hello_world.pbs'
job_name = 'hello_world_test'

with open(pbs_script_file) as pbs:
    pbs_script = pbs.read()
pbs_script = pbs_script.format(job_name=job_name, subproject=subproject)
print(pbs_script)
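The contents of `hello_world.pbs` are not shown in this notebook. As an illustration of the rendering step only, here is a sketch with a hypothetical template: the queue name, resource request, and `echo` line are assumptions, and the real file may differ. Only the `{job_name}` and `{subproject}` placeholders come from the text above.

```python
# Hypothetical template; the real ./data/hello_world.pbs may differ.
PBS_TEMPLATE = """#!/bin/bash
#PBS -N {job_name}
#PBS -A {subproject}
#PBS -q debug
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:05:00

# Literal shell braces are doubled so that str.format leaves them alone.
echo "Hello from ${{PBS_JOBNAME}}"
"""

rendered = PBS_TEMPLATE.format(
    job_name='hello_world_test',
    subproject='ERDCV00898ENQ',
)
print(rendered)
```

One thing to watch for with `str.format` templates: any literal `{` or `}` in the script, such as `${VAR}` in shell code, must be written as `{{` and `}}`, otherwise `format` raises a `KeyError` or `ValueError`.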

In [None]:
job_id = c.submit(pbs_script=pbs_script)
job_id

We can monitor the status of the job by calling `status` and passing it the job ID. Run this cell repeatedly until the job is finished (status = 'F').

In [None]:
c.status(job_id=job_id)
# If you have Pandas installed then you can uncomment the following line.
# c.status(job_id=job_id, as_df=True)
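Instead of re-running the cell by hand, the check can be wrapped in a polling loop. The sketch below is generic rather than pyUIT-specific: `wait_until_finished` is a hypothetical helper that accepts any zero-argument callable returning the job's status letter. How to extract that letter from the return value of `c.status` is not shown in this notebook, so the example uses a simulated sequence of states.

```python
import time


def wait_until_finished(get_status, interval=30, timeout=3600, sleep=time.sleep):
    """Poll get_status() until it returns 'F' or roughly `timeout` seconds pass.

    get_status: zero-argument callable returning the job's status letter.
    sleep: injectable so the loop can be exercised without real delays.
    """
    waited = 0
    while True:
        status = get_status()
        if status == 'F':
            return status
        if waited >= timeout:
            raise TimeoutError(f'job still in state {status!r} after {waited} s')
        sleep(interval)
        waited += interval


# Simulated states standing in for repeated status checks on a real job.
fake_states = iter(['Q', 'R', 'R', 'F'])
print(wait_until_finished(lambda: next(fake_states), interval=0))
```

A generous `interval` keeps the number of requests to the login node low; the default of 30 seconds here is an arbitrary choice.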

This job will have written its stdout and stderr to files in the work directory, with names based on the job name and the job ID. We can list these files to confirm that the job has run:

In [None]:
job_number = job_id.split('.')[0]
print(c.list_dir(c.WORKDIR/f'{job_name}.*{job_number}', parse=False))
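To make the naming pattern concrete, the sketch below builds the expected file paths from a job ID without contacting the HPC. The job ID is made up for illustration, `PurePosixPath` stands in for `c.WORKDIR`, and the `.e` suffix for the stderr file is assumed from the usual PBS convention rather than taken from this notebook.

```python
from pathlib import PurePosixPath

job_id = '1234567.pbs01'   # hypothetical job ID, for illustration only
job_name = 'hello_world_test'
workdir = PurePosixPath('/p/work2/rditlsc9')   # stand-in for c.WORKDIR

# The numeric part before the first '.' identifies the job in the file names.
job_number = job_id.split('.')[0]

stdout_path = workdir / f'{job_name}.o{job_number}'
stderr_path = workdir / f'{job_name}.e{job_number}'   # assumed PBS default naming
print(stdout_path)
print(stderr_path)
```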

We can `cat` the contents of the stdout file to see what output the job created.

In [None]:
job_stdout = c.WORKDIR/f'{job_name}.o{job_number}'
print(c.call(f'cat {job_stdout}'))

Alternatively we can copy these files locally to continue to work with them.

In [None]:
stdout_file = c.get_file(job_stdout)
stdout_file

In [None]:
with stdout_file.open() as out:
    print(out.read())