# 03. Deploying Jupyter

## Overview

In this notebook, you will learn how to:

 - Configure remote Jupyter deployment.
 - Deploy Jupyter on a compute node.
 - Access the deployed Jupyter Notebook.

## Import idact

We recommend installing *idact* with *pip*. Alternatively, install the dependencies with `pip install -r requirements.txt` and add *idact* to `sys.path`, for example:

In [1]:
import sys
sys.path.append('../')
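
As a rough sketch of this fallback, the check below extends `sys.path` only when the package cannot already be found. The helper name `ensure_importable` is an illustrative assumption and not part of *idact*; it relies only on the standard library.

```python
import importlib.util
import sys


def ensure_importable(package, fallback_dir):
    """Return True if `package` can be imported, adding `fallback_dir`
    to sys.path first when the package is not found otherwise."""
    if importlib.util.find_spec(package) is None:
        # Not installed (e.g. with pip): fall back to a local checkout.
        if fallback_dir not in sys.path:
            sys.path.append(fallback_dir)
    return importlib.util.find_spec(package) is not None


# '../' mirrors the cell above; adjust it to where the idact sources live.
print(ensure_importable('idact', '../'))
```

If this prints `False`, neither the installed packages nor the fallback directory contain *idact*.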

We will use a wildcard import for convenience:

In [2]:
from idact import *
import bitmath

## Load the cluster

Let's load the environment and the cluster. Make sure to use your cluster name.

In [3]:
load_environment()
cluster = show_cluster("hpc")
cluster

Cluster(pro.cyfronet.pl, 22, plggarstka, auth=AuthMethod.PUBLIC_KEY, key='C:\\Users\\Maciej/.ssh\\id_rsa_6p', install_key=False, disable_sshd=False)

In [4]:
access_node = cluster.get_access_node()
access_node.connect()

## Configure remote Jupyter deployment

### Install Jupyter on the cluster

Make sure Jupyter is installed in the Python 3.5+ distribution you intend to use on the cluster. JupyterLab is recommended.
For installation instructions, see [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) or [Jupyter Notebook](https://jupyter.readthedocs.io/en/latest/install.html).

If deployment fails, incompatible library versions may be the cause. You can try installing the frozen versions included with the *idact* repo in `envs/dask_jupyter_tornado.txt`:

```
pip install -r dask_jupyter_tornado.txt
```

You may need to add `--user` if you are using a system Python distribution.

### Specify setup actions

The default Python distribution on the cluster is rarely the one you want to use for computation.

Depending on your setup, you probably need to modify the `PATH` and `PYTHONPATH` environment variables, `source activate` a Conda environment, or perform other specific steps.

In order for *idact* to find and execute the proper binaries, you'll need to specify these steps as a list of Bash script lines. Make sure to modify the list below to fit your needs.

In [5]:
cluster.config.setup_actions.jupyter = ['module load plgrid/tools/python-intel/3.6.2']
save_environment()
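
Since the setup actions are plain Bash lines, the list can be assembled programmatically. The sketch below is only an illustration: `build_setup_actions` is a hypothetical helper, not an *idact* function, and `my-env` is a placeholder environment name.

```python
def build_setup_actions(module=None, extra_path=None, conda_env=None):
    """Return Bash lines in the order they should run."""
    lines = []
    if module is not None:
        lines.append('module load {}'.format(module))
    if extra_path is not None:
        # Prepend so that binaries in this directory take precedence.
        lines.append('export PATH="{}:$PATH"'.format(extra_path))
    if conda_env is not None:
        lines.append('source activate {}'.format(conda_env))
    return lines


# The module name is the one used in this notebook; 'my-env' is a placeholder.
actions = build_setup_actions(module='plgrid/tools/python-intel/3.6.2')
conda_actions = build_setup_actions(conda_env='my-env')
print(actions)
print(conda_actions)

# With a loaded cluster, the result would be assigned as in the cell above:
# cluster.config.setup_actions.jupyter = actions
# save_environment()
```

Keeping the lines in execution order matters, because a later line may depend on an earlier one, for example activating an environment that only exists after a module is loaded.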

### Choose JupyterLab or Jupyter Notebook

By default, JupyterLab is used. If you want to use regular Jupyter Notebook, set the config entry below to `False`.

In [6]:
cluster.config.use_jupyter_lab = True
save_environment()

## Allocate node for Jupyter

We will deploy Jupyter on a single node. As in the previous notebook, make sure to adjust the `--account` parameter.

In [7]:
nodes = cluster.allocate_nodes(nodes=1,
                               cores=2,
                               memory_per_node=bitmath.GiB(10),
                               walltime=Walltime(minutes=10),
                               native_args={
                                   '--account': 'intdata'
                               })
nodes

2018-11-24 00:45:17 INFO: Creating the ssh directory.


Nodes([Node(NotAllocated)], SlurmAllocation(job_id=14334261))

In [8]:
nodes.wait()
nodes

2018-11-24 00:45:26 INFO: Still pending or configuring...


Nodes([Node(p0260:51704, 2018-11-23 23:55:25.372155+00:00)], SlurmAllocation(job_id=14334261))

Let's test the connection, just in case:

In [9]:
nodes[0].run('hostname')

'p0260'

## Deploy Jupyter

After the initial setup, Jupyter can be deployed with a single command:

In [10]:
nb = nodes[0].deploy_notebook()
nb

JupyterDeployment(8080 -> Node(p0260:51704, 2018-11-23 23:55:25.372155+00:00)

If the deployment succeeded, you can open the deployed notebook in the browser:

In [11]:
nb.open_in_browser()

Confirm that there are no issues with the deployed Jupyter Notebook instance. Try to start a kernel and see if it looks stable. Make sure the version of Python you expected is used.
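
One quick way to check the interpreter is to run a cell like the following in the deployed notebook. This is a sketch using only the standard library; the helper name `python_at_least` is illustrative, and the 3.5 threshold is an assumption to adjust to the version you expect.

```python
import sys


def python_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info >= (major, minor)


# The path shows which distribution the kernel actually runs on.
print(sys.executable)
print(sys.version)
print(python_at_least(3, 5))
```

If the printed path points to an unexpected distribution, the setup actions configured earlier are the first place to look.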

If the Jupyter deployment failed, you will find the `jupyter` command log in the debug log file, `idact.log`.

If the last failure is a timeout, e.g. `2018-11-12 22:14:00 INFO: Retried and failed: config.retries(...)`, and you believe the timeout might be too restrictive for your cluster, see the tutorial `07. Adjusting timeouts`.
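
To narrow down the relevant entries, the log can be filtered with a few lines of standard-library Python. This is a minimal sketch: `find_log_lines` is a hypothetical helper, the keywords are guesses based on the messages mentioned above, and the log is assumed to be in the current working directory.

```python
import os


def find_log_lines(log_path, keywords=('jupyter', 'Retried and failed')):
    """Return lines of the log that contain any keyword (case-insensitive)."""
    if not os.path.isfile(log_path):
        return []
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    with open(log_path, encoding='utf-8', errors='replace') as log_file:
        for line in log_file:
            if any(keyword in line.lower() for keyword in lowered):
                matches.append(line.rstrip('\n'))
    return matches


# Show at most the last 20 matching lines; adjust the path if needed.
for line in find_log_lines('idact.log')[-20:]:
    print(line)
```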

After you're done, you can cancel the deployment by calling `cancel`, though it will be killed anyway when the node allocation ends.

In [12]:
nb.cancel()

2018-11-24 00:46:02 INFO: Cancelling Jupyter deployment.


Alternatively, the following will just close the tunnel, without attempting to kill Jupyter:

In [13]:
nb.cancel_local()

## Cancel the allocation

It's important to cancel an allocation if you're done with it early, in order to minimize the CPU time you are charged for.

In [14]:
nodes.running()

True

In [15]:
nodes.cancel()

2018-11-24 00:46:12 INFO: Cancelling job 14334261.


In [16]:
nodes.running()

False
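
To avoid forgetting the cancellation when an experiment fails midway, one option is to wrap the work in a context manager that always calls `cancel()`. The sketch below is an assumption about how you might structure this yourself, not an *idact* feature; `cancelled_on_exit` and `FakeAllocation` are hypothetical names, and the stand-in class exists only so the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def cancelled_on_exit(resource):
    """Yield `resource` and call its cancel() when the block exits,
    whether it finished normally or raised."""
    try:
        yield resource
    finally:
        resource.cancel()


class FakeAllocation:
    """Stand-in used only to demonstrate the pattern without a cluster."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


allocation = FakeAllocation()
with cancelled_on_exit(allocation):
    pass  # Work with the nodes here.
print(allocation.cancelled)
```

With a real allocation, the `nodes` object from the cells above would take the place of `FakeAllocation()`.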

## Next notebook

In the next notebook, we will deploy a Dask.distributed scheduler and workers on several compute nodes, and browse their dashboards.