# 03. Deploying Jupyter

## Overview

In this notebook, you will learn how to:

 - Configure remote Jupyter deployment.
 - Deploy Jupyter on a compute node.
 - Access the deployed Jupyter Notebook.

## Import idact

It's recommended that *idact* is installed with *pip*.  
Alternatively, make sure the dependencies are installed with `pip install -r requirements.txt`, and add *idact* to the Python path, for example:  
`import sys`  
`sys.path.append('<YOUR_IDACT_PATH>')`
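
If you are not sure which of the two approaches applies to your machine, a small check can tell you whether *idact* is already importable before you touch `sys.path`. The sketch below is only an illustration: the helper name `ensure_importable` is hypothetical and not part of *idact*, and `<YOUR_IDACT_PATH>` is the same placeholder as above.

```python
import importlib.util
import sys


def ensure_importable(module_name, fallback_path=None):
    """Return True if `module_name` can be found, optionally after
    appending `fallback_path` to sys.path."""
    if importlib.util.find_spec(module_name) is not None:
        return True
    if fallback_path is not None and fallback_path not in sys.path:
        # Only extend sys.path when the module was not found otherwise.
        sys.path.append(fallback_path)
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# Replace the placeholder with the directory that contains the idact package.
print(ensure_importable('idact', fallback_path='<YOUR_IDACT_PATH>'))
```

If this prints `False`, the path is probably wrong or the package is not installed.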

We will use a wildcard import for convenience:

In [None]:
from idact import *
import bitmath

## Load the cluster

Let's load the environment and the cluster. Make sure to use your cluster name.

In [None]:
load_environment()
cluster = show_cluster("test")
cluster

In [None]:
access_node = cluster.get_access_node()
access_node.connect()

## Configure remote Jupyter deployment

### Install Jupyter on the cluster

Make sure Jupyter is installed in the Python 3.5+ distribution you intend to use on the cluster. JupyterLab is the recommended variant.
See the installation guides for [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) and [Jupyter Notebook](https://jupyter.readthedocs.io/en/latest/install.html).

If you encounter problems with deployment, incompatible library versions may be the cause. You can try installing the frozen versions included in the *idact* repo in `envs/dask_jupyter_tornado.txt`. The command below assumes the file is in the current directory:

```
pip install -r dask_jupyter_tornado.txt
```

You may need to add `--user` if you are using a system Python distribution.

### Specify setup actions

It's rare that the default Python distribution is the one you want to use for computation.

Depending on your setup, you probably need to modify the `PATH` and `PYTHONPATH` environment variables, `source activate` a Conda environment, or perform other specific steps.

In order for *idact* to find and execute the proper binaries, you'll need to specify these steps as a list of Bash script lines. Make sure to modify the list below to fit your needs.

In [None]:
cluster.config.setup_actions.jupyter = ['module load plgrid/tools/python-intel/3.6.2']
save_environment()
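
The `module load` line above is specific to one cluster. If your Python comes from a Conda environment instead, the setup actions might look like the output of the sketch below. This is a minimal illustration, not a recommended configuration: the helper `conda_setup_actions`, the Conda directory, and the environment name are all hypothetical and need to be replaced with values valid on your cluster.

```python
def conda_setup_actions(conda_bin_dir, env_name, extra_pythonpath=None):
    """Return Bash lines that put Conda on PATH and activate an environment."""
    actions = [
        'export PATH="{}:$PATH"'.format(conda_bin_dir),
        'source activate {}'.format(env_name),
    ]
    if extra_pythonpath:
        # Optional: make extra modules visible to the interpreter.
        actions.append(
            'export PYTHONPATH="{}:$PYTHONPATH"'.format(extra_pythonpath))
    return actions


# Hypothetical values, used here only for illustration.
actions = conda_setup_actions('/path/to/miniconda3/bin', 'my-env')
for line in actions:
    print(line)

# The resulting list could then be assigned as in the cell above:
# cluster.config.setup_actions.jupyter = actions
# save_environment()
```

Building the list in one place makes it easier to review the exact Bash lines before saving them to the environment.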

### Choose JupyterLab or Jupyter Notebook

JupyterLab is used by default. If you want to use the regular Jupyter Notebook instead, set the config entry below:

In [None]:
cluster.config.use_jupyter_lab = False
save_environment()

## Allocate node for Jupyter

We will deploy Jupyter on a single node. Make sure to adjust the `--account` parameter, as you did in the previous notebook.

In [None]:
nodes = cluster.allocate_nodes(nodes=1,
                               cores=2,
                               memory_per_node=bitmath.GiB(10),
                               walltime=Walltime(minutes=10),
                               native_args={
                                   '--account': 'intdata'
                               })
nodes

In [None]:
nodes.wait()
nodes

Let's test the connection, just in case:

In [None]:
nodes[0].run('hostname')

## Deploy Jupyter

After the initial setup, Jupyter can be deployed with a single command:

In [None]:
nb = nodes[0].deploy_notebook()
nb

If the deployment succeeded, you can open the deployed notebook in the browser:

In [None]:
nb.open_in_browser()

Confirm that there are no issues with the deployed Jupyter Notebook instance. Try starting a kernel and check that it stays up. Make sure it runs the Python version you expected.
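
One simple way to check the interpreter is to run a cell like the following in the deployed notebook (not in this one). It uses only the standard library; the helper name `describe_interpreter` is just for illustration.

```python
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this kernel."""
    return {
        'executable': sys.executable,
        'version': '{}.{}.{}'.format(*sys.version_info[:3]),
    }


info = describe_interpreter()
print(info['executable'])
print(info['version'])
```

If the executable path does not point to the distribution selected by your setup actions, they were probably not applied as intended.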

If the Jupyter deployment failed for some reason, you will find the `jupyter` command log in the debug log file: `idact.log`.

If your last failure is a timeout, e.g. `2018-11-12 22:14:00 INFO: Retried and failed: config.retries(...)`, check out the tutorial `07. Adjusting timeouts` if you believe the timeout might be too restrictive for your cluster.
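
The log file can get long, so it may help to filter it. The sketch below returns the most recent lines that mention any of the given keywords. It assumes that `idact.log` is a plain text file in the current working directory; the helper `last_matching_lines` is hypothetical and not part of *idact*.

```python
import os
from collections import deque


def last_matching_lines(log_path, keywords, limit=20):
    """Return up to `limit` most recent lines containing any keyword
    (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    matches = deque(maxlen=limit)  # Keeps only the newest `limit` matches.
    with open(log_path, encoding='utf-8', errors='replace') as log_file:
        for line in log_file:
            if any(keyword in line.lower() for keyword in lowered):
                matches.append(line.rstrip('\n'))
    return list(matches)


# Assumed location of the log file; adjust the path if yours differs.
if os.path.exists('idact.log'):
    for line in last_matching_lines('idact.log',
                                    ['jupyter', 'Retried and failed']):
        print(line)
```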

After you're done, you can cancel the deployment by calling `cancel`, though it will be killed anyway when the node allocation ends.

In [None]:
nb.cancel()

Alternatively, the following will just close the tunnel, without attempting to kill Jupyter:

In [None]:
nb.cancel_local()

## Cancel the allocation

It's important to cancel an allocation if you're done with it early, in order to minimize the CPU time you are charged for.

In [None]:
nodes.running()

In [None]:
nodes.cancel()

In [None]:
nodes.running()
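
In a script, as opposed to an interactive notebook, it is easy to skip the cancellation when an earlier step raises an exception. One possible safeguard is a small context manager that always calls `cancel()` on exit. The sketch below assumes only that the wrapped object has a `cancel()` method, as `nodes` and `nb` do in this notebook. `cancel_on_exit` and `FakeAllocation` are hypothetical names used for the demonstration and are not part of *idact*.

```python
from contextlib import contextmanager


@contextmanager
def cancel_on_exit(resource):
    """Yield `resource` and call its cancel() method when the block exits,
    whether it exits normally or with an exception."""
    try:
        yield resource
    finally:
        resource.cancel()


class FakeAllocation:
    """Stand-in for an allocation, used only in this demonstration."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


allocation = FakeAllocation()
try:
    with cancel_on_exit(allocation):
        raise RuntimeError('simulated failure during work')
except RuntimeError:
    pass

print(allocation.cancelled)  # prints True
```

With a real allocation, the body of the `with` block would contain the deployment and any work that depends on it.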

## Next notebook

In the next notebook, we will deploy a Dask.distributed scheduler and workers on several compute nodes, and browse their dashboards.