Most people don't have a GPU suited for Deep Learning in their working machine, and in fact you don't need one. It's quite easy to set up a remote GPU server nowadays, and in this blog post I will explain how to do so with [Paperspace Gradient](https://www.paperspace.com/gradient).

I started using Paperspace because of a recommendation from Jeremy Howard in his [Live Coding Videos](https://www.youtube.com/playlist?list=PLfYUBJiXbdtSLBPJ1GMx-sQWf6iNhb8mM). If you haven't seen these lectures, I can highly recommend them. They are a great resource on many things related to getting started with Deep Learning. In particular, he shows a lot of productivity hacks and practical tips for getting a good setup.

However, the Paperspace setup explanations are a bit outdated, which can lead to confusion when following along with the videos. This blog post will hopefully help others navigate this and quickly set up a remote GPU server. I would advise anybody who wants to try Paperspace to first watch Jeremy's videos to get a general idea of how it works, and then follow these steps to get set up quickly.

Once you have signed up to Paperspace, go to their Gradient service and create a new project. Paperspace has a free tier, as well as a pro plan ($8/month) and a growth plan ($39/month). I personally signed up for the pro plan, which offers very good value for money. You get 15 GB of persistent storage and free Mid instance types. If available, I use the A4000, which is the fastest and comes with 16 GB of GPU memory.

:::{.callout-note}
Paperspace has both free and paid servers. The free ones come with a 6-hour usage limit, after which they are automatically shut down. The paid servers you can use for as long as you like. Sometimes the free servers are out of capacity, which is a bit annoying. In my experience, however, most of the time I'm able to get what I need.
:::

With the pro plan you can create up to 3 servers, or "Notebooks" as Paperspace calls them (throughout this blog post I'll refer to them as Notebook Servers). So let's create our first Notebook Server:

- Select the "Fast.AI" runtime
- Select a machine, for example the Free-A4000 if you have the pro plan. You can always change this afterwards.
- Remove the Workspace URL under the advanced options to create a totally empty server.

The best user experience is through the JupyterLab interface:

![Click the JupyterLab icon to open up a JupyterLab environment for your GPU server](screenshot.png){width=300}


## Persisted Storage at Paperspace

In general, things are not persisted on Paperspace. That means that anything we store during a session will be gone when we restart our Notebook Server. However, Paperspace comes with two special folders that are persisted. It's important to understand how these folders work, since we obviously need to persist our work. Besides that, we would also like to persist configuration files for GitHub, Kaggle, HuggingFace or any other service we are interacting with.

The persisted folders are called `/storage` and `/notebooks`. Anything in our `/storage` is shared among all the Notebook Servers we are running, whereas anything that is stored in the `/notebooks` folder is only persisted on that specific Notebook Server.
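
A quick way to convince yourself of this is to drop a marker file in a few folders, restart the Notebook Server, and see which markers survived. The sketch below is only an illustration: the helper names and the marker file name are made up here, and only `/storage` and `/notebooks` come from Paperspace.

```bash
#!/usr/bin/env bash
# Sketch: drop a marker file, restart the Notebook Server, then look for it again.
# MARKER, drop_marker and check_marker are hypothetical names for this example.
MARKER=".persist-marker"

drop_marker() {
    # Write the current date into a marker file, if the folder is writable.
    if [ -d "$1" ] && [ -w "$1" ]; then
        date > "$1/$MARKER"
    fi
}

check_marker() {
    # Report whether the marker file is present in the folder.
    if [ -f "$1/$MARKER" ]; then
        echo "$1: marker found"
    else
        echo "$1: no marker"
    fi
}

# Before the restart:  for d in /storage /notebooks /tmp; do drop_marker "$d"; done
# After the restart:   for d in /storage /notebooks /tmp; do check_marker "$d"; done
```

If the folders behave as described, the markers in `/storage` and `/notebooks` should still be there after the restart, while the one in `/tmp` should be gone.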

## Set up

In the first few videos, Jeremy shows a lot of tricks on how to install new packages and set up Git. After the recording of these videos, he made a [GitHub repo](https://github.com/fastai/paperspace-setup) which facilitates this setup greatly and makes most of the steps from the videos unnecessary. So let's use that:

```{.bash filename='Terminal' .code-overflow-wrap}
> git clone https://github.com/fastai/paperspace-setup.git
> cd paperspace-setup
> ./setup.sh
```

To understand what this does, let's have a look at `setup.sh`:

```{.bash filename='setup.sh' .code-overflow-wrap}
#!/usr/bin/env bash

mkdir /storage/cfg
cp pre-run.sh /storage/
cp .bash.local /storage/
echo install complete. please start a new instance
```

First it's creating a new directory inside of our `/storage` folder called `cfg`. As we will see, this is where we will store all our configuration files and folders.

Next, the script copies two files to our storage folder. Let's have a closer look at them.

#### **pre-run.sh**

Paperspace automatically executes `/storage/pre-run.sh` during startup of our Notebook Server (upon creation or restart). This is great, because we can use this to automate our setup.

Have a look [here](https://github.com/fastai/paperspace-setup/blob/master/pre-run.sh) for the full script. Let's have a closer look at this snippet:

```{.bash filename='pre-run.sh (snippet)' .code-overflow-wrap}
for p in .local .ssh .config .ipython .fastai .jupyter .conda .kaggle
do
        if [ ! -e /storage/cfg/$p ]; then
                mkdir /storage/cfg/$p
        fi
        rm -rf ~/$p
        ln -s /storage/cfg/$p ~/
done
```

So for each of these folder names (`.local .ssh ...`) we create a directory inside `/storage/cfg` if it doesn't exist yet. Then any existing folder with that name in the home directory (`~`) is removed and replaced by a symlink to the folder in `/storage/cfg`.

This means that:

1) When we store something in any of these symlinked folders (e.g. `~/.local`), it's actually being written to the associated storage folder (e.g. `/storage/cfg/.local`).
2) Whenever we restart our Notebook Server, everything that has previously been persisted (e.g. in `/storage/cfg/.local`) is available again in the home directory (e.g. `~/.local`).

And as it turns out, many tools we want to use keep their configuration files in this home folder. So by persisting this data, we will ensure they work across restarts of our Notebook servers.
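
To see the mechanism in isolation, the following sketch replays the loop body against two throwaway folders instead of the real `/storage/cfg` and `~`. The names `fake_storage` and `fake_home` are placeholders for this demonstration only.

```bash
#!/usr/bin/env bash
# Sketch: replay the symlink trick with temporary stand-ins for /storage/cfg and ~.
fake_storage=$(mktemp -d)   # plays the role of /storage/cfg
fake_home=$(mktemp -d)      # plays the role of ~
p=.local

# Same steps as the loop body in pre-run.sh.
if [ ! -e "$fake_storage/$p" ]; then
    mkdir "$fake_storage/$p"
fi
rm -rf "$fake_home/$p"
ln -s "$fake_storage/$p" "$fake_home/"

# Writing through the link in "home" ...
echo "hello" > "$fake_home/$p/note.txt"

# ... lands in "storage", which is the part that survives a restart.
cat "$fake_storage/$p/note.txt"

# The entry in "home" is only a pointer to the storage folder.
readlink "$fake_home/$p"
```

The `cat` command prints `hello` even though the file was written via the home folder, and `readlink` shows the path inside the storage folder.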

##### **.local**

We saw before that the FastAI runtime comes with a number of installed Python packages. If we want to install additional packages, we could do: `pip install <package>`. However, pip then installs the packages in `/usr/local/lib`, so they are not persisted. Since it's very annoying to reinstall all our additional packages upon every restart, we can instead install with `pip install --user <package>`. This installs the package only for the current user, in the `~/.local` directory, and as we have seen this folder is persisted!
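
To check where user installs end up on your own server, you can ask Python for its user base directory. This is a small sketch that assumes `python3` is available on the `PATH`; the variable names are chosen just for this example.

```bash
#!/usr/bin/env bash
# Sketch: show where `pip install --user` puts packages and scripts.
if command -v python3 >/dev/null 2>&1; then
    # `python3 -m site` exits non-zero when the user site is disabled
    # (for example inside a virtualenv), hence the `|| true`.
    user_base=$(python3 -m site --user-base) || true   # ~/.local by default on Linux
    user_site=$(python3 -m site --user-site) || true   # the site-packages folder below it
    echo "user base: $user_base"
    echo "user site: $user_site"
else
    echo "python3 not found on PATH"
fi
```

On a default Linux setup the user base is `~/.local`, which is exactly the folder that the setup script symlinks into `/storage/cfg`.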

##### **.ssh**

To authenticate with GitHub without using passwords, we use SSH keys. We can create them by running `ssh-keygen`, which adds the private key (`id_rsa`) and the public key (`id_rsa.pub`) to the `~/.ssh` folder. Once we upload the public key to GitHub, we stay authenticated with GitHub across restarts, since this folder is also persisted.
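
As a rough sketch, key generation can look like this. It writes into a throwaway folder so that it cannot overwrite an existing key; on the Notebook Server you would point `-f` at `~/.ssh/id_rsa` instead. The RSA key type is chosen here only to match the file names mentioned above, and the comment string passed to `-C` is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: generate an RSA key pair without a passphrase in a temporary folder.
key_dir=$(mktemp -d)   # stand-in for ~/.ssh, so existing keys stay untouched

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N "" sets an empty passphrase, -q suppresses the progress output.
    ssh-keygen -q -t rsa -b 4096 -N "" -C "paperspace-notebook" -f "$key_dir/id_rsa"
    # Print the public key; this is the part that gets uploaded to GitHub.
    cat "$key_dir/id_rsa.pub"
else
    echo "ssh-keygen not found on PATH"
fi
```

After registering the public key with GitHub, `ssh -T git@github.com` is a convenient way to test the connection.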

By now you probably get the idea. Each of these folders holds some configuration or data we want to persist:

- `.conda`: contains conda/mamba installed packages
- `.kaggle`: contains a `kaggle.json` authentication file
- `.fastai`: contains downloaded datasets
- `.config`, `.ipython` and `.jupyter`: contain config files

I also added `.huggingface` to this list, to make sure my HuggingFace credentials are also persisted. See [here](https://github.com/fastai/paperspace-setup/pull/4) for the PR into the main repo.

In the second part of the script we do exactly the same thing, but for a number of files instead of directories:

```{.bash filename='pre-run.sh (snippet)' .code-overflow-wrap}
for p in .git-credentials .gitconfig .bash_history
do
        if [ ! -e /storage/cfg/$p ]; then
                touch /storage/cfg/$p
        fi
        rm -rf ~/$p
        ln -s /storage/cfg/$p ~/
done
```
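
After a restart it can be reassuring to verify that the links are really in place. The helper below is a hypothetical check and not part of the repo: it walks over a list of names and reports whether each one in the home folder is a symlink into the config folder. Both folders are passed as arguments, so the function can also be tried out on test folders.

```bash
#!/usr/bin/env bash
# Sketch: report which entries in a home folder are symlinks into a cfg folder.
# check_links is a hypothetical helper; usage: check_links HOME_DIR CFG_DIR NAME...
check_links() {
    local home_dir="$1" cfg_dir="$2" p
    shift 2
    for p in "$@"; do
        if [ -L "$home_dir/$p" ] && [ "$(readlink "$home_dir/$p")" = "$cfg_dir/$p" ]; then
            echo "ok       $p"
        else
            echo "missing  $p"
        fi
    done
}

# On the Notebook Server, for example:
# check_links ~ /storage/cfg .local .ssh .gitconfig .git-credentials .bash_history
```

Any name reported as `missing` is either absent from the home folder or a regular file or folder that would be lost on the next restart.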

Now let's have a look at the second file we store in our `/storage` folder:

#### **.bash.local**

```{.bash filename='.bash.local' .code-overflow-wrap}
#!/usr/bin/env bash

alias mambai='mamba install -p ~/.conda '
alias pipi='pip install --user '

export PATH=~/.local/bin:~/.conda/bin/:$PATH
```

Paperspace runs this script whenever we open a terminal. As you can see, it defines two aliases to easily install things persistently with either mamba (`mambai`) or pip (`pipi`).

Any **binaries** installed this way end up in `~/.local/bin` (through `pip`) or in `~/.conda/bin/` (through `mamba`). We need to add these paths to the `PATH` variable to make sure we can call them from the command line.
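
If a freshly installed command is not found, it is worth checking whether both folders really are on the `PATH`. The `on_path` function below is a hypothetical helper for that check, not something Paperspace or the repo provides.

```bash
#!/usr/bin/env bash
# Sketch: check whether a folder is listed in PATH (a trailing slash is ignored).
# on_path is a hypothetical helper name.
on_path() {
    local dir="${1%/}"
    case ":$PATH:" in
        *":$dir:"* | *":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

for d in "$HOME/.local/bin" "$HOME/.conda/bin"; do
    if on_path "$d"; then
        echo "on PATH:     $d"
    else
        echo "not on PATH: $d"
    fi
done
```

Once both folders are listed, `command -v <tool>` shows which copy of a tool the shell will actually run.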

### Note on Mamba
At this point you might wonder why we have the Mamba installation at all, since we have seen that the system Python is used. In fact, our Mamba environment is totally decoupled from what we are using in our Jupyter notebook, and installing packages through `mamba` will not make them available to our Jupyter Notebook environment. Instead, we should install Python packages with `pip` (or `pipi`).

I guess Jeremy has done this to be able to install non-Python-specific packages that he wants to use from the terminal. For example, in the videos he talks about `ctags`, which he installs through `mamba`. So it functions as a general package manager, somewhat similar to `apt-get`.