<img src="img/dsci513_header2.png" width="600">

# Instructions for connecting to remote and local databases

### Connecting to the database

As discussed in lecture 1, Postgres is based on the client-server model. This means that regardless of whether the database lives on your own computer or on a remote one, you connect to it as a client, and the server provides the required services. For DSCI 513, we have set up a Postgres host on the network that belongs to the Department of Computer Science at UBC. Whenever I ask you to connect to the remote server, this is where you can find the databases for our course. The server stores course databases (whose names end in `_dsci513`) as well as a personal database for each of us (instructor, TAs and students), named after our CWLs. We will regularly use the course databases in the lectures and the labs, but not the personal databases.

For databases exposed to the public internet, you only need a **host address**, **port**, **username** and **password** to connect. However, since the CS servers at UBC are on an internal network for security reasons, the connection procedure involves one extra step: we need to forward the server's port to our local computer through a secure tunnel, which is done using an SSH (secure shell) connection. You will commonly encounter this setup when accessing private databases in many organizations, so it is worth learning how to do it.

#### Your CS account

First of all, you need to activate your **student** account on CS servers through this [link](https://www.cs.ubc.ca/getacct/). Without doing this first, you won't be able to do any of the following steps.

#### Your login info

You are provided with two sets of login information:

1. One set for logging into the UBC CS network, and
2. Another set for logging into the Postgres server that resides on the UBC CS network.

1) For logging into the **UBC CS network**, you should use:
- Your CWL id as username
- Your CWL password as password

2) For logging into the **Postgres server** once you are on the UBC CS network, you should use:
- Your CWL id as username
- Your password is set to `a<student_number>` by default. For example, if your student number is `01000111`, your password will be `a01000111`.

#### Logging into UBC CS network

##### Mac and Linux

In order to do the port forwarding to access our Postgres server on the UBC CS network, run the following command in a terminal:

> ```shell
> ssh -l <CWL> -L localhost:5433:pgserver.students.cs.ubc.ca:5432 remote.students.cs.ubc.ca
> ```

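This opens an SSH connection to `remote.students.cs.ubc.ca` and prompts you for your CWL password. While the session stays open, port `5433` on your computer is forwarded to port `5432` on `pgserver.students.cs.ubc.ca`.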

##### Windows

You need to download and install [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html) to be able to establish an SSH connection.

Once downloaded and installed, launch PuTTY. Your screen will look something like this:

![](./img/putty_main_page.png)

Now create a connection by following these steps:

- Type `remote.students.cs.ubc.ca` in the field labelled "Host Name (or IP address)".
- Type a name of your choice for this connection in the "Saved Sessions" field. Click the "Save" button, and select the new session. Your screen should look like this:

![](./img/putty_added_server.png)

Now set the tunnelling configuration:

- Select "Session" from the left pane,
- Click "Load",
- Expand `Connection -> SSH -> Tunnels`,
- In the "Source Port" field, type `5433`,
- In the "Destination" field, type `pgserver.students.cs.ubc.ca:5432`,
- Click the "Add" button, which should make your screen look like this:

![](./img/putty_port_forwarding.png)

- Now go back to the "Session" section again and click the "Save" button.
- Click "Open" to launch a session.

#### How to connect to the database

If the connection is successfully established, you'll be logged into your account on the CS servers. It also means that you can now access the remote database at `localhost` on port `5433` (because of port forwarding) using pgAdmin, `ipython-sql`, or any other client program.
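
If a client cannot connect, it can help to first confirm that the tunnel is actually listening on your computer. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and the defaults assume the forwarded port `5433` described above. It only checks that something accepts TCP connections on that port, not that your Postgres login is valid.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5433, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: the defaults assume the SSH tunnel forwards
    local port 5433, as set up in the steps above.
    """
    try:
        # create_connection tries every address that `host` resolves to
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures
        return False


if __name__ == "__main__":
    print("tunnel reachable:", port_is_open())
```

If this reports that the tunnel is not reachable, the SSH or PuTTY session has most likely been closed and needs to be reopened.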

The labs and lectures in this course ship with a `credentials.json` file, which simply stores the host address, port, username and password. This is because **you should never ever store sensitive information such as usernames and passwords in a notebook or code file**. Instead of typing our login info directly into the notebook, we read it from `credentials.json`, which we keep only on our personal computer.

---
**Important:**

Make sure to add `credentials.json` to the `.gitignore_global` file located in your home directory, so you don't accidentally commit your sensitive information to GitHub.

---
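
One way to make this step repeatable is a small script that adds the entry only when it is missing. This is a sketch, not part of the course setup: the function name `ensure_ignored` is hypothetical, it only looks for an exact matching line rather than interpreting gitignore patterns, and it assumes your Git configuration already points `core.excludesfile` at the file you pass in.

```python
from pathlib import Path


def ensure_ignored(ignore_file: Path, pattern: str = "credentials.json") -> bool:
    """Append `pattern` to `ignore_file` unless an identical line exists.

    Returns True if the line was added, False if it was already there.
    Hypothetical helper; it does an exact line match only.
    """
    text = ignore_file.read_text() if ignore_file.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with ignore_file.open("a") as f:
        f.write(f"{separator}{pattern}\n")
    return True


# Example call (not executed here, because it would edit your real file):
# ensure_ignored(Path.home() / ".gitignore_global")
```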

After establishing a secure connection to the CS network, make sure that your `credentials.json` file looks like this:

```json
{
  "host": "localhost",
  "port": 5433,
  "user": "<CWL>",
  "password": "<your_password>"
}
```
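
Before using the file in a notebook, you may want to check that it has the expected shape. The sketch below assumes the four keys shown above; the names `REQUIRED_KEYS`, `validate_credentials` and `load_credentials` are hypothetical and not part of the course materials.

```python
import json

# Keys that the credentials file above is expected to contain
REQUIRED_KEYS = ("host", "port", "user", "password")


def validate_credentials(login: dict) -> dict:
    """Check required keys and return a copy with the port as an int."""
    missing = [key for key in REQUIRED_KEYS if key not in login]
    if missing:
        raise KeyError(f"credentials are missing keys: {missing}")
    port = int(login["port"])  # accepts 5433 as well as "5433"
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {**login, "port": port}


def load_credentials(path: str = "credentials.json") -> dict:
    """Read the JSON file at `path` and validate its contents."""
    with open(path) as f:
        return validate_credentials(json.load(f))
```

Calling `load_credentials()` from the folder that contains `credentials.json` returns the dictionary, or raises an error that names the missing keys.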

### Using course databases locally

You won't always use remote databases; sometimes you will work with local databases on the Postgres server installed on your own computer. For this course, I have also provided [**dump files**](https://www.postgresql.org/docs/current/backup-dump.html) so that if for any reason you can't connect to the remote course databases, you can recreate them on your own computer. In that case, you just need to go through the following steps.

Suppose that you want to recreate the `world_dsci513` database on your own computer:

- Open pgAdmin. In the left-hand side browser, you'll see a server called `localhost`. Double click on that to open it.
- Now you should see a list of local databases, which at this point should contain only `postgres`. Right-click on `Databases`, then choose `Create => Database...`.
- In the window that opens, you'll see options for database creation. In the `Database` field, type `world_dsci513` as its name. This particular name is not required, but using it keeps things consistent with the names on the remote server.
- Click `Save`. The database is now created.

Now you should restore the `world_dsci513` database using the dump file:
- Download the dump file named `databases/world_dsci513.dump` from the course repo [here](https://github.ubc.ca/mds-2021-22/DSCI_513_database-data-retr_students/tree/master/databases).
- In pgAdmin, right-click on your newly created `world_dsci513` database, and choose `Restore...`.
- In the window that opens, click on the three dots `...` in front of the `Filename` field. Choose `world_dsci513.dump` that you downloaded earlier, and click `Restore`.
- You should see a success message on the bottom right of the pgAdmin window.
- Voilà! You now have the database on your own computer.

When you want to connect to databases on your own computer, you can skip the SSH steps described above (creating the SSH connection and port forwarding). The only thing to remember is that local databases are reached on `localhost` through the default Postgres port `5432`. Furthermore, you need to use the username `postgres` and the password that you set up for this user when you first installed Postgres on your computer.

You can test your connection using the following cells:

```python
%load_ext sql
```

```python
import json
import urllib.parse

with open('credentials.json') as f:
    login = json.load(f)

username = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']
```

```python
%sql postgresql://{username}:{password}@{host}:{port}/imdb_dsci513
```

```python
%sql SELECT version();
```
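
The connection string used above can also be assembled by a small function, which makes it easy to switch between the remote (tunnelled) and local setups. This is a sketch under stated assumptions: `postgres_url` is a hypothetical helper, and the user name `jdoe` and both passwords are made-up values. Unlike the cell above, it passes `safe=""` to `quote` so that a `/` in the password is escaped as well.

```python
from urllib.parse import quote


def postgres_url(login: dict, database: str) -> str:
    """Build a postgresql:// URL from a credentials dictionary.

    Hypothetical helper. User and password are percent-encoded so that
    characters such as '@' or '/' do not break the URL.
    """
    user = quote(login["user"], safe="")
    password = quote(login["password"], safe="")
    return f"postgresql://{user}:{password}@{login['host']}:{login['port']}/{database}"


# Remote database through the SSH tunnel (made-up user and password)
remote = {"host": "localhost", "port": 5433, "user": "jdoe", "password": "p@ss/word"}
print(postgres_url(remote, "imdb_dsci513"))
# prints postgresql://jdoe:p%40ss%2Fword@localhost:5433/imdb_dsci513

# Local Postgres installation: default port and the `postgres` user
local = {"host": "localhost", "port": 5432, "user": "postgres", "password": "secret"}
print(postgres_url(local, "world_dsci513"))
# prints postgresql://postgres:secret@localhost:5432/world_dsci513
```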