# Authentication
___________________________________________


In this tutorial, we'll show how to **start the `BastionLab server` with user authentication enabled** and **connect the `client`** to it. We'll execute simple queries on BastionLab's central object, the `RemoteDataFrame`, and we'll show that **authentication works** by trying to connect to the server with a non-authenticated identity.

It is important to note that BastionLab has authentication enabled *by default*. You can also use it without authentication, which is a perfectly fine setting when deploying locally. To do that, export an environment variable before you run the server: `export DISABLE_AUTHENTICATION=1`.

If you’re not deploying locally, you’ll need authentication to help secure access to the server. This means only known users will be able to access and use it.

To do so with BastionLab, we'll use public key authentication. We will also learn how to set up authentication and create 'identities', which are BastionLab’s abstraction for authentication. The `Identity` interface creates both the public and the private key of each user.

> *If you want to know more about the queries and the `RemoteDataFrames`, you can check out the Data Scientist's side in our [Quick tour](https://bastionlab.readthedocs.io/en/latest/docs/quick-tour/quick-tour/).*

## Pre-requisites
___________________________________________ 

### Installation

In order to run this notebook, we need to:
- Have [Python 3.7](https://www.python.org/downloads/) (or greater) and [Python Pip](https://pypi.org/project/pip/) installed
- Have [Docker](https://www.docker.com/) installed ([here's a tutorial](https://docker-curriculum.com/))
- Install [BastionLab](https://bastionlab.readthedocs.io/en/latest/docs/getting-started/installation/)

We'll do so by running the code block below. 

>You can see our [Installation page](https://bastionlab.readthedocs.io/en/latest/docs/getting-started/installation/) to check whether PyPI and Docker are the best installation methods for you.

In [None]:
# pip packages
!pip install bastionlab

### Launch the server

In [None]:
!docker pull mithrilsecuritysas/bastionlab:latest

## Setting up the keys
____________________________________________________________________

In an authentication-enabled environment, BastionLab **only accepts requests from verified users** (known users whose public keys have been registered with the server at start-up).

Authentication is done with **asymmetric cryptography**: 
- First, the data owner provides a list of authorized public keys to the server at start-up.
- Then, each user must provide their corresponding private key to the client when they connect to the server.

### Identity creation

BastionLab provides a utility module to manage the keys. To create and manage key pairs (the corresponding public and private keys), we'll use the `Identity` class of the `client`.

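In this section, we will create three identities:
- one for the `data_owner`
- one for the `data_scientist`
- one for a `fake_scientist`, which we will not register with the server so that we can test authentication later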

> *Note* - The keys generated by the `Identity` class are placed in the current working directory.

In [1]:
from bastionlab import Identity

# Create `Identity` for data owner.
data_owner = Identity.create("data_owner")

# Create `Identity` for data scientist.
data_scientist = Identity.create("data_scientist")

# Fake `Identity` used for testing authentication
fake_scientist = Identity.create("fake_scientist")

Now that we have set up our identities, we have to start the server with the **public keys** of the data owner and the data scientist.

> *Note* - This step will have to be done by the party setting up the server, commonly the **data owner**. They will have to get all the public keys of the interested parties.

### BastionLab server public keys structure

Illustrated below is the directory structure of **BastionLab server**.

```sh
keys/
├─ owners/
├─ users/
```

> By convention, `keys` is used as the default directory to store public keys.

For ease of use, it's best to have a directory structure for your public keys similar to that of the BastionLab server.

To that end, run the following commands that will create the relevant directory structure.

In [None]:
!mkdir -p keys/owners keys/users

For the purpose of this tutorial, we'll copy the data owner's public key into `keys/owners` and the data scientist's public key into `keys/users`.

In [None]:
!cp data_owner.pub keys/owners
!cp data_scientist.pub keys/users
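If you would rather stay in Python than shell out, the same layout can be built with the standard library. This is only a sketch: `install_public_keys` is a hypothetical helper, not part of BastionLab, and it assumes each public key is stored as `<name>.pub`, as in the `cp` commands above.

```python
import shutil
from pathlib import Path


def install_public_keys(source_dir, keys_dir, owners, users):
    """Copy `<name>.pub` files into keys_dir/owners and keys_dir/users.

    Hypothetical helper for illustration; it mirrors the `mkdir -p` and `cp`
    commands of this tutorial and is not part of the BastionLab API.
    """
    source_dir, keys_dir = Path(source_dir), Path(keys_dir)
    installed = []
    for role, names in (("owners", owners), ("users", users)):
        target = keys_dir / role
        target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
        for name in names:
            public_key = source_dir / f"{name}.pub"
            if not public_key.is_file():
                raise FileNotFoundError(f"missing public key: {public_key}")
            # shutil.copy returns the path of the newly created file
            installed.append(Path(shutil.copy(public_key, target)))
    return installed
```

With the identities created earlier, the call would be `install_public_keys(".", "keys", owners=["data_owner"], users=["data_scientist"])`. Only `.pub` files are copied, so private keys never end up in the directory mounted into the server.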

### Starting BastionLab server with public keys

In [None]:
!docker run -it -p 50056:50056 -v $(pwd)/keys:/app/bin/keys mithrilsecuritysas/bastionlab:latest

### Setting up an authenticated connection

The rest of this tutorial follows essentially the same steps as the [Quick tour](https://bastionlab.readthedocs.io/en/latest/docs/quick-tour/quick-tour/), but uses an **authenticated** connection.

> *Note* - Remember that we use the `data_owner` identity created in [Identity creation](#identity-creation).



In [3]:
!wget 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'

--2023-01-04 16:03:34--  https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60302 (59K) [text/plain]
Saving to: ‘titanic.csv’


2023-01-04 16:03:35 (486 KB/s) - ‘titanic.csv’ saved [60302/60302]



## Data Owner's side
__________________________________________________________________________________



### Upload the data frame to the BastionLab server

First, we load the csv file:

In [2]:
import polars as pl

df = pl.read_csv("titanic.csv")

We then open an authenticated connection to the server by providing its hostname and the identity:

In [3]:
from bastionlab import Connection

connection = Connection("localhost", identity=data_owner)

>*For more details on both operations, you can check the Data Owner's side of the [Quick tour tutorial](../getting-started/quick-tour.ipynb).*

### The `Connection` class

The `Connection` class accepts as arguments the `hostname` and `identity`. 

- `hostname`: This is the address of the BastionLab server to which we are connecting. Since we host the server locally, we use `localhost`.

- `identity`: The `Identity.create` method returns a `SigningKey` (a _private key_), which is used to establish a connection with the server.
  - The BastionLab client uses the `SigningKey` to sign a special message, which contains a unique challenge requested from the server and some other metadata.
  - The signed message is then sent to the server along with the hash of the public key of the client.
  - If the server has that public key (either in `keys/owners` or `keys/users`), it verifies the signed message using the corresponding public key.
  - If the verification passes, a session is created between the server and client and then a **session token** is sent back to the client.
  - If it fails, a connection isn't established and the server throws an error that *"The user isn't authenticated"*.
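The handshake described above can be sketched with the Python standard library alone. This is a toy model, not BastionLab's implementation: textbook RSA with small hard-coded primes stands in for the real signature scheme (which this tutorial does not specify), and `ToyServer`, `key_hash`, `sign` and `verify` are hypothetical names chosen for illustration. It needs Python 3.8+ for the modular inverse computed with `pow`.

```python
import hashlib
import secrets

# Toy textbook-RSA key pair built from two known Mersenne primes.
# INSECURE and purely illustrative: this is NOT BastionLab's signature scheme.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def _digest(message: bytes) -> int:
    """Hash the message and map it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_exponent: int) -> int:
    """Client side: sign the message digest with the private key."""
    return pow(_digest(message), private_exponent, N)


def verify(message: bytes, signature: int, public_exponent: int) -> bool:
    """Server side: check the signature with the public key."""
    return pow(signature, public_exponent, N) == _digest(message)


def key_hash(public_exponent: int) -> str:
    """Identifier the client sends instead of the full public key."""
    return hashlib.sha256(f"{public_exponent}:{N}".encode()).hexdigest()


class ToyServer:
    """Hypothetical server holding the public keys registered at start-up."""

    def __init__(self, authorized_public_exponents):
        self.public_keys = {key_hash(e): e for e in authorized_public_exponents}
        self.pending_challenges = set()
        self.sessions = set()

    def get_challenge(self) -> bytes:
        challenge = secrets.token_bytes(32)  # unique, unpredictable value
        self.pending_challenges.add(challenge)
        return challenge

    def create_session(self, pubkey_hash: str, challenge: bytes, signature: int) -> str:
        known = pubkey_hash in self.public_keys
        fresh = challenge in self.pending_challenges
        self.pending_challenges.discard(challenge)  # a challenge is single-use
        if not (known and fresh
                and verify(challenge, signature, self.public_keys[pubkey_hash])):
            raise PermissionError(f'"{pubkey_hash}" not authenticated!')
        token = secrets.token_hex(16)  # session token returned to the client
        self.sessions.add(token)
        return token
```

A registered key obtains a token with `server = ToyServer([E])`, `challenge = server.get_challenge()` and `server.create_session(key_hash(E), challenge, sign(challenge, D))`. An unknown key hash, a wrong signature, or a replayed challenge raises `PermissionError`. Note that the private exponent `D` never leaves the client side of this sketch.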

To avoid having to authenticate every single operation, which would be tedious and not significantly improve the security, BastionLab allows the user to remain authenticated for as long as the session doesn’t expire. 

To do so, the server sends a token to the user once the connection has been created, and the client silently appends it to each subsequent request. For example, once authenticated, the user can call `send_df`, or any other BastionLab endpoint, without providing their identity again.
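As a rough illustration of this mechanism, the sketch below models a server-side token store with an expiry time and a client that attaches its token to every request. All names (`SessionStore`, `SessionClient`, `build_request`) and the one-hour lifetime are assumptions made for the example; the tutorial does not document how BastionLab stores tokens or how long a session lasts.

```python
import secrets
import time


class SessionStore:
    """Hypothetical server-side store of session tokens with an expiry time."""

    def __init__(self, lifetime_seconds=3600.0, clock=time.monotonic):
        self.lifetime_seconds = lifetime_seconds  # assumed value, not BastionLab's
        self.clock = clock
        self._expiry_by_token = {}

    def issue(self) -> str:
        """Called once, right after the identity has been verified."""
        token = secrets.token_hex(16)
        self._expiry_by_token[token] = self.clock() + self.lifetime_seconds
        return token

    def is_valid(self, token: str) -> bool:
        """Called on every request instead of re-checking a signature."""
        expiry = self._expiry_by_token.get(token)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry_by_token[token]  # expired: client must re-authenticate
            return False
        return True


class SessionClient:
    """Hypothetical client that silently attaches its token to each request."""

    def __init__(self, token: str):
        self._token = token

    def build_request(self, endpoint: str, payload: dict) -> dict:
        return {"endpoint": endpoint, "token": self._token, "payload": payload}
```

The design trade-off is the usual one for sessions: the signature check happens once per connection, and each later request only costs a dictionary lookup, at the price of having to protect the token for as long as the session lives.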


In [4]:
from bastionlab.polars.policy import Policy, Aggregation, Log

policy = Policy(
    safe_zone=Aggregation(min_agg_size=10), unsafe_handling=Log(), savable=True
)
connection.client.polars.send_df(df, policy=policy, sanitized_columns=["Name"])

FetchableLazyFrame(identifier=92f6366d-461e-4b55-acf1-e054dfdce06e)

### Deleting a dataframe

The data owner can also delete dataframes on the server. It's important to note that they are the *only* one with the right to perform a deletion.

Let's see how it works by sending a dataframe and then deleting it. We’ll call the `delete()` method on the `RemoteDataFrame` returned by `send_df`, and we'll list the dataframes available on the server before and after the deletion to check that it worked.

In [5]:
# We create a RemoteLazyFrame with our dataset
rdf = connection.client.polars.send_df(df, policy=policy, sanitized_columns=["Name"])

# We test it's been created
print(connection.client.polars.list_dfs())

# We delete the dataframe using the delete method
rdf.delete()

# We can't find it in the list now: it's been deleted!
print(connection.client.polars.list_dfs())

[FetchableLazyFrame(identifier=fbc57a76-39e7-43dc-8d0a-a595f2c752a8), FetchableLazyFrame(identifier=92f6366d-461e-4b55-acf1-e054dfdce06e)]
[FetchableLazyFrame(identifier=92f6366d-461e-4b55-acf1-e054dfdce06e)]


## Data Scientist's side
__________________________________________________________________________________


The data owner is not the only one that has to connect. Using the same method as in the previous section, here’s how the data scientist can set up a connection to the server using their `Identity`. 

First, we’ll ask the server for a list of all the data frames, then select the first `RemoteDataFrame` to be used for the rest of the analysis: 

In [6]:
connection = Connection("localhost", identity=data_scientist)

client = connection.client

all_rdfs = client.polars.list_dfs()

rdf = all_rdfs[0]

all_rdfs

[FetchableLazyFrame(identifier=92f6366d-461e-4b55-acf1-e054dfdce06e)]

> *Note - If you want to know more about those queries, they are well-detailed in the Data Scientist's side section of the [Quick tour](https://bastionlab.readthedocs.io/en/latest/docs/getting-started/quick-tour/).*

### Running Queries

As an example, let’s do a simple operation: read the first five rows of the `RemoteDataFrame`.

In [7]:
rdf1 = rdf.head(5)
print(rdf1)

rdf2 = rdf1.collect()
print(rdf2)

RemoteLazyFrame
FetchableLazyFrame(identifier=2f9eb018-e429-4220-9a71-38467eafa896)


It works!

## Testing a non-authenticated user
__________________________________________________________________________________


Now, let's try to connect to the server with an `Identity` that is unknown to the server.

In [8]:
connection.close()

connection = Connection("localhost", identity=fake_scientist)
policy = Policy(
    safe_zone=Aggregation(min_agg_size=10), unsafe_handling=Log(), savable=True
)
connection.client.polars.send_df(df, policy=policy, sanitized_columns=["Name"])

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.PERMISSION_DENIED
	details = ""79067ba35423057fce2950b2f685733ded4d98b60fdde618d24db8a583522b9a" not authenticated!"
	debug_error_string = "{"created":"@1673019761.451517774","description":"Error received from peer ipv4:127.0.0.1:50056","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":""79067ba35423057fce2950b2f685733ded4d98b60fdde618d24db8a583522b9a" not authenticated!","grpc_status":7}"
>

In [9]:
connection.close()