# Authentication

In this tutorial, we are going to start the `BastionLab server` with user authentication enabled and then connect the `client` to it.

Thereafter, we will execute simple queries on a Remote DataFrame (RDF), as used in the [Quick Tour of BastionLab](../quick-tour/quick-tour.ipynb). 

Finally, we will check that authentication works by trying to connect to the server with an unknown identity.

## Installing BastionLab Client from PyPI

In [None]:
! pip install polars bastionlab

## Installing BastionLab Server

### Using the official docker image

In [None]:
!docker pull mithrilsecuritysas/bastionlab:latest

## Setting up the keys

In an authentication-enabled environment, BastionLab only accepts requests from verified users (i.e., _known users whose public keys have been registered with the server at start-up_).

Authentication is done with asymmetric cryptography: 
- the data owner provides a list of authorized public keys to the server upon start-up;
- all users must provide their corresponding private key to the client when they connect to the server. 
  
The client then transparently creates a session for the user which is refreshed, by default, every _25 minutes_.

BastionLab also provides a utility module to manage the keys. We will use it to create the public and private keys for each user in this tutorial.

The `Identity` class of the `client` is used to create and manage key pairs (corresponding public and private keys).

### Identity creation

In this sub-section, we will create three identities:
- one for the data owner;
- one for the data scientist;
- one fake data scientist, unknown to the server, used later to test authentication.

> NB: The keys generated by the `Identity` class are placed in the current working directory.

In [1]:
from bastionlab import Identity

# Create `Identity` for Data owner.
data_owner = Identity.create("data_owner")

# Create `Identity` for Data Scientist.
data_scientist = Identity.create("data_scientist")

# Fake `Identity` used for testing authentication
fake_scientist = Identity.create("fake_scientist")

Now that we have set up our identities, we need to start the server with the **public keys** of both the data owner and the data scientist.

> Please note that this step will have to be done by the party setting up the server, commonly the **data owner**. They will have to get all the public keys of the interested parties.

### BastionLab Server Public Keys Structure

The **BastionLab server** stores public keys in the directory structure shown below.

```sh
keys/
├─ owners/
├─ users/
```

> By convention, `keys` is used as the default directory to store public keys.

For ease of use, it's best to keep your public keys in a directory structure similar to that of the BastionLab server.

To that end, run the following command to create the relevant directory structure.

In [3]:
!mkdir -p keys/owners keys/users

For the purpose of this tutorial, we copy the public keys of the data owner and the data scientist into the matching directories.

In [4]:
!cp data_owner.pub keys/owners
!cp data_scientist.pub keys/users
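The two shell cells above can also be written in Python, which may be convenient on systems where `mkdir -p` and `cp` are not available. The following is a minimal sketch: `install_public_keys` is a hypothetical helper (not part of BastionLab), and it only assumes that each public key is stored as `<name>.pub`, as in the `cp` commands above.

```python
import shutil
from pathlib import Path


def install_public_keys(source_dir, keys_dir, owners, users):
    """Copy `<name>.pub` files into `<keys_dir>/owners` and `<keys_dir>/users`.

    Hypothetical helper for this tutorial; not a BastionLab function.
    """
    source_dir, keys_dir = Path(source_dir), Path(keys_dir)
    copied = []
    for role, names in (("owners", owners), ("users", users)):
        target = keys_dir / role
        target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
        for name in names:
            public_key = source_dir / f"{name}.pub"
            if not public_key.is_file():
                # Fail early instead of starting the server with a missing key.
                raise FileNotFoundError(f"missing public key: {public_key}")
            copied.append(Path(shutil.copy(public_key, target)))
    return copied


# Equivalent of the two cells above:
# install_public_keys(".", "keys", owners=["data_owner"], users=["data_scientist"])
```

Raising `FileNotFoundError` for a missing `.pub` file surfaces a typo in an identity name before the server is started, rather than as a rejected connection later.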

## Starting BastionLab Server with Public Keys

In [None]:
!docker run -it -p 50056:50056 -v $(pwd)/keys:/app/bin/keys mithrilsecuritysas/bastionlab:latest

## Setting up an Authenticated Connection

This part follows the same steps as the [Quick Tour](../quick-tour/quick-tour.ipynb), but over an **authenticated** connection.

> NB: We use the `data_owner` identity created in [Identity Creation](#identity-creation).

In [5]:
!wget 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'

--2022-12-07 13:06:44--  https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60302 (59K) [text/plain]
Saving to: ‘titanic.csv’


2022-12-07 13:06:45 (8.57 MB/s) - ‘titanic.csv’ saved [60302/60302]



# Data Owner's Side



### Upload the data frame to the BastionLab Server

As a reminder, we load the CSV file and perform all other operations just as we do in the [Quick Tour](../quick-tour/quick-tour.ipynb).

In [2]:
import polars as pl

df = pl.read_csv("titanic.csv")

We then open an authenticated connection to the server by providing its hostname and the identity.

In [3]:
from bastionlab import Connection

connection = Connection("localhost", identity=data_owner)

## Behind the scenes

The `Connection` class accepts two arguments, `hostname` and `identity`.

- _hostname_: The address of the BastionLab server we are connecting to. Since we host the server locally, we use `localhost`.
- _identity_: The `Identity.create` method returns a `SigningKey` (a _private key_), which is used to establish a connection with the server.
  - The BastionLab client uses the `SigningKey` to sign a special message (it contains a unique challenge requested from the server and some other metadata).
  - The signed message is then sent to the server along with the hash of the client's public key.
  - If the server has that public key (i.e., in either `keys/owners` or `keys/users`), it uses it to verify the signed message.
  - If the verification passes, a session is created between the server and the client, and a **session token** is sent back to the client.
  - If it fails, no connection is established and the server returns an error stating that the user is not authenticated.
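
The challenge-response flow described above can be illustrated with a small, self-contained sketch. It is not BastionLab's implementation: it uses textbook RSA with tiny fixed primes as a stand-in signature scheme (insecure, chosen only so the example runs with the standard library), it signs just the challenge and leaves out the other metadata, and `ToyServer`, `connect`, and `key_hash` are hypothetical names.

```python
import hashlib
import secrets

# Toy textbook-RSA parameters: tiny primes, insecure, for illustration only.
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_exponent: int) -> int:
    return pow(digest(message), private_exponent, N)


def verify(message: bytes, signature: int, public_exponent: int) -> bool:
    return pow(signature, public_exponent, N) == digest(message)


def key_hash(public_exponent: int) -> str:
    """Identifier the client sends so the server can look up its public key."""
    return hashlib.sha256(f"{N}:{public_exponent}".encode()).hexdigest()


class ToyServer:
    def __init__(self, authorized_public_keys):
        # Plays the role of the keys found in keys/owners and keys/users.
        self.authorized = {key_hash(k): k for k in authorized_public_keys}
        self.pending = set()
        self.sessions = set()

    def get_challenge(self) -> bytes:
        challenge = secrets.token_bytes(16)
        self.pending.add(challenge)
        return challenge

    def create_session(self, sender_hash, challenge, signature) -> str:
        if challenge not in self.pending:
            raise PermissionError("unknown or reused challenge")
        self.pending.discard(challenge)  # each challenge is single-use
        public_key = self.authorized.get(sender_hash)
        if public_key is None or not verify(challenge, signature, public_key):
            raise PermissionError(f"{sender_hash} not authenticated!")
        token = secrets.token_hex(16)
        self.sessions.add(token)
        return token


def connect(server, private_exponent, public_exponent) -> str:
    """Client side: request a challenge, sign it, exchange it for a session token."""
    challenge = server.get_challenge()
    signature = sign(challenge, private_exponent)
    return server.create_session(key_hash(public_exponent), challenge, signature)
```

With `server = ToyServer([E])`, the call `connect(server, D, E)` returns a session token, while a key pair whose public key was never registered is rejected with `PermissionError`. The private exponent never leaves the client; the server only ever sees the signature and the key hash.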

### Auto-token append

The client internally appends the session token received from the server to every request, and the server uses it to authenticate each call.

This means methods can be called on `connection.client.polars.*` without passing the `Identity` on each call.

> By default, the session token is refreshed every 25 minutes.
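
To make the idea concrete, here is a minimal sketch of how a client could cache a token, renew it after 25 minutes, and attach it to each request. Everything in it is an assumption made for illustration: `SessionTokenCache`, the `fetch_token` callable, and the `session-token` metadata key are hypothetical and do not describe BastionLab's internals. The sketch also renews lazily on the next request, which may differ from what the real client does.

```python
import time


class SessionTokenCache:
    """Keeps a session token and renews it once it is older than `lifetime`."""

    def __init__(self, fetch_token, lifetime=25 * 60, clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical callable that logs in
        self._lifetime = lifetime        # seconds; 25 minutes as in the text
        self._clock = clock
        self._token = None
        self._issued_at = None

    def token(self):
        now = self._clock()
        if self._token is None or now - self._issued_at >= self._lifetime:
            # First use, or the cached token is too old: get a fresh one.
            self._token = self._fetch_token()
            self._issued_at = now
        return self._token

    def with_token(self, metadata=()):
        """Return request metadata with the current token appended."""
        return tuple(metadata) + (("session-token", self.token()),)
```

Because the token is looked up inside `with_token`, calling code never handles the identity or the token directly, which mirrors the behaviour described above.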

In [4]:
from bastionlab.polars.policy import Policy, Aggregation, Log

policy = Policy(safe_zone=Aggregation(min_agg_size=10), unsafe_handling=Log())
connection.client.polars.send_df(df, policy=policy, sanitized_columns=["Name"])

FetchableLazyFrame(identifier=feb9b163-4090-441c-b13d-0edf96b24d2a)

# Data Scientist's Side

In [5]:
connection = Connection("localhost", identity=data_scientist)

client = connection.client

all_rdfs = client.polars.list_dfs()

rdf = all_rdfs[0]

all_rdfs

[FetchableLazyFrame(identifier=94a15881-7b9d-465f-9754-735c4f6b9907),
 FetchableLazyFrame(identifier=c63d1ce2-8d87-4ed0-9766-46526f93cfdb),
 FetchableLazyFrame(identifier=feb9b163-4090-441c-b13d-0edf96b24d2a)]

### Running Queries



In [6]:
rdf1 = rdf.head(5)
print(rdf1)

rdf2 = rdf1.collect()
print(rdf2)

RemoteLazyFrame
FetchableLazyFrame(identifier=1027df07-5809-4a63-9548-de28ac1b2ae3)


## Testing a Non-authenticated User

Here, we try to connect to the server with an `Identity` that the server does not know.

As expected, the connection fails: the server rejects the request with a `PERMISSION_DENIED` status and a `not authenticated!` message.

In [18]:
connection = Connection("localhost", identity=fake_scientist)
policy = Policy(safe_zone=Aggregation(min_agg_size=10), unsafe_handling=Log())
connection.client.polars.send_df(df, policy=policy, sanitized_columns=["Name"])

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.PERMISSION_DENIED
	details = ""fd00b3e823d44bef6b9fad10f32d9695da683c5e1d5c5a980ccd2e0db01bdd66" not authenticated!"
	debug_error_string = "{"created":"@1670428727.104907366","description":"Error received from peer ipv6:[::1]:50056","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":""fd00b3e823d44bef6b9fad10f32d9695da683c5e1d5c5a980ccd2e0db01bdd66" not authenticated!","grpc_status":7}"
>
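
In a script, it can be more convenient to catch this error than to let the traceback propagate. The sketch below assumes that the exception raised is a gRPC error exposing `code()` and `details()`, as the `_InactiveRpcError` in the output above does. `auth_failure_details` is a hypothetical helper, and other BastionLab versions may wrap the error differently.

```python
def auth_failure_details(error):
    """Return the server's message if `error` is a PERMISSION_DENIED RPC error.

    Returns None for any other exception. The check is duck-typed so the
    helper does not need to import gRPC itself.
    """
    code = getattr(error, "code", None)
    if not callable(code):
        return None
    status = code()
    if getattr(status, "name", None) != "PERMISSION_DENIED":
        return None
    details = getattr(error, "details", None)
    return details() if callable(details) else ""


# Usage in the notebook (assumes the objects defined in the cells above):
# try:
#     connection = Connection("localhost", identity=fake_scientist)
#     connection.client.polars.list_dfs()
# except Exception as error:
#     message = auth_failure_details(error)
#     if message is None:
#         raise  # not an authentication problem, so re-raise it
#     print("Rejected by the server:", message)
```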