<div id="colab_button">
  <h1>Memory quotas and deletion of dataframes</h1>
  <a target="_blank" href="https://colab.research.google.com/github/mithril-security/bastionlab/blob/v0.3.6/docs/docs/tutorials/memory_quotas.ipynb"> 
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
</div>

________________________

Memory quotas define how much memory each user can occupy on the server.
This tutorial demonstrates how memory quotas work and how to delete dataframes to free memory. 

Data owners set the memory quota (in bytes) in the server's `config.toml` before launching the server.

<b>Memory quotas only work when authentication is enabled.</b>
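
To make the idea concrete, here is a minimal sketch of per-user memory accounting. It is a toy model written for this tutorial, not BastionLab's implementation: the `MemoryQuotaTracker` class, the `QuotaExceededError` exception, and the sizes used are all hypothetical. It only illustrates the behaviour described above: each user has a byte budget, storing a dataframe counts against it, and deleting a dataframe frees that space.

```python
from typing import Dict


class QuotaExceededError(Exception):
    """Raised when storing a dataframe would exceed the user's quota."""


class MemoryQuotaTracker:
    """Toy per-user memory accounting (hypothetical, not BastionLab code)."""

    def __init__(self, quota_bytes: int) -> None:
        self.quota_bytes = quota_bytes
        # Maps user name -> {dataframe id -> size in bytes}.
        self._frames: Dict[str, Dict[str, int]] = {}

    def used(self, user: str) -> int:
        """Return the number of bytes currently held by `user`."""
        return sum(self._frames.get(user, {}).values())

    def store(self, user: str, df_id: str, size_bytes: int) -> None:
        """Record a new dataframe, refusing it if the quota would be exceeded."""
        if self.used(user) + size_bytes > self.quota_bytes:
            raise QuotaExceededError(
                f"{user} would use {self.used(user) + size_bytes} bytes, "
                f"quota is {self.quota_bytes} bytes"
            )
        self._frames.setdefault(user, {})[df_id] = size_bytes

    def delete(self, user: str, df_id: str) -> None:
        """Forget a dataframe, which frees its share of the quota."""
        self._frames.get(user, {}).pop(df_id, None)


# Illustrative sizes only: a 100 kB quota and two made-up dataframes.
tracker = MemoryQuotaTracker(quota_bytes=100_000)
tracker.store("data_scientist", "titanic", 60_000)

try:
    tracker.store("data_scientist", "per_class_rates", 50_000)
except QuotaExceededError as err:
    print("Rejected:", err)

# Deleting the first dataframe frees enough room for the second one.
tracker.delete("data_scientist", "titanic")
tracker.store("data_scientist", "per_class_rates", 50_000)
```

In this sketch the quota is tracked per user name, which mirrors why authentication matters: without knowing who is connected, there is no user to charge the memory to.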

## Pre-requisites
___________________________________________

### Installation and dataset

In order to run this notebook, we need to:
- Have [Python 3.7](https://www.python.org/downloads/) (or greater) and [Python Pip](https://pypi.org/project/pip/) installed
- Install [BastionLab](https://bastionlab.readthedocs.io/en/latest/docs/getting-started/installation/)
- Download [the dataset](https://www.kaggle.com/competitions/titanic) we will be using in this tutorial.

We'll do so by running the code block below. 

>If you are running this notebook on your machine instead of [Google Colab](https://colab.research.google.com/github/mithril-security/bastionlab/blob/v0.3.6/docs/docs/tutorials/memory_quotas.ipynb), you can see our [Installation page](https://bastionlab.readthedocs.io/en/latest/docs/getting-started/installation/) to find the installation method that best suits your needs.

In [1]:
# pip packages
!pip install bastionlab
!pip install bastionlab_server

# download the Titanic dataset
!wget 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'

--2023-02-14 17:42:28--  https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8000::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60302 (59K) [text/plain]
Saving to: ‘titanic.csv.6’


2023-02-14 17:42:28 (12.4 MB/s) - ‘titanic.csv.6’ saved [60302/60302]



Our dataset is based on the Titanic dataset, one of the most popular datasets for learning machine learning. It contains information about the passengers aboard the Titanic.

Next, we create an identity for the data owner and one for the data scientist.

In [1]:
from bastionlab import Identity

# Create `Identity` for data owner.
data_owner = Identity.create("data_owner")

# Create `Identity` for data scientist.
data_scientist = Identity.create("data_scientist")

### Launch and connect to the server

In [2]:
# launch bastionlab_server test package
import bastionlab_server

# passing True turns authentication on for the server
srv = bastionlab_server.start(True)

BastionLab server (version 0.3.7) already installed
Libtorch (version 1.13.1) already installed
TLS certificates already generated
Bastionlab server is now running on port 50056


[2023-02-15T08:56:32Z INFO  bastionlab] Authentication is enabled.
[2023-02-15T08:56:32Z INFO  bastionlab] Telemetry is disabled.
[2023-02-15T08:56:32Z INFO  bastionlab] Successfully loaded saved dataframes
[2023-02-15T08:56:32Z INFO  bastionlab] BastionLab server listening on 0.0.0.0:50056.
[2023-02-15T08:56:32Z INFO  bastionlab] Server ready to take requests


>*Note that the bastionlab_server package we install here was created for testing purposes. You can also install BastionLab server using our Docker image or from source (especially for non-test purposes). Check out our [Installation Tutorial](../getting-started/installation.md) for more details.*

In a typical workflow, the data owner sends a set of keys to the server so that authentication can be required for all users when they connect. **BastionLab offers this authentication feature**, and it must be enabled for memory quotas to work. When launching your own server, you can refer to the [authentication tutorial](https://bastionlab.readthedocs.io/en/latest/docs/tutorials/authentication/) to set it up.

In [3]:
# connecting to the server
from bastionlab import Connection

connection = Connection("localhost", identity=data_scientist)
client = connection.client

### Upload the dataframe to the server

We'll quickly upload the dataset to the server with an open safety policy, since setting up BastionLab is not the focus of this tutorial. This allows us to demonstrate features without having to approve any data access requests. You can check out how to define a safe privacy policy [here](https://bastionlab.readthedocs.io/en/latest/docs/tutorials/defining_policy_privacy/).

In [None]:
import polars as pl
from bastionlab.polars.policy import Policy, TrueRule, Log

df = pl.read_csv("titanic.csv")

policy = Policy(safe_zone=TrueRule(), unsafe_handling=Log(), savable=True)
rdf = client.polars.send_df(df, policy=policy, sanitized_columns=["Name"])

rdf

<div class="warning">
<b>This policy is not suitable for production.</b> Please note that we <i>only</i> use it for demonstration purposes, to avoid having to approve any data access requests in the tutorial. <br></div> <br>

We'll check that we're properly connected and have the necessary authorizations by running a simple query:

In [14]:
per_class_rates = (
    rdf.select([pl.col("Pclass"), pl.col("Survived")])
    .groupby(pl.col("Pclass"))
    .agg(pl.col("Survived").mean())
    .sort("Survived", reverse=True)
    .collect()
)

In [15]:
client.polars.list_dfs()

[FetchableLazyFrame(identifier=44cab93b-d18a-4cf1-a09d-3038c79a982c),
 FetchableLazyFrame(identifier=16fb1a27-bae6-457f-a321-6166f8ca7a68)]

## Deleting the dataframe
_______________________________________

Now let's delete the dataframe produced by our previous operation. This removes it from the server's memory. If the dataframe was previously saved with `save()` for persistence, it is deleted from local storage as well.

<b> Data owners can delete any dataframe. Data scientists can only delete dataframes that they created, for example as a result of an operation. </b>
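
The rule above can be summarised as a small permission check. This is only an illustrative sketch: the `can_delete` function and its parameters are hypothetical and are not part of the BastionLab API, which enforces this rule on the server.

```python
def can_delete(requester: str, creator: str, is_data_owner: bool) -> bool:
    """Return True if `requester` may delete a dataframe made by `creator`.

    Hypothetical helper mirroring the rule stated in this tutorial:
    data owners may delete anything, other users only what they created.
    """
    return is_data_owner or requester == creator


# A data scientist deleting the result of their own query: allowed.
print(can_delete("data_scientist", "data_scientist", is_data_owner=False))

# A data scientist deleting a dataframe created by someone else: refused.
print(can_delete("data_scientist", "data_owner", is_data_owner=False))

# A data owner deleting a dataframe created by a data scientist: allowed.
print(can_delete("data_owner", "data_scientist", is_data_owner=True))
```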

In [16]:
per_class_rates.delete()

client.polars.list_dfs()

[FetchableLazyFrame(identifier=44cab93b-d18a-4cf1-a09d-3038c79a982c)]

We see that the deleted dataframe is no longer available on the server.

Finally, close the connection and stop the server.

In [17]:
connection.close()
bastionlab_server.stop(srv)

BastionLab's server already stopped


False