# Updating your policy
____________________________

Policies are at the heart of the security protections in BastionLab, allowing you to control who can access your data and to what extent. To learn more about how to create and customize a policy, see the [defining policies tutorial](defining_policies.ipynb).

In this tutorial, we are going to focus on **how to update an existing policy** in BastionLab, so you can easily change the policy of your datasets without having to re-upload them.

But before we dive in, let's do some set-up!

## Pre-requisites
_______________________________

You'll need [Python](https://www.python.org/downloads/) and [Docker](https://www.docker.com/) installed on your machine to run this tutorial. 

> *You can refer to our [Installation section](https://bastionlab.readthedocs.io/en/latest/docs/tutorials/installation/) to check the specific technical requirements needed by BastionLab's Client and Server, as well as other methods to install them if the following ones are not for you.*

We'll first install the [Polars](https://www.pola.rs/) and [BastionLab](https://bastionlab.readthedocs.io/en/latest/docs/tutorials/installation/) packages, then pull the Docker image that we'll use to launch the server.

In [None]:
!pip install polars bastionlab
!docker pull mithrilsecuritysas/bastionlab:latest


## Setting up the keys
________________________________

In this tutorial, we are going to create two authenticated users: the `data_owner` and `data_scientist_1`. To do so, we'll use BastionLab's `Identity` module. Having two distinct identities lets us show that only the data owner can update a dataset's policy.

>*To learn about the `Identity` module, you can check out our [authentication tutorial](https://bastionlab.readthedocs.io/en/latest/docs/tutorials/authentication/).*

Note that the keys generated by the `Identity` class are placed in the current working directory.

In [None]:
from bastionlab import Identity

# Create `Identity` for the data owner.
data_owner = Identity.create("data_owner")

# Create `Identity` for data scientist 1.
data_scientist_1 = Identity.create("data_scientist_1")

Now that we have set up our identities, we have to start the server with the **public keys** of the data owner and the data scientist.

> *Note - This step will have to be done by the party setting up the server, commonly the **data owner**. They will have to get all the public keys of the interested parties.*

### Setting up the keys directory

We will now set up the `keys` directory that the server will use to verify users.

Let's follow the default schema:

```sh
keys/
├─ owners/
├─ users/
```

In [3]:
!mkdir -p keys/owners keys/users

We can now copy the public keys of the data owner and the data scientist into the relevant sub-directories.

In [4]:
!cp data_owner.pub keys/owners
!cp data_scientist_1.pub keys/users
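
If you would rather not rely on shell commands (for example on a system without `mkdir -p` or `cp`), the same layout can be built from Python. The following is a minimal sketch using only the standard library; the helper name `install_public_keys` is our own and is not part of BastionLab.

```python
import shutil
from pathlib import Path


def install_public_keys(keys_dir, owner_keys, user_keys):
    """Copy public keys into <keys_dir>/owners and <keys_dir>/users."""
    keys_dir = Path(keys_dir)
    for subdir, key_files in (("owners", owner_keys), ("users", user_keys)):
        target = keys_dir / subdir
        # Equivalent of `mkdir -p`: create parents, tolerate existing directories.
        target.mkdir(parents=True, exist_ok=True)
        for key_file in key_files:
            # Equivalent of `cp <key_file> <target>`.
            shutil.copy(key_file, target)
    return keys_dir
```

Assuming the `.pub` files generated in the previous step are in the current working directory, calling `install_public_keys("keys", ["data_owner.pub"], ["data_scientist_1.pub"])` should produce the same `keys/` tree as the two cells above.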

### Starting BastionLab server with public keys

Finally, we can launch the server with our keys directory.

In [5]:
!docker run -it -p 50056:50056 -v $(pwd)/keys:/app/bin/keys mithrilsecuritysas/bastionlab:latest
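
The container can take a moment to start accepting connections. As an optional convenience, the sketch below polls the published port (50056, as in the command above) until a TCP connection succeeds. The helper `wait_for_port` is hypothetical and not part of BastionLab, and a successful TCP connection only shows that something is listening on the port, not that the server is fully ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 50056)` could be called before opening the connection in the next section.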

## Data owner's side: uploading a dataset with the default policy
__________________________________________


We can finally start our tutorial by putting ourselves into the shoes of the data owner.

First, let's connect to the server!

We open an authenticated connection by providing the server's hostname and our identity:

In [6]:
from bastionlab import Connection

connection = Connection("localhost", identity=data_owner)
client = connection.client

Next, we will create a Polars DataFrame and send it to the server. Since we don't specify a policy, the default policy will be used: any information extracted by the user must be an aggregation of at least 10 rows, otherwise the data owner must approve the request.

We will also store our RemoteLazyFrame's identifier in the variable `RDF1` for later use.

In [7]:
import polars as pl

df1 = pl.DataFrame(
    {
        "Quarter": ["Q1", "Q2", "Q3", "Q4"],
        "Sales": [100000, 150000, 75000, 200000],
    }
)

# send using default policy
rdf1 = client.polars.send_df(df1)
RDF1 = rdf1.identifier

If we try to extract the whole dataset by using the `collect` and `fetch` methods, an access request is sent to the data owner. Once the data owner accepts this request, the contents of the RemoteLazyFrame are displayed.

In [8]:
rdf1.collect().fetch()

Reason: Cannot fetch a result DataFrame that does not aggregate at least 10 rows of DataFrame f23d5eaa-2d4a-450d-b5d2-61c24decf29c.

A notification has been sent to the data owner. The request will be pending until the data owner accepts or denies it or until timeout seconds elapse.
The query has been accepted by the data owner.


| Quarter | Sales |
| --- | --- |
| str | i64 |
| "Q1" | 100000 |
| "Q2" | 150000 |
| "Q3" | 75000 |
| "Q4" | 200000 |


Let's imagine the data owner finds this policy too strict for this dataset, perhaps because it does not contain any confidential information. They can simply create a new policy and use the RemoteLazyFrame's `update_policy()` method to apply it.

Here we create a policy that only requires extracted information to aggregate at least one row, which essentially removes the aggregation requirement, and that logs any unsafe requests. We can see the effect immediately by displaying our RemoteLazyFrame again with the `collect()` and `fetch()` methods: this no longer triggers an access request.

In [9]:
from bastionlab import Identity, Connection
from bastionlab.polars.policy import (
    Policy,
    Aggregation,
    Log,
)

policy = Policy(safe_zone=Aggregation(1), unsafe_handling=Log())
rdf1.update_policy(policy)
rdf1.collect().fetch()

| Quarter | Sales |
| --- | --- |
| str | i64 |
| "Q1" | 100000 |
| "Q2" | 150000 |
| "Q3" | 75000 |
| "Q4" | 200000 |
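
To make the difference between the two policies concrete, here is a toy model in plain Python of the decision described above. It is a simplified illustration and not BastionLab's implementation: the names `ToyPolicy` and `decide`, and the `"review"` / `"log"` labels, are hypothetical stand-ins for "ask the data owner" and "log the request".

```python
from dataclasses import dataclass


@dataclass
class ToyPolicy:
    """Toy stand-in for a policy: a minimum aggregation size plus a fallback."""

    min_agg_size: int
    unsafe_handling: str  # "review" (ask the owner) or "log" (record and allow)


def decide(policy, rows_per_output_row):
    """Return 'allow', 'ask_owner' or 'allow_and_log' for a fetch request."""
    # Inside the safe zone: every output row aggregates enough input rows.
    if rows_per_output_row >= policy.min_agg_size:
        return "allow"
    # Outside the safe zone: fall back to the policy's unsafe handling.
    if policy.unsafe_handling == "review":
        return "ask_owner"
    return "allow_and_log"


default_policy = ToyPolicy(min_agg_size=10, unsafe_handling="review")
lenient_policy = ToyPolicy(min_agg_size=1, unsafe_handling="log")

# Fetching the raw frame: each output row comes from a single input row.
print(decide(default_policy, rows_per_output_row=1))  # ask_owner
print(decide(lenient_policy, rows_per_output_row=1))  # allow
```

With a minimum aggregation size of 1, every fetch falls inside the safe zone, which is why the second call to `collect().fetch()` above returns the data without an access request.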


We will now close the connection and take a look at the `update_policy()` method from a data scientist's point of view.

In [10]:
connection.close()

## Data scientist's POV
_______________________________

Now imagine we are a data scientist working with this dataset, but not its owner!

First, we connect to the server using the `data_scientist_1` `Identity` we created earlier. We can then fetch the data owner's dataset using the identifier we stored in `RDF1`.

If we run `collect()` and `fetch()`, we will see that the new, more lenient policy is still in place.

In [11]:
connection = Connection("localhost", identity=data_scientist_1)
client = connection.client

rdf4 = client.polars.get_df(RDF1)
rdf4.collect().fetch()

| Quarter | Sales |
| --- | --- |
| str | i64 |
| "Q1" | 100000 |
| "Q2" | 150000 |
| "Q3" | 75000 |
| "Q4" | 200000 |


Now imagine that the data scientist thinks this policy is too weak and wants to update it. Simply put, they can't: only the data owner can change their datasets' policies.

We can test this by trying to update the policy to a stricter one, which requires any extracted information to be an aggregation of at least 10 rows, and then fetching the full contents of `rdf4`.

This fetch would violate the stricter policy. However, since `update_policy` ignores any changes requested by users who do not own the dataset in question, the previous lenient policy stays in place and the data is still returned.

In [12]:
policy = Policy(safe_zone=Aggregation(10), unsafe_handling=Log())
rdf4.update_policy(policy)
rdf4.collect().fetch()

Only the data owner can update the policy when authentication mode is enabled


| Quarter | Sales |
| --- | --- |
| str | i64 |
| "Q1" | 100000 |
| "Q2" | 150000 |
| "Q3" | 75000 |
| "Q4" | 200000 |
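
The behaviour we just observed can be summarised with a small toy model in plain Python. This is only a sketch of the ownership rule, not BastionLab's server code: the class `ToyRemoteFrame` and its attributes are hypothetical.

```python
class ToyRemoteFrame:
    """Toy stand-in for a server-side dataset with an owner and a policy."""

    def __init__(self, owner, min_agg_size):
        self.owner = owner
        self.min_agg_size = min_agg_size

    def update_policy(self, requester, min_agg_size):
        """Apply the new policy only if the requester owns the dataset."""
        if requester != self.owner:
            # Non-owners get a message and the policy is left untouched.
            print("Only the data owner can update the policy")
            return False
        self.min_agg_size = min_agg_size
        return True


frame = ToyRemoteFrame(owner="data_owner", min_agg_size=1)

frame.update_policy("data_scientist_1", 10)  # rejected, policy unchanged
print(frame.min_agg_size)  # 1

frame.update_policy("data_owner", 10)  # accepted
print(frame.min_agg_size)  # 10
```

The key design point is that a rejected update leaves the existing policy untouched, so the data scientist's later queries are still evaluated against the data owner's lenient policy.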


That brings us to the end of this quick introduction to updating policies. We can now close our connection to the server:

In [13]:
connection.close()