# Saving Dataframes with BastionLab

In this tutorial, we will save a dataframe and then restart the server to test whether the dataframe remains available on the server.

## Pre-requisites

### Technical Requirements

To start this tutorial, ensure the following are already installed on your system:
- Python 3.7 or greater (get the latest version of Python at https://www.python.org/downloads/ or with your operating system’s package manager)
- [pip](https://pypi.org/project/pip/), the Python package manager
- [BastionLab Server](https://bastionlab.readthedocs.io/en/latest/docs/tutorials/installation/)

You will also need to install Polars and the BastionLab client, and download the dataset.

In [28]:
! pip install polars
! pip install bastionlab
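
Once the installs above finish, a quick sanity check can confirm that the requirements are met. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of BastionLab.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 7)  # minimum version listed in the requirements above
REQUIRED_PACKAGES = ("polars", "bastionlab")  # packages installed above


def check_environment(min_python=MIN_PYTHON, packages=REQUIRED_PACKAGES):
    """Return a dict mapping each requirement to True (met) or False (missing)."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'missing'}")
```

Any entry reported as `missing` points to an install step that should be repeated before continuing.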

In [29]:
!wget 'https://raw.githubusercontent.com/chingjunetao/medium-article/master/simple-guide-to-data-cleaning/modified_titanic_data.csv'
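
If `wget` is not available (for example on Windows), the same file can be fetched from Python. This is a sketch using the standard library only; `download_if_missing` is a hypothetical helper name, and the URL is the one used in the cell above.

```python
from pathlib import Path
from urllib.request import urlretrieve

DATASET_URL = (
    "https://raw.githubusercontent.com/chingjunetao/medium-article/master/"
    "simple-guide-to-data-cleaning/modified_titanic_data.csv"
)


def download_if_missing(url, destination):
    """Download url to destination unless the file already exists."""
    destination = Path(destination)
    if not destination.exists():
        # urlretrieve writes the response body straight to the given path
        urlretrieve(url, str(destination))
    return destination
```

Calling `download_if_missing(DATASET_URL, "modified_titanic_data.csv")` places the file where the rest of this tutorial expects it, and skips the download on later runs.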

## Getting set-up

<b>Launch the BastionLab server in a terminal.</b>

We then connect to our server and send over the dataset.

First, we read in the dataset using Polars' `read_csv()` function, which returns a Polars `DataFrame` instance containing the dataset.

Second, we connect to the server by creating a BastionLab `Connection`.

In [32]:
from bastionlab import Connection
import polars as pl

df = pl.read_csv("modified_titanic_data.csv")

connection = Connection("localhost")
client = connection.client

Finally, we send the Polars DataFrame to the server using BastionLab's `polars.send_df()` method. It returns a `RemoteLazyFrame` instance, a reference to the uploaded DataFrame, which we will work with throughout this tutorial.

For the sake of this tutorial, we specify an unsafe policy that disables all checks. We set the `safe_zone` parameter to `TrueRule()` so that every request is allowed. Because no request can then be unsafe, the value of `unsafe_handling` does not matter; we set it to `Log()` in the following example.

Note that this is done only so that we can focus on demonstrating BastionLab's ability to save dataframes without having to worry about approving any data access requests. 

<b>This policy is not suited for production.</b>


In [33]:
from bastionlab.polars.policy import Policy, TrueRule, Log

policy = Policy(safe_zone=TrueRule(), unsafe_handling=Log())
rdf = client.polars.send_df(df, policy=policy, sanitized_columns=["Name"])

identifier = rdf.identifier

rdf

FetchableLazyFrame(identifier=8a745515-1877-4182-ad6a-05be8931addc)

Let's run a simple operation on the dataframe.

In [34]:
per_class_rates = (
    rdf.select([pl.col("Pclass"), pl.col("Survived")])
    .groupby(pl.col("Pclass"))
    .agg(pl.col("Survived").mean())
    .sort("Survived", reverse=True)
    .collect()  # stop here: save() needs the remote reference, not a fetched local copy
)
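
To make clear what the query above computes, here is a plain-Python sketch of the same three steps: group by `Pclass`, average `Survived`, and sort by that average in descending order. The rows are made up for illustration and are not taken from the Titanic file; `survival_rate_per_class` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up rows for illustration only; NOT taken from the Titanic dataset.
rows = [
    {"Pclass": 1, "Survived": 1},
    {"Pclass": 1, "Survived": 1},
    {"Pclass": 2, "Survived": 1},
    {"Pclass": 2, "Survived": 0},
    {"Pclass": 3, "Survived": 0},
    {"Pclass": 3, "Survived": 0},
    {"Pclass": 3, "Survived": 1},
    {"Pclass": 3, "Survived": 0},
]


def survival_rate_per_class(rows):
    """Group by Pclass, average Survived, sort by the average (descending)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Pclass"]].append(row["Survived"])
    rates = [(pclass, sum(values) / len(values)) for pclass, values in grouped.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)


rates = survival_rate_per_class(rows)
print(rates)  # [(1, 1.0), (2, 0.5), (3, 0.25)]
```

In the tutorial itself this computation runs on the server, so the raw rows never need to reach the client.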

## Saving the dataframe

Now we save the resulting dataframe from our previous operation.

In [35]:
per_class_rates.save()
saved_identifier = per_class_rates.identifier

connection.close()

We will now restart the server and check which dataframes persist on the server.
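
If the notebook kernel is restarted along with the server, the `saved_identifier` variable is lost. One option is to write it to disk first. The sketch below assumes the identifier can be represented as a plain string; the file name and both helper names are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical file name; any writable location works.
IDENTIFIER_FILE = Path("saved_dataframe.json")


def store_identifier(identifier, path=IDENTIFIER_FILE):
    """Write the dataframe identifier to a small JSON file."""
    Path(path).write_text(json.dumps({"identifier": str(identifier)}))


def load_identifier(path=IDENTIFIER_FILE):
    """Read the identifier back, for example after a kernel restart."""
    return json.loads(Path(path).read_text())["identifier"]
```

Call `store_identifier(saved_identifier)` before the restart and `saved_identifier = load_identifier()` afterwards.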

## Testing Persistence of Dataframes

<b>Terminate the running BastionLab server (Ctrl+C), then restart it.</b>
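
A freshly restarted server may need a moment before it accepts connections. The following sketch polls a TCP port until it is reachable, using only the standard library. `wait_for_port` is a hypothetical helper, and the port you pass must match your server configuration.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 50056)` would wait up to 30 seconds, assuming the server listens on port 50056; that port number is an assumption here, so check your own setup.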

Reconnect to the server and list the available dataframes.

In [36]:
connection = Connection("localhost")
client = connection.client

client.polars.list_dfs()


As you can see, the saved dataframe persists on the server.
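
To confirm this programmatically rather than by eye, you can compare identifiers. The sketch below assumes that each listed object exposes an `identifier` attribute, as the remote dataframes in this tutorial do; `is_listed` is a hypothetical helper, and the stand-in objects and identifiers are made up.

```python
from types import SimpleNamespace


def is_listed(frames, identifier):
    """Return True if any object in frames has a matching .identifier attribute."""
    return any(getattr(frame, "identifier", None) == identifier for frame in frames)


# Stand-ins for the objects returned by the server; the identifiers are made up.
listed = [SimpleNamespace(identifier="aaa-111"), SimpleNamespace(identifier="bbb-222")]
print(is_listed(listed, "bbb-222"))  # True
print(is_listed(listed, "ccc-333"))  # False
```

With a live connection, the equivalent check would be `is_listed(client.polars.list_dfs(), saved_identifier)`.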

You can print the dataframe to be sure it's the same one you saved.

In [37]:
retrieved_rdf = client.polars.get_df(saved_identifier)
retrieved_rdf.fetch()


Finally, close the connection.

In [38]:
connection.close()
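
If an error occurs before `connection.close()` is reached, the connection stays open. One way to guard against that is `contextlib.closing`, which calls `close()` on exit no matter what. The sketch below uses a stand-in class so that it runs without a server; `StandInConnection` is hypothetical and only mimics the `close()` method used in this tutorial.

```python
from contextlib import closing


class StandInConnection:
    """Minimal stand-in exposing only the close() method used in this tutorial."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


connection = StandInConnection()
try:
    with closing(connection):
        # Work with the connection here; an error still triggers close().
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(connection.closed)  # True
```

With the real client, the same pattern would read `with closing(Connection("localhost")) as connection:` followed by `client = connection.client` inside the block.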
