# Tutorial 1: A basic avatarization

In this tutorial, we will connect to a server to perform the avatarization of a dataset that does not require any pre-processing. We'll retrieve the anonymized dataset and the associated avatarization report. 

## Connection

In [None]:
import os

url = os.environ.get("AVATAR_BASE_URL")
username = os.environ.get("AVATAR_USERNAME")
password = os.environ.get("AVATAR_PASSWORD")
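
`os.environ.get` returns `None` when a variable is not set, which may only show up later as a less obvious connection or authentication error. A minimal sketch of failing early, assuming a hypothetical helper named `require_env` (it is not part of the avatars client):

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, the assignments above could be written as `url = require_env("AVATAR_BASE_URL")`, and likewise for the username and password.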

In [None]:
# This is the client that you'll be using for all of your requests
from avatars.client import ApiClient
from avatars.models import AvatarizationJobCreate, AvatarizationParameters
from avatars.models import ReportCreate

# The following are not necessary to run avatar but are used in this tutorial
import pandas as pd
import io

# Change this to your actual server endpoint, e.g. base_url="https://avatar.company.com"
client = ApiClient(base_url=url)
client.authenticate(
    username=username, password=password
)

# Verify that we can connect to the API server
client.health.get_health()

## Loading data

We recommend loading your CSV file as a pandas dataframe. This lets you check your data before avatarization and pre-process it if required.

In this tutorial, we use the simple and well-known `iris` dataset to demonstrate the main steps of an avatarization.
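
The kind of check meant here can be sketched without any server call. The example below uses only the standard library and a small inline CSV sample (made-up values and column names, not the real `iris` file) to count rows and empty cells per column. With pandas, `df.isna().sum()` gives a similar per-column count.

```python
import csv
import io

# Made-up sample for illustration; the second data row has an empty cell.
sample = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "4.9,,setosa\n"
    "6.3,3.3,virginica\n"
)

reader = csv.DictReader(sample)
rows = list(reader)

# Count empty cells per column
missing = {name: 0 for name in reader.fieldnames}
for row in rows:
    for name, value in row.items():
        if value == "":
            missing[name] += 1

print(len(rows), "rows")
print(missing)
```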

In [None]:
df = pd.read_csv("../fixtures/iris.csv")

In [None]:
df

In [None]:
dataset = client.pandas_integration.upload_dataframe(df)
print(dataset)

The data has now been loaded onto the server. 

Note that it is also possible to load a CSV file directly, without using pandas.

In [None]:
filename = "../fixtures/iris.csv"

# Upload the file contents directly to the server
with open(filename, "r") as f:
    dataset = client.datasets.create_dataset(request=f)
print(dataset)

## Analyze your data

A tool is provided to analyze the data before avatarization. It computes several statistics that help to:
- confirm that the loaded data is as expected, and
- identify any transformations that the data requires (covered in later tutorials).

In [None]:
dataset

In [None]:
# Call analyze_dataset until the server returns a summary
while dataset.summary is None:
    dataset = client.datasets.analyze_dataset(dataset.id)
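
A loop like the one above waits indefinitely if the summary never arrives. A sketch of the same pattern with a pause between attempts and a deadline is shown below. The helper `wait_for` and the stand-in `fake_fetch` are hypothetical names used for illustration; neither is part of the avatars client.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], Optional[T]],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> T:
    """Call `fetch` until it returns something other than None, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No result after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a server call: returns None twice, then a value
attempts = {"count": 0}


def fake_fetch() -> Optional[str]:
    attempts["count"] += 1
    return "summary" if attempts["count"] >= 3 else None


print(wait_for(fake_fetch, timeout=5.0, interval=0.01))
```

In this tutorial, `fetch` could be a function that calls `client.datasets.analyze_dataset(dataset.id)` and returns the dataset only once its `summary` is set.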

In [None]:
print(dataset.summary)

In [None]:
for var in dataset.summary.stats:
    print('---------')
    for stat in var:
        print(stat)

## Creating and launching an avatarization job

In [None]:
job = client.jobs.create_avatarization_job(
    AvatarizationJobCreate(
        parameters=AvatarizationParameters(
            k=5,
            dataset_id=dataset.id
        ),
    )
)

print(job.status)

## Retrieving the completed avatarization job

In [None]:
job = client.jobs.get_avatarization_job(id=job.id)

print(job.status)

## Retrieving the avatars

In [None]:
# Download the avatars as a string
avatars_str = client.datasets.download_dataset(job.result.avatars_dataset.id)

# Download the avatars as a pandas dataframe
avatars_df = client.pandas_integration.download_dataframe(job.result.avatars_dataset.id)

In [None]:
print(avatars_str)

In [None]:
print(avatars_df)

## Retrieving the utility and privacy metrics

Because this dataset did not require any pre-processing or post-processing outside the avatarization job, the metrics calculated at the end of the job can be used directly.

In [None]:
privacy_metrics = job.result.privacy_metrics
print("*** Privacy metrics ***")
for metric in privacy_metrics:
    print(metric)

In [None]:
utility_metrics = job.result.signal_metrics
print("*** Utility metrics ***")
for metric in utility_metrics:
    print(metric)

## Retrieving the avatarization report

In [None]:
report = client.reports.create_report(ReportCreate(job_id=job.id))
result = client.reports.download_report(id=report.id)

with open("./my_avatarization_report.pdf", "wb") as f:
    f.write(result)

The report is now generated and available on your machine.
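
A quick sanity check on the download is to confirm that the file begins with the PDF signature `%PDF-`. The sketch below uses a hypothetical helper, `looks_like_pdf`, and writes two tiny stand-in files so that it runs without a server. To check the real report, call it on `./my_avatarization_report.pdf`.

```python
import os
import tempfile


def looks_like_pdf(path: str) -> bool:
    """Return True if the file at `path` starts with the PDF signature."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


# Stand-in files for illustration only
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pdf")
    bad = os.path.join(tmp, "bad.pdf")
    with open(good, "wb") as f:
        f.write(b"%PDF-1.7\n")
    with open(bad, "wb") as f:
        f.write(b"<html>error</html>")
    good_ok = looks_like_pdf(good)
    bad_ok = looks_like_pdf(bad)

print(good_ok, bad_ok)
```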

*In the next tutorial, we will show how to parameterize an avatarization.*