# Using the API with sensitive data stored locally or on a secure server (Jinkompute)

## Introduction

One might want to compare simulation outputs with sensitive real-life data without uploading that data to jinko and relying on its data-overlay features. This cookbook shows how to load data stored either locally or on a secure server (Jinkompute). As an example, the loaded data is then plotted side by side with simulated data coming from jinko.

In [None]:
# Jinko specifics imports & initialization
# Please fold this section and do not change
import jinko_helpers as jinko

# Connect to Jinko (see README.md for more options)
jinko.initialize()

In [None]:
# Cookbook specific imports
import io
import json
import numpy as np
import os
import pandas as pd
import plotly.graph_objects as go
import zipfile

# Cookbook specific constants:
# Put here the constants that are specific to your cookbook like
# the reference to the Jinko items, the name of the model, etc.

# The trial's short id can be retrieved from the URL; the pattern is `https://jinko.ai/<trial_sid>`
trial_sid = 'tr-OxkW-mB8I'
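If you would rather paste the whole URL, a small helper can extract the short id from it. This is a minimal sketch that assumes the URL follows the `https://jinko.ai/<trial_sid>` pattern quoted above, with the short id as the first path segment; `trial_sid_from_url` is a hypothetical name and is not part of `jinko_helpers`.

```python
from urllib.parse import urlparse


def trial_sid_from_url(url: str) -> str:
    """Return the first path segment of a jinko URL (hypothetical helper).

    Assumes the `https://jinko.ai/<trial_sid>` pattern; any further path
    segments, query string or fragment are ignored.
    """
    path = urlparse(url).path  # e.g. '/tr-OxkW-mB8I'
    segments = [part for part in path.split('/') if part]
    if not segments:
        raise ValueError(f'No short id found in {url!r}')
    return segments[0]


print(trial_sid_from_url('https://jinko.ai/tr-OxkW-mB8I'))
```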

## Step 1 : Loading simulated data

In [None]:
core_item_id = jinko.get_core_item_id(trial_sid)
trial_versions = jinko.make_request(
    f'/core/v2/trial_manager/trial/{core_item_id["id"]}/status', params={"statuses": "completed"}
).json()

In [None]:
if trial_sid is None:
    raise Exception('Please specify a Trial Id')
else:
    print(f'Using Trial ID: {trial_sid}')

# Converting short Id to coreItemId
try:
    core_item_id = jinko.get_core_item_id(trial_sid, 1)
except Exception as e:
    print(f'Failed to find the corresponding trial, check trial_sid: {e}')
    raise

# Listing all trial versions
try:
    trial_versions = jinko.make_request(
        f'/core/v2/trial_manager/trial/{core_item_id["id"]}/status', params={"statuses": "completed"}
    ).json()
    print(f'Fetched {len(trial_versions)} completed versions for the trial.')
except Exception as e:
    print(f'Error fetching trial versions: {e}')
    raise

# Get the latest completed version
try:
    latest_completed_version = next(
        (item for item in trial_versions if item['status']
         == 'completed'), None
    )
    if latest_completed_version is None:
        raise Exception('No completed Trial version found')
    else:
        print(
            'Successfully fetched this simulation:\n',
            json.dumps(latest_completed_version, indent=1),
        )
        # Store the trial Id and the snapshot Id to use in the API requests
        simulation_id = latest_completed_version['simulationId']
        trial_core_item_id = simulation_id['coreItemId']
        trial_snapshot_id = simulation_id['snapshotId']
except Exception as e:
    print(f'Error processing trial versions: {e}')
    raise

# Retrieving results summary 
responseSummary = jinko.get_trial_scalars_summary(trial_core_item_id, trial_snapshot_id, print_summary=False)

# Extracting arm names
arm_names = responseSummary['arms']

# Storing the list of scenario descriptors to fetch them
scenarioDescriptors = [
    scalar['id']
    for scalar in (responseSummary['scalars'] + responseSummary['categoricals'])
    if 'ScenarioOverride' in scalar['type']['labels']
]
print('List of scenario overrides:\n', scenarioDescriptors, '\n')

# Listing the time series available for this trial snapshot
response = jinko.make_request(
    '/core/v2/trial_manager/trial/%s/snapshots/%s/output_ids'
    % (trial_core_item_id, trial_snapshot_id),
    method='GET',
)
output_ids = json.loads(response.content.decode('utf-8'))
print('Available time series:\n', output_ids, '\n')

idsForTimeSeries = [x['id'] for x in output_ids]

try:
    print('Retrieving time series data...')
    response = jinko.make_request(
        "/core/v2/result_manager/trial/%s/snapshots/%s/timeseries/download" % (
            trial_core_item_id, trial_snapshot_id
        ),
        method='POST',
        json={
            "timeseries": {ts: arm_names for ts in idsForTimeSeries},
        },
    )
    if response.status_code == 200:
        print('Time series data retrieved successfully.')
        archive = zipfile.ZipFile(io.BytesIO(response.content))
        filename = archive.namelist()[0]
        print(f'Extracted time series file: {filename}')
        csvTimeSeries = archive.read(filename).decode('utf-8')
    else:
        print(
            f'Failed to retrieve time series data: {response.status_code} - {response.reason}'
        )
        response.raise_for_status()
except Exception as e:
    print(f'Error during time series retrieval or processing: {e}')
    raise
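The download step above boils down to two pure operations: building the request body, which maps every time-series id to the list of arms, and reading the first file of the zip archive returned by the API. The sketch below isolates them so they can be checked offline. The function names are hypothetical, and the archive in the usage example is built in memory rather than downloaded, so it only mimics the response layout assumed by the cell (a single CSV file inside a zip).

```python
import io
import zipfile


def build_timeseries_request(timeseries_ids, arm_names):
    """Request body mapping each time-series id to every arm (same shape as above)."""
    return {"timeseries": {ts: list(arm_names) for ts in timeseries_ids}}


def read_first_file_from_zip(content: bytes) -> str:
    """Decode the first member of an in-memory zip archive as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        first_name = archive.namelist()[0]
        return archive.read(first_name).decode('utf-8')


# Usage with an archive built in memory instead of an API response
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('timeseries.csv', 'Patient Id,Arm,Time,Descriptor,Value\n')

payload = build_timeseries_request(['Blood.Drug'], ['iv-0.1-10'])
csv_text = read_first_file_from_zip(buffer.getvalue())
print(payload)
print(csv_text)
```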

## Step 2 : Post-processing simulations

In [None]:
# Loading timeseries into a dataframe
df_time_series = pd.read_csv(io.StringIO(csvTimeSeries))

print(df_time_series['Patient Id'].unique())
display(df_time_series.head())

# Pivoting to a wide format, with one column per descriptor
df_time_series = df_time_series.pivot(
    index=['Patient Id', 'Arm', 'Time'], columns='Descriptor', values='Value'
)

# 'Time' is also exported as a descriptor: drop that column so that
# reset_index can turn the 'Time' index level back into a column
df_time_series = df_time_series.drop(columns=['Time'])
df_time_series = df_time_series.reset_index()

# Converting time from seconds to days
df_time_series['Time'] = df_time_series['Time'] / (60 * 60 * 24)

print('Timeseries data (first rows): \n')
display(df_time_series.head())
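To see why the `Time` column is dropped after pivoting, here is the same reshaping applied to a tiny hand-made table in the long format used above (`Patient Id`, `Arm`, `Time`, `Descriptor`, `Value`). The values are made up for illustration, and the assumption that time is in seconds and also appears as a `Time` descriptor is inferred from the cell's code rather than from API documentation. Without the `drop`, `reset_index` would fail because a `Time` column would already exist.

```python
import pandas as pd

# Made-up long-format rows: one row per (patient, arm, time, descriptor)
long_df = pd.DataFrame({
    'Patient Id': ['p1'] * 4,
    'Arm': ['iv-0.1-10'] * 4,
    'Time': [0, 0, 86400, 86400],
    'Descriptor': ['Time', 'Blood.Drug', 'Time', 'Blood.Drug'],
    'Value': [0.0, 1.5, 86400.0, 0.8],
})

wide_df = long_df.pivot(
    index=['Patient Id', 'Arm', 'Time'], columns='Descriptor', values='Value'
)
# 'Time' is also a descriptor column; dropping it lets reset_index
# restore the 'Time' index level as a column without a name clash
wide_df = wide_df.drop(columns=['Time']).reset_index()
wide_df.columns.name = None

# Seconds to days, vectorised
wide_df['Time'] = wide_df['Time'] / (60 * 60 * 24)
print(wide_df)
```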

## Step 3 : Importing local data
### Step 3.1 : Creating fake real-life observations

In [None]:
# Creating 'fake real-life' data from the simulated data with added noise.
# Only the blood drug concentration ('Blood.Drug') will be used in this cookbook.
df_fake_time_series = df_time_series[['Arm', 'Blood.Drug', 'Time']].copy()
display(df_fake_time_series.head())

# Adding positive uniform noise: each value is scaled by a factor in [1, 1.5)
noise_factors = 1 + 0.5 * np.random.random_sample(len(df_fake_time_series))
df_fake_time_series['Blood.Drug'] = np.around(
    df_fake_time_series['Blood.Drug'] * noise_factors, 6)
display(df_fake_time_series.head())

# Writing the csv file to the current working directory (usually the notebook's folder)
current_path = os.getcwd()
df_fake_time_series.to_csv(os.path.join(current_path, 'fake_real_data.csv'), index=False)
del df_fake_time_series
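The noise added in this cell is multiplicative and uniform: each value is scaled by a random factor drawn from [1, 1.5). The sketch below expresses the same idea as a standalone, vectorised function with an optional seed, so the fake dataset can be regenerated identically. `add_multiplicative_noise` is a hypothetical helper, and the 50% amplitude simply mirrors the `0.5` used in the cell.

```python
import numpy as np


def add_multiplicative_noise(values, amplitude=0.5, seed=None):
    """Scale each value by a factor drawn uniformly from [1, 1 + amplitude)."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    values = np.asarray(values, dtype=float)
    factors = 1 + amplitude * rng.random(values.shape)
    return np.around(values * factors, 6)


noisy = add_multiplicative_noise([1.0, 2.0, 0.0], seed=42)
print(noisy)
```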

### Step 3.2 : Loading the data
#### Local data

In [None]:
# As mentioned in the previous cell, for convenience, the data is stored in the current working directory (usually the notebook's folder)
df_real_data = pd.read_csv('fake_real_data.csv')

# Removing the csv now that the data has been loaded
os.remove('fake_real_data.csv')
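Note that `to_csv` writes the DataFrame index by default, so reading such a file back yields an extra `Unnamed: 0` column unless the index is skipped on write (`index=False`) or used as `index_col` on read. A minimal round trip in a temporary folder illustrates both options; the file name and columns are placeholders.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Arm': ['iv-0.1-10', 'iv-0.1-10'],
                   'Time': [0.0, 1.0],
                   'Blood.Drug': [1.2, 0.9]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'fake_real_data.csv')

    # Default: the index is written as an unnamed first column
    df.to_csv(path)
    with_index = pd.read_csv(path)
    restored = pd.read_csv(path, index_col=0)  # re-use that column as the index

    # index=False: the file only contains the data columns
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(restored.columns))
print(list(without_index.columns))
```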

#### Data on Jinkompute

Loading data from Jinkompute follows the same procedure once a connection to an instance has been established: `pd.read_csv` is simply pointed at the file's path on the mounted drive. If you are working remotely, first connect to the VPN via [Tailscale](https://docs.google.com/document/d/1n4wvFvEO-cVJxi5TLpIgqP3C_l5HB5IqvfjwStRLhaw/edit), then use the `jinkompute-mount` command to mount the Jinkompute server as a local drive.
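Once the server is mounted, Jinkompute files are ordinary paths on the local file system, so the same `pd.read_csv` call applies. The sketch below assumes a hypothetical mount point passed in as `mount_root` (the real location depends on how `jinkompute-mount` is set up on your machine) and fails with an explicit message when the drive is not mounted. The usage example stands in for the mount with a temporary folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_from_mount(mount_root, relative_path):
    """Read a CSV from a mounted drive; mount_root is a hypothetical mount point."""
    root = Path(mount_root)
    if not root.is_dir():
        raise FileNotFoundError(f'{root} is not mounted or does not exist')
    return pd.read_csv(root / relative_path)


# A temporary folder plays the role of the mounted Jinkompute drive
with tempfile.TemporaryDirectory() as fake_mount:
    (Path(fake_mount) / 'observations.csv').write_text(
        'Arm,Time,Blood.Drug\niv-0.1-10,0.0,1.2\n')
    df_remote = read_csv_from_mount(fake_mount, 'observations.csv')

print(df_remote)
```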

## Step 4 : Using the data
Now that the data has been loaded, it can be used in various ways. In this cookbook, we simply plot part of the 'real-life' data side by side with the simulations, restricted to the `iv-0.1-10` arm.

In [None]:
# Plotting the data
# Creating the initial figure
fig = go.Figure()

# Selecting the rows of the arm of interest
arm = 'iv-0.1-10'
simulated_arm = df_time_series[df_time_series['Arm'] == arm]
observed_arm = df_real_data[df_real_data['Arm'] == arm]

# Adding the first line, representing simulated data
fig.add_trace(go.Scatter(x=simulated_arm['Time'],
                         y=simulated_arm['Blood.Drug'],
                         mode='lines',
                         name='Simulated data',
                         line=dict(color='red')))

# Adding the second line for observed data
fig.add_trace(go.Scatter(x=observed_arm['Time'],
                         y=observed_arm['Blood.Drug'],
                         mode='lines',
                         name='Observed data',
                         line=dict(color='blue')))

# Updating the labels
fig.update_layout(
    title='Comparison of simulated and observed blood drug concentrations',
    xaxis_title='Time (days)',
    yaxis_title='Drug concentration (µg/mL)'
)

# Displaying the figure
fig.show()