# Comparison of trial curves (survival analysis)

## Introduction

This cookbook guides you through creating a simple visualization of the same output across several trials. Additionally, it compares each Kaplan-Meier curve to data with a log-rank test.


Linked resources: [Jinko](https://jinko.ai/project/e0fbb5bb-8929-439a-bad6-9e12d19d9ae4?labels=24574ece-6bde-4d76-896a-187426965a51).

In [None]:
# Jinko-specific imports & initialization
# Please fold this section and do not change it
import jinko_helpers as jinko


# This function ensures that authentication is correct
# It is also possible to override the base url by passing baseUrl=...
# If everything is set up correctly, it should print "Authentication successful"
jinko.initialize()

In [None]:
# Cookbook-specific imports

import io
import pandas as pd
import zipfile
from typing import List, Any
from sksurv.nonparametric import kaplan_meier_estimator
from lifelines.statistics import logrank_test
import matplotlib.pyplot as plt

## Fill in information on what you want to see

In [None]:
# Cookbook-specific constants

# Fill in the short Ids of your Trials (e.g. tr-EKRx-3HRt)
trialIdList = ["tr-gLnd-8yYx", "tr-f9PT-uDkz"]

# Fill in the Ids of the biomarkers you want to retrieve. Each one is then turned into a Kaplan-Meier curve.
# See the step "Plot the Kaplan Meier curves" in the cell that loads and plots all trials
biomarkersId = [
    "timeToClinicalProgression-at-P2Y",
]

# Fill in the arm that you want to observe.
arm = "Treated"

# Set to True if you want to compare results with data
compare_with_data = True

## Let's use the API and plot the data

### Load data

If you have data, you can load it here. If you don't, you can skip this cell. It contains a dummy example of what the data should look like for this script.

In [None]:
# Time at which each patient is censored or encounters the event
durationData = [20, 40, 80, 120, 120, 160, 180, 300, 380, 500, 600]

# Status of each patient: True = event occurred, False = censored. Every patient who did not encounter the event is censored
statusData = [True, True, True, False, True, True, False, True, True, True, False]
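
To make the role of the two lists concrete, here is a minimal hand-written sketch of the Kaplan-Meier product-limit estimate on the same dummy values. The function name `kaplan_meier` and the snake_case variable names are assumptions made for this illustration; the notebook itself relies on `kaplan_meier_estimator` from scikit-survival. Unlike that estimator, this sketch only reports the distinct event times and skips times at which patients are only censored.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return (times, survival) evaluated at each distinct event time."""
    at_risk = len(durations)  # patients still under observation
    leaving = Counter(durations)  # patients leaving the risk set at each time
    deaths = Counter(t for t, e in zip(durations, events) if e)  # events only
    times, survival = [], []
    s = 1.0
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction that survives time t
            s *= 1 - d / at_risk
            times.append(t)
            survival.append(s)
        # Both events and censored patients leave the risk set after time t
        at_risk -= leaving[t]
    return times, survival


# Copy of the dummy data from the cell above
duration_data = [20, 40, 80, 120, 120, 160, 180, 300, 380, 500, 600]
status_data = [True, True, True, False, True, True, False, True, True, True, False]

times, survival = kaplan_meier(duration_data, status_data)
for t, s in zip(times, survival):
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored patients never lower the curve directly; they only shrink the number at risk, which makes each later event weigh more.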

### Load and plot all trials
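The cell in this section infers censoring from the simulated times: any patient whose time equals the largest value in the selected arm is flagged as censored, and everyone else is counted as having had the event. The sketch below isolates that rule. The helper name `censor_at_horizon` and the sample times are hypothetical, and reading the largest value as the end of the observation window is an assumption about the simulation output, not something the API guarantees.

```python
def censor_at_horizon(times):
    """Return True (event) for each time, except those equal to the largest time."""
    horizon = max(times)  # assumed to be the end of follow-up
    return [t != horizon for t in times]


# Hypothetical times: two patients reach the horizon (24) without an event
sample_times = [5, 12, 24, 24, 7]
event_flags = censor_at_horizon(sample_times)
print(event_flags)
```

One caveat of this rule: if every patient in the arm has the event before the end of follow-up, the patient with the latest event is still flagged as censored.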

In [None]:
for trialId in trialIdList:
    # Retrieve your trial information
    ## Convert short Id to coreItemId
    coreItemId = jinko.get_core_item_id(trialId, 1)
    # Get the last version of your Trial
    ## List all Trial versions
    versions: List[Any] = jinko.make_request(
        "/core/v2/trial_manager/trial/%s/status" % (coreItemId["id"])
    ).json()
    ## Get the latest completed version
    latestCompletedVersion = next(
        (item for item in versions if item["status"] == "completed"), None
    )
    if latestCompletedVersion is None:
        raise Exception("No completed Trial version found")

    dfBiomarkers_raw = jinko.get_trial_scalars_as_dataframe(
        latestCompletedVersion["simulationId"]["coreItemId"],
        latestCompletedVersion["simulationId"]["snapshotId"],
        scalar_ids=biomarkersId,
    )

    # Plot the Kaplan Meier curves
    ## Keep only the arm name in the `armId` column (the text before the first underscore)
    dfBiomarkers_raw["armId"] = dfBiomarkers_raw["armId"].map(lambda x: x.split("_")[0])
    ## Split the data by arm
    grouped = dfBiomarkers_raw.groupby("armId")
    group = grouped.get_group(arm)
    time = group["value"].values
    ## Patients whose time equals the largest observed time are treated as censored;
    ## all other patients are assumed to have encountered the event
    max_time = max(time)
    event_occurred = [t != max_time for t in time]
    ## Construct the Kaplan-Meier curve
    time_simulation, survival_simulation, conf_int = kaplan_meier_estimator(
        event_occurred, time, conf_type="log-log"
    )
    ## Plot curves
    plt.step(
        time_simulation, survival_simulation, where="post", label=f"Trial id {trialId}"
    )

    # Compare with data, only if compare_with_data is True
    if compare_with_data:
        results = logrank_test(
            time,
            durationData,
            event_observed_A=event_occurred,
            event_observed_B=statusData,
        )
        print(
            f"Log-rank test for trial {trialId} compared with data: "
            f"p_value = {results.p_value}"
        )


# Customize plot
## Add data, only if compare_with_data is True
if compare_with_data:
    timeData, survivalData = kaplan_meier_estimator(statusData, durationData)
    plt.plot(
        timeData,
        survivalData,
        linestyle="--",
        marker="o",
        color="red",
        markersize=1,
        label="Extracted Data",
    )

plt.ylabel("Survival probability")
plt.xlabel("Time (months)")
plt.title("Kaplan-Meier Curves by Trial with Extracted Data")
plt.legend(title="Curve")
plt.grid(True)
plt.show()
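
As a rough illustration of what the log-rank comparison in the loop above measures, here is a minimal hand-written sketch of the standard two-group log-rank statistic. It is not how `lifelines` implements `logrank_test`; the helper name `logrank_p_value` and the sample groups are hypothetical. At each distinct event time it compares the events observed in group A with the number expected if both groups shared the same hazard, then converts the resulting chi-square statistic (1 degree of freedom) into a p-value.

```python
import math


def logrank_p_value(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic, p-value) of a two-group log-rank test."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in A just before t
        n_b = sum(1 for x in times_b if x >= t)  # at risk in B just before t
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    statistic = observed_minus_expected**2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical groups: A fails early, B fails late, no censoring
early = [1, 2, 3, 4, 5]
late = [10, 11, 12, 13, 14]
all_events = [True] * 5

stat_same, p_same = logrank_p_value(early, all_events, early, all_events)
stat_diff, p_diff = logrank_p_value(early, all_events, late, all_events)
print(f"identical groups: p = {p_same:.3f}")
print(f"separated groups: p = {p_diff:.4f}")
```

A small p-value suggests that the simulated curve and the data curve differ more than chance alone would explain; a large one only means the test found no evidence of a difference.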