# Erlotinib High Dose Treatment Data: LXF A677 Implanted in Mice

In [1], the tumour growth inhibition (TGI) PKPD model of Erlotinib and Gefitinib was derived from two separate *in vivo* experiments. In particular, the growth of patient-derived tumour explants LXF A677 (adenocarcinoma of the lung) and cell line-derived tumour xenografts VXF A431 (vulva cancer) in mice was monitored. Each experiment comprised a control growth group and three groups that were treated with either Erlotinib or Gefitinib at one of three dose levels. Treatments were administered orally once a day.

In this notebook, DESCRIPTION TO BE COMPLETED

## Raw PK data for all dosing regimens

In [1]:
# 
# Import raw LXF A677 Erlotinib PK data.
#

import os
import pandas as pd


# Import LXF A677 PK data
path = os.path.dirname(os.getcwd())  # make import independent of local path structure
pk_data_raw = pd.read_csv(path + '/data_raw/PK_LXF_erlo.csv', sep=';')

# Display data
print('Raw PK Data Set for all dosing regimens:')
pk_data_raw

Raw PK Data Set for all dosing regimens:


Unnamed: 0,#ID,TIME,DOSE,ADDL,II,Y,YTYPE,CENS,CELL LINE,DOSE GROUP,DRUG,EXPERIMENT,TUMOR SIZE,BW
0,6,0.0,.,.,.,.,.,.,1,100.00,1,2,68.7500,24.2
1,6,2.0,.,.,.,.,.,.,1,100.00,1,2,75.4290,24.7
2,6,3.0,2450,.,.,.,.,.,1,100.00,1,2,75.4290,24.7
3,6,4.0,.,.,.,.,.,.,1,100.00,1,2,115.3510,23.6
4,6,4.0,2350,2,1,.,.,.,1,100.00,1,2,115.3510,23.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
467,162,21.0,.,.,.,.,.,.,1,6.25,1,2,490.3980,24.2
468,162,23.0,.,.,.,.,.,.,1,6.25,1,2,425.2700,24.5
469,162,25.0,.,.,.,.,.,.,1,6.25,1,2,432.6660,24.5
470,162,28.0,.,.,.,.,.,.,1,6.25,1,2,585.0000,25.3


## Raw PD data for all Erlotinib and Gefitinib dosing regimens

In [2]:
#
# Import raw PD data.
#

import os
import pandas as pd


# Import LXF A677 PD data
path = os.path.dirname(os.getcwd())  # make import independent of local path structure
pd_data_raw = pd.read_csv(path + '/data_raw/PKPD_ErloAndGefi_LXF.csv', sep=';')

# Display data
print('Raw PD Data Set for all Erlotinib and Gefitinib dosing regimens:')
pd_data_raw

Raw PD Data Set for all Erlotinib and Gefitinib dosing regimens:


Unnamed: 0,#ID,TIME,DOSE,ADDL,II,Y,YTYPE,CENS,CELL LINE,DOSE GROUP,...,DRUG,DRUGCAT,EXPERIMENT,BW,YTV,KA,V,KE,w0,I
0,4,0,.,.,.,113.1325,2,.,1,6.25,...,2,2,2,29.7,.,55,1.403,4.1328,113.1325,0.007717
1,4,3,187.5,.,.,.,.,.,1,6.25,...,2,2,2,30.1,.,55,1.403,4.1328,113.1325,0.007717
2,4,4,181.25,2,1,.,.,.,1,6.25,...,2,2,2,29.0,.,55,1.403,4.1328,113.1325,0.007717
3,4,7,181.25,1,1,.,.,.,1,6.25,...,2,2,2,29.1,.,55,1.403,4.1328,113.1325,0.007717
4,4,9,181.25,1,1,.,.,.,1,6.25,...,2,2,2,29.3,.,55,1.403,4.1328,113.1325,0.007717
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1101,170,11,.,.,.,194.0785,2,.,1,0.00,...,2,0,2,28.6,194.0785,55,1.400,3.8700,80.0565,0.007717
1102,170,9,.,.,.,176.157,2,.,1,0.00,...,2,0,2,29.3,176.157,55,1.400,3.8700,80.0565,0.007717
1103,170,7,.,.,.,158.994,2,.,1,0.00,...,2,0,2,28.8,158.994,55,1.400,3.8700,80.0565,0.007717
1104,170,4,.,.,.,109.33,2,.,1,0.00,...,2,0,2,27.9,109.33,55,1.400,3.8700,80.0565,0.007717


## Cleaning the data

We obtained these datasets from the authors of [1]. Much of the information in the datasets is not relevant for our purposes.

All we really need for our analysis is

- **#ID** indicating which mouse was measured,
- **BODY WEIGHT** indicating the weight of the mouse,
- **TIME CONC** indicating the time point of each plasma concentration measurement,
- **CONC** indicating the measured plasma concentration of the compound,
- **TIME VOLUME** indicating the time point of each tumour volume measurement,
- **TUMOUR VOLUME** indicating the measured tumour volume,
- **DOSE TIME** indicating the time point when the dose was administered,
- **DOSE AMOUNT** indicating the amount of the administered dose.

Identifying these properties in the dataframes is not straightforward, so we detail our mapping below and document the reasons for our decisions.

- **#ID**: The mapping is obvious.
- **BODY WEIGHT**: Mapped to **BW** after confirmation from the authors. **BW** is measured in $\text{g}$.
- **TIME CONC**: **TIME** column in `pk_data_raw`. Time is measured in $\text{day}$.
- **CONC**: **Y** column in `pk_data_raw` after confirmation from the authors. Concentration is measured in $\text{mg/L}$.
- **TIME VOLUME**: **TIME** column in either `pk_data_raw` or `pd_data_raw`. Time is measured in $\text{day}$.
- **TUMOUR VOLUME**: **Y** column in `pd_data_raw`. Tumour volume is measured in $\text{mm}^3$.
- **DOSE TIME**: We take the dosing times from the description in reference [1], due to unfamiliarity with Monolix's conventions.
- **DOSE AMOUNT**: We take the dose amounts from the description in reference [1], due to unfamiliarity with Monolix's conventions.
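
The mapping for the PD dataframe can be summarised as a small lookup table. The following is only a sketch: the dictionary name `COLUMN_MAP` and the toy dataframe are illustrative assumptions (the two rows are copied from the first rows of the PD table displayed above), and the actual cleaning cell further down handles the PK columns and the unit conversions separately.

```python
import pandas as pd

# Hypothetical summary of the PD column mapping described above
COLUMN_MAP = {
    '#ID': '#ID',
    'BW': 'BODY WEIGHT in g',
    'TIME': 'TIME VOLUME in day',
    'Y': 'TUMOUR VOLUME in mm^3',
}

# Toy stand-in for the first two rows of `pd_data_raw`
toy_pd = pd.DataFrame({
    '#ID': [4, 4],
    'TIME': [0, 3],
    'Y': [113.1325, None],
    'BW': [29.7, 30.1],
    'KA': [55, 55],  # model parameter column that we do not need
})

# Keep only the mapped columns and rename them
tidy = toy_pd[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
```

Keeping the mapping in one place makes it easier to revisit a decision once the authors clarify a column.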

The dosing regimen of the $100\, \text{mg/kg}$ Erlotinib dose group was one oral dose per day from day 3 to 9 and from day 14 to 16. According to Roche's study report, doses were adjusted throughout the experiment. However, the body weight was only measured on days 0, 2, 4, 7, 9, 11, 14, 16, 18, 21, 23, 25, 28 and 30. It may therefore be assumed that the dose was only adjusted when a new body weight measurement was obtained, despite being administered daily.

NOTE: THE DOSE AMOUNT in the DOSE column DEVIATES from THEORETICAL DOSE. WHY? THERE IS NO DOSE IN ROCHE'S REPORT.
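
To make the comparison between the recorded and the theoretical dose concrete, the sketch below computes the body-weight-adjusted dose for one row of the PK table displayed above (BW of 24.7 g, DOSE of 2450). The helper name `theoretical_dose_in_mg` is hypothetical, and the unit of the DOSE column is an assumption: the magnitude of the values suggests micrograms, but this has not been confirmed.

```python
def theoretical_dose_in_mg(body_weight_in_g, dose_level_in_mg_per_kg):
    """Returns the body-weight-adjusted dose amount in mg."""
    return dose_level_in_mg_per_kg * body_weight_in_g * 1E-03  # g -> kg


# Dose row of the PK table above: BW = 24.7 g, DOSE = 2450
theoretical = theoretical_dose_in_mg(24.7, 100)

# ASSUMPTION: the DOSE column is recorded in micrograms
recorded = 2450 * 1E-03

# Relative deviation of the recorded from the theoretical dose
relative_deviation = (recorded - theoretical) / theoretical
```

Under this assumption the recorded dose is slightly below the theoretical one, by less than one percent for this row. The sketch quantifies the deviation but does not explain it.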

Remarks on remaining column keys:

- **DOSE**, **ADDL**, **II**: According to Monolix, these keys encode the dose amount (DOSE), the number of additional doses (ADDL) to administer after the recorded dose, and the interval between those doses (II). Since we take the doses from the study directly, we don't need those keys.
- **YTYPE**, **CENS**: According to Monolix, these keys encode the data type (tumour volume in this case) and whether the measured values were subject to censoring. We should make sure that censored data is dealt with accordingly and that only one data type is present in the dataset.
- **CELL LINE**, **DOSE GROUP**, **DRUG**, **EXPERIMENT**: These customised keys are quite self-explanatory. We should make sure that the data we use is uni-valued in these columns.
- **DRUGCAT**: The meaning of this key is less clear. It may refer to the drug category, encoding the route of administration. We should make sure that this column is also uni-valued. If it takes multiple values, we need to clarify what this column means.
- **BW**: Refers to the body weight of the mouse at the time of the measurement.
- **KA**, **V**, **KE**, **w0**, **I**: The meaning of these customised keys is not immediately clear. They appear to be parameters of the PKPD model. Since we want to infer the parameters ourselves, we are not interested in previously obtained values and choose to ignore these columns.
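
The uni-valued requirement mentioned above can be checked programmatically. The following is a minimal sketch on a toy dataframe; the helper name `find_multivalued_columns` is hypothetical and not part of the original analysis.

```python
import pandas as pd


def find_multivalued_columns(df, columns):
    """Returns those columns that hold more than one distinct value."""
    return [col for col in columns if df[col].nunique(dropna=True) > 1]


# Toy stand-in for a filtered dataset: DRUG is deliberately not uni-valued
toy = pd.DataFrame({
    'CELL LINE': [1, 1, 1],
    'DRUG': [1, 1, 2],
    'EXPERIMENT': [2, 2, 2],
})

offending = find_multivalued_columns(toy, ['CELL LINE', 'DRUG', 'EXPERIMENT'])
```

An empty list would indicate that the filtered data is uni-valued in all checked columns.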

For reasons that will become clear later, we will choose to measure the tumour volume in $\text{cm}^3$.

## Create Erlotinib low dose PKPD dataset

TO BE COMPLETED: CURRENTLY NOT WORKING

In [18]:
#
# Create LXF A677 data from raw data set.
#

import os

import numpy as np
import pandas as pd


# Get path to directory
path = os.path.dirname(os.getcwd())  # make import independent of local path structure

# Import LXF A677 Erlotinib PK data
pk_data_raw = pd.read_csv(path + '/data_raw/PK_LXF_erlo.csv', sep=';')

# Import LXF A677 PD data
pd_data_raw = pd.read_csv(path + '/data_raw/PKPD_ErloAndGefi_LXF.csv', sep=';')

# Make sure that data is stored as numeric data
pk_data = pk_data_raw.apply(pd.to_numeric, errors='coerce', downcast='float')
pd_data = pd_data_raw.apply(pd.to_numeric, errors='coerce', downcast='float')

# Mask PD data for Erlotinib treatment (DRUG 1)
pd_data = pd_data[pd_data['DRUG'] == 1]

# Mask for the low dose group (DOSE GROUP 100)
pk_data = pk_data[pk_data['DOSE GROUP'] == 100.0]
pd_data = pd_data[pd_data['DOSE GROUP'] == 100.0]

# Sort dataframes according to measurement times
# (reassign instead of sorting in place, since both frames are filtered copies)
pk_data = pk_data.sort_values('TIME')
pd_data = pd_data.sort_values('TIME')

# Check that pk mice are a subset of pd mice
# NOTE: PK measurements were not taken for all mice
assert np.all(np.isin(pk_data['#ID'].unique(), pd_data['#ID'].unique()))

# Get dose relevant columns and rows from dataframes (needed to reconstruct dosing schedule further down)
pk_dose_data = pk_data[~pk_data['DOSE'].isnull()][['#ID', 'TIME', 'DOSE', 'BW']]
pd_dose_data = pd_data[~pd_data['DOSE'].isnull()][['#ID', 'TIME', 'DOSE', 'BW']]

# Filter out DOSE entries (we compute those independently)
pk_data = pk_data[pk_data['DOSE'].isnull()]
pd_data = pd_data[pd_data['DOSE'].isnull()]

# Assert that for each mouse the tumour volumes and body weights at a given time point agree across the two datasets
# NOTE: PK has only information about a subset of mice
for mouse_id in pk_data['#ID'].unique():
    # Create mask for mouse
    pk_mask = (pk_data['#ID'] == mouse_id) & pk_data['Y'].isnull()
    pd_mask = pd_data['#ID'] == mouse_id

    # Assert that times are the same, if mouse exists in both dataframes
    if not pk_data[pk_mask].empty:
        display(pk_data[pk_mask]['TIME'])
        display(pd_data[pd_mask]['TIME'])
        assert np.array_equal(pk_data[pk_mask]['TIME'], pd_data[pd_mask]['TIME'])

    # Assert that tumour volumes are the same
    assert np.array_equal(pk_data[pk_mask]['TUMOR SIZE'], pd_data[pd_mask]['Y'])

    # Assert that body weights agree
    assert np.array_equal(pk_data[pk_mask]['BW'], pd_data[pd_mask]['BW'])

# Initialise final dataframe from PD dataframe and forget about all columns but #ID, TIME, Y and BW
data = pd_data[['#ID', 'TIME', 'Y', 'BW']]

# Rename TIME to TIME VOLUME in day
data = data.rename(columns={'TIME': 'TIME VOLUME in day'})

# Rename Y to TUMOUR VOLUME in cm^3 and convert from mm^3 to cm^3
data = data.rename(columns={'Y': 'TUMOUR VOLUME in cm^3'})
data['TUMOUR VOLUME in cm^3'] *= 1E-03  # 1 mm^3 = 1E-03 cm^3

# Rename BW to BODY WEIGHT in g
data = data.rename(columns={'BW': 'BODY WEIGHT in g'})

# Extract #ID, TIME, Y and BW from PK data
conc_data = pk_data[['#ID', 'TIME', 'Y', 'BW']]

# Filter only for those rows with non-nan entries
conc_data = conc_data[~conc_data['Y'].isnull()]

# Rename TIME to TIME CONC in day
conc_data = conc_data.rename(columns={'TIME': 'TIME CONC in day'})

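# Rename Y to CONC in mg/L and rescale by 1E-03
# NOTE: 1E-03 converts ug/L (i.e. ng/mL) to mg/L, whereas ng/L would need 1E-06.
# The unit of the raw Y column should be confirmed.
conc_data = conc_data.rename(columns={'Y': 'CONC in mg/L'})
conc_data['CONC in mg/L'] *= 1E-03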

# Rename BW to BODY WEIGHT in g
conc_data = conc_data.rename(columns={'BW': 'BODY WEIGHT in g'})

# Add concentration data to final dataframe
data = pd.concat([data, conc_data])

# Define dosing time points in day
dose_times = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 14.0, 15.0, 16.0]

# Assert that mice in dose dataframes are the same as in the final data frame
assert np.array_equal(np.sort(pk_dose_data['#ID'].unique()), np.sort(data['#ID'].unique()))
assert np.array_equal(np.sort(pd_dose_data['#ID'].unique()), np.sort(data['#ID'].unique()))

# Initialise dose dataframe based on pk dataframe
dose_data = pk_dose_data

# Get dose from dataframe TODO: CHECK WHY ENTRIES DEVIATE FROM THEORETICAL VALUES
for mouse_id in data['#ID'].unique():
    # Create mask for mouse and filter out NaN dose entries
    pk_mask = (pk_dose_data['#ID'] == mouse_id)
    pd_mask = (pd_dose_data['#ID'] == mouse_id)

    # Check that DOSE entries agree across datasets
    assert np.array_equal(pk_dose_data[pk_mask]['DOSE'], pd_dose_data[pd_mask]['DOSE'])
    
    # Add dose amount to data frame for each dose event
    for time_id, time in enumerate(dose_times):
        # Mask dataframe for time
        mask = (dose_data['#ID'] == mouse_id) & (dose_data['TIME'] == time)

        # If first dose is empty raise an error
        if dose_data[mask]['DOSE'].empty and (time == dose_times[0]):
            raise ValueError(
                'No first dose recorded for mouse %d.' % mouse_id)

        # Else if dose is empty and not the first dose, fill with previous dose
        # (Assume that dose has not been altered)
        elif dose_data[mask]['DOSE'].empty:
            # Create mask for previous time point
            mask = (dose_data['#ID'] == mouse_id) & (dose_data['TIME'] == dose_times[time_id-1])

            # Append dose to container
            # (DataFrame.append was removed in pandas 2.0, so we concatenate)
            dose_data = pd.concat([
                dose_data,
                pd.DataFrame({
                    '#ID': mouse_id,
                    'TIME': time,
                    'DOSE': dose_data[mask]['DOSE'],
                    'BW': dose_data[mask]['BW']})],
                ignore_index=True)

        else:
            # Assert that there is only one entry per time point
            assert len(dose_data[mask]['DOSE']) == 1

# Sort dataframes according to measurement times
dose_data.sort_values('TIME', inplace=True)

# Rename TIME to TIME DOSE
dose_data = dose_data.rename(columns={'TIME': 'TIME DOSE in day'})

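# Rename DOSE to DOSE AMOUNT and convert from ug to mg
# NOTE: The recorded values (e.g. 2450 for a 24.7 g mouse at 100 mg/kg) suggest
# that DOSE is stored in ug rather than ng.
dose_data = dose_data.rename(columns={'DOSE': 'DOSE AMOUNT in mg'})
dose_data['DOSE AMOUNT in mg'] *= 1E-03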

# Rename BW to BODY WEIGHT
dose_data = dose_data.rename(columns={'BW': 'BODY WEIGHT in g'})

# Add dose data to final dataframe
data = pd.concat([data, dose_data])

# Display final Erlotinib low dose dataset
print('Low dose Erlotinib dataset LXF A677:')
data

0      0.0
1      2.0
3      4.0
5      7.0
7      9.0
8     11.0
9     14.0
12    16.0
14    18.0
15    21.0
16    23.0
17    25.0
18    28.0
19    30.0
Name: TIME, dtype: float32

21     0.0
32     2.0
30     4.0
37     7.0
38     9.0
39    11.0
36    14.0
34    16.0
35    18.0
33    21.0
31    23.0
29    25.0
28    28.0
27    30.0
Name: TIME, dtype: float32

29     0.0
30     2.0
32     4.0
34     7.0
36     9.0
37    11.0
38    14.0
41    16.0
43    18.0
44    21.0
45    23.0
46    25.0
47    28.0
48    30.0
Name: TIME, dtype: float32

132     0.0
138     2.0
136     4.0
139     7.0
143     9.0
141    11.0
140    14.0
144    16.0
142    21.0
137    23.0
135    25.0
133    28.0
134    30.0
Name: TIME, dtype: float32

AssertionError: 

In [6]:
pk_data['#ID'].unique()

array([  6,  28,  11, 128, 134,  67])

In [7]:
pd_data['#ID'].unique()

array([  6, 147, 137, 134,  67,  28,  11, 128])
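
The outputs above show that `pk_data` covers only a subset of the mice in `pd_data`, and that for one mouse the PD time points lack day 18. One way to locate such mismatches is an outer merge on mouse ID and time. The sketch below uses toy data loosely modelled on that output; the helper name `find_unmatched_times` is hypothetical.

```python
import pandas as pd


def find_unmatched_times(pk, pd_, id_key='#ID', time_key='TIME'):
    """
    Returns the (#ID, TIME) pairs that appear in only one of the two
    dataframes. The `_merge` column tells in which one.
    """
    left = pk[[id_key, time_key]].drop_duplicates()
    right = pd_[[id_key, time_key]].drop_duplicates()
    merged = left.merge(
        right, on=[id_key, time_key], how='outer', indicator=True)
    return merged[merged['_merge'] != 'both'].reset_index(drop=True)


# Toy data: the PD frame misses the day 18 measurement
toy_pk = pd.DataFrame({'#ID': [28, 28, 28], 'TIME': [16.0, 18.0, 21.0]})
toy_pd = pd.DataFrame({'#ID': [28, 28], 'TIME': [16.0, 21.0]})

unmatched = find_unmatched_times(toy_pk, toy_pd)
```

Applied to the real dataframes, rows marked `right_only` would also include the mice without PK measurements, so it may be useful to restrict both frames to the shared mouse IDs first.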

## Illustrate Erlotinib low dose data

We use [plotly](https://plotly.com/python/) to create interactive visualisations of the time-series data.

In [4]:
def compute_cumulative_dose_amount(times, doses, end_exp, duration=1E-03, start_exp=0):
    """
    Converts bolus dose amounts to a cumulative dose amount series that can be plotted nicely.
    
    Optionally the start and end of the experiment can be provided, so a constant cumulative amount
    is displayed for the entire duration of the experiment.
    """
    # Get number of measurements
    n = len(times)

    # Define how many cumulative time points are needed (add start and end if needed)
    m = 2 * n + 2

    # Create time container
    cum_times = np.empty(shape=m)

    # Create dose container
    cum_doses = np.empty(shape=m)

    # Add first entry (assuming no base level drug)
    cum_times[0] = start_exp
    cum_doses[0] = 0
    cum_doses[1] = 0  # At start of first dose there will also be no drug

    # Add start and end time of dose to container
    cum_times[1:-2:2] = times  # start of dose
    cum_times[2:-1:2] = times + duration  # end of dose
    cum_times[-1] = end_exp

    # Add cumulative dose amount at start and end of dose to container
    cum_doses[3:-2:2] = np.cumsum(doses[:-1])  # start of doses (except first dose, dealt with above)
    cum_doses[2:-1:2] = np.cumsum(doses)  # end of doses
    cum_doses[-1] =  np.cumsum(doses)[-1]  # final dose level

    return cum_times, cum_doses
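
To illustrate the shape of the resulting step series, the sketch below builds the same kind of staircase for two doses. It is a simplified variant rather than the function above: the name `cumulative_dose_steps` is hypothetical, and each dose is treated as an instantaneous jump instead of being spread over a short `duration`.

```python
import numpy as np


def cumulative_dose_steps(times, doses, end_exp, start_exp=0.0):
    """Returns times and cumulative amounts of a zero-width step series."""
    times = np.asarray(times, dtype=float)
    doses = np.asarray(doses, dtype=float)
    cum = np.cumsum(doses)

    # Each dose time appears twice: just before and just after the jump
    step_times = np.concatenate(
        ([start_exp], np.repeat(times, 2), [end_exp]))

    # Cumulative amount before each dose and after each dose, interleaved
    before = np.concatenate(([0.0], cum[:-1]))
    jumps = np.column_stack((before, cum)).ravel()
    step_doses = np.concatenate(([0.0], jumps, [cum[-1]]))

    return step_times, step_doses


# Two doses of 2 mg on days 3 and 4, experiment ends on day 30
step_times, step_doses = cumulative_dose_steps([3, 4], [2, 2], end_exp=30)
```

For `n` doses both arrays have `2 * n + 2` entries, which matches the container size `m` used in the function above.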


In [5]:
#
# Visualise Erlotinib low dose PK and tumour growth data.
#
# This cell needs the above created dataset:
# [data]
#

import numpy as np
import pandas as pd
import plotly.colors
import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Get number of individual mice
n_mice = len(data['#ID'].unique())

# Define colorscheme
colors = plotly.colors.qualitative.Plotly[:n_mice]

# Create figure
fig = make_subplots(rows=3, cols=1, shared_xaxes=True, row_heights=[0.2, 0.4, 0.4], vertical_spacing=0.05)

# Scatter plot of concentration and tumour growth data
for index, mouse_id in enumerate(np.sort(data['#ID'].unique())):
    # Mask dataset for mouse
    mask = data['#ID'] == mouse_id
    mouse_data = data[mask]

    # Get concentration measurement times
    conc_times = mouse_data['TIME CONC in day'].to_numpy()

    # Get measured concentrations
    conc = mouse_data['CONC in mg/L'].to_numpy()

    # Get tumour volume measurement times
    volume_times = mouse_data['TIME VOLUME in day'].to_numpy()

    # Get measured tumour volumes
    volumes = mouse_data['TUMOUR VOLUME in cm^3'].to_numpy()

    # Get dosing time points
    dose_times = mouse_data['TIME DOSE in day'].to_numpy()

    # Get doses
    doses = mouse_data['DOSE AMOUNT in mg'].to_numpy()

    # Filter nans from dose arrays
    dose_times = dose_times[~np.isnan(dose_times)]
    doses = doses[~np.isnan(doses)]

    # Convert dose events to cumulative dose amount time series
    dose_times, doses = compute_cumulative_dose_amount(
        times=dose_times,
        doses=doses,
        end_exp=30)

    # Plot cumulative dosed amount
    fig.add_trace(
        go.Scatter(
            x=dose_times,
            y=doses,
            legendgroup="ID: %d" % mouse_id,
            name="ID: %d" % mouse_id,
            showlegend=False,
            hovertemplate=
                "<b>Cumulative dose in mg</b><br>" +
                "ID: %d<br>" % mouse_id +
                "Time: %{x:.0f} day<br>" +
                "Cumulative dose: %{y:.02f} mg<br>" +
                "<extra></extra>",
            mode="lines",
            line=dict(color=colors[index])),
        row=1,
        col=1)

    # Plot concentration data
    fig.add_trace(
        go.Scatter(
            x=conc_times,
            y=conc,
            legendgroup="ID: %d" % mouse_id,
            name="ID: %d" % mouse_id,
            showlegend=False,
            hovertemplate=
                "<b>Plasma Concentration in mg/L</b><br>" +
                "ID: %d<br>" % mouse_id +
                "Time: %{x:.0f} day<br>" +
                "Plasma concentration: %{y:.02f} mg/L<br>" +
                "<extra></extra>",
            mode="markers",
            marker=dict(
                symbol='circle',
                opacity=0.7,
                line=dict(color='black', width=1),
                color=colors[index])),
        row=2,
        col=1)

    # Plot tumour volume data
    fig.add_trace(
        go.Scatter(
            x=volume_times,
            y=volumes,
            legendgroup="ID: %d" % mouse_id,
            name="ID: %d" % mouse_id,
            showlegend=True,
            hovertemplate=
                "<b>Tumour volume in cm^3</b><br>" +
                "ID: %d<br>" % mouse_id +
                "Time: %{x:.0f} day<br>" +
                "Tumour volume: %{y:.02f} cm^3<br>" +
                "<extra></extra>",
            mode="markers",
            marker=dict(
                symbol='circle',
                opacity=0.7,
                line=dict(color='black', width=1),
                color=colors[index])),
        row=3,
        col=1)

# Set figure size
fig.update_layout(
    autosize=True,
    template="plotly_white")

# Set X axis label
fig.update_xaxes(title_text=r'$\text{Time in day}$', row=3, col=1)

# Set Y axes labels
fig.update_yaxes(title_text=r'$\text{Amount in mg}$', row=1, col=1)
fig.update_yaxes(title_text=r'$\text{Conc. in mg/L}$', row=2, col=1)
fig.update_yaxes(title_text=r'$\text{Tumour volume in cm}^3$', row=3, col=1)

# Add switch between linear and log y-scale
fig.update_layout(
    updatemenus=[
        dict(
            type = "buttons",
            direction = "left",
            buttons=list([
                dict(
                    args=[{
                        "yaxis2.type": "linear",
                        "yaxis3.type": "linear"}],
                    label="Linear y-scale",
                    method="relayout"
                ),
                dict(
                    args=[{
                        "yaxis2.type": "log",
                        "yaxis3.type": "log"}],
                    label="Log y-scale",
                    method="relayout"
                )
            ]),
            pad={"r": 0, "t": -10},
            showactive=True,
            x=0.0,
            xanchor="left",
            y=1.15,
            yanchor="top"
        ),
    ]
)

# Show figure
fig.show()

**Figure 1:** TO BE COMPLETED

## Export cleaned data

In [6]:
#
# Export cleaned data sets for inference in other notebooks.
#
# This cell needs the above created dataset
#

import os
import pandas as pd


# Get path of current working directory
path = os.getcwd()

# Export cleaned LXF A677 Erlotinib data
data.to_csv(path + '/data/erlotinib_low_dose_lxf.csv')

## Bibliography

- <a name="ref1"> [1] </a> Eigenmann et al., Combining Nonclinical Experiments with Translational PKPD Modeling to Differentiate Erlotinib and Gefitinib, Mol Cancer Ther (2016)

[Back to project overview](https://github.com/DavAug/ErlotinibGefitinib/blob/master/README.md) | [Forward to next notebook](https://github.com/DavAug/ErlotinibGefitinib/blob/master/notebooks/control_growth/pooled_model.ipynb)