# Requirements

In [1]:
import pandas as pd

# Data

Read the patient experiment data.

In [2]:
data = pd.read_excel('data/patient_experiment.xlsx')

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   patient      62 non-null     int64         
 1   dose         61 non-null     float64       
 2   date         62 non-null     datetime64[ns]
 3   temperature  61 non-null     float64       
dtypes: datetime64[ns](1), float64(2), int64(1)
memory usage: 2.1 KB


The first step is to pivot the data into a time series: one row per timestamp, and one column per measurement and patient.

In [4]:
def create_time_series(df):
    return df.pivot_table(index='date', columns=['patient'])
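
To make the reshaping concrete, here is a minimal sketch on a small, made-up data frame. The name `toy` and its values are assumptions for illustration only and are not taken from the patient data.

```python
import pandas as pd

# Hypothetical long-format data: one row per (patient, date) measurement.
toy = pd.DataFrame({
    'patient': [1, 2, 1, 2],
    'date': pd.to_datetime(['2012-10-02 10:00', '2012-10-02 10:00',
                            '2012-10-02 11:00', '2012-10-02 11:00']),
    'dose': [0.0, 0.0, 2.0, 3.0],
    'temperature': [37.1, 38.2, 37.4, 38.0],
})

# Same call as in create_time_series: the dates become the index and
# each (measurement, patient) pair becomes one column.
toy_series = toy.pivot_table(index='date', columns=['patient'])

print(toy_series.columns.tolist())
```

Since `pivot_table` aggregates with the mean by default, duplicate measurements for the same patient and timestamp would be averaged rather than raise an error.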

Next, we fill in the missing values by interpolation.

In [5]:
def impute(df):
    return df.interpolate()
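
As a rough illustration of what the default interpolation does, the sketch below uses a made-up column with two gaps. The name `gappy` and the values are assumptions. With default arguments, `interpolate` is linear and fills forward only, so a gap at the very start of a column is left as is.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one leading gap and one interior gap.
gappy = pd.DataFrame(
    {'temperature': [np.nan, 37.0, np.nan, 38.0]},
    index=pd.to_datetime(['2012-10-02 10:00', '2012-10-02 11:00',
                          '2012-10-02 12:00', '2012-10-02 13:00']),
)

# Default linear interpolation: interior gaps are filled with values
# halfway between their neighbours; the leading gap stays missing.
filled = gappy.interpolate()

print(filled)
```

The default linear method treats the rows as equally spaced and ignores the actual index values. For irregularly sampled data, `interpolate(method='time')` may be the better choice.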

Finally, we compute the mean temperature across all patients at each time step. Note that the name of the column is a parameter.

In [6]:
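def compute_mean(df, column):
    # df[column] selects all per-patient columns of that measurement;
    # note that the assignment modifies df in place.
    df['avg_temp'] = df[column].mean(axis=1)
    return df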
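
The selection step relies on the two-level column index produced by the pivot. The following sketch, with a made-up frame named `wide`, shows how selecting the first level yields one column per patient and how `mean(axis=1)` then averages across them for each row.

```python
import pandas as pd

# Hypothetical wide frame with (measurement, patient) columns.
columns = pd.MultiIndex.from_product([['dose', 'temperature'], [1, 2]])
wide = pd.DataFrame([[0.0, 0.0, 37.0, 38.0],
                     [2.0, 3.0, 37.5, 38.5]], columns=columns)

# Selecting the top level returns a sub-frame with one column per patient.
temps = wide['temperature']

# axis=1 averages across the columns, giving one value per row.
row_means = temps.mean(axis=1)

print(row_means)
```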

All these operations can be chained using the data frame's pipe method.

In [7]:
time_series = (
    data.pipe(create_time_series)
        .pipe(impute)
        .pipe(compute_mean, 'temperature')
)
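
A call `df.pipe(f, arg)` is equivalent to `f(df, arg)`, so a pipe chain reads top to bottom where the nested form reads inside out. The small sketch below uses two hypothetical helper functions, `add_offset` and `scale`, which are not part of this notebook.

```python
import pandas as pd

def add_offset(df, offset):
    # Hypothetical step: shift all values by a constant.
    return df + offset

def scale(df, factor):
    # Hypothetical step: multiply all values by a constant.
    return df * factor

toy = pd.DataFrame({'x': [1.0, 2.0]})

# Piped form: steps appear in the order they are applied.
piped = toy.pipe(add_offset, 1.0).pipe(scale, 10.0)

# Nested form: same computation, read from the inside out.
nested = scale(add_offset(toy, 1.0), 10.0)

print(piped.equals(nested))
```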

In [8]:
time_series.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7 entries, 2012-10-02 10:00:00 to 2012-10-02 16:00:00
Data columns (total 19 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   (dose, 1)         7 non-null      float64
 1   (dose, 2)         7 non-null      float64
 2   (dose, 3)         7 non-null      float64
 3   (dose, 4)         7 non-null      float64
 4   (dose, 5)         7 non-null      float64
 5   (dose, 6)         7 non-null      float64
 6   (dose, 7)         7 non-null      float64
 7   (dose, 8)         7 non-null      float64
 8   (dose, 9)         7 non-null      float64
 9   (temperature, 1)  7 non-null      float64
 10  (temperature, 2)  7 non-null      float64
 11  (temperature, 3)  7 non-null      float64
 12  (temperature, 4)  7 non-null      float64
 13  (temperature, 5)  7 non-null      float64
 14  (temperature, 6)  7 non-null      float64
 15  (temperature, 7)  7 non-null      float64
 16  (temperat

The original dataframe is unchanged.

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   patient      62 non-null     int64         
 1   dose         61 non-null     float64       
 2   date         62 non-null     datetime64[ns]
 3   temperature  61 non-null     float64       
dtypes: datetime64[ns](1), float64(2), int64(1)
memory usage: 2.1 KB
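
The original survives because the first step of the chain, `pivot_table`, returns a new data frame. `compute_mean` itself assigns into its argument, so applying it directly to a frame you want to keep would modify that frame. The sketch below illustrates this on a made-up frame and shows one possible workaround, passing a copy. The names `wide` and `fresh` are assumptions.

```python
import pandas as pd

def compute_mean(df, column):
    # Same definition as above: assigns a new column into df itself.
    df['avg_temp'] = df[column].mean(axis=1)
    return df

columns = pd.MultiIndex.from_product([['temperature'], [1, 2]])

# Applied directly, the function returns the very object it was given,
# now carrying the extra column.
wide = pd.DataFrame([[37.0, 38.0]], columns=columns)
result = compute_mean(wide, 'temperature')
print(result is wide, wide.shape)

# Passing a copy leaves the input frame untouched.
fresh = pd.DataFrame([[37.0, 38.0]], columns=columns)
safe = compute_mean(fresh.copy(), 'temperature')
print(fresh.shape, safe.shape)
```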
