# <center>Study of the Efficacy of Different Anti-Cancer Treatments</center>
In a recent animal study, 249 mice with squamous cell carcinoma (SCC) tumors were treated with a range of drug regimens, including Capomulin. Tumor development was observed over a 45-day period to assess how effectively each treatment inhibited tumor growth. The task is to analyze the complete dataset and generate the tables and figures for the technical report. The study's top-level summary is pending detailed analysis; it will compare the performance of Capomulin against the other regimens on factors such as tumor size reduction, survival rates, and potential side effects. The ultimate goal is to give the executive team a comprehensive overview of the study results to inform future directions in anti-cancer medication development.

## Data Cleaning

In [51]:
# Import dependencies

import pandas as pd

In [52]:
# Read the mouse CSV file and print the first five rows

mouse_df = pd.read_csv('data/Mouse_metadata.csv')
mouse_df.head()

,Mouse ID,Drug Regimen,Sex,Age_months,Weight (g)
0,k403,Ramicane,Male,21,16
1,s185,Capomulin,Female,3,17
2,x401,Capomulin,Female,16,15
3,m601,Capomulin,Male,22,17
4,g791,Ramicane,Male,11,16


In [53]:
# Read the study CSV file and print the first five rows

study_df = pd.read_csv('data/Study_results.csv')
study_df.head()

,Mouse ID,Timepoint,Tumor Volume (mm3),Metastatic Sites
0,b128,0,45.0,0
1,f932,0,45.0,0
2,g107,0,45.0,0
3,a457,0,45.0,0
4,c819,0,45.0,0


In [54]:
# Merge the two DataFrames on the 'Mouse ID' column and print the first 15 rows

merged_df = pd.merge(mouse_df, study_df, on='Mouse ID', how='left')
merged_df.head(15)

,Mouse ID,Drug Regimen,Sex,Age_months,Weight (g),Timepoint,Tumor Volume (mm3),Metastatic Sites
0,k403,Ramicane,Male,21,16,0,45.0,0
1,k403,Ramicane,Male,21,16,5,38.825898,0
2,k403,Ramicane,Male,21,16,10,35.014271,1
3,k403,Ramicane,Male,21,16,15,34.223992,1
4,k403,Ramicane,Male,21,16,20,32.997729,1
5,k403,Ramicane,Male,21,16,25,33.464577,1
6,k403,Ramicane,Male,21,16,30,31.099498,1
7,k403,Ramicane,Male,21,16,35,26.546993,1
8,k403,Ramicane,Male,21,16,40,24.365505,1
9,k403,Ramicane,Male,21,16,45,22.050126,1
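
The merge above relies on each mouse appearing exactly once in the metadata table. As a minimal sketch of how that assumption could be checked at merge time, the example below uses two toy frames (`toy_mice` and `toy_study` are hypothetical stand-ins with illustrative values, not study data) together with the `validate` and `indicator` options of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for mouse_df and study_df (illustrative values, not study data)
toy_mice = pd.DataFrame({'Mouse ID': ['a1', 'b2'],
                         'Drug Regimen': ['Capomulin', 'Placebo']})
toy_study = pd.DataFrame({'Mouse ID': ['a1', 'a1', 'b2'],
                          'Timepoint': [0, 5, 0]})

# validate='one_to_many' raises pandas.errors.MergeError if a Mouse ID is
# repeated in the left (metadata) frame. indicator=True adds a '_merge'
# column recording whether each row matched on both sides.
toy_merged = pd.merge(toy_mice, toy_study, on='Mouse ID', how='left',
                      validate='one_to_many', indicator=True)

# Metadata rows without any measurements would be labelled 'left_only'
unmatched = toy_merged[toy_merged['_merge'] == 'left_only']
print(f'{len(toy_merged)} merged rows, {len(unmatched)} without measurements')
```

With a left merge, a mouse that has no study records would still appear once, with missing measurement columns, so an empty `unmatched` frame is a quick sanity check before any statistics are computed.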


In [55]:
# Investigate the merged DataFrame and check for any null values

merged_df.info()
print('There are no null values in the merged DataFrame')

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1893 entries, 0 to 1892
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Mouse ID            1893 non-null   object 
 1   Drug Regimen        1893 non-null   object 
 2   Sex                 1893 non-null   object 
 3   Age_months          1893 non-null   int64  
 4   Weight (g)          1893 non-null   int64  
 5   Timepoint           1893 non-null   int64  
 6   Tumor Volume (mm3)  1893 non-null   float64
 7   Metastatic Sites    1893 non-null   int64  
dtypes: float64(1), int64(4), object(3)
memory usage: 133.1+ KB
There are no null values in the merged DataFrame
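
The `info()` listing shows 1893 non-null entries in every column. The same check can also be computed directly rather than read off the listing. The sketch below is one possible way to do that, shown on a toy frame (`toy_df` is hypothetical and deliberately contains one missing value):

```python
import pandas as pd

# Toy frame with one deliberately missing value (illustrative, not study data)
toy_df = pd.DataFrame({'Mouse ID': ['a1', 'b2', 'c3'],
                       'Tumor Volume (mm3)': [45.0, None, 47.2]})

# Count missing values per column, then in total
nulls_per_column = toy_df.isna().sum()
total_nulls = int(nulls_per_column.sum())

# Report the result based on the computed count
if total_nulls == 0:
    print('There are no null values in the DataFrame')
else:
    print(f'There are {total_nulls} null values in the DataFrame')
```

Applied to `merged_df`, a total of zero would confirm the statement printed by the cell above.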


In [56]:
# Investigate the descriptive statistics for the numerical columns

merged_df.describe()

,Age_months,Weight (g),Timepoint,Tumor Volume (mm3),Metastatic Sites
count,1893.0,1893.0,1893.0,1893.0,1893.0
mean,12.81458,25.662441,19.572108,50.448381,1.021659
std,7.189592,3.921622,14.07946,8.894722,1.137974
min,1.0,15.0,0.0,22.050126,0.0
25%,7.0,25.0,5.0,45.0,0.0
50%,13.0,27.0,20.0,48.951474,1.0
75%,20.0,29.0,30.0,56.2922,2.0
max,24.0,30.0,45.0,78.567014,4.0


In [57]:
# Investigate the shape of the DataFrame

shape = merged_df.shape
print(f'There are {shape[0]} rows and {shape[1]} columns in the DataFrame.')

There are 1893 rows and 8 columns in the DataFrame.


In [58]:
# Find the total number of mice used in the study

mouse_count = merged_df['Mouse ID'].nunique()
print(f'There are a total of {mouse_count} mice in the sample.')

There are a total of 249 mice in the sample.


In [59]:
# Find duplicate (Mouse ID, Timepoint) records; a mouse with duplicated timepoints is treated as a bad sample

duplicates = merged_df.duplicated(subset=['Mouse ID', 'Timepoint'])
merged_df.loc[duplicates, :]

,Mouse ID,Drug Regimen,Sex,Age_months,Weight (g),Timepoint,Tumor Volume (mm3),Metastatic Sites
909,g989,Propriva,Female,21,26,0,45.0,0
911,g989,Propriva,Female,21,26,5,47.570392,0
913,g989,Propriva,Female,21,26,10,49.880528,0
915,g989,Propriva,Female,21,26,15,53.44202,0
917,g989,Propriva,Female,21,26,20,54.65765,1
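
By default, `duplicated()` uses `keep='first'`, so the table above lists only the second occurrence of each repeated (Mouse ID, Timepoint) pair. To inspect both conflicting records side by side, `keep=False` can be passed instead. A minimal sketch on toy data (`toy_df` and the mouse ID `z9` are hypothetical):

```python
import pandas as pd

# Toy records: mouse 'z9' has two conflicting rows at timepoint 0
toy_df = pd.DataFrame({'Mouse ID': ['a1', 'z9', 'z9', 'z9'],
                       'Timepoint': [0, 0, 0, 5],
                       'Tumor Volume (mm3)': [45.0, 45.0, 46.1, 47.0]})

keys = ['Mouse ID', 'Timepoint']

# keep='first' (the default) flags only the later repeats
repeats_only = toy_df[toy_df.duplicated(subset=keys)]

# keep=False flags every row that belongs to a duplicated group
all_conflicts = toy_df[toy_df.duplicated(subset=keys, keep=False)]

print(len(repeats_only), len(all_conflicts))
```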


In [60]:
# Mouse ID g989 has duplicate timepoint records, so drop all of its rows as a bad sample

merged_df = merged_df.loc[merged_df['Mouse ID'] != 'g989']
mouse_count = merged_df['Mouse ID'].nunique()
print(f'There are now a total of {mouse_count} mice in the sample after dropping bad samples.')

There are now a total of 248 mice in the sample after dropping bad samples.
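
The cell above names the affected mouse explicitly. If more than one mouse could have duplicated records, the IDs can be derived from the duplicate mask instead of being typed in. The following is a sketch of that variant on toy data (`toy_df`, `bad_ids`, and `clean_df` are hypothetical names, and the values are illustrative):

```python
import pandas as pd

# Toy records: 'z9' and 'y8' each repeat a timepoint, 'a1' is clean
toy_df = pd.DataFrame({'Mouse ID': ['a1', 'a1', 'z9', 'z9', 'y8', 'y8', 'y8'],
                       'Timepoint': [0, 5, 0, 0, 0, 5, 5]})

# Collect every Mouse ID that has at least one repeated timepoint
dup_mask = toy_df.duplicated(subset=['Mouse ID', 'Timepoint'], keep=False)
bad_ids = toy_df.loc[dup_mask, 'Mouse ID'].unique()

# Drop all rows for those mice, not just the duplicated rows
clean_df = toy_df[~toy_df['Mouse ID'].isin(bad_ids)]

print(f"Dropped {len(bad_ids)} mice; {clean_df['Mouse ID'].nunique()} remain.")
```

Removing the whole specimen, as the notebook does, is the more conservative choice: once two records disagree for the same timepoint, the mouse's other measurements cannot be trusted either.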


In [61]:
# Rename Columns

merged_df = merged_df.rename(columns={'Age_months':'Age (Months)', 'Timepoint':'Timepoint (Days)'})
merged_df.head()

,Mouse ID,Drug Regimen,Sex,Age (Months),Weight (g),Timepoint (Days),Tumor Volume (mm3),Metastatic Sites
0,k403,Ramicane,Male,21,16,0,45.0,0
1,k403,Ramicane,Male,21,16,5,38.825898,0
2,k403,Ramicane,Male,21,16,10,35.014271,1
3,k403,Ramicane,Male,21,16,15,34.223992,1
4,k403,Ramicane,Male,21,16,20,32.997729,1


## Summary Statistics

In [70]:
# Create a DataFrame to summarize Drug Regimen vs. Tumor Volume (mm3)

group = merged_df.groupby('Drug Regimen')
t_mean = group['Tumor Volume (mm3)'].mean()
t_median = group['Tumor Volume (mm3)'].median()
t_variance = group['Tumor Volume (mm3)'].var()
t_stdev = group['Tumor Volume (mm3)'].std()
t_sem = group['Tumor Volume (mm3)'].sem()

summary_df = pd.DataFrame({'Mean Tumor Volume (mm3)':t_mean, 'Median Tumor Volume (mm3)':t_median, 
                           'Variance Tumor Volume':t_variance, 'Standard Deviation Tumor Volume':t_stdev, 
                           'Standard Error of the Mean':t_sem})
summary_df

Drug Regimen,Mean Tumor Volume (mm3),Median Tumor Volume (mm3),Variance Tumor Volume,Standard Deviation Tumor Volume,Standard Error of the Mean
Capomulin,40.675741,41.557809,24.947764,4.994774,0.329346
Ceftamin,52.591172,51.776157,39.290177,6.268188,0.469821
Infubinol,52.884795,51.820584,43.128684,6.567243,0.492236
Ketapril,55.235638,53.698743,68.553577,8.279709,0.60386
Naftisol,54.331565,52.509285,66.173479,8.134708,0.596466
Placebo,54.033581,52.288934,61.168083,7.821003,0.581331
Propriva,52.32093,50.446266,43.852013,6.622085,0.544332
Ramicane,40.216745,40.673236,23.486704,4.846308,0.320955
Stelasyn,54.233149,52.431737,59.450562,7.710419,0.573111
Zoniferol,53.236507,51.818479,48.533355,6.966589,0.516398
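
The same five statistics can be produced in a single `agg` call, and the standard error column can be cross-checked against its definition, the sample standard deviation divided by the square root of the group size. The sketch below shows both on toy data (`toy_df` and the regimen labels `A` and `B` are hypothetical, not study values):

```python
import numpy as np
import pandas as pd

# Toy measurements for two made-up regimens (illustrative, not study data)
toy_df = pd.DataFrame({
    'Drug Regimen': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Tumor Volume (mm3)': [40.0, 42.0, 44.0, 50.0, 55.0, 60.0],
})

# One pass over the groups: the five summary statistics plus the group size
toy_summary = (toy_df.groupby('Drug Regimen')['Tumor Volume (mm3)']
               .agg(['mean', 'median', 'var', 'std', 'sem', 'count']))

# SEM is the sample standard deviation divided by sqrt(n)
expected_sem = toy_summary['std'] / np.sqrt(toy_summary['count'])
sem_matches = np.allclose(toy_summary['sem'], expected_sem)

print(toy_summary)
print(sem_matches)
```

The same relation can be used to read group sizes back out of the table above: for Capomulin, 4.994774 / 0.329346 is about 15.17, which corresponds to roughly 230 observations.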
