## Observations and Insights

### Pie plot

The pie plot of the female versus male distribution shows that the study used almost equal numbers of male and female mice when measuring their responses to the different treatment regimens.

### Box Plot

The box plot of final tumor volumes across the four treatment regimens (Capomulin, Ramicane, Infubinol, and Ceftamin) shows:

1) The final tumor volumes of mice treated with Capomulin and Ramicane have a similar range. Mice treated with Infubinol and Ceftamin also have a similar range of final tumor volumes, and those volumes are higher than for the other two drugs.

2) No outliers were identified for the other drugs, but one mouse treated with Infubinol (Mouse ID c326) was identified as an outlier.

3) The box plot also shows that the final tumor volumes of mice treated with Capomulin and Ceftamin spread farther from the median, which indicates more variability in those data points.

### Line Graph

The line graph for Mouse ID s185 shows that this mouse responded very well to the Capomulin treatment: its tumor volume shrank by about 48% over the course of the treatment.
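
The shrinkage figure is a relative change from the starting volume. A minimal sketch of that arithmetic follows; the two volumes are illustrative placeholders, not values read from the study data for s185.

```python
def percent_reduction(initial, final):
    """Return the relative decrease from initial to final, as a percentage."""
    return (initial - final) / initial * 100


# Illustrative values only (assumed, not taken from the data set).
start_volume = 45.0
end_volume = 23.4

reduction = percent_reduction(start_volume, end_volume)
print(f"Tumor volume reduction: {reduction:.1f}%")
```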

### Correlation and Linear regression

The correlation coefficient between mouse weight and average tumor volume for the Capomulin regimen is 0.95, which shows that there is a strong positive linear relationship between the two.

The linear regression model also shows that about 90% of the variation in average tumor volume is explained by the variation in mouse weight.
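
The 90% figure follows from squaring the correlation coefficient: for a linear regression with a single predictor, the coefficient of determination equals r squared. A short check, using the rounded coefficient reported above:

```python
# Rounded correlation coefficient reported in the observations.
r = 0.95

# With one predictor, the share of variance explained is r squared.
r_squared = r ** 2
print(f"r = {r:.2f}, r-squared = {r_squared:.4f}")
```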

## Dependencies and starter code

In [1]:
%matplotlib notebook

# Dependencies and Setup

import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as st
import numpy as np
from scipy.stats import linregress

#This stops the graphs from overwriting each other
plt.ioff() 

# Study data files
mouse_metadata = "data/Mouse_metadata.csv"
study_results = "data/Study_results.csv"

# Read the mouse data and the study results
mouse_metadata = pd.read_csv(mouse_metadata)
study_results = pd.read_csv(study_results)

# Combine the data into a single dataset
mouse_study_results_df = pd.merge(mouse_metadata,study_results,on="Mouse ID",how="left")

mouse_study_results_df.head()


| | Mouse ID | Drug Regimen | Sex | Age_months | Weight (g) | Timepoint | Tumor Volume (mm3) | Metastatic Sites |
|---|---|---|---|---|---|---|---|---|
| 0 | k403 | Ramicane | Male | 21 | 16 | 0 | 45.0 | 0 |
| 1 | k403 | Ramicane | Male | 21 | 16 | 5 | 38.825898 | 0 |
| 2 | k403 | Ramicane | Male | 21 | 16 | 10 | 35.014271 | 1 |
| 3 | k403 | Ramicane | Male | 21 | 16 | 15 | 34.223992 | 1 |
| 4 | k403 | Ramicane | Male | 21 | 16 | 20 | 32.997729 | 1 |
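
A left merge silently keeps mice that have no measurements and silently duplicates rows if a mouse appears twice in the metadata. The sketch below shows one way to check both with the `validate` and `indicator` parameters of `pd.merge`. The two small frames are hypothetical stand-ins for the real CSV files.

```python
import pandas as pd

# Hypothetical miniature versions of the two input tables.
metadata = pd.DataFrame({
    "Mouse ID": ["k403", "s185"],
    "Drug Regimen": ["Ramicane", "Capomulin"],
})
results = pd.DataFrame({
    "Mouse ID": ["k403", "k403", "s185"],
    "Timepoint": [0, 5, 0],
    "Tumor Volume (mm3)": [45.0, 38.825898, 45.0],
})

# validate="one_to_many" raises MergeError if a Mouse ID repeats in metadata.
# indicator=True adds a "_merge" column showing where each row came from.
merged = pd.merge(metadata, results, on="Mouse ID", how="left",
                  validate="one_to_many", indicator=True)

# Mice with metadata but no measurements would be marked "left_only".
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(merged)} rows, {len(unmatched)} mice without measurements")
```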


## Summary statistics

In [2]:
# Generate a summary statistics table of mean, median, variance, standard deviation, and SEM of the tumor volume for each regimen

#Grouped by Drug Regimen
mouse_study_results_grouped = mouse_study_results_df.groupby(["Drug Regimen"])

#Mean
mean_tumor_vol = mouse_study_results_grouped["Tumor Volume (mm3)"].mean()

#Median
median_tumor_vol = mouse_study_results_grouped["Tumor Volume (mm3)"].median()

#Variance for population using ddof = 0
variance_tumor_vol = mouse_study_results_grouped["Tumor Volume (mm3)"].var(ddof=0)

#Standard Deviation for population using ddof = 0
stddev_tumor_vol = mouse_study_results_grouped["Tumor Volume (mm3)"].std(ddof=0)

#Standard Error for population using ddof = 0
sem_tumor_vol = mouse_study_results_grouped["Tumor Volume (mm3)"].sem(ddof=0)


Tumor_vol_statistics = pd.DataFrame({"Mean Tumor Volume":mean_tumor_vol,
                                     "Median Tumor Volume":median_tumor_vol,
                                     "Variance Tumor Volume":variance_tumor_vol,
                                     "Std Deviation Tumor Volume":stddev_tumor_vol,
                                     "Std Error Tumor Volume": sem_tumor_vol})

Tumor_vol_statistics


| Drug Regimen | Mean Tumor Volume | Median Tumor Volume | Variance Tumor Volume | Std Deviation Tumor Volume | Std Error Tumor Volume |
|---|---|---|---|---|---|
| Capomulin | 40.675741 | 41.557809 | 24.839296 | 4.983904 | 0.328629 |
| Ceftamin | 52.591172 | 51.776157 | 39.069446 | 6.250556 | 0.468499 |
| Infubinol | 52.884795 | 51.820584 | 42.886388 | 6.54877 | 0.490851 |
| Ketapril | 55.235638 | 53.698743 | 68.18893 | 8.257659 | 0.602252 |
| Naftisol | 54.331565 | 52.509285 | 65.817708 | 8.112811 | 0.59486 |
| Placebo | 54.033581 | 52.288934 | 60.830138 | 7.799368 | 0.579722 |
| Propriva | 52.322552 | 50.854632 | 42.08802 | 6.487528 | 0.511289 |
| Ramicane | 40.216745 | 40.673236 | 23.383692 | 4.835669 | 0.32025 |
| Stelasyn | 54.233149 | 52.431737 | 59.122106 | 7.68909 | 0.571526 |
| Zoniferol | 53.236507 | 51.818479 | 48.266689 | 6.947423 | 0.514977 |
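
The same five statistics can also be produced in a single `agg` call with named aggregation. This is a sketch on hypothetical data, not the notebook's own cell; the regimen labels and volumes are made up, and the lambdas keep the `ddof=0` choice used in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, used only to show the shape of the call.
df = pd.DataFrame({
    "Drug Regimen": ["A", "A", "A", "B", "B", "B"],
    "Tumor Volume (mm3)": [45.0, 43.0, 41.0, 45.0, 47.0, 52.0],
})

summary = df.groupby("Drug Regimen")["Tumor Volume (mm3)"].agg(
    mean="mean",
    median="median",
    variance=lambda s: s.var(ddof=0),  # population variance
    std_dev=lambda s: s.std(ddof=0),   # population standard deviation
    sem=lambda s: s.sem(ddof=0),       # standard error with ddof=0
)
print(summary)
```

Each keyword becomes a column name, so the result needs no separate `pd.DataFrame` assembly step.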


## Bar plots

In [3]:
# Generate a bar plot of the tumor volume summary statistics for each treatment regimen using pandas

bar_plot_pandas = Tumor_vol_statistics.plot(kind='bar',figsize=(10,3),align = "center",rot = 45,fontsize =8 )
bar_plot_pandas.legend(bbox_to_anchor=(1, 0.85),loc = "upper left",fontsize =8 )

# Set a title and x and y labels for the chart
bar_plot_pandas.set_xlabel("Drug Regimen",fontsize = 8)
bar_plot_pandas.set_ylabel("Tumor Volume Statistics",fontsize = 8)
bar_plot_pandas.set_title("Tumor Statistics Vs Drug Regimen",fontsize = 8)
plt.tight_layout()

plt.show()





In [4]:
# Generate a bar plot of the tumor volume summary statistics for each treatment regimen using pyplot
x_axis = np.arange(len(Tumor_vol_statistics))
tick_locations = [value for value in x_axis]

plt.figure(figsize=(10,3))
width = 0.12 #setting width of the bars

#Plotting the bars one after the other for each data point
pmean= plt.bar(x_axis, Tumor_vol_statistics["Mean Tumor Volume"],width, color="steelblue", alpha=1, align="center")
pmedian = plt.bar(x_axis+width, Tumor_vol_statistics["Median Tumor Volume"], width ,color="darkorange", alpha=1, align="center")
pvariance = plt.bar(x_axis+(2*width), Tumor_vol_statistics["Variance Tumor Volume"], width ,color="green", alpha=1, align="center")
pstd = plt.bar(x_axis+(3*width), Tumor_vol_statistics["Std Deviation Tumor Volume"], width ,color="red", alpha=1, align="center")
pstderr = plt.bar(x_axis+(4*width), Tumor_vol_statistics["Std Error Tumor Volume"], width ,color="mediumpurple", alpha=1, align="center")

legend_values = ["Mean Tumor Volume", "Median Tumor Volume","Variance Tumor Volume",
                 "Std Deviation Tumor Volume","Std Error Tumor Volume"]

plt.legend((legend_values[0],legend_values[1],legend_values[2],legend_values[3],legend_values[4]),
           bbox_to_anchor=(1, 0.85),
           loc = "upper left",
           fontsize = 8)

plt.xticks(tick_locations, Tumor_vol_statistics.index, rotation=45,fontsize = 8)

plt.title("Tumor Statistics Vs Drug Regimen",fontsize = 8)
plt.xlabel("Drug Regimen",fontsize = 8)
plt.ylabel("Tumor Volume Statistics",fontsize = 8)
plt.tight_layout()

plt.show()


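
The two bar plots above chart the summary statistics. If the number of data points per regimen is wanted instead, counting rows per regimen is enough, since the merged table holds one row per measurement. A minimal sketch on a hypothetical miniature of that table:

```python
import pandas as pd

# Hypothetical miniature of the merged study table.
df = pd.DataFrame({
    "Drug Regimen": ["Capomulin", "Capomulin", "Ramicane",
                     "Placebo", "Ramicane", "Capomulin"],
    "Timepoint": [0, 5, 0, 0, 5, 10],
})

# One row per measurement, so the row count per regimen is the number of data points.
data_points = df["Drug Regimen"].value_counts().sort_index()
print(data_points)

# To draw it with pandas: data_points.plot(kind="bar", rot=45)
```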

## Pie plots

In [5]:
# Generate a pie plot showing the distribution of female versus male mice using pandas

#Mouse metadata DF value counts by sex will give the count of male Vs female mice
Mouse_metadata_by_sex = mouse_metadata["Sex"].value_counts()


pie_plot_pandas = Mouse_metadata_by_sex.plot(kind='pie',autopct = '%1.1f%%',figsize=(6, 4),
                                             labels=None,fontsize = 8)

pie_plot_pandas.legend(loc="center", labels=Mouse_metadata_by_sex.index,bbox_to_anchor=(1.1, 0.5),fontsize = 8)
pie_plot_pandas.set_title("Distribution of Female versus Male mice",fontsize = 8)
pie_plot_pandas.set(ylabel="")
plt.show()



In [6]:
# Generate a pie plot showing the distribution of female versus male mice using pyplot

plt.figure(figsize=(6,4))
plt.pie(Mouse_metadata_by_sex,autopct = '%1.1f%%')


plt.title("Distribution of Female versus Male mice",fontsize = 8)
plt.legend(loc="center", labels=Mouse_metadata_by_sex.index,bbox_to_anchor=(1.1, 0.5),fontsize = 8)

plt.show()




## Quartiles, outliers and boxplots

In [7]:
#Calculate the final tumor volume of each mouse across four of the most promising treatment regimens.
#Capomulin, Ramicane, Infubinol, and Ceftamin
#Calculate the IQR and quantitatively determine if there are any potential outliers.

# Capomulin 
mouse_study_Capomulin = mouse_study_results_df.loc[(mouse_study_results_df["Drug Regimen"] == "Capomulin")]

mouse_study_Capomulin_grouped = mouse_study_Capomulin.groupby(["Mouse ID"])

#get the max Timepoint for each mouse ID
final_timepoint_Capomulin = mouse_study_Capomulin_grouped["Timepoint"].max()

#merge the final timepoints per mouse with the main data set, to get the corresponding final tumor volume 
Capomulin_final_result = pd.merge(final_timepoint_Capomulin,mouse_study_Capomulin,on=["Mouse ID","Timepoint"])

Capomulin_final_result_reduced = Capomulin_final_result[["Mouse ID","Timepoint","Drug Regimen","Tumor Volume (mm3)"]]

print("Below are the final tumor volume of each mouse being treated with Capomulin: ") 
print(Capomulin_final_result_reduced)

#quartiles and IQR calculation 
Capomulin_quartiles = Capomulin_final_result_reduced['Tumor Volume (mm3)'].quantile([.25,.5,.75]) 
Capomulin_lowerq = Capomulin_quartiles[0.25] 
Capomulin_upperq = Capomulin_quartiles[0.75] 
Capomulin_iqr = Capomulin_upperq-Capomulin_lowerq

print(f"The lower quartile of Capomulin drug regimen response in terms of tumor volume is: {Capomulin_lowerq}") 
print(f"The upper quartile of Capomulin drug regimen response in terms of tumor volume is: {Capomulin_upperq}") 
print(f"The interquartile range of Capomulin drug regimen response in terms of tumor volume is: {Capomulin_iqr}") 
print(f"The median of Capomulin drug regimen response in terms of tumor volume is: {Capomulin_quartiles[0.5]} ")

Capomulin_lower_bound = Capomulin_lowerq - (1.5*Capomulin_iqr) 
Capomulin_upper_bound = Capomulin_upperq + (1.5*Capomulin_iqr) 
print(f"Values below {Capomulin_lower_bound} could be outliers.") 
print(f"Values above {Capomulin_upper_bound} could be outliers.")

#find outliers where tumor volume < lower bound or > upper bound 
outlier_Capomulin = Capomulin_final_result_reduced.loc[(Capomulin_final_result_reduced['Tumor Volume (mm3)'] < Capomulin_lower_bound) | (Capomulin_final_result_reduced['Tumor Volume (mm3)'] > Capomulin_upper_bound)]

#print outliers , if any, else print a message that no outliers were found. 
if len(outlier_Capomulin) == 0: 
    print(" However, there are no outliers identified") 
else: 
    print("Outliers identified below") 
    print(outlier_Capomulin)

Below are the final tumor volume of each mouse being treated with Capomulin: 
   Mouse ID  Timepoint Drug Regimen  Tumor Volume (mm3)
0      b128         45    Capomulin           38.982878
1      b742         45    Capomulin           38.939633
2      f966         20    Capomulin           30.485985
3      g288         45    Capomulin           37.074024
4      g316         45    Capomulin           40.159220
5      i557         45    Capomulin           47.685963
6      i738         45    Capomulin           37.311846
7      j119         45    Capomulin           38.125164
8      j246         35    Capomulin           38.753265
9      l509         45    Capomulin           41.483008
10     l897         45    Capomulin           38.846876
11     m601         45    Capomulin           28.430964
12     m957         45    Capomulin           33.329098
13     r157         15    Capomulin           46.539206
14     r554         45    Capomulin           32.377357
15     r944         45    

In [8]:
#Ramicane calculations for IQR and outliers, done the same way as above
mouse_study_Ramicane = mouse_study_results_df.loc[(mouse_study_results_df["Drug Regimen"] == "Ramicane")]

mouse_study_Ramicane_grouped = mouse_study_Ramicane.groupby(["Mouse ID"])

final_timepoint_Ramicane = mouse_study_Ramicane_grouped["Timepoint"].max()

Ramicane_final_result = pd.merge(final_timepoint_Ramicane,mouse_study_Ramicane,on=["Mouse ID","Timepoint"])

Ramicane_final_result_reduced = Ramicane_final_result[["Mouse ID","Timepoint","Drug Regimen","Tumor Volume (mm3)"]]

print("Below are the final tumor volume of each mouse being treated with Ramicane: ")
print(Ramicane_final_result_reduced)

Ramicane_quartiles = Ramicane_final_result_reduced['Tumor Volume (mm3)'].quantile([.25,.5,.75])
Ramicane_lowerq = Ramicane_quartiles[0.25]
Ramicane_upperq = Ramicane_quartiles[0.75]
Ramicane_iqr = Ramicane_upperq-Ramicane_lowerq

print(f"The lower quartile of Ramicane drug regimen response in terms of tumor volume is: {Ramicane_lowerq}")
print(f"The upper quartile of Ramicane drug regimen response in terms of tumor volume is: {Ramicane_upperq}")
print(f"The interquartile range of Ramicane drug regimen response in terms of tumor volume is: {Ramicane_iqr}")
print(f"The median of Ramicane drug regimen response in terms of tumor volume is: {Ramicane_quartiles[0.5]} ")

Ramicane_lower_bound = Ramicane_lowerq - (1.5*Ramicane_iqr)
Ramicane_upper_bound = Ramicane_upperq + (1.5*Ramicane_iqr)
print(f"Values below {Ramicane_lower_bound} could be outliers.")
print(f"Values above {Ramicane_upper_bound} could be outliers.")

outlier_Ramicane = Ramicane_final_result_reduced.loc[(Ramicane_final_result_reduced['Tumor Volume (mm3)'] < Ramicane_lower_bound) | (Ramicane_final_result_reduced['Tumor Volume (mm3)'] > Ramicane_upper_bound)]

if len(outlier_Ramicane) == 0:
    print(" However, there are no outliers identified")
else:
    print("Outliers identified below")
    print(outlier_Ramicane)


Below are the final tumor volume of each mouse being treated with Ramicane: 
   Mouse ID  Timepoint Drug Regimen  Tumor Volume (mm3)
0      a411         45     Ramicane           38.407618
1      a444         45     Ramicane           43.047543
2      a520         45     Ramicane           38.810366
3      a644         45     Ramicane           32.978522
4      c458         30     Ramicane           38.342008
5      c758         45     Ramicane           33.397653
6      d251         45     Ramicane           37.311236
7      e662         45     Ramicane           40.659006
8      g791         45     Ramicane           29.128472
9      i177         45     Ramicane           33.562402
10     i334         45     Ramicane           36.374510
11     j913         45     Ramicane           31.560470
12     j989         45     Ramicane           36.134852
13     k403         45     Ramicane           22.050126
14     m546         45     Ramicane           30.564625
15     n364         45     

In [9]:
#Infubinol calculations for IQR and outliers, done the same way as above
mouse_study_Infubinol = mouse_study_results_df.loc[(mouse_study_results_df["Drug Regimen"] == "Infubinol")]


mouse_study_Infubinol_grouped = mouse_study_Infubinol.groupby(["Mouse ID"])

Infubinol_final_timepoint = mouse_study_Infubinol_grouped["Timepoint"].max()


Infubinol_final_result = pd.merge(Infubinol_final_timepoint,mouse_study_Infubinol,on=["Mouse ID","Timepoint"])


Infubinol_final_result_reduced = Infubinol_final_result[["Mouse ID","Timepoint","Drug Regimen","Tumor Volume (mm3)"]]


print("Below are the final tumor volume of each mouse being treated with Infubinol: ")
print(Infubinol_final_result_reduced)

Infubinol_quartiles = Infubinol_final_result_reduced['Tumor Volume (mm3)'].quantile([.25,.5,.75])
Infubinol_lowerq = Infubinol_quartiles[0.25]
Infubinol_upperq = Infubinol_quartiles[0.75]
Infubinol_iqr = Infubinol_upperq-Infubinol_lowerq

print(f"The lower quartile of Infubinol drug regimen response in terms of tumor volume is: {Infubinol_lowerq}")
print(f"The upper quartile of Infubinol drug regimen response in terms of tumor volume is: {Infubinol_upperq}")
print(f"The interquartile range of Infubinol drug regimen response in terms of tumor volume is: {Infubinol_iqr}")
print(f"The median of Infubinol drug regimen response in terms of tumor volume is: {Infubinol_quartiles[0.5]} ")

Infubinol_lower_bound = Infubinol_lowerq - (1.5*Infubinol_iqr)
Infubinol_upper_bound = Infubinol_upperq + (1.5*Infubinol_iqr)
print(f"Values below {Infubinol_lower_bound} could be outliers.")
print(f"Values above {Infubinol_upper_bound} could be outliers.")

outlier_Infubinol = Infubinol_final_result_reduced.loc[(Infubinol_final_result_reduced['Tumor Volume (mm3)'] < Infubinol_lower_bound) | (Infubinol_final_result_reduced['Tumor Volume (mm3)'] > Infubinol_upper_bound)]

if len(outlier_Infubinol) == 0:
    print(" However, there are no outliers identified")
else:
    print("Outliers identified below")
    print(outlier_Infubinol)



Below are the final tumor volume of each mouse being treated with Infubinol: 
   Mouse ID  Timepoint Drug Regimen  Tumor Volume (mm3)
0      a203         45    Infubinol           67.973419
1      a251         45    Infubinol           65.525743
2      a577         30    Infubinol           57.031862
3      a685         45    Infubinol           66.083066
4      c139         45    Infubinol           72.226731
5      c326          5    Infubinol           36.321346
6      c895         30    Infubinol           60.969711
7      e476         45    Infubinol           62.435404
8      f345         45    Infubinol           60.918767
9      i386         40    Infubinol           67.289621
10     k483         45    Infubinol           66.196912
11     k804         35    Infubinol           62.117279
12     m756          5    Infubinol           47.010364
13     n671         30    Infubinol           60.165180
14     o809         35    Infubinol           55.629428
15     o813          5    

In [10]:
#Ceftamin calculations for IQR and outliers, done the same way as above
mouse_study_Ceftamin = mouse_study_results_df.loc[(mouse_study_results_df["Drug Regimen"] == "Ceftamin")]

mouse_study_Ceftamin_grouped = mouse_study_Ceftamin.groupby(["Mouse ID"])

Ceftamin_final_timepoint = mouse_study_Ceftamin_grouped["Timepoint"].max()

Ceftamin_final_result = pd.merge(Ceftamin_final_timepoint,mouse_study_Ceftamin,on=["Mouse ID","Timepoint"])

Ceftamin_final_result_reduced = Ceftamin_final_result[["Mouse ID","Timepoint","Drug Regimen","Tumor Volume (mm3)"]]

print("Below are the final tumor volume of each mouse being treated with Ceftamin: ")
print(Ceftamin_final_result_reduced)

Ceftamin_quartiles = Ceftamin_final_result_reduced['Tumor Volume (mm3)'].quantile([.25,.5,.75])
Ceftamin_lowerq = Ceftamin_quartiles[0.25]
Ceftamin_upperq = Ceftamin_quartiles[0.75]
Ceftamin_iqr = Ceftamin_upperq-Ceftamin_lowerq

print(f"The lower quartile of Ceftamin drug regimen response in terms of tumor volume is: {Ceftamin_lowerq}")
print(f"The upper quartile of Ceftamin drug regimen response in terms of tumor volume is: {Ceftamin_upperq}")
print(f"The interquartile range of Ceftamin drug regimen response in terms of tumor volume is: {Ceftamin_iqr}")
print(f"The median of Ceftamin drug regimen response in terms of tumor volume is: {Ceftamin_quartiles[0.5]} ")

Ceftamin_lower_bound = Ceftamin_lowerq - (1.5*Ceftamin_iqr)
Ceftamin_upper_bound = Ceftamin_upperq + (1.5*Ceftamin_iqr)
print(f"Values below {Ceftamin_lower_bound} could be outliers.")
print(f"Values above {Ceftamin_upper_bound} could be outliers.")

#compare against the Ceftamin bounds on both sides (the upper-bound test previously used the Infubinol data)
outlier_Ceftamin = Ceftamin_final_result_reduced.loc[(Ceftamin_final_result_reduced['Tumor Volume (mm3)'] < Ceftamin_lower_bound) | (Ceftamin_final_result_reduced['Tumor Volume (mm3)'] > Ceftamin_upper_bound)]

if len(outlier_Ceftamin) == 0:
    print(" However, there are no outliers identified")
else:
    print("Outliers identified below")
    print(outlier_Ceftamin)


Below are the final tumor volume of each mouse being treated with Ceftamin: 
   Mouse ID  Timepoint Drug Regimen  Tumor Volume (mm3)
0      a275         45     Ceftamin           62.999356
1      b447          0     Ceftamin           45.000000
2      b487         25     Ceftamin           56.057749
3      b759         30     Ceftamin           55.742829
4      f436         15     Ceftamin           48.722078
5      h531          5     Ceftamin           47.784682
6      j296         45     Ceftamin           61.849023
7      k210         45     Ceftamin           68.923185
8      l471         45     Ceftamin           67.748662
9      l490         30     Ceftamin           57.918381
10     l558         10     Ceftamin           46.784535
11     l661         45     Ceftamin           59.851956
12     l733         45     Ceftamin           64.299830
13     o287         45     Ceftamin           59.741901
14     p438         45     Ceftamin           61.433892
15     q483         40     
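
The four cells above repeat the same steps with a different regimen name. One way to factor them out is sketched below. The helper `final_volume_outliers` and the demo data (`DrugX`, mice `m1` to `m5`) are hypothetical. The helper picks each mouse's last timepoint with `idxmax` rather than the merge used above, which yields the same rows as long as a mouse has one measurement per timepoint.

```python
import pandas as pd


def final_volume_outliers(df, regimen):
    """Return (final_volumes, outliers) for one regimen using the 1.5 * IQR rule."""
    subset = df[df["Drug Regimen"] == regimen]
    # Row label of each mouse's last recorded timepoint.
    last_rows = subset.groupby("Mouse ID")["Timepoint"].idxmax()
    final = subset.loc[last_rows, ["Mouse ID", "Timepoint", "Tumor Volume (mm3)"]]

    volumes = final["Tumor Volume (mm3)"]
    q1, q3 = volumes.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = final[(volumes < lower) | (volumes > upper)]
    return final, outliers


# Hypothetical demo data: a baseline row and a final row for each mouse.
last_measurements = {
    "m1": (45, 60.0), "m2": (45, 62.0), "m3": (45, 61.0),
    "m4": (45, 63.0), "m5": (5, 36.0),
}
rows = []
for mouse, (last_tp, last_vol) in last_measurements.items():
    rows.append({"Mouse ID": mouse, "Drug Regimen": "DrugX",
                 "Timepoint": 0, "Tumor Volume (mm3)": 45.0})
    rows.append({"Mouse ID": mouse, "Drug Regimen": "DrugX",
                 "Timepoint": last_tp, "Tumor Volume (mm3)": last_vol})
demo = pd.DataFrame(rows)

final, outliers = final_volume_outliers(demo, "DrugX")
print(outliers)
```

Calling the helper once per regimen in a loop avoids copy-and-paste slips such as comparing one drug's volumes against another drug's bounds.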

In [11]:
# Generate a box plot of the final tumor volume of each mouse across four regimens of interest
data = [Capomulin_final_result_reduced["Tumor Volume (mm3)"],Ramicane_final_result_reduced["Tumor Volume (mm3)"], 
        Infubinol_final_result_reduced["Tumor Volume (mm3)"],Ceftamin_final_result_reduced["Tumor Volume (mm3)"]]


#defining the outlier display
red_star = dict(markerfacecolor='r', marker='*')


Drugs = ["","Capomulin","Ramicane","Infubinol","Ceftamin"]
ind = np.arange(len(Drugs))
tick = [value for value in ind]

fig1, ax1 = plt.subplots()
ax1.set_title("Final Tumor Volumes Vs Drugs",fontsize = 8)
ax1.set_ylabel('Final Tumor volumes (mm3)',fontsize = 8)

ax1.boxplot(data,flierprops=red_star)

plt.xticks(tick, Drugs,fontsize = 8)
plt.ylim(20,80)
plt.yticks(fontsize = 8)
plt.tight_layout()

plt.show()



## Line and scatter plots

In [12]:
# Generate a line plot of time point versus tumor volume for a mouse treated with Capomulin

#Capomulin treated mouse ID s185

mouse_study_Capomulin_s185 = mouse_study_Capomulin.loc[(mouse_study_Capomulin["Mouse ID"] == "s185")]

Timepoint_in_days = mouse_study_Capomulin_s185["Timepoint"]

Tumor_vol_in_days = mouse_study_Capomulin_s185["Tumor Volume (mm3)"]

plt.plot(Timepoint_in_days,Tumor_vol_in_days,marker="o", color="green")

plt.title("Tumor response to Capomulin regimen (Mouse ID s185)",fontsize =8)
plt.xlabel("Time in Days",fontsize =8)
plt.ylabel("Tumor Volume (mm3)",fontsize =8)

plt.xticks(fontsize =8)
plt.yticks(fontsize =8)
plt.xlim(-5, 50)
plt.ylim(10, 50)

plt.grid(alpha = 0.25)
plt.show()


In [13]:
# Generate a scatter plot of mouse weight versus average tumor volume for the Capomulin regimen

mouse_study_Capomulin_by_weight= mouse_study_Capomulin.groupby(["Weight (g)"])
Average_tumor_vol_by_weight = mouse_study_Capomulin_by_weight["Tumor Volume (mm3)"].mean()

Avg_Tumor_volume_by_weight_df = pd.DataFrame(Average_tumor_vol_by_weight)

x_values = Avg_Tumor_volume_by_weight_df.index

y_values = Avg_Tumor_volume_by_weight_df["Tumor Volume (mm3)"]


plt.scatter(x_values,y_values,marker="d", facecolors="g", edgecolors="black")

plt.title("Weight Vs Average Tumor Volume",fontsize =8)
plt.xlabel("Weight (g)",fontsize =8)
plt.ylabel("Tumor Volume (mm3)",fontsize =8)

plt.xlim(13,28)
plt.ylim(35,47)
plt.xticks(fontsize =8)
plt.yticks(fontsize =8)

plt.grid(alpha = 0.25)
plt.show()




In [14]:
# Calculate the correlation coefficient and linear regression model for mouse weight and average tumor volume for the Capomulin regimen
Corr_Coeff = round(st.pearsonr(x_values,y_values)[0],2)
print(f"The correlation coefficient between mouse weight and average tumor volume for the Capomulin regimen is {Corr_Coeff}")

The correlation coefficient between mouse weight and average tumor volume for the Capomulin regimen is 0.95
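
The scatter cell averages tumor volume within each distinct weight, so several mice of the same weight collapse into one point. An alternative, sketched here on hypothetical data, averages per mouse first so that each animal contributes one point. This is not the notebook's method and would generally give a different coefficient. The sketch assumes weight is constant for each mouse.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: three mice, two timepoints each.
df = pd.DataFrame({
    "Mouse ID": ["m1", "m1", "m2", "m2", "m3", "m3"],
    "Weight (g)": [17, 17, 21, 21, 25, 25],
    "Tumor Volume (mm3)": [45.0, 35.0, 45.0, 39.0, 45.0, 44.0],
})

per_mouse = df.groupby("Mouse ID").agg(
    weight=("Weight (g)", "first"),            # assumed constant per mouse
    avg_volume=("Tumor Volume (mm3)", "mean"),
)

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(per_mouse["weight"], per_mouse["avg_volume"])[0, 1]
print(f"Per-mouse correlation: {r:.2f}")
```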


In [15]:
#Linear regression model

(slope, intercept, rvalue, pvalue, stderr) = linregress(x_values, y_values)

regress_values = x_values * slope + intercept

line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))

# rvalue from linregress is the correlation coefficient r, not r-squared
print(f"The r value (correlation coefficient) is: {rvalue}")
print(f"The linear regression values are {regress_values[0],regress_values[1],regress_values[2],regress_values[3],regress_values[4],regress_values[5],regress_values[6],regress_values[7],regress_values[8]}")

plt.scatter(x_values,y_values,marker="o", facecolors="b", edgecolors="black")
plt.plot(x_values,regress_values,"g-")
plt.annotate(line_eq,(22,40),fontsize=8,color="red")

plt.title(f"Linear Regression model ({line_eq}) ",fontsize = 8)
plt.xlabel("Weight (g)",fontsize = 8)
plt.ylabel("Tumor Volume (mm3)",fontsize = 8)

plt.xlim(13,28)
plt.ylim(35,47)
plt.xticks(fontsize =8)
plt.yticks(fontsize =8)

plt.grid(alpha = 0.25)
plt.show()


The r value (correlation coefficient) is: 0.950524396185527
The linear regression values are (36.18581912960284, 37.97536434907097, 39.764909568539096, 40.65968217827316, 41.55445478800722, 42.44922739774128, 43.344000007475344, 44.238772617209406, 45.13354522694347)

