This repository uses Python's Matplotlib to visualize real-world pharmaceutical data. The data comes from Pymaceuticals Inc., a burgeoning pharmaceutical company based in San Diego that specializes in anti-cancer pharmaceuticals. In its most recent efforts, it began screening for potential treatments for squamous cell carcinoma (SCC), a commonly occurring form of skin cancer.
The analysis uses the complete data from the company's most recent animal study, provided as two datasets in CSV format. The first, Mouse_metadata.csv, describes the 249 mice with SCC tumor growth that were treated with a variety of drug regimens, and records each mouse's Sex, Age_months, and Weight (g). The second, Study_results.csv, contains the results of the study in the columns Mouse ID, Timepoint, Tumor Volume (mm3), and Metastatic Sites.
The purpose of the study was to compare the performance of Pymaceuticals' drug of interest, Capomulin, against the other treatment regimens. The analysis also generates all of the tables and figures needed for the technical and top-level summary reports of the study. Both datasets were imported, merged, and cleaned. The aggregated data is displayed in Pandas DataFrames and visualized with Matplotlib, and other libraries are used for the statistical analysis. The project is conducted in a Jupyter notebook to showcase and communicate the analysis.
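The import, merge, and clean steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are made up, the `Drug Regimen` column name and the `load_and_clean` helper are assumptions, and dropping every row for a mouse with duplicated timepoints is one possible cleaning policy.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for Mouse_metadata.csv and Study_results.csv.
# All values are invented; "Drug Regimen" is an assumed column name.
MOUSE_METADATA_CSV = """Mouse ID,Drug Regimen,Sex,Age_months,Weight (g)
a001,Capomulin,Female,9,22
b002,Ramicane,Male,12,19
c003,Infubinol,Female,20,27
"""

STUDY_RESULTS_CSV = """Mouse ID,Timepoint,Tumor Volume (mm3),Metastatic Sites
a001,0,45.0,0
a001,5,43.5,0
b002,0,45.0,0
b002,5,44.1,1
c003,0,45.0,0
c003,0,45.0,0
c003,5,47.2,1
"""


def load_and_clean(metadata_source, results_source):
    """Merge the two tables and drop mice that have duplicated timepoints."""
    metadata = pd.read_csv(metadata_source)
    results = pd.read_csv(results_source)
    merged = pd.merge(results, metadata, on="Mouse ID", how="left")

    # Each mouse should have one row per timepoint; find IDs that violate this.
    dup_mask = merged.duplicated(subset=["Mouse ID", "Timepoint"], keep=False)
    dup_ids = list(merged.loc[dup_mask, "Mouse ID"].unique())

    # Assumed policy: remove every row belonging to an affected mouse.
    cleaned = merged[~merged["Mouse ID"].isin(dup_ids)].reset_index(drop=True)
    return cleaned, dup_ids


cleaned, dropped_ids = load_and_clean(
    io.StringIO(MOUSE_METADATA_CSV), io.StringIO(STUDY_RESULTS_CSV)
)
print(cleaned.head())
print("Dropped mice:", dropped_ids)
```

With the real files, the two `io.StringIO` arguments would be replaced by the paths to the CSV files.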
Data Cleaning
Summary Statistics
Bar and Pie Charts
Quartiles, Outliers and Boxplots
Line and Scatter Plots
Correlation and Regression
- The data was loaded, read, and combined, duplicates were removed, and the head (the top 5 rows) of the cleaned data was displayed.
- A summary statistics table was generated using two techniques: the first creates multiple series and combines them at the end, and the second produces everything with a single groupby function. The table consists of the mean, median, variance, standard deviation, and SEM of the tumor volume for each drug regimen.
- Two identical bar charts were generated, one with Pandas's DataFrame.plot() and one with Matplotlib's pyplot, showing the total number of mice for each treatment regimen throughout the course of the study.
- Two identical pie plots were generated, one with Pandas's DataFrame.plot() and one with Matplotlib's pyplot, showing the distribution of female and male mice in the study.
- The final tumor volume of each mouse was calculated across four of the most promising treatment regimens: Capomulin, Ramicane, Infubinol, and Ceftamin. The quartiles, IQR, and potential outliers were then determined quantitatively for all four treatment regimens.
- A box-and-whisker plot of the final tumor volume for all four treatment regimens was generated, with potential outliers highlighted using color and style.
- A mouse treated with Capomulin (r157) was selected, and a line plot of time point versus tumor volume was generated for that mouse.
- A scatter plot of mouse weight versus average tumor volume for the Capomulin treatment regimen was created.
- A correlation coefficient was calculated and a linear regression analysis was conducted between mouse weight and average tumor volume for the Capomulin treatment. A plot of the linear regression model was drawn on top of the previous scatter plot.
The correlation between mouse weight and average tumor volume is 0.84.
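The statistical steps listed above (the single-groupby summary table, the quartile/IQR outlier check, and the weight-versus-volume correlation and regression) might look something like the sketch below. It is an illustration under stated assumptions, not the notebook's code: the rows are invented, `Drug Regimen` is an assumed column name, the helper functions are hypothetical, and NumPy's `corrcoef` and `polyfit` stand in for whichever statistics library the notebook actually uses. The 1.5 × IQR rule is the conventional outlier threshold and is assumed here.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration only; these are not values from the study.
data = pd.DataFrame({
    "Mouse ID": ["m1", "m1", "m2", "m2", "m3", "m3", "m4", "m4", "m5", "m5"],
    "Drug Regimen": ["Capomulin"] * 10,  # assumed column name
    "Timepoint": [0, 5] * 5,
    "Tumor Volume (mm3)": [45.0, 38.0, 45.0, 40.0, 45.0, 41.0, 45.0, 42.0, 45.0, 60.0],
    "Weight (g)": [17, 17, 19, 19, 21, 21, 23, 23, 25, 25],
})


def summary_table(df):
    """Mean, median, variance, std, and SEM of tumor volume per regimen."""
    return df.groupby("Drug Regimen")["Tumor Volume (mm3)"].agg(
        ["mean", "median", "var", "std", "sem"]
    )


def final_volumes(df):
    """Tumor volume at each mouse's last recorded timepoint."""
    last_rows = df.loc[df.groupby("Mouse ID")["Timepoint"].idxmax()]
    return last_rows.set_index("Mouse ID")["Tumor Volume (mm3)"]


def iqr_outliers(volumes):
    """Return (lower bound, upper bound, outliers) using the 1.5 * IQR rule."""
    q1, q3 = volumes.quantile(0.25), volumes.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lower, upper, volumes[(volumes < lower) | (volumes > upper)]


def weight_volume_fit(df):
    """Pearson r and least-squares line for weight vs. mean tumor volume."""
    per_mouse = df.groupby("Mouse ID").agg(
        weight=("Weight (g)", "first"),
        avg_volume=("Tumor Volume (mm3)", "mean"),
    )
    r = np.corrcoef(per_mouse["weight"], per_mouse["avg_volume"])[0, 1]
    slope, intercept = np.polyfit(per_mouse["weight"], per_mouse["avg_volume"], 1)
    return r, slope, intercept


summary = summary_table(data)
lower, upper, outliers = iqr_outliers(final_volumes(data))
r, slope, intercept = weight_volume_fit(data)

print(summary)
print(f"IQR bounds: [{lower:.2f}, {upper:.2f}]; outliers: {list(outliers.index)}")
print(f"r = {r:.2f}, fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```

In the real analysis, the outlier check would be run separately for each of the four regimens, and the fitted line would be overlaid on the Capomulin scatter plot.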