# Cardiomyocyte Content Prediction

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

## Introduction

#### Stem cell-derived cardiomyocytes (CM) have great potential for treating ischemic heart disease. For in vitro production, human pluripotent stem cells (hPSC) must differentiate into hPSC-derived cardiomyocytes (hPSC-CM). To gain insight into the CM production process, process parameters and measurements will be examined to determine which are most influential in increasing CM production (*source paper).

### Main questions:
#### Which factors are most influential in increasing CM production?
#### Can CM production be accurately predicted with these factors?

In [9]:
# Importing data
train_df = pd.read_csv('train_data.csv')
test_df = pd.read_csv('test_data.csv')
# Ensuring both datasets have the same number of columns (features plus target)
assert len(train_df.columns) == len(test_df.columns)
# Combining both datasets
data = pd.concat([train_df, test_df], ignore_index=True)
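A matching column count does not guarantee matching column names, and `pd.concat` aligns columns by name, filling any gaps with NaN. The sketch below shows one possible stricter check, run before concatenating. The helper `check_same_columns` and the toy frames are hypothetical, introduced only for illustration; they are not part of the original notebook.

```python
import pandas as pd


def check_same_columns(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Raise ValueError if the two frames do not share the same column names."""
    only_left = sorted(set(left.columns) - set(right.columns))
    only_right = sorted(set(right.columns) - set(left.columns))
    if only_left or only_right:
        raise ValueError(
            f"Column mismatch: only in left={only_left}, only in right={only_right}"
        )


# Toy frames standing in for train_df and test_df (values are made up)
toy_train = pd.DataFrame(
    {"dd0 Cell Density": [0.5, 0.7], "dd10 CM Content": [60.0, 80.0]}
)
toy_test = pd.DataFrame({"dd0 Cell Density": [0.6], "dd10 CM Content": [75.0]})

# Passes silently because both frames have identical column names
check_same_columns(toy_train, toy_test)
toy_data = pd.concat([toy_train, toy_test], ignore_index=True)
```

With the real data, the same call would be `check_same_columns(train_df, test_df)`, placed just before the `pd.concat` line.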

#### New train and test sets will be made later to ensure a random split
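As a rough sketch of what such a random split could look like, the example below uses scikit-learn's `train_test_split` on a synthetic stand-in for `data`. It assumes `dd10 CM Content` is the prediction target; the 25% test fraction and the `random_state` value are arbitrary choices for illustration, not settings taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for `data` (60 rows, like the combined set); values are random
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "dd0 Cell Density": rng.uniform(0.3, 1.3, size=60),
        "dd7 Glucose Concentration": rng.uniform(0.0, 8.7, size=60),
        "dd10 CM Content": rng.uniform(5.0, 97.0, size=60),
    }
)

TARGET = "dd10 CM Content"  # assumed prediction target
X = toy.drop(columns=[TARGET])
y = toy[TARGET]

# Shuffle and split; fixing random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```

Fixing `random_state` keeps the split identical across notebook runs, which makes later model comparisons easier to reproduce.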

## Data Cleaning

In [12]:
data.isna().sum().sum()

0

In [13]:
data.describe()

Statistic,dd0 Cell Density,dd0-dd1 Cell Density Gradient,dd1 Cell Density,dd1-dd2 Cell Density Gradient,dd2 Cell Density,dd2-dd3 Cell Density Gradient,dd3 Cell Density,dd3-dd5 Cell Density Gradient,dd5 Cell Density,dd5-dd7 Cell Density Gradient,...,dd1 Lactate Concentration,dd3 Lactate Concentration,dd5 Lactate Concentration,dd7 Lactate Concentration,dd0 Glucose Concentration,dd1 Glucose Concentration,dd3 Glucose Concentration,dd5 Glucose Concentration,dd7 Glucose Concentration,dd10 CM Content
count,60.0,60.0,60.0,60.0,60.0,60.0,60.0,60.0,60.0,60.0,...,60.0,60.0,60.0,60.0,60.0,60.0,60.0,60.0,60.0,60.0
mean,0.726333,0.306942,0.851463,0.759819,1.44745,0.461963,1.903242,-0.062192,1.695008,-0.030869,...,13.263147,14.983333,14.678624,13.037513,9.814893,4.170149,1.607243,2.760942,2.849942,66.172667
std,0.250183,0.586726,0.235372,0.675301,0.47042,0.743174,0.586961,0.338154,0.606149,0.331125,...,2.431586,3.79293,3.327453,3.320086,1.007209,1.46022,2.528723,1.875777,2.192971,29.385584
min,0.295,-0.755274,0.29,-0.392857,0.34,-0.462059,0.685,-0.78125,0.49,-0.794344,...,8.8,0.995,5.9,5.9,7.135,0.6,0.0,0.0,0.0,5.36
25%,0.555,-0.039279,0.697125,0.200316,1.15525,0.067512,1.395,-0.290455,1.2125,-0.239271,...,11.5,14.616667,12.1825,11.09,9.1625,3.37,0.0,1.35125,0.775,47.675
50%,0.67,0.239709,0.84,0.672321,1.38,0.245569,2.0225,-0.09143,1.64,-0.083716,...,13.2,16.15,14.7525,13.1875,9.52,4.4425,0.9,2.285,2.885,75.35
75%,0.8975,0.521429,1.0,1.05101,1.7925,0.593039,2.285,0.131216,2.1125,0.162875,...,14.06875,16.5725,17.5,15.39125,10.5025,5.3,1.910347,4.1025,4.295,92.0
max,1.28,2.625,1.36,3.567308,2.4,4.128648,3.0,0.810945,3.06,0.79375,...,19.59,18.73,20.375,20.59,12.4,6.8,10.27,8.1,8.7,97.4
