# Predicting Fetal Health: A Comprehensive Classification Study

Fetal health classification aims to predict the well-being of unborn infants, a task that matters to expectant mothers and clinicians alike. This study analyzes a fetal health classification dataset of 2126 records of features extracted from cardiotocogram (CTG) exams. Three expert obstetricians classified each record into one of three classes: "Normal," "Suspect," and "Pathological."

We train three classifiers: logistic regression, random forest, and decision tree. Each has its own strengths: logistic regression fits a linear decision boundary, a decision tree learns a small set of threshold rules, and a random forest averages many such trees. After training, each model is used to separate records that suggest a fetal health concern from those that look normal.

Classification accuracy is only part of the picture. It also helps to understand how the health indicators relate to one another, so we explore the dataset for patterns and for variables that signal fetal well-being or distress.

The significance of this work is practical. Classifiers like these can support healthcare professionals in making informed decisions about the well-being of unborn children and expectant mothers by translating CTG data into actionable insights.

## Module 1
### Task 1: Data Dive: Exploring Fetal Health Insights



In [2]:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/HiCounselor/Predicting_Fetal_Health_A_Comprehensive_Classification_Study/fetal_health.csv')

In [4]:
df.head()

|  | baseline value | accelerations | fetal_movement | uterine_contractions | light_decelerations | severe_decelerations | prolongued_decelerations | abnormal_short_term_variability | mean_value_of_short_term_variability | percentage_of_time_with_abnormal_long_term_variability | ... | histogram_min | histogram_max | histogram_number_of_peaks | histogram_number_of_zeroes | histogram_mode | histogram_mean | histogram_median | histogram_variance | histogram_tendency | fetal_health |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 120.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 73.0 | 0.5 | 43.0 | ... | 62.0 | 126.0 | 2.0 | 0.0 | 120.0 | 137.0 | 121.0 | 73.0 | 1.0 | 2.0 |
| 1 | 132.0 | 0.006 | 0.0 | 0.006 | 0.003 | 0.0 | 0.0 | 17.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 6.0 | 1.0 | 141.0 | 136.0 | 140.0 | 12.0 | 0.0 | 1.0 |
| 2 | 133.0 | 0.003 | 0.0 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 5.0 | 1.0 | 141.0 | 135.0 | 138.0 | 13.0 | 0.0 | 1.0 |
| 3 | 134.0 | 0.003 | 0.0 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 11.0 | 0.0 | 137.0 | 134.0 | 137.0 | 13.0 | 1.0 | 1.0 |
| 4 | 132.0 | 0.007 | 0.0 | 0.008 | 0.0 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 9.0 | 0.0 | 137.0 | 136.0 | 138.0 | 11.0 | 1.0 | 1.0 |


In [5]:
df.shape

(2126, 22)

### Task 2: Managing Duplicates in Fetal Health Data


In [9]:
duplicates = df.duplicated().sum()
print(duplicates)

13
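
The count above comes from `DataFrame.duplicated()`, which by default flags each repeat of an earlier row and leaves the first copy unflagged. The sketch below illustrates that behavior on a toy frame; the values and the names `toy`, `dup_mask`, and `all_copies` are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy frame standing in for the CTG data (hypothetical values).
toy = pd.DataFrame({
    "baseline value": [120.0, 132.0, 132.0, 140.0, 132.0],
    "fetal_health": [2.0, 1.0, 1.0, 2.0, 1.0],
})

# duplicated() flags every repeat of an earlier row; the first copy is not flagged.
dup_mask = toy.duplicated()
n_duplicates = int(dup_mask.sum())

# keep=False flags all copies, which helps when inspecting the repeated rows.
all_copies = toy[toy.duplicated(keep=False)]

print(n_duplicates)
print(all_copies)
```

Inspecting the rows with `keep=False` before dropping them is one way to confirm that they are true repeats rather than distinct exams with identical readings.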


### Task 3: Enhancing Data Integrity


In [11]:
df.drop_duplicates(inplace=True)
df

|  | baseline value | accelerations | fetal_movement | uterine_contractions | light_decelerations | severe_decelerations | prolongued_decelerations | abnormal_short_term_variability | mean_value_of_short_term_variability | percentage_of_time_with_abnormal_long_term_variability | ... | histogram_min | histogram_max | histogram_number_of_peaks | histogram_number_of_zeroes | histogram_mode | histogram_mean | histogram_median | histogram_variance | histogram_tendency | fetal_health |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 120.0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.0 | 0.0 | 73.0 | 0.5 | 43.0 | ... | 62.0 | 126.0 | 2.0 | 0.0 | 120.0 | 137.0 | 121.0 | 73.0 | 1.0 | 2.0 |
| 1 | 132.0 | 0.006 | 0.000 | 0.006 | 0.003 | 0.0 | 0.0 | 17.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 6.0 | 1.0 | 141.0 | 136.0 | 140.0 | 12.0 | 0.0 | 1.0 |
| 2 | 133.0 | 0.003 | 0.000 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.1 | 0.0 | ... | 68.0 | 198.0 | 5.0 | 1.0 | 141.0 | 135.0 | 138.0 | 13.0 | 0.0 | 1.0 |
| 3 | 134.0 | 0.003 | 0.000 | 0.008 | 0.003 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 11.0 | 0.0 | 137.0 | 134.0 | 137.0 | 13.0 | 1.0 | 1.0 |
| 4 | 132.0 | 0.007 | 0.000 | 0.008 | 0.000 | 0.0 | 0.0 | 16.0 | 2.4 | 0.0 | ... | 53.0 | 170.0 | 9.0 | 0.0 | 137.0 | 136.0 | 138.0 | 11.0 | 1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2121 | 140.0 | 0.000 | 0.000 | 0.007 | 0.000 | 0.0 | 0.0 | 79.0 | 0.2 | 25.0 | ... | 137.0 | 177.0 | 4.0 | 0.0 | 153.0 | 150.0 | 152.0 | 2.0 | 0.0 | 2.0 |
| 2122 | 140.0 | 0.001 | 0.000 | 0.007 | 0.000 | 0.0 | 0.0 | 78.0 | 0.4 | 22.0 | ... | 103.0 | 169.0 | 6.0 | 0.0 | 152.0 | 148.0 | 151.0 | 3.0 | 1.0 | 2.0 |
| 2123 | 140.0 | 0.001 | 0.000 | 0.007 | 0.000 | 0.0 | 0.0 | 79.0 | 0.4 | 20.0 | ... | 103.0 | 170.0 | 5.0 | 0.0 | 153.0 | 148.0 | 152.0 | 4.0 | 1.0 | 2.0 |
| 2124 | 140.0 | 0.001 | 0.000 | 0.006 | 0.000 | 0.0 | 0.0 | 78.0 | 0.4 | 27.0 | ... | 103.0 | 169.0 | 6.0 | 0.0 | 152.0 | 147.0 | 151.0 | 4.0 | 1.0 | 2.0 |
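
Note that `drop_duplicates` keeps the original row labels, which is why labels above 2120 still appear after rows have been removed. If contiguous labels are preferred, one option is to reset the index afterwards. A small sketch on a toy frame (the column names `a` and `b` and all values are hypothetical):

```python
import pandas as pd

# Toy frame with one repeated row (hypothetical values).
toy = pd.DataFrame({"a": [1, 2, 2, 3], "b": [0.1, 0.2, 0.2, 0.3]})

# Returning a new frame instead of using inplace=True leaves `toy` untouched.
deduped = toy.drop_duplicates()

# The surviving rows keep their original labels, so there is a gap at label 2.
print(list(deduped.index))

# reset_index(drop=True) renumbers the rows from 0 without adding an extra column.
deduped_reset = deduped.reset_index(drop=True)
print(list(deduped_reset.index))
```

Keeping the original labels is harmless for model training, but it matters when rows are later selected by position rather than by label.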


### Task 4: Managing Missing Values in Fetal Health Dataset


In [12]:
null_values = df.isnull().sum()
print(null_values)

baseline value                                            0
accelerations                                             0
fetal_movement                                            0
uterine_contractions                                      0
light_decelerations                                       0
severe_decelerations                                      0
prolongued_decelerations                                  0
abnormal_short_term_variability                           0
mean_value_of_short_term_variability                      0
percentage_of_time_with_abnormal_long_term_variability    0
mean_value_of_long_term_variability                       0
histogram_width                                           0
histogram_min                                             0
histogram_max                                             0
histogram_number_of_peaks                                 0
histogram_number_of_zeroes                                0
histogram_mode                          

### Class Distribution Analysis

In [13]:
values = df['fetal_health'].value_counts()
values

| fetal_health | count |
| --- | --- |
| 1.0 | 1646 |
| 2.0 | 292 |
| 3.0 | 175 |
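
These counts show a clear class imbalance. The sketch below turns them into proportions and computes the accuracy a trivial model would reach by always predicting the most common class. The counts are copied from the output above; the variable names are illustrative, and the baseline refers to the full deduplicated dataset rather than to any particular test split.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
counts = pd.Series({1.0: 1646, 2.0: 292, 3.0: 175}, name="count")

# Share of each class in the deduplicated dataset.
proportions = counts / counts.sum()

# Accuracy of a model that always predicts the most frequent class.
majority_baseline = proportions.max()

print(proportions.round(3))
print(round(majority_baseline, 3))
```

Any accuracy reported later is easier to interpret against this baseline than against zero.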


## Module 2
### Task 1: Split Data into Features and Target

In [18]:
# Split into X and Y
x = df.drop(['fetal_health'], axis=1)
y = df['fetal_health']

In [19]:
# Inspect data
x, y

(      baseline value  accelerations  fetal_movement  uterine_contractions  \
 0              120.0          0.000           0.000                 0.000   
 1              132.0          0.006           0.000                 0.006   
 2              133.0          0.003           0.000                 0.008   
 3              134.0          0.003           0.000                 0.008   
 4              132.0          0.007           0.000                 0.008   
 ...              ...            ...             ...                   ...   
 2121           140.0          0.000           0.000                 0.007   
 2122           140.0          0.001           0.000                 0.007   
 2123           140.0          0.001           0.000                 0.007   
 2124           140.0          0.001           0.000                 0.006   
 2125           142.0          0.002           0.002                 0.008   
 
       light_decelerations  severe_decelerations  prolongued_d

### Task 2: Split Data into Training and Testing Sets


In [22]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

x_train, x_test, y_train, y_test


(      baseline value  accelerations  fetal_movement  uterine_contractions  \
 1143           122.0          0.004            0.00                 0.006   
 1290           115.0          0.003            0.00                 0.009   
 1467           148.0          0.005            0.00                 0.005   
 1077           134.0          0.002            0.00                 0.010   
 15             130.0          0.006            0.38                 0.004   
 ...              ...            ...             ...                   ...   
 1651           132.0          0.007            0.00                 0.010   
 1104           122.0          0.000            0.00                 0.003   
 1142           122.0          0.004            0.00                 0.005   
 1306           138.0          0.002            0.00                 0.007   
 869            136.0          0.007            0.00                 0.005   
 
       light_decelerations  severe_decelerations  prolongued_d
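
The split above is purely random. With imbalanced classes, a common alternative is a stratified split, which keeps the class shares roughly equal in the training and test sets. The sketch below is not what the notebook does; it shows the `stratify` argument of `train_test_split` on synthetic data whose labels loosely mimic the class mix of this dataset. All names ending in `_demo`, `_tr`, or `_te` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 1000 labels in a 78% / 14% / 8% mix and 3 random features.
y_demo = np.repeat([1.0, 2.0, 3.0], [780, 140, 80])
x_demo = rng.normal(size=(len(y_demo), 3))

# stratify=y_demo asks for the same class proportions in both splits.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Share of the rarest class in each split.
train_share = np.mean(y_tr == 3.0)
test_share = np.mean(y_te == 3.0)
print(train_share, test_share)
```

Without stratification the rarest class can end up over- or under-represented in the test set, which makes per-class metrics noisier.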

### Task 3: Train a Logistic Regression Model

In [23]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model1b = LogisticRegression(solver='liblinear').fit(x_train,y_train)
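
The CTG features sit on very different scales (baseline values near 130, accelerations near 0.003), and logistic regression is often fitted on standardized inputs for that reason. The sketch below is an alternative setup, not the notebook's method: it wraps a scaler and the classifier in a pipeline, uses synthetic data from `make_classification`, and relies on scikit-learn's default solver rather than `liblinear`. All variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the CTG features.
x_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3, random_state=0
)
# Stretch one column to imitate features measured on very different scales.
x_demo[:, 0] *= 1000.0

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42
)

# The scaler is fitted on the training data only, then applied to the test data.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(x_tr, y_tr)

demo_accuracy = scaled_lr.score(x_te, y_te)
print(demo_accuracy)
```

Putting the scaler inside the pipeline avoids leaking test-set statistics into training, which would happen if the whole dataset were scaled before splitting.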


### Task 4: Make Predictions with the Trained Model


In [24]:
y_predict1b = model1b.predict(x_test)

In [25]:
y_predict1b

array([1., 1., 1., 2., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 3., 1., 1.,
       1., 1., 2., 3., 1., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 3., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1.,
       2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 3., 1., 2., 2., 1., 1., 3., 2., 1., 1., 1., 1.,
       1., 1., 1., 1., 3., 2., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1.,
       2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 3., 3.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 3., 1., 1., 1., 1., 1., 1.,
       3., 1., 1., 1., 1.

### Task 5: Calculate Model Accuracy


In [26]:
# Check Accuracy
accuracy_lr = accuracy_score(y_test, y_predict1b)


In [27]:
accuracy_lr

0.88801261829653
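
Accuracy alone can hide weak performance on the rarer "Suspect" and "Pathological" classes, because class 1 dominates the data. A confusion matrix and per-class recall make that visible. The sketch below uses small hand-written label arrays (hypothetical, not the notebook's predictions) so that the numbers are easy to check by hand.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

labels = [1.0, 2.0, 3.0]

# Hypothetical true and predicted labels: 6 normal, 2 suspect, 2 pathological.
y_true = np.array([1, 1, 1, 1, 1, 1, 2, 2, 3, 3], dtype=float)
y_hat = np.array([1, 1, 1, 1, 1, 1, 1, 2, 1, 3], dtype=float)

# Overall accuracy: 8 of 10 predictions match.
acc = accuracy_score(y_true, y_hat)

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_hat, labels=labels)

# Recall per class: the share of each true class that was predicted correctly.
per_class_recall = recall_score(y_true, y_hat, labels=labels, average=None)

print(acc)
print(cm)
print(per_class_recall)
```

In this toy case the accuracy is 0.8 while recall for classes 2 and 3 is only 0.5 each, which is the kind of gap a single accuracy figure does not reveal.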

### Task 6: Initialize a Random Forest Classifier

In [29]:
from sklearn.ensemble import RandomForestClassifier

model2b = RandomForestClassifier(criterion='gini', n_estimators=100, max_depth=4, random_state=33)

### Task 7: Train the Random Forest Classifier

In [30]:
model2b.fit(x_train, y_train)

### Task 8: Make Predictions with the Random Forest Model

In [31]:
# Make Prediction
y_pred2b = model2b.predict(x_test)

In [32]:
y_pred2b

array([1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 2., 3., 1., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1.,
       1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 2.,
       1., 1., 1., 1., 1., 3., 1., 2., 2., 1., 2., 3., 2., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 2., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 2., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 2.,
       3., 1., 1., 2., 1., 1., 1., 2., 1., 2., 1., 1., 1., 1., 1., 1., 1.,
       2., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 3., 3.,
       2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 3., 1., 1., 1., 1., 1., 1.,
       3., 1., 1., 1., 1.

### Task 9: Calculate Accuracy of the Random Forest Model

In [33]:
# Check Accuracy
accuracy_rf = accuracy_score(y_test, y_pred2b)

In [34]:
accuracy_rf

0.9211356466876972
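
A fitted random forest also exposes `feature_importances_`, which gives a rough ranking of the variables the trees relied on. The sketch below shows how such a ranking could be read out; it uses synthetic data and made-up column names (`feature_0` to `feature_5`), so the ranking it prints says nothing about the CTG features themselves.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic three-class data with hypothetical column names.
x_arr, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
x_demo = pd.DataFrame(x_arr, columns=[f"feature_{i}" for i in range(6)])

# Same hyperparameters as the notebook's random forest.
forest = RandomForestClassifier(
    criterion="gini", n_estimators=100, max_depth=4, random_state=33
)
forest.fit(x_demo, y_demo)

# Impurity-based importances, labelled by column and sorted from high to low.
importances = pd.Series(
    forest.feature_importances_, index=x_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances are a quick first look rather than a definitive answer; they can favor features with many distinct values.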

### Task 10: Initialize a Decision Tree Classifier

In [36]:
from sklearn.tree import DecisionTreeClassifier

model3b = DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=33)

### Task 11: Train the Decision Tree Classifier

In [37]:
model3b.fit(x_train, y_train)
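
With `max_depth=2` the tree has at most four leaves, so its decision rules are small enough to read directly. One way to print them is `sklearn.tree.export_text`. The sketch below fits a tree with the same hyperparameters on synthetic data; the feature names `f0` to `f3` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic three-class data standing in for the CTG features.
x_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
feature_names = ["f0", "f1", "f2", "f3"]

# Same hyperparameters as the notebook's decision tree.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=33)
tree.fit(x_demo, y_demo)

# Text rendering of the learned threshold rules, one line per node.
rules = export_text(tree, feature_names=feature_names)
print(rules)

n_leaves = tree.get_n_leaves()
print(n_leaves)
```

Applied to the notebook's `model3b` with the real column names, the same call would show which CTG features and thresholds the tree splits on.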


### Task 12: Make Predictions with the Decision Tree Model

In [39]:
# Make Predictions
y_pred3b = model3b.predict(x_test)

In [41]:
y_pred3b

array([1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 3., 1., 1.,
       1., 1., 2., 3., 1., 3., 1., 1., 1., 2., 1., 1., 1., 1., 1., 2., 1.,
       1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 2., 1., 1., 2., 1.,
       2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 2.,
       1., 1., 1., 1., 1., 3., 1., 2., 2., 1., 2., 3., 2., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 2., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 2., 1., 1., 1., 3., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1.,
       1., 1., 1., 1., 2., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 3., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 1., 1., 1., 1., 1., 2.,
       3., 1., 1., 2., 1., 2., 1., 2., 1., 2., 1., 1., 1., 1., 1., 1., 1.,
       2., 2., 1., 1., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 2., 1., 1., 1., 3., 3.,
       2., 2., 1., 2., 1., 1., 1., 1., 1., 1., 3., 1., 1., 1., 1., 1., 1.,
       3., 2., 1., 1., 1.

### Task 13: Calculate Accuracy of the Decision Tree Model

In [42]:
# Check Accuracy Score
accuracy_dt = accuracy_score(y_test, y_pred3b)

In [43]:
accuracy_dt

0.9037854889589906
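
The three accuracies above come from a single 70/30 split, so part of the difference between models may be due to that particular split. Cross-validation gives a steadier comparison. The sketch below is one possible setup on synthetic, imbalanced data: it is not the notebook's evaluation, the balanced-accuracy metric is a choice made here, and all names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with a class mix loosely resembling the fetal health labels.
x_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3,
    weights=[0.78, 0.14, 0.08], random_state=0,
)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, max_depth=4, random_state=33
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=2, random_state=33),
}

# Five stratified folds so every fold contains all three classes.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Balanced accuracy averages recall over classes, so rare classes count equally.
cv_scores = {
    name: cross_val_score(model, x_demo, y_demo, cv=cv, scoring="balanced_accuracy")
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(name, scores.mean().round(3), scores.std().round(3))
```

Reporting the mean together with the spread across folds shows whether a gap between two models is larger than the fold-to-fold noise.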

### Model Performance Unveiled: Assessing Fetal Health Predictions

In [44]:
# Compare the true test labels with the random forest predictions (y_pred2b)
prediction_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred2b})

In [45]:
prediction_df


|  | Actual | Predicted |
| --- | --- | --- |
| 601 | 2.0 | 1.0 |
| 2005 | 1.0 | 1.0 |
| 427 | 1.0 | 1.0 |
| 291 | 2.0 | 1.0 |
| 197 | 2.0 | 1.0 |
| ... | ... | ... |
| 1750 | 3.0 | 3.0 |
| 948 | 1.0 | 1.0 |
| 523 | 1.0 | 1.0 |
| 817 | 2.0 | 2.0 |
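
A table of actual versus predicted labels is easier to read once it is cross-tabulated. The sketch below applies `pd.crosstab` to the nine rows visible in the truncated output above; the elided rows are not included, so the counts describe only this excerpt and not the full test set. The name `excerpt` is illustrative.

```python
import pandas as pd

# The nine rows visible in the truncated prediction_df output above.
excerpt = pd.DataFrame(
    {
        "Actual": [2.0, 1.0, 1.0, 2.0, 2.0, 3.0, 1.0, 1.0, 2.0],
        "Predicted": [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0],
    },
    index=[601, 2005, 427, 291, 197, 1750, 948, 523, 817],
)

# Rows are actual classes, columns are predicted classes.
table = pd.crosstab(excerpt["Actual"], excerpt["Predicted"])
print(table)

# Number of rows where the prediction differs from the true label.
n_mismatches = int((excerpt["Actual"] != excerpt["Predicted"]).sum())
print(n_mismatches)
```

In this excerpt all three mismatches are "Suspect" records predicted as "Normal". Running the same cross-tabulation on the full `prediction_df` would show whether that pattern holds across the test set.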
