# Predictive Models

In this notebook we train several predictive models to classify the input data into the correct fetal health group, compare them, and select the best-performing one.

## Import libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

## Data loading

In [2]:
data = pd.read_csv('fetal_health.csv')
X = data.drop(columns=['fetal_health'])
y = data['fetal_health']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
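The split above passes `stratify=y`, which keeps the class proportions roughly equal in the training and test sets. A minimal sketch of how that could be checked, assuming randomly generated stand-in labels (`y_demo`, `X_demo`, and the helper `class_proportions` are hypothetical and not part of the notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split


def class_proportions(labels):
    """Return {class: fraction of samples} for a 1-D array of labels."""
    values, counts = np.unique(labels, return_counts=True)
    return {float(v): float(c) / len(labels) for v, c in zip(values, counts)}


# Assumption: imbalanced synthetic labels standing in for data['fetal_health'].
rng = np.random.default_rng(0)
y_demo = rng.choice([1.0, 2.0, 3.0], size=1000, p=[0.78, 0.14, 0.08])
X_demo = rng.normal(size=(1000, 4))

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

train_props = class_proportions(y_tr)
test_props = class_proportions(y_te)
for cls in sorted(train_props):
    print(f"class {cls}: train={train_props[cls]:.3f}, test={test_props[cls]:.3f}")
```

In the notebook itself, the same helper could be called on `y_train` and `y_test`.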

## Logistic Regression Model

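We evaluate the model on accuracy, precision, recall, and F1 score; precision, recall, and F1 are reported as weighted averages across the three classes.
Overall, the model reaches an accuracy of 0.88, but it is much weaker on the minority classes: recall is only 0.46 for class 2.0 and 0.66 for class 3.0.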

In [3]:
logistic_model = LogisticRegression(solver='liblinear', max_iter=500)
logistic_model.fit(X_train, y_train)

y_pred = logistic_model.predict(X_test)

# Evaluation
lr_accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
class_report = classification_report(y_test, y_pred)

# Print results
print("Logistic Regression Model Evaluation Metrics:")
print(f"Accuracy: {lr_accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print("Classification Report:")
print(class_report)

Logistic Regression Model Evaluation Metrics:
Accuracy: 0.88
Precision: 0.87
Recall: 0.88
F1 Score: 0.87
Classification Report:
              precision    recall  f1-score   support

         1.0       0.90      0.97      0.94       332
         2.0       0.60      0.46      0.52        59
         3.0       0.96      0.66      0.78        35

    accuracy                           0.88       426
   macro avg       0.82      0.70      0.75       426
weighted avg       0.87      0.88      0.87       426
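The imports include `StandardScaler` and `Pipeline`, but the cell above fits the logistic regression on unscaled features. The following is a minimal sketch of how the two could be combined, assuming synthetic data from `make_classification` as a stand-in for `fetal_health.csv` and the default solver rather than `liblinear`. Whether scaling would change the scores reported above is not something this notebook measures.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic 3-class data, not the fetal health dataset.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The scaler is fitted on the training data only, then applied before the classifier.
scaled_lr = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=500)),
])
scaled_lr.fit(X_tr, y_tr)

demo_accuracy = accuracy_score(y_te, scaled_lr.predict(X_te))
print(f"Accuracy on synthetic data: {demo_accuracy:.2f}")
```

Wrapping both steps in one pipeline means the test set never influences the scaling statistics.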



## Random Forest Model

As with the Logistic Regression model, we evaluate this model on the same metrics. The Random Forest performs better, with 0.92 accuracy and higher recall on the minority classes (0.68 for class 2.0 and 0.83 for class 3.0).

In [4]:
random_forest_model = RandomForestClassifier(random_state=42, n_estimators=100)  # You can adjust n_estimators for optimization

# Train the Random Forest model
random_forest_model.fit(X_train, y_train)

# Make predictions with the Random Forest model
y_pred = random_forest_model.predict(X_test)

# Evaluate the Random Forest model
rf_accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
class_report = classification_report(y_test, y_pred)

print("Random Forest Model Evaluation Metrics:")
print(f"Accuracy: {rf_accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print("Classification Report:")
print(class_report)

Random Forest Model Evaluation Metrics:
Accuracy: 0.92
Precision: 0.92
Recall: 0.92
F1 Score: 0.92
Classification Report:
              precision    recall  f1-score   support

         1.0       0.94      0.98      0.96       332
         2.0       0.83      0.68      0.75        59
         3.0       0.85      0.83      0.84        35

    accuracy                           0.92       426
   macro avg       0.88      0.83      0.85       426
weighted avg       0.92      0.92      0.92       426
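The per-class recall values in these reports can also be read off a confusion matrix (`confusion_matrix` is imported above but not used). A small sketch on hand-made labels, which are hypothetical and not taken from the dataset:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Assumption: toy labels chosen for illustration only.
y_true_demo = [1, 1, 1, 1, 2, 2, 3, 3]
y_pred_demo = [1, 1, 1, 2, 2, 1, 3, 3]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[1, 2, 3])

# Recall per class = correct predictions / number of true samples of that class.
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(per_class_recall)
```

In the notebook, `confusion_matrix(y_test, y_pred)` would show which classes the misclassified samples are assigned to.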



## Decision Tree Model

As with the previous models, we evaluate this model on the same metrics. The Decision Tree performs better than the Logistic Regression model, with 0.90 accuracy, but slightly worse than the Random Forest (by about 0.02).

In [5]:
decision_tree_model = DecisionTreeClassifier(random_state=42)
decision_tree_model.fit(X_train, y_train)

y_pred = decision_tree_model.predict(X_test)

dt_accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
class_report = classification_report(y_test, y_pred)

print("Decision Tree Model Evaluation Metrics:")
print(f"Accuracy: {dt_accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print("Classification Report:")
print(class_report)

Decision Tree Model Evaluation Metrics:
Accuracy: 0.90
Precision: 0.90
Recall: 0.90
F1 Score: 0.90
Classification Report:
              precision    recall  f1-score   support

         1.0       0.94      0.95      0.95       332
         2.0       0.73      0.64      0.68        59
         3.0       0.81      0.83      0.82        35

    accuracy                           0.90       426
   macro avg       0.82      0.81      0.82       426
weighted avg       0.90      0.90      0.90       426
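The `macro avg` and `weighted avg` rows differ only in how the per-class values are combined: the macro average treats every class equally, while the weighted average weights each class by its support. A short arithmetic sketch using the rounded recall values from the Decision Tree report above (the variable names are illustrative, and small deviations from the report come from using rounded inputs):

```python
# Per-class recall and support copied from the Decision Tree report.
recall_by_class = {1.0: 0.95, 2.0: 0.64, 3.0: 0.83}
support_by_class = {1.0: 332, 2.0: 59, 3.0: 35}

total_support = sum(support_by_class.values())

# Macro average: unweighted mean over classes.
macro_recall = sum(recall_by_class.values()) / len(recall_by_class)

# Weighted average: each class counts in proportion to its support.
weighted_recall = sum(
    recall_by_class[c] * support_by_class[c] for c in recall_by_class
) / total_support

print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

Because class 1.0 accounts for 332 of the 426 test samples, the weighted figure stays close to that class's recall, while the macro figure is pulled down by class 2.0.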



## Conclusion: model evaluation and selection

Random Forest performs better than the other models, not just in accuracy but also in most other metrics. The main exception is precision on class 3.0, where Logistic Regression scores higher (0.96 vs 0.85).
Conclusion: of these three models, we choose the Random Forest model for use in production.

In [6]:
print("Model Accuracy:")
print("Logistic Regression: ", lr_accuracy)
print("Random Forest: ", rf_accuracy)
print("Decision Tree: ", dt_accuracy)

Model Accuracy:
Logistic Regression:  0.8755868544600939
Random Forest:  0.9248826291079812
Decision Tree:  0.9014084507042254
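These accuracies come from a single 80/20 split with 426 test samples, so differences of a few hundredths may partly depend on which rows landed in the test set. One way to check how stable the ranking is would be stratified k-fold cross-validation. The sketch below assumes synthetic data from `make_classification` in place of `fetal_health.csv`, so its numbers say nothing about the real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic 3-class data, not the fetal health dataset.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Five stratified folds; every model is scored on the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With the real data, `X` and `y` from the data loading cell would replace `X_demo` and `y_demo`.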
