<hr>

## TABLE OF CONTENTS

1. [Packages](#1.-PACKAGES)

2. [Datasets](#2.-DATASETS)

3. [Analysis & Visualization](#3.-ANALYSIS-&-VISUALIZATION)

4. [Data Preparation & Preprocessing](#4.-DATA-PREPARATION-&-PREPROCESSING)

5. [Build & Evaluate ML Models](#5.-BUILD-&-EVALUATE-ML-MODELS)

6. [Model Comparison](#6.-MODEL-COMPARISON)

7. [Conclusion](#7.-CONCLUSION)

<hr>

## 1. PACKAGES

In [1]:
# Essential libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Machine learning libraries:

# model selection:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

# preprocessing:
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OrdinalEncoder

# models:
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# metrics:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_auc_score, roc_curve, mean_squared_error
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

<hr>

## 2. DATASETS

In [4]:
df_2c = pd.read_csv('datasets/orthopedic_2C.csv')
df_3c = pd.read_csv('datasets/orthopedic_3C.csv')

In [6]:
print(df_2c.columns)
print(df_3c.columns)

Index(['pelvic_incidence', 'pelvic_tilt numeric', 'lumbar_lordosis_angle',
       'sacral_slope', 'pelvic_radius', 'degree_spondylolisthesis', 'class'],
      dtype='object')
Index(['pelvic_incidence', 'pelvic_tilt', 'lumbar_lordosis_angle',
       'sacral_slope', 'pelvic_radius', 'degree_spondylolisthesis', 'class'],
      dtype='object')


In [7]:
df_2c.rename(columns={'pelvic_tilt numeric': 'pelvic_tilt', 'class': 'class_2C'}, inplace=True)

In [8]:
df_2c.head()

|   | pelvic_incidence | pelvic_tilt | lumbar_lordosis_angle | sacral_slope | pelvic_radius | degree_spondylolisthesis | class_2C |
|---|---|---|---|---|---|---|---|
| 0 | 63.027817 | 22.552586 | 39.609117 | 40.475232 | 98.672917 | -0.2544 | Abnormal |
| 1 | 39.056951 | 10.060991 | 25.015378 | 28.99596 | 114.405425 | 4.564259 | Abnormal |
| 2 | 68.832021 | 22.218482 | 50.092194 | 46.613539 | 105.985135 | -3.530317 | Abnormal |
| 3 | 69.297008 | 24.652878 | 44.311238 | 44.64413 | 101.868495 | 11.211523 | Abnormal |
| 4 | 49.712859 | 9.652075 | 28.317406 | 40.060784 | 108.168725 | 7.918501 | Abnormal |


In [9]:
df_3c.head()

|   | pelvic_incidence | pelvic_tilt | lumbar_lordosis_angle | sacral_slope | pelvic_radius | degree_spondylolisthesis | class |
|---|---|---|---|---|---|---|---|
| 0 | 63.027817 | 22.552586 | 39.609117 | 40.475232 | 98.672917 | -0.2544 | Hernia |
| 1 | 39.056951 | 10.060991 | 25.015378 | 28.99596 | 114.405425 | 4.564259 | Hernia |
| 2 | 68.832021 | 22.218482 | 50.092194 | 46.613539 | 105.985135 | -3.530317 | Hernia |
| 3 | 69.297008 | 24.652878 | 44.311238 | 44.64413 | 101.868495 | 11.211523 | Hernia |
| 4 | 49.712859 | 9.652075 | 28.317406 | 40.060784 | 108.168725 | 7.918501 | Hernia |


In [10]:
# Compare the names of the first 6 columns (the features) of df_2c and df_3c
columns_2c = set(df_2c.columns[:6])
columns_3c = set(df_3c.columns[:6])

# Check whether the names match; a set comparison ignores column order and does not look at values
if columns_2c == columns_3c:
    print("The first 6 features (columns) in both DataFrames are the same.")
else:
    print("The first 6 features (columns) in the DataFrames are different.")
    print("\nColumns in df_2c but not in df_3c:", columns_2c - columns_3c)
    print("\nColumns in df_3c but not in df_2c:", columns_3c - columns_2c)

The first 6 features (columns) in both DataFrames are the same.
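The set comparison above only shows that the two files use the same six feature names; it ignores column order and never looks at the values. Attaching `class_3C` by position in the next cell is only safe if both files hold the same measurements in the same row order. A minimal sketch of a stricter check, assuming the features are the first six columns and are all numeric (`features_match` is a hypothetical helper, not part of the original notebook):

```python
import numpy as np
import pandas as pd


def features_match(df_a: pd.DataFrame, df_b: pd.DataFrame, n_features: int = 6) -> bool:
    """Return True if the first n_features columns of both frames have the same
    names in the same order, the same shape, and (numerically) equal values row by row."""
    a = df_a.iloc[:, :n_features]
    b = df_b.iloc[:, :n_features]
    # Order-sensitive name check (a set comparison would ignore column order)
    if list(a.columns) != list(b.columns):
        return False
    # Frames with different row counts cannot be aligned row by row
    if a.shape != b.shape:
        return False
    # Row-by-row value check; np.allclose tolerates tiny floating-point differences
    return bool(np.allclose(a.to_numpy(dtype=float), b.to_numpy(dtype=float)))
```

Calling `features_match(df_2c, df_3c)` before the merge should return `True` if the rows correspond; if it returns `False`, the rows would need to be matched on their feature values rather than by position.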


In [None]:
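# Create a new DataFrame: all columns of df_2c (6 features + class_2C) plus the
# last column of df_3c as class_3C. The assignment aligns on the row index, so this
# assumes both CSV files list the patients in the same order.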
df = df_2c.copy()
df['class_3C'] = df_3c.iloc[:, -1]

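Since `class_3C` is attached by position, cross-tabulating the two label columns gives a second sanity check: in a correctly aligned frame, a patient is labelled normal in the 2-class scheme exactly when it is labelled normal in the 3-class scheme. The sketch below assumes the normal class is spelled `"Normal"` in both files (only `Abnormal` and `Hernia` appear in the outputs shown here), so the `normal_label` parameter may need adjusting; `labels_consistent` is a hypothetical helper.

```python
import pandas as pd


def labels_consistent(df: pd.DataFrame, col_2c: str = "class_2C",
                      col_3c: str = "class_3C", normal_label: str = "Normal") -> bool:
    """Return True if a row is 'normal' in the 2-class labels exactly when it is
    'normal' in the 3-class labels. normal_label is an ASSUMED spelling."""
    # Counts of each (2C label, 3C label) pair, printed to inspect the mapping
    print(pd.crosstab(df[col_2c], df[col_3c]))
    normal_2c = df[col_2c] == normal_label
    normal_3c = df[col_3c] == normal_label
    # Every row must agree on whether it belongs to the normal class
    return bool((normal_2c == normal_3c).all())
```

For example, `labels_consistent(df)` prints the 2C-by-3C count table and returns a single boolean.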
In [12]:
df.head()

|   | pelvic_incidence | pelvic_tilt | lumbar_lordosis_angle | sacral_slope | pelvic_radius | degree_spondylolisthesis | class_2C | class_3C |
|---|---|---|---|---|---|---|---|---|
| 0 | 63.027817 | 22.552586 | 39.609117 | 40.475232 | 98.672917 | -0.2544 | Abnormal | Hernia |
| 1 | 39.056951 | 10.060991 | 25.015378 | 28.99596 | 114.405425 | 4.564259 | Abnormal | Hernia |
| 2 | 68.832021 | 22.218482 | 50.092194 | 46.613539 | 105.985135 | -3.530317 | Abnormal | Hernia |
| 3 | 69.297008 | 24.652878 | 44.311238 | 44.64413 | 101.868495 | 11.211523 | Abnormal | Hernia |
| 4 | 49.712859 | 9.652075 | 28.317406 | 40.060784 | 108.168725 | 7.918501 | Abnormal | Hernia |


In [13]:
df.shape

(310, 8)

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 310 entries, 0 to 309
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   pelvic_incidence          310 non-null    float64
 1   pelvic_tilt               310 non-null    float64
 2   lumbar_lordosis_angle     310 non-null    float64
 3   sacral_slope              310 non-null    float64
 4   pelvic_radius             310 non-null    float64
 5   degree_spondylolisthesis  310 non-null    float64
 6   class_2C                  310 non-null    object 
 7   class_3C                  310 non-null    object 
dtypes: float64(6), object(2)
memory usage: 19.5+ KB
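`df.info()` reports no missing values in any column. Before splitting the data, it is also worth checking how the 310 patients are spread over the classes, because an imbalanced target changes which metrics are informative and argues for a stratified split. A minimal sketch with `value_counts` (`class_distribution` is a hypothetical helper):

```python
import pandas as pd


def class_distribution(df: pd.DataFrame, label_col: str) -> pd.DataFrame:
    """Return absolute counts and proportions of each class in label_col."""
    counts = df[label_col].value_counts()
    shares = df[label_col].value_counts(normalize=True)
    # Align both Series on the class labels and round the proportions for display
    return pd.DataFrame({"count": counts, "share": shares.round(3)})
```

It can be called as `class_distribution(df, "class_2C")` and `class_distribution(df, "class_3C")`. If the classes turn out to be unbalanced, passing `stratify=y` to `train_test_split` keeps the class proportions the same in the training and test sets.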


<hr>

## 3. ANALYSIS & VISUALIZATION

<hr>

## 4. DATA PREPARATION & PREPROCESSING

<hr>

## 5. BUILD & EVALUATE ML MODELS

### Decision Tree Model

### Gaussian Naive Bayes Model

<hr>

## 6. MODEL COMPARISON

<hr>

## 7. CONCLUSION