### Load and Process Data

**Preprocess CSV**

In [1]:
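from scripts.data_preprocessing import preprocess_data

# Load the CSV into features X and labels y; duplicate rows are kept (drop_dup=False)
X, y = preprocess_data('../data/Maternal Health Risk Data Set.csv', drop_dup=False)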

**Split into train and test sets**

In [2]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print(f"Dataset has {X_train.shape[0]} training points and {X_test.shape[0]} testing points\n")

Dataset has 809 training points and 203 testing points
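
The printed sizes follow from how scikit-learn rounds a fractional `test_size`: the test count is rounded up and the training set receives the remainder. A small check of that arithmetic, assuming the preprocessed data has 1012 rows (the sum of the two counts printed above):

```python
import math

n_samples = 809 + 203  # total rows implied by the printed output
test_size = 0.2

# A float test_size is converted to a count by rounding up;
# the training set gets whatever is left.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```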



### Train Classifiers

Three classifiers are trained and tuned: XGBoost, a support vector machine (SVM), and a random forest.

**Define Classifiers**

In [3]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
import xgboost

classifiers = []

xgb = xgboost.XGBClassifier()
classifiers.append(xgb)

svm = SVC()
classifiers.append(svm)

rf = RandomForestClassifier(random_state=42)
classifiers.append(rf)

In [4]:
# Hyperparameter grids for the cross-validated grid search, one per classifier

xgb_params = {
    'max_depth': [None, 10, 20, 30],
    'eta': [0.001, 0.01, 0.1, 1]  # alias for learning_rate
}

svm_params = {
    'C': [0.01, 0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],  # ignored by the linear kernel
    'kernel': ['linear', 'rbf']
}

rf_params = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Same order as `classifiers`: XGBoost, SVM, random forest
params = [xgb_params, svm_params, rf_params]
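
To get a sense of how much work the search involves, the number of candidate settings in a grid is the product of its list lengths. The sketch below counts them with the standard library only, using the grids from the cell above. The number of cross-validation folds is not shown in this notebook, so `n_folds = 5` is an assumption (it is the `GridSearchCV` default, which `grid_search` may or may not use).

```python
from itertools import product

xgb_params = {"max_depth": [None, 10, 20, 30], "eta": [0.001, 0.01, 0.1, 1]}
svm_params = {
    "C": [0.01, 0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["linear", "rbf"],
}
rf_params = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}


def count_candidates(grid):
    """Return the number of distinct hyperparameter combinations in a grid."""
    return sum(1 for _ in product(*grid.values()))


n_folds = 5  # assumption: the fold count used by grid_search is not shown

counts = {
    "xgb": count_candidates(xgb_params),
    "svm": count_candidates(svm_params),
    "rf": count_candidates(rf_params),
}
for name, n in counts.items():
    print(f"{name}: {n} candidates, {n * n_folds} fits with {n_folds}-fold CV")
```

Because the linear kernel ignores `gamma`, 25 of the 50 SVM candidates collapse to only 5 distinct models. Passing a list of two dictionaries (one per kernel) as the grid would avoid the redundant fits.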

**Train and tune the classifiers**

In [5]:
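from scripts.train_model import grid_search

# Tune each classifier on its grid and print a classification report for the test set
grids = grid_search(classifiers, params, X_train, y_train, X_test, y_test)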

Tuned XGBClassifier Accuracy: 91.47%
Best Hyperparameters: {'eta': 1, 'max_depth': None}
              precision    recall  f1-score   support

         0.0       0.91      0.87      0.89        79
         1.0       0.83      0.87      0.85        60
         2.0       0.95      0.95      0.95        64

    accuracy                           0.90       203
   macro avg       0.90      0.90      0.90       203
weighted avg       0.90      0.90      0.90       203

Tuned SVC Accuracy: 90.61%
Best Hyperparameters: {'C': 100, 'gamma': 1, 'kernel': 'rbf'}
              precision    recall  f1-score   support

         0.0       0.80      0.84      0.81        79
         1.0       0.82      0.85      0.84        60
         2.0       0.91      0.83      0.87        64

    accuracy                           0.84       203
   macro avg       0.84      0.84      0.84       203
weighted avg       0.84      0.84      0.84       203

Tuned RandomForestClassifier Accuracy: 91.47%
Best Hyperpara
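
The source of `scripts.train_model.grid_search` is not shown here. The sketch below is a hypothetical stand-in, not the project's implementation: it assumes the helper wraps scikit-learn's `GridSearchCV`, prints the mean cross-validated accuracy as the "Tuned ... Accuracy" line, and returns one dictionary per classifier with the fitted search under the key `'grid'` (the only key the notebook uses). The function name `grid_search_sketch`, the `cv=5` default, and the `'name'` key are all assumptions. Under this reading, the headline accuracy and the test-set report can differ slightly, as they do in the output above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


def grid_search_sketch(classifiers, params, X_train, y_train, X_test, y_test, cv=5):
    """Hypothetical stand-in for scripts.train_model.grid_search."""
    results = []
    for clf, grid_params in zip(classifiers, params):
        # Exhaustive search over the grid with cross-validation on the training set
        grid = GridSearchCV(clf, grid_params, cv=cv, scoring="accuracy")
        grid.fit(X_train, y_train)

        name = type(clf).__name__
        print(f"Tuned {name} Accuracy: {grid.best_score_:.2%}")  # mean CV accuracy
        print(f"Best Hyperparameters: {grid.best_params_}")
        # Evaluate the refit best estimator on the held-out test set
        print(classification_report(y_test, grid.predict(X_test)))

        results.append({"name": name, "grid": grid})
    return results


# Synthetic three-class data so the sketch runs without the CSV
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

demo_grids = grid_search_sketch(
    [SVC(), RandomForestClassifier(random_state=42)],
    [{"C": [1, 10], "kernel": ["rbf"]}, {"n_estimators": [50], "max_depth": [None, 5]}],
    X_train, y_train, X_test, y_test,
)
```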

### Save Best Model

In [6]:
# grids[0] is the XGBoost search, the first classifier in the list
best_model = grids[0]['grid']
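
Hard-coding index 0 works for this run, but the choice could also be made from the scores. A minimal sketch, assuming each entry's `'grid'` object exposes `best_score_` as a fitted `GridSearchCV` does; the `SimpleNamespace` objects and their scores are made-up placeholders, not results from this notebook:

```python
from types import SimpleNamespace

# Stand-ins for fitted search objects; the scores are illustrative only
grids_demo = [
    {"grid": SimpleNamespace(best_score_=0.81, best_params_={"eta": 0.1})},
    {"grid": SimpleNamespace(best_score_=0.78, best_params_={"C": 10})},
    {"grid": SimpleNamespace(best_score_=0.84, best_params_={"n_estimators": 200})},
]

# Pick the entry whose search achieved the highest score
best_entry = max(grids_demo, key=lambda entry: entry["grid"].best_score_)
best_model_demo = best_entry["grid"]

print(best_model_demo.best_params_)
```

`max` returns the first of several tied entries, so with the tie between XGBoost and random forest reported above, this rule would still select the entry at index 0.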

In [7]:
import pickle

# Serialize the selected model to disk
with open('../models/best_model.pkl', 'wb') as f:
    pickle.dump(best_model, f)
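
Reading the model back is the mirror image of the save step. The sketch below shows the round trip with a temporary directory and a plain dictionary standing in for the fitted model, so it runs without the trained objects. The stand-in and the temporary path are assumptions for illustration; in the notebook the file would be `../models/best_model.pkl`.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the fitted model; any picklable object behaves the same way
stand_in_model = {"name": "demo", "best_params_": {"max_depth": None}}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "best_model.pkl"

    # Save, as in the cell above
    with open(model_path, "wb") as f:
        pickle.dump(stand_in_model, f)

    # Load it back in binary read mode
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in_model)
```

Pickle files should only be loaded from trusted sources, and a pickled estimator is best loaded with the same scikit-learn and XGBoost versions that were used to save it.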