# ML Challenge (Optional)

Train, test, optimize, and analyze the performance of a classification model using a methodology of your choice for the randomly generated moons dataset.

You are not being evaluated for the performance of your model. Instead, we are interested in whether you can implement a simple but rigorous ML workflow.

Show all of your work in this notebook.

In [1]:
# you are free to use any package you deem fit

## Dataset

In [2]:
# DO NOT MODIFY
from sklearn.datasets import make_moons

X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)
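
The `n_samples=(50, 450)` argument makes the two moons very different in size, which affects every later step. As a quick sanity check, the class counts can be inspected directly. This is a minimal sketch that repeats the dataset call so it runs on its own; `class_counts` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons

# Same call as the dataset cell above
X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)

# Count how many samples carry each class label
class_counts = np.bincount(Y)
print("Samples per class:", class_counts)
print("Minority fraction:", class_counts.min() / class_counts.sum())
```

With a 10% minority class, plain accuracy is a weak summary of model quality, so per-class metrics deserve attention in the sections that follow.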


## Training

I chose a Support Vector Machine (SVM) as the classification model, starting with a linear kernel as a baseline.

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hold out 20% of the samples as a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Fit a linear-kernel SVM as the baseline model
svm_classifier = SVC(kernel='linear', random_state=42)
svm_classifier.fit(X_train, Y_train)

SVC(kernel='linear', random_state=42)
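
Because the dataset is built with 50 samples of one class and 450 of the other, a purely random split is not guaranteed to keep that ratio in the test set. One possible refinement, not what the cell above does, is a stratified split through the `stratify` parameter of `train_test_split`. The sketch below regenerates the data so it runs on its own; the `_s` suffixed names are hypothetical and chosen only to avoid overwriting the notebook's variables.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Same dataset as in the notebook
X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)

# stratify=Y asks for the same class proportions in the train and test subsets
X_train_s, X_test_s, Y_train_s, Y_test_s = train_test_split(
    X, Y, test_size=0.2, random_state=42, stratify=Y
)

print("Train class counts:", np.bincount(Y_train_s))
print("Test class counts:", np.bincount(Y_test_s))
```

A stratified test set would contain more minority samples than an unlucky random draw, which makes the per-class metrics less sensitive to a single misclassified point.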

In [7]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Predict on the test data
Y_pred = svm_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(Y_test, Y_pred)
print("Accuracy:", accuracy)

# Generate a confusion matrix
confusion = confusion_matrix(Y_test, Y_pred)
print("Confusion Matrix:\n", confusion)

# Generate a classification report
report = classification_report(Y_test, Y_pred)
print("Classification Report:\n", report)


Accuracy: 0.96
Confusion Matrix:
 [[ 2  2]
 [ 2 94]]
Classification Report:
               precision    recall  f1-score   support

           0       0.50      0.50      0.50         4
           1       0.98      0.98      0.98        96

    accuracy                           0.96       100
   macro avg       0.74      0.74      0.74       100
weighted avg       0.96      0.96      0.96       100
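
The 0.96 accuracy needs context: the test set holds 4 samples of class 0 and 96 of class 1, so a model that always predicts class 1 would also score 0.96. The arithmetic can be checked directly from the confusion matrix printed above. This is a small worked sketch; the matrix is copied by hand from the output, and the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted class)
confusion = np.array([[2, 2],
                      [2, 94]])

support = confusion.sum(axis=1)                    # true samples per class
accuracy = np.trace(confusion) / confusion.sum()   # correct predictions / all predictions
majority_baseline = support.max() / support.sum()  # accuracy of always predicting the larger class
recall_per_class = np.diag(confusion) / support    # fraction of each class that was recovered
balanced_accuracy = recall_per_class.mean()        # mean of the per-class recalls

print("Accuracy:", accuracy)
print("Majority-class baseline:", majority_baseline)
print("Recall per class:", recall_per_class)
print("Balanced accuracy:", balanced_accuracy)
```

The balanced accuracy of about 0.74 matches the macro-averaged recall in the classification report and gives a more honest picture of the linear baseline than the raw accuracy.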



## Testing / Optimization

In [5]:
from sklearn.model_selection import GridSearchCV

# Define the hyperparameter grid to search
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid']
}

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=svm_classifier, param_grid=param_grid, cv=5)
grid_search.fit(X_train, Y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

# Get the best model
best_svm_classifier = grid_search.best_estimator_


Best Hyperparameters: {'C': 10, 'kernel': 'rbf'}
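
`GridSearchCV` ranks classifier candidates by accuracy unless a `scoring` argument is given, and on a 10% / 90% dataset accuracy is dominated by the majority class. A possible variant, shown here only as a sketch and not as the search that produced the result above, selects hyperparameters by balanced accuracy instead. The name `grid_search_bal` is hypothetical, and the selected parameters may or may not differ from the accuracy-based search.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Recreate the notebook's data and split so the sketch runs on its own
X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Same grid as in the cell above
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
}

# scoring='balanced_accuracy' averages recall over both classes,
# so the minority class counts as much as the majority class
grid_search_bal = GridSearchCV(
    estimator=SVC(random_state=42),
    param_grid=param_grid,
    cv=5,
    scoring='balanced_accuracy',
)
grid_search_bal.fit(X_train, Y_train)

print("Best hyperparameters (balanced accuracy):", grid_search_bal.best_params_)
print("Best cross-validated balanced accuracy:", grid_search_bal.best_score_)
```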


## Performance Analysis

In [6]:
# Predict with the best model
Y_pred_best = best_svm_classifier.predict(X_test)

# Calculate accuracy with the best model
accuracy_best = accuracy_score(Y_test, Y_pred_best)
print("Best Model Accuracy:", accuracy_best)

# Generate a confusion matrix with the best model
confusion_best = confusion_matrix(Y_test, Y_pred_best)
print("Best Model Confusion Matrix:\n", confusion_best)

# Generate a classification report with the best model
report_best = classification_report(Y_test, Y_pred_best)
print("Best Model Classification Report:\n", report_best)


Best Model Accuracy: 0.97
Best Model Confusion Matrix:
 [[ 2  2]
 [ 1 95]]
Best Model Classification Report:
               precision    recall  f1-score   support

           0       0.67      0.50      0.57         4
           1       0.98      0.99      0.98        96

    accuracy                           0.97       100
   macro avg       0.82      0.74      0.78       100
weighted avg       0.97      0.97      0.97       100
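
Tuning raises accuracy from 0.96 to 0.97, but class 0 recall stays at 0.50, and with only 4 class 0 samples in the test set a single prediction moves that recall by 0.25. A less noisy estimate could come from stratified cross-validation over the whole dataset. The sketch below assumes the tuned configuration `C=10, kernel='rbf'` reported above; `cv` and `scores` are illustrative names. Since those hyperparameters were selected on part of this same data, the result is likely to be slightly optimistic.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Same dataset as in the notebook
X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)

# Tuned configuration reported by the grid search above
model = SVC(C=10, kernel='rbf', random_state=42)

# Each of the 5 folds keeps the 10% / 90% class ratio
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, Y, cv=cv, scoring='balanced_accuracy')

print("Balanced accuracy per fold:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```

Reporting the mean and spread across folds shows how much of the 0.96 to 0.97 difference could be explained by the choice of test samples alone.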

