# ML Challenge (Optional)

Train, test, optimize, and analyze the performance of a classification model using a methodology of your choice for the randomly generated moons dataset.

You are not being evaluated for the performance of your model. Instead, we are interested in whether you can implement a simple but rigorous ML workflow.

Show all of your work in this notebook.

In [1]:
# you are free to use any package you deem fit
from sklearn.datasets import make_moons
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


## Dataset

In [2]:
# Dataset generation: an imbalanced two-moons dataset with 50 samples of class 0 and 450 of class 1
X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)
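
Because `n_samples` is passed as a tuple, the two moons have different sizes, which matters for every metric reported later. The following is a minimal sketch for confirming the class balance; the names `class_counts` and `minority_share` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_moons

# Same call as in the notebook: the tuple gives the number of points per moon
X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)

# Count how many samples carry each label (index 0 = class 0, index 1 = class 1)
class_counts = np.bincount(Y)
minority_share = class_counts.min() / class_counts.sum()

print("Feature matrix shape:", X.shape)
print("Samples per class:", class_counts)
print(f"Minority share: {minority_share:.0%}")
```

With a 10% minority class, accuracy alone can look strong even when the minority class is handled poorly, so per-class metrics are worth reading alongside it.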

## Training

In [3]:
# Split the dataset into a training set and a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Create a pipeline that will scale the data and then apply SVC (Support Vector Classification)
pipeline = make_pipeline(StandardScaler(), SVC(random_state=42))
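
The split above does not pass `stratify`, so the class ratio in the test set is left to chance (here it happened to come out at 15 and 135, as the support column of the report shows). As a hedged alternative, a stratified split guarantees that ratio in both subsets. The `_s` suffixed names below are illustrative only, and this variant selects different samples than the split used in the rest of the notebook.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)

# stratify=Y keeps the 10% / 90% class ratio in both the training and the test set
X_train_s, X_test_s, Y_train_s, Y_test_s = train_test_split(
    X, Y, test_size=0.3, random_state=42, stratify=Y
)

print("Train class counts:", np.bincount(Y_train_s))
print("Test class counts:", np.bincount(Y_test_s))
```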

## Testing / Optimization

In [4]:
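# Define a grid of parameters over which to optimize the SVC.
# Note: gamma is ignored by the linear kernel, so each linear candidate is fitted once per gamma value.
param_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__gamma': [1, 0.1, 0.01],
    'svc__kernel': ['rbf', 'linear']
}
# Use 5-fold cross-validated grid search to find the best parameters
# (with no scoring argument, candidates are ranked by accuracy)
grid_search = GridSearchCV(pipeline, param_grid, cv=5)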

# Fit the model on the training data
grid_search.fit(X_train, Y_train)
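
The search above ranks candidates by accuracy and fits redundant linear-kernel candidates. The sketch below shows one possible variation, not the configuration used for the results in this notebook: it splits the grid so that `gamma` is only searched for the rbf kernel, and it ranks candidates by balanced accuracy. The names `param_grid_alt` and `search_alt` are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)
pipeline = make_pipeline(StandardScaler(), SVC(random_state=42))

# Two sub-grids so that gamma is only varied where it has an effect (rbf)
param_grid_alt = [
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [1, 0.1, 0.01]},
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
]

# balanced_accuracy averages recall over classes, so the minority class counts equally
search_alt = GridSearchCV(pipeline, param_grid_alt, cv=5, scoring="balanced_accuracy")
search_alt.fit(X_train, Y_train)

# List the three best candidates by mean cross-validated score
results = search_alt.cv_results_
order = np.argsort(results["rank_test_score"])
for i in order[:3]:
    print(results["params"][i], round(float(results["mean_test_score"][i]), 3))
```

Looking at several top candidates in `cv_results_`, rather than only `best_params_`, shows how close the runners-up are and whether the chosen setting is clearly better or effectively tied.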

## Performance Analysis

In [5]:
# Make predictions on the test data
Y_pred = grid_search.predict(X_test)

# Evaluate the model's performance
clf_report = classification_report(Y_test, Y_pred)
conf_matrix = confusion_matrix(Y_test, Y_pred)

# Extract the best model parameters
best_params = grid_search.best_params_

# Output the classification report, the confusion matrix, and the best parameters
print("Classification Report:")
print(clf_report)
print("Confusion Matrix:")
print(conf_matrix)
print("Best Parameters:")
print(best_params)

Classification Report:
              precision    recall  f1-score   support

           0       0.85      0.73      0.79        15
           1       0.97      0.99      0.98       135

    accuracy                           0.96       150
   macro avg       0.91      0.86      0.88       150
weighted avg       0.96      0.96      0.96       150

Confusion Matrix:
[[ 11   4]
 [  2 133]]
Best Parameters:
{'svc__C': 1, 'svc__gamma': 1, 'svc__kernel': 'rbf'}
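
The per-class figures in the report can be recomputed directly from the confusion matrix, which makes it easier to see what each number means. This is a small worked check using the matrix printed above (rows are true classes, columns are predicted classes); `majority_baseline` is an illustrative name for the accuracy obtained by always predicting class 1.

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
conf_matrix = np.array([[11, 4],
                        [2, 133]])

correct = np.diag(conf_matrix)                 # correctly classified samples per class
precision = correct / conf_matrix.sum(axis=0)  # divide by column sums (predicted counts)
recall = correct / conf_matrix.sum(axis=1)     # divide by row sums (true counts)
accuracy = correct.sum() / conf_matrix.sum()

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = conf_matrix.sum(axis=1).max() / conf_matrix.sum()

for cls in (0, 1):
    print(f"class {cls}: precision={precision[cls]:.2f}, recall={recall[cls]:.2f}")
print(f"accuracy={accuracy:.2f}, majority baseline={majority_baseline:.2f}")
```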


## Summary


The Support Vector Machine (SVM) model trained on the moons dataset learned an effective decision boundary between the two classes. The grid search selected C=1 and gamma=1 with the radial basis function (rbf) kernel.

For the majority class (class 1), precision was 0.97, meaning that 133 of the 137 samples predicted as class 1 truly belonged to it, and recall was 0.99, meaning that the model recovered 133 of the 135 actual class 1 samples. For the minority class (class 0), precision (0.85) and recall (0.73) were noticeably lower: the model missed 4 of the 15 class 0 test samples and mislabeled 2 class 1 samples as class 0.

Overall accuracy was 96%, but this figure should be read against the 90/10 class imbalance: always predicting class 1 would already score 90% on this test set. The macro-averaged F1 score of 0.88 gives a more balanced picture. In conclusion, the model performs well overall, especially on the majority class, but there is room to improve minority-class performance, for example by weighting classes during training, stratifying the train/test split, or selecting hyperparameters with an imbalance-aware metric instead of accuracy.
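
One way to explore the minority-class tuning mentioned above is the `class_weight` parameter of `SVC`. The sketch below is an assumption-laden experiment rather than part of the original analysis: it reuses the selected hyperparameters (C=1, gamma=1, rbf) and compares class 0 recall with and without `class_weight="balanced"`. The `recalls` dictionary is an illustrative name, and no particular outcome is claimed.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, Y = make_moons(random_state=42, n_samples=(50, 450), noise=0.25)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

recalls = {}
for label, class_weight in [("unweighted", None), ("balanced", "balanced")]:
    # class_weight="balanced" scales C inversely to class frequency,
    # so errors on the rare class are penalized more heavily
    model = make_pipeline(
        StandardScaler(),
        SVC(C=1, gamma=1, kernel="rbf", class_weight=class_weight, random_state=42),
    )
    model.fit(X_train, Y_train)
    # Recall for the minority class (class 0) on the held-out test set
    recalls[label] = recall_score(Y_test, model.predict(X_test), pos_label=0)
    print(f"{label}: class 0 recall = {recalls[label]:.2f}")
```

Higher minority recall from class weighting usually comes at some cost in minority precision, so both numbers should be checked before adopting the change.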