## Machine Learning II Homework

In this homework, we're going to compare the results of a Support Vector Machine (SVM) classifier with those of a Gaussian Naive Bayes (GNB) classifier on the same data.

### First, we'll make the data set.

In [None]:
from sklearn.datasets import make_blobs

In [None]:
this_seed = 42 # Set the seed so we can play with it
# Create two 2D blobs of data
X, y = make_blobs(n_samples=300, centers=2, 
                  random_state=this_seed, cluster_std=3.14)
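Before plotting, a quick numeric check can confirm what `make_blobs` returned. This is only a sketch that repeats the call above so it runs on its own; `counts` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

this_seed = 42
X, y = make_blobs(n_samples=300, centers=2,
                  random_state=this_seed, cluster_std=3.14)

counts = np.bincount(y)  # number of points carrying each label (0 and 1)
print(X.shape)           # (number of samples, number of features)
print(counts)
```

With `n_samples=300` and `centers=2`, the points are divided evenly between the two blobs, so the classes start out balanced.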

Now, let's take a look at it to see what we're dealing with.

In [None]:
from matplotlib import pyplot as plt

In [None]:
# Plot the blobs of data
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.title("Two 2D Blobs of Data")
plt.show()

Later, we'll re-run everything using different seeds. Make note of this scatter plot each time, and make a guess as to how well the classifiers will do.

---

### Split the data

First, split the data into training and test subsets. Make it a 70/30 training/test split, and set the `random_state` to 42. We'll use the exact same split for both classifiers to make the comparison fair.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Split the data into training and test sets (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.3, 
                                                    random_state=42)
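As a sanity check on the 70/30 split, we can look at the sizes of the two subsets and at how the classes landed in each. This is a minimal sketch that rebuilds the data so it runs on its own; the split here is not stratified, so the class counts in each subset are not guaranteed to be equal.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=300, centers=2,
                  random_state=42, cluster_std=3.14)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=42)

print(len(X_train), len(X_test))  # training and test set sizes
print(np.bincount(y_train))       # class counts in the training set
print(np.bincount(y_test))        # class counts in the test set
```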

---

### Classify the data using an SVM classifier

In [None]:
from sklearn.svm import SVC

Make and fit the model:

In [None]:
# Create a Support Vector Machine (SVM) classifier and train it on the training data
svm = SVC(kernel='linear')   # make a linear SVM
svm.fit(X_train, y_train)    # train the model
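Because the kernel is linear, the fitted model is just a straight line in the 2D plane, and we can read it off the model. The sketch below rebuilds the data and split so it runs on its own; `w`, `b`, `scores`, and `side` are names introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Rebuild the same data and split used above so this cell runs on its own
X, y = make_blobs(n_samples=300, centers=2,
                  random_state=42, cluster_std=3.14)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=42)

svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

w = svm.coef_[0]       # weight vector, one entry per feature
b = svm.intercept_[0]  # bias term
print("w =", w, "b =", b)

# The decision boundary is the line w[0]*x0 + w[1]*x1 + b = 0.
# Points with a positive score fall on the class-1 side of that line.
scores = X_test @ w + b
side = (scores > 0).astype(int)
```

Note that `coef_` is only available for the linear kernel; other kernels do not expose a single weight vector like this.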

Predict the test data:

In [None]:
y_pred = svm.predict(X_test) # Make predictions on the test data

---

### Classify the data using a GNB classifier

In [None]:
from sklearn.naive_bayes import GaussianNB

Make and fit the model:

In [None]:
gnb = GaussianNB() # Create a Naive Bayes classifier
gnb.fit(X_train, y_train) # and train it on the training data

Predict the test data:

In [None]:
y_pred = gnb.predict(X_test) # Make predictions on the test data

---

### Look at the confusion matrices for the two classifiers

In [None]:
from sklearn.metrics import confusion_matrix

Compute and print the SVM confusion matrix:

In [None]:
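# Compute and print the SVM confusion matrix.
# Predict from the SVM directly: y_pred was overwritten by the GNB predictions above.
conf_matrix_svm = confusion_matrix(y_test, svm.predict(X_test))
print(conf_matrix_svm)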

Compute and print the GNB confusion matrix:

In [None]:
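# Compute and print the GNB confusion matrix.
# Predict from the GNB model directly so this cell does not depend on which classifier last set y_pred.
conf_matrix_gnb = confusion_matrix(y_test, gnb.predict(X_test))
print(conf_matrix_gnb)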
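To compare the two matrices at a glance, it can help to reduce each one to a single accuracy number: the correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total. The following is a sketch that rebuilds everything so it runs on its own; the `results` dictionary and the loop are conveniences introduced here, not part of the cells above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=2,
                  random_state=42, cluster_std=3.14)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=42)

results = {}
for name, model in [("SVM", SVC(kernel='linear')), ("GNB", GaussianNB())]:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)        # keep each model's predictions separate
    cm = confusion_matrix(y_test, preds)
    accuracy = np.trace(cm) / cm.sum()   # correct predictions / all predictions
    results[name] = (cm, accuracy, preds)
    print(name, "accuracy:", round(accuracy, 3))
    print(cm)
```

In each matrix, rows are the true labels and columns are the predicted labels, so the off-diagonal entries count the two kinds of mistakes.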

---

### Let's play

Re-run the above for several different seeds (including 11). *No need to print them or write them down*, just get a feel for what's going on. In the cell below, briefly describe how the two classifiers compare across different data sets.
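If re-running every cell by hand gets tedious, one option is to wrap the whole pipeline in a small function. The sketch below is one way to do that; `compare_for_seed` is a hypothetical helper, and the seeds other than 42 and 11 are arbitrary picks for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def compare_for_seed(seed, cluster_std=3.14):
    """Build blobs for one seed and return both confusion matrices."""
    X, y = make_blobs(n_samples=300, centers=2,
                      random_state=seed, cluster_std=cluster_std)
    # Same 70/30 split settings as above, shared by both classifiers
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=0.3,
                                                        random_state=42)
    matrices = {}
    for name, model in [("SVM", SVC(kernel='linear')), ("GNB", GaussianNB())]:
        model.fit(X_train, y_train)
        matrices[name] = confusion_matrix(y_test, model.predict(X_test),
                                          labels=[0, 1])
    return matrices


all_matrices = {}
for seed in [42, 11, 7, 123]:  # 7 and 123 are arbitrary example seeds
    all_matrices[seed] = compare_for_seed(seed)
    print("seed", seed)
    print(all_matrices[seed]["SVM"])
    print(all_matrices[seed]["GNB"])
```

This skips the scatter plot, so it is still worth plotting each data set before looking at the numbers.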

How do you expect the classifiers to perform with much larger and much smaller blob sizes (larger and smaller `cluster_std`)?

Print out an example confusion matrix from a large and small blob size (from either classifier) below.
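One possible way to set this up is to loop over two values of `cluster_std` and print a labeled matrix for each. This is only a sketch: the values 0.5 and 8.0 are assumptions chosen to stand in for "small" and "large", and GNB is used here simply because either classifier is allowed.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

this_seed = 42
blob_sizes = {"small": 0.5, "large": 8.0}  # assumed example values

size_matrices = {}
for label, std in blob_sizes.items():
    X, y = make_blobs(n_samples=300, centers=2,
                      random_state=this_seed, cluster_std=std)
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=0.3,
                                                        random_state=42)
    gnb = GaussianNB()
    gnb.fit(X_train, y_train)
    size_matrices[label] = confusion_matrix(y_test, gnb.predict(X_test),
                                            labels=[0, 1])
    print(label, "blobs (cluster_std =", std, ")")
    print(size_matrices[label])
```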