# Support Vector Machine Modeling

## 1. Identify the model approach(es), describe, and justify the selection

Support vector machines (SVMs) are a type of supervised learning method used for classification and regression. Here, SVMs will be used to classify patients as either readmission (readmitted within 30 days of being discharged) or non-readmission. SVMs try to find a hyperplane (decision boundary) that separates classes of observations in feature space. Unlike probabilistic models, an SVM does not model class probabilities; instead, it aims to calculate a separating hyperplane directly (as in the notes). Moreover, SVMs are *large margin* classifiers. In this type of classifier, we look for the separating hyperplane that is as far as possible from the nearest points of each class. In other words, we minimize the norm of the parameter vector $\theta$ subject to every point being classified correctly with a margin, which forces the projection of each point x onto $\theta$ to be as large as possible.
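
For reference, the large-margin idea is usually written as the optimization problem below. This is the standard hard-margin formulation for linearly separable data; the labels $y^{(i)} \in \{-1, +1\}$ and the intercept $b$ are notation assumed here for illustration and do not appear in the text above.

$$
\min_{\theta,\, b} \ \frac{1}{2}\|\theta\|^2 \quad \text{subject to} \quad y^{(i)}\left(\theta^\top x^{(i)} + b\right) \ge 1 \quad \text{for all } i
$$

Under this formulation the margin width is $2/\|\theta\|$, so minimizing $\|\theta\|$ is equivalent to maximizing the margin.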

Support vector machines are useful in this problem because they work well in high-dimensional spaces, and we have multiple variables that we want to use to predict readmission. They can handle many features and are still memory efficient: SVMs only use a subset of the training points in the decision function (i.e., the support vectors, the points "closest" to the decision boundary, because those that lie farther from the boundary are easy to classify). Several tuning parameters are available for SVMs, including the choice of kernel and the regularization strength, which controls the trade-off between overfitting and bias. I will test multiple kernels and then investigate the regularization term. I expect that I will need something more nuanced than a basic Gaussian or linear kernel, and I predict that the most accurate models will come from polynomial kernels.
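
The claim that only a subset of training points enters the decision function can be checked directly. The sketch below is a minimal illustration on synthetic two-class data generated with `make_blobs` (an assumption for demonstration; it is not the readmission data), and the names `X_demo`, `y_demo`, and `clf` are hypothetical.

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Synthetic two-class data (illustrative assumption, not the readmission data)
X_demo, y_demo = make_blobs(n_samples=200, centers=2, cluster_std=1.0,
                            random_state=0)

# Fit a linear-kernel SVM
clf = svm.SVC(kernel='linear', C=1.0).fit(X_demo, y_demo)

# Only the support vectors are stored and used in the decision function
n_support = clf.support_vectors_.shape[0]
print(f"{n_support} of {X_demo.shape[0]} training points are support vectors")
```

The fitted model keeps the support vectors in `support_vectors_` and their row indices in `support_`; the remaining training points do not affect predictions.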


## 2. Code, parameterize, and run model (including visualization)


We'll start with the simplest SVM model: a linear kernel with scikit-learn's default regularization (`C=1.0`).

In [None]:
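import pandas as pd
from sklearn import svm

# NOTE: `grape` (class labels) and `attributes` (column names) are assumed to be
# defined in an earlier cell; they are not shown in this section.
wine = pd.read_table("../data/wine.dat", sep=r'\s+')
y = grape.values
wine.columns = attributes
# Use two features so the decision regions can be plotted in 2D
X = wine[['Alcohol', 'Proline']].values

# Linear-kernel SVM with the default C=1.0
svc = svm.SVC(kernel='linear')
svc.fit(X, y)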

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
# Create color maps for 3-class classification problem, as with iris
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

def plot_estimator(estimator, X, y, ax=None):
    """Refit `estimator` on two features and plot its decision regions."""

    try:
        X, y = X.values, y.values
    except AttributeError:
        pass
    
    if ax is None:
        _, ax = plt.subplots()
    
    estimator.fit(X, y)
    x_min, x_max = X[:, 0].min() - .1, X[:, 0].max() + .1
    y_min, y_max = X[:, 1].min() - .1, X[:, 1].max() + .1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    Z = estimator.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    ax.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
    ax.axis('tight')
    ax.axis('off')
    plt.tight_layout()

In [None]:
%matplotlib inline
plot_estimator(svc, X, y)

## 3. Cross-validation


In [None]:
from sklearn import model_selection

X_train, X_test, y_train, y_test = model_selection.train_test_split(
        wine.values, grape.values, test_size=0.4, random_state=0)

# 5-fold cross-validation: the data are split five ways and each fold is held out once.
# Scores can be very good on some folds and noticeably worse on others.
clf = svm.SVC(kernel='linear', C=1)
scores = model_selection.cross_val_score(clf, wine.values, grape.values, cv=5)
print(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

## 4. Goodness of fit assessments, performance characteristics (including visualization)


In [None]:
f = svm.SVC(kernel='linear', C=1)
f.fit(X_train, y_train)
f.score(X_test, y_test)

In [None]:
# 5-fold cross-validation: the data are split five ways and each fold is held out once.
# Scores can be very good on some folds and noticeably worse on others.
scores = model_selection.cross_val_score(f, wine.values, grape.values, cv=5)
print(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

In [None]:
from sklearn.metrics import confusion_matrix

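# Cubic polynomial kernel, fitted on the training split
svc_poly = svm.SVC(kernel='poly', degree=3).fit(X_train, y_train)
# Rows are true classes, columns are predicted classes
confusion_matrix(y_test, svc_poly.predict(X_test))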

## 5. Improvements to model/tuning of parameters; model selection methods, justification of improvements/tests


### Kernel Type
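
One way the kernel comparison could be set up is sketched below. It scores linear, polynomial, and Gaussian (RBF) kernels with 5-fold cross-validation on synthetic data from `make_moons`, which is an assumption used only for illustration; the names `X_demo`, `y_demo`, and `kernel_scores` are hypothetical, and the results say nothing about the readmission data.

```python
from sklearn import svm
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score

# Synthetic, non-linearly separable data (illustrative assumption)
X_demo, y_demo = make_moons(n_samples=300, noise=0.2, random_state=0)

kernel_scores = {}
for kernel in ['linear', 'poly', 'rbf']:
    # `degree` is only used by the polynomial kernel; other kernels ignore it
    clf = svm.SVC(kernel=kernel, degree=3, C=1)
    kernel_scores[kernel] = cross_val_score(clf, X_demo, y_demo, cv=5).mean()

for kernel, score in kernel_scores.items():
    print(f"{kernel:>6}: mean 5-fold accuracy = {score:.3f}")
```

Comparing mean cross-validated accuracy, rather than training accuracy, keeps the comparison from favoring the most flexible kernel simply because it fits the training set best.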

### Regularization

`C` is the inverse of the regularization strength. The choice of `C` will help reduce bias, reduce variance, or strike a balance between the two:

- large `C` (weak regularization) = low bias, high variance
- small `C` (strong regularization) = high bias, low variance

In an SVM, a lot of regularization (small `C`) means that the model will have a "soft margin" that allows some points to fall inside the margin or cross the decision boundary and get misclassified, in exchange for a wider margin.
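
The effect of `C` can be explored by sweeping it over several orders of magnitude. The sketch below is a minimal example on synthetic data from `make_classification` (an assumption for illustration, not the readmission data); the grid of `C` values and the names `X_demo`, `y_demo`, and `results` are hypothetical choices.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (illustrative assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=5,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

results = {}
for C in [0.01, 0.1, 1, 10, 100]:
    clf = svm.SVC(kernel='rbf', C=C)
    # Mean 5-fold accuracy estimates out-of-sample performance for this C
    cv_mean = cross_val_score(clf, X_demo, y_demo, cv=5).mean()
    # Refit on all data to count the support vectors at this C
    n_sv = clf.fit(X_demo, y_demo).support_vectors_.shape[0]
    results[C] = (cv_mean, n_sv)
    print(f"C={C:<6} mean CV accuracy={cv_mean:.3f}  support vectors={n_sv}")
```

Smaller values of `C` typically leave more points inside the soft margin, so the number of support vectors tends to be larger; the cross-validated accuracy gives one basis for choosing among the candidates.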


## 6. Comparison of models; identification of best model


## 7. Results


## 8. Implications of model and conclusions