# Predicting Red Wine Quality with a Support Vector Machine

## Wine Data
Data from http://archive.ics.uci.edu/ml/datasets/Wine+Quality

### Citations
<pre>
Dua, D. and Karra Taniskidou, E. (2017). 
UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/index.php]. 
Irvine, CA: University of California, School of Information and Computer Science.
</pre>

<pre>
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
</pre>

Available at:
- [@Elsevier](http://dx.doi.org/10.1016/j.dss.2009.05.016)
- [Pre-press (pdf)](http://www3.dsi.uminho.pt/pcortez/winequality09.pdf)
- [bib](http://www3.dsi.uminho.pt/pcortez/dss09.bib)

## Setup

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

red_wine = pd.read_csv('../../lab_09/data/winequality-red.csv')

## EDA

In [None]:
red_wine.head()

In [None]:
red_wine.describe()

In [None]:
red_wine.info()

In [None]:
def plot_quality_scores(df, kind):
    # horizontal bar chart of how many wines received each quality score
    ax = df.quality.value_counts().sort_index().plot.barh(
        title=f'{kind.title()} Wine Quality Scores', figsize=(12, 3)
    )
    ax.invert_yaxis()  # put the lowest score at the top
    # annotate each bar with its share of all wines
    for bar in ax.patches:
        ax.text(
            bar.get_width(),
            bar.get_y() + bar.get_height() / 2,
            f'{bar.get_width() / df.shape[0]:.1%}',
            verticalalignment='center'
        )
    ax.set_xlabel('count of wines')
    ax.set_ylabel('quality score')

    # hide the top and right borders
    for spine in ['top', 'right']:
        ax.spines[spine].set_visible(False)

    return ax

plot_quality_scores(red_wine, 'red')
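The percentage beside each bar is that bar's count divided by the total number of rows (`df.shape[0]`). A minimal check on a hypothetical toy `quality` column suggests these shares are the same numbers pandas returns from `value_counts(normalize=True)`:

```python
import pandas as pd

# Hypothetical toy scores standing in for red_wine.quality
toy = pd.DataFrame({'quality': [5, 5, 6, 6, 6, 7, 8, 5, 6, 4]})

# Counts per score, ordered by score (these are the bar lengths)
counts = toy.quality.value_counts().sort_index()

# Each bar's label: its count divided by the number of rows
shares = counts / toy.shape[0]

# The same proportions, computed directly by pandas
normalized = toy.quality.value_counts(normalize=True).sort_index()

print(shares.map('{:.1%}'.format))
```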

## Making the `high_quality` column

In [None]:
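# scores in (0, 6] become 0 (low quality); scores in (6, 10] become 1 (high quality)
red_wine['high_quality'] = pd.cut(red_wine.quality, bins=[0, 6, 10], labels=[0, 1])
# share of wines in each class
red_wine.high_quality.value_counts(normalize=True)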
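`pd.cut` builds right-closed intervals by default, so `bins=[0, 6, 10]` means (0, 6] and (6, 10]: a score of exactly 6 is labeled 0, and only scores of 7 and above count as high quality. The sketch below runs the same call on a few hypothetical scores and compares it with a plain `> 6` comparison. Note that `pd.cut` returns a categorical column rather than plain integers.

```python
import pandas as pd

# Hypothetical quality scores, one per row
scores = pd.Series([3, 4, 5, 6, 7, 8])

# Intervals are (0, 6] and (6, 10]; right=True (the default) includes
# each upper edge, so a score of 6 lands in the first bin (label 0)
high_quality = pd.cut(scores, bins=[0, 6, 10], labels=[0, 1])

# An equivalent integer column built with a simple threshold
threshold_version = (scores > 6).astype(int)

print(pd.DataFrame({'quality': scores, 'high_quality': high_quality}))
```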

## Building your first Support Vector Machine

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

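# pop() removes the label column from red_wine, so it cannot leak into X
y = red_wine.pop('high_quality')
# drop the raw score as well, since high_quality is derived from it
X = red_wine.drop(columns=['quality'])

# hold out 10% of the wines for testing, keeping the class balance in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)

# standardize the features, then fit an SVM (RBF kernel by default);
# probability=True enables predict_proba, which the precision-recall curve needs
pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('svm', SVC(C=5, random_state=0, probability=True))
]).fit(X_train, y_train)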
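Scaling matters here because the SVC's default RBF kernel works on distances between points, so an unscaled feature with a large range would dominate the others. Putting `StandardScaler` inside the `Pipeline` also means its means and standard deviations are learned from the training rows only. The sketch below checks this on hypothetical synthetic data with two features on very different scales; `X_demo`, `y_demo`, and `demo` are illustrative names, not part of the lab.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: two features on very different scales
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(0, 1, 500),     # small-scale feature
    rng.normal(100, 50, 500),  # large-scale feature
])
# Label depends on both features equally once they are standardized
y_demo = (X_demo[:, 0] + (X_demo[:, 1] - 100) / 50 > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0, stratify=y_demo
)

demo = Pipeline([
    ('scale', StandardScaler()),
    ('svm', SVC(C=5, random_state=0)),
]).fit(X_tr, y_tr)

# The scaler's statistics come from the training rows only
scaler = demo.named_steps['scale']
print(np.allclose(scaler.mean_, X_tr.mean(axis=0)))  # True
print(f'test accuracy: {demo.score(X_te, y_te):.2f}')
```

Because the scaler is a pipeline step, calling `demo.predict` or `demo.score` on new rows reuses the training statistics automatically instead of refitting them.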

### Evaluating the SVM
Get the predictions:

In [None]:
quality_preds = pipeline.predict(X_test)

Look at the classification report:

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, quality_preds))
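For the positive class (`1`, high quality), the precision and recall columns of the report come from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP) and recall = TP / (TP + FN). A sketch on hypothetical labels, checked against scikit-learn's `precision_score` and `recall_score`:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and predictions (1 = high quality)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]

# Count outcomes for the positive class
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 2

precision = tp / (tp + fp)  # share of predicted "high" wines that really are high
recall = tp / (tp + fn)     # share of truly high wines the model found

print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```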

Review the confusion matrix:

In [None]:
from ml_utils.classification import confusion_matrix_visual

confusion_matrix_visual(y_test, quality_preds, ['low', 'high'])
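`confusion_matrix_visual` comes from the course's `ml_utils` package, whose internals aren't shown here. The sketch below does not use `ml_utils`; it uses scikit-learn's own `confusion_matrix`, which returns the raw counts with true classes as rows and predicted classes as columns. The labels are hypothetical.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (1 = high quality)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes, in the order of `labels`
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(pd.DataFrame(
    cm, index=['true low', 'true high'], columns=['pred low', 'pred high']
))

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()
```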

Examine the precision-recall curve:

In [None]:
from ml_utils.classification import plot_pr_curve

plot_pr_curve(y_test, pipeline.predict_proba(X_test)[:, 1])
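`predict_proba(X_test)[:, 1]` is the predicted probability of class `1` (column order follows `pipeline.classes_`). With `probability=True`, scikit-learn fits these probabilities with Platt scaling using internal cross-validation, so they can occasionally disagree with `predict`. As an alternative to `plot_pr_curve` (not a reimplementation of it), scikit-learn's `precision_recall_curve` and `average_precision_score` return the curve's points and a one-number summary of it. The scores below are hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical labels and positive-class probabilities, i.e. the kind of
# values predict_proba(X)[:, 1] returns
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.65, 0.40, 0.90, 0.80, 0.45, 0.35])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)

# Each threshold t labels a wine "high" when its score >= t;
# the final (precision=1, recall=0) point has no threshold, so zip drops it
for p, r, t in zip(precision, recall, thresholds):
    print(f'threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}')
print(f'average precision: {ap:.3f}')
```

Only the ranking of the scores matters for this curve, so `pipeline.decision_function(X_test)` could be passed in place of the probabilities if the Platt-scaling step were dropped.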
