# Winery classification with the multivariate Gaussian

In this notebook, we return to winery classification, using the full set of 13 features.

## 1. Load in the data 

As usual, we start by loading in the Wine data set. Make sure the file `wine.data.txt` is in the same directory as this notebook.

Recall that there are 178 data points, each with 13 features and a label (1,2,3). As before, we will divide this into a training set of 130 points and a test set of 48 points.

In [55]:
# Standard includes
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Useful module for dealing with the Gaussian density
from scipy.stats import norm, multivariate_normal

In [56]:
# Load data set.
data = np.loadtxt('wine.data.txt', delimiter=',')
# Names of features
featurenames = ['Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash','Magnesium', 'Total phenols',
                'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins', 'Color intensity', 'Hue',
                'OD280/OD315 of diluted wines', 'Proline']
# Split 178 instances into training set (trainx, trainy) of size 130 and test set (testx, testy) of size 48
np.random.seed(0)
perm = np.random.permutation(178)
trainx = data[perm[0:130],1:14]
trainy = data[perm[0:130],0]
testx = data[perm[130:178], 1:14]
testy = data[perm[130:178],0]
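
As a quick sanity check on the split above, the sketch below repeats the same permutation logic on a placeholder array and confirms that the training and test index sets are disjoint. The array `fake_data` and the names `train_idx`, `test_idx`, `fake_trainx` and `fake_testx` are stand-ins introduced here so the snippet runs without `wine.data.txt`; they are not part of the notebook.

```python
import numpy as np

# Placeholder with the same shape as the Wine data: 178 rows, label + 13 features.
# (Hypothetical stand-in, not the real data file.)
fake_data = np.zeros((178, 14))

np.random.seed(0)
perm = np.random.permutation(178)
train_idx, test_idx = perm[0:130], perm[130:178]

# Column 0 holds the label, columns 1..13 hold the features
fake_trainx = fake_data[train_idx, 1:14]
fake_testx = fake_data[test_idx, 1:14]

n_train, n_test = fake_trainx.shape[0], fake_testx.shape[0]
overlap = np.intersect1d(train_idx, test_idx).size  # shared indices, if any
print(n_train, n_test, overlap)
```

Because `perm` is a permutation of `0..177`, slicing it into two consecutive pieces can never put the same row in both sets.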

## 2. Fit a Gaussian generative model

We now define a function that fits a Gaussian generative model to the data.
For each class (`j=1,2,3`), we have:
* `pi[j]`: the class weight
* `mu[j,:]`: the mean, a 13-dimensional vector
* `sigma[j,:,:]`: the 13x13 covariance matrix

This means that `pi` is a length-4 array (Python arrays are indexed starting at zero, and we aren't using `j=0`), `mu` is a 4x13 array and `sigma` is a 4x13x13 array. The entries at index 0 are left as zeros and are never used.
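
For reference, the quantities above can be written out as follows. This is a summary that should match the fitting code in this notebook, assuming the covariance is computed with `bias=1` (dividing by the class count rather than the class count minus one). Here $n$ is the number of training points, $n_j$ the number with label $j$, and $d = 13$; these symbols are introduced here for illustration.

$$
\pi_j = \frac{n_j}{n}, \qquad
\mu_j = \frac{1}{n_j} \sum_{i:\, y_i = j} x_i, \qquad
\Sigma_j = \frac{1}{n_j} \sum_{i:\, y_i = j} (x_i - \mu_j)(x_i - \mu_j)^T
$$

The density assigned to a point $x$ by class $j$ is then the multivariate Gaussian

$$
P_j(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_j|^{1/2}}
\exp\left( -\frac{1}{2} (x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j) \right)
$$

Note that the $\pi$ inside $(2\pi)^{d/2}$ is the usual constant, not the class weight $\pi_j$.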

In [57]:
def fit_generative_model(x,y):
    k = 3  # labels 1,2,...,k
    d = (x.shape)[1]  # number of features
    mu = np.zeros((k+1,d))
    sigma = np.zeros((k+1,d,d))
    pi = np.zeros(k+1)
    for label in range(1,k+1):
        indices = (y == label)
        mu[label] = np.mean(x[indices,:], axis=0)
        sigma[label] = np.cov(x[indices,:], rowvar=0, bias=1)
        pi[label] = float(sum(indices))/float(len(y))
    return mu, sigma, pi

In [58]:
# Fit a Gaussian generative model to the training data
mu, sigma, pi = fit_generative_model(trainx,trainy)

## 3. Use the model to make predictions on the test set

<font color="magenta">**For you to do**</font>: Define a general purpose testing routine that takes as input:
* the arrays `pi`, `mu`, `sigma` defining the generative model, as above
* the test set (points `tx` and labels `ty`)
* a list of features `features` (chosen from 0-12)

It should return the number of mistakes made by the generative model on the test data, *when restricted to the specified features*. For instance, using just the three features 2 (`'Ash'`), 4 (`'Magnesium'`) and 6 (`'Flavanoids'`) results in 7 mistakes (out of 48 test points), so

        `test_model(mu, sigma, pi, [2,4,6], testx, testy)`

should print 7/48.

**Hint:** The way you restrict attention to a subset of features is by choosing the corresponding coordinates of the full 13-dimensional mean and the appropriate submatrix of the full 13x13 covariance matrix.
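
The hint can be illustrated with a small sketch on made-up parameters. The arrays `toy_mu` and `toy_sigma` below are random stand-ins with the same layout as `mu` and `sigma` (index 0 unused); they are not the fitted Wine parameters, and all names are chosen here for illustration only.

```python
import numpy as np

# Toy stand-ins with the notebook's layout: index 0 is unused padding
# (hypothetical values, not the fitted Wine parameters)
rng = np.random.default_rng(0)
k, d = 3, 13
toy_mu = rng.normal(size=(k + 1, d))
a = rng.normal(size=(k + 1, d, d))
toy_sigma = a @ np.transpose(a, (0, 2, 1))  # symmetric positive semi-definite

features = [2, 4, 6]
label = 1

# Coordinates of the mean for the chosen features
sub_mu = toy_mu[label, features]                           # shape (3,)
# Rows and columns of the covariance for the chosen features
sub_sigma = toy_sigma[label][np.ix_(features, features)]   # shape (3, 3)

# The same submatrix for all classes at once, via two indexing steps
all_sub_sigma = toy_sigma[:, features, :][:, :, features]  # shape (4, 3, 3)

print(sub_mu.shape, sub_sigma.shape, all_sub_sigma.shape)
```

One pitfall worth knowing: `toy_sigma[label, features, features]` pairs the two index lists element by element, so it returns only the three diagonal entries rather than the 3x3 submatrix.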

In [59]:
# Now test the performance of a predictor based on a subset of features
def test_model(mu, sigma, pi, features, tx, ty):
    ### Your code goes here
    features = list(features)  # accept a list or a range of feature indices
    # Restrict the model and the test points to the chosen features
    sigma = sigma[:, features, :][:, :, features]
    mu = mu[:, features]
    x = tx[:, features]

    k = 3  # labels 1,2,...,k
    nt = len(ty)  # number of test points
    score = np.zeros((nt, k+1))
    for i in range(0, nt):
        for label in range(1, k+1):
            # Log of the class weight plus the log-density under the class Gaussian
            score[i,label] = np.log(pi[label]) + \
                multivariate_normal.logpdf(x[i,:], mean=mu[label,:], cov=sigma[label,:,:])
    # Column 0 is unused padding, so take the argmax over labels 1..k only
    predictions = np.argmax(score[:,1:k+1], axis=1) + 1
    print(score)
    errors = np.sum(predictions != ty)
    print("Errors: " + str(errors) + "/" + str(nt))
    return errors
    ###
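
The rule used above is to predict the label that maximizes the log class weight plus the Gaussian log-density. As a possible alternative to looping over test points one at a time, `multivariate_normal.logpdf` also accepts a 2-D array with one point per row, so each class can be scored in a single call. The sketch below is one way this might look; `predict_labels` and the `toy_*` arrays are hypothetical names and made-up data introduced here, not part of the assignment.

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict_labels(mu, sigma, pi, features, x):
    """Predict labels 1..k for the rows of x using only the given features."""
    k = mu.shape[0] - 1                  # index 0 is unused padding
    feats = list(features)
    xs = x[:, feats]
    # Column 0 stays at -inf so it can never win the argmax
    score = np.full((x.shape[0], k + 1), -np.inf)
    for label in range(1, k + 1):
        sub_mu = mu[label, feats]
        sub_sigma = sigma[label][np.ix_(feats, feats)]
        # logpdf evaluates every row of xs in one call
        score[:, label] = np.log(pi[label]) + \
            multivariate_normal.logpdf(xs, mean=sub_mu, cov=sub_sigma)
    return np.argmax(score, axis=1)

# Made-up, well-separated classes (not the Wine data)
rng = np.random.default_rng(0)
k, d, n = 3, 4, 60
toy_mu = np.zeros((k + 1, d))
toy_sigma = np.zeros((k + 1, d, d))
toy_pi = np.zeros(k + 1)
for label in range(1, k + 1):
    toy_mu[label] = 10.0 * label         # class means far apart
    toy_sigma[label] = np.eye(d)
    toy_pi[label] = 1.0 / k
toy_y = rng.integers(1, k + 1, size=n)
toy_x = toy_mu[toy_y] + rng.normal(size=(n, d))

pred = predict_labels(toy_mu, toy_sigma, toy_pi, [0, 2], toy_x)
print("Errors:", np.sum(pred != toy_y), "/", n)
```

Filling the unused column with `-inf` means the argmax already returns labels in `1..k`, so no `+ 1` offset is needed.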

In [60]:
test_model(mu, sigma, pi, [2,4,6], testx, testy)

[[  0.          -5.48257747  -7.02873026 -80.49547062]
 [  0.          -8.70140905  -5.24238654 -20.62158301]
 [  0.         -15.56066682  -6.56581995  -4.44393313]
 [  0.         -25.20590543  -7.26695912  -5.19783691]
 [  0.          -4.75340437  -7.16251889 -64.44023941]
 [  0.          -5.52127754  -5.57689091 -51.82889197]
 [  0.         -14.67819752  -6.29121236  -5.19373703]
 [  0.          -4.77453137  -6.46508409 -36.51244011]
 [  0.         -22.83101327  -9.07386441  -5.02412607]
 [  0.          -5.67219331  -5.28040598 -41.18253445]
 [  0.          -5.00053993  -6.04951992 -29.15750168]
 [  0.          -5.58443781  -6.76127548 -44.46333078]
 [  0.          -7.14604249  -5.7260823  -32.62737505]
 [  0.          -3.78967877  -6.27102392 -65.27092605]
 [  0.         -13.25742466  -6.19480797  -7.0883189 ]
 [  0.          -8.29865651  -6.66586064 -25.0943759 ]
 [  0.         -23.26062377  -9.38033499  -5.42111763]
 [  0.          -3.98377535  -5.46762131 -42.79908048]
 ...

### <font color="magenta">Fast exercises</font>

*Note down the answers to these questions. You will need to enter them as part of this week's assignment.*

Exercise 1. How many errors are made on the test set when using the single feature 'Ash'?

In [61]:
test_model(mu, sigma, pi, [2], testx, testy)

[[  0.          -0.73103155  -1.29075971  -1.14761825]
 [  0.          -0.71281882  -0.71242668  -0.79974203]
 [  0.          -0.38438893  -0.83740347  -0.47955322]
 [  0.          -1.83571576  -0.77606112  -2.17137311]
 [  0.          -0.87550022  -1.39172724  -1.3764216 ]
 [  0.          -1.83571576  -0.77606112  -2.17137311]
 [  0.          -0.43649757  -0.78292091  -0.51080773]
 [  0.          -1.23785764  -1.61770454  -1.93492599]
 [  0.          -0.82462474  -1.35718094  -1.29641686]
 [  0.          -0.59623838  -0.72791067  -0.66979803]
 [  0.          -1.04444226  -1.50070885  -1.63885751]
 [  0.          -2.25625273  -2.16582811  -3.45552555]
 [  0.          -0.44446564  -1.03594161  -0.66300357]
 [  0.          -0.43649757  -0.78292091  -0.51080773]
 [  0.          -0.43649757  -0.78292091  -0.51080773]
 [  0.          -2.0593276   -2.06556965  -3.16472278]
 [  0.          -0.98540898  -1.46349119  -1.54764192]
 [  0.          -0.43649757  -0.78292091  -0.51080773]
 ...

Exercise 2. How many errors when using 'Alcohol' and 'Ash'?

In [62]:
test_model(mu, sigma, pi, [0,2], testx, testy)

[[  0.          -0.94818676  -6.85570347  -2.09919089]
 [  0.          -5.00601469  -1.00386119  -1.8736436 ]
 [  0.          -2.63004071  -1.67591558  -0.9759381 ]
 [  0.          -5.3302276   -1.13460671  -2.74525036]
 [  0.          -1.07485002  -6.70874513  -2.18726371]
 [  0.          -3.23530225  -1.99253027  -2.45989487]
 [  0.          -3.19039042  -1.39475426  -1.13069034]
 [  0.          -1.42836675  -6.6927836   -2.57566739]
 [  0.          -2.85570543  -2.42106558  -1.9233009 ]
 [  0.          -5.17191079  -1.00399442  -1.91375844]
 [  0.          -1.37058252  -5.09730131  -1.99168773]
 [  0.          -2.47974152  -8.11425869  -4.14784858]
 [  0.          -6.82362226  -1.33541577  -3.27831578]
 [  0.          -0.7237938   -4.24777887  -1.14322305]
 [  0.          -1.71724904  -2.27411393  -0.79221008]
 [  0.         -10.47968785  -2.53882784  -8.00859602]
 [  0.          -4.87584704  -1.86890578  -3.12431524]
 [  0.          -0.65395121  -4.75386814  -1.32202315]
 ...

Exercise 3. How many errors when using 'Alcohol', 'Ash', and 'Flavanoids'?

In [63]:
test_model(mu, sigma, pi, [0,2,6], testx, testy)

[[  0.          -1.95171332  -8.905205   -63.54444667]
 [  0.          -6.90032227  -1.661605   -13.59299935]
 [  0.         -12.40892202  -3.5604044   -1.42189175]
 [  0.         -21.28224372  -3.73660315  -2.26089357]
 [  0.          -1.56009051  -8.33844206 -55.34089146]
 [  0.          -3.18951451  -2.99605739 -41.4082533 ]
 [  0.         -11.24608308  -2.9789003   -2.56844931]
 [  0.          -1.47660969  -7.39442697 -30.97109901]
 [  0.         -19.80388938  -6.25981824  -2.34197949]
 [  0.          -5.09515733  -1.652535   -28.77618903]
 [  0.          -1.81564972  -5.67882139 -23.29999032]
 [  0.          -2.38191794  -8.88792477 -35.32602711]
 [  0.          -6.80991536  -1.90422767 -21.09032903]
 [  0.          -0.87858047  -5.85721724 -54.06959063]
 [  0.          -9.02976786  -3.48929311  -4.15282599]
 [  0.         -10.41949103  -3.21491295 -19.51426397]
 [  0.         -20.13919323  -5.91689983  -3.83036022]
 [  0.          -0.83538513  -5.59263416 -34.81928979]
 ...

Exercise 4. How many errors when using all 13 features?

In [64]:
test_model(mu, sigma, pi, range(0,13), testx, testy)

[[   0.          -22.48283899  -49.09324281 -340.32246252]
 [   0.          -68.80524881  -23.26076616 -174.24560442]
 [   0.          -69.98343826  -42.4901946   -19.86703319]
 [   0.         -139.21579023  -50.5411453   -17.3721577 ]
 [   0.          -13.89338936  -31.81459714 -303.05131792]
 [   0.          -16.61674529  -28.16915707 -181.0700004 ]
 [   0.          -56.35655397  -39.04802152  -18.64046515]
 [   0.          -12.0684532   -50.14901978 -221.46896858]
 [   0.          -96.72198558  -36.92288987  -19.10884017]
 [   0.          -33.0286042   -19.20490883 -160.13598844]
 [   0.          -11.04594356  -30.41919632 -157.73354818]
 [   0.          -14.51854212  -32.06856612 -221.98708199]
 [   0.          -33.18508617  -15.17283681 -139.90886669]
 [   0.          -11.97778379  -56.24570862 -294.84274607]
 [   0.         -111.85449513  -67.03222745  -20.38729151]
 [   0.          -46.99499127  -20.50498312 -148.15176234]
 [   0.         -145.20083331  -64.42475299  -28.0832600...

Exercise 5. In lecture, we got somewhat different answers to these questions. Why do you think that might be?