## Introduction

Analogy with making great wine: you need great grapes for great wine, and you need good data for good machine learning.

### Applications
1. Google: Search engine, speech recognition, language translation, self-driving cars
2. Physics: particle datasets
3. Choosing clothes, movies

## Supervised Classification
Defn: You have examples for which you know the correct answer, and the algorithm learns from them.
e.g. Self-driving cars learn by emulating human behaviour.

Application: Terrain classification problem. Desert terrain: slowing down at the appropriate time. Thousands of miles.


**Example of supervised classification:**
<table>
<tr><th>Supervised classification</th><th>Not supervised classification</th></tr>
<tr><td>Recognise someone in a picture from an album of tagged photos</td><td>Analyse bank data for weird-looking transactions and flag those for fraud</td></tr>
<tr><td>Given someone's music choices and a bunch of features of that music, recommend a new song (Netflix, Amazon) </td><td>Cluster Udacity students into types based on learning styles</td></tr>
</table>

Explanation:
* Fraud detection: we haven't defined what a weird-looking transaction is, so there are no known correct answers to learn from.
* Student clustering: we don't know what the types of students are, or even the number of types.

**Features** and **Labels**
* e.g. Music: Tempo and intensity of a song. 
* e.g. Stanley Terrain classification. Features: Steepness and ruggedness of terrain.

Visualise features via **Scatterplots**

Machine learning algorithms define a **Decision Surface** that separates different classes.
On one side they predict class A, and on the other side they predict class B.
* Linear decision surfaces
Bad ones may misclassify existing data, or come very close to misclassifying it.

< Decision surface diagram >

Machine learning algorithms transform:
Data -> Decision surface (DS), which lets you determine the class of any future datapoint.
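
To make the idea of "one side predicts class A, the other side class B" concrete, here is a minimal sketch of a linear decision surface. The function name `predict_side`, the weights, and the bias are hypothetical and chosen by hand for illustration; a real algorithm would learn them from the training data.

```python
def predict_side(point, weights=(1.0, 1.0), bias=-1.0):
    """Classify a 2D point by which side of the line w.x + b = 0 it falls on.

    The weights and bias are hand-picked assumptions, not learned values.
    """
    score = sum(w * x for w, x in zip(weights, point)) + bias
    # Positive score: one side of the line; otherwise: the other side
    return "slow" if score > 0 else "fast"


gentle = predict_side((0.2, 0.3))  # low steepness, low ruggedness
rough = predict_side((0.9, 0.8))   # high steepness, high ruggedness
print(gentle, rough)
```

Here the line `x + y = 1` plays the role of the decision surface: every future point is assigned a class purely by the side it lands on.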



### Using scikit-learn Naive Bayes to draw Decision Surfaces

Diagrams

In [2]:
# sklearn.naive_bayes.GaussianNB

# Creating training points
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
from sklearn.naive_bayes import GaussianNB
# Create classifier
clf = GaussianNB()
# Give classifier training data and it learns patterns
# X are features, Y are labels.
clf.fit(X, Y)
# Output: GaussianNB()
# Give classifier a new point and ask it to give a prediction
print(clf.predict([[-0.8, -1]]))

[1]


In [None]:
# Second half of example in documentation

clf_pf = GaussianNB()
clf_pf.partial_fit(X, Y, np.unique(Y))
# Output: GaussianNB()
print(clf_pf.predict([[-0.8, -1]]))

### Exercise: GaussianNB Deployment on 

In [None]:
# Visualisation

#!/usr/bin/python

#from udacityplots import *
import warnings
warnings.filterwarnings("ignore")

import matplotlib 
matplotlib.use('agg')

import matplotlib.pyplot as plt
import pylab as pl
import numpy as np

#import numpy as np
#import matplotlib.pyplot as plt
#plt.ioff()

def prettyPicture(clf, X_test, y_test):
    x_min = 0.0; x_max = 1.0
    y_min = 0.0; y_max = 1.0

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    h = .01  # step size in the mesh
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())

    plt.pcolormesh(xx, yy, Z, cmap=pl.cm.seismic)

    # Plot also the test points
    grade_sig = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==0]
    bumpy_sig = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==0]
    grade_bkg = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==1]
    bumpy_bkg = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==1]

    plt.scatter(grade_sig, bumpy_sig, color = "b", label="fast")
    plt.scatter(grade_bkg, bumpy_bkg, color = "r", label="slow")
    plt.legend()
    # Feature 0 (grade) is on the x-axis and feature 1 (bumpiness) on the y-axis
    plt.xlabel("grade")
    plt.ylabel("bumpiness")

    plt.savefig("test.png")
    
import base64
import json
import subprocess

def output_image(name, format, bytes):
    image_start = "BEGIN_IMAGE_f9825uweof8jw9fj4r8"
    image_end = "END_IMAGE_0238jfw08fjsiufhw8frs"
    data = {}
    data['name'] = name
    data['format'] = format
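    # encodebytes replaces the removed encodestring; decode so json.dumps can serialise it
    data['bytes'] = base64.encodebytes(bytes).decode('ascii')
    print(image_start + json.dumps(data) + image_end)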

In [None]:
# Prep terrain data

#!/usr/bin/python
import random


def makeTerrainData(n_points=1000):
###############################################################################
### make the toy dataset
    random.seed(42)
    grade = [random.random() for ii in range(0,n_points)]
    bumpy = [random.random() for ii in range(0,n_points)]
    error = [random.random() for ii in range(0,n_points)]
    y = [round(grade[ii]*bumpy[ii]+0.3+0.1*error[ii]) for ii in range(0,n_points)]
    for ii in range(0, len(y)):
        if grade[ii]>0.8 or bumpy[ii]>0.8:
            y[ii] = 1.0

### split into train/test sets
    X = [[gg, ss] for gg, ss in zip(grade, bumpy)]
    split = int(0.75*n_points)
    X_train = X[0:split]
    X_test  = X[split:]
    y_train = y[0:split]
    y_test  = y[split:]

    grade_sig = [X_train[ii][0] for ii in range(0, len(X_train)) if y_train[ii]==0]
    bumpy_sig = [X_train[ii][1] for ii in range(0, len(X_train)) if y_train[ii]==0]
    grade_bkg = [X_train[ii][0] for ii in range(0, len(X_train)) if y_train[ii]==1]
    bumpy_bkg = [X_train[ii][1] for ii in range(0, len(X_train)) if y_train[ii]==1]

#    training_data = {"fast":{"grade":grade_sig, "bumpiness":bumpy_sig}
#            , "slow":{"grade":grade_bkg, "bumpiness":bumpy_bkg}}


    grade_sig = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==0]
    bumpy_sig = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==0]
    grade_bkg = [X_test[ii][0] for ii in range(0, len(X_test)) if y_test[ii]==1]
    bumpy_bkg = [X_test[ii][1] for ii in range(0, len(X_test)) if y_test[ii]==1]

    test_data = {"fast":{"grade":grade_sig, "bumpiness":bumpy_sig}
            , "slow":{"grade":grade_bkg, "bumpiness":bumpy_bkg}}

    return X_train, y_train, X_test, y_test
#    return training_data, test_data

In [None]:
# Main

#!/usr/bin/python

""" Complete the code in ClassifyNB.py with the sklearn
    Naive Bayes classifier to classify the terrain data.
    
    The objective of this exercise is to recreate the decision 
    boundary found in the lesson video, and make a plot that
    visually shows the decision boundary """


from prep_terrain_data import makeTerrainData
from class_vis import prettyPicture, output_image
from ClassifyNB import classify

import numpy as np
import pylab as pl


features_train, labels_train, features_test, labels_test = makeTerrainData()

### the training data (features_train, labels_train) have both "fast" and "slow" points mixed
### in together--separate them so we can give them different colors in the scatterplot,
### and visually identify them
grade_fast = [features_train[ii][0] for ii in range(0, len(features_train)) if labels_train[ii]==0]
bumpy_fast = [features_train[ii][1] for ii in range(0, len(features_train)) if labels_train[ii]==0]
grade_slow = [features_train[ii][0] for ii in range(0, len(features_train)) if labels_train[ii]==1]
bumpy_slow = [features_train[ii][1] for ii in range(0, len(features_train)) if labels_train[ii]==1]


def classify(features_train, labels_train):   
    ### import the sklearn module for GaussianNB
    ### create classifier
    ### fit the classifier on the training features and labels
    ### return the fit classifier
    
        
    ### your code goes here!
    
    from sklearn.naive_bayes import GaussianNB
    clf = GaussianNB()
    clf.fit(features_train, labels_train)
    return clf


# You will need to complete this function imported from the ClassifyNB script.
# Be sure to change to that code tab to complete this quiz.
clf = classify(features_train, labels_train)



### draw the decision boundary with the test points overlaid
prettyPicture(clf, features_test, labels_test)
output_image("test.png", "png", open("test.png", "rb").read())






### Evaluate classifier: Calculating NB Accuracy

Metric: Accuracy = number of points classified correctly / total number of points in the test set
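
As a small illustration of the metric, the sketch below computes accuracy by hand on made-up predictions. The helper name `accuracy` and the label lists are assumptions for the example, not part of the course code.

```python
def accuracy(predictions, true_labels):
    """Fraction of points whose predicted label matches the true label."""
    correct = sum(1 for p, t in zip(predictions, true_labels) if p == t)
    return correct / len(true_labels)


pred = [0, 1, 1, 0, 1]   # made-up classifier output
true = [0, 1, 0, 0, 1]   # made-up test-set labels
acc = accuracy(pred, true)
print(acc)  # 4 of 5 points match, so 0.8
```

For a fitted scikit-learn classifier, `clf.score(features_test, labels_test)` and `sklearn.metrics.accuracy_score(labels_test, pred)` compute the same quantity.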

In [None]:
# studentCode.py

from class_vis import prettyPicture
from prep_terrain_data import makeTerrainData
from classify import NBAccuracy

import matplotlib.pyplot as plt
import numpy as np
import pylab as pl


features_train, labels_train, features_test, labels_test = makeTerrainData()

def submitAccuracy():
    accuracy = NBAccuracy(features_train, labels_train, features_test, labels_test)
    return accuracy

In [None]:
#Exercise

def NBAccuracy(features_train, labels_train, features_test, labels_test):
    """ compute the accuracy of your Naive Bayes classifier """
    ### import the sklearn module for GaussianNB
    from sklearn.naive_bayes import GaussianNB

    ### create classifier
    clf = GaussianNB()

    ### fit the classifier on the training features and labels
    clf.fit(features_train, labels_train)

    ### use the trained classifier to predict labels for the test features
    pred = clf.predict(features_test)


    ### calculate and return the accuracy on the test data
    ### this is slightly different than the example, 
    ### where we just print the accuracy
    ### you might need to import an sklearn module
    accuracy = clf.score(features_test, labels_test)
    return accuracy

# Result: accuracy = 0.884

### Training and Testing Data

Train and test on different sets of data, e.g. hold out 10% of the data for testing.
Otherwise you can get 100% accuracy on data the model has already seen, with no idea of how well it generalises.
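
A minimal sketch of holding out 10% of the data, assuming scikit-learn's `train_test_split` and a synthetic two-feature dataset invented for this example (it is not the terrain data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data (an assumption for illustration): label is 1 above the line x + y = 1
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Hold out 10% of the points for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

clf = GaussianNB().fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = clf.score(X_test, y_test)     # accuracy on held-out data
print(len(X_train), len(X_test), train_acc, test_acc)
```

Only `test_acc` says anything about how the classifier handles points it was not trained on.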

## Bayes Rule

Cancer example P(C) = 0.01

**Sensitivity of a test (the statistical power)**: 90% of the time it is positive if you have C.
**Specificity of a test (1 minus the size)**: 90% of the time it is negative if you don't have C.

Q: If the prior probability of cancer is 1%, and the sensitivity and specificity of the test are both 90%, what's the probability that someone with a positive cancer test actually has the disease?

(About 8%)

**Prior probability * test evidence -> posterior probability** (the test evidence is incorporated into the prior)

### Exploration

**Prior and test evidence**

$P(C) = 0.01$, so $P(\neg C) = 0.99$

$P(Pos|C) = 0.9$

$P(Neg|\neg C) = 0.9$, so $P(Pos|\neg C) = 0.1$

**Joint probability**

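$P(C,Pos) = P(C) * P(Pos|C) = 0.01 * 0.9 = 0.009$

$P(\neg C,Pos) = P(\neg C) * P(Pos|\neg C) = 0.99 * 0.1 = 0.099$

i.e. calculating the absolute area of each region. The two joint probabilities don't add up to one, so they need normalising.

**Normaliser**

$P(Pos) = P(C, Pos) + P(\neg C, Pos) = 0.009 + 0.099 = 0.108$

**Posterior**

$P(C|Pos) = P(C, Pos) / P(Pos) = 0.009 / 0.108 = 0.0833$

$P(\neg C|Pos) = P(\neg C, Pos) / P(Pos) = 0.099 / 0.108 = 0.9167$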

Posteriors sum to 1.
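
The same calculation can be sketched in a few lines of Python, using the numbers from the cancer example. The variable names are illustrative choices, not taken from the course material.

```python
p_c = 0.01          # prior P(C)
sensitivity = 0.9   # P(Pos | C)
specificity = 0.9   # P(Neg | not C)

# Joint probabilities: prior times test evidence
joint_c_pos = p_c * sensitivity                    # P(C, Pos)
joint_not_c_pos = (1 - p_c) * (1 - specificity)    # P(not C, Pos)

# Normaliser: total probability of a positive test
p_pos = joint_c_pos + joint_not_c_pos

# Posteriors: joints divided by the normaliser
post_c = joint_c_pos / p_pos
post_not_c = joint_not_c_pos / p_pos

print(round(post_c, 4), round(post_not_c, 4))
```

Dividing both joints by the same normaliser is what guarantees that the two posteriors sum to 1.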

(Diagram)

### Naive Bayes e.g. Text Learning

Q: Given the text of an email and the probabilities with which Chris and Sara use each word (only 3 words can be used in total), who is more likely to have sent the email?

Suppose prior $P(Chris) = P(Sara) = 0.5$

Diagram

Then given text: "LIFE DEAL"

Chris: $0.1*0.8*0.5 = 0.04$

Sara: $0.3*0.2*0.5 = 0.03$

P(Chris | "LIFE DEAL") = 0.04 / (0.04 + 0.03) = 4/7

P(Sara | "LIFE DEAL") = 0.03 / (0.04 + 0.03) = 3/7
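
The calculation above can be sketched as a tiny hand-rolled Naive Bayes. The per-word probabilities for LIFE and DEAL are read off the products in the text; the third word is left out because its probabilities are not given in these notes. The function name `posterior` is a hypothetical helper.

```python
# Word probabilities taken from the worked products above (third word omitted)
word_probs = {
    "Chris": {"LIFE": 0.1, "DEAL": 0.8},
    "Sara": {"LIFE": 0.3, "DEAL": 0.2},
}
prior = {"Chris": 0.5, "Sara": 0.5}


def posterior(words):
    """Multiply the prior by each word's probability, then normalise."""
    scores = {}
    for person, p in prior.items():
        score = p
        for word in words:
            score *= word_probs[person][word]
        scores[person] = score
    total = sum(scores.values())
    return {person: score / total for person, score in scores.items()}


post = posterior(["LIFE", "DEAL"])
print(post)  # Chris is about 4/7, Sara about 3/7
```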

**Why is Naive Bayes Naive?**

* Labels are hidden. You only get to see the things they produce (e.g. the words a person uses).
* Every word, say, gives you evidence as to whether it's person A or person B. 
* You multiply the evidence for all the words you see together with the prior, and the ratio of the products tells you whether it's more likely from A or B.

Lets you identify from a text source whether label A or label B is more likely.

Naive because it ignores **word order**: each word is treated as independent evidence, so it only looks at word frequency.

**Advantages**
* Easy to implement
* Efficient

"One particular feature of Naive Bayes is that it’s a good algorithm for working with text classification. When dealing with text, it’s very common to treat each unique word as a feature, and since the typical person’s vocabulary is many thousands of words, this makes for a large number of features. The relative simplicity of the algorithm and the independent features assumption of Naive Bayes make it a strong performer for classifying texts."

**Disadvantages**
* Can break when the meaning depends on words appearing together.
* Phrases made up of several words that have a distinct combined meaning don't work well: when people first searched for Chicago Bulls on Google, they were shown many images of the city Chicago and images of bulls.
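
A short sketch of why word order is lost, assuming a simple bag-of-words representation built with `collections.Counter` (the helper `bag_of_words` is hypothetical, and this is an illustration rather than how any search engine actually works):

```python
from collections import Counter


def bag_of_words(text):
    """Count word frequencies, discarding the order the words appeared in."""
    return Counter(text.lower().split())


a = bag_of_words("Chicago Bulls")
b = bag_of_words("bulls Chicago")

# Both phrases collapse to the same word counts, so a frequency-only
# model cannot tell the team name apart from the two separate words.
same = a == b
print(a, same)
```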

## Observations from Mini-Project: Identifying author of a piece of text

Naive Bayes predicts faster than it trains.