# Learning Machine Learning

These are notes made while following Udacity's introductory machine learning course.

## Lesson 2: Naive Bayes

This is a supervised classification algorithm. The idea is to work out which source (class) something most likely came from, based on how probable its observed features are under each source.

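Think of text analysis: two people send emails, and each of them uses certain words with certain probabilities. Given an email containing particular words, you can work out the probability that it came from each person. It's "naive" because it treats the words as independent of one another given the sender, which means it ignores word order.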
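
To make the email idea concrete, here is a minimal sketch of the calculation. The sender names (`alice`, `bob`), the word probabilities and the priors are all made up for illustration; the function `posterior` is a hypothetical helper, not something from the course code.

```python
# Hypothetical word probabilities for two senders (illustrative numbers only).
word_probs = {
    "alice": {"love": 0.1, "deal": 0.8, "life": 0.1},
    "bob": {"love": 0.5, "deal": 0.2, "life": 0.3},
}
# Assumed prior probability that any given email comes from each sender.
priors = {"alice": 0.5, "bob": 0.5}


def posterior(words, word_probs, priors):
    """Return P(sender | words) under the naive independence assumption."""
    scores = {}
    for sender, probs in word_probs.items():
        # Unnormalised score: prior times the product of per-word probabilities.
        score = priors[sender]
        for word in words:
            score *= probs[word]
        scores[sender] = score
    # Normalise so the posteriors sum to 1.
    total = sum(scores.values())
    return {sender: score / total for sender, score in scores.items()}


result = posterior(["life", "deal"], word_probs, priors)
print(result)
```

Because the score is just a product, `posterior(["deal", "life"], ...)` gives the same answer as `posterior(["life", "deal"], ...)`, which is the "ignores word order" property in action.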

One particular feature of Naive Bayes is that it’s a good algorithm for working with text classification. When dealing with text, it’s very common to treat each unique word as a feature, and since the typical person’s vocabulary is many thousands of words, this makes for a large number of features. The relative simplicity of the algorithm and the independent features assumption of Naive Bayes make it a strong performer for classifying texts.
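
As a rough sketch of the "each unique word is a feature" idea, the snippet below turns a tiny made-up corpus into word counts with scikit-learn's `CountVectorizer` and fits a classifier on them. Note that it uses `MultinomialNB`, a Naive Bayes variant suited to counts, rather than the `GaussianNB` used in the rest of these notes; the emails and the `work`/`friend` labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus with two hypothetical kinds of sender.
emails = [
    "quarterly deal signed with the client",
    "new deal terms for the contract",
    "love life and enjoy the weekend",
    "life is good, love the sunshine",
]
senders = ["work", "work", "friend", "friend"]

# Each unique word becomes one column (feature) of the count matrix.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(emails)
print(features.shape)  # (number of emails, number of unique words)

clf = MultinomialNB()
clf.fit(features, senders)

# New text must be transformed with the same fitted vocabulary.
new_emails = ["enjoy life and love", "deal with the client"]
predictions = clf.predict(vectorizer.transform(new_emails))
print(predictions)
```

Even with four short emails the matrix already has more columns than rows, which gives a feel for how quickly the number of features grows with real text.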

In [8]:
import numpy as np
from sklearn.naive_bayes import GaussianNB

# These numbers are made up, so this isn't a very useful classifier
features_train = np.array([[1, 1], [5, 8], [7, 0], [1, 5], [4, 5], [1, 2], [7, 7], [3, 1], [0, 5]])
labels_train = np.array([1, 2, 1, 1, 2, 2, 1, 1, 1])

# Create a Gaussian Naive Bayes classifier and fit it to the training data
clf = GaussianNB()
clf.fit(features_train, labels_train)

# Predict the label of a new point
print(clf.predict([[2, 3]]))


[1]


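Below is the full code written at the end of lesson 1. It won't run on its own, because it expects `features_train`, `labels_train`, `features_test` and `labels_test` to be defined already.

The timing measurements show that prediction is much faster than training, by a factor of roughly 30.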

In [None]:
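from time import time

from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# features_train, labels_train, features_test and labels_test must already be defined

clf = GaussianNB()

# Time the training step
t0 = time()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# Time the prediction step
t1 = time()
pred = clf.predict(features_test)
print("prediction time:", round(time() - t1, 3), "s")

# Fraction of test labels predicted correctly
accuracy = accuracy_score(labels_test, pred)
print(accuracy)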
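
For a version that does run on its own, here is a sketch that swaps in synthetic data. The data from `make_blobs` is only a stand-in (an assumption, not the course's email features), so the timings and accuracy it prints say nothing about the real dataset, and the 30x ratio above need not reproduce.

```python
from time import time

from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class stand-in for the course data (assumption).
features, labels = make_blobs(
    n_samples=2000, centers=2, n_features=20, random_state=42
)
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.25, random_state=42
)

clf = GaussianNB()

# Time the training step
t0 = time()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# Time the prediction step
t1 = time()
pred = clf.predict(features_test)
print("prediction time:", round(time() - t1, 3), "s")

accuracy = accuracy_score(labels_test, pred)
print("accuracy:", accuracy)
```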


## Lesson 3: Support Vector Machines (SVMs)

Another classifier. SVMs are very popular, often perform very well, and are relatively recent (the modern formulation dates from the 1990s).

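Broadly, SVMs maximise the margin between the classes, that is, the distance from the decision boundary to the nearest points of each class, while tolerating some outliers.
What makes them flexible is the kernel. A kernel maps the data from a low-dimensional space into a higher-dimensional one, where the SVM can find a linear separation; mapped back, this gives a non-linear boundary in the original space.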
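
A small sketch of the kernel idea, using synthetic concentric circles from `make_circles` (an assumption for illustration, not course data). A linear SVM struggles on the raw 2-D points, but does well once a hand-made third feature, the squared distance from the origin, is added. The RBF kernel reaches a similar result without building the extra feature explicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as an inner and an outer ring (not linearly separable in 2-D).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM on the raw 2-D points.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# Hand-made mapping to 3-D: append x1^2 + x2^2 as an extra feature.
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

# RBF kernel on the raw points: the mapping happens implicitly.
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print("linear, raw 2-D:   ", linear_acc)
print("linear, lifted 3-D:", lifted_acc)
print("rbf, raw 2-D:      ", rbf_acc)
```

In the lifted space the two rings sit at different heights along the new axis, so a flat plane can separate them.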

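Training time grows roughly cubically with the number of samples (between quadratically and cubically in practice for scikit-learn's `SVC`), so SVMs are difficult to use on large datasets. They also cope poorly with noisy data where the classes overlap, and are vulnerable to overfitting.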

A few important parameters:
- Kernel
- Gamma
- C

Control of these parameters is important to avoid overfitting.

### C
Controls the tradeoff between classifying training points correctly and having a smooth decision boundary. A large value of C means more training points will be classified correctly (fewer points are allowed to be outliers), at the cost of a more intricate boundary.
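
A quick way to see this is to fit the same model with a very small and a very large `C` and compare training accuracy. This is only a sketch on synthetic, overlapping data from `make_moons` (an assumption, not the course dataset), and the two `C` values are arbitrary picks. It also prints the number of support vectors, which usually shrinks as `C` grows because fewer points are allowed inside the margin.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (assumption; not the course dataset).
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

train_acc = {}
n_support = {}
for C in (0.01, 1000):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    train_acc[C] = clf.score(X, y)       # accuracy on the training points
    n_support[C] = len(clf.support_)     # number of support vectors
    print(f"C={C}: training accuracy={train_acc[C]:.3f}, support vectors={n_support[C]}")
```

Higher training accuracy is not automatically better: it should be checked against held-out data, since a large `C` can overfit.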

### Gamma
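Defines how far the influence of a single training example reaches. Low values mean far reach, high values mean close reach.
With a low value (far reach), even points far from the decision boundary influence it, which tends to smooth the boundary and reduces the relative impact of the points closest to it. With a high value, the boundary follows nearby points closely and can become very wiggly.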
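
The effect of gamma can be sketched by fitting an RBF-kernel SVM with a low and a high value and comparing training and test accuracy. The data is synthetic (`make_moons`, an assumption for illustration) and the two gamma values are arbitrary picks.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (assumption; not the course dataset).
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

results = {}
for gamma in (0.1, 100):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this gamma.
    results[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma}: train={results[gamma][0]:.3f}, test={results[gamma][1]:.3f}")
```

A gap between training and test accuracy at high gamma is the overfitting these notes warn about: each training point only shapes the boundary in its immediate neighbourhood.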






In [None]:
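from time import time

from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# features_train, labels_train, features_test and labels_test must already be defined

# SVM with a linear kernel
clf = SVC(kernel='linear')

# Time the training step
t0 = time()
clf.fit(features_train, labels_train)
print("training time:", round(time() - t0, 3), "s")

# Time the prediction step
t1 = time()
pred = clf.predict(features_test)
print("prediction time:", round(time() - t1, 3), "s")

# Fraction of test labels predicted correctly
accuracy = accuracy_score(labels_test, pred)
print(accuracy)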