# Introduction to Machine Learning

## Week 0 - Overview
### What's this all about?
We (@leah and @ollie on Slack) wanted to put together a simple, easy-to-follow introductory course on machine learning. One thing to understand from the start is that machine learning isn't about the buzzwords, or the fact that it's pretty sexy at the moment. At its heart, machine learning is a process. It's a pretty awesome process and a lot of fun, but underneath it all there's a common set of steps that should be followed, all with one goal: generalisation. In this notebook, we outline that process and define what we mean by generalisation. Over the next few notebooks, we'll go into each step in more detail.

We hope this will be an interactive learning process, and we encourage questions. We're available on the Slack channels #ssaeas-machine-learn or #machine_learning. There's no such thing as a stupid question: we've skipped over a lot of details, so there's plenty to ask about, and if you're thinking it, others probably are too. We'll try to fold the answers back into these notebooks so that they become living documents.

### Introduction
When people talk about machine learning, they can mean a lot of different things. Broadly, machine learning is a subset of AI, and deep learning is in turn a subset of machine learning. FYI, we're planning to get to deep learning in weeks 2 and 3.

As mentioned above, machine learning is a process. It's important that this is understood so that any implementation doesn't become [a high interest credit card of technical debt](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf).

### Generalisation
This is the fundamental goal of machine learning. We don't want to teach a model to behave perfectly on data it has already seen; if we wanted that, we might as well just build a database. Instead, we want a model that learns *from* some data, finding rules that apply just as well to data it hasn't seen.
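To make the database comparison concrete, here's a toy sketch on made-up numbers (the data, the hidden rule "label is 1 when x is above 5" and all function names are hypothetical). A lookup table and a learned threshold are both perfect on the training data, but only the threshold says anything about an input it hasn't seen:

```python
# Hypothetical training data: the hidden rule is "label is 1 when x is above 5"
train = {1: 0, 3: 0, 7: 1, 9: 1}

def lookup_model(x):
    """The 'database' approach: perfect on inputs it has stored, no answer otherwise."""
    return train.get(x)  # None for anything it hasn't seen

def learn_threshold(examples):
    """Learn a rule: put the boundary halfway between the largest 0 and the smallest 1."""
    largest_zero = max(x for x, label in examples.items() if label == 0)
    smallest_one = min(x for x, label in examples.items() if label == 1)
    return (largest_zero + smallest_one) / 2

threshold = learn_threshold(train)  # 5.0 for this data

def threshold_model(x):
    """Predict with the learned rule."""
    return int(x > threshold)

# Both models are perfect on the data they were built from...
assert all(lookup_model(x) == label for x, label in train.items())
assert all(threshold_model(x) == label for x, label in train.items())

# ...but only the learned rule has anything to say about an unseen input
print(lookup_model(8), threshold_model(8))  # None 1
```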

### The process
As stated, machine learning is a process whose goal is generalisation. The steps that you will (**should**) see in any machine learning approach to solving a problem are:
Gather the data -> understand the data -> pre-processing -> training a model -> post-processing (optional) -> model evaluation
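Several of these steps can be chained together in code. As a minimal sketch, assuming a small made-up dataset (the `colour`, `size` and `label` columns are hypothetical), scikit-learn's `make_pipeline` can bundle pre-processing (`OneHotEncoder`, swapped in here for the `pd.get_dummies` call this notebook uses) with training (`LogisticRegression`), and evaluation is then done on rows the model never saw:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Gather the data: a hypothetical toy dataset
data = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "green", "red", "blue"],
    "size":   ["s",   "l",    "l",   "s",     "s",    "l",     "s",   "l"],
    "label":  [1,      0,      1,     0,       0,      0,       1,     0],
})
x = data.drop("label", axis=1)
y = data["label"]

# Pre-processing and training chained into a single estimator
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), LogisticRegression())
model.fit(x.iloc[:6], y.iloc[:6])  # train on the first six rows

# Model evaluation on the two rows held back
score = accuracy_score(y.iloc[6:], model.predict(x.iloc[6:]))
print(score)
```

One advantage of a pipeline is that the encoder is fitted only on the training rows, so the test rows are transformed with exactly the same columns.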

#### Gather the data
First, we need data. You could get it by scraping websites (for example with [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#)), running surveys, or any other data collection method you like. Usually, though, it ends up saved in a .csv (comma-separated values) file.

[Pandas](https://pandas.pydata.org/) is a really useful tool for loading and working with data like this.
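As a small sketch of what `pd.read_csv` does, here it reads two rows from an in-memory string rather than a file (`io.StringIO` stands in for `tic-tac-toe.csv`; the column names are copied from the data shown further down). The first line becomes the column headers and each following line becomes a row:

```python
import io

import pandas as pd

# Two rows in the same layout as tic-tac-toe.csv, held in a string instead of a file
csv_text = """top-left,top-middle,top-right,middle-left,middle-middle,middle-right,bottom-left,bottom-middle,bottom-right,Class
x,x,x,x,o,o,x,o,o,positive
x,x,x,x,o,o,o,x,o,positive
"""

# read_csv accepts any file-like object, so StringIO works just like a filename
frame = pd.read_csv(io.StringIO(csv_text))
print(frame.shape)  # (2, 10)
```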

So, let's tackle a simple problem. Given the board at the end of a game of tic-tac-toe, we'll try to predict [whether x has won](https://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame).

First, let's import the data, which is conveniently held in tic-tac-toe.csv.

In [1]:
import pandas as pd
data = pd.read_csv("tic-tac-toe.csv")

In [2]:
# Let's look at what's in the first 10 rows
data.iloc[:10]

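| | top-left | top-middle | top-right | middle-left | middle-middle | middle-right | bottom-left | bottom-middle | bottom-right | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | x | x | x | x | o | o | x | o | o | positive |
| 1 | x | x | x | x | o | o | o | x | o | positive |
| 2 | x | x | x | x | o | o | o | o | x | positive |
| 3 | x | x | x | x | o | o | o | b | b | positive |
| 4 | x | x | x | x | o | o | b | o | b | positive |
| 5 | x | x | x | x | o | o | b | b | o | positive |
| 6 | x | x | x | x | o | b | o | o | b | positive |
| 7 | x | x | x | x | o | b | o | b | o | positive |
| 8 | x | x | x | x | o | b | b | o | o | positive |
| 9 | x | x | x | x | b | o | o | o | b | positive |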


Once we've pulled the data in, we need to separate our features ($x$) from our target ($y$). Our goal is then to learn a function $f$ that takes $x$ as a parameter and returns $y$, so $f(x)=y$. In our example above, the target ($y$) is the column 'Class'.

In [3]:
y = data['Class']
x = data.drop('Class', axis=1)

#### Understand the data

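It's worth reading the dataset description, but in short: each feature column is one square of the board, and each square is occupied by an X ('x'), an O ('o'), or is blank ('b'). Each row is a board at the end of a game in which x moved first, and 'Class' is 'positive' when x has three in a row and 'negative' otherwise.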
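According to the UCI description linked above, the target is "win for x": true when x has one of the 8 possible three-in-a-rows. A sketch of that rule, assuming the squares are listed in the same order as the CSV columns (top-left through bottom-right, row by row); `WIN_LINES` and `x_wins` are hypothetical names. If the description is accurate, this should reproduce 'Class' for every row, which is a useful sanity check on your understanding of the data:

```python
# The eight lines that make three in a row, as indices into a board listed
# row by row: top-left=0, top-middle=1, top-right=2, ..., bottom-right=8
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def x_wins(board):
    """Return 'positive' if x occupies all three squares of any line, else 'negative'."""
    has_line = any(all(board[i] == "x" for i in line) for line in WIN_LINES)
    return "positive" if has_line else "negative"

# Row 0 of the data shown above: x,x,x / x,o,o / x,o,o
print(x_wins(["x", "x", "x", "x", "o", "o", "x", "o", "o"]))  # positive
# A hypothetical board where o holds the middle column and x has no line
print(x_wins(["x", "o", "x", "x", "o", "b", "b", "o", "b"]))  # negative
```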

#### Pre-processing
We now have the data imported and split into features and target.

First, the columns are clearly categorical, so let's turn them into numeric columns that a model can work with, using [One Hot Encoding](https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/): each column is replaced by one 0/1 column per value it takes.
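Here's a small sketch of what `pd.get_dummies` does, on three made-up games using only two of the squares. Each original column becomes one column per value, named `<column>_<value>`, holding 1 where that value appears (`dtype=int` asks for 0/1 rather than True/False):

```python
import pandas as pd

# Two squares from three hypothetical games
squares = pd.DataFrame({"top-left": ["x", "o", "b"], "middle-middle": ["o", "o", "x"]})

# One 0/1 column per value seen in each original column
encoded = pd.get_dummies(squares, dtype=int)
print(encoded)
```

Notice that `middle-middle` only gets two columns, because 'b' never appears in it here. `get_dummies` only creates columns for values it actually sees, so encoding training and test data separately can leave them with different columns.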

In [4]:
# dtype=int gives 0/1 columns; recent pandas versions default to True/False
x = pd.get_dummies(x, dtype=int)
x.iloc[:10]

Unnamed: 0,top-left_b,top-left_o,top-left_x,top-middle_b,top-middle_o,top-middle_x,top-right_b,top-right_o,top-right_x,middle-left_b,...,middle-right_x,bottom-left_b,bottom-left_o,bottom-left_x,bottom-middle_b,bottom-middle_o,bottom-middle_x,bottom-right_b,bottom-right_o,bottom-right_x
0,0,0,1,0,0,1,0,0,1,0,...,0,0,0,1,0,1,0,0,1,0
1,0,0,1,0,0,1,0,0,1,0,...,0,0,1,0,0,0,1,0,1,0
2,0,0,1,0,0,1,0,0,1,0,...,0,0,1,0,0,1,0,0,0,1
3,0,0,1,0,0,1,0,0,1,0,...,0,0,1,0,1,0,0,1,0,0
4,0,0,1,0,0,1,0,0,1,0,...,0,1,0,0,0,1,0,1,0,0
5,0,0,1,0,0,1,0,0,1,0,...,0,1,0,0,1,0,0,0,1,0
6,0,0,1,0,0,1,0,0,1,0,...,0,0,1,0,0,1,0,1,0,0
7,0,0,1,0,0,1,0,0,1,0,...,0,0,1,0,1,0,0,0,1,0
8,0,0,1,0,0,1,0,0,1,0,...,0,1,0,0,0,1,0,0,1,0
9,0,0,1,0,0,1,0,0,1,0,...,0,0,1,0,0,1,0,1,0,0


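The features are now numeric, but we need to do something similar for the target. We can't use one hot encoding here, because that would split the target into two columns and we want it in a single column. Instead, we'll just use `replace` to map each label to a number.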
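An alternative sketch: `Series.map` with a dictionary does the whole conversion in one step. Any label missing from the dictionary becomes `NaN`, which makes an unexpected or misspelt label easy to spot (the three labels below are made up):

```python
import pandas as pd

labels = pd.Series(["positive", "negative", "positive"])

# Each label is looked up in the dictionary; a label not in it would become NaN
numeric = labels.map({"positive": 1, "negative": -1})
print(numeric.tolist())  # [1, -1, 1]

# Sanity check that every label was recognised
assert not numeric.isna().any()
```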

In [5]:
y = y.replace("positive", 1)
y = y.replace("negative", -1)

All the features are now either 0 or 1, and the target labels are -1 or 1. However, we need to remember the goal: generalisation. To measure it, we need to hold some data back as test data to see how well our model generalises.

Let's split the data 80:20. 80% of the data is for us to train our model on and 20% to test our model.

In [6]:
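# First we concatenate x and y again so that shuffling keeps each row's features and target together
data = pd.concat([x, y], axis=1, join='inner')

# Then shuffle the data: currently all the rows with target 1 come first, so an unshuffled split wouldn't be representative
data = data.sample(frac=1)
# drop=True discards the old row numbers; without it they become an 'index' column, and because
# the original rows are sorted by class, that column would leak the target into the features
data = data.reset_index(drop=True)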

# Split into training and test data
split_point = (data.shape[0]//5)*4
train_data = data.iloc[:split_point]
test_data = data.iloc[split_point:]

# Lastly, split again
train_y = train_data['Class']
train_x = train_data.drop('Class', axis=1)

test_y = test_data['Class']
test_x = test_data.drop('Class', axis=1)
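If you'd rather not split by hand, scikit-learn's `train_test_split` shuffles and splits in one call and keeps x and y aligned, so there's no need to concatenate them first. A minimal sketch on made-up data (10 rows, sorted by class like the raw tic-tac-toe file); `stratify` and `random_state` are optional extras, not something the manual split above does:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data, sorted by class like the raw tic-tac-toe file
x = pd.DataFrame({"feature": [0, 1] * 5})
y = pd.Series([1] * 6 + [-1] * 4, name="Class")

# Shuffle and hold back 20% for testing. stratify=y keeps the ratio of 1s to -1s
# (roughly) the same in both parts, and random_state makes the shuffle repeatable.
train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)
print(len(train_x), len(test_x))  # 8 2
```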

#### Training a model
We could use a number of different classifiers. There are good [descriptions](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning) of various methods out there, and there are [cheatsheets](https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-cheat-sheet) available to help you choose.

But, we're going to use a simple classification method called [logistic regression](http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression).
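Under the hood, logistic regression computes a weighted sum of the features plus an intercept, $z = w \cdot x + b$, and squashes it into a probability with the sigmoid $\sigma(z) = 1/(1+e^{-z})$; it predicts the positive class when $\sigma(z) > 0.5$, i.e. when $z > 0$. This sketch checks that against scikit-learn's fitted `coef_`, `intercept_` and `predict_proba` on made-up one-feature data. It uses the default solver and penalty rather than the L1 setup below, but the relationship is the same:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, label is 1 for the larger values
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Weighted sum plus intercept: z = w.x + b
z = X @ clf.coef_.ravel() + clf.intercept_[0]
# The sigmoid turns z into the probability of the positive class (label 1)
p = 1.0 / (1.0 + np.exp(-z))

print(np.allclose(p, clf.predict_proba(X)[:, 1]))  # True
# Predicting 1 is the same as p > 0.5, i.e. z > 0
print(np.array_equal(np.where(z > 0, 1, -1), clf.predict(X)))  # True
```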

In [7]:
from sklearn import linear_model
# An L1 penalty needs a solver that supports it, such as 'liblinear'
clf = linear_model.LogisticRegression(C=1.0, penalty='l1', solver='liblinear', tol=1e-6)
clf.fit(train_x, train_y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l1', random_state=None, solver='liblinear', tol=1e-06,
          verbose=0, warm_start=False)

In [8]:
predictions = clf.predict(test_x)

#### Post-processing (optional)
We converted our labels to -1 and +1, but they were given to us as 'positive' and 'negative', so we need to convert our output back. I've listed this as an optional step because whether you need it depends on how the predictions will be used, but it's something to bear in mind.

In [9]:
def postprocess(output):
    return 'positive' if output == 1 else 'negative'
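The function above handles one prediction at a time, so it has to be called in a loop. A vectorised alternative (a sketch using NumPy's `np.where`, on made-up predictions) converts a whole array at once:

```python
import numpy as np

# Hypothetical predictions in the -1/+1 encoding used above
predictions = np.array([1, -1, 1, 1])

# np.where picks 'positive' where the condition holds and 'negative' elsewhere
labels = np.where(predictions == 1, "positive", "negative")
print(labels.tolist())  # ['positive', 'negative', 'positive', 'positive']
```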

#### Model evaluation
So how well does our logistic regression classifier do? Let's find out. We'll use accuracy as our measure: the fraction of test examples whose predicted label matches the true label.
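Accuracy is simple enough to compute by hand: compare the predicted and true labels element by element and take the mean, so correct predictions of either class count. A sketch on made-up labels, checked against scikit-learn's `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions (4 of the 5 match)
y_true = np.array([1, -1, 1, 1, -1])
y_pred = np.array([1, -1, -1, 1, -1])

# Accuracy = correct predictions / all predictions, counting both classes
manual = np.mean(y_true == y_pred)
print(manual, accuracy_score(y_true, y_pred))  # 0.8 0.8
```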

In [10]:
from sklearn.metrics import accuracy_score
accuracy_score(test_y, predictions)

1.0

Boom! In this run our model scored 100% accuracy on data it hadn't seen; because the shuffle is random, your figure may differ. Logistic regression is a deeply unsexy, simple classifier, but it shows the power that even simple models have (and why we shouldn't overlook them).

For completeness, let's run the post-processing function over our output.

In [11]:
labels = [postprocess(p) for p in predictions]
# Show the first 10 converted predictions
labels[:10]

['negative',
 'positive',
 'negative',
 'positive',
 'positive',
 'positive',
 'negative',
 'positive',
 'negative',
 'positive']

### Further Reading
We try to recommend freely available resources. Amex gives us access to great resources such as Pluralsight and Safari Books Online, so we consider these freely available as well.

- [The ML subreddit is a good place to browse](https://www.reddit.com/r/MachineLearning/)
- [scikit-learn : Machine Learning Simplified](https://www.safaribooksonline.com/library/view/scikit-learn-machine/9781788833479/)
- [Mastering Machine Learning with scikit-learn - Second Edition](https://www.safaribooksonline.com/library/view/mastering-machine-learning/9781788299879/)
- [Understanding Machine Learning](https://www.pluralsight.com/courses/understanding-machine-learning)
- [Understanding Machine Learning with Python](https://www.pluralsight.com/courses/python-understanding-machine-learning)
- [How to Think About Machine Learning Algorithms](https://www.pluralsight.com/courses/machine-learning-algorithms)

# Your turn
You didn't think this was just going to be a case of reading, did you?

We've also included "example.csv". We're not going to tell you much about the dataset, but it's fairly similar to the tic-tac-toe one above, with a similar number of instances and features. The target ($y$) is the column 'class' and is either 0 or 1. We'll leave you to follow the steps in this notebook with this dataset. We'd love to hear how you get on via Slack. Good luck!