# 5. Multiclass with the Perceptron
Using the perceptron as our binary classifier, we can also do multiclass classification! How? Let's see... First of all, paste in the code you wrote yesterday:

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

def fit_line_dataset(x, y, data):
    line = np.stack((x, y), axis=1)
    flags = np.all(np.logical_and(line < np.max(data, axis=0)[0:2], line > np.min(data, axis=0)[0:2]), axis=1)
    line = line[flags, :]
    new_x = line[:, 0]
    new_y = line[:, 1]
    # Plot these returns in your code "plot_data(data, w)"
    return new_x, new_y

# Your code goes here

## 5.1 Multi-classes!

We have been talking about two classes since Wednesday. However, a problem may have more classes than that! In this section, we'll extend the perceptron algorithm to more than two classes. Our new dataset is called ``data_3classes.txt``. It is very similar to ``data.txt``, but now its last column takes values from 0 to 2 (therefore three classes). Let's first create a function to visualize this data (in this function, we'll forget about the markers and the line and differentiate the classes by color).

In [None]:
def plot_data_3classes(X, classes):
    # Your code goes here
    plt.show()
    
data = np.loadtxt('data/data_3classes.txt')
X, classes = preprocess_data(data)
plot_data_3classes(X, classes)

Now, it is your job to code the *one-vs-rest* strategy for multiclass classification as discussed in class. Make use of the perceptron function you coded previously and store the classifiers (i.e. the $w$'s) in the rows of a matrix called ``W``. Assume that you know beforehand that there are 3 classes in your dataset. Also, make a function ``replace_numbers(v, n)`` that gets a vector ``v`` and a number ``n`` and returns a new vector ``v_new`` such that ``v_new[i]`` is ``1`` if ``v[i] == n`` and ``-1`` otherwise (you coded this in a previous HW).
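To make the one-vs-rest idea concrete, here is a minimal sketch on a tiny made-up dataset. The names ``relabel_one_vs_rest`` and ``train_perceptron`` are hypothetical stand-ins (not the functions from class), and placing the bias as the *last* component of each ``w`` is an assumption; your own perceptron may order things differently.

```python
import numpy as np

def relabel_one_vs_rest(v, n):
    # Hypothetical helper: +1 where v equals n, -1 everywhere else.
    return np.where(v == n, 1, -1)

def train_perceptron(X, y, max_epochs=1000):
    # Hypothetical minimal perceptron; the bias is appended as the last feature (assumption).
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:  # misclassified (or exactly on the line)
                w += yi * xi
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: done
            break
    return w

# Toy data: three well-separated groups, two points each (made up for illustration).
X = np.array([[0.0, 3.0], [0.5, 3.5],
              [-3.0, -2.0], [-3.5, -2.5],
              [3.0, -2.0], [3.5, -2.5]])
classes = np.array([0, 0, 1, 1, 2, 2])

n_classes = 3
W = np.zeros((n_classes, X.shape[1] + 1))
for k in range(n_classes):
    # Class k against everything else: one binary problem per row of W.
    W[k] = train_perceptron(X, relabel_one_vs_rest(classes, k))

print(W)
```

Each row of ``W`` only knows how to answer "class ``k`` or not?", which is why combining the three answers needs extra care later on.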

In [None]:
def replace_numbers(v, n):  
    # Your code goes here
    return v_new

def perceptron_3classes(X, classes):
    n_classes = 3
    n, d = np.shape(X)
    W = np.zeros((n_classes, d + 1))  # The '+1' is due to the bias that you'll add in the perceptron alg.
    # Your code goes here 
    return W
    
data = np.loadtxt('data/data_3classes.txt')
X, classes = preprocess_data(data)
ws = perceptron_3classes(X, classes)

Well, a result like that is quite boring... Now, improve the function ``plot_data_3classes()`` to also plot the lines that are defined by the ``w`` of each class. Hint: First plot the points and then the lines.

In [None]:
import numpy as np
def plot_data_3classes_w_lines(data, ws):
    # Your code goes here
    plt.show()
        
data = np.loadtxt('data/data_3classes.txt')
X, classes = preprocess_data(data)
W = perceptron_3classes(X, classes)
plot_data_3classes_w_lines(data, W)

Now, our task will be to do training and testing! What to do now? Let's get the score of each point with respect to each line/$w$. Can you compute this matrix of scores?
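As a rough illustration of what such a matrix looks like, the sketch below fills it with plain loops. The matrix ``W`` and the points are made up by hand (they are not trained on ``data_3classes.txt``), and the bias is assumed to be the last entry of each row of ``W``.

```python
import numpy as np

# Hand-picked classifiers, one per row; the last column is the bias (assumption).
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [-1.0, -1.0, 0.5]])

# Four made-up test points, with a column of ones appended for the bias.
X_toy = np.array([[2.0, -1.0], [-1.0, 3.0], [-2.0, -2.0], [1.0, 1.0]])
Xb = np.hstack([X_toy, np.ones((X_toy.shape[0], 1))])

n_classes, n = W.shape[0], Xb.shape[0]
scores = np.zeros((n_classes, n))
for k in range(n_classes):
    for i in range(n):
        # Entry (k, i): dot product between classifier k and point i.
        scores[k, i] = np.dot(W[k], Xb[i])

# The sign of each score is the answer of binary classifier k for point i.
classification = np.sign(scores)
print(scores)
print(classification)
```

In this toy example the last point is claimed by two classifiers at once (its column has two ``+1`` entries), which is exactly the kind of ambiguity one-vs-rest can produce.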

In [None]:
data, ratio = np.loadtxt('data/data_3classes.txt'), .7
X_train, X_test, classes_train, classes_test = get_datasets(data, ratio)
W = perceptron_3classes(X_train, classes_train)

n, d = np.shape(X_test)
scores = np.zeros((3, n)) 
classification = np.zeros((3, n)) 

# Your code goes here

print(classification)

What do these numbers mean?

## 5.2 Dealing with ambiguities
In the previous result (if you used ``ratio = .7``), some points are classified ambiguously. What do you think we should do in that case? First, modify your ``classify(x, w)`` function to return the class *and* the dot product (call this last value ``score``).

In [None]:
def classify(x, w):
    # Your code goes here
    return cla, score

def classify_all_points(X, w):
    # Your code goes here
    return classification, scores

Now, redo the classification above and compute the scores along with the classifications. Print the matrix of scores.

In [None]:
data = np.loadtxt('data/data_3classes.txt')
X_train, X_test, classes_train, classes_test = get_datasets(data, ratio)
W = perceptron_3classes(X_train, classes_train)

n, d = np.shape(X_test)
scores = np.zeros((3, n)) 
# Your code goes here

print(scores)

Now, how do we compute, for each point, the index of its maximum score across the lines? In other words, which line classifies each point best?
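One possible way to do this, sketched on a made-up score matrix (rows are classes, columns are points, matching the ``(3, n)`` layout used above), is ``np.argmax`` along the class axis:

```python
import numpy as np

# Made-up scores: 3 classes (rows) by 4 points (columns).
scores = np.array([[2.0, -1.0, -2.0, 1.0],
                   [-1.0, 3.0, -2.0, 0.2],
                   [-0.5, -1.5, 4.5, -1.5]])

# axis=0 walks down each column, so we get one winning class index per point.
predicted = np.argmax(scores, axis=0)
print(predicted)  # [0 1 2 0]
```

Note that the last point has two positive scores (``1.0`` and ``0.2``); taking the maximum resolves that ambiguity in favor of class 0.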

In [None]:
data = np.loadtxt('data/data_3classes.txt')
X_train, X_test, classes_train, classes_test = get_datasets(data, ratio)
W = perceptron_3classes(X_train, classes_train)

n, d = np.shape(X_test)
scores = np.zeros((3, n)) 
# Your code goes here

What is the loss and the accuracy now? 
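A small sketch of how these two numbers could be computed, assuming that "loss" here means the fraction of misclassified points (the 0-1 loss); if class used a different definition, adapt accordingly. Both vectors are made up.

```python
import numpy as np

predicted = np.array([0, 1, 2, 0])     # e.g. the argmax of the scores
true_classes = np.array([0, 1, 2, 1])  # made-up ground truth

# Accuracy: fraction of points whose predicted class matches the true class.
accuracy = np.mean(predicted == true_classes)
# 0-1 loss (assumed definition): fraction of points that are wrong.
loss = 1.0 - accuracy

print(accuracy, loss)  # 0.75 0.25
```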

In [None]:
# Your code goes here

## 5.3 The softmax function and soft classification
Previously, we simply computed the maximum index among the classification scores. What if we could actually have a better idea of how certain a point is of being classified in a given class?

In order to do that, we'll use the softmax function of a vector as explained in class. Make a function ``softmax()`` that receives a matrix $M$ and computes the softmax of each of its columns (if you're finding it complicated, it's ok! Ask the instructor if you need help!)
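If you get stuck, here is one possible sketch. The name ``softmax_columns`` is hypothetical, and subtracting each column's maximum before exponentiating is an extra numerical-stability trick that may not have been covered in class; it does not change the result.

```python
import numpy as np

def softmax_columns(M):
    # Subtract each column's max so np.exp never sees huge numbers (result is unchanged).
    shifted = M - np.max(M, axis=0, keepdims=True)
    exps = np.exp(shifted)
    # Divide every column by its own sum, so each column adds up to 1.
    return exps / np.sum(exps, axis=0, keepdims=True)

# Made-up scores: 3 classes (rows) by 4 points (columns).
scores = np.array([[2.0, -1.0, -2.0, 1.0],
                   [-1.0, 3.0, -2.0, 0.2],
                   [-0.5, -1.5, 4.5, -1.5]])

soft = softmax_columns(scores)
print(soft.T)               # one row per point, easier to read
print(np.sum(soft, axis=0))  # every column sums to 1 (up to rounding)
```

Since softmax keeps the ordering of the scores within a column, the index of the largest softmax value is the same as the index of the largest raw score; what softmax adds is a sense of how confident that choice is.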

In [None]:
def softmax(M):
    # Your code goes here
    return soft_max_scores

Now, simply pass the scores you've gotten previously and compute the softmax of them.

In [None]:
data = np.loadtxt('data/data_3classes.txt')
ratio = .7
X_train, X_test, classes_train, classes_test = get_datasets(data, ratio)
W = perceptron_3classes(X_train, classes_train)

# Your code goes here

soft_max_scores = softmax(scores)

Now, take some time to see what ``soft_max_scores`` is saying (actually, print the transpose of ``soft_max_scores`` as ``soft_max_scores.T`` in order to visualize it better).
Lastly, compute the index of the maximum softmax score of each column.

In [None]:
# Your code goes here

## 5.4 Using Matrix Multiplication
Ok, so far we have used more loops than necessary and, honestly, that's not very efficient, even computationally (it makes the code slower). So far, I've been making you guys code an unnecessarily long classification step, when it could have been done in *one* line of code. Let's get there!

First, remembering what we discussed about matrix multiplication, how can you change the function ``classify_all_points()`` so that it receives a matrix $W$ and the datapoints ``X`` and, for each $x[i]$, computes the dot products with all the perceptron lines at once (call this new function ``classify_all_points_MM_v1(W, X)``)? Hint: This will use multiplications of matrices and vectors.

In [None]:
def classify(x, w):
    # Your code goes here
    return cla, score

def classify_all_points_MM_v1(W, X):
    # Your code goes here  
    return classification

data, ratio = np.loadtxt('data/data_3classes.txt'), .7
X_train, X_test, classes_train, classes_test = get_datasets(data, ratio)
W = perceptron_3classes(X_train, classes_train)

classification_scores = classify_all_points_MM_v1(W, X_test)

(Challenge) Now, there is still a for loop in ``classify_all_points_MM_v1(W, X)``. How can I get rid of it with a matrix multiplication between two matrices?
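To see the two ideas side by side, here is a sketch on made-up numbers (hand-picked ``W``, bias assumed to be the last column, toy points with a column of ones appended). It only computes the score matrix; turning scores into classes is left as in the exercise.

```python
import numpy as np

W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [-1.0, -1.0, 0.5]])
X_toy = np.array([[2.0, -1.0], [-1.0, 3.0], [-2.0, -2.0], [1.0, 1.0]])
Xb = np.hstack([X_toy, np.ones((X_toy.shape[0], 1))])

# Version 1: one matrix-vector product per point (still one loop over the points).
scores_v1 = np.zeros((W.shape[0], Xb.shape[0]))
for i, x in enumerate(Xb):
    scores_v1[:, i] = W @ x  # all 3 dot products for point i at once

# Version 2: a single matrix-matrix product, no loop at all.
# W is (3, d+1) and Xb.T is (d+1, n), so the result is (3, n).
scores_v2 = W @ Xb.T

print(np.allclose(scores_v1, scores_v2))  # True
```

The transpose is what makes the shapes line up: each column of ``Xb.T`` is one point, so each column of the product holds that point's scores for all classes.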

In [None]:
def classify(x, w):
    # Your code goes here
    return cla, score

def classify_all_points_MM_v2(W, X):
    # Your code goes here   
    return classification

data, ratio = np.loadtxt('data/data_3classes.txt'), .7
X_train, X_test, classes_train, classes_test = get_datasets(data, ratio)
W = perceptron_3classes(X_train, classes_train)

classification_scores = classify_all_points_MM_v2(W, X_test)

## 5.5 (Extra) Non Linearly separable datasets

### 5.5.1 The XOR Function
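In logic, there is a very important function that is also very simple: XOR. It takes two binary (either 0 or 1) inputs $x$, $y$. In the dataset below it is labeled $1$ if $x = y$ and $-1$ if $x \neq y$; note that this is the standard XOR with the labels flipped (XOR itself is true when the inputs differ), which changes nothing about whether the points can be separated by a line. Given the following dataset (that represents all possible inputs/outputs), draw its scatter plot as we did yesterday: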

In [None]:
import numpy as np
import matplotlib.pyplot as plt

X = np.array([[0,0], [0,1], [1,0], [1,1]])
classes = np.array([1, -1, -1, 1])

# Your code goes here

In your mind, can you find a straight line that separates pluses and minuses? 
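If you want to check your intuition numerically, the sketch below runs a capped perceptron on the four points. ``perceptron_with_cap`` is a hypothetical helper (not the function from class) that also reports whether it ever completed a full pass without mistakes; a second, separable labeling is included only for contrast.

```python
import numpy as np

def perceptron_with_cap(X, y, max_epochs=200):
    # Hypothetical helper: returns (w, converged); bias appended as last feature (assumption).
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            return w, True  # found a line that separates every point
    return w, False         # gave up: no mistake-free pass within the cap

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The labels from the cell above.
classes = np.array([1, -1, -1, 1])
_, converged_xor = perceptron_with_cap(X, classes)

# For contrast: only the point (1, 1) is positive, which a line can separate.
classes_and = np.array([-1, -1, -1, 1])
_, converged_and = perceptron_with_cap(X, classes_and)

print(converged_xor, converged_and)  # False True
```

Hitting the epoch cap is not a proof by itself, but here it agrees with the geometric argument: no single straight line puts the two diagonal corners on one side and the other two corners on the other side.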

### 5.5.2 The Moons dataset
In ``/data`` there is a dataset called ``moons.txt``. Can you also plot it? After doing that, can you find a line that separates the pluses  from the minuses?

In [None]:
# Your code goes here

What is your definition of a non-linearly separable dataset now?

### 5.5.3 The MNIST dataset

Finally, let's take a look at our future data, the MNIST dataset. It's a non-linearly separable database of handwritten digits from 0 to 9 (therefore, 10 classes). The file ``mnnist_small.txt`` contains a small part of the whole original dataset and it follows the dataset format that we've been working with. For now, just print one of its points (remember that the last element is the class the digit belongs to). Call this point ``p``. Also, print the class of that point and store it in a variable ``c``.

In [None]:
data = np.loadtxt('data/mnnist_small.txt')
# Your code goes here

The vector that you just printed is a handwritten digit! We can't see a lot from it, can we? That's because we need to convert the vector into a matrix and then *plot the matrix itself*! You can plot matrices with the function ``plt.imshow()`` from ``matplotlib.pyplot``. In order to convert the vector to a matrix, you need to reshape it with ``np.reshape()``.

Now type ``m = np.reshape(p[:-1], (28, 28))`` (the slice ``p[:-1]`` drops the class label, leaving the $28 \times 28 = 784$ pixel values), followed by ``plt.imshow(m)`` and then ``plt.show()``. Is that what the class ``c`` tells you?
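The reshaping step can be sketched without the real file. The vector below is random noise with a made-up label appended, assuming the same layout as our datasets (784 pixel values followed by the class); it is not actual MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=784)  # fake pixel intensities, not a real digit
p = np.append(pixels, 7)                 # made-up class label stored as the last element

c = int(p[-1])                           # the class is the last entry
m = np.reshape(p[:-1], (28, 28))         # the other 784 entries become a 28x28 grid

print(c, m.shape)  # 7 (28, 28)
# With a real digit, plt.imshow(m) followed by plt.show() would draw it.
```

Each entry of ``m`` is one pixel's intensity, and row ``i`` of ``m`` is simply entries ``28*i`` through ``28*i + 27`` of the original vector.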

In [None]:
# Your code goes here
plt.show()

One last thing, what does this plot of a matrix tell you about *images* in general?