# Neural Networks with scikit-learn

This notebook will go through some examples of using scikit-learn's neural network. First we'll run through a simple example with one of scikit-learn's built-in datasets, to introduce the syntax. Then we'll dive into a slightly harder example, and finally we'll load up a third example for you to practice on.

# Example 1 - cancer classification

This example is covered in more detail at https://www.kdnuggets.com/2016/10/beginners-guide-neural-networks-python-scikit-learn.html - the code here has been taken from that site and the scikit-learn documentation.

### Loading the data
First, we load an example dataset packaged with scikit-learn. There are a number of these standard datasets, and some are often used as canonical examples when documenting machine learning methods. In this case, the breast cancer dataset contains descriptors for 569 tumours, each labelled as one of two classes: benign or malignant.

In [None]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

It's a dictionary containing the dataset, as well as some other information:

In [None]:
cancer.keys()

In [None]:
# Print full description by running:
# print(cancer['DESCR'])
# 569 data points with 30 features. Each set of 30 features represents one tumour.
cancer['data'].shape

It's already close to what we want for our model building:

In [None]:
X = cancer['data']
y = cancer['target']
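As an aside, the dataset loaders can also return the feature matrix and targets directly through the `return_X_y` argument, which skips the dictionary step. Here is a minimal sketch. Counting the labels with NumPy's `bincount` is a quick way to check how balanced the two classes are:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# return_X_y=True gives (data, target) instead of the dictionary-like Bunch
X, y = load_breast_cancer(return_X_y=True)
print(X.shape, y.shape)

# Labels are 0 and 1; their names are stored under 'target_names' in the Bunch
names = load_breast_cancer()['target_names']
counts = np.bincount(y)
for name, count in zip(names, counts):
    print(name, count)
```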

In [None]:
print(X[0]) # The first sample, with 30 different features

In [None]:
print(y[0]) # The corresponding class: 0 = malignant, 1 = benign (see cancer['target_names'])

### Splitting and Scaling the data

We'll split the data into a training set and a test set - the second so that we can see the model's performance on previously unseen data.

Neural networks are particularly sensitive to input scaling, so we'll use scikit-learn's StandardScaler to rescale each feature to zero mean and unit variance, which improves performance. More information on why we do this will come in the next unit.
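The sketch below shows what StandardScaler does, using a small made-up matrix (the values are illustrative only). Its output matches the manual `(x - mean) / std` calculation. MinMaxScaler, by contrast, is the scaler that maps each feature onto the 0 to 1 range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up feature matrix: 4 samples, 2 features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# StandardScaler: subtract each column's mean, divide by its population std
standardised = StandardScaler().fit_transform(X)
manual = (X - X.mean(axis=0)) / X.std(axis=0)  # np.std uses ddof=0 by default
print(np.allclose(standardised, manual))  # True

# Each standardised column now has mean ~0 and standard deviation 1
print(standardised.mean(axis=0), standardised.std(axis=0))

# MinMaxScaler: maps each column's minimum to 0 and maximum to 1
min_max = MinMaxScaler().fit_transform(X)
print(min_max[:, 0])  # 0, 1/3, 2/3 and 1
```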

In [None]:
# The test/train split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
# Scaling the data. 
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data. Remember: from this point until the model is trained, we can't make
# any decisions, including the scaling factors, based on the test data.
scaler.fit(X_train)

In [None]:
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test) # This would be done to any new data you wanted to run the model on as well
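You can also bundle the fit-on-train, transform-both pattern above into a scikit-learn Pipeline, which makes it harder to fit the scaler on test data by accident. Here is a sketch. The settings `random_state=0` and `max_iter=1000` are choices made here for reproducibility; they are not values from this notebook:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# .fit() fits the scaler on X_train only; scoring then transforms X_test
# with those same training statistics before it reaches the network
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(accuracy)
```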

### Creating the network

Since we're trying to solve a classification problem, we'll use the MLPClassifier network provided by scikit-learn.

In [None]:
from sklearn.neural_network import MLPClassifier

We create a network with three hidden layers, each with 30 neurons. This is a somewhat arbitrary choice - hyperparameter selection is still occasionally more art than science, although there are some good rules of thumb. For now, let's stick with this for the examples.
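One way to see what `hidden_layer_sizes=(30,30,30)` builds is to inspect a fitted network's `coefs_` attribute. It holds one weight matrix for each pair of adjacent layers. The sketch below fits on the full (scaled) dataset purely to look at the shapes. For this two-class problem, scikit-learn uses a single output unit:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Scaled on all the data here only because we're inspecting shapes, not scoring
X_scaled = StandardScaler().fit_transform(X)

mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=1000, random_state=0)
mlp.fit(X_scaled, y)

# One weight matrix per connection between consecutive layers
weight_shapes = [w.shape for w in mlp.coefs_]
print(weight_shapes)  # [(30, 30), (30, 30), (30, 30), (30, 1)]
print(mlp.n_layers_)  # 5: input + 3 hidden + output

# Trainable parameters: all weights plus one bias per non-input neuron
n_params = sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)
print(n_params)  # 2821
```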

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

In [None]:
mlp.fit(X_train,y_train)

At this point, our model has been trained! We can now use it to make predictions on new data that we want to classify. First, we'll score it on the test set.

In [None]:
predictions = mlp.predict(X_test)

In [None]:
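# The score - pretty good! ~96% accuracy (will vary with the random train/test split and the random weights the model started with)
print(mlp.score(X_test, y_test))
# The confusion matrix gives a better picture of what errors are slipping in
# (rows are the true classes, columns the predicted classes)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, predictions))
# Per-class precision and recall
print(classification_report(y_test, predictions, target_names=cancer['target_names']))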
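The confusion matrix and the accuracy score are closely linked. Correct predictions lie on the diagonal, so accuracy is the diagonal total divided by the overall total. The labels below are hypothetical and chosen only to keep the arithmetic visible:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for six test samples (0 = malignant, 1 = benign)
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1]
#  [1 3]]

# Correct predictions sit on the diagonal, so accuracy = trace / total
accuracy_from_cm = np.trace(cm) / cm.sum()
print(accuracy_from_cm, accuracy_score(y_true, y_pred))  # both 4/6
```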

# Example 2 - Now for something more technical

The data comes from https://archive.ics.uci.edu/ml/datasets/Robot+Execution+Failures
It records the forces and torques experienced when a robot attempts a certain operation. Attempts are classed as 'normal', 'collision', 'obstruction' or 'fr_collision'. The goal is to predict which of these has occurred from the measured forces and torques. As before, we'll use scikit-learn's MLPClassifier for this, but note how much extra work is needed to get the data ready for the modelling process.

In [None]:
# Read the file line by line into a list; the with block closes the file afterwards
with open('robot_execution_failure/lp1.data') as f:
    lines = f.readlines()

In [None]:
lines[:5] #The first 5 lines.

The file contains a class label on one line, followed by 15 lines. Each of those lines holds three force readings and three torque readings separated by tabs, as you can see by opening the file in a text editor. These readings are the forces and torques measured over an interval after a given event. We'll start by using just the first line of readings after each class line.

In [None]:
X = [] # inputs
y = [] # true values
classes = {'normal':0, 'collision':1, 'obstruction':2, 'fr_collision':3} # Encoding the classes as integers

# Here, we iterate over the lines of the file. If a line matches one of our classes, we split the next line 
# to get the six readings and use those as our features. 
for i in range(len(lines) - 1):
    line = lines[i].strip() # .strip() removes the line endings \n
    if line in classes: # If the line matches one of our classes (e.g. 'normal')
        features = [int(x) for x in lines[i+1].strip().split('\t')] # Split the next line to get our features
        X.append(features)
        y.append(classes[line]) # And record which class this set of features belongs to
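The loop above keeps only the first reading of each record. If you later want all 15 readings, a sketch along the same lines could collect the whole block. `parse_records` is a hypothetical helper. The sample lines are made up to mimic the tab-separated layout described above, with 2 readings per record instead of 15 to keep the example short:

```python
import numpy as np


def parse_records(lines, classes, n_rows=15):
    """Hypothetical helper: return (X, y) with every reading after each class line.

    Each entry of X is an (n_rows, 6) array: n_rows time steps of
    three force and three torque readings.
    """
    X, y = [], []
    for i, raw in enumerate(lines):
        label = raw.strip()
        if label in classes:
            block = lines[i + 1 : i + 1 + n_rows]
            # .split() with no argument splits on any run of whitespace, tabs included
            readings = [[int(v) for v in row.split()] for row in block]
            X.append(np.array(readings))
            y.append(classes[label])
    return X, y


classes = {'normal': 0, 'collision': 1, 'obstruction': 2, 'fr_collision': 3}

# Made-up lines mimicking the file layout, with only 2 readings per record
sample = [
    "normal\n",
    "\t1\t2\t3\t4\t5\t6\n",
    "\t7\t8\t9\t10\t11\t12\n",
    "\n",
    "collision\n",
    "\t-1\t-2\t-3\t-4\t-5\t-6\n",
    "\t0\t0\t0\t0\t0\t0\n",
]
X_sample, y_sample = parse_records(sample, classes, n_rows=2)

# Flatten each (n_rows, 6) block into one feature vector for MLPClassifier
X_flat = np.array([block.ravel() for block in X_sample])
print(X_flat.shape, y_sample)  # (2, 12) [0, 1]
```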


Now that we have our X and y, we can proceed as before.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
print(len(X_train)) # 66 training points - not much to work with!

In [None]:
# Preparing to scale the inputs
scaler = StandardScaler()
scaler.fit(X_train)

In [None]:
# Scaling the inputs 
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
# Creating the neural network
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

In [None]:
# Training
mlp.fit(X_train, y_train)

You may get a ConvergenceWarning from the above step. It means the optimiser reached its iteration limit before the training loss stopped improving, so the network may not have finished learning. The model still works, but this is generally a sign that we should let the network train for longer. You can adjust this by changing the max_iter parameter.
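To check programmatically whether training hit the limit, you can record the ConvergenceWarning (from `sklearn.exceptions`) and compare the fitted model's `n_iter_` with `max_iter`. The sketch below uses synthetic data from `make_classification` as a stand-in for the robot readings. It sets `max_iter` deliberately low so that the warning fires:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic data stands in for the robot readings here
X, y = make_classification(n_samples=100, n_features=6, random_state=0)

# Deliberately tiny max_iter so the optimiser stops before it converges
mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=5, random_state=0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded, not suppressed
    mlp.fit(X, y)

hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print(hit_limit, mlp.n_iter_)  # True 5: training stopped at max_iter
```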

In [None]:
predictions = mlp.predict(X_test)
print(confusion_matrix(y_test,predictions))

In [None]:
print(mlp.score(X_test, y_test))

The network may not be performing very well. This is expected, because we have so little training data. If it gets above 50% accuracy at all, that's impressive, considering that we're training it on only 66 samples spread across four classes. However, let's try training for longer, using max_iter=1000 rather than the default of 200.

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30), max_iter=1000)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))

In [None]:
predictions = mlp.predict(X_test)
print(confusion_matrix(y_test,predictions))

Take a look at the confusion matrix and see whether anything stands out. Perhaps normal operations are classified correctly but the network struggles to distinguish between obstructions and collisions, or perhaps one class is particularly tricky. It may be that this problem can't be solved with so little data...
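A concrete way to find the tricky class is to compute per-class recall from the confusion matrix. Each row holds one true class, so the diagonal divided by the row sums gives the fraction of each class the network got right. The labels below are hypothetical and use the integer encoding defined earlier:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

class_names = ['normal', 'collision', 'obstruction', 'fr_collision']

# Hypothetical test labels and predictions
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 0, 2, 1, 1, 2, 2, 3]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])

# Row i holds the samples whose true class is i, so diagonal / row sum is
# that class's recall (a class absent from y_true would divide by zero)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
for name, recall in zip(class_names, per_class_recall):
    print(f"{name:>12}: {recall:.2f}")
```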

### Feature engineering - what if we add a variable for combined torque?

In [None]:
import math

In [None]:
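X_new = []
for x in X:
    # Combined torque: the simple sum of the three torque readings (columns 3-5)
    torque = x[3] + x[4] + x[5]
    X_new.append(x + [torque])  # x is a list, so this appends one extra feature
print(X_new[0]) # Look at the first input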
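A signed sum like the one above can cancel out: torque components of +5 and -5 add up to zero. One alternative is the magnitude of the torque vector, computed with the `math` module or with NumPy. This is an assumption for illustration rather than this notebook's method, and it treats the three torque readings as components along perpendicular axes. The readings below are made up:

```python
import math

import numpy as np

# One made-up reading: three forces followed by three torques
x = [10, -4, 63, 3, 4, 0]

signed_sum = x[3] + x[4] + x[5]                     # what the loop above adds: 7
magnitude = math.sqrt(x[3]**2 + x[4]**2 + x[5]**2)  # Euclidean norm: 5.0
print(signed_sum, magnitude)

# The same magnitude for every row of a 2-D array at once
readings = np.array([x, [0, 0, 0, -6, 8, 0]])
magnitudes = np.linalg.norm(readings[:, 3:6], axis=1)
print(magnitudes)  # 5 and 10
```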

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_new, y)
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30), max_iter=1000)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))

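This may be an improvement, although the split and the starting weights are random, so it's worth rerunning a few times before trusting the difference. Either way, it can still get much better. In the HLT assignment, you'll see how much difference adding some extra features makes, and you'll use more training data to improve the predictions.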
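Because the train/test split and the starting weights are random, a single score can vary noticeably between runs. With only 88 samples, that makes a comparison like "with vs. without the torque feature" noisy. Cross-validation averages over several splits instead. The sketch below uses `cross_val_score` with a pipeline on synthetic data that stands in for the robot features. The data and settings are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a small 4-class dataset such as the robot data
X, y = make_classification(n_samples=200, n_features=7, n_informative=4,
                           n_classes=4, random_state=0)

model = make_pipeline(
    StandardScaler(),  # re-fitted on the training folds only, inside each split
    MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=1000, random_state=0),
)

# 5 train/test splits; each sample lands in a test fold exactly once
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```

To compare two feature sets, you would run this once for each and compare the mean scores alongside their spread.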

# Example 3 - Wall Following Robot

This is yet another classification problem. The data is in a slightly different format, and we'll load it into a pandas dataframe to show another way we might pre-process data. The data comes from https://archive.ics.uci.edu/ml/datasets/Wall-Following+Robot+Navigation+Data. An interesting feature of this data is that the task cannot be accomplished with a linear model, but our multi-layer network is up to the task.

The data consists of distance sensor readings, pointing out in different directions around the robot. An example line from the file:
'0.438,0.498,3.625,3.645,5.000,2.918,5.000,2.351,2.332,2.643,1.698,1.687,1.698,1.717,1.744,0.593,0.502,0.493,0.504,0.445,0.431,0.444,0.440,0.429,Slight-Right-Turn'

In [None]:
import pandas as pd
import numpy as np

In [None]:
# The data doesn't come with column names. Since there are 24 columns of data and then a class, we'll 
# label the columns accordingly
column_names = [i+1 for i in range(24)]+['Class']
data = pd.read_csv('wall_following/sensor_readings_24.data', names = column_names)
data.head()

We'll convert the classes as before, and then get our X and y inputs from the dataframe

In [None]:
classes = {'Move-Forward':0, 'Slight-Right-Turn':1, 'Sharp-Right-Turn':2, 'Slight-Left-Turn':3}
data = data.replace({'Class': classes}) # There are various ways of doing this - data['Class'].map(classes) is an alternative
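`Series.map` is one alternative to `replace`. It looks up each label in the dictionary and turns any label it doesn't recognise into NaN, which makes typos in the class names easy to catch. Here is a sketch on a small made-up frame:

```python
import pandas as pd

classes = {'Move-Forward': 0, 'Slight-Right-Turn': 1,
           'Sharp-Right-Turn': 2, 'Slight-Left-Turn': 3}

# A tiny made-up frame with the same 'Class' column as the real data
df = pd.DataFrame({'Class': ['Slight-Right-Turn', 'Move-Forward',
                             'Slight-Left-Turn', 'Sharp-Right-Turn']})

# Series.map looks each label up in the dictionary; unknown labels become NaN
df['Class'] = df['Class'].map(classes)
assert df['Class'].notna().all(), "found a label missing from `classes`"
print(df['Class'].tolist())  # [1, 0, 3, 2]
```

MLPClassifier also accepts string class labels directly, so this encoding step is optional. Explicit integers just make it easier to read a confusion matrix alongside the dictionary.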
data.head()

In [None]:
X = np.array(data.loc[:,data.columns[:24]]) # The first 24 columns, corresponding to all the sensor readings
y = data.loc[:, 'Class'] # The class column

In [None]:
# Practice: As before, train a network (with hidden_layer_sizes=(30,30,30)) and see how well it performs

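What if we shrink the hidden layers down to a single hidden layer with just one neuron? Every input then has to be squeezed through a single number before it reaches the output layer. The network therefore has far less capacity and should perform far worse. (Removing the hidden layers altogether would leave one linear set of weights multiplied by the input and mapped through a softmax, which is similar to logistic regression.)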

In [None]:
# Make a new network with hidden_layer_sizes=(1,) (one hidden layer with a single neuron) and compare the score
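The Wall Following data isn't bundled with scikit-learn, so here is a sketch of the same kind of comparison on synthetic data from `make_circles`. In this dataset the two classes form concentric rings that no straight line can separate. The sketch compares scikit-learn's LogisticRegression (a linear model), a network with a single one-neuron hidden layer, and the (30, 30, 30) network. The dataset and settings are illustrative choices, not part of the original notebook:

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two concentric rings: no straight line can separate the classes
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    'linear': LogisticRegression(),
    'one neuron': MLPClassifier(hidden_layer_sizes=(1,), max_iter=2000, random_state=0),
    '(30, 30, 30)': MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=2000,
                                  random_state=0),
}

scores = {}
for name, clf in models.items():
    model = make_pipeline(StandardScaler(), clf)  # scale, then classify
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```

In this two-class example, a single hidden ReLU neuron means the output depends on just one weighted sum of the inputs. Its decision boundary is therefore still a straight line, so it can't do much better than the linear model.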

### Feature engineering

Sometimes, simplifying features or calculating new, composite features can help model accuracy. For example, the file wall_following/sensor_readings_4.data contains only 4 features per line - the minimum distance from any sensor in one of four directions. These have been calculated from the set of 24 readings we've been looking at.
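You can sketch the idea of reducing a group of sensors to its closest reading with NumPy. The grouping below, four contiguous blocks of six sensors, is an assumption for illustration only. It is not necessarily how sensor_readings_4.data was derived, so check the dataset documentation for which sensors face which direction. `min_distance_per_direction` is a hypothetical helper:

```python
import numpy as np


def min_distance_per_direction(readings_24):
    """Hypothetical helper: closest reading in each of four blocks of six sensors."""
    readings_24 = np.asarray(readings_24)
    # Assumed grouping: reshape to (samples, directions, sensors per direction)
    groups = readings_24.reshape(len(readings_24), 4, 6)
    return groups.min(axis=2)


example = np.arange(24, dtype=float).reshape(1, 24)
print(min_distance_per_direction(example))  # closest reading per block: 0, 6, 12, 18
```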

In [None]:
data = pd.read_csv('wall_following/sensor_readings_4.data', names = [1, 2, 3, 4, 'Class'])
X = np.array(data.loc[:,[1, 2, 3, 4]])
y = data.loc[:, 'Class']
X_train, X_test, y_train, y_test = train_test_split(X, y)
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))

You'll probably notice that the network performs even better, despite the fact that we've gone from 24 features to just four. This is worth bearing in mind, especially when you have some knowledge of the underlying system. Here, we know that the sensors are noisy, but what we really want to do is avoid the walls, so looking at minimum distances makes sense. In other cases, it may be worth introducing a quadratic term to capture physical relationships (for example, given mass and velocity as features, computing mv^2 might capture an energy-related relationship) or simply combining sets of related features. There are few hard and fast rules for feature engineering; experience and luck still play a major role in finding features that work.
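As a small example of the quadratic-term idea, here is how you could add an mv^2 column to a pandas DataFrame. The column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical measurements: mass in kg, velocity in m/s
df = pd.DataFrame({'mass': [1.0, 2.0, 0.5], 'velocity': [3.0, 1.0, 4.0]})

# Composite feature proportional to kinetic energy (KE = 0.5 * m * v**2)
df['mv2'] = df['mass'] * df['velocity'] ** 2
print(df)
```

scikit-learn's PolynomialFeatures can generate all such products up to a chosen degree automatically. The cost is many more features, and often most of them are irrelevant.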