# COSC 411: Artificial Intelligence

Instructor: Dr. Shuangquan (Peter) Wang

Email: spwang@salisbury.edu

Department of Computer Science, Salisbury University


# Module 3_ML algorithms and application

## 3. Artificial Neural Networks (ANN)


**Contents of this note are mainly from 1) Dr. Robert Michael Lewis's teaching materials at the Department of Computer Science, William & Mary; and 2) https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html**

**<font color=red>All rights reserved. Dissemination or sale of any part of this note is NOT permitted.</font>**

# Artificial neural networks

**Artificial neural networks (ANN)**, also called **multilayer perceptrons (MLP)** or neural networks, for short, are an elaboration of the perceptron we have seen previously.

ANNs are characterized by a large number of relatively simple, parallel computations being combined to approximate complex input-output relationships.

**The input-output relationships ANNs model are not restricted to classification.**

In classification, we have the input-output relationship
$$
\mbox{features} \longrightarrow \mbox{class label}.
$$

More generally we can imagine a generic input-output relationship
$$
\mbox{features $x$} \longrightarrow \mbox{output $y = F(x)$}.
$$

Fitting models to general input-output relationships is broadly known as **regression**.

## When are ANNs appropriate?

* There is an input-output relationship we wish to model with many attribute-value pairs.  Inputs can be any real numerical values.
* The target output is one or more discrete values or real values.
* The training set may contain noise.  ANN learning methods are robust with respect to noise in the training set (provided you avoid overfitting!).
* Long model calibration times are acceptable.  Training an ANN is typically a more computationally intensive task than constructing a decision tree, for instance.
* Fast evaluation of the input-output function is desirable.  While training an ANN takes time, applying it to new cases is very fast.
* An opaque model is acceptable.  ANNs are frequently difficult to interpret.

# Multilayer networks

To overcome the limitations of a single perceptron, we need a network of perceptron-like units.

The building blocks of the multilayer ANN will be functions of the form
$$
o(x_{1}, \ldots, x_{n}) = \sigma(w^{T}x).
$$

The original perceptron is a discontinuous function of its inputs.  It is more convenient to have a differentiable function.  A standard choice is the sigmoid function (a.k.a. the logistic function)
$$
\sigma(z) = \frac{1}{1+e^{-z}}.
$$
Its output ranges between 0 and 1.

A common variant adds a steepness parameter $k$:
$$
\sigma(z) = \frac{1}{1+e^{-kz}},\quad k > 0.
$$
The function $\sigma$ is called the **activation function**.
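
As a quick numerical check of the formulas above, the sketch below evaluates the sigmoid at a few points and applies it to a single unit $\sigma(w^{T}x)$. The helper names `sigmoid` and `unit_output`, and the weights and inputs, are made up for illustration.

```python
import numpy as np

def sigmoid(z, k=1.0):
    """Logistic activation 1 / (1 + exp(-k z)); k = 1 gives the standard sigmoid."""
    return 1.0 / (1.0 + np.exp(-k * z))

def unit_output(w, x, k=1.0):
    """Output of one unit: the activation applied to the weighted sum w^T x."""
    return sigmoid(np.dot(w, x), k)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))         # values strictly between 0 and 1, with sigmoid(0) = 0.5
print(sigmoid(z, k=5.0))  # larger k gives a steeper transition around z = 0

# Hypothetical weights and inputs, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])
print(unit_output(w, x))  # w^T x = 0.5 - 2.0 + 1.0 = -0.5, so the output is below 0.5
```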

# ANNs in scikit-learn

Let's look at an example using the ANN in scikit-learn, the [MLP class](http://scikit-learn.org/stable/modules/neural_networks_supervised.html).

We will use an ANN to model the input-output relationship $y = \sqrt{x}$.

This is **not** a classification problem.  It is a regression problem.

We will create an ANN with
* a single input, $x$, 
* a single hidden layer with 10 nodes, and
* a single output, the estimate of $\sqrt{x}$.

For unit $i$ in the hidden layer, the ANN maps $x$ to $w_{0,i} + w_{1,i}x$ and feeds the result to the activation function for unit $i$.

This means there are 20 model parameters for the map from the input to the hidden layer: the intercept $w_{0,i}$ and the weight $w_{1,i}$ for each of the 10 units in the hidden layer.

There are also 11 parameters (10 weights and one intercept) that take the hidden layer outputs and map them to the final scalar output.

Thus, there is a total of 31 model parameters.
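
The same count can be reproduced for any fully connected architecture. The sketch below assumes every unit has one weight per incoming value plus one intercept. The function name `count_mlp_parameters` is hypothetical.

```python
def count_mlp_parameters(layer_sizes):
    """Count weights plus intercepts in a fully connected network.

    layer_sizes lists the number of units per layer, inputs first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += (n_in + 1) * n_out  # n_in weights and one intercept per unit
    return total

print(count_mlp_parameters([1, 10, 1]))  # (1 + 1) * 10 + (10 + 1) * 1 = 31
```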

We will create a training set of 200 equally spaced values of $x$.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neural_network import MLPRegressor

# Here is the training set.
X_train = np.linspace(0, 10, 200)
X_train = X_train.reshape(-1, 1)
y_train = np.sqrt(X_train)
y_train = y_train.ravel()

plt.plot(X_train, y_train, 'b-')
# Raw string so that the backslash in \sqrt is not read as an escape sequence.
plt.title(r'Plot of $y = \sqrt{x}$', fontsize=18)

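# hidden_layer_sizes takes one entry per hidden layer; (10,) is a one-element tuple,
# whereas (10) is just the integer 10.
ann = MLPRegressor(solver='lbfgs', alpha=1e-5, activation='relu',
                   hidden_layer_sizes=(10,), random_state=1)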

ann.fit(X_train, y_train)                         

Now create a test set of 20 random values drawn uniformly from the interval [0, 10] and use the ANN to predict their square roots.

In [None]:
X_test = 10*np.random.rand(20, 1)
y_test = np.sqrt(X_test)
y_pred = ann.predict(X_test)

In [None]:
plt.plot(X_train, y_train, 'b-', X_test, y_pred, 'ro')
plt.title('ANN predictions in red', fontsize=18)
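
The plot gives a visual impression of the fit. One way to quantify it is to compare the predictions with the true square roots. The sketch below rebuilds the same network so that it runs on its own. Using `mean_squared_error` and a seeded random generator are choices made here for illustration, not part of the original example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

# Rebuild the training set and the network so this sketch runs on its own.
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = np.sqrt(X_train).ravel()
ann = MLPRegressor(solver='lbfgs', alpha=1e-5, activation='relu',
                   hidden_layer_sizes=(10,), random_state=1)
ann.fit(X_train, y_train)

# A seeded generator makes the random test set repeatable (the seed is arbitrary).
rng = np.random.default_rng(0)
X_test = 10 * rng.random((20, 1))
y_test = np.sqrt(X_test).ravel()
y_pred = ann.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
max_err = np.max(np.abs(y_test - y_pred))
print(f'mean squared error: {mse:.4f}')
print(f'largest absolute error: {max_err:.4f}')
```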

Now let's look at the model coefficients.

In [None]:
c = ann.coefs_
b = ann.intercepts_

print('coefficients for inputs to hidden layer: ')
print(c[0])
print('intercepts (weight 0) for inputs to hidden layer:')
print(b[0])
print()

print('coefficients for hidden layer to output:')
print(c[1])
print('intercept (weight 0) for hidden layer to output:')
print(b[1])
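
To see how these arrays define the model, the sketch below recomputes the predictions by hand from `coefs_` and `intercepts_` and compares them with `ann.predict`. It assumes the same architecture as above (ReLU hidden layer, linear output unit) and refits the network so that it runs on its own. The variable names `W1`, `W2`, `b1`, `b2` are chosen here for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sqrt(X).ravel()
ann = MLPRegressor(solver='lbfgs', alpha=1e-5, activation='relu',
                   hidden_layer_sizes=(10,), random_state=1)
ann.fit(X, y)

W1, W2 = ann.coefs_        # shapes (1, 10) and (10, 1)
b1, b2 = ann.intercepts_   # shapes (10,) and (1,)

hidden = np.maximum(0, X @ W1 + b1)   # ReLU activation in the hidden layer
manual = (hidden @ W2 + b2).ravel()   # the regressor's output unit is linear

# Check that the hand-computed forward pass agrees with the library's prediction.
print(np.allclose(manual, ann.predict(X)))

n_params = sum(w.size for w in ann.coefs_) + sum(b.size for b in ann.intercepts_)
print(n_params)  # 31 for this architecture
```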


# Recognizing hand-written digits

This example shows how an ANN can be used to recognize images of hand-written digits from 0 to 9.


In [None]:
# Revised from the following example
# Author: Gael Varoquaux <gael dot varoquaux at normalesup dot org>
# License: BSD 3 clause

# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, metrics
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## Digits dataset

The digits dataset consists of 8x8
pixel images of digits. The ``images`` attribute of the dataset stores
8x8 arrays of grayscale values for each image. We will use these arrays to
visualize the first 4 images. The ``target`` attribute of the dataset stores
the digit each image represents and this is included in the title of the 4
plots below.

Note: if we were working from image files (e.g., 'png' files), we would load
them using `matplotlib.pyplot.imread`.



In [None]:
digits = datasets.load_digits()

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)

## Classification

To apply an ANN to this data, we need to flatten the images, turning
each 2-D array of grayscale values from shape ``(8, 8)`` into shape
``(64,)``. Subsequently, the entire dataset will be of shape
``(n_samples, n_features)``, where ``n_samples`` is the number of images and
``n_features`` is the total number of pixels in each image.

We can then split the data into train and test subsets and fit an ANN classifier on the train samples. The fitted classifier can
subsequently be used to predict the value of the digit for the samples
in the test subset.
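
The flattening step can be tried in isolation. The sketch below uses two made-up 8x8 arrays rather than the digits data to show what `reshape` with `-1` does to the shape.

```python
import numpy as np

# Two made-up 8x8 "images" whose pixel values are just running integers.
images = np.arange(2 * 8 * 8).reshape(2, 8, 8)

# -1 asks NumPy to infer the second dimension: 8 * 8 = 64 features per image.
data = images.reshape(len(images), -1)
print(data.shape)  # (2, 64)

# Flattening is row by row, so reshaping a row back recovers the original image.
print(np.array_equal(data[0].reshape(8, 8), images[0]))  # True
```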



In [None]:
# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier
ann = MLPClassifier(solver='lbfgs', alpha=1e-4, activation="logistic",
                    hidden_layer_sizes=(100,), max_iter=1000, random_state=1)

# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)

# Learn the digits on the train subset
ann.fit(X_train, y_train)

# Predict the value of the digit on the test subset
predicted = ann.predict(X_test)

Below we visualize the first 4 test samples and show their predicted
digit value in the title.



In [None]:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {prediction}")

`sklearn.metrics.classification_report` builds a text report showing
the main classification metrics.



In [None]:
print(
    f"Classification report for classifier {ann}:\n"
    f"{metrics.classification_report(y_test, predicted)}\n"
)
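
If a single summary number is enough, `accuracy_score` gives the fraction of test samples classified correctly. The sketch below repeats the setup from the cells above so that it runs on its own. Running it prints the accuracy; no value is quoted here.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)

ann = MLPClassifier(solver='lbfgs', alpha=1e-4, activation='logistic',
                    hidden_layer_sizes=(100,), max_iter=1000, random_state=1)
ann.fit(X_train, y_train)
predicted = ann.predict(X_test)

accuracy = accuracy_score(y_test, predicted)
print(f'accuracy on the test subset: {accuracy:.3f}')
# For a classifier, ann.score(X_test, y_test) returns the same mean accuracy.
```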

We can also plot a confusion matrix of the
true digit values and the predicted digit values.



In [None]:
disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)
disp.figure_.suptitle("Confusion Matrix")
print(f"Confusion matrix:\n{disp.confusion_matrix}")

plt.show()
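
To read a confusion matrix, it helps to build a small one by hand. The sketch below uses made-up labels for a three-class problem, not the digits results.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for a three-class problem, for illustration only.
y_true = [0, 0, 1, 1, 2]
y_hat = [0, 1, 1, 1, 2]

cm = confusion_matrix(y_true, y_hat)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [0 0 1]]
# Row i is the true class and column j the predicted class, so cm[0, 1] = 1
# records the single sample of class 0 that was predicted as class 1.
```

The diagonal holds the correctly classified samples, so its sum divided by the total number of samples is the accuracy.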