# Perceptron
* equivalent to `SGDClassifier(loss="perceptron", learning_rate="constant", eta0=1, penalty=None)`

In [2]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0 (np.int was removed from NumPy)

perceptron_clf = Perceptron()
perceptron_clf.fit(X, y)

y_pred = perceptron_clf.predict([[2, 0.5]])
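
The update rule behind this classifier can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: it assumes labels in {0, 1}, a constant learning rate of 1 and no penalty (matching the settings listed above), and a hypothetical toy dataset `X_toy`/`y_toy` rather than the iris data. The helper names `predict` and `train_perceptron` are made up for this example.

```python
def predict(weights, bias, x):
    """Heaviside step on the weighted sum: 1 if w.x + b >= 0, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0 else 0


def train_perceptron(X, y, eta=1.0, n_epochs=20):
    """Update the weights only on misclassified samples."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(n_epochs):
        n_errors = 0
        for x, target in zip(X, y):
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error != 0:
                weights = [w + eta * error * xi for w, xi in zip(weights, x)]
                bias += eta * error
                n_errors += 1
        if n_errors == 0:  # every sample classified correctly
            break
    return weights, bias


# Hypothetical, linearly separable toy data: small petals are class 1.
X_toy = [[1.4, 0.2], [1.3, 0.2], [1.5, 0.3], [4.7, 1.4], [5.1, 1.9], [4.0, 1.3]]
y_toy = [1, 1, 1, 0, 0, 0]

w, b = train_perceptron(X_toy, y_toy)
predictions = [predict(w, b, x) for x in X_toy]
```

Because the toy data is linearly separable, the loop stops as soon as a full pass produces no misclassification.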

## Terminology
* Artificial Neural Network (ANN)
* Deep Neural Network (DNN)
* Feedforward Neural Network (FNN)

## Glossary
* epoch: one full pass over the training set, made up of many mini-batch steps
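
To make the relation between epochs and mini-batch steps concrete, here is a small arithmetic sketch. It assumes 55,000 training images (the 60,000 Fashion MNIST images minus a 5,000-image validation split), a batch size of 32 (the Keras default for `fit`), and 30 epochs; `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batch updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


n_train = 55_000   # assumed training-set size
batch_size = 32    # assumed batch size
n_epochs = 30

steps = steps_per_epoch(n_train, batch_size)
total_updates = steps * n_epochs
```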

## Training of ANNs
* [Backpropagation](https://apps.dtic.mil/docs/citations/ADA164453): one forward pass, then one backward pass to compute the gradients, followed by a Gradient Descent step
 0. initialize the connection weights randomly; they must not all be equal, so that the symmetry between neurons is broken
 1. pass in an input (mini-batch)
 2. compute each hidden layer's output and store it for the backward pass
 3. produce the network output
 4. compute the error with the loss function
 5. *apply the chain rule* to compute how much each connection into the output layer contributed to the error
 6. work backwards layer by layer via the chain rule, propagating the error gradient towards the input layer
* reverse-mode autodiff: the technique that computes these gradients automatically for backpropagation
* activation functions: nonlinearity is key, because a chain of linear functions only yields another linear function
  * Heaviside step function: flat everywhere (zero gradient) and not differentiable at 0, so Gradient Descent cannot make progress with it
  * logistic (sigmoid) function $\sigma(z) = 1 / (1 + \exp(-z))$ (output between 0 and 1)
  * hyperbolic tangent function $\tanh(z) = 2 \sigma(2z) - 1$ (output between -1 and 1)
  * Rectified Linear Unit function $\mathrm{ReLU}(z) = \max(0, z)$ (output never negative, not differentiable at 0)
  * soft-plus function $\mathrm{softplus}(z) = \log(1 + \exp(z))$ (close to 0 when z is negative, almost linear when z is positive)
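
The backpropagation steps above can be sketched for a tiny network in plain Python. This is an illustration under stated assumptions, not a reference implementation: one hidden layer of three sigmoid neurons, a single linear output, a squared-error loss on one sample, and hypothetical names (`forward`, `backprop`, `loss`). The analytic gradient is then compared with a finite-difference estimate for one weight.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass; the hidden activations are kept for the backward pass."""
    W1, b1, W2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sum(w * h for w, h in zip(W2, hidden)) + b2
    return hidden, output


def loss(params, x, target):
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target):
    W1, b1, W2, b2 = params
    hidden, output = forward(params, x)
    d_output = output - target                 # dL/d(output)
    grad_W2 = [d_output * h for h in hidden]   # output-layer connections
    grad_b2 = d_output
    # Chain rule through the sigmoid: sigma'(z) = h * (1 - h)
    d_hidden = [d_output * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in d_hidden]
    grad_b1 = list(d_hidden)
    return grad_W1, grad_b1, grad_W2, grad_b2


random.seed(0)  # random, unequal initial weights break the symmetry
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b1 = [0.0, 0.0, 0.0]
W2 = [random.uniform(-1, 1) for _ in range(3)]
b2 = 0.0
x, target = [0.5, -1.2], 1.0

grad_W1, grad_b1, grad_W2, grad_b2 = backprop((W1, b1, W2, b2), x, target)

# Finite-difference check of dL/dW1[0][0]
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_minus = [row[:] for row in W1]
W1_plus[0][0] += eps
W1_minus[0][0] -= eps
numeric_grad = (loss((W1_plus, b1, W2, b2), x, target)
                - loss((W1_minus, b1, W2, b2), x, target)) / (2 * eps)
```

A Gradient Descent step would then subtract a small multiple of each gradient from the matching parameter.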

# Regression
## Metrics/Loss functions
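* Mean squared error (MSE); its square root is the root-mean-squared error (RMSE)
* Mean absolute error (MAE)
* Huber loss (quadratic for small errors, linear for large ones, i.e. a combination of the two above)
* Accuracy, Specificity, Recall, F-Score (these are classification metrics, not regression metrics)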
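
The three regression losses can be compared on a small example. This is a plain-Python sketch with made-up values; the `delta` threshold of 1.0 for the Huber loss is an assumption, and the last prediction is a deliberate outlier to show how differently the losses react.

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 17.0]  # last prediction is an outlier (error of 10)

mse_value = mse(y_true, y_pred)
rmse_value = math.sqrt(mse_value)
mae_value = mae(y_true, y_pred)
huber_value = huber(y_true, y_pred)
```

The single outlier dominates the MSE, while the MAE and Huber loss grow only linearly with it.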

## Hyperparameters
* Number of input neurons (one per feature)
* Number of hidden layers (usually 1-5)
* Number of neurons in each hidden layer
* Number of output neurons (one per predicted value)
* Type of hidden activation function (ReLU/Sigmoid)
* Output activation (none for an unbounded output)
* Loss function (MSE, MAE, Huber)

# Classification
* Output activation function (sigmoid for binary and multilabel, softmax for multinomial)
* Output neurons (one for binary, one per label for multilabel binary, one per class for multinomial)
* Loss function (cross entropy)
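
For the multinomial case, the softmax output and the cross-entropy loss can be sketched in plain Python. The logits below are made up for illustration, and `cross_entropy` takes an integer class label, which is what a "sparse" categorical cross entropy expects.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)

loss_if_class_0 = cross_entropy(probs, 0)  # true class got the highest score
loss_if_class_2 = cross_entropy(probs, 2)  # true class got the lowest score
```

The loss is small when the true class receives a high probability and grows as that probability approaches zero.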

In [2]:
import tensorflow as tf
from tensorflow import keras

In [3]:
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [4]:
X_train_full.shape

(60000, 28, 28)

In [5]:
X_train_full.dtype

dtype('uint8')

In [6]:
# First 5,000 images for validation, the remaining 55,000 for training; scale pixels to [0, 1]
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.0

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [10]:
class_names[y_train[0]]

'Coat'

In [12]:
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation='relu'))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))  # one output neuron per class

In [13]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
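
The parameter counts in the summary follow from the layer sizes: a dense layer has one weight per input-unit pair plus one bias per unit. A short check for the two hidden layers (`dense_param_count` is a hypothetical helper, not part of Keras):

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_flat = 28 * 28                            # Flatten turns each image into 784 values
hidden1 = dense_param_count(n_flat, 300)    # first hidden layer
hidden2 = dense_param_count(300, 100)       # second hidden layer
```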


In [15]:
hidden1 = model.layers[1]
weights, biases = hidden1.get_weights()

In [17]:
weights.shape

(784, 300)

In [None]:
model.compile(loss='sparse_categorical_crossentropy',  # labels are integer class ids
              optimizer='sgd',
              metrics=['accuracy'])  # metrics are passed as a list

In [None]:
history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0,1)
plt.show()

In [None]:
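model.evaluate(X_test, y_test)
X_new = X_test[:3]
y_proba = model.predict(X_new)
# predict_classes() was removed in TensorFlow 2.6; take the argmax of the class probabilities instead
y_pred = np.argmax(y_proba, axis=-1)
np.array(class_names)[y_pred]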
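
Mapping predicted probabilities to class names amounts to taking the index of the largest probability in each row. A plain-Python sketch with made-up probabilities (the `y_proba_example` values are hypothetical, not model output):

```python
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical softmax outputs for three images, one row per image.
y_proba_example = [
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.03, 0.00, 0.01, 0.00, 0.96],
    [0.00, 0.00, 0.98, 0.00, 0.02, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
]


def argmax(row):
    """Index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


predicted_ids = [argmax(row) for row in y_proba_example]
predicted_names = [class_names[i] for i in predicted_ids]
```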

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)
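
The reason for calling `fit_transform` only on the training set is that the validation and test sets must be scaled with the training statistics. A plain-Python sketch of that idea, using made-up rows and hypothetical helpers `fit_scaler` and `transform`; it uses the population standard deviation and assumes no feature has zero variance:

```python
import math


def fit_scaler(rows):
    """Per-feature mean and population standard deviation."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(n_features)]
    return means, stds


def transform(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


train_rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]  # made-up data
valid_rows = [[4.0, 400.0]]

means, stds = fit_scaler(train_rows)                # statistics from the training set only
train_scaled = transform(train_rows, means, stds)
valid_scaled = transform(valid_rows, means, stds)   # reuses the training statistics
```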

In [None]:
model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu', input_shape=X_train.shape[1:]),
    keras.layers.Dense(1)
])
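model.compile(loss='mean_squared_error', optimizer='sgd')
# The model has to be trained before it is evaluated; the epoch count here is an assumed value
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)
X_new = X_test[:3]
y_pred = model.predict(X_new)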