<a href="https://colab.research.google.com/github/re114/re114.github.io/blob/main/machine_learning_introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Approaches to machine learning
Supervised Learning  
In supervised learning we take a set of data and a set of labels that map to the data, xi -> yi. The dataset is split into a training set and a test set; the model is trained on the training data and labels, then evaluated on the test data. The aim is to reduce the error. Supervised learning is typically used for classification or regression problems.
A regression problem involves predicting a numerical label. An example would be projecting new coronavirus cases from historic data, or from data from other countries such as that provided by Worldometer (“United Kingdom Coronavirus cases,” n.d.).
A classification problem aims to identify a class label. An example is the digit identification we looked at with the MNIST database (LeCun et al., n.d.), which takes data and label pairs (x, y), where x is the data and y is the label, with the goal of learning a function that maps x -> y.
The MNIST dataset consists of 60,000 labelled training images of the digits 0-9, plus a further 10,000 labelled test images (LeCun et al., n.d.). So a supervised model might take 40,000 images and labels as the training set and be trained on these. Once the model is trained, it is run on the test data.
A good model will give similar results on the test data to those on the training data. A model that has been fitted too closely to its training data is said to be overfitted: it performs well on the training data but poorly on unseen data. This highlights the importance of splitting the data into at least training and test sets (better still, training, validation and test sets), so that spurious correlations in the training data do not sway the model.
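
The split itself can be sketched in a few lines of plain Python. This is only an illustration: the helper name `split_dataset`, the 80/20 ratio, the fixed seed and the toy data are all assumptions, not part of the original notes.

```python
import random

def split_dataset(pairs, train_fraction=0.8, seed=0):
    """Shuffle (x, y) pairs and split them into training and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy labelled data (hypothetical): x_i -> y_i, where the label is 1 for even numbers
data = [(x, int(x % 2 == 0)) for x in range(10)]
train, test = split_dataset(data)
print(len(train), len(test))
```

The model would only ever see `train` during fitting; `test` is held back so that the reported error reflects examples the model has not been fitted to.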
  
“Learning is a search through the space of possible hypotheses for one that will perform well, even on new examples beyond the training set. To measure the accuracy of a hypothesis we give it a test set of examples that are distinct from the training set.”(Norvig & Stuart, 2010).

Unsupervised learning  
Unsupervised learning does not require labelled data, and instead actively looks for correlations within the data to learn the underlying structure - "is this thing like another thing?"
Unsupervised learning is typically used for clustering or density estimation problems.
An example of a problem addressed by clustering is spam filtering, which can use K-Means clustering to look at the email header and content and create groups, or clusters, to identify problem emails (“(28) Lecture 13.2 — Clustering | KMeans Algorithm — [ Machine Learning | Andrew Ng ] - YouTube,” n.d.).
“The most common unsupervised learning task is clustering: detecting potentially useful clusters of input examples. For example, a taxi agent might gradually develop a concept of “good traffic days” and “bad traffic days” without ever being given labelled examples of each by a teacher.” (Norvig & Stuart, 2010).  
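
To make the clustering idea concrete, here is a minimal one-dimensional K-Means sketch. The function `k_means_1d`, the starting centres and the six data points are hypothetical and chosen only so the two groups are easy to see; a real spam filter would cluster many features per email.

```python
def k_means_1d(points, centres, iterations=10):
    """Tiny 1-D K-Means: assign each point to its nearest centre, then move
    each centre to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # keep the old centre if a cluster ends up empty
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

# Unlabelled data with two obvious groups (hypothetical values)
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centres, clusters = k_means_1d(points, centres=[0.0, 5.0])
print(centres)
print(clusters)
```

No labels are supplied at any point: the grouping emerges purely from how close the points are to one another.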

Reinforcement learning  
Reinforcement learning places the machine learning agent in an environment and lets it learn from feedback, measured against a success criterion. It takes data as state-action pairs and aims to maximise the total future reward over many time steps.
“Reinforcement learning is learning what to do — how to map situations to actions—so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.” (Sutton & Barto, 2018) An example of reinforcement learning would be the (very popular with Computer Science students) work on AI systems learning to play video games (Shao, Tang, Zhu, Li, & Zhao, 2018).  

  
  


Neural Networks  
It is interesting to consider that the original work by McCulloch and Pitts (Mcculloch & Pitts, 1990) on neural structures in the brain as logic gates informed and inspired von Neumann's architecture (Ohta, 2015) and his view of the computer as a brain. The paradigm then shifted and we began to view the brain as a type of computer; as we came round to neural networks, the analogy switched back once more (Cobb, 2020).  
Early work on Neural Networks was carried out by Frank Rosenblatt, who described the structure of the perceptron (Rosenblatt, 1958). Rosenblatt may have overhyped his findings, and fed a media circus instead of managing expectations (Boden, 2006).  
  

Figure 9: An image of Rosenblatt's perceptron.
Minsky at MIT published a damning mathematical analysis of Rosenblatt's work (Minsky, 1961), which many cite as precipitating the first AI winter (Boden, 2006; Norvig & Stuart, 2010). The paper suggested that the perceptron was a dead end for AI, as it could not internally represent the things it was learning (Cobb, 2020), and it was not until the adoption of backpropagation that the approach became ascendant again (Y. LeCun et al., 1989).  
  
The structure of Neural Networks  
Bengio quotes Hinton: “You have relatively simple processing elements that are very loosely models of neurons. They have connections coming in, each connection has a weight on it, and that weight can be changed through learning” (LeCun, Bottou, & Haffner, 1998).  
  
Chollet outlines the steps in working with a Neural Network (François Chollet, 2019):  
• define  
• fit  
• predict  
• evaluate  

Fitting the model by gradient descent involves:  
• initialise the weights randomly, or using some insight into the relative importance of the inputs  
• loop until convergence  
• compute the gradient (the derivative of the error with respect to the weights)  
• update the weights  
• return the weights  
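
The training loop in the list above can be sketched for a single weight as follows. This is a minimal illustration under stated assumptions: the function name `train_weight`, the learning rate, the fixed number of steps (standing in for a real convergence test), the zero starting weight and the toy data are all hypothetical.

```python
def train_weight(inputs, targets, learning_rate=0.01, steps=200):
    """Fit prediction = input * weight by gradient descent on the mean squared error."""
    weight = 0.0  # fixed start for repeatability; in practice this is often random
    for _ in range(steps):  # a fixed step count stands in for "loop until convergence"
        # derivative of mean((x * weight - y) ** 2) with respect to weight
        gradient = sum(2 * x * (x * weight - y)
                       for x, y in zip(inputs, targets)) / len(inputs)
        weight -= learning_rate * gradient  # step against the gradient
    return weight

inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]  # toy data generated by target = 2 * input
weight = train_weight(inputs, targets)
print(round(weight, 3))
```

Each pass computes how the error changes as the weight changes and nudges the weight in the direction that reduces it, so the weight settles close to 2, the value used to generate the toy targets.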
In practical terms the aim is to minimise the error. We can describe the error as the squared difference between the prediction and the known result (the goal).  

```
# squared error between the prediction (input * weight) and the known result (goal_pred)
error = ((input * weight) - goal_pred) ** 2
```

The weight determines the significance of the input. For example, in estimating the number of COVID-19 cases we might take a data point such as age and give that factor a weighting.  




```
weight = 0.7

def neural_network(input, weight):
    prediction = input * weight
    return prediction

age_of_person = [3, 13, 23, 33, 43, 53, 63, 73, 83, 93]

for i in age_of_person:
    pred = neural_network(i, weight)
    print(pred)
```

```
2.0999999999999996
9.1
16.099999999999998
23.099999999999998
30.099999999999998
37.099999999999994
44.099999999999994
51.099999999999994
58.099999999999994
65.1
```



The example above gives a prediction of about 2.1 for the first element of the list, 9.1 for the second, and 65.1 for the last.  
This shows that the prediction rises as the input factor, age, increases.
We can test our predictions by comparing them against labelled data.
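
As a rough sketch of that comparison, the snippet below scores the single-weight predictor against a few labels using the mean squared error. The helper `mean_squared_error` and the three label values are illustrative assumptions, not measured data.

```python
def neural_network(input, weight):
    # same single-weight predictor as in the cell above
    return input * weight

def mean_squared_error(predictions, labels):
    """Average of the squared differences between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

ages = [3, 13, 23]
labels = [1, 1, 2]  # hypothetical known results for these ages
predictions = [neural_network(age, 0.7) for age in ages]
mse = mean_squared_error(predictions, labels)
print(mse)
```

A large error like this one says the weight of 0.7 is a poor fit for these labels, which is the signal a training procedure would use to adjust it.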









```
import tensorflow as tf
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# a single dense unit: prediction = weight * age + bias
Covid_prediction_model = Dense(units=1, input_shape=[1])
model = Sequential([Covid_prediction_model])
model.compile(optimizer='sgd', loss='mean_squared_error')

age = np.array([3, 13, 23, 43, 53, 63, 73, 83, 93], dtype=float)
label = np.array([1, 1, 2, 30, 40, 60, 80, 85, 95], dtype=float)

model.fit(age, label, epochs=5)

# predict for an unseen age, then inspect the learned weight and bias
print(model.predict(np.array([25.0])))
print("Here is what I learned: {}".format(Covid_prediction_model.get_weights()))
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[[-9.640592e+08]]
Here is what I learned: [array([[-38539550.]], dtype=float32), array([-570462.94], dtype=float32)]
```


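We can gauge how accurate our predictions are by comparing the prediction with a known result or label. Here the prediction for an age of 25 is roughly -9.6 × 10^8, nowhere near labels that range from 1 to 95: the weights have diverged rather than converged, which is what gradient descent tends to do when unscaled inputs (ages up to 93) are combined with the default learning rate. Scaling the inputs or lowering the learning rate is the usual remedy.  
This approach is linear and does not benefit from additional layers: a stack of purely linear layers can always be collapsed into a single layer (H. Li, Ouyang, & Wang, 2016).  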
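
A small worked example shows why stacked linear layers collapse. The weight matrices and input below are hypothetical, and `matmul` is a plain-Python helper written for this sketch so that no library is needed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

w1 = [[1, 2], [3, 4]]  # first linear layer (hypothetical weights)
w2 = [[0, 1], [2, 1]]  # second linear layer (hypothetical weights)
x = [[5], [6]]         # input as a column vector

two_layers = matmul(w2, matmul(w1, x))  # apply layer 1, then layer 2
one_layer = matmul(matmul(w2, w1), x)   # one layer whose weights are w2 times w1
print(two_layers == one_layer)
```

Because matrix multiplication is associative, the two-layer network and the single combined layer always give the same output, so the extra layer adds no expressive power on its own.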
In order to benefit from multiple layers, we need to add an activation function.  
This maps back to the neural structure of biological systems where synaptic inputs are expressed or repressed (Hawkins, 2005) to determine their activation.
We can use a range of activation functions, such as ReLU (rectified linear unit), which outputs zero for negative inputs and passes positive inputs through unchanged.
The result is a threshold at zero that the input must cross before the unit contributes anything to the output.  
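
The effect of the activation can be seen in a tiny hand-built network. This is a sketch with hypothetical, hand-picked weights (+1 and -1 into two hidden units, summed at the output), not a trained model.

```python
def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def tiny_network(x, use_relu):
    hidden = [1.0 * x, -1.0 * x]  # two hidden units with weights +1 and -1
    if use_relu:
        hidden = [relu(h) for h in hidden]
    return hidden[0] + hidden[1]  # output unit sums the hidden units

for x in (-3.0, 3.0):
    print(x, tiny_network(x, use_relu=False), tiny_network(x, use_relu=True))
```

Without the activation the two hidden units cancel and the network outputs zero for every input; with ReLU the same weights compute the absolute value of the input, a shape that no single linear layer can produce.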
For example, if we train a simple network with the MNIST dataset by using activation functions we can benefit from additional hidden layers:   
