# Today

# Attendance

[The link to SEAts](https://goldsmithscollege.sharepoint.com/sites/intranet-supporting-students/SitePages/SEAtS--QR-codes-for-check-in.aspx?utm_campaign=SEAtS%20reminder%2027.09.22&utm_source=emailCampaign&utm_content=&utm_medium=email)

----

## 9.110 Various approaches to AI

*Artificial Intelligence* is the automation of thought: from spreadsheets to humanoid robots

*Machine Learning* is a subfield of AI: thought is automated by training programs through exposure to data

*Deep Learning* is one of many branches of ML

DL models are long chains of geometric transformations

Implemented as neural networks



The operations are structured into modules called *layers*

DL models are graphs of layers 

DL layers are parameterised by *weights* (and biases)

Weights change during training - learning
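
As a rough illustration of layers as parameterised transformations, the NumPy sketch below chains two dense layers. The `dense` helper, the layer sizes and the random (untrained) weights are assumptions made for this example and are not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b, activation=None):
    """One dense layer: an affine transformation followed by an optional nonlinearity."""
    z = x @ W + b
    return np.maximum(z, 0.0) if activation == "relu" else z

# Weights and biases are the layer parameters; here they are random (untrained).
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=(5, 4))       # a batch of 5 input vectors with 4 features each
h = dense(x, W1, b1, "relu")      # first layer: 4 -> 8 dimensions
y = dense(h, W2, b2)              # second layer: 8 -> 3 dimensions
print(h.shape, y.shape)
```

Training would adjust `W1`, `b1`, `W2` and `b2`; the structure of the computation stays the same.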

----

## 9.120 Why DL is special

DL, in only a few years, has succeeded at a range of tasks that were previously considered to be very difficult for computers:
- machine perception
- information extraction from images, videos and sound
- speech recognition
- smart assistants
- machine translation
- many more

Some practitioners (e.g. F. Chollet) claim that, given enough human-annotated training data, DL models can match human performance on these kinds of tasks

Some claim that 'perception has been solved'

We are in the middle of a period of intense interest and hype - an AI 'summer'

DL stands to transform many industries and businesses *even if no further progress is made*

----

## 9.130 How to think about DL

DL - it's not that hard to understand

DL models apply a series of simple geometric transformations to vectorised data

The total of the chain of simple transformations is a complex transformation from an input space to a target space

The transformations must be *differentiable*

The layer parameters that determine transformations change during training in order to lower a loss function 


This loss must also be differentiable
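
A minimal sketch of why differentiability matters: the gradient of the loss tells each parameter which way to move. The toy data (points near y = 3x + 2), the learning rate and the number of steps are assumptions chosen for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from y = 3x + 2 plus a little noise (an assumption for the demo).
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 2.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0            # parameters to be learned
learning_rate = 0.1
losses = []

for step in range(200):
    pred = w * x + b
    error = pred - y
    losses.append(np.mean(error ** 2))     # MSE: a differentiable loss
    # Gradients of the MSE with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move each parameter a small step against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(losses[0], losses[-1], w, b)
```

Deep learning frameworks compute these gradients automatically, through every layer in the chain.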

Premise: meaning can be derived by smooth mappings between vector spaces

But meaning can be represented in other ways, for example by graphs

The representation of meaning by graphs was the original stimulus for *connectionism* and was realised in artificial neural networks

However, ANNs bear little resemblance to the brain

DL experts talk about *layered representations*, *hierarchical representation learning* and *deep differentiable models of chained geometric transforms* 

Despite this jargonised, abstract viewpoint - promoted by DLWP - deep learning concepts and implementations derive from (artificial) neural networks

Always good to have several ways of thinking about the same thing

Especially if these ways are complementary

*Artificial neurons sending signals along connecting wires*

----

## 9.140 Key enabling technologies

- Algorithms: incremental innovations, especially since 2012
- Big data: large perceptual datasets


- GPU: fast parallel GPU computation: the NVIDIA gaming GPU was particularly important
- Open source/public libraries/APIs: CUDA, TensorFlow, PyTorch, NumPy...

----

## 9.150 The DL workflow

-  Define the problem: 
    - what data is available, 
    - what are you trying to predict?


-  Identify a measure of success 
    -  e.g. the prediction accuracy
-  Define an evaluation procedure 
    -  training and validation sets and a reserved test set

-  Vectorise and if necessary normalise data
-  Develop a first model that beats a common-sense baseline
-  Tune hyper-parameters according to validation performance 
-  Overfit and regularise/downsize
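
The evaluation and preprocessing steps above can be sketched in NumPy as follows. The dataset is randomly generated and the 600/200/200 split is an arbitrary choice; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical vectorised dataset: 1000 samples, 5 features, binary labels.
features = rng.normal(loc=10.0, scale=4.0, size=(1000, 5))
labels = (rng.random(1000) < 0.7).astype(int)   # about 70% of samples in class 1

# Shuffle, then hold out validation and test sets.
order = rng.permutation(len(features))
features, labels = features[order], labels[order]
x_train, y_train = features[:600], labels[:600]
x_val, y_val = features[600:800], labels[600:800]
x_test, y_test = features[800:], labels[800:]

# Normalise with statistics computed on the training set only.
mean, std = x_train.mean(axis=0), x_train.std(axis=0)
x_train, x_val, x_test = [(x - mean) / std for x in (x_train, x_val, x_test)]

# Common-sense baseline: always predict the most frequent training class.
majority_class = np.bincount(y_train).argmax()
baseline_accuracy = np.mean(y_val == majority_class)
print(f"validation accuracy to beat: {baseline_accuracy:.2f}")
```

The test set is left untouched until the final evaluation, so that tuning against the validation set cannot leak into the reported result.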

----

## 9.160 Key network architectures

Three families: dense, convolutional and recurrent

Each type matches a particular task and encodes assumptions about the structure of the data i.e. a hypothesis space

The network types can be combined

- General data - fully connected (dense) layers
- Image data - 2D convnets
- Text data - 1D convnets (preferred) or RNNs

- Time series data - RNNs (preferred) or 1D convnets
- Other types of sequence data - RNNs or convnets
    - Prefer RNNs if the order is important (as in time series) and convnets otherwise (for example, text).

- Video data - 3D convnets for motion, or 2D convnets for frame-level feature extraction followed by an RNN or a convnet to process the resulting sequence
- Volumetric data - 3D convnets

### Densely connected networks

A stack of dense layers

No specific structure in the data is assumed. (Unlike, for example, 2D convolutional layers which assume *local* structure.)

Commonly used for categorical data, e.g. where the input features are lists of attributes, as in the Boston Housing dataset

Also used as the final classification or regression stage of most other networks

#### Binary classification

The final layer is a single sigmoid unit. 

The model is trained with the binary cross-entropy loss function

In [None]:
from tensorflow.keras import layers, models

# num_input_features is a placeholder for the length of each input vector
model = models.Sequential()
model.add(layers.Dense(32, activation = 'relu', 
                       input_shape = (num_input_features, )))
model.add(layers.Dense(32, activation = 'relu'))
model.add(layers.Dense(1, activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy')
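
To make the sigmoid output and the loss concrete, here is a small NumPy sketch of both. The helper functions are written out by hand for illustration (they are not the Keras implementations) and the target and logit values are made up.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    p = np.clip(p, eps, 1.0 - eps)      # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
logits = np.array([2.0, -1.0, 0.5, 3.0])   # raw outputs before the sigmoid (made-up values)
probs = sigmoid(logits)

loss_all = binary_crossentropy(y_true, probs)
# The last sample has target 0 but a high predicted probability: confidently wrong.
loss_confident_wrong = binary_crossentropy(y_true[3:], probs[3:])
print(probs, loss_all, loss_confident_wrong)
```

Confidently wrong predictions are penalised much more heavily than uncertain ones, which is what drives the weights to change.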


#### Single-label multi-class classification

Number of final dense layer units = number of classes

Softmax activation

Categorical cross-entropy loss

In [None]:
model = models.Sequential()
model.add(layers.Dense(32, activation = 'relu', 
                       input_shape = (num_input_features, )))
model.add(layers.Dense(32, activation = 'relu'))
model.add(layers.Dense(num_classes, activation = 'softmax'))

model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')
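
A hand-written NumPy sketch of the softmax activation and the categorical cross-entropy loss, assuming one-hot encoded targets. The labels and logits are made-up values and the helpers are illustrative, not the Keras implementations.

```python
import numpy as np

def softmax(logits):
    """Turn each row of scores into a probability distribution over classes."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Mean negative log-probability assigned to the correct class."""
    return -np.mean(np.sum(y_onehot * np.log(np.clip(probs, eps, 1.0)), axis=1))

num_classes = 3
labels = np.array([0, 2, 1])                 # integer class labels (made-up)
y_onehot = np.eye(num_classes)[labels]       # one-hot targets, shape (3, 3)

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 2.5],
                   [1.0, 1.0, 1.0]])
probs = softmax(logits)
loss = categorical_crossentropy(y_onehot, probs)
print(probs.sum(axis=1), loss)
```

In Keras, `categorical_crossentropy` expects one-hot targets like `y_onehot`; with integer labels like `labels`, the `sparse_categorical_crossentropy` loss is used instead.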


#### Multi-label multi-class classification

Final layer of sigmoid units

Binary cross-entropy
   
Equivalent to `num_classes` independent binary classifications, one per label

In [None]:
model = models.Sequential()
model.add(layers.Dense(32, activation = 'relu', 
                       input_shape = (num_input_features, )))
model.add(layers.Dense(32, activation = 'relu'))
model.add(layers.Dense(num_classes, activation = 'sigmoid'))

model.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy')
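
For the multi-label case the targets are multi-hot vectors rather than one-hot vectors. The sketch below uses hypothetical tag indices and made-up sigmoid outputs to show the encoding, the loss and the per-class decision.

```python
import numpy as np

num_classes = 4
# Each sample may carry several labels at once (hypothetical tag indices).
sample_tags = [[0, 2], [1], [0, 1, 3]]

# Multi-hot targets: one independent 0/1 entry per class.
targets = np.zeros((len(sample_tags), num_classes))
for row, tags in enumerate(sample_tags):
    targets[row, tags] = 1.0

# Made-up sigmoid outputs; the rows do not need to sum to 1.
probs = np.array([[0.9, 0.2, 0.7, 0.1],
                  [0.1, 0.8, 0.3, 0.2],
                  [0.6, 0.7, 0.4, 0.9]])

# Binary cross-entropy averaged over every (sample, class) pair.
loss = -np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))

# Each class is thresholded independently to decide whether it is present.
predicted = (probs > 0.5).astype(int)
print(predicted, loss)
```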


#### Regression

Number of final units = number of values

Final units have no activation (any continuous value is possible)

Loss is commonly MSE, but could be MAE

In [None]:
model = models.Sequential()
model.add(layers.Dense(32, activation = 'relu', 
                       input_shape = (num_input_features, )))
model.add(layers.Dense(32, activation = 'relu'))
model.add(layers.Dense(num_values))

model.compile(optimizer = 'rmsprop', loss = 'mse')
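
The choice between MSE and MAE mostly comes down to how large errors should be treated. A small NumPy comparison with made-up values:

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred = np.array([11.0, 11.0, 12.0, 23.0])   # the last prediction is far off (made-up values)

errors = y_pred - y_true
mse = np.mean(errors ** 2)      # squares the errors, so the outlier dominates
mae = np.mean(np.abs(errors))   # grows linearly with the size of each error
print(mse, mae)                 # prints 25.75 3.25
```

Three of the four errors have size 1, yet the single error of 10 accounts for almost all of the MSE.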

### Convnets

The same geometric transformation is applied to different spatial locations (local receptive fields/patches)

Resulting representations are translation invariant: a pattern learned at one location is recognised at any other
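
A one-dimensional NumPy sketch of the idea: a single kernel is slid along the input, so the same pattern produces the same response wherever it occurs. The kernel, the pattern and the signal length are made-up values for illustration.

```python
import numpy as np

def conv1d(signal, kernel):
    """Slide one kernel along the signal ('valid' cross-correlation, as in convnets)."""
    width = len(kernel)
    return np.array([signal[i:i + width] @ kernel
                     for i in range(len(signal) - width + 1)])

kernel = np.array([1.0, -1.0, 1.0])      # a single, made-up filter
pattern = np.array([1.0, -1.0, 1.0])     # the pattern this filter responds to

early = np.zeros(10)
early[1:4] = pattern                     # pattern near the start
late = np.zeros(10)
late[6:9] = pattern                      # the same pattern, shifted by 5 steps

response_early = conv1d(early, kernel)
response_late = conv1d(late, kernel)

print(response_early.argmax(), response_late.argmax())   # where the filter fires
print(response_early.max(), response_late.max())         # how strongly it fires
```

The location of the peak moves with the pattern while its strength stays the same; a later global pooling step discards the location and keeps only the strength.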

Applicable in any dimensionality

Can be used for sequence processing if the sequence is translation invariant (not time series, but e.g. text)

Convnet architecture:
- Stacks of convolutional and pooling layers
- Pooling layers downsample the feature maps, so that filters in later layers cover a larger spatial extent of the input

- End the stack with a flattening or a global pooling layer in order to turn the feature maps into vectors
- Then add a dense classifier
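
The two kinds of pooling mentioned above can be sketched in NumPy. The `max_pool_2x2` helper and the 4x4 feature map are assumptions for this example; Keras provides these operations as the `MaxPooling2D` and `GlobalAveragePooling2D` layers.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) map with even sides."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)   # a made-up 4x4 feature map
pooled = max_pool_2x2(feature_map)                       # shape (2, 2)

# Global average pooling: one number per channel, whatever the spatial size.
feature_maps = np.stack([feature_map, feature_map * 2], axis=-1)   # shape (4, 4, 2)
vector = feature_maps.mean(axis=(0, 1))                            # shape (2,)
print(pooled, vector)
```

After the 2x2 pooling, each value summarises a 2x2 patch, so a 3x3 filter applied to `pooled` would see a 6x6 region of the original map.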

Models with depthwise separable convolutions (a spatial convolution applied separately to each channel, followed by a pointwise 1x1 convolution that mixes the channels) are smaller, quicker to train and, it seems, better
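
A quick parameter count shows where the size saving comes from. The sketch assumes a depth multiplier of 1 and a single bias vector per layer; the layer sizes (3x3 kernels, 64 input channels, 128 output channels) are arbitrary example values.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a standard 2D convolution."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def separable_conv2d_params(kernel_size, in_channels, out_channels):
    """Depthwise kernels (one per input channel), a 1x1 pointwise convolution, and biases."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise + out_channels

# Example sizes: a 3x3 layer mapping 64 channels to 128 channels.
standard = conv2d_params(3, 64, 128)
separable = separable_conv2d_params(3, 64, 128)
print(standard, separable)   # prints 73856 8896
```

For these sizes the separable layer has roughly an eighth of the parameters of the standard one.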

In [None]:
model = models.Sequential()
model.add(layers.SeparableConv2D(32, 
                                 3, 
                                 activation = 'relu', 
                                 input_shape = (height, width, channels)))
model.add(layers.SeparableConv2D(64, 3, activation = 'relu'))
model.add(layers.MaxPooling2D(2)) # i.e. pool_size = (2, 2)

model.add(layers.SeparableConv2D(64, 3, activation = 'relu'))
model.add(layers.SeparableConv2D(128, 3, activation = 'relu'))
model.add(layers.MaxPooling2D(2)) 

model.add(layers.SeparableConv2D(64, 3, activation = 'relu'))
model.add(layers.SeparableConv2D(128, 3, activation = 'relu'))
model.add(layers.GlobalAveragePooling2D()) # feature maps -> one vector per sample

model.add(layers.Dense(32, activation = 'relu'))
model.add(layers.Dense(num_classes, activation = 'softmax'))

model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')
          
        

### RNNs

Process sequences one step at a time

They maintain a 'state' - the previous output
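
A minimal NumPy sketch of a simple recurrent layer, in which the state is just the previous output. The weight names `W`, `U` and `b`, the sizes and the random (untrained) values are assumptions for illustration; `GRU` and `LSTM` layers use more elaborate update rules.

```python
import numpy as np

rng = np.random.default_rng(3)
num_timesteps, num_features, num_units = 6, 4, 3

# Random (untrained) parameters for a simple recurrent layer.
W = rng.normal(size=(num_features, num_units))   # input -> output weights
U = rng.normal(size=(num_units, num_units))      # previous state -> output weights
b = np.zeros(num_units)

sequence = rng.normal(size=(num_timesteps, num_features))
state = np.zeros(num_units)          # initial state: all zeros
outputs = []

for x_t in sequence:                 # one timestep at a time
    output_t = np.tanh(x_t @ W + state @ U + b)
    outputs.append(output_t)
    state = output_t                 # the output becomes the next step's state

outputs = np.stack(outputs)          # shape (num_timesteps, num_units)
print(outputs.shape)
```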

Used for sequences without translation invariance e.g. for time series where the recent past is more informative than the distant past

`tensorflow.keras` has three RNN layers: `SimpleRNN`, `GRU` and `LSTM`

Normally avoid `SimpleRNN`

LSTM is more powerful than the GRU, but is computationally more expensive

Interior layers in RNN stacks should return the full sequence of outputs 

The final RNN layer commonly returns only the last output, which contains information about the whole sequence
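
The effect of returning the full sequence versus only the last output is easiest to see in the shapes. The toy `simple_rnn` function below, with random weights, is a hypothetical stand-in for a Keras RNN layer and only mimics the `return_sequences` behaviour.

```python
import numpy as np

rng = np.random.default_rng(4)

def simple_rnn(inputs, units, return_sequences):
    """Toy recurrent layer with random weights; inputs has shape (timesteps, features)."""
    W = rng.normal(size=(inputs.shape[1], units))
    U = rng.normal(size=(units, units))
    state = np.zeros(units)
    outputs = []
    for x_t in inputs:
        state = np.tanh(x_t @ W + state @ U)
        outputs.append(state)
    return np.stack(outputs) if return_sequences else outputs[-1]

x = rng.normal(size=(10, 5))                        # 10 timesteps, 5 features
h1 = simple_rnn(x, 8, return_sequences=True)        # (10, 8): still a sequence
h2 = simple_rnn(h1, 8, return_sequences=True)       # (10, 8): can feed another RNN layer
last = simple_rnn(h2, 8, return_sequences=False)    # (8,): a single vector for a dense layer
print(h1.shape, h2.shape, last.shape)
```

An interior layer that returned only its last output would hand the next RNN layer a single vector instead of a sequence, which is why interior layers keep the full sequence.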

In [None]:
model = models.Sequential()
model.add(layers.LSTM(32, 
                      return_sequences = True, 
                      input_shape = (num_timesteps, num_features)))
model.add(layers.LSTM(32, return_sequences = True))
model.add(layers.LSTM(32))

model.add(layers.Dense(num_classes, activation = 'sigmoid')) # multi-label

model.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy')
          

----

## 9.170 Possibilities

All of the following are learnable even if generalisation might be impossible.


| Input | Output | Task | Example |
|---|---|---|---|
| category | category | predictive healthcare | medical records -> prediction of treatment outcome  |
| category | continuous value |  behaviour | website attributes -> time spent on website |
| " | " | quality control | product attributes -> probability of failure | 
| image | category | Doctor assistant | medical slides -> diagnosis |
| " | " | self-driving vehicle | dash-cam -> steering commands |
| " | " | board game | Go, chess images -> next move |
| " | " | age prediction | selfies -> age |
| image | continuous value | Diet helper | image of dish -> calorie count |
| time series | category | weather | weather data in a grid of locations -> prediction at a single location |
| " | " | brain-computer interface | EEG -> computer commands |
| text | text | smart reply | emails -> one-line replies |
| " | " | answering questions | general knowledge questions -> answers |
| " | " | summarising | long article -> short summary |
| images | text | captioning | image -> short caption |
| text | images | conditioned image generation | short text -> matching image |
| " | " | logo generation/selection | name and company description -> logo |
| images | images | super-resolution | downsized images -> hi-res versions |
| " | " | visual depth sensing | image of indoor environment -> depth map |
| images and text | text | visual QA | images and questions -> answers |
| video and text | text | video QA | short videos and questions about the contents -> answers |







