
---

Lecture 5
=======

Outline
---------

### I Introduction to Keras
### II Prominent Machine Learning problem classes revisited in Keras
> A. MNIST in DenseNet
> 
> B. Binary Classification
>
> C. Multiclass Single Label Classification
>
> D. Scalar Regression
>
> E. Regularization in Keras

### III Introduction to ConvNets
> A. Structures and motivations behind the ConvNet architecture
>
> B. The use case: MNIST

---


---

### Sidekick: Come on, toss that gloom into the depths of the bottle!

***


---
### 1. The first four lectures covered fundamental and universal concepts of Deep Neural Networks
### 2. The second part of the course will take a mostly pragmatic approach
### 3. We would like to move towards high level applications and cover major advanced Deep Learning architectures and domains:
> A. Computer Vision: ConvNets
> 
> B. Texts and sequences: RNN
>
> C. Generative Deep Learning: LSTM, VAE, GAN, DeepDream, Neural Style Transfer
>
> D. Contemporary topics, like Reinforcement Learning, Attention, ...
### 4. We are switching from PyTorch to a higher-level framework: Keras
### 5. We shall treat Keras as a tool and cover the relevant mathematics whenever possible
---

---

I Introduction to Keras
-----------------------

### A. Installation: 

> 1. conda install -c anaconda keras-gpu 
>
>or
>
> 2. pip install keras

### B. Importing submodules to notebook:
> from keras import ...

### C. What is it?
> Keras is a high-level Deep Learning framework built on top of TensorFlow, with GPU support
>
> Designed for rapid prototyping in AI research, it has taken a leading role on e.g. Kaggle
>
> It offers two development modes: 
>> Out-of-the-box models and layers
>>
>> Low-level functional API for advanced use 
> 
### D. Let us reconsider a few of the major Machine Learning problem classes using Keras

---
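
The two development modes can be sketched side by side. This is a minimal sketch, assuming a Keras installation where `from keras import layers, models` works (Keras 3, or a TensorFlow-backed Keras 2.x); the layer sizes (784 inputs, 512 hidden units, 10 outputs) are illustrative choices and are not taken from the lecture.

```python
from keras import layers, models

# Mode 1: out-of-the-box Sequential model, a plain stack of layers.
sequential_model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Mode 2: functional API, where layers are called on tensors explicitly.
inputs = layers.Input(shape=(784,))
hidden = layers.Dense(512, activation="relu")(inputs)
outputs = layers.Dense(10, activation="softmax")(hidden)
functional_model = models.Model(inputs=inputs, outputs=outputs)

# Both definitions describe the same architecture, so the counts agree.
sequential_params = sequential_model.count_params()
functional_params = functional_model.count_params()
print(sequential_params, functional_params)
```

The functional form costs a few more lines, but it also allows multiple inputs, multiple outputs and shared layers, which a plain stack cannot express.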



---

II A: MNIST digits classification using fully connected layers
---------------------

### The problem: classify handwritten digits
> ### What kind of a problem is it? 
>
>>10-class single label image classification
> ### What is the dataset? 
>
>> 60000 (training+validation) + 10000 (testing) grayscale pictures stored in 28 x 28 x 1 3D tensors
>
> ### What is the hypothesis?
>> Fully connected layers, vectorized input data and 10 softmax outputs
>
> ### Why do we consider this case?
>> To show off the simplicity of Keras and to set a baseline for demonstrating the superiority of Convolutional Networks in a later comparison

---
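
The "vectorized input, 10 softmax outputs" hypothesis can be illustrated without downloading the dataset. The sketch below uses NumPy and random stand-in images; the helper `softmax`, the batch size of 4 and the single untrained layer are assumptions made for illustration only, not the lecture's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a batch of MNIST images: uint8 pixels, shape (batch, 28, 28, 1).
images = rng.integers(0, 256, size=(4, 28, 28, 1), dtype=np.uint8)

# Vectorize: flatten each image to 784 values and rescale to [0, 1].
x = images.reshape(len(images), 28 * 28).astype("float32") / 255.0

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# One untrained dense layer mapping 784 inputs to 10 class scores.
weights = rng.normal(scale=0.01, size=(784, 10)).astype("float32")
bias = np.zeros(10, dtype="float32")
probabilities = softmax(x @ weights + bias)

# The predicted digit is the class with the highest probability.
predicted_digits = probabilities.argmax(axis=1)
print(x.shape, probabilities.shape)
```

Each row of `probabilities` sums to one, which is what makes softmax suitable for single-label classification.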



---

II B: IMDB dataset
---------------------

### The problem: determine if movie reviews are positive or negative
> ### What kind of a problem is it? 
>
>> Binary variable length text sentiment classification
> ### What is the dataset? 
>
>> 25000 (training+validation) + 25000 (testing) multi-hot, vectorized, English reviews
>
> ### What is the hypothesis?
>> Fully connected layers with sigmoid output and binary_crossentropy
>
> ### Why do we consider this case?
>> To show text data vectorization and dense layers binary classification

---
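
Multi-hot vectorization turns variable-length reviews into fixed-length inputs for dense layers. The following is a minimal NumPy sketch; the function name `multi_hot`, the toy reviews and the 10-word vocabulary are hypothetical and chosen only to keep the output readable.

```python
import numpy as np

def multi_hot(sequences, dimension):
    """Encode lists of integer word indices as fixed-length 0/1 vectors."""
    encoded = np.zeros((len(sequences), dimension), dtype="float32")
    for row, word_indices in enumerate(sequences):
        encoded[row, word_indices] = 1.0  # repeated indices still give 1
    return encoded

# Hypothetical toy reviews of different lengths over a 10-word vocabulary.
reviews = [[1, 4, 4, 7], [2, 3], [0, 9, 5, 5, 5, 1]]
x = multi_hot(reviews, dimension=10)
print(x)
```

Word order and word counts are discarded by this encoding; only the presence of each word survives.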



---

II C: The Reuters dataset
---------------------

### The problem: classify short newswires on various topics
> ### What kind of a problem is it? 
>
>> Variable length text topic classification into 46 mutually exclusive categories
> ### What is the dataset? 
>
>> 8982 (training+validation) + 2246 (testing) multi-hot, vectorized, English newswires
>
> ### What is the hypothesis?
>> Fully connected layers with 46 softmax outputs and categorical_crossentropy
>
> ### Why do we consider this case?
>> To show text data vectorization and dense layers multiclass single-label classification

---
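
With 46 mutually exclusive topics, labels are one-hot encoded and the loss is the categorical cross-entropy. The sketch below writes both out in NumPy; the helper names and the three example labels are hypothetical, and the formula is the standard definition rather than a copy of any library code.

```python
import numpy as np

NUM_CLASSES = 46

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

labels = np.array([3, 45, 0])  # hypothetical topic labels
y_true = one_hot(labels, NUM_CLASSES)

# A uniform guess over 46 topics gives a loss of log(46), about 3.83.
uniform = np.full((len(labels), NUM_CLASSES), 1.0 / NUM_CLASSES)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(round(loss_uniform, 2))
```

The value log(46) is a useful reference point: a trained classifier should reach a validation loss well below it.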



---

II D: Boston Housing Price Dataset
---------------------

### The problem: predict real estate value
> ### What kind of a problem is it? 
>
>> Scalar regression
> ### What is the dataset? 
>
>> Small set of 404 (training+validation) and 102 (testing) 13-feature numerical vectors
>
> ### What is the hypothesis?
>> Fully connected layers with linear output and mse loss
>
> ### Why do we consider this case?
>> To show small dataset regression with k-fold validation and dataset normalization

---
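
Feature normalization and k-fold validation can be sketched in NumPy as follows. The data here is random stand-in data with the stated shape (404 samples, 13 features); the helper names and the choice of k = 4 are assumptions for illustration. Note that the mean and standard deviation are computed on the training part of each fold only.

```python
import numpy as np

def normalize(train, other):
    """Standardize both arrays using the mean and std of the training data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (train - mean) / std, (other - mean) / std

def k_fold_indices(num_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous validation folds."""
    folds = np.array_split(np.arange(num_samples), k)
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(404, 13))  # stand-in for the training set

fold_sizes = []
for train_idx, val_idx in k_fold_indices(len(x), k=4):
    x_train, x_val = normalize(x[train_idx], x[val_idx])
    fold_sizes.append((len(train_idx), len(val_idx)))
    # A model would be trained on x_train and evaluated on x_val here.
print(fold_sizes)
```

Averaging the validation score over the folds gives a more stable estimate than a single split, which matters when only a few hundred samples are available.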



---

II E: IMDB dataset and Regularization
---------------------

### The problem: DenseNet Overfitting!
>
> ### Why are we reconsidering IMDB case?
>> To show regularization and dropout in Keras

---
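
The two regularization techniques can be written out by hand to see what they compute. This is a NumPy sketch under stated assumptions: `dropout` implements the inverted-dropout convention (rescaling at training time), and `l2_penalty` is the usual sum-of-squares weight penalty; the function names, the rate of 0.5 and the strength of 0.001 are illustrative. In Keras the corresponding building blocks are `layers.Dropout` and `regularizers.l2`.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

def l2_penalty(weights, strength):
    """L2 weight regularization term added to the loss."""
    return strength * float(np.sum(weights ** 2))

rng = np.random.default_rng(0)
activations = np.ones((1000, 16))
dropped = dropout(activations, rate=0.5, rng=rng)

# Surviving units are scaled by 1 / (1 - rate) = 2, so the mean stays near 1.
print(np.unique(dropped), round(float(dropped.mean()), 1))

penalty = l2_penalty(np.array([[1.0, -2.0], [0.5, 0.0]]), strength=0.001)
print(penalty)
```

Dropout is active only during training; at inference time the activations pass through unchanged.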


---

III Introducing Convolutional Networks
----------------

### This Lecture: an intuitive and practical introduction, formal description left for the next week

### A: New (old) Convolutional Networks: structures and motivations behind the ConvNet architecture:
> Inspired by how the brain processes visual signals
>
> Sparse architecture: radically reduced spatio-temporal complexity
>
> Locality: focus on local, granular structures
>
> Capable of developing invariances (with respect to translations, rotations, etc.)
>
> Capable of capturing hierarchies of features and complex structures
>
> Less prone to overfitting (yet still susceptible!)
>
> Not only for visual data! E.g. text, sound.


---

---


### A. The idea: hierarchical image decomposition preserving local spatial correlations:

![Image](gfx/cnn-cat-assembly.jpg "CNN Cat Assembly")

### B. In practice, introduce local receptive fields: patches observing small portions of the feature map

![Image](gfx/cnn-action.jpg "CNN Action")

### C. Basic ConvNet stack for images

> ### 1. Start from 3D (not 1D) input image tensor (Height, Width, Channels): (28, 28, 1) or (H, W, 3) for RGB
>
> ### 2. Introduce new elements: many small "convolutional kernels" defining local trainable feature filters, typically shaped (3, 3) or (5, 5). There can be a great many of them in a given layer.
>
> ### 3. Scan each pixel of the image with kernels and produce corresponding new output map pixel channels
>
> ### 4. Downsample image by MaxPooling: select the most prominent local features
> 
> ### 5. Repeat layer stack as many times as needed
>
> ### 6. Introduce Dense layers and output to construct classifier mapping to categories
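
The six steps above can be traced as pure shape arithmetic. This is a sketch under assumptions not stated in the lecture: (3, 3) kernels without padding, (2, 2) max pooling, and hypothetical filter counts of 32 and 64; the helper names are made up for the example.

```python
def conv_output(shape, filters, kernel=3):
    """Shape after a valid (unpadded, stride-1) convolution."""
    height, width, _ = shape
    return (height - kernel + 1, width - kernel + 1, filters)

def pool_output(shape, pool=2):
    """Shape after non-overlapping max pooling."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

def conv_params(in_channels, filters, kernel=3):
    """Trainable weights: one kernel per (input channel, filter) pair plus biases."""
    return kernel * kernel * in_channels * filters + filters

shape = (28, 28, 1)            # step 1: MNIST-sized input tensor
trace = [shape]
for filters in (32, 64):       # steps 2-5: hypothetical filter counts
    shape = conv_output(shape, filters)
    trace.append(shape)
    shape = pool_output(shape)
    trace.append(shape)

# Step 6: the Dense classifier sees the flattened final feature map.
flattened = shape[0] * shape[1] * shape[2]
print(trace, flattened)
```

The spatial size shrinks while the channel count grows, and the convolutional layers stay small: under these assumptions the first one has only `conv_params(1, 32)` = 320 trainable weights.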

### D. Convolution action in detail (no pooling: HxW size from valid pixels)

![Image](gfx/cnn-conv-layer-decomposition.png "CNN Conv Layer Decomposition")

### E. Local patches: valid convolution points

![Image](gfx/cnn-conv-padding.png "CNN Valid Convolution")
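
A valid convolution only evaluates the kernel where it fits entirely inside the image. The NumPy sketch below handles a single channel and a single kernel; `conv2d_valid` and the averaging kernel are hypothetical names and values for illustration. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over every position where it fits entirely inside `image`."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # local receptive field
            output[i, j] = np.sum(patch * kernel)
    return output

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                   # hypothetical averaging filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
```

A (5, 5) image and a (3, 3) kernel leave (3, 3) valid positions, which is why the output map is smaller than the input.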

### F. Padding for boundary effects mitigation with (5, 5) kernel

![Image](gfx/cnn-conv-padding2.png "CNN Padding")
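
The effect of padding on the output size follows from a short formula. The sketch below assumes symmetric zero padding and odd kernel sizes; the function names are hypothetical.

```python
def output_size(input_size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel) // stride + 1

def same_padding(kernel):
    """Padding per side that keeps the size unchanged for odd kernels at stride 1."""
    return (kernel - 1) // 2

valid_size = output_size(28, kernel=5)                          # no padding
same_size = output_size(28, kernel=5, padding=same_padding(5))  # 2 pixels per side
print(valid_size, same_size)
```

For a (5, 5) kernel on a 28-pixel-wide image, the unpadded output is 24 pixels wide, while two pixels of padding on each side restore the original 28.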

### G. Receptive field action: same size, filtered output

![Image](gfx/cnn-receptive-field.jpg "CNN receptive field")

### H. Strides: 
> ### Downsampling, used with pooling and sometimes with kernels
>
> ### This step is crucial for reducing the network's parameter count and for giving deep receptive fields a wide scope: a 'hyperbolic' rather than linear structure of the network (like AdS space!)
>
> ### Translation invariance and global correlations appear due to the locally maximal feature selection

![Image](gfx/cnn-conv-stride.png "CNN Stride - Downsampling")
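
Strided max pooling can be written in a few lines of NumPy. This is a sketch for a single-channel feature map; `max_pool2d` and the example values are hypothetical, and the (2, 2) window with stride 2 is a common choice assumed here rather than one stated in the lecture.

```python
import numpy as np

def max_pool2d(feature_map, pool=2, stride=2):
    """Keep the largest value in each pool x pool window, moving `stride` pixels per step."""
    out_h = (feature_map.shape[0] - pool) // stride + 1
    out_w = (feature_map.shape[1] - pool) // stride + 1
    output = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + pool,
                                 j * stride:j * stride + pool]
            output[i, j] = window.max()
    return output

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])
pooled = max_pool2d(feature_map)
print(pooled)
```

Each output pixel keeps only the strongest response in its window, so a small shift of a feature inside the window leaves the pooled map unchanged.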

### I. Cat image activations in a deeper (4th) layer
![Image](gfx/cnn-4th-layer-activations.png "4th layer activations")

---

---

### B: MNIST use case
> ### This network reduces error by over 50% relative to the DenseNet considered earlier

---