<center><h1>Very Brief Introduction to Deep Learning</h1></center>
<center><h3>Paul Stey</h3></center>


# Impact of Deep Learning

It would be difficult to overstate the impact of deep learning on:
  * Facial recognition
  * Image processing
  * Voice recognition 
  * Medical imaging (e.g., tumor detection, pathology)
  * Self-driving cars
  * Game-playing AIs
  * Virtual assistants (e.g., Siri, Alexa, Google)

# Nomenclature

Deep learning is a family of modeling approaches with many names:

  * Neural networks (NN)
  * Deep neural networks (DNN)
  * Artificial neural networks (ANN)

## Neural Network Basics


What is a neural network?
  * Universal function approximator
  * A species of directed acyclic graphs (usually)
		

## What do neural networks do?

Like many other statistical or machine learning models (e.g., GLM, random forests, boosting), neural networks:
  * Attempt to approximate a data-generating mechanism
  * Can be used for classification problems
  * Can be used for regression problems
  * Can also be used for dimension reduction, much like principal components analysis (PCA)


## Neural Networks vs. other ML Modeling

Similarities to other types of machine learning models

  * Input variables (i.e., _**X**_, features, predictors, etc.) and output variable (i.e., _y_)

  
<center><img src="images/input_output.png" width=420/></center>

## Applications of Deep Learning
Deep learning is extremely flexible, and can be applied to many domains.
  
<center><img src="images/self-driving_car.jpg" width=860/></center>

## History of Neural Networks

The history of neural networks is long and somewhat tumultuous.

  * McCulloch and Pitts (1943) _A Logical Calculus of the Ideas Immanent in Nervous Activity_
  * Rosenblatt (1958) _The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain_

### More Recently

Neural networks are experiencing a major resurgence. There are at least three reasons.

  * Better algorithms for back-propagation
  * GPUs are well suited to training neural networks
    - Matrix multiplies can be made embarrassingly parallel
    - GPUs have much higher memory bandwidth than CPUs
  * More labeled data
  
  
<center><img src="images/two_johns.jpg" width=380/></center>

# Multi-Layer Perceptron

An early and fairly straightforward example of a neural network.

<center><img src="images/neural_network3.png" width = 420/></center>

## Single Neuron

A single neuron takes inputs $x_j$ and applies the weights $w_{\cdot j}$ to them by computing the dot product of the vectors $x$ and $w$. The result is then passed to the "activation function".

<center><img src="images/neuron2.png" width = 420/></center>
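
As a rough illustration of the single-neuron computation described above, here is a minimal NumPy sketch. The input and weight values are made up, `neuron()` is a hypothetical helper name, and `np.tanh` is used only as a stand-in activation function. A bias term, which many networks add to the dot product, is left out to match the description.

```python
import numpy as np

def neuron(x, w, activation):
    """Weighted sum of the inputs, passed through an activation function."""
    z = np.dot(w, x)          # dot product of the weight and input vectors
    return activation(z)

x = np.array([1.0, 2.0, 3.0])     # inputs x_j (made-up values)
w = np.array([0.5, -1.0, 0.25])   # weights w_j (made-up values)

z = np.dot(w, x)                  # 0.5 - 2.0 + 0.75 = -0.75
output = neuron(x, w, np.tanh)    # tanh(-0.75)
print(z, output)
```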

# Multi-Layer Perceptron (cont.)

Larger networks can have many, _many_ weights!
  * Stacking many hidden layers is the origin of the term "deep" neural networks
  * Largest models have _trillions_ of weights (i.e., parameters)

<center><img src="images/neural_network4.png" width = 420/></center>
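
To get a feel for how quickly the number of weights grows, the sketch below counts the parameters of a fully-connected network from its layer sizes. The layer sizes are hypothetical, `count_parameters()` is an illustrative helper, and the count assumes one bias per neuron, which the slides above do not mention.

```python
def count_parameters(layer_sizes):
    """Number of weights and biases in a fully-connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out   # one weight per connection between layers
        total += n_out          # assumption: one bias per receiving neuron
    return total

# Hypothetical architectures, chosen only for illustration
small = count_parameters([3, 4, 2])             # (3*4 + 4) + (4*2 + 2) = 26
wide = count_parameters([784, 512, 512, 10])    # 669706
print(small, wide)
```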


### Activation Functions
  * The notion of an activation function comes, again, from the theoretical analogy to neurons in the brain.

  * Activation functions are analogous to (inverse) "link" functions in generalized linear models (GLMs).

  * In fact, one common activation function is the sigmoid function, which is just our old friend the logistic function: the same function you use when you fit logistic regression models.

### Purpose of Activation Functions

There are a few reasons we use activation functions.    

  * Need to take some linear predictor and transform it so that it is appropriately bounded. For instance, the value of the logistic function lies in the interval $(0, 1)$.
  * Allows us to introduce non-linearities. 
    - Approximate a data-generating mechanism 
    - Trying to approximate a function that might be very complicated and include non-linearities
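
One way to see why the non-linearity matters: without an activation function, two stacked linear layers collapse into a single linear layer. The sketch below checks this numerically with random, made-up weight matrices. The shapes and the choice of `np.tanh` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # made-up weights, first layer
W2 = rng.normal(size=(2, 4))   # made-up weights, second layer
x = rng.normal(size=3)         # made-up input

# Two linear layers with no activation in between ...
two_layers = W2 @ (W1 @ x)
# ... give the same result as one linear layer with weights W2 @ W1
one_layer = (W2 @ W1) @ x
collapses = np.allclose(two_layers, one_layer)

# Placing a non-linearity between the layers breaks this equivalence
with_activation = W2 @ np.tanh(W1 @ x)
differs = not np.allclose(with_activation, one_layer)
print(collapses, differs)
```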

### Common Activation Functions

Some common activation functions include the following: 
  * Sigmoid (i.e., logistic)
  * Hyperbolic tangent: $\tanh$
  * Rectified linear unit (ReLU)
  * Softplus
  
<center><img src="images/activation_functions.png" width = 420/></center>

<center><h1>Challenge Question</h1></center>

The sigmoid and the ReLU activation functions are two of the most common in deep learning. Their formulas are given below. Write a `sigmoid()` and a `relu()` function in Python that implement them.

$$s(x) = \frac{1}{1 + e^{-x}}$$

$$r(x) = \text{max}(0, x)$$

<br>
<br>

**Hint:** Note that the NumPy module includes the constant `e`.
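
For readers who want to check their answer, here is one possible solution, offered as a sketch and not the only correct approach. It follows the hint and uses `np.e`. Writing `np.exp(-x)` would be an equivalent and more common choice. `np.maximum` is used so that `relu()` also works element-wise on arrays.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: s(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.e ** (-x))

def relu(x):
    """Rectified linear unit: r(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

print(sigmoid(0))                          # 0.5
print(relu(np.array([-2.0, 0.0, 3.0])))    # negative values are clipped to 0
```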


# Varieties of Neural Network (and layers)

1. The "feed-forward" layer/network
  * Multi-layer perceptron is a feed-forward network
  * Most networks involve at least _some_ feed-forward layers
2. Convolutional neural network (CNN)
  * Ubiquitous in computer vision (e.g., image classification, object detection, facial recognition)
3. Recurrent neural networks (RNN)
  * Long short-term memory (LSTM) networks
4. Generative adversarial network (GAN)
  * Widely used to generate realistic synthetic data (e.g., images)
5. Autoencoders
  

# Convolutional Neural Networks (CNNs)

* Regular neural nets don't scale well to images
  - For images of size $32 \times 32 \times 3$, a _single_ fully-connected neuron in the first layer would have $32 \times 32 \times 3 = 3072$ weights.
  - For images of size $200 \times 200 \times 3$, a _single_ neuron would have $120000$ weights.
* Full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.
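
The arithmetic above can be reproduced in a few lines. The sketch below also compares it with a small convolutional filter. The $5 \times 5$ filter size and the layer of 1000 neurons are hypothetical numbers chosen only for comparison, and `fully_connected_weights()` is an illustrative helper.

```python
def fully_connected_weights(width, height, depth):
    """Weights needed by ONE neuron connected to every input value."""
    return width * height * depth

small_image = fully_connected_weights(32, 32, 3)      # 3072
large_image = fully_connected_weights(200, 200, 3)    # 120000

# Hypothetical first layer with 1000 fully-connected neurons
first_layer = large_image * 1000                      # 120 million weights

# Hypothetical 5x5 filter spanning all 3 color channels, for comparison
filter_weights = 5 * 5 * 3                            # 75 weights

print(small_image, large_image, first_layer, filter_weights)
```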

## CNNs (cont.)
What are CNNs?
  * ConvNets are very similar to the neural networks discussed thus far: each neuron computes a dot product followed by a non-linearity, and the network ends with a loss function.
  * They make the explicit assumption that the inputs are images.
  * Layers have neurons arranged in 3 dimensions (width, height, depth) to form an **activation volume**
  
<center><img src="images/cnn.png" width="750"></center>

<center><img src="images/convolution.gif" width="750"></center>

### Original Image
<center><img src="images/building.jpg" width="750"></center>

### Apply Sobel operator filter

<center><img src="images/building_sobel.jpg" width="750"></center>
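
To make the filtering step concrete, here is a minimal sketch that slides a horizontal Sobel kernel over a tiny made-up image. `filter2d()` is a hypothetical helper written for illustration. It uses no padding and a stride of 1, and it does not flip the kernel (strictly speaking a cross-correlation, which is what deep learning libraries typically compute under the name "convolution").

```python
import numpy as np

def filter2d(image, kernel):
    """Slide a kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and kernel
    return out

# Sobel kernel that responds to changes in the horizontal direction
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# Made-up 5x6 image: dark left half, bright right half
image = np.zeros((5, 6))
image[:, 3:] = 1.0

edges = filter2d(image, sobel_x)
print(edges)   # non-zero only where a patch straddles the vertical edge
```

The output is smaller than the input ($3 \times 4$ here) because the kernel is only placed where it fits entirely inside the image.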

## Architecture of CNN
	

Types of layers used to build ConvNets
  * Convolutional Layer
    - Input: 3-d volume
    - Output: 3-d volume
    - Convolutional "filters" are applied to small regions of the image
    - Output depth depends on the number of filters
  * Pooling Layer
    - Downsampling along spatial dimensions (width, height)
  * Fully-Connected Layer (what we've seen so far)
    - Computes class scores. Dimensions are transformed to $1 \times 1 \times k$, where $k$ is the number of classes
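
As an illustration of the pooling layer, the sketch below applies $2 \times 2$ max pooling with a stride of 2 to a small made-up volume. The window size and stride are assumptions (a common choice, not one stated above), `max_pool_2x2()` is a hypothetical helper, and the array is laid out as (height, width, depth) following the usual NumPy convention.

```python
import numpy as np

def max_pool_2x2(volume):
    """2x2 max pooling with stride 2 over a (height, width, depth) array."""
    h, w, d = volume.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even height and width"
    # Group the spatial dimensions into 2x2 blocks, then take each block's max
    blocks = volume.reshape(h // 2, 2, w // 2, 2, d)
    return blocks.max(axis=(1, 3))

# Made-up 4x4x2 activation volume
volume = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
pooled = max_pool_2x2(volume)

print(volume.shape, "->", pooled.shape)   # spatial size halves, depth is unchanged
```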

# Training vs. Inference

1. Training neural network
  * Process that computes weights (i.e., parameter estimates)
  * Can take hours, days, weeks, or months
  * Typically done on specialized hardware
    - GPUs, TPUs, FPGAs
2. Inference
  * Use existing network (i.e., weights)
  * Make predictions (i.e., classification or numerical prediction)
  * Happens fast; in many cases _extremely_ fast (e.g., milliseconds)
  * Needs to happen on all kinds of devices (e.g., phones, cameras, sensors)
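
The split between the two phases can be sketched with a toy example: a single sigmoid neuron trained by plain gradient descent, then used for prediction. The data, learning rate, and number of iterations are all made up for illustration. Real training uses far larger networks and dedicated libraries.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up training data: the label is 1 when x is positive
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# --- Training: repeatedly adjust the weights to reduce prediction error ---
w = np.zeros(1)
b = 0.0
learning_rate = 0.5                       # assumed value
for _ in range(200):                      # assumed number of iterations
    p = sigmoid(X @ w + b)                # current predicted probabilities
    grad_w = X.T @ (p - y) / len(y)       # gradient of the log-loss w.r.t. w
    grad_b = np.mean(p - y)               # gradient of the log-loss w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# --- Inference: the weights are now fixed; prediction is one cheap pass ---
def predict(x_new):
    return (sigmoid(x_new @ w + b) >= 0.5).astype(int)

predictions = predict(np.array([[-3.0], [0.5], [4.0]]))
print(predictions)
```

Training loops over the data many times, while inference is a single dot product and activation per input, which is why it can run quickly on small devices.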

# Deep Learning Packages

1. TensorFlow
  * Free, open-source software
  * Primarily developed by Google
  * C++ library, callable from C++, Python, or R
2. Keras
  * "Front-end" API for TensorFlow
3. PyTorch
  * Free, open-source software
  * Developed in large part by Facebook
  * C++ library with a Python interface
