# Deep Learning - Nasir Hussain - 2021/02/07

# 1 What is deep learning?

## 1.1 Artificial intelligence, machine learning, and deep learning

![Artificial intelligence, machine learning, and deep learning](./snaps/one.PNG)

### 1.1.1 Artificial intelligence
- AI can be described as the effort to automate intellectual tasks normally performed by humans.
    - Symbolic AI
    - Machine Learning

### 1.1.2 Machine learning
- Analytical Engine: the first known general-purpose mechanical computer, designed by Charles Babbage in the 1830s and 1840s
- Ada Lovelace on the Analytical Engine (1843): “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. . . . Its province is to assist us in making available what we’re already acquainted with.”
- Turing test
- In machine learning, the machine looks at the input data and the corresponding answers, and figures out what the rules should be.
- A machine learning system is trained rather than explicitly programmed.
- Machine learning tends to deal with large, complex datasets, for which classical statistical analysis such as Bayesian analysis would be impractical.
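
The contrast between writing a rule by hand and deriving it from data plus answers can be sketched in a few lines. This is only a toy illustration: the temperature data, the `handwritten_rule` and `learn_threshold` helpers, and the single-threshold rule are all assumptions made for the example, not something taken from the notes.

```python
# Classical programming: a person writes the rule by hand.
def handwritten_rule(temp_c):
    return temp_c >= 30.0  # threshold picked by the programmer


# Machine learning: the rule (here, a single threshold) is derived
# from input data plus the corresponding answers.
def learn_threshold(values, answers):
    """Return the candidate threshold that gets the most training answers right."""
    best_threshold, best_correct = None, -1
    for candidate in sorted(values):
        correct = sum((v >= candidate) == a for v, a in zip(values, answers))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


temps = [12.0, 18.0, 25.0, 31.0, 35.0, 40.0]      # input data (made up)
is_hot = [False, False, False, True, True, True]  # answers (made up)

threshold = learn_threshold(temps, is_hot)
print(threshold)  # 31.0 for this toy data
```

The learned rule is only as good as the examples it was given, which is why the notes stress that such a system is trained rather than programmed.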

![Machine learning: a new programming paradigm](./snaps/two.PNG)

### 1.1.3 Learning rules and representations from data
- deep learning
- what machine learning algorithms do
- Requirements for machine learning
    - Input data points
        - For instance, if the task is speech recognition, these data points could be sound files of people speaking. If the task is image tagging, they could be pictures.
    - Examples of the expected output
        - In a speech-recognition task, these could be human-generated transcripts of sound files. In an image task, expected outputs could be tags such as “dog,” “cat,” and so on.
    - A way to measure whether the algorithm is doing a good job
        - This is necessary in order to determine the distance between the algorithm’s current output and its expected output. The measurement is used as a feedback signal to adjust the way the algorithm works. This adjustment step is what we call learning.
- machine learning model transforms its input data into meaningful outputs
- the central problem in machine learning and deep learning is to meaningfully transform data
    - representations
        - a different way to look at data
        - to represent or encode data.
    - Machine learning models are all about finding appropriate representations for their input data
- Learning
    - in the context of machine learning, describes an automatic search process for data transformations that produce useful representations
- Machine learning algorithms aren’t usually creative in finding these transformations; they’re merely searching through a predefined set of operations, called a hypothesis space.
- searching for useful representations and rules over some input data, within a predefined space of possibilities, using guidance from a feedback signal
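
The idea of searching a predefined hypothesis space for a useful representation, guided by a feedback signal, can be sketched as follows. Everything here is an assumption chosen for illustration: the toy points, the hypothesis space (rotations in 15-degree steps), and the rule "class 1 if the new y coordinate is positive".

```python
import math

# Toy data (made up): class 1 points lie above the line y = x, class 0 below it.
points = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.5), (2.0, 1.0), (3.0, 2.0), (1.5, 0.5)]
labels = [1, 1, 1, 0, 0, 0]


def rotate(point, angle_deg):
    """One candidate transformation: rotate the coordinate pair by angle_deg."""
    a = math.radians(angle_deg)
    x, y = point
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))


def accuracy(angle_deg):
    """Feedback signal: fraction of points classified correctly by the rule
    'class 1 if the new y coordinate is positive' in the rotated representation."""
    hits = 0
    for point, label in zip(points, labels):
        _, new_y = rotate(point, angle_deg)
        predicted = 1 if new_y > 0 else 0
        hits += predicted == label
    return hits / len(points)


# Hypothesis space: a predefined, finite set of rotations.
hypothesis_space = range(0, 360, 15)

# "Learning": pick the transformation with the best feedback score.
best_angle = max(hypothesis_space, key=accuracy)
print(best_angle, accuracy(best_angle))
```

In the rotated representation a trivial rule separates the two classes, which is the sense in which a good representation makes the task easier.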

### 1.1.4 The “deep” in “deep learning”
- successive layers
- “deep” in “deep learning”
    - stands for this idea of successive layers of representations
- layered representations learning or hierarchical representations learning
- shallow learning
- neural networks
- deep learning is a mathematical framework for learning representations from data
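- Information distillation: data goes through successive layers (filters) and comes out increasingly purified, that is, more useful for the task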

### 1.1.5 Understanding how deep learning works, in three figures
- What a layer does to its input data is stored in the layer’s weights
- transformation
    - parameterized by its weights
- learning means finding a set of values for the weights of all layers in a network
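- Loss function (also called objective function or cost function): measures how far the network’s output is from the expected output
- Optimizer: uses the loss score to adjust the weights, implementing the Backpropagation algorithm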
- training loop
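
The pieces listed above (weights, loss function, optimizer, training loop) can be put together in a minimal sketch. It assumes a one-layer model `y = w * x + b`, a mean-squared-error loss, and plain gradient descent as the optimizer; the data and the learning rate are made up for illustration.

```python
# Toy data generated from y = 2x + 1 (values chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0        # the weights that parameterize the transformation
learning_rate = 0.05   # assumed step size


def loss_fn(w, b):
    """Loss function: mean squared distance between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = loss_fn(w, b)

# Training loop: compute the feedback signal, then adjust the weights.
for step in range(2000):
    n = len(xs)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Optimizer step: move each weight a little against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = loss_fn(w, b)
print(round(w, 3), round(b, 3), final_loss)
```

Real networks have many layers and use automatic differentiation rather than hand-written gradients, but the loop has the same shape.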

![The loss score is used as a feedback signal to adjust the weights.](./snaps/three.PNG)

### 1.1.6 What deep learning has achieved so far
- Near-human-level image classification
- Near-human-level speech transcription
- Near-human-level handwriting transcription
- Dramatically improved machine translation
- Dramatically improved text-to-speech conversion
- Digital assistants such as Google Assistant and Amazon Alexa
- Near-human-level autonomous driving
- Improved ad targeting, as used by Google, Baidu, or Bing
- Improved search results on the web
- Ability to answer natural language questions
- Superhuman Go playing

### 1.1.7 Don’t believe the short-term hype

### 1.1.8 The promise of AI

## 1.2 Before deep learning: A brief history of machine learning

- Deep learning isn’t always the right tool for the job—sometimes there isn’t enough data for deep learning to be applicable, and sometimes the problem is better solved by a different algorithm

### 1.2.1 Probabilistic modeling
- Probabilistic modeling
- Naive Bayes algorithm: applies Bayes’ theorem while assuming that the features in the input data are all independent
- Logistic regression: despite its name, a classification algorithm rather than a regression algorithm
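
To see why logistic regression counts as a classifier, here is a minimal sketch, assuming one input feature, made-up data, and plain gradient descent on the log loss. The model outputs a probability, and thresholding it at 0.5 yields a class label.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy one-dimensional data (made up): small values are class 0, large are class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
learning_rate = 0.1  # assumed step size

for _ in range(5000):
    n = len(xs)
    # Gradients of the average log loss with respect to w and b.
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b


def predict(x):
    """Turn the predicted probability into a class label."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


print([predict(x) for x in xs])
```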

### 1.2.2 Early neural networks
- Backpropagation algorithm—a way to train chains of parametric operations using gradient-descent optimization
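
A chain of parametric operations and its gradients can be sketched by hand. The two-step chain below (`h = tanh(w1 * x)`, then `y = w2 * h`), the squared-error loss, and all numeric values are assumptions for illustration. The analytic gradients obtained with the chain rule are compared against finite-difference estimates.

```python
import math


def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # first parametric operation
    y = w2 * h             # second parametric operation
    return h, y


def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return (y - target) ** 2


def backward(w1, w2, x, target):
    """Apply the chain rule from the loss back to each weight."""
    h, y = forward(w1, w2, x)
    dloss_dy = 2.0 * (y - target)
    dloss_dw2 = dloss_dy * h                    # because y = w2 * h
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x    # because h = tanh(w1 * x)
    return dloss_dw1, dloss_dw2


w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, target)

# Numerical check with central differences.
eps = 1e-6
numeric_w1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
numeric_w2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
print(grad_w1, numeric_w1)
print(grad_w2, numeric_w2)
```

These gradients are exactly what a gradient-descent optimizer would use to update `w1` and `w2`.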

### 1.2.3 Kernel methods
- Kernel methods are a group of classification algorithms, the best known of which is the Support Vector Machine (SVM).
- SVM is a classification algorithm that works by finding “decision boundaries” separating two classes
    1. The data is mapped to a new high-dimensional representation where the decision boundary can be expressed as a hyperplane
    2. A good decision boundary (a separation hyperplane) is computed by trying to maximize the distance between the hyperplane and the closest data points from each class, a step called maximizing the margin. This allows the boundary to generalize well to new samples outside of the training dataset
- A kernel function is a computationally tractable operation that maps any two points in your initial space to the distance between these points in your target representation space, completely bypassing the explicit computation of the new representation.
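
The kernel trick can be checked numerically on a small case. The sketch below assumes a degree-2 polynomial kernel on 2D points and its known explicit feature map `phi`; both are chosen for illustration, and the points are made up. Strictly speaking, the kernel value is an inner product in the target space, and the distance between two mapped points is recovered from three kernel values without ever computing `phi`.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(p):
    """Explicit map to the target space for the kernel (p . q) ** 2 in 2D."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def kernel(p, q):
    """Degree-2 polynomial kernel, computed entirely in the initial space."""
    return dot(p, q) ** 2


def squared_distance_via_kernel(p, q):
    # ||phi(p) - phi(q)||^2 expanded into kernel evaluations only.
    return kernel(p, p) - 2.0 * kernel(p, q) + kernel(q, q)


def squared_distance_explicit(p, q):
    return sum((a - b) ** 2 for a, b in zip(phi(p), phi(q)))


a, b = (1.0, 2.0), (3.0, -1.0)
print(kernel(a, b), dot(phi(a), phi(b)))
print(squared_distance_via_kernel(a, b), squared_distance_explicit(a, b))
```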

### 1.2.4 Decision trees, random forests, and gradient boosting machines
- Decision trees are flowchart-like structures
- Random Forest algorithm
- gradient boosting machine
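
The flowchart-like structure of a decision tree can be sketched as nested questions. The tree below is written by hand rather than learned, and its feature names, thresholds, and labels are hypothetical values invented for the example.

```python
# A hand-written toy tree: each internal node asks "is feature < threshold?".
toy_tree = {
    "question": ("temperature_c", 15.0),
    "yes": {"label": "stay in"},
    "no": {
        "question": ("humidity_pct", 80.0),
        "yes": {"label": "go out"},
        "no": {"label": "stay in"},
    },
}


def classify(tree, sample):
    """Follow the flowchart from the root until a leaf (a label) is reached."""
    node = tree
    while "label" not in node:
        feature, threshold = node["question"]
        node = node["yes"] if sample[feature] < threshold else node["no"]
    return node["label"]


print(classify(toy_tree, {"temperature_c": 22.0, "humidity_pct": 40.0}))
```

A Random Forest or a gradient boosting machine would combine many such trees; this sketch only shows how a single tree is evaluated.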

![decision tree](./snaps/four.PNG)

### 1.2.5 Back to neural networks
- "Top-five accuracy" measures how often the model selects the correct answer as part of its top five guesses (out of 1,000 possible answers, in the case of ImageNet).
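
The metric can be sketched as a small function. The `top_k_accuracy` helper, the four-class scores, and the labels below are assumptions for illustration (ImageNet itself has 1,000 classes and uses `k=5`).

```python
def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, true_labels):
        # Class indices sorted from highest score to lowest, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(true_labels)


# Made-up scores for 4 samples over 4 classes.
scores = [
    [0.10, 0.50, 0.30, 0.10],
    [0.70, 0.05, 0.20, 0.05],
    [0.20, 0.30, 0.40, 0.10],
    [0.05, 0.15, 0.20, 0.60],
]
true_labels = [2, 0, 0, 3]

print(top_k_accuracy(scores, true_labels, k=1))  # 0.5
print(top_k_accuracy(scores, true_labels, k=2))  # 0.75
```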

### 1.2.6 What makes deep learning different
- Deep learning automates feature engineering: it learns useful features from the data instead of requiring them to be hand-crafted

### 1.2.7 The modern machine learning landscape
- gradient boosted trees, for shallow-learning problems
- deep learning, for perceptual problems

## 1.3 Why deep learning? Why now?
### 1.3.1 Hardware
### 1.3.2 Data
### 1.3.3 Algorithms
- Better activation functions for neural layers
- Better weight-initialization schemes, starting with layer-wise pretraining, which was then quickly abandoned
- Better optimization schemes, such as RMSProp and Adam
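
As a rough sketch of what such an optimization scheme does, here is an RMSProp-style update on a one-parameter problem. The objective `(w - 3) ** 2` and the hyperparameter values are assumptions chosen for illustration; library implementations differ in details.

```python
import math


def gradient(w):
    """Gradient of the toy objective (w - 3) ** 2."""
    return 2.0 * (w - 3.0)


w = 0.0
mean_square = 0.0                               # running average of squared gradients
learning_rate, decay, epsilon = 0.01, 0.9, 1e-8  # assumed hyperparameters

for _ in range(2000):
    g = gradient(w)
    # Track the recent magnitude of the gradient.
    mean_square = decay * mean_square + (1.0 - decay) * g * g
    # Scale the step by that magnitude, so step sizes stay comparable
    # whether the raw gradient is large or small.
    w -= learning_rate * g / (math.sqrt(mean_square) + epsilon)

print(w)  # ends close to the minimum at 3
```
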
### 1.3.4 A new wave of investment
### 1.3.5 The democratization of deep learning
### 1.3.6 Will it last?
- Simplicity
    - Deep learning removes the need for feature engineering, replacing complex, brittle, engineering-heavy pipelines with simple, end-to-end trainable models that are typically built using only five or six different tensor operations.
- Scalability
    - Deep learning is highly amenable to parallelization on GPUs or TPUs, so it can take full advantage of Moore’s law. In addition, deep learning models are trained by iterating over small batches of data, allowing them to be trained on datasets of arbitrary size. (The only bottleneck is the amount of parallel computational power available, which, thanks to Moore’s law, is a fast-moving barrier.)
- Versatility and reusability
    - Unlike many prior machine learning approaches, deep learning models can be trained on additional data without restarting from scratch, making them viable for continuous online learning—an important property for very large production models. Furthermore, trained deep learning models are repurposable and thus reusable: for instance, it’s possible to take a deep learning model trained for image classification and drop it into a video-processing pipeline. This allows us to reinvest previous work into increasingly complex and powerful models. This also makes deep learning applicable to fairly small datasets.
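
The point about iterating over small batches can be sketched with a generator. The `iter_batches` helper is hypothetical; it assumes the samples arrive as any iterable, so the full dataset never has to sit in memory at once.

```python
def iter_batches(samples, batch_size):
    """Yield successive lists of at most batch_size samples from any iterable."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # last, possibly smaller, batch
        yield batch


# range() stands in for a data source too large to load at once.
batches = list(iter_batches(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A training loop would perform one weight update per batch, so memory use depends on the batch size rather than on the dataset size.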

### END