<img src ="images/ch3_0.png">

- This chapter is designed to get you started with **using neural networks to solve real problems**. 
- You’ll consolidate the knowledge you gained from our first practical example in chapter 2.
- You’ll apply what you’ve learned to three new problems covering **the three most common use cases of neural networks**: 
   1. binary classification
   1. multiclass classification
   1. scalar regression.

- We will take a closer look at the core components of neural networks: **layers, networks, objective functions, and optimizers**. 
- We’ll give you a quick introduction to Keras, the Python deep-learning library that we’ll use throughout this course. 
- We’ll dive into three introductory examples of how to use neural networks to address real problems:

    1). Classifying movie reviews as positive or negative (binary classification)
    2). Classifying news wires by topic (multiclass classification)
    3). Estimating the price of a house, given real-estate data (regression)

By the end of this chapter, you’ll be able to use neural networks to solve simple machine-learning problems such as classification and regression over vector data. You’ll then be ready to start building a more principled, theory-driven understanding of machine learning in chapter 4.

# 3.1. Anatomy of a neural network

As you saw in the previous chapters, training a neural network revolves around the following objects:
- **Layers**, which are combined into a network (or model)
- **The input data** and corresponding **targets**
- **The loss function**, which defines the feedback signal used for learning
- **The optimizer**, which determines how learning proceeds

You can visualize their interaction as illustrated in figure 3.1: the network, composed of layers that are chained together, maps the input data to predictions. The loss function then compares these predictions to the targets, producing a loss value: a measure of how well the network’s predictions match what was expected. The optimizer uses this loss value to update the network’s weights.

<img src = "images/Fig3-1.png">
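The interaction between network, loss function, and optimizer can be sketched in plain NumPy. This is only an illustration under stated assumptions, not Keras code: the "network" is a single linear layer, the loss is mean-squared error, the optimizer is plain gradient descent, and the data, learning rate, and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input data and targets: 100 samples, 3 features, targets from a known linear rule.
inputs = rng.normal(size=(100, 3))
true_weights = np.array([[2.0], [-1.0], [0.5]])
targets = inputs @ true_weights

# The "network": a single linear layer whose weights start at zero.
weights = np.zeros((3, 1))
learning_rate = 0.1  # assumed value for this sketch

losses = []
for step in range(200):
    predictions = inputs @ weights                   # network maps inputs to predictions
    errors = predictions - targets
    loss = np.mean(errors ** 2)                      # loss compares predictions with targets
    losses.append(loss)
    gradient = 2 * inputs.T @ errors / len(inputs)   # gradient of the loss w.r.t. the weights
    weights -= learning_rate * gradient              # optimizer updates the weights

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

The loss value shrinks from step to step because each update nudges the weights in the direction that reduces it, which is the feedback loop the figure describes.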

Let’s take a closer look at layers, networks, loss functions, and optimizers.

## 3.1.1. Layers: the building blocks of deep learning

- The fundamental data structure in neural networks is the layer. 
- **A layer** is a data-processing module that takes as input one or more tensors and that outputs one or more tensors. 
- Some layers are stateless, but **more frequently layers have a state: the layer’s weights**, one or several tensors learned with stochastic gradient descent, which together contain the network’s knowledge.

- Examples of stateless layers (no weights): Flatten, pooling, and Dropout layers.
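To make the difference between stateless and stateful layers concrete, here is a small NumPy sketch. The `flatten` function and the `TinyDense` class are hypothetical stand-ins written for this example, not Keras classes.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 28, 28))   # 4 samples of 28x28 values

# Stateless: flattening only rearranges the data, so there is nothing to learn.
def flatten(x):
    return x.reshape(len(x), -1)

# Stateful: a dense layer owns a weight matrix and a bias vector.
class TinyDense:
    def __init__(self, input_dim, units):
        self.kernel = rng.normal(scale=0.01, size=(input_dim, units))
        self.bias = np.zeros(units)

    def __call__(self, x):
        return x @ self.kernel + self.bias

flat = flatten(batch)
dense = TinyDense(input_dim=flat.shape[1], units=32)
output = dense(flat)
n_weights = dense.kernel.size + dense.bias.size

print(flat.shape, output.shape, n_weights)
```

The flattening step has no parameters at all, while the dense layer carries 784 × 32 kernel values plus 32 biases that training would adjust.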

- Different layers are appropriate for different tensor formats and different types of data processing. 
- For instance, **simple vector data**, stored in **2D tensors of shape (samples, features)**, is often processed by densely connected layers, also called **fully connected** or **dense layers** (the Dense class in Keras). 
- **Sequence data**, stored in **3D tensors of shape (samples, timesteps, features)**, is typically processed by **recurrent layers** such as an **LSTM layer**. 
- Image data, stored in **4D tensors**, is usually processed by **2D convolution layers (Conv2D)**.
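The tensor formats listed above can be illustrated with empty NumPy arrays. The sizes are arbitrary, and the channels-last image layout `(samples, height, width, channels)` is an assumption made for this sketch.

```python
import numpy as np

vector_data = np.zeros((100, 20))           # (samples, features)
sequence_data = np.zeros((100, 50, 20))     # (samples, timesteps, features)
image_data = np.zeros((100, 28, 28, 3))     # (samples, height, width, channels)

for name, tensor in [("vector", vector_data),
                     ("sequence", sequence_data),
                     ("image", image_data)]:
    print(f"{name}: rank {tensor.ndim}, shape {tensor.shape}")
```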

- You can think of layers as the LEGO bricks of deep learning, a metaphor that is made explicit by frameworks like Keras. 
- Building deep-learning models in Keras is done by **clipping together compatible layers to form useful data-transformation pipelines**. 
- The notion of **layer compatibility** here refers specifically to the fact that every layer will **only accept input tensors of a certain shape and will return output tensors of a certain shape**. 

Consider the following example:

In [1]:
from keras import layers

layer = layers.Dense(32, input_shape=(784,))

Using TensorFlow backend.


- We’re creating a layer that will only accept as input 2D tensors whose feature axis (axis 1) has size 784 (**axis 0, the batch dimension, is unspecified, and thus any value would be accepted**). 
- This layer will return a tensor whose feature axis has been transformed to size 32.

- Thus this layer can only be connected to a downstream layer that expects 32-dimensional vectors as its input. 
- When using Keras, you don’t have to worry about compatibility, because the layers you add to your models are dynamically built to match the shape of the incoming layer. 

For instance, suppose you write the following:

In [2]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
model.add(layers.Dense(32))

**The second layer didn’t receive an input shape argument**—instead, it automatically inferred its input shape as being the output shape of the layer that came before.
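The idea of inferring an input shape from the preceding layer can be mimicked in a few lines of NumPy. `LazyDense` is a hypothetical class for this illustration; it only shows the principle of creating weights once the incoming shape is known and makes no claim about how Keras implements it.

```python
import numpy as np

class LazyDense:
    """Dense-like layer that creates its weights when it first sees an input."""

    def __init__(self, units):
        self.units = units
        self.kernel = None

    def __call__(self, x):
        if self.kernel is None:
            # Infer the input size from the last axis of the incoming tensor.
            self.kernel = np.zeros((x.shape[-1], self.units))
        return x @ self.kernel

first = LazyDense(32)
second = LazyDense(10)

batch = np.ones((5, 784))        # axis 0 (the batch size) can be anything
output = second(first(batch))

print(first.kernel.shape, second.kernel.shape, output.shape)
```

Only the number of units is given for each layer; the second layer's kernel ends up with 32 rows because that is the size of the first layer's output.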

## 3.1.2. Models: networks of layers

- A deep-learning model is a **directed, acyclic graph (DAG)** of layers. 
- The most common instance is a linear stack of layers, mapping a single input to a single output.

But as you move forward, you’ll be exposed to a much broader variety of network topologies. Some common ones include the following:

- Two-branch networks
- Multihead networks
- Inception blocks

- The topology of a network defines a **hypothesis space**. 
- Recall that machine learning was defined as "**searching for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal**".
- By choosing a network topology, you **constrain your space of possibilities (hypothesis space)** to a specific series of tensor operations, mapping input data to output data. 
- What you’ll then be searching for is **a good set of values for the weight tensors** involved in these tensor operations.
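A toy example shows how the choice of hypothesis space limits what can be learned. The sketch below is an assumption-laden illustration: it uses least squares (`np.linalg.lstsq`) instead of gradient descent to find the best weights, and two hand-picked feature sets stand in for two "topologies".

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 50)
y = x ** 2                                   # the relationship we want to learn

# Hypothesis space 1: y ≈ w*x + b (straight lines only).
linear_features = np.column_stack([x, np.ones_like(x)])
# Hypothesis space 2: y ≈ w2*x^2 + w1*x + b (parabolas are now reachable).
quadratic_features = np.column_stack([x ** 2, x, np.ones_like(x)])

def best_fit_error(features, targets):
    # Find the best weights within the given space and report the remaining error.
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return float(np.mean((features @ weights - targets) ** 2))

linear_error = best_fit_error(linear_features, y)
quadratic_error = best_fit_error(quadratic_features, y)
print(f"linear: {linear_error:.4f}, quadratic: {quadratic_error:.2e}")
```

No choice of weights lets the straight-line model match the parabola, whereas the second space contains an exact solution. Searching for weights only helps if the space you search in contains a good answer.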

- Picking the right network architecture is more an art than a science (No theoretical proof) 
- Although there are some best practices and principles you can rely on, only practice can help you become a proper neural-network architect (Know-how). 

The next few chapters will both teach you explicit principles for building neural networks and help you develop intuition as to what works or doesn’t work for specific problems.

## 3.1.3. Loss functions and optimizers: keys to configuring the learning process

Once the network architecture is defined, you still have to choose two more things:

- **Loss function (objective function)**— The quantity that will be minimized during training. It represents a measure of success for the task at hand.
- **Optimizer**— Determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).
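As a rough sketch of what "a specific variant of SGD" means, the code below compares a plain gradient-descent update with an RMSprop-style update on the toy loss f(w) = w². The function names, the learning rate of 0.1, and the values of `rho` and `epsilon` are assumptions chosen for the illustration.

```python
import numpy as np

def gradient(w):
    # Gradient of the toy loss f(w) = w**2.
    return 2.0 * w

def sgd_step(w, grad, learning_rate=0.1):
    return w - learning_rate * grad

def rmsprop_step(w, grad, cache, learning_rate=0.1, rho=0.9, epsilon=1e-7):
    # Keep a running average of squared gradients and scale the step by it.
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - learning_rate * grad / (np.sqrt(cache) + epsilon)
    return w, cache

w_sgd, w_rms, cache = 5.0, 5.0, 0.0
for _ in range(500):
    w_sgd = sgd_step(w_sgd, gradient(w_sgd))
    w_rms, cache = rmsprop_step(w_rms, gradient(w_rms), cache)

print(f"SGD: {w_sgd:.6f}, RMSprop: {w_rms:.6f}")
```

Both rules move the weight toward the minimum at 0. They differ only in how the raw gradient is turned into a step, which is exactly the part an optimizer is responsible for.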

### Multiloss networks:
- A neural network that has multiple outputs may have **multiple loss functions (one per output)**. 
- But the gradient-descent process must be based on **a single scalar loss value**. 
- So, for multiloss networks, all losses are combined (for example, by averaging or by a weighted sum) into **a single scalar quantity**.
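Here is a minimal NumPy sketch of combining two losses into one scalar. The two-output setup (a regression head and a binary-classification head), the numbers, and the `loss_weights` values are all hypothetical.

```python
import numpy as np

# Hypothetical two-output model: one regression head, one binary-classification head.
age_true = np.array([0.30, 0.50, 0.20])
age_pred = np.array([0.25, 0.60, 0.20])
label_true = np.array([1.0, 0.0, 1.0])
label_pred = np.array([0.9, 0.2, 0.6])

mse_loss = np.mean((age_pred - age_true) ** 2)
crossentropy_loss = -np.mean(
    label_true * np.log(label_pred) + (1.0 - label_true) * np.log(1.0 - label_pred)
)

# Gradient descent needs one scalar, so the per-output losses are merged.
average_loss = (mse_loss + crossentropy_loss) / 2.0
loss_weights = (10.0, 1.0)     # assumed weights, to balance losses of different scale
weighted_loss = loss_weights[0] * mse_loss + loss_weights[1] * crossentropy_loss

print(f"mse={mse_loss:.4f} crossentropy={crossentropy_loss:.4f}")
print(f"average={average_loss:.4f} weighted={weighted_loss:.4f}")
```

When the individual losses live on very different scales, a plain average lets the larger one dominate, which is why a weighted combination is often preferred.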

### Objective function (Loss function):

- **Choosing the right objective function for the right problem is extremely important**: your network will take any shortcut it can, to minimize the loss; so if the objective doesn’t fully correlate with success for the task at hand, your network will end up doing things you may not have wanted. 
- **Just remember that every neural network you build will be ruthless in lowering its loss function**, so choose the objective wisely, or you’ll have to face unintended side effects.

- Fortunately, when it comes to common problems such as classification, regression, and sequence prediction, there are simple guidelines you can follow to choose the correct loss. 
- For instance, you’ll use 
  1. **Binary crossentropy** for a two-class classification problem, 
  1. **Categorical crossentropy** for a many-class classification problem,
  1. **Mean-squared error** for a regression problem, 
  1. **Connectionist temporal classification (CTC)** for a sequence-learning problem, and so on. 
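The first three losses in this list are short enough to write out in NumPy. The functions below are simplified sketches that assume predicted probabilities lie strictly between 0 and 1, so no clipping is applied; CTC is considerably more involved and is left out.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    # y_pred holds the predicted probability of the positive class.
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_crossentropy(y_true, y_pred):
    # y_true is one-hot; y_pred holds one probability per class, per sample.
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

bce = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
cce = categorical_crossentropy(
    np.array([[0.0, 1.0, 0.0]]), np.array([[0.25, 0.5, 0.25]])
)
mse = mean_squared_error(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
print(bce, cce, mse)
```

Both crossentropy values here equal ln 2 (about 0.693), because in each case the correct class receives probability 0.5. The mean-squared error is (1 + 4) / 2 = 2.5.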


- Only when you’re working on truly new research problems will you have to develop your own objective functions. 
- In the next few chapters, we’ll detail explicitly which loss functions to choose for a wide range of common tasks.

# 3.2. Introduction to Keras

Keras (https://keras.io) is a deep-learning framework for Python that provides a convenient way to define and train almost any kind of deep-learning model. 

Keras was initially developed for researchers, with the aim of enabling fast experimentation.

### Key features of Keras:

- It allows the same code to run seamlessly on CPU or GPU.
- It has a user-friendly API that makes it easy to quickly prototype deep-learning models.
- It has built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both.
- It supports arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, and so on. This means Keras is appropriate for building essentially any deep-learning model, from a generative adversarial network to a neural Turing machine.

Keras is distributed under the permissive MIT license, which means it can be freely used in commercial projects. It’s compatible with any version of Python from 2.7 to 3.6 (as of mid-2017).

Keras has well over 200,000 users, ranging from academic researchers and engineers at both startups and large companies to graduate students and hobbyists. 

Keras is used at Google, Netflix, Uber, CERN, Yelp, Square, and hundreds of startups working on a wide range of problems. 

Keras is also a popular framework on Kaggle, the machine-learning competition website, where almost every recent deep-learning competition has been won using Keras models.

<img src = "images/Fig3-2.png">

### 2019 Deep Learning Frameworks:
https://www.einfochips.com/blog/deep-learning-frameworks/

<img src = "images/Fig3-2_1.png">
<img src = "images/Fig3-2_2.png">
<img src = "images/Fig3-2_3.png">


<img src = "images/Fig3-2_4.png">

## 3.2.1. Keras, TensorFlow, Theano, and CNTK

Keras is a model-level library, providing high-level building blocks for developing deep-learning models. 

It doesn’t handle low-level operations such as tensor manipulation and differentiation. 

Instead, it relies on a **specialized, well-optimized tensor library** to do so, serving as the **backend engine** of Keras. 

Rather than choosing a single tensor library and tying the implementation of Keras to that library, Keras handles the problem in a modular way (see figure 3.3). 

Thus several different backend engines can be plugged seamlessly into Keras. 

Currently, the three existing backend implementations are the **TensorFlow backend**, the **Theano backend**, and the Microsoft Cognitive Toolkit **(CNTK) backend**. 

In the future, it’s likely that Keras will be extended to work with even more deep-learning execution engines.

<img src = "images/Fig3-3.png">

TensorFlow, CNTK, and Theano are some of the primary platforms for deep learning today. 

Theano (http://deeplearning.net/software/theano) is developed by the MILA lab at Université de Montréal, 
TensorFlow (www.tensorflow.org) is developed by Google, and 
CNTK (https://github.com/Microsoft/CNTK) is developed by Microsoft. 

**Any piece of code that you write with Keras can be run with any of these backends without having to change anything in the code**: 
- You can seamlessly switch between them during development, which often proves useful, for instance if one of these backends proves to be faster for a specific task. 
- We recommend using the TensorFlow backend as the default for most of your deep-learning needs, because it’s the most widely adopted, scalable, and production ready.
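In the multi-backend Keras releases described here, the backend is typically selected either through the `"backend"` field of the `~/.keras/keras.json` configuration file or through the `KERAS_BACKEND` environment variable. The sketch below shows the environment-variable route. It assumes a Keras 2.x installation, and the `import keras` line is commented out so that the snippet runs even without Keras installed.

```python
import os

# Assumption: a multi-backend Keras release (2.x). The variable must be set
# before the first `import keras`, because the backend is chosen at import time.
os.environ["KERAS_BACKEND"] = "tensorflow"   # or "theano", "cntk"

# import keras   # would then report "Using TensorFlow backend."
print(os.environ["KERAS_BACKEND"])
```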

Via TensorFlow (or Theano, or CNTK), Keras is able to run seamlessly on both CPUs and GPUs. 

When running on CPU, TensorFlow is itself wrapping a low-level library for tensor operations called Eigen (http://eigen.tuxfamily.org). 

On GPU, TensorFlow wraps a library of well-optimized deep-learning operations called the NVIDIA CUDA Deep Neural Network library (**cuDNN**).

## 3.2.2. Developing with Keras: a quick overview

You’ve already seen one example of a Keras model: the MNIST example. 

The typical Keras workflow looks just like that example:

1. Define your training data: input tensors and target tensors.
1. Define a network of layers (or model) that maps your inputs to your targets.
1. Configure the learning process by choosing a loss function, an optimizer, and some metrics to monitor.
1. Iterate on your training data by calling the fit() method of your model.

### How to define a model:

- There are two ways to define a model: 
  - Using the **Sequential class** (only for linear stacks of layers, which is the most common network architecture by far) 
  - The **functional API** (for directed acyclic graphs of layers, which lets you build completely arbitrary architectures).

### 1).  A two-layer model defined using the Sequential class:

Note that we’re passing the expected shape of the input data to the first layer:

In [3]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
model.add(layers.Dense(10, activation='softmax'))

### 2).  A two-layer model defined using the functional API:


In [4]:
input_tensor = layers.Input(shape=(784,))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)

model = models.Model(inputs=input_tensor, outputs=output_tensor)

With the functional API, you’re manipulating the data tensors that the model processes and applying layers to this tensor as if they were functions.
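The "layers as functions" calling pattern can be imitated with ordinary Python callables. `Scale` and `Shift` are hypothetical toy layers, and the sketch works on concrete NumPy arrays rather than on the symbolic tensors that Keras uses. It illustrates only the style of wiring, including a simple two-branch graph.

```python
import numpy as np

class Scale:
    """Toy layer: multiplies its input by a fixed factor."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, tensor):
        return tensor * self.factor

class Shift:
    """Toy layer: adds a fixed offset to its input."""
    def __init__(self, offset):
        self.offset = offset

    def __call__(self, tensor):
        return tensor + self.offset

input_tensor = np.array([[1.0, 2.0, 3.0]])

# Each layer is created, then immediately called on a tensor, like a function.
branch_a = Scale(2.0)(input_tensor)
branch_b = Shift(1.0)(input_tensor)
output_tensor = branch_a + branch_b      # merge the two branches

print(output_tensor)
```

Because every layer call takes a tensor and returns a tensor, the same input can feed several branches that are merged later, which a strictly linear stack cannot express.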

<img src = "images/note_1.png">

We will only be using the Sequential class in our code examples.

Once your model architecture is defined, it doesn’t matter whether you used a Sequential model or the functional API. 

All of the following steps are the same.

The learning process is configured in the compilation step, where you specify the optimizer and loss function(s) that the model should use, as well as the metrics you want to monitor during training. 

Here’s an example with a single loss function, which is by far the most common case:

In [6]:
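from keras import optimizers

# Single loss ('mse'); 'accuracy' is only monitored during training, not optimized.
# Newer Keras releases name the `lr` argument `learning_rate`.
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss='mse',
              metrics=['accuracy'])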
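For a classifier with one-hot targets, an accuracy metric boils down to comparing the most probable predicted class with the target class. The helper `categorical_accuracy` below is a simplified sketch written for this example, and the data is made up.

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    # A prediction counts as correct when its most probable class matches the target class.
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([
    [0.7, 0.2, 0.1],   # correct
    [0.1, 0.8, 0.1],   # correct
    [0.3, 0.4, 0.3],   # wrong: predicts class 1, target is class 2
    [0.2, 0.5, 0.3],   # correct
])
accuracy = categorical_accuracy(y_true, y_pred)
print(accuracy)
```

Three of the four predictions match, so the metric reports 0.75. Unlike the loss, this number is only reported and does not drive the weight updates.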

- Finally, the learning process consists of passing NumPy arrays of input data (and the corresponding target data) to the model via the **fit() method**.
- This is similar to what you would do in Scikit-Learn and several other machine-learning libraries, whose fit() (training) and predict() (prediction) methods exist in Keras under the same names and with the same roles:

In [None]:
# Here input_tensor and target_tensor stand for NumPy arrays of training inputs and targets.
model.fit(input_tensor, target_tensor, batch_size=128, epochs=10)
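To show what suitable arrays might look like for the 784-input, 10-output model defined earlier, here is a sketch that builds random stand-in data. The sample count, the random values, and the names `x_train` and `y_train` are assumptions; real data would come from a dataset.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_features, num_classes = 1000, 784, 10

# Inputs: one row of 784 features per sample (random stand-in data).
x_train = rng.random((num_samples, num_features)).astype("float32")

# Targets: integer class labels turned into one-hot rows to match a 10-way softmax.
labels = rng.integers(0, num_classes, size=num_samples)
y_train = np.eye(num_classes, dtype="float32")[labels]

batch_size = 128
steps_per_epoch = math.ceil(num_samples / batch_size)
print(x_train.shape, y_train.shape, steps_per_epoch)
```

With 1,000 samples and a batch size of 128, each epoch consists of 8 weight updates, the last one on a smaller batch of 104 samples.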

We’ll look at three basic examples in sections 3.4, 3.5, and 3.6: a two-class classification example, a many-class classification example, and a regression example.