This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.

**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**

This notebook was generated for TensorFlow 2.6.

# The mathematical building blocks of neural networks

## A first look at a neural network

**Loading the MNIST dataset in Keras**

In [None]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
train_images.shape

In [None]:
len(train_labels)

In [None]:
train_labels

In [None]:
test_images.shape

In [None]:
len(test_labels)

In [None]:
test_labels

**The network architecture**

In [4]:
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])

**The compilation step**

In [None]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

**Preparing the image data**

In [None]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

**"Fitting" the model**

In [None]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

**Using the model to make predictions**

In [None]:
test_digits = test_images[0:10]
predictions = model.predict(test_digits)
predictions[0]

In [None]:
predictions[0].argmax()

In [None]:
predictions[0][7]

In [None]:
test_labels[0]

**Evaluating the model on new data**

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")

## Data representations for neural networks

### Scalars (rank-0 tensors)

In [None]:
import numpy as np
x = np.array(12)
x

In [None]:
x.ndim

### Vectors (rank-1 tensors)

In [None]:
x = np.array([12, 3, 6, 14, 7])
x

In [None]:
x.ndim

### Matrices (rank-2 tensors)

In [None]:
x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])
x.ndim

### Rank-3 and higher-rank tensors

In [None]:
x = np.array([[[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]]])
x.ndim

### Key attributes

In [1]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [None]:
train_images.ndim

In [None]:
train_images.shape

In [None]:
train_images.dtype

**Displaying the fifth digit (index 4)**

In [None]:
import matplotlib.pyplot as plt
digit = train_images[4]
plt.imshow(digit, cmap=plt.cm.binary)
plt.show()

In [None]:
train_labels[4]

### Manipulating tensors in NumPy

In [None]:
my_slice = train_images[10:100]
my_slice.shape

In [None]:
my_slice = train_images[10:100, :, :]
my_slice.shape

In [None]:
my_slice = train_images[10:100, 0:28, 0:28]
my_slice.shape

In [None]:
my_slice = train_images[:, 14:, 14:]

In [None]:
my_slice = train_images[:, 7:-7, 7:-7]
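
The last two slices above print nothing, so their shapes are easy to miss. The following is a minimal sketch that checks them without downloading MNIST; `fake_images` is a hypothetical zero-filled stand-in with the same 28 × 28 trailing shape, not part of the original notebook.

```python
import numpy as np

# Stand-in for train_images: 100 blank 28x28 "images" (an assumption, not MNIST data).
fake_images = np.zeros((100, 28, 28), dtype="uint8")

# Bottom-right 14x14 corner of every image.
corner = fake_images[:, 14:, 14:]

# Centered 14x14 patch: drop 7 pixels from each border (negative indices count from the end).
center = fake_images[:, 7:-7, 7:-7]

print(corner.shape)
print(center.shape)
```

Both slices keep the full samples axis and shrink only the two spatial axes.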

### The notion of data batches

In [None]:
batch = train_images[:128]

In [None]:
batch = train_images[128:256]

In [None]:
n = 3
batch = train_images[128 * n:128 * (n + 1)]

### Real-world examples of data tensors
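
The four kinds of data listed in the following headings differ mainly in rank. As a rough sketch, assuming the channels-last axis order commonly used with Keras and purely illustrative sizes (all variable names here are hypothetical), the shapes could look like this:

```python
import numpy as np

# Sizes are illustrative only; the axis order follows the channels-last convention.
vector_data = np.zeros((8, 3))              # (samples, features)
timeseries_data = np.zeros((8, 20, 3))      # (samples, timesteps, features)
image_data = np.zeros((8, 28, 28, 1))       # (samples, height, width, channels)
video_data = np.zeros((8, 5, 28, 28, 3))    # (samples, frames, height, width, channels)

for name, tensor in [("vector", vector_data),
                     ("timeseries", timeseries_data),
                     ("image", image_data),
                     ("video", video_data)]:
    print(f"{name}: rank {tensor.ndim}, shape {tensor.shape}")
```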

### Vector data

### Timeseries data or sequence data

### Image data

### Video data

## The gears of neural networks: tensor operations

### Element-wise operations

In [56]:
import numpy as np
ab = np.ones((2,2))
ab[0,0] = 5
ab[1,1] = -1
ab

array([[ 5.,  1.],
       [ 1., -1.]])

In [57]:
def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

In [58]:
naive_relu(ab)
# Negative values are replaced with 0

array([[5., 1.],
       [1., 0.]])

In [59]:
def naive_add(x, y):
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

In [60]:
bb = np.ones((2,2))
naive_add(ab, bb)

array([[6., 2.],
       [2., 0.]])

In [68]:
import time

x = np.random.random((20, 100))
y = np.random.random((20, 100))

t0 = time.time()
for _ in range(1000):
    z = x + y
    z = np.maximum(z, 0.)
print("Took: {0:.2f} s".format(time.time() - t0))
# Vectorized: far fewer Python-level operations (one NumPy call per step, no per-element loop)

Took: 0.01 s


In [70]:
t0 = time.time()
for _ in range(1000):
    z = naive_add(x, y)
    z = naive_relu(z)
print("Took: {0:.2f} s".format(time.time() - t0))
# Element-wise Python loops: many operations (i * j per call)

Took: 1.56 s
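
The timing comparison only makes sense if both versions compute the same thing. The following sketch restates the two naive functions so that it runs on its own and compares them with the vectorized NumPy expression; the inputs are shifted to include negative values (an assumption made here so that the ReLU actually clips something).

```python
import numpy as np

def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

def naive_add(x, y):
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

rng = np.random.default_rng(0)
x = rng.random((20, 100)) - 0.5   # values in [-0.5, 0.5)
y = rng.random((20, 100)) - 0.5

vectorized = np.maximum(x + y, 0.)
looped = naive_relu(naive_add(x, y))

same = np.allclose(vectorized, looped)
print(same)
```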


### Broadcasting

In [72]:
import numpy as np
X = np.random.random((32, 10))
y = np.random.random((10,))

In [73]:
print(X.shape)
print(y.shape)

(32, 10)
(10,)


In [74]:
y = np.expand_dims(y, axis=0)

In [75]:
print(y.shape)

(1, 10)


In [76]:
Y = np.concatenate([y] * 32, axis=0)

In [77]:
print(Y.shape)

(32, 10)


In [40]:
def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x
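
To tie the pieces of this section together, here is a small sketch that checks three routes to the same result: the explicit `expand_dims` plus `concatenate` construction, NumPy's implicit broadcasting, and the naive loop. The function is repeated so the block runs on its own; the seed and variable names are illustrative.

```python
import numpy as np

def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x

rng = np.random.default_rng(0)
X = rng.random((32, 10))
y = rng.random((10,))

# Route 1: materialize the repeated vector, as in the cells above.
Y = np.concatenate([np.expand_dims(y, axis=0)] * 32, axis=0)
explicit = X + Y

# Route 2: let NumPy broadcast y across the rows of X.
broadcast = X + y

# Route 3: the naive double loop.
looped = naive_add_matrix_and_vector(X, y)

print(np.allclose(explicit, broadcast), np.allclose(broadcast, looped))
```

Broadcasting gives the same values as the explicit copy without allocating the intermediate `(32, 10)` array `Y`.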

In [78]:
import numpy as np
x = np.random.random((64, 3, 32, 10))
y = np.random.random((32, 10))
z = np.maximum(x, y)

In [80]:
print(x.shape)
print(y.shape)

(64, 3, 32, 10)
(32, 10)


In [82]:
z2 = x + y
z2.shape

(64, 3, 32, 10)

### Tensor product

In [83]:
x = np.random.random((32,))
y = np.random.random((32,))
z = np.dot(x, y)

In [None]:
def naive_vector_dot(x, y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    assert x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

In [None]:
def naive_matrix_vector_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i] += x[i, j] * y[j]
    return z

In [None]:
def naive_matrix_vector_dot(x, y):
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        z[i] = naive_vector_dot(x[i, :], y)
    return z

In [None]:
def naive_matrix_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    assert x.shape[1] == y.shape[0]
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z
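
A quick way to gain confidence in the naive implementations is to compare them with NumPy's own product. This sketch repeats `naive_vector_dot` and `naive_matrix_dot` so that it is self-contained; the matrix sizes are arbitrary.

```python
import numpy as np

def naive_vector_dot(x, y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    assert x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

def naive_matrix_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    assert x.shape[1] == y.shape[0]
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            z[i, j] = naive_vector_dot(x[i, :], y[:, j])
    return z

rng = np.random.default_rng(0)
a = rng.random((3, 4))
b = rng.random((4, 2))

naive_result = naive_matrix_dot(a, b)
numpy_result = np.dot(a, b)

# A (3, 4) matrix times a (4, 2) matrix gives a (3, 2) matrix.
print(naive_result.shape, np.allclose(naive_result, numpy_result))
```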

### Tensor reshaping

In [51]:
train_images = train_images.reshape((60000, 28 * 28))

In [53]:
train_images.shape

(60000, 784)

In [54]:
print(train_images.reshape((60000, 28, 28)).shape)

(60000, 28, 28)


In [46]:
x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
x.shape

(3, 2)

In [47]:
x

array([[0., 1.],
       [2., 3.],
       [4., 5.]])

In [48]:
x = x.reshape((6, 1))
x

array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.]])

In [50]:
x = np.zeros((300, 20))
print(x.shape)
x = np.transpose(x)
print(x.shape)

(300, 20)
(20, 300)
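
Reshaping and transposing can produce the same shape while arranging the values differently, which is easy to overlook when the array is all zeros. A small sketch using the 3 × 2 matrix from above:

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])

# reshape keeps the row-major reading order of the elements.
reshaped = x.reshape((2, 3))

# transpose swaps rows and columns instead.
transposed = np.transpose(x)

print(reshaped)
print(transposed)
```

Both results have shape `(2, 3)`, but only the transpose turns the columns of `x` into rows.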


### Geometric interpretation of tensor operations
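
This section has no code in the original notebook. As a hedged illustration of the idea, the sketch below treats a 2D point as a vector, rotates it with a dot product against a rotation matrix, and translates it with an addition. The angle, point, and offset are arbitrary choices made for this example.

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees counterclockwise

# Standard 2D rotation matrix for angle theta.
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

point = np.array([1., 0.])

# Rotation is a dot product with the rotation matrix.
rotated = np.dot(rotation, point)

# Translation is an element-wise addition of an offset vector.
translated = rotated + np.array([2., 0.])

print(np.round(rotated, 6))
print(np.round(translated, 6))
```

A dot product followed by an addition is the same affine form, `W · x + b`, that a `Dense` layer computes before its activation.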

### A geometric interpretation of deep learning

## The engine of neural networks: gradient-based optimization

### What's a derivative?
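
The original notebook has no code here. A minimal sketch of the concept, assuming a simple test function `f(x) = x ** 2` and a hypothetical helper `numerical_derivative`: the slope of `f` near a point can be approximated by nudging `x` slightly in both directions and measuring the change.

```python
def f(x):
    return x ** 2

def numerical_derivative(func, x, epsilon=1e-6):
    # Central difference: average slope over a tiny interval around x.
    return (func(x + epsilon) - func(x - epsilon)) / (2 * epsilon)

slope = numerical_derivative(f, 3.0)

# The exact derivative of x ** 2 is 2 * x, so the slope at x = 3 should be close to 6.
print(round(slope, 4))
```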

### Derivative of a tensor operation: the gradient

### Stochastic gradient descent
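
The original notebook has no code for this section. The following is a minimal NumPy sketch of mini-batch SGD on a toy problem: fitting a line to noise-free synthetic data. The data, learning rate, batch size, and epoch count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from a known line, y = 3x + 2 (illustrative values).
inputs = rng.uniform(-1.0, 1.0, size=256)
targets = 3.0 * inputs + 2.0

w, b = 0.0, 0.0        # parameters to learn
learning_rate = 0.1
batch_size = 32

for epoch in range(200):
    order = rng.permutation(len(inputs))   # reshuffle the samples every epoch
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]
        x_batch, y_batch = inputs[idx], targets[idx]
        error = (w * x_batch + b) - y_batch
        # Gradients of the mean squared error on this batch only.
        grad_w = 2.0 * np.mean(error * x_batch)
        grad_b = 2.0 * np.mean(error)
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")
```

Each update uses the gradient from a single random batch rather than the whole dataset, which is what makes the procedure stochastic.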

### Chaining derivatives: The Backpropagation algorithm

#### The chain rule
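
The original notebook has no code here. As a small sketch, assuming two hypothetical functions `g(x) = 2x + 1` and `f(u) = u ** 3`, the chain rule says that the derivative of `f(g(x))` is `f'(g(x)) * g'(x)`. The block compares that product with a numerical estimate.

```python
def g(x):
    return 2 * x + 1

def f(u):
    return u ** 3

def composed(x):
    return f(g(x))

x0 = 1.0

# Chain rule: f'(u) = 3 * u ** 2 evaluated at u = g(x0), times g'(x) = 2.
analytic = 3 * g(x0) ** 2 * 2

# Central-difference estimate of the same derivative.
epsilon = 1e-6
numeric = (composed(x0 + epsilon) - composed(x0 - epsilon)) / (2 * epsilon)

print(analytic, round(numeric, 4))
```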

#### Automatic differentiation with computation graphs

#### The gradient tape in TensorFlow

In [86]:
import tensorflow as tf
x = tf.Variable(0.)
with tf.GradientTape() as tape:
    y = 2 * x + 3
grad_of_y_wrt_x = tape.gradient(y, x)

In [95]:
x = tf.Variable(tf.random.uniform((3,3)))
with tf.GradientTape() as tape:
    y = 3*x + 5
grad2 = tape.gradient(y,x)
print(grad2)

tf.Tensor(
[[3. 3. 3.]
 [3. 3. 3.]
 [3. 3. 3.]], shape=(3, 3), dtype=float32)


In [88]:
print(grad_of_y_wrt_x)

tf.Tensor(2.0, shape=(), dtype=float32)


In [90]:
x = tf.Variable(tf.random.uniform((2, 2)))
with tf.GradientTape() as tape:
    y = 2 * x + 3
grad_of_y_wrt_x = tape.gradient(y, x)

In [100]:
x = tf.constant(np.array([1.,4.,3.]).reshape(1,3), dtype = tf.float32)
W = tf.Variable(tf.random.uniform((3,2)), dtype = tf.float32)
b = tf.Variable(tf.zeros((2,)), dtype = tf.float32)

**Some practice questions on gradients**

In [102]:
with tf.GradientTape() as tape:
    y = tf.matmul(x,W) + b
grad_question1 = tape.gradient(y,[W,b])
print(grad_question1)

[<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[1., 1.],
       [4., 4.],
       [3., 3.]], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>]


In [103]:
with tf.GradientTape() as tape:
    y = tf.pow((tf.matmul(x,W) + b), 3)
grad_question2 = tape.gradient(y, [W,b])
print(grad_question2)

[<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ 10.953334,  49.71186 ],
       [ 43.813335, 198.84744 ],
       [ 32.86    , 149.13559 ]], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([10.953334, 49.71186 ], dtype=float32)>]
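
Both outputs above follow a pattern: each row of the gradient with respect to `W` is the bias gradient scaled by the matching entry of `x` (1, 4, and 3). This comes from the chain rule, and from the fact that `tape.gradient` applied to a non-scalar target differentiates the sum of its entries. Below is a NumPy sketch of the second question. Because the notebook's `W` was random, the fixed `W` here is a hypothetical stand-in, so the numbers differ from the output above.

```python
import numpy as np

x = np.array([[1., 4., 3.]])      # same input as above, shape (1, 3)
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])        # hypothetical fixed weights, shape (3, 2)
b = np.zeros(2)

def summed_output(W, b):
    # Scalar that is actually differentiated: the sum of all entries of y.
    return np.sum((x @ W + b) ** 3)

z = x @ W + b                      # pre-activation, shape (1, 2)
upstream = 3 * z ** 2              # derivative of z ** 3 with respect to z
grad_b = upstream.reshape(2)       # b is added directly to z
grad_W = x.T @ upstream            # chain rule through the matrix product

# Finite-difference check of every entry of grad_W.
eps = 1e-6
numeric_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        numeric_W[i, j] = (summed_output(W + step, b)
                           - summed_output(W - step, b)) / (2 * eps)

print(grad_W)
print(np.allclose(grad_W, numeric_W, atol=1e-4))
```

For the first question the upstream gradient is all ones, which is why that output shows `x` itself repeated across the columns.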


In [91]:
print(grad_of_y_wrt_x)

tf.Tensor(
[[2. 2.]
 [2. 2.]], shape=(2, 2), dtype=float32)


In [93]:
W = tf.Variable(tf.random.uniform((2, 2)))
b = tf.Variable(tf.zeros((2,)))
x = tf.random.uniform((2, 2))
with tf.GradientTape() as tape:
    y = tf.matmul(x, W) + b
grad_of_y_wrt_W_and_b = tape.gradient(y, [W, b])
print(grad_of_y_wrt_W_and_b)

[<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[0.951882, 0.951882],
       [1.133261, 1.133261]], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 2.], dtype=float32)>]


## Looking back at our first example

In [96]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

In [97]:
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])

In [98]:
model.compile(optimizer="adamw",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

In [99]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8707 - loss: 0.4605
Epoch 2/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9656 - loss: 0.1167
Epoch 3/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9775 - loss: 0.0736
Epoch 4/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9863 - loss: 0.0477
Epoch 5/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9897 - loss: 0.0363


<keras.src.callbacks.history.History at 0x7ea403e93b10>

### Reimplementing our first example from scratch in TensorFlow

#### A simple Dense class

In [104]:
import tensorflow as tf

class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation

        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
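        # Note: Keras does not use a plain uniform init like this one; its Dense
        # layer picks initial weights with a principled scheme (Glorot uniform by default).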
        self.W = tf.Variable(w_initial_value)

        b_shape = (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    def __call__(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    @property
    def weights(self):
        return [self.W, self.b]

In [106]:
exD = NaiveDense(5, 10, activation = tf.nn.relu)

In [112]:
exD.weights[0].shape

TensorShape([5, 10])

#### A simple Sequential class

In [113]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

In [122]:
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
])
assert len(model.weights) == 4

for i in range(4):
    print(model.weights[i].shape)

(784, 512)
(512,)
(512, 10)
(10,)


#### A batch generator

In [123]:
import math

class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)

    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels
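
To see how the generator handles a dataset whose size is not a multiple of the batch size, here is a short usage sketch. The class is repeated so the block runs on its own, and the zero-filled arrays are hypothetical stand-ins for real images and labels.

```python
import math
import numpy as np

class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)

    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels

# 300 fake flattened images and labels (placeholders, not MNIST data).
fake_images = np.zeros((300, 28 * 28), dtype="float32")
fake_labels = np.zeros((300,), dtype="int64")

generator = BatchGenerator(fake_images, fake_labels, batch_size=128)

sizes = []
for _ in range(generator.num_batches):
    images_batch, labels_batch = generator.next()
    sizes.append(len(images_batch))

print(generator.num_batches, sizes)
```

Because `math.ceil` rounds the batch count up, the final batch is simply shorter than the others rather than being dropped.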

### Running one training step

In [124]:
def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses)
    gradients = tape.gradient(average_loss, model.weights)
    update_weights(gradients, model.weights)
    return average_loss

In [125]:
learning_rate = 1e-3

def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)

In [126]:
from tensorflow.keras import optimizers

optimizer = optimizers.SGD(learning_rate=1e-3)

def update_weights(gradients, weights):
    optimizer.apply_gradients(zip(gradients, weights))

### The full training loop

In [127]:
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        # Pass batch_size through; otherwise the argument to fit() is silently ignored.
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"loss at batch {batch_counter}: {loss:.2f}")

In [128]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

fit(model, train_images, train_labels, epochs=10, batch_size=128)

Epoch 0
loss at batch 0: 5.49
loss at batch 100: 2.23
loss at batch 200: 2.20
loss at batch 300: 2.10
loss at batch 400: 2.18
Epoch 1
loss at batch 0: 1.91
loss at batch 100: 1.86
loss at batch 200: 1.81
loss at batch 300: 1.72
loss at batch 400: 1.79
Epoch 2
loss at batch 0: 1.59
loss at batch 100: 1.56
loss at batch 200: 1.49
loss at batch 300: 1.43
loss at batch 400: 1.48
Epoch 3
loss at batch 0: 1.33
loss at batch 100: 1.32
loss at batch 200: 1.22
loss at batch 300: 1.21
loss at batch 400: 1.26
Epoch 4
loss at batch 0: 1.13
loss at batch 100: 1.14
loss at batch 200: 1.03
loss at batch 300: 1.05
loss at batch 400: 1.10
Epoch 5
loss at batch 0: 0.98
loss at batch 100: 1.00
loss at batch 200: 0.90
loss at batch 300: 0.93
loss at batch 400: 0.98
Epoch 6
loss at batch 0: 0.87
loss at batch 100: 0.90
loss at batch 200: 0.80
loss at batch 300: 0.84
loss at batch 400: 0.90
Epoch 7
loss at batch 0: 0.79
loss at batch 100: 0.81
loss at batch 200: 0.72
loss at batch 300: 0.77
loss at batch 40

### Evaluating the model

In [133]:
predictions = model(test_images)
predictions = predictions.numpy()
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")

model.weights[0]

accuracy: 0.82


<tf.Variable 'Variable:0' shape=(784, 512) dtype=float32, numpy=
array([[0.06967751, 0.08549366, 0.0670405 , ..., 0.04000448, 0.03147677,
        0.01539539],
       [0.09861852, 0.09752019, 0.01119453, ..., 0.07409252, 0.04331451,
        0.00068858],
       [0.0603548 , 0.07060611, 0.05053982, ..., 0.09726965, 0.06125103,
        0.06673223],
       ...,
       [0.06727455, 0.02582065, 0.00470216, ..., 0.07401376, 0.0265793 ,
        0.04210525],
       [0.07521515, 0.04340803, 0.01967042, ..., 0.04703747, 0.07654732,
        0.04233619],
       [0.01659234, 0.07572229, 0.04730619, ..., 0.06749951, 0.03281405,
        0.02971658]], dtype=float32)>
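
The accuracy computation above can be traced by hand on a tiny example. This sketch uses made-up probabilities for four samples and three classes; `toy_predictions` and `toy_labels` are hypothetical and unrelated to the MNIST results.

```python
import numpy as np

# Each row is a probability distribution over 3 classes for one sample.
toy_predictions = np.array([[0.1, 0.7, 0.2],
                            [0.8, 0.1, 0.1],
                            [0.3, 0.3, 0.4],
                            [0.6, 0.3, 0.1]])
toy_labels = np.array([1, 0, 2, 1])

# The predicted class is the index of the largest probability in each row.
predicted = np.argmax(toy_predictions, axis=1)

# Comparing with the labels gives booleans; their mean is the fraction that match.
matches = predicted == toy_labels
accuracy = matches.mean()

print(predicted, f"accuracy: {accuracy:.2f}")
```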

## Summary