
Copyright (C) 2020-2023 Software Platform Lab, Seoul National University

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

# **1. TensorFlow Operations**

## Constant Tensor

Let's create a constant tensor in TensorFlow.

**```tf.constant(
    value, dtype=None, shape=None, name='Const'
)```**

In [None]:
import tensorflow as tf

# constant of 1d tensor, or a vector
a = tf.constant([2,2], name = 'vector')

# constant of 2x2 tensor, or a matrix
b = tf.constant([[0,2], [1,3]], name = 'matrix')  # `name` gives the tensor a name

# TensorFlow 2 executes operations eagerly, so the values already exist;
# .numpy() simply returns them as NumPy arrays
print(a.numpy())
print(b.numpy())

[2 2]
[[0 2]
 [1 3]]


## Mathematical Operations

The following example shows an element-wise division in which the vector `a` is broadcast across the rows of the matrix `b`.

In [None]:
# Create constant tensors a and b
a = tf.constant([2,4], name = 'a', dtype = tf.float32)
b = tf.constant([[0,1], [2,3]], name = 'b', dtype = tf.float32)
print(a.numpy())
print(b.numpy())  # both tensors have now been created

# Execute division operation using b and a
div = tf.divide(b, a)  # or equivalently, div = b / a
# a (shape [2]) is broadcast across each row of b (shape [2, 2])

print('\nPrint div')
print(div.numpy())

[2. 4.]
[[0. 1.]
 [2. 3.]]

Print div
[[0.   0.25]
 [1.   0.75]]
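
To make the broadcasting step concrete, here is a minimal plain-Python sketch of what the division above computes. The helper name `broadcast_divide` is hypothetical and only models this one case (a length-2 vector reused for every row of a 2x2 matrix); it is not how TensorFlow implements broadcasting.

```python
# Plain-Python model of tf.divide(b, a) for the shapes used above:
# a has shape (2,), b has shape (2, 2), so a is reused for every row of b.
a = [2.0, 4.0]
b = [[0.0, 1.0], [2.0, 3.0]]


def broadcast_divide(matrix, row_vector):
    """Divide each row of `matrix` element-wise by `row_vector` (hypothetical helper)."""
    return [[x / y for x, y in zip(row, row_vector)] for row in matrix]


div = broadcast_divide(b, a)
print(div)  # [[0.0, 0.25], [1.0, 0.75]]
```

Each column of `b` is divided by the matching entry of `a`, which is why the second column is scaled by 1/4 rather than 1/2.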


## Quiz 1
**Create two constants with shape=[2,2] and perform matrix multiplication. (HINT: use `tf.matmul`)**

In [None]:
import tensorflow as tf
x = tf.constant([[1, 2], [3, 4]])
y = tf.constant([[5, 6], [7, 8]])
z = tf.matmul(x,y)
print(z.numpy())

[[19 22]
 [43 50]]
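
The result can be checked by hand. The sketch below is a naive plain-Python matrix product (the function name `matmul` here is our own, not TensorFlow's) and contrasts it with the element-wise product, which is what `x * y` would give instead.

```python
def matmul(x, y):
    """Naive matrix product: out[i][j] = sum over k of x[i][k] * y[k][j]."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]

z = matmul(x, y)
print(z)  # [[19, 22], [43, 50]]

# Element-wise product, for contrast
elementwise = [[p * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]
print(elementwise)  # [[5, 12], [21, 32]]
```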


## Variables

Variables hold shared, mutable state (e.g., model parameters).

### Creating Variable

To declare a variable, you create an instance of the class `tf.Variable`.

#### Usage of TF Variable


```
x = tf.Variable(...)
x.read_value()      # read value
x.assign(...)       # x = ...
x.assign_add(...)   # x += ...
```



One way to create a variable is: 

**```tf.Variable(< initial-value >, name = < optional-name >)```**

This example creates three variables using `tf.Variable`.

In [None]:
# Create scalar variable
s = tf.Variable(2, name = 'scalar')
# Create matrix variable
m = tf.Variable([[0,1], [2,3]], name = 'matrix')
# Create zero matrix using tf.zeros
W = tf.Variable(tf.zeros((784,10)))

In [None]:
# Print values of Variable s, m and W
print('s:')
print(s.read_value().numpy())
print('\nm:')
print(m.read_value().numpy())
print('\nW:')
print(W.read_value().numpy())

s:
2

m:
[[0 1]
 [2 3]]

W:
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


### Changing values of variables

To change the value of a variable, we need to assign a new value to the variable.
You can see variable `v` changes after `assign` operations are executed.

In [None]:
# v is a 2 x 3 variable of random values
initializer = tf.random_normal_initializer(mean=1., stddev=2.)  # draws random values from a normal distribution
v = tf.Variable(initializer(shape=[2, 3]))

# c is a 2 x 3 constant with 1.0
c = tf.constant(1.0, shape=(2,3))

# Get value
print('v:')
print(v.read_value().numpy())

# Assign new value to the variable (update it in place)
v.assign(c)

# Get value again
print('v:')
print(v.read_value().numpy())

# Assign new value to the variable
v.assign([[1., 2., 3.], [4., 5., 6.]])

# Get value again
print('v:')
print(v.read_value().numpy())

v:
[[-0.30118573  1.8997161   3.5982037 ]
 [ 2.1961584  -3.0604954   1.4173982 ]]
v:
[[1. 1. 1.]
 [1. 1. 1.]]
v:
[[1. 2. 3.]
 [4. 5. 6.]]


## Quiz 2
Define a variable (name : "term") with shape = [] and dtype = `tf.float64`. Initialize it to `2`.
Define another variable (name : "sum") with shape = [] and dtype = `tf.float64`. Initialize it to zero. (remember: shape = [] does not mean wrapping the initial value with [])

By using these two variables, compute the following:
$sum = 1/term_1 + 1/term_2 + ... + 1/term_{10}$
where
$term_i = term_{i-1} * (term_{i-1} - 1) + 1$
and $term_1 = 2$.
(This recurrence relation is known as Sylvester's sequence.)

Hint: Repeat updating the variables "sum" and "term" 10 times.


In [None]:
import tensorflow as tf

############# Write here. ################
t = tf.Variable(2, name="term", dtype=tf.float64)
s = tf.Variable(0, name="sum", dtype=tf.float64)

for _ in range(10):
    print('t: ', t.read_value().numpy())
    s.assign(s+1/t)
    t.assign(t*(t-1)+1)

##########################################

#print('s:', s.read_value().numpy())
print(s)

t:  2.0
t:  3.0
t:  7.0
t:  43.0
t:  1807.0
t:  3263443.0
t:  10650056950807.0
t:  1.1342371305542185e+26
t:  1.2864938683278672e+52
t:  1.6550664732451996e+104
<tf.Variable 'sum:0' shape=() dtype=float64, numpy=0.9999999999999999>
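
The printed value is just below 1 because of `float64` rounding. As a cross-check, the sketch below repeats the computation with exact rational arithmetic from Python's standard `fractions` module (the helper name `sylvester_partial_sum` is hypothetical). It also verifies the telescoping identity $\sum_{i=1}^{n} 1/term_i = 1 - 1/(term_{n+1} - 1)$, which follows from $term_{i+1} - 1 = term_i (term_i - 1)$.

```python
from fractions import Fraction


def sylvester_partial_sum(n):
    """Return (sum of 1/term_i for i = 1..n, term_{n+1}) using exact fractions."""
    term = Fraction(2)
    total = Fraction(0)
    for _ in range(n):
        total += 1 / term
        term = term * (term - 1) + 1
    return total, term


total, next_term = sylvester_partial_sum(10)

# Telescoping identity: the partial sum is exactly 1 - 1/(term_{n+1} - 1)
print(total == 1 - 1 / (next_term - 1))  # True
print(total < 1)                         # True: the exact sum never reaches 1
```

The exact sum differs from 1 by far less than `float64` can represent, so a floating-point result of `0.9999999999999999` is consistent with it.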


# **2. Dataset API**

## Dataset

The `tf.data` API is TensorFlow's API for writing input pipelines.

It allows you to build complex pipelines by composing simple building blocks. 

`tf.data.Dataset` is an abstraction representing a sequence of elements (each element consists of one or more `tf.Tensor`s).

Users can create new Datasets from existing `tf.Tensor`s by using static methods like `Dataset.from_tensor_slices()`. 

For example, you can create a Dataset of string Tensors that represents input file names. 

Transforming existing Datasets is another way of creating a new Dataset.

TensorFlow provides frequently-used Dataset transformations such as `Dataset.batch` or `Dataset.shuffle` (please refer to https://www.tensorflow.org/api_docs/python/tf/data/Dataset). 

### `tf.data.Dataset.from_tensor_slices()`



Creates a Dataset whose elements are slices, taken along the first dimension, of the given *Python list*, *NumPy array*, or *tensor*.

In [None]:
import numpy as np
arr = np.arange(10) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Create a dataset from a numpy array
dataset = tf.data.Dataset.from_tensor_slices(arr)  # turn the array into a dataset

# Iterate through the dataset
for element in dataset:
  # Print multiplied value for each element in the dataset
  print((element * 2).numpy())

# You can also use an iterator like this:
# iterator = iter(dataset)

0
2
4
6
8
10
12
14
16
18


## Create a dataset from files using the Dataset API

### Create dummy binary files

In [None]:
import os
import numpy as np

def create_bin_file(file_name, value):
  with open(file_name, 'wb') as f:
    f.write(np.arange(value, value+4, dtype=np.int32))
    
bin_filenames = []
for i in range(3):
  file_name = 'binary_file_%d' % i  # open the working directory to check that the files were created
  create_bin_file(file_name, i)
  bin_filenames.append(file_name)

# first file:
# 0 1 2 3

# second file:
# 1 2 3 4

# third file:
# 2 3 4 5

### FixedLengthRecordDataset : each fixed-length slice of bytes is a dataset element.

In this example, each data instance is an 8-byte record that holds two 4-byte `int32` values.

In [None]:
# create a Dataset that contains slices (size: 8 bytes) of the files
dataset = tf.data.FixedLengthRecordDataset(bin_filenames, 8)  # bin_filenames is a list of three file names
# or equivalently,
# ds = tf.data.Dataset.from_tensor_slices(bin_filenames)
# ds = ds.flat_map(lambda filename: tf.data.FixedLengthRecordDataset(filename, 8))

# Iterate through the dataset
for i, element in enumerate(dataset):
  # convert 8 bytes into int32 => two int32 values per element
  print('step %d, data: %s' % (i, tf.io.decode_raw(element, 'int32').numpy()))  # decode_raw reinterprets the raw bytes as the int32 values that were written

step 0, data: [0 1]
step 1, data: [2 3]
step 2, data: [1 2]
step 3, data: [3 4]
step 4, data: [2 3]
step 5, data: [4 5]
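
The record slicing can be reproduced without TensorFlow. The sketch below uses Python's standard `struct` module on an in-memory byte string that mimics the first dummy file. The helper `fixed_length_records` is hypothetical, and little-endian byte order is an assumption (it matches NumPy's default on common hardware).

```python
import struct

# Same layout as the first dummy file: four int32 values 0..3.
# Little-endian ('<') is an assumption about the platform that wrote the file.
raw = struct.pack('<4i', 0, 1, 2, 3)


def fixed_length_records(data, record_bytes):
    """Yield consecutive, complete `record_bytes`-sized slices of `data`."""
    for start in range(0, len(data) - record_bytes + 1, record_bytes):
        yield data[start:start + record_bytes]


# Each 8-byte record decodes to two int32 values
records = [struct.unpack('<2i', rec) for rec in fixed_length_records(raw, 8)]
print(records)  # [(0, 1), (2, 3)]
```

A 16-byte file therefore contributes two elements, which is why three files produce the six steps shown above.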


### Create dummy text files

In [None]:
def create_text_file(file_name, index):
  with open(file_name, 'w') as f:
    f.write('Hello_%d\n' % index)
    f.write('TensorFlow_%d\n' % index)

text_filenames = []
for i in range(3):
  file_name = 'text_file_%d'% i
  create_text_file(file_name, i)
  text_filenames.append(file_name)  # keep track of the generated file names

# first file:
# Hello_0
# TensorFlow_0

# second file:
# Hello_1
# TensorFlow_1

# third file:
# Hello_2
# TensorFlow_2

### TextLineDataset : each text line is a dataset element.


In [None]:
def iterate_and_print(iterator, count=6):
  for i in range(count):
    print('step %d, data: %s' % (i, next(iterator).numpy()))

In [None]:
# create a Dataset that contains each line of the text files
ds = tf.data.TextLineDataset(text_filenames)
# or equivalently,
# ds = tf.data.Dataset.from_tensor_slices(text_filenames)
# ds = ds.flat_map(lambda filename: tf.data.TextLineDataset(filename))

# Create iterator for the dataset
iterator = iter(ds)

iterate_and_print(iterator)

step 0, data: b'Hello_0'
step 1, data: b'TensorFlow_0'
step 2, data: b'Hello_1'
step 3, data: b'TensorFlow_1'
step 4, data: b'Hello_2'
step 5, data: b'TensorFlow_2'


## Transform dataset

**ds.shuffle(buffer_size)**

shuffle: shuffles data instances randomly. `buffer_size` is the number of data instances held in the buffer from which the next element is sampled.

`ds.shuffle` with N > 1 picks each data instance randomly from a buffer containing N instances. With a buffer size of 4, the first element is drawn from the first four lines, so the code snippet below never yields the 5th or 6th element of the dataset (Hello_2 or TensorFlow_2) at step 0.

In [None]:
# Load the text file created previously
ds = tf.data.TextLineDataset(text_filenames) 
# shuffle the dataset using buffer size 4
ds = ds.shuffle(4)

iterator = iter(ds)
iterate_and_print(iterator)

step 0, data: b'Hello_1'
step 1, data: b'Hello_2'
step 2, data: b'TensorFlow_0'
step 3, data: b'Hello_0'
step 4, data: b'TensorFlow_1'
step 5, data: b'TensorFlow_2'


`ds.shuffle` with N == 1 has no shuffling effect.

In [None]:
# Load the text file created previously
ds = tf.data.TextLineDataset(text_filenames)
# shuffle the dataset using buffer size 1
ds = ds.shuffle(1)

iterator = iter(ds)
iterate_and_print(iterator)  # the buffer holds a single element, so the order does not change

step 0, data: b'Hello_0'
step 1, data: b'TensorFlow_0'
step 2, data: b'Hello_1'
step 3, data: b'TensorFlow_1'
step 4, data: b'Hello_2'
step 5, data: b'TensorFlow_2'
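
The two cases above follow from how a shuffle buffer works. The following is a simplified plain-Python model, not TensorFlow's implementation; the function name `buffered_shuffle` and its `seed` parameter are our own.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield `items` in a random order using a fixed-size buffer (simplified model)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)      # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]            # emit a random buffered element...
        buffer[idx] = item           # ...and replace it with the new one
    rng.shuffle(buffer)              # input exhausted: drain the rest in random order
    yield from buffer


lines = ['Hello_0', 'TensorFlow_0', 'Hello_1', 'TensorFlow_1', 'Hello_2', 'TensorFlow_2']

print(list(buffered_shuffle(lines, 4)))  # first element always comes from lines[:4]
print(list(buffered_shuffle(lines, 1)))  # original order: a 1-element buffer cannot reorder
```

In this model, a buffer at least as large as the dataset gives a uniform shuffle, while smaller buffers only shuffle locally.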


**ds.repeat(count)**

Repeats the data instances `count` times.

A `StopIteration` error is raised when an iterator requests the next element after all the data in the dataset has been read. `ds.repeat(count)` repeats the dataset `ds` so each original value is seen `count` times.

In [None]:
ds = tf.data.TextLineDataset(text_filenames)

iterator = iter(ds)
iterate_and_print(iterator, count=7) # error: the dataset has only 6 elements, so requesting a 7th raises StopIteration

step 0, data: b'Hello_0'
step 1, data: b'TensorFlow_0'
step 2, data: b'Hello_1'
step 3, data: b'TensorFlow_1'
step 4, data: b'Hello_2'
step 5, data: b'TensorFlow_2'


StopIteration: ignored

`ds.repeat(count)` repeats iterating the dataset `count` times. If we do not pass the `count` argument, the dataset repeats forever.

In [None]:
ds = tf.data.TextLineDataset(text_filenames)

# repeat twice
ds = ds.repeat(2)  # iterate over the data once more, for 12 elements in total

iterator = iter(ds)
iterate_and_print(iterator, count=12) # no error: count may be at most 12

step 0, data: b'Hello_0'
step 1, data: b'TensorFlow_0'
step 2, data: b'Hello_1'
step 3, data: b'TensorFlow_1'
step 4, data: b'Hello_2'
step 5, data: b'TensorFlow_2'
step 6, data: b'Hello_0'
step 7, data: b'TensorFlow_0'
step 8, data: b'Hello_1'
step 9, data: b'TensorFlow_1'
step 10, data: b'Hello_2'
step 11, data: b'TensorFlow_2'


Another common pattern is to use a try-except clause to detect the end of an epoch. Once we finish an epoch, we re-create the iterator to start from the beginning again.

In [None]:
ds = tf.data.TextLineDataset(text_filenames)

iterator = iter(ds)

epoch = 0
step = 0

while True:
  # repeat until we detect an error
  try:
    v = next(iterator).numpy()
    print('step %d, data: %s' % (step, v))
    step += 1
  # iterator raises StopIteration once we finish an epoch
  except StopIteration:
    print('Finished epoch', epoch)
    epoch += 1
    # if we are done with 2 epochs, break
    if epoch >= 2:
      break
    # otherwise, re-create an iterator
    iterator = iter(ds)

step 0, data: b'Hello_0'
step 1, data: b'TensorFlow_0'
step 2, data: b'Hello_1'
step 3, data: b'TensorFlow_1'
step 4, data: b'Hello_2'
step 5, data: b'TensorFlow_2'
Finished epoch 0
step 6, data: b'Hello_0'
step 7, data: b'TensorFlow_0'
step 8, data: b'Hello_1'
step 9, data: b'TensorFlow_1'
step 10, data: b'Hello_2'
step 11, data: b'TensorFlow_2'
Finished epoch 1


**ds.batch(batch_size)**

Combines elements of this dataset into batches. `batch_size` represents the number of data instances to combine.

In [None]:
ds = tf.data.TextLineDataset(text_filenames) 
# batch elements using batch_size 3
ds = ds.batch(3)  # 6 elements -> 2 batches of 3 elements each
ds = ds.repeat(3)  # 2 batches x 3 repeats = 6 batches

iterator = iter(ds)
iterate_and_print(iterator)

step 0, data: [b'Hello_0' b'TensorFlow_0' b'Hello_1']
step 1, data: [b'TensorFlow_1' b'Hello_2' b'TensorFlow_2']
step 2, data: [b'Hello_0' b'TensorFlow_0' b'Hello_1']
step 3, data: [b'TensorFlow_1' b'Hello_2' b'TensorFlow_2']
step 4, data: [b'Hello_0' b'TensorFlow_0' b'Hello_1']
step 5, data: [b'TensorFlow_1' b'Hello_2' b'TensorFlow_2']
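
Batching amounts to grouping consecutive elements. The sketch below is a plain-Python model with a hypothetical helper named `batch`. Its `drop_remainder` flag mirrors the parameter of the same name on `Dataset.batch`, which by default keeps a final, smaller batch.

```python
def batch(items, batch_size, drop_remainder=False):
    """Group consecutive items into lists of `batch_size` (simplified model)."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    # Emit the leftover partial batch unless asked to drop it
    if current and not drop_remainder:
        yield current


lines = ['Hello_0', 'TensorFlow_0', 'Hello_1', 'TensorFlow_1', 'Hello_2', 'TensorFlow_2']

print(list(batch(lines, 3)))
# [['Hello_0', 'TensorFlow_0', 'Hello_1'], ['TensorFlow_1', 'Hello_2', 'TensorFlow_2']]

print([len(b) for b in batch(lines, 4)])                       # [4, 2]
print([len(b) for b in batch(lines, 4, drop_remainder=True)])  # [4]
```

Six lines divide evenly into batches of 3, so the example above never produces a partial batch.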


**ds.map(fn)**

Apply `fn` to each element of the dataset.

In [None]:
# split the `data` tensor into 3 pieces and concatenate the pieces by inserting '+' between them
def split_join(data):
  data = tf.split(data, 3)
  return tf.strings.join(data, '+')

ds = tf.data.TextLineDataset(text_filenames)
ds = ds.batch(3)
ds = ds.repeat(3)
ds = ds.map(split_join)  # map applies the function to every element, so no explicit loop is needed

iterator = iter(ds)
iterate_and_print(iterator)

step 0, data: [b'Hello_0+TensorFlow_0+Hello_1']
step 1, data: [b'TensorFlow_1+Hello_2+TensorFlow_2']
step 2, data: [b'Hello_0+TensorFlow_0+Hello_1']
step 3, data: [b'TensorFlow_1+Hello_2+TensorFlow_2']
step 4, data: [b'Hello_0+TensorFlow_0+Hello_1']
step 5, data: [b'TensorFlow_1+Hello_2+TensorFlow_2']


## Speed up Dataset processing



**ds.interleave(map_func, cycle_length)**

map_func : function that maps each input element to a Dataset

cycle_length : the number of input elements whose Datasets are interleaved at a time

We can use this feature to interleave records from multiple files; passing `num_parallel_calls` additionally lets the files be read in parallel.

In [None]:
ds = tf.data.Dataset.from_tensor_slices(text_filenames)
# interleave lines from the first two files, then continue with the third file
ds = ds.interleave(lambda filename: tf.data.TextLineDataset(filename),
                    cycle_length=2)  # elements are taken in turn from the open files; without num_parallel_calls the order is deterministic

iterator = iter(ds)
iterate_and_print(iterator)

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089


step 0, data: b'Hello_0'
step 1, data: b'Hello_1'
step 2, data: b'TensorFlow_0'
step 3, data: b'TensorFlow_1'
step 4, data: b'Hello_2'
step 5, data: b'TensorFlow_2'
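
The order of the output above can be reproduced with a small round-robin model. This is a plain-Python sketch under simplifying assumptions (one element taken per turn, no parallelism); the function `interleave` and the in-memory `files` list are hypothetical stand-ins for the dataset and the text files.

```python
def interleave(sources, cycle_length):
    """Round-robin over up to `cycle_length` open sources, one element per turn."""
    done = object()                      # sentinel: no more sources to open
    pending = iter(sources)
    # Open the first `cycle_length` sources
    slots = [iter(src) for _, src in zip(range(cycle_length), pending)]
    i = 0
    while slots:
        i %= len(slots)
        try:
            item = next(slots[i])
        except StopIteration:
            nxt = next(pending, done)    # this source is exhausted: open the next one
            if nxt is done:
                del slots[i]
            else:
                slots[i] = iter(nxt)
            continue
        yield item
        i += 1


files = [['Hello_0', 'TensorFlow_0'],
         ['Hello_1', 'TensorFlow_1'],
         ['Hello_2', 'TensorFlow_2']]

print(list(interleave(files, cycle_length=2)))
# ['Hello_0', 'Hello_1', 'TensorFlow_0', 'TensorFlow_1', 'Hello_2', 'TensorFlow_2']
```

With `cycle_length=1` the model degenerates to reading the files one after another.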


**ds.prefetch(buffer_size)** 

Prefetches elements from a dataset. `buffer_size` is the maximum number of elements that are buffered ahead of the consumer.

In [None]:
ds = tf.data.Dataset.from_tensor_slices(text_filenames)
ds = ds.interleave(lambda filename: tf.data.TextLineDataset(filename),
                    cycle_length=2)
ds = ds.prefetch(3)  # keep up to 3 elements ready ahead of the consumer

iterator = iter(ds)
iterate_and_print(iterator)

step 0, data: b'Hello_0'
step 1, data: b'Hello_1'
step 2, data: b'TensorFlow_0'
step 3, data: b'TensorFlow_1'
step 4, data: b'Hello_2'
step 5, data: b'TensorFlow_2'
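
Prefetching does not change the elements or their order; it only overlaps producing data with consuming it. The idea can be sketched with a background thread and a bounded queue from Python's standard library. The function `prefetch` below is a hypothetical illustration, not TensorFlow's mechanism.

```python
import queue
import threading


def prefetch(iterable, buffer_size):
    """Yield items from `iterable`, produced up to `buffer_size` ahead by a background thread."""
    q = queue.Queue(maxsize=buffer_size)  # put() blocks once the buffer is full
    done = object()                       # sentinel marking the end of the input

    def producer():
        for item in iterable:
            q.put(item)
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item


lines = ['Hello_0', 'Hello_1', 'TensorFlow_0', 'TensorFlow_1', 'Hello_2', 'TensorFlow_2']
print(list(prefetch(lines, buffer_size=3)) == lines)  # True: same elements, same order
```

This sketch assumes the consumer reads to the end; if it stops early, the producer thread stays blocked on the full queue until the program exits.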


## Quiz 3
Create a dataset following the instructions. (data type should be number, not string)

1. Create a textline dataset using files named `ex_filenames`. 
2. Shuffle the dataset with buffer size 15.
3. Repeat the dataset for 2 epochs.
4. Convert each data instance using the `cast` function defined below.
5. Group the data instances into batches (batch size = 3).



In [None]:
import random

def create_text_file(index):
  with open('ex_file_%d'%index, 'w') as f:
    for i in range(3):
      f.write('%d.%d\n' % (index, i))
    
ex_filenames = []
for i in range(5):
  create_text_file(i)
  ex_filenames.append('ex_file_%d'% i)

def cast(data):
  data = tf.strings.to_number(data, out_type=tf.float32)
  return data


############# Write here. ################
# Create a Dataset
ds = tf.data.TextLineDataset(ex_filenames)
# Shuffle
ds = ds.shuffle(15)
# Repeat
ds = ds.repeat(2)
# Transformation
ds = ds.map(cast)
# Create a mini-batch
ds = ds.batch(3)

##########################################

iterator = iter(ds)
iterate_and_print(iterator, count=10)

step 0, data: [3.2 2.1 0.2]
step 1, data: [3.1 3.  4.1]
step 2, data: [1.  4.2 4. ]
step 3, data: [1.2 0.1 2.2]
step 4, data: [1.1 2.  0. ]
step 5, data: [3.  3.2 3.1]
step 6, data: [4.1 4.2 1.1]
step 7, data: [0.1 4.  2.1]
step 8, data: [1.  2.  2.2]
step 9, data: [1.2 0.  0.2]


# **FYI: TensorFlow v1 vs. TensorFlow v2** 

Throughout the tutorial, we used the latest release of TensorFlow. Check out the version:

In [None]:
print("TensorFlow version: ", tf.__version__)

TensorFlow version:  2.11.0


Previously, in TensorFlow v1, we constructed a graph using **Graph**, which consists of TensorFlow operations (Ops).

After defining a Graph, we could run it via **Session**.

In [None]:
# graph = tf.Graph()
# with graph.as_default():
#     # Here we load the dataset, define operations, etc.
#     # Dataset
#     # Build a model
#     # Training

# with tf.Session(graph=graph) as sess:
#     sess.run(...)

Many things changed in the switch from TensorFlow v1 to TensorFlow v2.
To sum up the update, TensorFlow shifted to **eager execution** (imperative execution) by default.

Eager execution provides an intuitive interface to structure the code naturally and use Python data structures. It is also easier to debug and test changes by using standard Python debugging tools.
Lastly, it has natural Python control flow instead of graph control flow, simplifying the specification of dynamic models. The downside is that eager execution can be considerably slower than symbolic (graph) execution, the default TensorFlow v1 mode.

Let's see the difference between the two versions with an example of dynamic control flow.

## Dynamic control flow
A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. So, for example, it is easy to write the fizzbuzz game, where any number divisible by three is replaced with the word "Fizz", any number divisible by five is replaced with the word "Buzz", and any number divisible by both is replaced with "FizzBuzz" (similar to the 3-6-9 game).

In [None]:
# Native python code
def fizzbuzz(max_num):
    counter = 0
    for num in range(1, max_num+1):
        if int(num % 3) == 0 and int(num % 5) == 0:
            print('FizzBuzz')
        elif int(num % 3) == 0:
            print('Fizz')
        elif int(num % 5) == 0:
            print('Buzz')
        else:
            print(num)
        counter += 1
    
fizzbuzz(20)

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz


With eager execution, we only need minor changes in a few lines.

In [None]:
def fizzbuzz_eager(max_num):
    counter = tf.constant(0) # counter = 0
    max_num = tf.convert_to_tensor(max_num) # wrap the Python int in a tensor
    for num in range(1, max_num.numpy() + 1): # read the value back for Python's range
        num = tf.constant(num) # each number is now a tensor
        if int(num % 3) == 0 and int(num % 5) == 0:
            print('FizzBuzz')
        elif int(num % 3) == 0:
            print('Fizz')
        elif int(num % 5) == 0:
            print('Buzz')
        else:
            print(num.numpy()) # print(num)
        counter += 1
    
fizzbuzz_eager(20)

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz


Implementing the same thing using graph mode in TensorFlow v1: [Tensorflow FizzBuzz Revisited (Ricky Han blog)](https://rickyhan.com/jekyll/update/2018/02/16/tensorflow-fizzbuzz-revisited.html)

In [None]:
# This code does not run in tensorflow 2.x

import tensorflow as tf

def fizzbuzz_graph(max_num):
    # Define variable and while_loop
    graph = tf.Graph()
    with graph.as_default():
        arr = tf.Variable([str(i) for i in range(1, max_num+1)])
        # nasty tf.while_loop and tf.cond ops
        while_op = tf.while_loop(
            (lambda i, _: tf.less(i, max_num+1)), 
            (lambda i, _: (tf.add(i,1), tf.cond(
                tf.logical_and(tf.equal(tf.mod(i, 3), 0), tf.equal(tf.mod(i, 5), 0)),
                (lambda : tf.assign(arr[(i - 1)], 'FizzBuzz')),
                (lambda : tf.cond(tf.equal(tf.mod(i, 3), 0),
                    (lambda : tf.assign(arr[(i - 1)], 'Fizz')),
                    (lambda : tf.cond(tf.equal(tf.mod(i, 5), 0),
                        (lambda : tf.assign(arr[(i - 1)], 'Buzz')),
                        (lambda : arr)))))))),
            [1, arr])

    # Call Session.run()
    with tf.Session(graph = graph) as sess:
        sess.run(tf.global_variables_initializer())
        idx, array = sess.run(while_op)
        print(array)


fizzbuzz_graph(100)

AttributeError: ignored

TensorFlow 2 can still benefit from graph-based execution through `tf.function`, which lets us define a function made of TensorFlow operations. TensorFlow constructs a graph for the function automatically and applies possible optimizations.

For more information on `tf.function`, refer to this website: https://www.tensorflow.org/guide/function?hl=ko