## https://docs.google.com/document/d/1kMGs68rIHWHifBiqlU3j_2ZkrNj9RquGTe8tJ7eR1sE/edit

# Placeholder

Pro: data processing stays outside TensorFlow, so it is easy to do in plain Python.

Con: users often end up processing their data in a single thread, creating a data bottleneck that slows execution down.



```
data, n_samples = utils.read_birth_life_data(DATA_FILE)
X = tf.placeholder(tf.float32, name='X')
Y = tf.placeholder(tf.float32, name='Y')
…
with tf.Session() as sess:
    …
    # Step 8: train the model
    for i in range(100):  # run 100 epochs
        for x, y in data:
            # Session runs the optimizer op to minimize loss
            sess.run(optimizer, feed_dict={X: x, Y: y})
```



# tf.data

Instead of doing inference with **placeholders** and feeding in data later, do inference directly with the data. The two building blocks are tf.data.Dataset, which stores the data, and tf.data.Iterator, which iterates through it.

## Store data in tf.data.Dataset

```
tf.data.Dataset.from_tensor_slices((features, labels))
tf.data.Dataset.from_generator(gen, output_types, output_shapes)

dataset = tf.data.Dataset.from_tensor_slices((data[:,0], data[:,1]))
print(dataset.output_types)		# >> (tf.float32, tf.float32)
print(dataset.output_shapes)		# >> (TensorShape([]), TensorShape([]))

```
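
As a rough plain-Python analogy (not TensorFlow code), from_tensor_slices can be pictured as slicing every component along its first dimension and pairing the slices up, so each element is one (feature, label) pair. The helper below is a hypothetical stand-in that only borrows the name; the sample values are the ones printed later in these notes.

```python
features = [1.822, 3.869, 3.911]          # birth rates
labels = [74.82825, 70.81949, 72.15066]   # life expectancies


def from_tensor_slices(tensors):
    """Toy stand-in: yield one element per index along the first dimension."""
    lengths = {len(t) for t in tensors}
    if len(lengths) != 1:
        raise ValueError("all components must have the same first dimension")
    yield from zip(*tensors)


samples = list(from_tensor_slices((features, labels)))
print(samples[0])   # (1.822, 74.82825)
```

Each element is a pair of scalars, which matches the scalar output_shapes shown above.
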
## Creating a Dataset from files



```
tf.data.TextLineDataset(filenames)
tf.data.FixedLengthRecordDataset(filenames)
tf.data.TFRecordDataset(filenames)

```
## tf.data.Iterator

```
# Create an iterator to iterate through the samples in a Dataset

iterator = dataset.make_one_shot_iterator()
# Iterates through the dataset exactly once. No initialization needed.

iterator = dataset.make_initializable_iterator()
# Iterates through the dataset as many times as we want. Needs to be initialized at the start of each epoch.

iterator = dataset.make_one_shot_iterator()
X, Y = iterator.get_next()         # X is the birth rate, Y is the life expectancy
with tf.Session() as sess:
    print(sess.run([X, Y]))        # >> [1.822, 74.82825]
    print(sess.run([X, Y]))        # >> [3.869, 70.81949]
    print(sess.run([X, Y]))        # >> [3.911, 72.15066]

iterator = dataset.make_initializable_iterator()
...
for i in range(100):
    sess.run(iterator.initializer)  # restart the iterator at the beginning of each epoch
    total_loss = 0
    try:
        while True:
            sess.run([optimizer])
    except tf.errors.OutOfRangeError:  # raised when the dataset is exhausted
        pass

```
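
To make the difference between the two iterator kinds concrete, here is a plain-Python analogy (an illustration only, not how TensorFlow implements it). A one-shot iterator behaves like a Python iterator that is consumed once. An initializable iterator behaves like re-creating the iterator at the start of each epoch. StopIteration stands in for tf.errors.OutOfRangeError.

```python
data = [(1.822, 74.82825), (3.869, 70.81949), (3.911, 72.15066)]

# One-shot: a single pass, after which the iterator stays exhausted.
one_shot = iter(data)
first_pass = list(one_shot)
second_pass = list(one_shot)       # empty: nothing is left to read

# Initializable: re-create the iterator at the start of every epoch,
# mirroring sess.run(iterator.initializer).
samples_seen = 0
for epoch in range(100):
    iterator = iter(data)          # "initialize" for this epoch
    try:
        while True:
            x, y = next(iterator)  # plays the role of iterator.get_next()
            samples_seen += 1
    except StopIteration:          # stands in for tf.errors.OutOfRangeError
        pass

print(len(first_pass), len(second_pass), samples_seen)   # 3 0 300
```
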





## Handling data in TensorFlow

```
dataset = dataset.shuffle(1000)
dataset = dataset.repeat(100)
dataset = dataset.batch(128)
dataset = dataset.map(lambda x: tf.one_hot(x, 10)) 
# convert each elem of dataset to one_hot vector
```
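
The sketch below mimics these four transformations with plain Python generators to show what each one does to the stream of elements. It is a simplified model under stated assumptions: shuffle is modeled as a fixed-size buffer from which a random element is drawn, batch keeps a smaller final batch, and the calls are chained in the order listed above. The function names mirror the Dataset methods, but the code is hypothetical and does not use TensorFlow.

```python
import random
from itertools import islice


def shuffle(items, buffer_size, rng):
    """Buffered shuffle: hold buffer_size items, emit a random one, refill."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    rng.shuffle(buffer)            # drain whatever is left in random order
    yield from buffer


def repeat(make_epoch, count):
    """Replay the stream produced by make_epoch() count times, back to back."""
    for _ in range(count):
        yield from make_epoch()


def batch(items, batch_size):
    """Group consecutive items into lists; the last batch may be smaller."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def one_hot(index, depth):
    """Return a list of length depth with a 1 at position index."""
    return [1 if i == index else 0 for i in range(depth)]


rng = random.Random(0)
labels = list(range(10))

# shuffle -> repeat -> batch -> map, in the same order as the calls above
stream = batch(repeat(lambda: shuffle(labels, 4, rng), 2), 3)
label_batches = list(stream)
batches = [[one_hot(label, 10) for label in chunk] for chunk in label_batches]

print([len(chunk) for chunk in batches])   # [3, 3, 3, 3, 3, 3, 2]
```

Because shuffle comes before repeat in this sketch, every epoch is a full permutation of the data before the next one starts. With a buffer smaller than the dataset, the shuffling is only approximate.
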
# Does tf.data really perform better?

With placeholder: **9.05271519** seconds

With **tf.data**: **6.12285947** seconds

## Should we always use tf.data?

For prototyping, feed_dict can be faster and easier to write (more Pythonic).

tf.data is tricky to use when you have complicated preprocessing or multiple data sources.

NLP data is normally just a sequence of integers. In this case, transferring the data over to the GPU is pretty quick, so the speedup from tf.data isn't that large.


# Optimizer

```
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

_, l = sess.run([optimizer, loss], feed_dict={X: x, Y:y})

```
When the optimizer op runs, the session looks at all trainable variables that the loss depends on and updates them.
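
What one update amounts to can be sketched by hand in plain Python. This is a minimal sketch under assumptions that these notes do not state: a linear model w * x + b, a squared loss, and made-up toy data. Only the learning rate of 0.01 is taken from the optimizer above. TensorFlow derives the gradients automatically; here they are written out.

```python
# Hypothetical toy data: y is exactly 2 * x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0            # the two trainable variables
learning_rate = 0.01       # same value as in the optimizer above


def loss_and_grads(x, y, w, b):
    """Squared loss for one sample and its gradients w.r.t. w and b."""
    error = (w * x + b) - y
    return error ** 2, 2 * error * x, 2 * error


def mean_loss(w, b):
    """Average squared loss over the whole toy dataset."""
    return sum(loss_and_grads(x, y, w, b)[0] for x, y in data) / len(data)


loss_before = mean_loss(w, b)
for epoch in range(100):
    for x, y in data:
        _, grad_w, grad_b = loss_and_grads(x, y, w, b)
        # One gradient descent step: move each variable against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
loss_after = mean_loss(w, b)

print(loss_before, loss_after)
```
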

## Trainable variables

Specify if a variable should be trained or not

By default, all variables are trainable

```

tf.Variable(initial_value=None, trainable=True,...)
```
For example, global_step shouldn’t be trainable.

Or in double Q-learning, you want to alternate which Q-value function to update.
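
The effect of the trainable flag can be illustrated with a toy variable table in plain Python. This is a hypothetical sketch, not TensorFlow's implementation: the variable names, values, and gradients are made up. It shows that a gradient step skips non-trainable variables, while a counter such as global_step is advanced by plain assignment.

```python
variables = {
    "w":           {"value": 0.5, "trainable": True},
    "b":           {"value": 0.0, "trainable": True},
    "frozen_w":    {"value": 1.5, "trainable": False},  # hypothetical frozen weight
    "global_step": {"value": 0,   "trainable": False},
}


def apply_gradients(variables, grads, learning_rate=0.01):
    """Update only the trainable variables, then bump the step counter."""
    for name, grad in grads.items():
        var = variables[name]
        if var["trainable"]:
            var["value"] -= learning_rate * grad
    # The counter changes by assignment, never through a gradient.
    variables["global_step"]["value"] += 1


# A gradient is supplied for frozen_w too, but it is ignored.
apply_gradients(variables, {"w": 2.0, "b": -1.0, "frozen_w": 4.0})
print({name: var["value"] for name, var in variables.items()})
```
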






## Good code
```
mnist_folder = 'data/mnist'
utils.download_mnist(mnist_folder)
train, val, test = utils.read_mnist(mnist_folder, flatten=True)

train_data = tf.data.Dataset.from_tensor_slices(train)
train_data = train_data.shuffle(10000) # optional
test_data = tf.data.Dataset.from_tensor_slices(test)
# Nice trick here: what would otherwise be done twice is reduced to once
iterator = tf.data.Iterator.from_structure(train_data.output_types, 
                                           train_data.output_shapes)
img, label = iterator.get_next()

train_init = iterator.make_initializer(train_data)	# initializer for train_data
test_init = iterator.make_initializer(test_data)	# initializer for test_data

# Initialize the iterator with the dataset you want


with tf.Session() as sess:
    ...
    for i in range(n_epochs):       
        sess.run(train_init)	       	# use train_init during training loop
        try:
            while True:
                _, l = sess.run([optimizer, loss])
        except tf.errors.OutOfRangeError:
            pass
 
    # test the model
    sess.run(test_init)				# use test_init during testing
    try:
        while True:
            sess.run(accuracy)
    except tf.errors.OutOfRangeError:
        pass

```
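
The idea behind the shared iterator can be pictured in plain Python as one reading point that gets re-bound to whichever dataset is needed. The class below is a hypothetical analogy, not the TensorFlow implementation. Its method names are borrowed from the code above, and the sample data are made-up stand-ins for MNIST.

```python
class ReinitializableIterator:
    """One reading point that can be re-bound to any compatible dataset."""

    def __init__(self):
        self._source = iter(())            # empty until an initializer is run

    def make_initializer(self, dataset):
        """Return a callable that points this iterator at dataset."""
        def initializer():
            self._source = iter(dataset)
        return initializer

    def get_next(self):
        # StopIteration stands in for tf.errors.OutOfRangeError.
        return next(self._source)


def drain(iterator):
    """Read until the current dataset is exhausted."""
    items = []
    try:
        while True:
            items.append(iterator.get_next())
    except StopIteration:
        pass
    return items


train = [("img0", 0), ("img1", 1), ("img2", 2)]   # hypothetical samples
test = [("img3", 3)]

iterator = ReinitializableIterator()
train_init = iterator.make_initializer(train)
test_init = iterator.make_initializer(test)

train_init()                      # use train_init during training
seen_train = drain(iterator)
test_init()                       # use test_init during testing
seen_test = drain(iterator)
print(len(seen_train), len(seen_test))   # 3 1
```

The model only ever reads from get_next, so the same graph serves both training and testing.
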



# tf.get_variable()

With tf.get_variable, we can provide the variable’s internal name, shape, type, and initializer to give the variable its initial value. Note that when we use tf.constant as an initializer, we don’t need to provide the shape, because the constant already carries one.

```
tf.get_variable(
    name,
    shape=None,
    dtype=None,
    initializer=None,
    regularizer=None,
    trainable=True,
    collections=None,
    caching_device=None,
    partitioner=None,
    validate_shape=True,
    use_resource=None,
    custom_getter=None,
    constraint=None
)

s = tf.get_variable("scalar", initializer=tf.constant(2)) 
m = tf.get_variable("matrix", initializer=tf.constant([[0, 1], [2, 3]]))
W = tf.get_variable("big_matrix", shape=(784, 10), initializer=tf.zeros_initializer())

```
You have to initialize a variable before using it. If you try to evaluate the variables before initializing them you'll run into FailedPreconditionError: Attempting to use uninitialized value. To get a list of uninitialized variables, you can just print them out:

```
print(session.run(tf.report_uninitialized_variables()))
```
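
Two points above can be modeled with a toy variable store in plain Python: a constant initializer already carries its shape, and reading a variable before it is initialized is an error. Everything in this sketch is hypothetical. It borrows the names get_variable and report_uninitialized for readability, represents tensors as nested lists, and leaves out the rest of the real behavior, such as name reuse rules.

```python
_store = {}   # toy registry: variable name -> ToyVariable


class ToyVariable:
    def __init__(self, name, shape, init_fn):
        self.name, self.shape, self._init_fn = name, shape, init_fn
        self.value = None                  # None means "not initialized yet"

    def initialize(self):
        self.value = self._init_fn(self.shape)

    def read(self):
        if self.value is None:
            raise RuntimeError(f"Attempting to use uninitialized value {self.name}")
        return self.value


def shape_of(value):
    """Infer the shape of a (non-empty) nested list; a scalar has shape ()."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)


def zeros(shape):
    """Initializer function: build nested lists of zeros with the given shape."""
    if not shape:
        return 0.0
    return [zeros(shape[1:]) for _ in range(shape[0])]


def get_variable(name, shape=None, initializer=None):
    if initializer is None:
        raise ValueError("this toy version requires an initializer")
    if callable(initializer):
        # An initializer function cannot know the shape on its own.
        if shape is None:
            raise ValueError("shape is required with an initializer function")
        var = ToyVariable(name, tuple(shape), initializer)
    else:
        # A constant already carries its shape, so none has to be given.
        var = ToyVariable(name, shape_of(initializer), lambda _shape: initializer)
    _store[name] = var
    return var


def report_uninitialized():
    return [name for name, var in _store.items() if var.value is None]


s = get_variable("scalar", initializer=2)
m = get_variable("matrix", initializer=[[0, 1], [2, 3]])
W = get_variable("big_matrix", shape=(784, 10), initializer=zeros)

print(report_uninitialized())   # all three names: nothing is initialized yet
W.initialize()
print(report_uninitialized())   # "big_matrix" is no longer listed
```
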


