# How to manage your experiments in TensorFlow
## tf.train.Saver()

-----
A good practice is to periodically save the model’s parameters after a certain number of steps so that we can restore/retrain our model from that step if need be. The tf.train.Saver() class allows us to do so by saving the graph’s variables in binary files.
```python
# Method signature, for reference:
# tf.train.Saver.save(sess,
#                     save_path,
#                     global_step=None,
#                     latest_filename=None,
#                     meta_graph_suffix='meta',
#                     write_meta_graph=True,
#                     write_state=True)

# define model
# create a saver object
saver = tf.train.Saver()
# launch a session to compute the graph
with tf.Session() as sess:
    # actual training loop
    for step in range(training_steps):
        sess.run([optimizer])

        # save a checkpoint every 1000 steps
        if (step + 1) % 1000 == 0:
            saver.save(sess, 'checkpoint_directory/model')
```
In TensorFlow lingo, the step at which you save your graph’s variables is called a checkpoint.
Since we will be creating many checkpoints, it’s helpful to keep track of the number of training steps our model has gone through in a variable called global_step, and to append it to the checkpoint name. It’s a very common variable to see in TensorFlow programs. We first need to create it, initialize it to 0, and set it to be not trainable, since we don’t want TensorFlow to optimize it.
```python
self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
```
We need to pass global_step as a parameter to the optimizer so it knows to increment global_step by one with each training step:
```python
self.optimizer = tf.train.GradientDescentOptimizer(self.lr).minimize(
    self.loss, global_step=self.global_step)
```
To save the session’s variables in the folder ‘checkpoints’ with name model-name-global-step, we use this:
```python
saver.save(sess, 'checkpoints/skip-gram', global_step=model.global_step)
```
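As a rough illustration of this naming scheme (plain Python, not TensorFlow's own code), the sketch below mimics how the step is appended to the path prefix and shows how the step could be read back from a checkpoint path. Both helper names, `checkpoint_path` and `step_from_path`, are made up for this example.

```python
def checkpoint_path(prefix, step):
    """Mimic the naming described above: '<prefix>-<step>' (hypothetical helper)."""
    return '{}-{}'.format(prefix, step)


def step_from_path(path):
    """Read the step back from a path such as 'checkpoints/skip-gram-10000'."""
    # The step is whatever follows the last '-' in the path.
    return int(path.rsplit('-', 1)[1])


path = checkpoint_path('checkpoints/skip-gram', 10000)
print(path)                  # checkpoints/skip-gram-10000
print(step_from_path(path))  # 10000
```

Reading the step back this way can be handy when you want to log which step a restored model came from.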
To restore the variables, we use tf.train.Saver.restore(sess, save_path). For example, to restore the checkpoint at the 10,000th step:
```python
saver.restore(sess, 'checkpoints/skip-gram-10000')
```
But of course, we can only load saved variables if there is a valid checkpoint. What you probably want is to restore the checkpoint if there is one, and to train from the start if there isn’t. TensorFlow allows you to get the checkpoint from a directory with tf.train.get_checkpoint_state('directory-name'). The code for checking looks something like this:
```python
ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
```
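Putting the pieces of this section together, here is a minimal sketch of the restore-or-start-fresh pattern. It assumes TensorFlow 1.x-style graph mode (imported through `tensorflow.compat.v1` so that it should also run under TensorFlow 2). The `train` function, the `'demo'` checkpoint name, and the `assign_add` op standing in for a real optimizer step are all made up for illustration.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: graph-mode (TF 1.x-style) execution


def train(checkpoint_dir, num_steps):
    """Run num_steps fake training steps, resuming from a checkpoint if one exists."""
    tf.reset_default_graph()
    global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
    # Stand-in for optimizer.minimize(loss, global_step=global_step).
    train_op = tf.assign_add(global_step, 1)
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Restore only if the directory holds a valid checkpoint.
        ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        for _ in range(num_steps):
            sess.run(train_op)
        saver.save(sess, os.path.join(checkpoint_dir, 'demo'), global_step=global_step)
        return sess.run(global_step)


ckpt_dir = tempfile.mkdtemp()
print(train(ckpt_dir, 3))  # first call: no checkpoint yet, starts from step 0
print(train(ckpt_dir, 2))  # second call: resumes from the step saved by the first
```

The second call continues counting from where the first one stopped, because global_step is itself a variable stored in the checkpoint.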

```python
import tensorflow as tf
import os
print(tf.train.get_checkpoint_state(os.path.dirname("./graph/word2vec/checkpoint")))
```

Output:

```
model_checkpoint_path: "./graph/word2vec\\skip-gram.ckpt-1"
all_model_checkpoint_paths: "./graph/word2vec\\skip-gram.ckpt-1"
```



By default, saver.save() stores all variables of the graph, and this is recommended. However, you can also choose which variables to store by passing them in as a list or a dict when you create the saver object. An example from the TensorFlow documentation:
```python
v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')
# pass the variables as a dict:
saver = tf.train.Saver({'v1': v1, 'v2': v2})
# pass them as a list
saver = tf.train.Saver([v1, v2])
# passing a list is equivalent to passing a dict with the variable op names 
# as keys
saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})
```
Note that savers only save variables, not the entire graph, so we still have to create the graph ourselves, and then load in variables. The checkpoints specify the way to map from variable names to tensors.
## tf.summary
----
We’ve been using matplotlib to visualize our losses and accuracy, which is cool but unnecessary because TensorBoard provides us with a great set of tools to visualize our summary statistics during training. Some popular statistics to visualize are loss, average loss, and accuracy. You can visualize them as scalar plots, histograms, or even images. So we add a new name scope to our graph to hold all the summary ops.
```python
def _create_summaries(self):
    with tf.name_scope("summaries"):
        tf.summary.scalar("loss", self.loss)
        tf.summary.scalar("accuracy", self.accuracy)
        tf.summary.histogram("histogram loss", self.loss)
        # because you have several summaries, we should merge them all
        # into one op to make it easier to manage
        self.summary_op = tf.summary.merge_all()
```
Because it’s an op, you have to execute it with sess.run():
```python
loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op], feed_dict=feed_dict)
```
Now that you’ve obtained the summary, you need to write it to file using the same FileWriter object we created to visualize our graph.
```python
writer.add_summary(summary, global_step=step)
```
Now, if you run TensorBoard and go to http://localhost:6006/, the Scalars page shows the plots of your scalar summaries, such as the loss, and the Histograms page shows the loss as a histogram.
If you save your summaries into different sub-folders of your graph folder, you can compare your runs. For example, the first time we run our model, with learning rate 1.0, we save it in ‘improved_graph/lr1.0’, and the second time we run our model, we save it in ‘improved_graph/lr0.5’. In the left corner of the Scalars page, we can then toggle the plots of these two runs to compare them. This can be really helpful when you want to compare the progress made with different optimizers or different parameters.
You can write a Python script to automate the naming of folders where you store the graphs/plots of each experiment. You can visualize the statistics as images using tf.summary.image.
```python
tf.summary.image(name, tensor, max_outputs=3, collections=None)
```
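The folder-naming script mentioned above could look something like the sketch below. It only builds the sub-folder name from the hyperparameters of a run. The helper name `run_dir` and the `batch` parameter are made up for this example, and the naming pattern simply follows the ‘improved_graph/lr1.0’ convention used earlier.

```python
def run_dir(base_dir, **hparams):
    """Build a per-run sub-folder name such as 'improved_graph/lr0.5' (hypothetical helper)."""
    # Sort the keys so the same settings always map to the same folder name.
    name = '_'.join('{}{}'.format(key, hparams[key]) for key in sorted(hparams))
    return '{}/{}'.format(base_dir, name)


print(run_dir('improved_graph', lr=1.0))             # improved_graph/lr1.0
print(run_dir('improved_graph', lr=0.5, batch=128))  # improved_graph/batch128_lr0.5
```

The returned path can then be used as the log directory of the FileWriter for that run, so that each experiment shows up as its own run in TensorBoard.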
## Control randomization
----
I never realized what an oxymoron this sounds like until I wrote it down, but the truth is that you often have to control the randomization process to get stable results for your experiments. You’re probably familiar with random seed and random state from NumPy. TensorFlow doesn’t allow you to get the random state the way NumPy does (at least not that I know of -- I will double check), but it does allow you to get stable results in randomization in two ways:
### 1. Set random seed at operation level
All random ops allow you to pass in a seed value when you create them. For example:
```python
# the first argument is the shape of the tensor; [2, 3] is an arbitrary example
my_var = tf.Variable(tf.truncated_normal([2, 3], stddev=0.1, seed=0))
```

```python
c = tf.random_uniform([], -10, 10, seed=2)
with tf.Session() as sess:
    print(sess.run(c))  # first draw
    print(sess.run(c))  # second draw: a different value
```

Output:

```
1.0421
-4.85772
```


Note that the session is the thing that keeps track of the random state, so each new session will start the random state all over again.

```python
c = tf.random_uniform([], -10, 10, seed=2)
with tf.Session() as sess:
    print(sess.run(c))  # first draw in a fresh session
with tf.Session() as sess:
    print(sess.run(c))  # same value: the new session restarts the random state
```

Output:

```
1.0421
1.0421
```


With operation level random seed, each op keeps its own seed.

```python
c = tf.random_uniform([], -10, 10, seed=2)
d = tf.random_uniform([], -10, 10, seed=2)
with tf.Session() as sess:
    print(sess.run(c))  # first draw of c
    print(sess.run(d))  # same value: d has its own copy of seed 2
```

Output:

```
1.0421
1.0421
```


### 2. Set random seed at graph level with tf.Graph.seed
```python
tf.set_random_seed(seed)
```
If you don’t care about the randomization for each op inside the graph, but just want to be able to replicate results on another graph (so that other people can replicate your results on their own graph), you can use tf.set_random_seed instead. Setting the current TensorFlow random seed affects the current default graph only.
For example, you have two models a.py and b.py that have identical code:

```python
tf.set_random_seed(2)
c = tf.random_uniform([], -10, 10)
d = tf.random_uniform([], -10, 10)
with tf.Session() as sess:
    print(sess.run(c))
    print(sess.run(d))
```

Output:

```
8.90151
-4.93518
```


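Without a graph-level seed, running python a.py and python b.py will return two completely different results, but with tf.set_random_seed, the two scripts will print identical results when each one runs as its own process. Note that the two outputs shown next differ from each other: they were produced with %run inside one notebook session, where both scripts most likely add their ops to the same default graph, and the seed TensorFlow derives for an op under a graph-level seed depends on the ops already in the graph.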

```python
%run a.py
```

Output:

```
-4.68877
2.22114
```

```python
%run b.py
```

Output:

```
-8.92643
6.14358
```


## Reading Data in TensorFlow
----
There are two main ways to load data into a TensorFlow graph: one is through feed_dict, which we are familiar with, and the other is through readers that allow us to read tensors directly from files. There is, of course, a third way, which is to load your data in as constants, but you should only use this if you want your graph to be seriously bloated and un-runnable (I made up another word but you know what I mean).
To see why we need something more than feed_dict, we need to look at how feed_dict works under the hood. feed_dict first sends data from the storage system to the client, and then from the client to the worker process. This slows data loading down, especially if the client is on a different machine from the worker process. TensorFlow has readers that allow us to load data directly into the worker process.
The improvement will not be noticeable when we aren’t on a distributed system or when our dataset is small, but it’s still worth looking into. TensorFlow has several built-in readers to match your reading needs.
```python
tf.TextLineReader
# Outputs the lines of a file delimited by newlines
# E.g. text files, CSV files

tf.FixedLengthRecordReader
# Outputs fixed-length records, for files in which every record has the same length
# E.g. each MNIST image is 28 x 28 pixels, each CIFAR-10 image is 32 x 32 x 3

tf.WholeFileReader
# Outputs the entire file content

tf.TFRecordReader
# Reads samples from TensorFlow's own binary format (TFRecord)

tf.ReaderBase
# Allows you to create your own readers
```

Run [word2vec.py](https://github.com/AppleFairy/CS20SI-Tensorflow-for-Deep-Learning-Research/blob/master/word2vec.py) and check the visualization result in TensorBoard.