# TensorFlow Basics
*By Ronny Restrepo*

# Save and Restore Variables
This tutorial goes through how to save and restore the variables in a TensorFlow graph.

## Setting up

Let's start by setting up the environment.

In [1]:
# ================================================
#                                          IMPORTS
# ================================================
from __future__ import print_function
import tensorflow as tf
import os

We can now create a simple graph containing some variables that get updated.
Operation `c` updates the value of the `w1` variable by performing an 
elementwise addition between `w1` and `w2`.

In [2]:
# ================================================
#                                   CREATE A GRAPH
# ================================================
graph = tf.Graph()
with graph.as_default():
    w1 = tf.Variable(tf.constant(1, shape=[2, 3]), name="weights_1")
    w2 = tf.Variable(tf.constant(1, shape=[2, 3]), name="weights_2")
    c = w1.assign(w1 + w2) # update the value of w1

## Checkpoint File

TensorFlow has built-in functions that let you save variable values to, and 
restore them from, files called `checkpoint` files. Saving a checkpoint 
actually creates several accompanying files alongside it. It is therefore a 
good idea to place the checkpoint file in a dedicated subdirectory, to keep 
all the related files nicely organised. 

So let's start by creating a subdirectory called `"checkpoints"`, and 
specifying the name of the checkpoint file to be `"checkpoint.chk"`. 

In [3]:
# ================================================
#                                 SETUP FILE PATHS
# ================================================
# Specify the name of the checkpoints directory  
checkpoint_dir = "checkpoints"

# Create the directory if it does not already exist
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)

# Specify the path to the checkpoint file
checkpoint_file = os.path.join(checkpoint_dir, "checkpoint.chk")

## Saving, Restoring, and Initializing Variables

The code below defines a function that runs a session which does the following:

- If no checkpoint file exists yet (the first run), it initializes the 
  variables from scratch. 
- Otherwise, it loads the variable values from the checkpoint file. 
- It runs the graph up to the point where the variable `w1` gets updated.
- It saves the variable values to the checkpoint file, so that any subsequent 
  call to this function starts from the saved values. 
  
Don't worry if some sections of this code do not make much sense yet; they 
are explained in the following few sections. 

In [4]:
# ================================================
#                                 DEFINE A SESSION
# ================================================
def run_session(graph):
    with tf.Session(graph=graph) as session:
        # Create a Saver Object
        saver = tf.train.Saver(name="saver")

        # Initialize Variables
        if tf.train.checkpoint_exists(checkpoint_file):
            print("Restoring from file: ", checkpoint_file)
            saver.restore(session, checkpoint_file)
        else:
            print("Initializing from scratch")
            session.run(tf.global_variables_initializer())

        # Run graph up to the `c` operation
        session.run(c)

        # Retrieve the value of w1
        w1_val = session.run(w1)
        print("Value of w1 a after running: \n", w1_val)

        # Save a snapshot of the current value of the variables
        saver.save(session, checkpoint_file)

If you run the code above, and then run the following block of code that calls 
the function, the printout shows that the variables are initialized from 
scratch. The value of `w1` after running the `c` operation is the elementwise 
sum of two tensors filled with `1`, giving a matrix full of `2`s. 

In [5]:
# Run Session for First Time
run_session(graph=graph)

Initializing from scratch
Value of w1 a after running: 
 [[2 2 2]
 [2 2 2]]


If we call the function again, the printout shows that this time the variables 
are restored from the checkpoint file. This means the tensor `w1` takes on the 
updated values from last time, and is updated once again by operation `c`, 
giving a tensor of all `3`s. 

In [6]:
# Run Session for Second Time
run_session(graph=graph)

Restoring from file:  checkpoints/checkpoint.chk
Value of w1 a after running: 
 [[3 3 3]
 [3 3 3]]
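
The control flow of the two runs above can be mimicked without TensorFlow. The following is a minimal sketch that uses a JSON file as a stand-in for the checkpoint and nested lists as stand-ins for the tensors. `run_once` and `w1.json` are hypothetical names, and none of this is TensorFlow API.

```python
import json
import os
import tempfile


def run_once(state_file):
    """Restore w1 if a saved state exists, otherwise initialize it; then update and save."""
    w2 = [[1, 1, 1], [1, 1, 1]]  # never changes, so it is not stored in this sketch

    # A plain JSON file has no extra suffixes, so os.path.exists is adequate
    # here (this is NOT true of TensorFlow checkpoint files).
    if os.path.exists(state_file):
        with open(state_file) as f:
            w1 = json.load(f)            # stand-in for saver.restore(...)
    else:
        w1 = [[1, 1, 1], [1, 1, 1]]      # stand-in for the variable initializer

    # Elementwise addition, the plain-Python counterpart of w1.assign(w1 + w2)
    w1 = [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(w1, w2)]

    with open(state_file, "w") as f:
        json.dump(w1, f)                 # stand-in for saver.save(...)
    return w1


state_file = os.path.join(tempfile.mkdtemp(), "w1.json")
first_run = run_once(state_file)
second_run = run_once(state_file)
print(first_run)   # [[2, 2, 2], [2, 2, 2]]
print(second_run)  # [[3, 3, 3], [3, 3, 3]]
```

Each further call adds `1` to every element, because it starts from whatever the previous call saved.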


## Each Component in Detail

Each of the components within the `run_session()` function that was defined 
above will be explained in the subsections below. 

### Saver Object

TensorFlow provides the `tf.train.Saver` class, whose instances have methods 
that allow you to save and restore variables. A saver is created using: 

```python
saver = tf.train.Saver(name="saver")
```

### Saving Variables to File
In order to save the variables of a graph to a checkpoint file you can use: 

```python
saver.save(session, "path/to/checkpoint_file.chk")
```

**TODO:** Add details on how to selectively save variables. 

### Load Previously Saved Variables from File
In order to load previously saved variables from a checkpoint file, 
you can use the restore method: 

```python
saver.restore(session, "path/to/checkpoint_file.chk")
```

The first time that you run a session, there will not be any previously 
saved variables to restore from file. As such, you will need to initialize 
the variables from scratch using something like: 

```python
session.run(tf.global_variables_initializer())
```

You can check if there is a previously saved checkpoint file by making use of the `tf.train.checkpoint_exists()` function as follows: 

```python
if tf.train.checkpoint_exists("path/to/checkpoint_file.chk"):
    # Initialize from file
    saver.restore(session, "path/to/checkpoint_file.chk")
else:
    # Initialize from scratch
    session.run(tf.global_variables_initializer())
```

You should use the above method, and not the `os.path.exists()` function like the following: 


```python
# DO NOT USE THIS
if os.path.exists("path/to/checkpoint_file.chk"):
    # Initialize from file
    saver.restore(session, "path/to/checkpoint_file.chk")
else:
    # Initialize from scratch
    session.run(tf.global_variables_initializer())
```

Since version 0.12 of TensorFlow, checkpoint files might have suffixes such as 
`".data-00000-of-00001"` added to the filename. So you might specify that the 
checkpoint file be called `"checkpoint.chk"`, but it might actually get saved as 
`"checkpoint.chk.data-00000-of-00001"`. The TensorFlow function 
`tf.train.checkpoint_exists()` ignores these suffixes, and returns `True` 
when the checkpoint exists. If you were to use Python's `os` library instead, 
you would need to go through the extra hassle of accounting for any additional 
suffixes that might be added to the filename. 
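
The difference can be seen without TensorFlow at all. The sketch below fakes the files a save might leave behind in a temporary directory and compares an exact-path check with a prefix match. The `.index` name is an illustrative assumption; only the `.data-00000-of-00001` suffix comes from the text above. The glob pattern is only a rough stand-in for what a suffix-aware check has to do, not a description of how `tf.train.checkpoint_exists()` is implemented.

```python
import glob
import os
import tempfile

checkpoint_dir = tempfile.mkdtemp()
checkpoint_file = os.path.join(checkpoint_dir, "checkpoint.chk")

# Simulate the files a save might leave behind (empty placeholder files;
# the ".index" suffix is an assumption made for illustration).
for suffix in (".data-00000-of-00001", ".index"):
    open(checkpoint_file + suffix, "w").close()

# An exact-path check misses the checkpoint: no file has this exact name
exists_exact = os.path.exists(checkpoint_file)

# A prefix match finds the suffixed files
matches = sorted(glob.glob(checkpoint_file + ".*"))
exists_by_prefix = len(matches) > 0

print(exists_exact)      # False
print(exists_by_prefix)  # True
```

With `os.path.exists()` alone, the restore branch would never be taken and the variables would be reinitialized on every run.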


**TODO:** Add details on how to selectively load variables. 


## Compatibility
This tutorial was written for use with TensorFlow 0.12 (and possibly above). It 
will not work as written on older versions of TensorFlow, but you can adapt it 
by replacing occurrences of:

```python
session.run(tf.global_variables_initializer())
```

with 

```python
session.run(tf.initialize_all_variables())
```
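
If the same code has to run on both sides of the 0.12 boundary, one option is to pick the initializer by checking which function the module provides. The helper below is a sketch, not part of TensorFlow: `make_init_op` is a hypothetical name, and the two small classes are stand-ins used only to exercise it without TensorFlow installed.

```python
def make_init_op(tf_module):
    """Build the variable-initializer op with whichever function tf_module offers."""
    init_fn = getattr(tf_module, "global_variables_initializer", None)
    if init_fn is None:
        # Older releases only provide the previous name
        init_fn = tf_module.initialize_all_variables
    return init_fn()


# Stand-in objects (NOT TensorFlow) that mimic the two API generations
class FakeNewTF(object):
    @staticmethod
    def global_variables_initializer():
        return "new-style init op"


class FakeOldTF(object):
    @staticmethod
    def initialize_all_variables():
        return "old-style init op"


new_result = make_init_op(FakeNewTF)
old_result = make_init_op(FakeOldTF)
print(new_result)  # new-style init op
print(old_result)  # old-style init op
```

With the real library, this would be used as `session.run(make_init_op(tf))`, called while the graph is the default graph.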
