# Putting it all together in TensorFlow/Keras

# Computation graphs

## Intuitive explanation

When you first look at TensorFlow code, it looks like your familiar imperative program:
- familiar operators
    - assignment, addition, multiplication
    - TensorFlow overloads Python operators such as `+` and `*` (plain `=` cannot be overloaded; it just binds a name to a graph node)
    


It is **not** the same.

- The statements in TensorFlow are not executed immediately (as in an imperative program)
    - they define a future computation (the "computation graph")
    - think of it as defining the *body* of a function

- In order to evaluate (i.e., "call") the function (the "computation graph")
    - You must create a "session" in TensorFlow
    - All code must be run within a session
    - The code is evaluated by explicitly asking for something to be "evaluated" or "run"
        - When evaluating/running, you must pass in actual values for the formal parameters (function arguments/placeholders)

We've swept some subtle but important details under the rug.


Consider the imperative Python program

[Raw Tensorflow Notebook on Drive](https://urldefense.proofpoint.com/v2/url?u=https-3A__colab.research.google.com_drive_1vwX0IbztsybVlh9oYtfpBXFLWcI-2DAatr-3Fts-3D5da7c011&d=DwMFaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=rGnWpQIpf3UyN5UxbAs9oPgrYvuF0fDXAV2g1_EuNCg&m=L4mP2aZE1cYG9qRa6ZoTwAeunh66tXR2gWEp26TZpPQ&s=502bRW0foGgB8rCia3EU58_JTAt86qhSJLP3SV0qOaU&e=)

```python
a = 0
b = 1
c = a + b

print(c)  # prints 1
```


and the very similar looking TensorFlow

You are literally building a graph of information flow

To actually do something, you have to "evaluate" part of the graph

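```python
# TF v1 API; `c` is the graph node defined earlier
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()               # initialize all variables

    c_value = sess.run(c)    # evaluate node c and everything it depends on
    print("c value:", c_value)
```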


In the imperative program, each line is evaluated as soon as it is executed.

In the declarative program, a line is not evaluated when it is reached: it just creates a dependence between outputs (`c`) and inputs (`a` and `b`). When you evaluate `c`, it recursively evaluates everything that `c` depends on.

Hence, you are declaring a graph that is evaluated later.
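
The two phases (declare, then evaluate) can be sketched end to end. This is a minimal sketch, not the original notebook cell: it assumes TF 2.x and reaches the graph-mode behavior through `tf.compat.v1` and an explicit `tf.Graph`, whereas under TF v1 the same calls live directly under `tf`. The use of `Variable` for `a` and `b` is also an assumption.

```python
import tensorflow as tf

# Assumption: TF 2.x; graph-mode (TF v1 style) behavior via tf.compat.v1.
graph = tf.Graph()
with graph.as_default():
    # Phase 1: declare. Each statement only adds a node to `graph`.
    a = tf.compat.v1.Variable(0, name="a")
    b = tf.compat.v1.Variable(1, name="b")
    c = a + b  # a symbolic tensor, not the number 1
    init = tf.compat.v1.global_variables_initializer()

# Phase 2: evaluate. Only now are any values computed.
with tf.compat.v1.Session(graph=graph) as sess:
    sess.run(init)          # give the variables their initial values
    c_value = sess.run(c)   # evaluates a and b, then the addition

print("c value:", c_value)
```

Printing `c` itself (rather than `c_value`) shows a tensor description, which is a quick way to convince yourself that nothing was computed in phase 1.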

## Computation graph: a node is an expression, not a value

Imagine that a variable has two attributes
- `c.value`: the current "value" of the variable
- `c.expr`: the expression that computes `c`

When we write
>`c = a + b`

in our familiar imperative programming languages, this really denotes the imperative
>`c.value = a.value + b.value`

That is, the string `c = a + b` is a *command* to modify the value of `c`.

In a declarative program, the string `c = a + b` defines a *function* that computes `c` from two
inputs `a, b`

>`c.expr = lambda a,b: plus(a,b)`

Thus, it's possible to write the string `c = a + b` even before `a, b` have been initialized
because `a, b` are just formal parameters to the function `c.expr`.

In order to *evaluate* `c.expr` (i.e., compute the concrete value `c.value`) we must first evaluate

>`a.expr, b.expr`

Note that the declarative program distinguishes between *declaring/defining* an expression
and *evaluating* it.

More formally, the `eval` operator (which derives a value from a function) applied to `c` results in

>`eval(c.expr) = plus( eval(a.expr), eval(b.expr) )`

These in turn might be expressions that depend on other expressions, e.g., 
>`a.expr = lambda d, e: mult(d,e)`

So the evaluation of the top-level expression `c.expr` involves recursively evaluating all
expressions on which `c.expr` depends.
Eventually the recursion unwinds to a base case in which the expression involves no further computation:

>`d.expr = lambda: d.value`

As we traverse the code of the declarative program, we are defining more and more functions,
and dependencies between functions (i.e., some functions consume the results of other functions as arguments).

This collection of functions is called a *computation graph*.
A computation graph is just a collection of functions and dependencies.
A node `c` in the graph has *no concrete* value until we request it to be *evaluated*, which
involves
- binding concrete values to all leaf nodes of the sub-graph defining `c.expr`
- recursively evaluating the nodes on which `c` depends.
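
The `expr`/`eval` picture above can be sketched in plain Python, with no TensorFlow involved. The names `Node`, `Placeholder`, and `evaluate` are hypothetical, chosen for illustration only; TensorFlow's real graph machinery is far more elaborate.

```python
class Node:
    """A graph node: it holds an expression, not a value."""

    def __init__(self, expr, inputs=()):
        self.expr = expr      # function computing this node from its inputs
        self.inputs = inputs  # nodes on which this node depends

    def __add__(self, other):
        # Overloaded `+` builds a new node instead of adding numbers.
        return Node(lambda x, y: x + y, (self, other))

    def __mul__(self, other):
        return Node(lambda x, y: x * y, (self, other))


class Placeholder(Node):
    """A leaf node: its value is supplied at evaluation time."""

    def __init__(self, name):
        super().__init__(expr=None)
        self.name = name


def evaluate(node, feed):
    """Recursively evaluate `node`, binding leaves from the `feed` dict."""
    if isinstance(node, Placeholder):
        return feed[node.name]  # base case: no further computation
    args = [evaluate(dep, feed) for dep in node.inputs]
    return node.expr(*args)


# Declaring: no arithmetic happens here, and d, e, b have no values yet.
d, e, b = Placeholder("d"), Placeholder("e"), Placeholder("b")
a = d * e
c = a + b

# Evaluating: bind the leaves, then recurse through the dependencies.
result = evaluate(c, {"d": 2, "e": 3, "b": 1})
print(result)  # 7
```

Note that `c = a + b` is written before any leaf has a value, exactly as described above: the leaves are formal parameters until `evaluate` binds them.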

# Eager execution

Many people find declarative programming confusing (and perhaps pointless).

As you will see, there is a point, and a very big one. (Hint: do you like to write derivatives?)

TF supports "eager execution" which makes TF look like an imperative language. This is optional in TF v1, and standard in TF v2.

So, when reading other people's code, it's important to observe whether eager execution has been enabled.

TF v2 is not yet standard so most code you will currently see is declarative.

The declarative style may trip you up at first, but it is very powerful.

[Introducing Eager execution](https://developers.googleblog.com/2017/10/eager-execution-imperative-define-by.html?source=post_page---------------------------)
- Because you are not building a graph, the training loop is different
    - more Pythonic
    - no need to
        - instantiate session
        - `eval` or `run` the training step
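
A minimal sketch of what eager execution looks like, assuming TF 2.x (where it is on by default). The values and the function `y = x * x` are illustrative choices; the gradient part is a nod to the "derivatives" hint above.

```python
import tensorflow as tf

# Assumption: TF 2.x, where eager execution is enabled by default.
print(tf.executing_eagerly())

a = tf.constant(0)
b = tf.constant(1)
c = a + b            # computed immediately; no session, no `run`
print(c.numpy())     # 1

# Derivatives without writing them by hand:
# record the operations on a "tape", then differentiate.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
dy_dx = tape.gradient(y, x)  # d(x^2)/dx evaluated at x = 3
print(float(dy_dx))          # 6.0
```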


# Deeper dive: inside the fully connected layer

Before Keras there was a layer API.

Before the layer API, there was raw TF.

Because you can find multiple generations of code on the web, it can be confusing.

A lot of code that you will find still uses raw TF.

And, on occasion, you have to write some raw TF: either in a Lambda function (like your own layer) or
when the connections between layers are more complex than the Sequential Keras model allows
- e.g., a model that takes inputs from the outputs of two other models
    
So let's implement a FC layer in raw TF.
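
A fully connected layer is just `activation(x @ W + b)`. The following is a minimal sketch, assuming TF 2.x in eager mode rather than the TF v1 placeholder/session style; the helper name `fully_connected`, the layer sizes, and the initializer settings are all illustrative assumptions.

```python
import tensorflow as tf

def fully_connected(x, W, b, activation=tf.nn.relu):
    """Hypothetical helper: one dense layer written with raw TF ops."""
    z = tf.matmul(x, W) + b          # affine transformation
    return activation(z) if activation is not None else z

n_inputs, n_units = 3, 2             # illustrative sizes

# Trainable parameters of the layer.
W = tf.Variable(tf.random.normal([n_inputs, n_units], stddev=0.1), name="W")
b = tf.Variable(tf.zeros([n_units]), name="b")

x = tf.ones([4, n_inputs])           # a mini-batch of 4 examples
out = fully_connected(x, W, b)
print(out.shape)                     # (4, 2)
```

A Keras `Dense` layer packages the same computation, together with the creation and bookkeeping of `W` and `b`.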

# Keras

[Keras](https://keras.io/) is a high level API for Neural Networks.
It supports multiple NN engines ("back ends") including TensorFlow, Theano, and CNTK.
So you can write a single program in Keras and run it on different underlying engines.

We will be using TensorFlow as our engine.

Technically speaking: Keras is an API -- a specification -- not a library.
TensorFlow has implemented this specification within the TensorFlow framework.

**This is not just a legalistic distinction**
- Keras is available as a separate package, independent of TensorFlow
- `tensorflow.keras` is the Keras API implemented (and well-integrated) into TensorFlow

This may get confusing
- when you read the [Keras docs](https://keras.io/), they refer to the abstract Keras API
    - used as `import keras`

- when you read the [TensorFlow docs for Keras](https://www.tensorflow.org/guide/keras), they refer to TensorFlow's implementation of the API
    - used as `from tensorflow import keras`
    - this is what we will use!

[TensorFlow 2.0](https://medium.com/tensorflow/standardizing-on-keras-guidance-on-high-level-apis-in-tensorflow-2-0-bad2b04c819a) (currently in beta release) has chosen to implement Keras as its (exclusive?) high-level API.

## Boilerplate: Guidance from TensorFlow team
[Guidance from TensorFlow team](https://medium.com/tensorflow/standardizing-on-keras-guidance-on-high-level-apis-in-tensorflow-2-0-bad2b04c819a)
- `tf.keras` is an implementation of the Keras API
    - *with enhancements*
        - eager execution
    - integrated into TensorFlow ecosystem
        - `tf.data`
    - use as

    

although on the same page they *also* use
- `tf.keras.models.Sequential` rather than `tf.keras.Sequential`
- `tf.keras.layers.Dense`
    - presumably depending on `from tensorflow.keras import layers`
    
**So my guess** is that the import path `tensorflow.keras` names the same module as `tf.keras`.

You can actually show this in a notebook:
- `import tensorflow.keras.layers as tkl`
    - `tkl.Dense??` and `tf.keras.layers.Dense??` wind up in the same file

## Eliminate the doubt: which functions are the same 

When you see sample code from different places, the same function may be referred to by slightly different
names; that is just the way Python imports work.

What **is** crucial is to distinguish between TensorFlow's implementation of Keras and the Keras API.


All the following comparisons evaluate to True, as you can test for yourself
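
A minimal sketch of such comparisons, assuming TF 2.x is installed. The particular names compared here are illustrative choices, not necessarily the ones from the original notebook.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Different import paths, same underlying objects.
same_dense = layers.Dense is tf.keras.layers.Dense
same_dense_via_keras = keras.layers.Dense is tf.keras.layers.Dense
same_sequential = tf.keras.models.Sequential is tf.keras.Sequential

print(same_dense, same_dense_via_keras, same_sequential)
```

The `is` operator tests object identity, so a `True` here means the two names refer to the very same class, not merely to classes that behave alike.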

But the following is **not** True because they come from different packages:

# [Getting Started with TensorFlow](https://www.tensorflow.org/tutorials?source=post_page---------------------------)

[Notebook in Colab](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/_index.ipynb?source=post_page---------------------------#scrollTo=hiH7AC-NTniF)

The sample code is good (you can see which modules are used)
- `import tensorflow as tf`
- everything else is from `tf`
    - `tf.keras.models.Sequential`
    - `tf.keras.layers.Dense`
    - `tf.keras.datasets.mnist`

## Restoring order to the world



Having Keras tightly integrated into TensorFlow 2.0 cleans up a rather unruly TensorFlow ecosystem, in which similar functionality had ended up in multiple places.

This can make it very confusing for someone new to TensorFlow.  There are lots of examples on the web written
using various similar-looking packages. 

I'll try to point out potential sources of confusion. Beware !

[Demystify the TensorFlow APIs](https://medium.com/google-developer-experts/demystify-the-tensorflow-apis-57d2b0b8b6c0) summarizes it well

Note that `keras` and `tf.keras` are *two different name spaces!*

- `tf.keras` vs `keras`
    - `keras` is a separately installed package (`pip install keras`)
        - used as `import keras`
        - you can choose any engine
        - this is the "use Keras with TensorFlow as a back end" option
 
    - `tf.keras` is the TensorFlow implementation of the Keras API
        - **recommended** vs the Keras package
            - better integrated into TensorFlow
        
        - used as `from tensorflow import keras`
            - subsequently, call `keras.layers.Dense` etc
        

- `tf.layers` is going away in TensorFlow 2.0
    - `tf.keras` is recommended going forward
    - **Do not use**
    
- [Estimators](https://www.tensorflow.org/guide/estimators) (`tf.estimator.Estimator`)
    - Estimators are sometimes called "models in a box"; somewhat similar to `sklearn`
        - pre-canned high-level models (like Classifiers) rather than the low-level `tf.keras.layers` (like Dense) from which they are built
        - convenient interface to [Datasets for Estimators](https://www.tensorflow.org/guide/datasets_for_estimators)
            - no need to create own mini-batches, etc.
    - You can achieve quite a bit of the convenience using Keras, so we will skip Estimators.
    

- Low-level TensorFlow 
    - great for learning
    - better to rely on pre-defined layers when possible
    
And our own observations
- `tf.contrib`
    - this was a name-space created to enable users to contribute useful packages.
    - some of these packages may have made their way into the core, or been integrated elsewhere
        - `tf.contrib.learn.Estimator` is the obsolete version of `tf.estimator.Estimator`
    - eliminated from TensorFlow 2.0
        - **avoid**
- [Datasets API](https://www.tensorflow.org/guide/datasets)
    - an API to handle large datasets, in memory- 

We will focus on two styles or packages
- `tf.keras`
    - this is the future, as it will be tightly integrated into TensorFlow 2.0
- `tf.layers` modules (e.g., `tf.layers.dense`)
    - used only to be compatible with the Geron book.
    - it is slightly lower level than Keras
    

There are two APIs: `Sequential` and `Functional`

## Sequential
[Getting started with the Keras Sequential Model](https://keras.io/getting-started/sequential-model-guide/)

Specify a model as a sequence of layers. Very natural.

**NOTE** The examples are copied from Keras (rather than `tf.keras`), so they really should be prefixed with
`import tensorflow.keras as keras`
or
`from tensorflow import keras`

The only limitation is that your computation graph is a non-branching line (functions called in sequence).
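
A minimal sketch of a `Sequential` model, assuming TF 2.x and `tf.keras`. The layer sizes and the random input are illustrative, and `input_shape=` on the first layer follows the convention described in these notes (newer Keras versions prefer an explicit `Input` object and may emit a warning).

```python
import numpy as np
import tensorflow as tf

# A non-branching stack: each layer feeds the next.
model = tf.keras.Sequential([
    # input_shape= tells the first layer what a single example looks like
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

x = np.random.rand(5, 4).astype("float32")  # 5 examples, 4 features each
probs = model(x)                            # forward pass
print(probs.shape)                          # (5, 3)
```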

## Functional

It is a little more verbose than `Sequential` but also more flexible in that you can define more complex computation graphs (multiple inputs/outputs, shared layers)

Notice that you are (manually) invoking a single layer at a time, passing as input the output of the prior layer.

- You must define an `Input` layer (placeholder for the input)
    - (in `Sequential` you instead give an `input_shape=` parameter to the first layer to specify the input shape)
- You "wrap" the graph into a "model" by a `Model` statement
    - looks like a function definition
        - names the input and output formal parameters
    - a `Model` acts just like a layer (but with internals that you create)
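
A minimal sketch of the Functional style, assuming TF 2.x and `tf.keras`. The two-input architecture, the layer sizes, and the input names are illustrative; the point is a graph that `Sequential` cannot express.

```python
import numpy as np
import tensorflow as tf

# Two Input placeholders: the formal parameters of the model.
in_a = tf.keras.Input(shape=(4,), name="a")
in_b = tf.keras.Input(shape=(2,), name="b")

# Each layer is invoked by hand on the output of the previous one.
h_a = tf.keras.layers.Dense(8, activation="relu")(in_a)
h_b = tf.keras.layers.Dense(8, activation="relu")(in_b)
merged = tf.keras.layers.Concatenate()([h_a, h_b])  # the branches join here
out = tf.keras.layers.Dense(1)(merged)

# Wrap the graph into a model by naming its inputs and outputs.
model = tf.keras.Model(inputs=[in_a, in_b], outputs=out)

x_a = np.ones((5, 4), dtype="float32")
x_b = np.ones((5, 2), dtype="float32")
y = model([x_a, x_b])
print(y.shape)  # (5, 1)
```

Because a `Model` acts like a layer, `model` could itself be called on new tensors inside a larger Functional graph.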