<a id = 'Top'></a>

# TensorFlow

In [1]:
import tensorflow as tf
import numpy as np

## Table of Contents
#### <a href = '#Tensors'>Section: Tensors</a>
#### <a href = '#Variables'>Section: Variables</a>
#### <a href = '#Automatic Differentiation and Gradients'>Section: Automatic Differentiation and Gradients</a>
#### <a href = '#Graphs and Functions'>Section: Graphs and Functions</a>
#### <a href = '#Modules, layers, and models'>Section: Modules, layers, and models</a>
#### <a href = '#Index'>The Index</a>

## Basics

#### Introduction:

TensorFlow is used to make building deep learning models easy. It works by representing the math in your model as a computational graph. When you do a forward pass, it uses this graph to propagate values forward efficiently, and to calculate gradients it uses the same graph with backpropagation.

#### Eager Execution 

TensorFlow 2 uses eager execution by default. Instead of building an abstract computational graph and then running it during a session (as in TensorFlow 1), operations return concrete values that you can use right away. This makes code easier to debug and interact with; however, graph optimizations and deployment become harder to reach (the Graphs and Functions section covers how to get them back).
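A quick way to see eager execution in action (a minimal sketch, assuming TensorFlow 2.x where eager mode is on by default): the matmul below returns a concrete value immediately, with no session involved.

```python
import tensorflow as tf

# In TF 2.x eager execution is enabled by default
print(tf.executing_eagerly())  # True

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)  # runs immediately; no graph building or session needed
print(y.numpy())     # [[ 7. 10.]
                     #  [15. 22.]]
```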

<a id = 'Tensors'> </a>

## Tensors

Almost everything in TensorFlow is a tensor, which is a generalization of vectors. Vectors are tensors of rank 1, and a tensor of rank n+1 is an array of tensors of rank n. Yes, this does mean rank 2 tensors are matrices (and a scalar is a rank 0 tensor). Tensors work very much like NumPy arrays, and in fact most NumPy operations have tensor equivalents.
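To make the rank idea concrete, here is a small sketch (the variable names are just for illustration) that builds tensors of rank 0 through 3 and checks their rank with tf.rank. Indexing into the first axis of the rank 3 tensor gives back a rank 2 tensor, which is the "array of rank n tensors" idea from above.

```python
import tensorflow as tf

scalar = tf.constant(7)                          # rank 0
vector = tf.constant([1.0, 2.0, 3.0])            # rank 1
matrix = tf.constant([[1, 2], [3, 4], [5, 6]])   # rank 2, shape (3, 2)
cube = tf.constant([[[0, 1], [2, 3]],
                    [[4, 5], [6, 7]]])           # rank 3: a stack of two 2x2 matrices

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, "rank:", tf.rank(t).numpy(), "shape:", t.shape)

# One element along the first axis of a rank 3 tensor is a rank 2 tensor
print(tf.rank(cube[0]).numpy())  # 2
```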

#### Creation/Basic info

All tensors are immutable (like numbers and strings): you can't update their contents, you can only create new ones. They must be rectangular, with a well-defined shape (like 5 by 3 by 2); special types such as ragged and sparse tensors are the exception. They also have a single data type. To specify the data type, use the dtype argument; otherwise it's inferred. To create a tensor use
* <b>tf.constant(val)</b> => creates a tensor with specified val

To convert a tensor to a NumPy array use
* <b>tensor.numpy()</b> => returns the NumPy form of the tensor

or just do
* <b>np.array(tensor)</b> => creates a NumPy array from the tensor
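A short sketch of the points above, assuming TensorFlow 2.x: dtype inference, immutability, and rectangular shapes. tf.tensor_scatter_nd_update and tf.cast are not mentioned in the notes; they are used here only to show that "changing" a tensor really means building a new one.

```python
import numpy as np
import tensorflow as tf

t = tf.constant([1, 2, 3])   # dtype inferred as int32
print(t.dtype)

# Tensors are immutable: item assignment is not supported
try:
    t[0] = 10
    assigned = True
except TypeError:
    assigned = False

# "Updating" means creating a new tensor; t itself is unchanged
t_new = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])
print(t.numpy(), t_new.numpy())   # [1 2 3] [10  2  3]

# Change the dtype explicitly (this also returns a new tensor)
t_float = tf.cast(t, tf.float32)

# Tensors must be rectangular, so a ragged nested list is rejected
try:
    tf.constant([[1, 2], [3]])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True

# Two ways back to NumPy
print(t.numpy(), np.array(t))
```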

In [2]:
tf.constant(5 + 3j)

<tf.Tensor: shape=(), dtype=complex128, numpy=(5+3j)>

In [3]:
tf.constant([[5,6],[1,5]], dtype = tf.float32)

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[5., 6.],
       [1., 5.]], dtype=float32)>

In [4]:
tf.constant([1.2,4.5]).numpy()

array([1.2, 4.5], dtype=float32)

#### Math/Operations on tensors

Note: |# APPLIES| means that the two inputs can have different shapes; the values will just be broadcast (as in NumPy) to a common shape.

|# APPLIES| To add use + or
* <b>tf.add(a,b)</b> => return result of adding tensor a and b

|# APPLIES| To do elementwise multiplication (Hadamard product) use * or
* <b>tf.multiply(a,b)</b> => return result of Hadamard product of a and b

To do matrix multiplication (AB) use @ or
* <b>tf.matmul(a,b)</b> => return the matrix product of a and b - Note: ORDER MATTERS

|# APPLIES| To raise to some power use ** or 
* <b>tf.pow(a,b)</b> => return result of raising a to power of b
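Here is a small sketch of broadcasting and of why order matters for matmul (the names m, row, and swap are just for illustration). The row of shape (2,) and the scalar 2 get broadcast against the 2x2 matrix, while multiplying by a permutation matrix swaps columns on one side and rows on the other.

```python
import tensorflow as tf

m = tf.constant([[1, 2], [3, 4]])
row = tf.constant([10, 20])       # shape (2,), broadcast across both rows of m

added = tf.add(m, row)            # [[11 22] [13 24]]
scaled = tf.multiply(m, 2)        # scalar broadcast: [[2 4] [6 8]]
squared = tf.pow(m, 2)            # [[ 1  4] [ 9 16]]

# Matrix multiplication is not commutative
swap = tf.constant([[0, 1], [1, 0]])
cols_swapped = tf.matmul(m, swap)  # [[2 1] [4 3]]
rows_swapped = tf.matmul(swap, m)  # [[3 4] [1 2]]

print(added, scaled, squared, cols_swapped, rows_swapped, sep="\n")
```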

In [5]:
a = tf.constant([[1,0],[0,1]])
b = tf.constant([[3,4],[5,6]])
a+b, a*b, a@b, a**b

(<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
 array([[4, 4],
        [5, 7]])>,
 <tf.Tensor: shape=(2, 2), dtype=int32, numpy=
 array([[3, 0],
        [0, 6]])>,
 <tf.Tensor: shape=(2, 2), dtype=int32, numpy=
 array([[3, 4],
        [5, 6]])>,
 <tf.Tensor: shape=(2, 2), dtype=int32, numpy=
 array([[1, 0],
        [0, 1]])>)

#### Cont.

To find the largest value
* <b>tf.reduce_max(t)</b> => Find max value in t (over all axes unless you pass axis)

To find the index of the largest value
* <b>tf.argmax(t)</b> => Find the index of the max value in t (along axis 0 unless you pass axis)

Other standard operations also exist. For example, to find the softmax of a tensor
* <b>tf.nn.softmax(t)</b> => Apply softmax along the last axis of the tensor
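The example below only uses a 1-D tensor, where the axis doesn't matter. A sketch with a 2-D tensor shows the defaults: reduce_max reduces over everything, argmax works along axis 0 unless told otherwise, and softmax normalizes along the last axis, so each row sums to 1.

```python
import tensorflow as tf

m = tf.constant([[1.0, 9.0, 3.0],
                 [7.0, 2.0, 8.0]])

overall_max = tf.reduce_max(m)          # 9.0, over all axes
row_max = tf.reduce_max(m, axis=1)      # [9. 8.], max of each row
col_argmax = tf.argmax(m)               # [1 0 1], axis=0: row index of each column's max
row_argmax = tf.argmax(m, axis=1)       # [1 2], column index of each row's max

probs = tf.nn.softmax(m)                # softmax along the last axis (per row)
row_sums = tf.reduce_sum(probs, axis=-1)  # each row sums to ~1.0

print(overall_max, row_max, col_argmax, row_argmax, row_sums, sep="\n")
```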

In [6]:
t = tf.constant([5,6,4], dtype=tf.float32)
tf.reduce_max(t), tf.argmax(t), tf.nn.softmax(t)

(<tf.Tensor: shape=(), dtype=float32, numpy=6.0>,
 <tf.Tensor: shape=(), dtype=int64, numpy=1>,
 <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.24472848, 0.66524094, 0.09003057], dtype=float32)>)

#### Other notes

Because tensors are so similar to NumPy arrays, broadcasting and indexing work the same way. reshape() and transpose() also work, except that they are tf functions (tf.reshape, tf.transpose), so you pass in the tensor as an argument. Reshaping in particular is fast and cheap, since the underlying data doesn't need to be duplicated.
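A few more NumPy-style operations on the same tensor used below, as a sketch: slicing with :, ..., and negative indices, plus reshape with -1, which lets TensorFlow infer one dimension.

```python
import tensorflow as tf

a = tf.constant([[5, 6, 7], [1, 2, 3]])

second_col = a[:, 1]            # [6 2]
row_start = a[1, :2]            # [1 2]
last_col = a[..., -1]           # [7 3]

flat = tf.reshape(a, [-1])      # -1 infers the size: [5 6 7 1 2 3]
tall = tf.reshape(a, [3, -1])   # shape (3, 2), same row-major element order

print(second_col, row_start, last_col, flat, tall.shape, tf.transpose(a).shape, sep="\n")
```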

In [7]:
a = tf.constant([[5,6,7],[1,2,3]])
print(a+1, a[0,2].numpy())
tf.reshape(a,[3,2]), tf.transpose(a)

tf.Tensor(
[[6 7 8]
 [2 3 4]], shape=(2, 3), dtype=int32) 7


(<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
 array([[5, 6],
        [7, 1],
        [2, 3]])>,
 <tf.Tensor: shape=(3, 2), dtype=int32, numpy=
 array([[5, 1],
        [6, 2],
        [7, 3]])>)

<a id='Variables'> </a>

## Variables

#### Basic info

Variables are how you store values that are meant to change over time, like model weights. In TensorFlow they look and act like tensors (because they are backed by tensors), so shape, dtype, and numpy() all still work. All tensor operations also work, HOWEVER none of them are in place: if you transpose or reshape a variable, you get a new tensor and the variable itself doesn't change.

* <b>tf.Variable(val, dtype = None)</b> => create a variable with an initial value of val and data type of dtype if specified

In [8]:
v = tf.Variable([[1,2.0]])
print(v)
tf.transpose(v)
print(v) # Still the same!!

<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[1., 2.]], dtype=float32)>
<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[1., 2.]], dtype=float32)>


#### Cont.

Once you have a variable and want to edit it, instead of recreating a Variable instance you can just use
* <b>v.assign(val)</b> => overwrites the variable's value with val. NOTE: shape and dtype stay the same, so val must have a compatible shape and be convertible to the variable's dtype.

You can also add to or subtract from the variable in place using
* <b>v.assign_add(val)</b> => adds val to the current value (the result overwrites the old value)
* <b>v.assign_sub(val)</b> => same as above, but subtracts
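A sketch of the in-place updates, assuming TensorFlow 2.x. The shape check is the part worth seeing: assigning a value with a different shape fails, and the exact exception type has varied between versions, so the sketch catches both likely ones. Plain arithmetic on a variable returns a new tensor instead.

```python
import tensorflow as tf

v = tf.Variable([1.0, 2.0])

v.assign([3.0, 4.0])        # overwrite in place
v.assign_add([1.0, 1.0])    # now [4.0, 5.0]
v.assign_sub([0.5, 0.5])    # now [3.5, 4.5]
print(v.numpy())

# The shape is fixed, so a length-3 value is rejected
try:
    v.assign([1.0, 2.0, 3.0])
    shape_error = False
except (ValueError, tf.errors.InvalidArgumentError):
    shape_error = True
print("shape mismatch rejected:", shape_error)

# Ordinary arithmetic returns a plain tensor and leaves v untouched
w = v + 1
print(type(w).__name__, v.numpy())
```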

In [9]:
v = tf.Variable(5.1)
v.assign(3)
v.assign_sub(1)
v # Note how dtype is still float32 even though we assigned the int 3. If the initial value had been 5 instead of 5.1, the dtype would have been int32.

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>

#### Lifecycle, Naming, and Watching

Variables have the same lifecycle as other Python objects: when there are no references left to a variable, it is automatically deallocated.

You can also name variables to help with debugging and tracking them (also useful for TensorBoard!). Two variables can have the same name and still be distinct, unequal objects. Use the name parameter
* <b>tf.Variable(..., name = None,...)</b> => Make variable with name of name to help track it. Useful for debugging and TensorBoard

In addition, variables are created with trainable=True by default, which means GradientTape automatically watches operations on them (to help compute gradients later). If a variable doesn't need gradients, set trainable=False to save that extra memory and computation.
* <b>tf.Variable(..., trainable = True,...)</b> => Make a variable that is or isn't watched (by GradientTape, coming later).
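A sketch of naming and trainability, assuming TensorFlow 2.x in eager mode (the names "weights" and "scale" are made up). Two variables can share a name and still be separate objects, and trainable is just a flag you can read back.

```python
import tensorflow as tf

a = tf.Variable([1.0, 2.0], name="weights")
b = tf.Variable([3.0, 4.0], name="weights")   # reusing a name is allowed
frozen = tf.Variable(0.5, name="scale", trainable=False)

print(a.name, b.name)                 # same name for both in eager mode
print(a is b)                         # False: still two different variables
print(a.trainable, frozen.trainable)  # True False
```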


In [10]:
a = tf.Variable(4, name = 'Variable', trainable = False)

<a id = 'Automatic Differentiation and Gradients'></a>

## Automatic Differentiation and Gradients

#### Basic Info

Automatic differentiation (autodiff) lets you compute gradients easily. TensorFlow records the operations you run during the forward pass and then traverses that record backward (reverse-mode differentiation) to compute gradients.

TensorFlow uses the GradientTape API to let you control this. Inside the context of a GradientTape object you perform the operations relevant to the gradient calculation, and afterwards you use
* <b>tape.gradient(target, sources)</b> => target and sources can each be a single tensor or a list of tensors. For each source, it returns the sum of the gradients of all targets with respect to that source. If a target is a non-scalar tensor, its elements are summed too, so you get the gradient of the sum. For example, if you had an MSE loss and a regularization term, you could pass both as targets (or add them into one tensor) and pass the weights as sources.

Note that if a source is not connected to the target in any way, its gradient will just be None. If you want 0 instead you can use
* <b>tape.gradient(..., unconnected_gradients=tf.UnconnectedGradients.ZERO, ...)</b> => returns zeros for unconnected sources instead of None. The value must be a member of the tf.UnconnectedGradients enum: NONE (the default) or ZERO
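A sketch of those summing rules, assuming TensorFlow 2.x. The loss and regularizer are made-up toy terms. persistent=True (not covered in the notes) lets the same tape be asked for gradients twice, once with the default None and once with UnconnectedGradients.ZERO, and a dict of sources gives back a dict of gradients.

```python
import tensorflow as tf

# Non-scalar target: the gradient is that of sum(y)
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = x * tf.constant([1.0, 2.0, 3.0])
dx = tape.gradient(y, x)
print(dx)  # 6.0 = 1 + 2 + 3

# Two targets (a toy loss plus a regularizer) and a dict of sources
w = tf.Variable([1.0, 2.0])
b = tf.Variable(0.5)  # never used below, so it is unconnected
with tf.GradientTape(persistent=True) as tape:  # persistent: gradient() is called twice
    mse = tf.reduce_mean((w - 1.0) ** 2)
    reg = 0.1 * tf.reduce_sum(w ** 2)

grads = tape.gradient([mse, reg], {"w": w, "b": b})
print(grads["w"])  # (w - 1) + 0.2 * w = [0.2, 1.4]
print(grads["b"])  # None

grads_zero = tape.gradient([mse, reg], {"w": w, "b": b},
                           unconnected_gradients=tf.UnconnectedGradients.ZERO)
print(grads_zero["b"])  # 0.0 instead of None
del tape  # release the persistent tape's resources
```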

In [11]:
x = tf.Variable(3, dtype = tf.float32) # Use a float dtype: with the int 3 alone the dtype would be int32, and the tape would warn and return None

#Context of tape
with tf.GradientTape() as tape:
    y1 = x**2
    y2 = tf.abs(x)

#Get Gradient
tape.gradient(target = [y1, y2], sources = x) # dy1/dx + dy2/dx = 2x + sign(x) = 6 + 1 = 7

<tf.Tensor: shape=(), dtype=float32, numpy=7.0>

In [12]:
#Another example
error = tf.Variable([3.0,4.0])

with tf.GradientTape() as tape:
    loss = 1/2*tf.norm(error)**2 # 1/2*||error||^2 => derivative is [x, y]

tape.gradient(loss, error)

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([3., 4.], dtype=float32)>

In [32]:
x = tf.Variable([[1,3.0]])
w = tf.Variable([[4,5.0],[1,2]])
b = tf.Variable([[2,2.0]])
with tf.GradientTape() as tape:
    y = x@w + b
tape.gradient(y,[x,b]) #This time a list because there are multiple sources!

[<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[9., 3.]], dtype=float32)>,
 <tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[1., 1.]], dtype=float32)>]

#### Computing gradients with respect to a model

Collecting all your variables in one place (like a tf.Module, or a tf.keras.layers.Layer or tf.keras.Model, which in tf.keras both derive from tf.Module) makes things much more organized. You can get all the sources from
* <b>layer.trainable_variables</b> => returns a list containing all trainable variables that layer has

In [14]:
# Don't worry about the Dense layer, it's just a place where all the variables are collected!
layer = tf.keras.layers.Dense(2, activation = 'relu') 
x = tf.constant([[1,2,3,4]], dtype=tf.float32)

with tf.GradientTape() as tape:
    y = layer(x)  # We never gave the input size, so this first call builds the weights from x's shape and returns y!
    loss = tf.reduce_mean(y**2) # (a^2 + b^2)/2
    
grad = tape.gradient(loss, layer.trainable_variables) 

for var, g in zip(layer.trainable_variables, grad):
    print(f'{var.name}, shape: {g.shape}')

dense/kernel:0, shape: (4, 2)
dense/bias:0, shape: (2,)


#### Controlling what the tape keeps track of

Note that GradientTape uses memory and adds some overhead because it records and stores the operations it watches.

The default behavior is to record only operations on trainable tf.Variables accessed inside the tape's context.
* <b>tf.Variable(..., trainable = True, ...)</b> => the trainable parameter decides whether the variable is going to be tracked or not 

So if you try to get the gradient with respect to a constant, it won't work (you get None). Also note that a variable + a tensor returns a plain tensor, so if you want to add to a variable in place, use assign_add.

To see what variables the tape is currently watching use 
* <b>tape.watched_variables()</b> => returns a list of the variables currently being watched

To tell the tape to watch a specific constant or non-trainable variable use
* <b>tape.watch(v)</b> => Forces the tape to watch v as well

If you want the tape to stop automatically watching trainable variables (so it only watches what you pass to tape.watch), use 
* <b>tf.GradientTape(..., watch_accessed_variables=False, ...)</b> => Disables the default watching behavior 
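A sketch of turning off automatic watching, assuming TensorFlow 2.x (the variables a and b are made up). Only b is watched explicitly, so watched_variables() lists just b and the gradient with respect to a comes back as None even though a is trainable.

```python
import tensorflow as tf

a = tf.Variable(2.0, name="a")
b = tf.Variable(3.0, name="b")

# With automatic watching off, only explicitly watched values are recorded
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(b)
    y = a * b

watched = tape.watched_variables()
print([v.name for v in watched])   # only b

grads = tape.gradient(y, [a, b])
print(grads)                       # [None, 2.0]: dy/db = a
```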

In [15]:
a = tf.Variable([1., 2])
b = tf.constant([1, 2.])
c = tf.Variable([1, 2.], trainable = False)
d = tf.Variable([1, 2.]) + 1
with tf.GradientTape() as tape:
    tape.watch(b)
    y = a**2 + b**2 + c**2
    
tape.gradient(y, [a,b,c,d])

[<tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 4.], dtype=float32)>,
 <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 4.], dtype=float32)>,
 None,
 None]

#### Control Flow

Note that if you have if/else statements inside a tape's context, everything will work fine. The tape just records the operations that actually execute, so the control flow logic itself is invisible to the tape: only the branch that ran contributes to the gradient.
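A sketch of that behavior with made-up values: since x is positive, only the first branch runs, so v0 gets a gradient and v1 gets None.

```python
import tensorflow as tf

x = tf.constant(1.0)
v0 = tf.Variable(2.0)
v1 = tf.Variable(2.0)

with tf.GradientTape() as tape:
    # Ordinary Python if/else: only the branch that executes is recorded
    if x > 0.0:
        result = 3.0 * v0
    else:
        result = v1 ** 2

dv0, dv1 = tape.gradient(result, [v0, v1])
print(dv0, dv1)  # 3.0 and None
```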

#### Common Errors

You can get a gradient of None when you don't expect it. Here are some common causes:
1. Replacing a variable with a tensor (use assign_add, not + 1) => GradientTape doesn't automatically watch tensors
2. Doing calculations outside of TensorFlow (e.g., in NumPy) => TensorFlow can't record calculations done outside of TensorFlow 
3. Taking gradients of an integer or string => integer and string dtypes aren't differentiable
4. Taking gradients through a stateful object => A tensor is immutable, so it has a value but no state; operations on it create new tensors and the tape can follow them. A variable does have state, and assigning to it acts as a wall: the tape can't trace back through the assignment to the values that produced it
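A sketch of cause 3, assuming TensorFlow 2.x: watching an int32 tensor only makes TensorFlow log a warning, and the gradient comes back as None, while the same computation in float32 works.

```python
import tensorflow as tf

x_int = tf.constant(10)          # int32
with tf.GradientTape() as tape:
    tape.watch(x_int)            # TF logs a warning: the watched dtype must be floating
    y = x_int * x_int
grads_int = tape.gradient(y, [x_int])
print(grads_int)                 # [None]

x_float = tf.constant(10.0)      # float32 works as expected
with tf.GradientTape() as tape:
    tape.watch(x_float)
    y = x_float * x_float
grad_float = tape.gradient(y, x_float)
print(grad_float)                # 20.0
```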

If you expect some gradients to be None and want them to return another value instead, use
* <b>tape.gradient(..., unconnected_gradients=tf.UnconnectedGradients.ZERO, ...)</b> => Changes None gradients to zeros. Pass a member of the tf.UnconnectedGradients enum (NONE or ZERO); no subclassing needed

In [20]:
ex_1 = tf.Variable(2.0)

for epoch in range(2):
    with tf.GradientTape() as tape:
        y = ex_1+1

    print(type(ex_1).__name__, ":", tape.gradient(y, ex_1))
    ex_1 = ex_1 + 1   # This should be `ex_1.assign_add(1)`

ResourceVariable : tf.Tensor(1.0, shape=(), dtype=float32)
EagerTensor : None


In [5]:
ex_2 = tf.Variable([3.1,2.9])
with tf.GradientTape() as tape:
    
    outside_tf = np.mean(ex_2)
    bad = tf.reduce_mean(outside_tf)

print(tape.gradient(bad, ex_2))

None


In [6]:
ex_4 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

with tf.GradientTape() as tape:
    x1.assign_add(ex_4) # Update x1 = x1 + ex_4.
    
    # The tape starts recording from x1.
    y = x1**2   # y = (x1 + ex_4)**2

print(tape.gradient(y, ex_4))   # Mathematically dy/d(ex_4) = 2*(x1 + ex_4), but assign_add blocks the tape, so this is None

None


In [21]:
x = tf.Variable(3.1)
with tf.GradientTape() as tape:
    y = tf.Variable(4.0)
    
tape.gradient(y, x, unconnected_gradients = tf.UnconnectedGradients.ZERO)

<tf.Tensor: shape=(), dtype=float32, numpy=0.0>

<a id = 'Graphs and Functions'> </a>

## Graphs and Functions

#### About Graphs
Running eagerly is great, but you can get a bunch of speedups when you start to use graphs. Graphs are data structures that contain tf.Operation objects (units of computation) and tf.Tensor objects (units of data that flow between operations). They are defined in a tf.Graph context, and because they are data structures they can be run and saved without the original Python code. This lets you use them in places that don't have a Python interpreter, like a phone. They are also easily optimized and can give you significant speedups, especially for code made of many small operations (code dominated by a few expensive operations, like large convolutions, may not speed up much). In short, use eager execution while creating and testing, and at the end turn things into a graph to make them faster.

Using a graph has two parts: tracing and running. The first time you call it, TensorFlow has to trace the graph, so that call can take extra time; after that, calls reuse the traced graph and get all the speedups. Note that a new graph is traced for each new input signature (for example, a different dtype or shape, or a different Python argument value). By default, AutoGraph (tf.autograph) converts Python loops and control flow that depend on tensors into graph operations.
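A sketch of AutoGraph at work, assuming TensorFlow 2.x (the function name is made up). The if condition depends on a tensor, so AutoGraph converts it into a graph conditional, and a single trace handles both branches. tf.autograph.to_code prints the Python code AutoGraph generated, if you're curious what the conversion looks like.

```python
import tensorflow as tf

@tf.function
def keep_if_positive_sum(x):
    # Tensor-dependent condition: AutoGraph turns this if/else into a graph conditional
    if tf.reduce_sum(x) > 0:
        y = x
    else:
        y = tf.zeros_like(x)
    return y

pos = keep_if_positive_sum(tf.constant([1.0, 2.0]))    # [1. 2.]
neg = keep_if_positive_sum(tf.constant([-1.0, -2.0]))  # [0. 0.], same trace, other branch
print(pos, neg)

# Inspect the code AutoGraph generated from the original Python function
generated = tf.autograph.to_code(keep_if_positive_sum.python_function)
print(generated)
```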

#### Tracing a graph

To create a graph you can simply use
* <b>tf.function(func)</b> => makes a graph out of func
* <b>@tf.function</b> => Apply this decorator to your function 

A tf.function will also recursively trace any function that it calls.

In [30]:
def outside(x,w,b):
    return x@w +b

#@tf.function - could have also done this
def layer_op(b):
    x1 = tf.constant([[1.0, 2.0]])
    w1 = tf.constant([[2.0], [3.0]])
    return outside(x1, w1, b)

graph_func = tf.function(layer_op) #Outside was traced too!
graph_func(tf.constant(4.0)).numpy()

array([[12.]], dtype=float32)

#### Inspecting Polymorphic Functions

A traced function is polymorphic: it stays a single callable but encapsulates several concrete functions behind one API. Every time you pass in a new argument signature (e.g., a different dtype or shape), the function is retraced and the resulting tf.Graph is stored in a new concrete function. You can view all the concrete functions with
* <b>func.pretty_printed_concrete_signatures()</b> => Returns a string listing all of the traces the function has (print it to read it; refer to <a href='https://www.tensorflow.org/guide/function'>function</a> for this one)

To retrieve and handle these concrete functions use 
* <b>func.get_concrete_function(spec)</b> => returns the concrete function matching the specification in spec, tracing a new one if it doesn't exist yet
* <b>tf.TensorSpec(shape=, dtype=)</b> => A way to describe a tensor's shape and dtype; this is what you pass into the above. A None in a dimension of shape accepts any size in that dimension

To limit the tracing that occurs with a function you can also use 
* <b>tf.function(..., input_signature=spec_tuple, ...)</b> => restricts tracing to the specifications in the tuple. Inputs that don't match raise an error (ValueError or TypeError, depending on the TensorFlow version). If you had an LSTM, for example, using None for the sequence-length dimension would stop a new graph from being created every time a new input length is seen

Note: see the tf.function guide for more examples.
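A sketch of the polymorphism API above, assuming TensorFlow 2.x; the functions double and next_collatz are made up for illustration. The first part creates several traces and then asks for a concrete function that accepts float32 vectors of any length. The second part pins an input_signature, so int32 vectors of any length share one trace and a rank 2 input is rejected (the sketch catches both exception types that different versions have used).

```python
import tensorflow as tf

@tf.function
def double(x):
    return x + x

# Each new dtype/shape signature triggers a new trace
double(tf.constant(1))            # int32 scalar
double(tf.constant(1.5))          # float32 scalar
double(tf.constant([1.0, 2.0]))   # float32 vector of length 2
print(double.pretty_printed_concrete_signatures())

# Get (tracing if needed) a concrete function for float32 vectors of any length
cf = double.get_concrete_function(tf.TensorSpec(shape=[None], dtype=tf.float32))
print(cf(tf.constant([1.0, 2.0, 3.0])))  # [2. 4. 6.]

# input_signature: one trace for int32 vectors of any length, nothing else accepted
@tf.function(input_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),))
def next_collatz(x):
    return tf.where(x % 2 == 0, x // 2, 3 * x + 1)

short = next_collatz(tf.constant([1, 2]))            # [4 1]
longer = next_collatz(tf.constant([1, 2, 3, 4, 5]))  # [ 4  1 10  2 16], no retrace
try:
    next_collatz(tf.constant([[1, 2], [3, 4]]))      # rank 2 doesn't match [None]
    rejected = False
except (TypeError, ValueError):
    rejected = True
print("rank-2 input rejected:", rejected)
```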

#### Python Side Effects

Switching from eager Python to traced graphs has a couple of side effects. As you know, the first time you call a function with new arguments it gets traced, which means all the Python code inside runs as well! However, the next time you call the function, the traced graph runs instead, so any plain Python code (printing, appending to lists, mutating globals, etc.) in the function doesn't run. Python generators and iterators also rely on the Python runtime, so things like advancing an iterator's state only happen during tracing. To avoid this, prefer TensorFlow ops such as <b>tf.print()</b>, <b>tf.gather()</b>, <b>tf.stack()</b>, and <b>tf.TensorArray()</b>. If you still want Python side effects, you can use 
* <b>tf.py_function()</b> => Wraps a normal Python function so it runs eagerly as an op inside the graph, which allows Python side effects. The downside is that it gets little of the graph speedup and isn't portable (running the graph needs a Python interpreter). 

The Debugging section below has an example of side effects and tracing.
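A sketch contrasting a plain Python side effect with tf.py_function, assuming TensorFlow 2.x; the lists and function names are hypothetical. The list append inside add_one happens only during the single trace, whereas the function wrapped in tf.py_function runs eagerly on every call.

```python
import tensorflow as tf

trace_calls = []  # hypothetical list used to observe the side effect

@tf.function
def add_one(x):
    trace_calls.append(1)   # plain Python side effect: only runs while tracing
    return x + 1

add_one(tf.constant(1))
add_one(tf.constant(2))     # same signature, so the traced graph is reused
print(len(trace_calls))     # 1

py_calls = []

def log_and_add_one(x):
    # Runs eagerly inside tf.py_function, so x is an eager tensor with .numpy()
    py_calls.append(float(x.numpy()))
    return x + 1

@tf.function
def add_one_logged(x):
    return tf.py_function(func=log_and_add_one, inp=[x], Tout=tf.float32)

add_one_logged(tf.constant(1.0))
add_one_logged(tf.constant(2.0))
print(py_calls)             # [1.0, 2.0]: the side effect happens on every call
```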

#### Looping

A common pitfall is looping over Python/NumPy data within a tf.function. That loop only runs during tracing, so it gets unrolled: a copy of the loop body is added to the graph for every iteration. If you really want the training loop inside your function, wrap your data in a <b>tf.data.Dataset</b> object and AutoGraph will turn the loop into a graph loop. See the <a href='https://www.tensorflow.org/guide/data'>data guide</a> for this. Another thing you might want to do is accumulate intermediate values from a loop; use <b>tf.TensorArray()</b> for that kind of thing. 
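A sketch of accumulating values in a graph loop, assuming TensorFlow 2.x (running_sum is a made-up example). Looping over tf.range, rather than a Python range, lets AutoGraph build a single graph loop, and the TensorArray collects each partial sum.

```python
import tensorflow as tf

@tf.function
def running_sum(xs):
    # tf.range makes AutoGraph build a graph loop instead of unrolling it
    acc = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    total = tf.constant(0.0)
    for i in tf.range(tf.shape(xs)[0]):
        total += xs[i]
        acc = acc.write(i, total)   # write() returns a TensorArray; keep the result
    return acc.stack()

sums = running_sum(tf.constant([1.0, 2.0, 3.0]))
print(sums)  # [1. 3. 6.]
```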

#### Debugging 

Debugging code is generally easier in eager mode than in graph mode, so make sure your functions work properly in eager mode before decorating them with <b>tf.function()</b>. To switch tf.function-decorated functions between running eagerly and running as graphs, use 
* <b>tf.config.run_functions_eagerly(bool)</b> => when True, functions decorated with tf.function run eagerly (handy for debugging); set it back to False when you're done 

If errors only occur with <b>tf.function()</b>, you can try a couple of things. For example, <b>print()</b> only runs during tracing, so it helps with tracing issues; <b>tf.print()</b> runs on every call and can help you track intermediate values; and <b>tf.debugging.enable_check_numerics()</b> lets you find where NaNs and Infs are created.
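A sketch of run_functions_eagerly, assuming TensorFlow 2.x: the hypothetical function records tf.executing_eagerly() as a Python side effect. The first call is traced into a graph (False); with the global switch on, the body runs as plain Python (True). The try/finally makes sure the setting is switched back.

```python
import tensorflow as tf

modes = []  # hypothetical list used to record which mode the body ran in

@tf.function
def scaled(x):
    modes.append(tf.executing_eagerly())  # Python side effect
    return x * 2

scaled(tf.constant(1.0))      # traced into a graph, records False

tf.config.run_functions_eagerly(True)
try:
    scaled(tf.constant(1.0))  # body runs eagerly as plain Python, records True
finally:
    tf.config.run_functions_eagerly(False)  # always restore graph mode

print(modes)  # [False, True]
```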

In [7]:
# Using Python side effects cleverly to detect when tracing occurs. Note that each new Python integer argument creates a new trace,
# because TensorFlow treats Python values (unlike tensors) as part of the trace signature
@tf.function
def a_function_with_python_side_effect(x):
    print("Tracing!")  
    tf.print("Running!")
    return x * x + tf.constant(2)

print(a_function_with_python_side_effect(tf.constant(2)))
print(a_function_with_python_side_effect(tf.constant(3)))

print(a_function_with_python_side_effect(2))
print(a_function_with_python_side_effect(3))

Tracing!
Running!
tf.Tensor(6, shape=(), dtype=int32)
Running!
tf.Tensor(11, shape=(), dtype=int32)
Tracing!
Running!
tf.Tensor(6, shape=(), dtype=int32)
Tracing!
Running!
tf.Tensor(11, shape=(), dtype=int32)


#### Variables

If you create a new variable inside a function, the variable might not be created on every call, because traces are reused. This behavior differs from what would happen in eager mode, so to avoid the ambiguity a ValueError is raised. To avoid any issues, make sure variables are only created the first time the function is executed.

In [10]:
# How NOT to do things
@tf.function()
def f():
    v = tf.Variable(5.0)
    return v
try:
    f()
except ValueError:
    print("Expected")
    
# How things should be done   
class Count(tf.Module):
    def __init__(self):
        self.count = None

    @tf.function
    def __call__(self):
        if self.count is None: # Needed so the variable is only created on the first call
            self.count = tf.Variable(0)
        return self.count.assign_add(1)

c = Count()
print(c())
print(c())

Expected
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)


Another problem you might face is with the garbage collector. tf.function only retains weak references to the variables it closes over, so you must keep your own reference to them for as long as you use the function.

In [4]:
external_var = tf.Variable(3)
@tf.function
def f(x):
    return x * external_var

traced_f = f.get_concrete_function(4)
print("Calling concrete function...")
print(traced_f(4))

del external_var
print()
print("Calling concrete function after garbage collecting its closed Variable...")
try: 
    traced_f(4)
except Exception as err:
    print(f"{type(err).__name__} happened. Can't do operation when part of it doesn't exist!")

Calling concrete function...
tf.Tensor(12, shape=(), dtype=int32)

Calling concrete function after garbage collecting its closed Variable...
FailedPreconditionError happened. Can't do operation when part of it doesn't exist!


<a id = 'Modules, layers, and models'></a>

## Modules, layers, and models

To do machine learning in TensorFlow you will likely need to define, save, and restore a model. A model is a function that computes a forward pass on tensors, plus some variables that can be updated in response to training.

#### Defining models and layers in tensorflow

Models are usually made up of layers. Layers are functions with a known mathematical structure that can be reused and have trainable variables. Most high level implementations of layers and models are built on one foundational class, <b>tf.Module</b>

When subclassing <b>tf.Module</b>, you normally want to write an init method. Inside it, accept an optional name argument and pass it through to the superclass's init. You also need some way to call the module, so writing a call method is a good idea, although you can use any method name you want. 

By subclassing <b>tf.Module</b>, any <b>tf.Variable</b> or <b>tf.Module</b> instances assigned to this object's attributes are automatically collected. This allows you to save and load variables, and also to create collections of <b>tf.Module</b>s. You can also control which of the module's variables are trainable through their trainable flag. 

A couple things you can do with tf.Module:
* <b>module.submodules</b> => property containing all the modules inside the current module (all the submodules)
* <b>module.trainable_variables</b> => property containing all the variables that gradient tape will watch
* <b>module.variables</b> => property containing all the variables that exist in the module


In [5]:
class SimpleModule(tf.Module):
    
    def __init__(self, name=None):
        super().__init__(name = name)
        self.train = tf.Variable(5.0, name = 'trainable')
        self.non_train = tf.Variable(1.0, trainable = False, name = 'not trainable')
        
    def __call__(self, x):
        return self.train*x + self.non_train 

class finalModule(tf.Module):
    
    def __init__(self, name = None):
        super().__init__(name = name)
        self.sm1 = SimpleModule()
        self.sm2 = SimpleModule()
        
    def __call__(self, x):
        x = self.sm1(x)
        return self.sm2(x)

fm = finalModule('simple')

print(fm.submodules,'\n')
print(fm.trainable_variables,'\n')
print(fm.variables,'\n')

# Using GradientTape() for one step of gradient descent
alpha = 1  # learning rate
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = fm(x)
    loss = tf.abs(y)
    
for v, dv in zip(fm.trainable_variables, tape.gradient(loss, fm.trainable_variables)):
    v.assign_sub(dv*alpha) # plain ol' gradient descent

(<__main__.SimpleModule object at 0x000002B5F680B910>, <__main__.SimpleModule object at 0x000002B5ED65DB80>) 

(<tf.Variable 'trainable:0' shape=() dtype=float32, numpy=5.0>, <tf.Variable 'trainable:0' shape=() dtype=float32, numpy=5.0>) 

(<tf.Variable 'not trainable:0' shape=() dtype=float32, numpy=1.0>, <tf.Variable 'trainable:0' shape=() dtype=float32, numpy=5.0>, <tf.Variable 'not trainable:0' shape=() dtype=float32, numpy=1.0>, <tf.Variable 'trainable:0' shape=() dtype=float32, numpy=5.0>) 



#### Waiting to create variables

In practice, the shape of the inputs to a layer might not be known in advance or may change later. To be flexible about this, you can defer creating the variables until the first input is passed in, when its shape is known. Keras layers work the same way, and it's good practice to have. Note that we use
* <b>tf.random.normal(shape, name = None)</b> => creates a tensor with values drawn from a normal distribution
* <b>tf.zeros(shape, dtype, name = None)</b> => creates a tensor of zeros with the given shape and dtype

or, as a second way of doing things,

* <b>tf.random_normal_initializer()</b> => creates an object that can be used to initialize variables with random values
* <b>tf.zeros_initializer()</b> => creates an object that can be used to initialize variables with values of zero
* <b>initializer(shape)</b> => calls the initializer to create a tensor with the given shape

and finally create our variable with 
* <b>tf.Variable(initial_value, dtype = None)</b> => Create a variable with starting values. It's worth specifying dtype so it isn't inferred from the initial value in a way you didn't intend (e.g., an integer dtype)

NOTE: DON'T pass a tf.keras.Input() into a plain tf.Module like this. Keras wraps the TensorFlow operations in automatically generated Lambda-style layers and weird stuff happens. Save that for when you subclass Keras :)
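A sketch of the two ways of creating initial values listed above, assuming TensorFlow 2.x; the sizes and variable names are made up. Both produce a (4, 3) weight matrix and a zero bias of length 3.

```python
import tensorflow as tf

in_features, out_features = 4, 3

# Way 1: build the initial values directly
w1 = tf.Variable(tf.random.normal([in_features, out_features], stddev=0.05), name="w1")
b1 = tf.Variable(tf.zeros([out_features], dtype=tf.float32), name="b1")

# Way 2: initializer objects, called with a shape (and optionally a dtype)
w_init = tf.random_normal_initializer(mean=0.0, stddev=0.05)
b_init = tf.zeros_initializer()
w2 = tf.Variable(w_init(shape=[in_features, out_features], dtype=tf.float32), name="w2")
b2 = tf.Variable(b_init(shape=[out_features], dtype=tf.float32), name="b2")

print(w1.shape, b1.shape, w2.shape, b2.shape)
```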

In [60]:
class FlexibleDense(tf.Module):
    
    def __init__(self, outshape, name = None):
        super().__init__(name = name)
        self.outshape = outshape
        self.is_built = False
        
    def __call__(self, x):
        if not self.is_built:
            w_init = tf.random_normal_initializer()
            self.W = tf.Variable(initial_value = w_init([x.shape[-1], self.outshape]),dtype= tf.float32, name='w')
            
            b_init = tf.zeros([self.outshape], name = 'b', dtype=tf.float32)
            self.B = tf.Variable(initial_value = b_init, name = 'b')
            
            self.is_built = True
            
        return tf.nn.tanh(x @ self.W + self.B)
    
x = tf.constant([[1.0, 2]])
f = FlexibleDense(3)

with tf.GradientTape() as tape:
    y = f(x)
    loss = tf.abs(y-tf.zeros(y.shape))

print(tape.gradient(loss, f.trainable_variables))

(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-0.99993557,  0.9827529 , -0.9952498 ], dtype=float32)>, <tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[-0.99993557,  0.9827529 , -0.9952498 ],
       [-1.9998711 ,  1.9655058 , -1.9904996 ]], dtype=float32)>)


#### Saving a module
You can visit the <a href = 'https://www.tensorflow.org/guide/intro_to_modules#saving_weights'>docs</a> for that one. Since the end goal is the Keras functional API, I'll keep this one short.

As another note, there is more to come on subclassing when we look at Keras, but for now this should suffice.
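The linked docs are the real reference here, but as a minimal sketch (assuming TensorFlow 2.x), tf.train.Checkpoint can write a module's variables to disk and restore them. The Scale module and the temporary directory are hypothetical, just for this example.

```python
import os
import tempfile

import tensorflow as tf

class Scale(tf.Module):
    """Hypothetical one-variable module used only for this example."""

    def __init__(self, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(2.0, name="w")

    def __call__(self, x):
        return self.w * x

module = Scale()
checkpoint = tf.train.Checkpoint(model=module)

# write() saves the tracked variables and returns the checkpoint path
ckpt_dir = tempfile.mkdtemp()
save_path = checkpoint.write(os.path.join(ckpt_dir, "scale"))

module.w.assign(100.0)         # clobber the weight...
checkpoint.restore(save_path)  # ...then load the saved value back
print(module.w.numpy())        # 2.0
```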

## Process:

With any application these are the steps you will usually need to follow

1. Start by getting your data ready. Preprocess your data and turn it into a tf.data.Dataset object


2. Create your model. This can be very easy but less flexible, or more involved but tailored to your purpose. The spectrum is below
    - Sequential API + built-in layers - new users and simple models
    - Functional API + built-in layers - engineers with standard use cases
    - Functional API + custom layers, metrics, losses - engineers requiring increasing control
    - Subclassing: do everything yourself - researchers


3. Train the model. You can distribute training across CPUs, GPUs, TPUs, and even multiple machines!
    - Set up TensorBoard to help analyze training (optional)


4. Serialize the model to save it (you can also share it through a model repository such as TensorFlow Hub)


5. Deploy the model. For each case listed below, use the API on the right
    - Browser and Node: TensorFlow.js
    - Android and iOS: TensorFlow Lite
    - Cloud, on-prem: TensorFlow Serving
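A minimal end-to-end sketch of steps 1 to 3, assuming TensorFlow 2.x with its bundled tf.keras. The toy data, layer sizes, and names (features, targets, seq_model, func_model) are made up for illustration. It builds the same small network with both the Sequential and the Functional API, then trains the functional one for two epochs.

```python
import numpy as np
import tensorflow as tf

# 1. Data: a toy regression set wrapped in a tf.data.Dataset
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 4)).astype("float32")
targets = features.sum(axis=1, keepdims=True)
dataset = tf.data.Dataset.from_tensor_slices((features, targets)).shuffle(64).batch(16)

# 2a. Model with the Sequential API
seq_model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# 2b. The same architecture with the Functional API
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1)(hidden)
func_model = tf.keras.Model(inputs=inputs, outputs=outputs)

# 3. Train
func_model.compile(optimizer="adam", loss="mse")
history = func_model.fit(dataset, epochs=2, verbose=0)
print(history.history["loss"])
```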

<a id = 'Index'> </a>

## Index

<a href = '#Top'>Back to top?</a>

#### <a href = '#Tensors'>Section: Tensors</a>

* <b>tf.constant(val)</b> => creates a tensor with specified val
* <b>tensor.numpy()</b> => returns the NumPy form of the tensor
* <b>np.array(tensor)</b> => creates a NumPy array from the tensor
* <b>tf.add(a,b)</b> => return result of adding tensor a and b
* <b>tf.multiply(a,b)</b> => return result of Hadamard product of a and b
* <b>tf.matmul(a,b)</b> => return the matrix product of a and b - Note: ORDER MATTERS
* <b>tf.pow(a,b)</b> => return result of raising a to power of b
* <b>tf.reduce_max(t)</b> => Find max value in t (over all axes unless you pass axis)
* <b>tf.argmax(t)</b> => Find the index of the max value in t (along axis 0 unless you pass axis)
* <b>tf.nn.softmax(t)</b> => Apply softmax along the last axis of the tensor

#### <a href = '#Variables'>Section: Variables</a>

* <b>tf.Variable(val, dtype = None)</b> => create a variable with an initial value of val and data type of dtype if specified
* <b>v.assign(val)</b> => overwrites the variable's value with val. NOTE: shape and dtype stay the same, so val must be compatible with them.
* <b>v.assign_add(val)</b> => adds val to the current value in place
* <b>v.assign_sub(val)</b> => same as above, but subtracts
* <b>tf.Variable(..., name = None,...)</b> => Make a variable with the given name to help track it. Useful for debugging and TensorBoard
* <b>tf.Variable(..., trainable = True,...)</b> => Make a variable that is or isn't watched (by GradientTape).


#### <a href = '#Automatic Differentiation and Gradients'>Section: Automatic Differentiation and Gradients</a>

* <b>tape.gradient(target, sources)</b> => target and sources can each be a single tensor or a list of tensors. For each source, returns the sum of the gradients of all targets with respect to that source (non-scalar targets are summed over their elements). For example, with an MSE loss and a regularization term, pass both as targets (or add them into one tensor) and pass the weights as sources.
* <b>tape.gradient(..., unconnected_gradients=tf.UnconnectedGradients.ZERO, ...)</b> => returns zeros instead of None for unconnected sources. The value must be a member of the tf.UnconnectedGradients enum (NONE or ZERO)
* <b>layer.trainable_variables</b> => returns a list containing all trainable variables that layer has
* <b>tf.Variable(..., trainable = True, ...)</b> => the trainable parameter decides whether the variable is going to be tracked or not 
* <b>tape.watched_variables()</b> => returns a list of the variables currently being watched
* <b>tape.watch(v)</b> => Forces the tape to watch v as well
* <b>tf.GradientTape(..., watch_accessed_variables=False, ...)</b> => Disables default tape behavior 

#### <a href = '#Graphs and Functions'>Section: Graphs and Functions</a>

* <b>tf.function(func)</b> => makes a graph out of func
* <b>@tf.function</b> => Apply this decorator to your function 
* <b>func.pretty_printed_concrete_signatures()</b> => Returns a string listing all of the traces the function has (refer to <a href='https://www.tensorflow.org/guide/function'>function</a> for this one)
* <b>func.get_concrete_function(spec)</b> => returns the concrete function matching spec, tracing a new one if needed
* <b>tf.TensorSpec(shape=, dtype=)</b> => Describes a tensor's shape and dtype; what you pass into the above. A None in a dimension of shape accepts any size in that dimension
* <b>tf.function(..., input_signature=spec_tuple, ...)</b> => restricts tracing to the specifications in the tuple; non-matching inputs raise an error (ValueError or TypeError, depending on the TensorFlow version). For an LSTM, for example, using None for the sequence-length dimension stops a new graph from being created for every new input length
* <b>tf.py_function()</b> => Wraps a normal Python function so it runs eagerly inside the graph, allowing Python side effects. The downside is that it gets little speedup and isn't portable.
* <b>tf.config.run_functions_eagerly(bool)</b> => when True, tf.function-decorated functions run eagerly (handy for debugging)


<b>tf.debugging.enable_check_numerics()</b>  
<b>tf.print()</b>   
<b>tf.gather()</b>  
<b>tf.stack()</b>  
<b>tf.TensorArray()</b>   
<b>tf.data.Dataset</b>  

#### <a href = '#Modules, layers, and models'>Section: Modules, layers, and models</a>

* <b>module.submodules</b> => property containing all the modules inside the current module (all the submodules)
* <b>module.trainable_variables</b> => property containing all the variables that gradient tape will watch
* <b>module.variables</b> => property containing all the variables that exist in the module 
* <b>tf.random.normal(shape, name = None)</b> => creates a tensor with values drawn from a normal distribution, with the given shape and name
* <b>tf.zeros(shape, dtype, name = None)</b> => creates a tensor of zeros with the given shape and dtype
* <b>tf.random_normal_initializer()</b> => creates an object that can be used to initialize variables with random values
* <b>tf.zeros_initializer()</b> => creates an object that can be used to initialize variables with values of zero
* <b>initializer(shape)</b> => calls the initializer to create a tensor with the given shape
* <b>tf.Variable(initial_value, dtype = None)</b> => Create a variable with starting values. Worth specifying dtype so it isn't inferred from the initial value in a way you didn't intend

<b>tf.Module</b>  
<b>tf.Variable</b>  