# Getting Started Lessons

The objective of these lessons is to gently introduce some core concepts and basic usage of TensorFlow.  Several common misunderstandings hold independent learners back, such as the difference between the TensorFlow code and the TensorFlow graph.  These can take weeks or months to figure out and can be very frustrating to learn on one's own.

The following concepts will be explored:

- executing the *Code* vs the *Graph*
- reversing the *Graph*
- a definition of *Model*, *Learning Rate*, and *Loss*
- a better way to structure code
- how TensorFlow can be used to *Learn* variables in a *Graph*

For each lesson, open the corresponding file, read the code and the comments, then run and modify the code to get a feel for it.

## Lesson 0 - A review

- Use python to calculate a x b = c

In [1]:
# we will use just plain old Python for variables
# and get it to do something very straightforward

a = 5.0
b = 7.0

# -- induction --
# Multiply a by b
c = a * b


# let's check out a, b, and c
print("a =", a)
print("b =", b)
print("c =", c)

a = 5.0
b = 7.0
c = 35.0


## Lesson 1 - Just make something happen

We will build on the last example and do the same thing,
but using TensorFlow.  We will define a and b as TensorFlow constants.

Concepts :
- Make something super simple
- Address #1 mental hurdle right away
- 'a' isn't equal to 5, 'b' isn't equal to 7 !
- Code the *graph* then run the *graph*

In [2]:
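# -- imports --
import tensorflow as tf
from tensorflow.compat.v1 import Session

# these lessons build a graph and run it in a Session (TensorFlow 1.x style);
# under TensorFlow 2.x, eager execution has to be turned off first
tf.compat.v1.disable_eager_execution()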

# -- constants --
a = tf.constant(5.0)
b = tf.constant(7.0)

# -- induction --
# Multiply a by b
c = tf.multiply(a, b)

# start a session
sess = Session()

### Let's check out a, b, and c
Notice that when we print them out we do not get their values.
That is because we need to evaluate them first.

In [3]:
print("a =", a)
print("b =", b)
print("c =", c)

a = Tensor("Const:0", shape=(), dtype=float32)
b = Tensor("Const_1:0", shape=(), dtype=float32)
c = Tensor("Mul:0", shape=(), dtype=float32)


We can see that a, b, and c are not actually values.  We need to run them in a TensorFlow session to get their values.

In [4]:
# let's do the multiplication
print("The result of 5x7 is", sess.run(c))
# that evaluates c

The result of 5x7 is 35.0


In [5]:
# let's evaluate a and b
print("The value of a is", sess.run(a))
print("The value of b is", sess.run(b))

The value of a is 5.0
The value of b is 7.0


Why do we use TensorFlow values that need evaluation instead of plain Python variables?

The main use case of TensorFlow is to learn better values (like the values of a, b, or c) by using a process called gradient descent.
We will explore how to do this in Lesson 4.  For now, what is important to remember is that for TensorFlow to be able to do gradient
descent it needs to know how each value is calculated.  Above we defined

c = tf.multiply(a, b)

This doesn't actually multiply a and b; instead it defines how to calculate c.  We will see in Lesson 4 that TensorFlow can
use this information to learn a better value of a when a*b does not equal the c we want.
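To make the code vs graph distinction concrete, here is a minimal plain-Python sketch of the same idea. The `Constant` and `Multiply` classes are hypothetical stand-ins invented for this illustration; they are not how TensorFlow is implemented. The point is only that building `c` records *how* to compute it, and nothing is multiplied until we ask for an evaluation.

```python
# Hypothetical mini "graph" for illustration only (not TensorFlow internals).

class Constant:
    """A node that simply holds a value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Multiply:
    """A node that remembers its inputs and multiplies them when evaluated."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() * self.right.evaluate()


# "code" step: describe the calculation, nothing is multiplied yet
a = Constant(5.0)
b = Constant(7.0)
c = Multiply(a, b)

print("c =", c)  # prints an object description, not a number

# "run" step: now the multiplication actually happens
result = c.evaluate()
print("The result of 5x7 is", result)
```

Printing `c` shows an object, much like printing the tensor did above, and calling `evaluate()` plays the role that `sess.run(c)` plays in the lesson.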


## Lesson 2 - Same thing but a little better

Our code will be of limited use if all we can use in TensorFlow is constants.

Here we introduce placeholders, which let us pass in different values
through the feed_dict argument of sess.run.

Concepts :
- placeholders instead of constants
- reuse the same *graph*

In [6]:
from tensorflow.compat.v1 import placeholder

# -- variables --
a = placeholder(dtype=tf.float32)
b = placeholder(dtype=tf.float32)

# -- induction --
# Multiply a by b
c = tf.multiply(a, b)

# start a session
sess = Session()

print("The result of 5x7 is", sess.run(c, feed_dict={a: 5, b: 7}))
print("The result of 2x3 is", sess.run(c, feed_dict={a: 2, b: 3}))


The result of 5x7 is 35.0
The result of 2x3 is 6.0


## Lesson 3 - Same thing but a little better

- Variable vs Constant vs Placeholder
- Change variable

In [7]:
from tensorflow.compat.v1 import global_variables_initializer, assign

# -- variables -- (we'll refer to constants as variables from now on)
a = tf.Variable(5.0)
b = tf.constant(7.0)

# -- induction --
# Multiply a by b
c = tf.multiply(a, b)

# start a session
sess = Session()

# we have to initialize variables
sess.run(global_variables_initializer())

# let's do the multiplication
print("The result of ", sess.run(a), "x", sess.run(b), "is", sess.run(c))

print("Let's assign 10 to A and try this again")
sess.run(assign(a, 10))
print("The result of ", sess.run(a), "x", sess.run(b), "is", sess.run(c))


The result of  5.0 x 7.0 is 35.0
Let's assign 10 to A and try this again
The result of  10.0 x 7.0 is 70.0


## Lesson 4 - In Reverse

- Tensorflow is used to *learn*
- for *ab = c*, what if we know *b* and *c*?
- use Tensorflow to estimate *a*
- what happens if the learning rate is too high or too low?

In [8]:
from tensorflow.compat.v1.train import GradientDescentOptimizer
learning_rate=0.001
iterations=100
target=30

# -- variables --
a = tf.Variable(5.0)
b = tf.constant(7.0)

# -- induction --
# Multiply a by b
c = tf.multiply(a, b)

### Loss
In machine learning, we often call the error function the 'loss'.
We want c to equal the target so we calculate a loss accordingly.
In this case we will use the square of the difference as the error.

In [9]:
# -- loss --
loss = tf.square(c - target)

We want TensorFlow to learn from the loss (the error).  To do that,
we create an optimizer and ask it to minimize the loss:

In [10]:
optimizer = GradientDescentOptimizer(learning_rate)
learn = optimizer.minimize(loss)

In [11]:
# start a session
sess = Session()

# initialize the variables
sess.run(global_variables_initializer())

In [12]:
# let's do the multiplication
print("The result of ", sess.run(a), "x", sess.run(b), "is", sess.run(c), ", but we want it to =", target)

print()
print("We will use tensorflow to 'learn' the variable a.")
print()

print('-'*40)
print('Iteration  | Result a*b=c')
print('-'*40)
for iteration in range(1, iterations + 1):

    # learn a
    sess.run(learn)

    # print out what is going on
    if iteration < 10 or (iteration % 10 == 0 and iteration < 100) or (iteration % 100 == 0 and iteration < 1000) or iteration % 1000 == 0:
        print(f'{iteration:9}  | {sess.run(a):.2f} * {sess.run(b):.2f} = {sess.run(c):.2f}')

The result of  5.0 x 7.0 is 35.0 , but we want it to = 30

We will use tensorflow to 'learn' the variable a.

----------------------------------------
Iteration  | Result a*b=c
----------------------------------------
        1  | 4.93 * 7.00 = 34.51
        2  | 4.87 * 7.00 = 34.07
        3  | 4.81 * 7.00 = 33.67
        4  | 4.76 * 7.00 = 33.31
        5  | 4.71 * 7.00 = 32.99
        6  | 4.67 * 7.00 = 32.69
        7  | 4.63 * 7.00 = 32.43
        8  | 4.60 * 7.00 = 32.19
        9  | 4.57 * 7.00 = 31.98
       10  | 4.54 * 7.00 = 31.78
       20  | 4.38 * 7.00 = 30.64
       30  | 4.32 * 7.00 = 30.23
       40  | 4.30 * 7.00 = 30.08
       50  | 4.29 * 7.00 = 30.03
       60  | 4.29 * 7.00 = 30.01
       70  | 4.29 * 7.00 = 30.00
       80  | 4.29 * 7.00 = 30.00
       90  | 4.29 * 7.00 = 30.00
      100  | 4.29 * 7.00 = 30.00


And there we have it: we used TensorFlow to discover
that a has to equal about 4.29 if b = 7 and c = 30.

If you know algebra you may be wondering why we didn't just solve for a (a = 30 / 7 ≈ 4.29).

Gradient descent is a general technique that can also be used to solve larger problems that do
not have an algebraic solution, like recognizing a cat in a photo.

Starting with something small that we can verify another way helps us understand it.
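The lesson asks what happens if the learning rate is too high or too low. The sketch below explores that question in plain Python rather than TensorFlow, so it is an approximation of what the optimizer does, not its actual implementation. It assumes the same loss, `(a * b - target) ** 2`, with the derivative with respect to `a` worked out by hand; the helper name `learn_a` is made up for this example.

```python
def learn_a(learning_rate, iterations=100, a=5.0, b=7.0, target=30.0):
    """Plain-Python gradient descent on loss = (a * b - target) ** 2."""
    for _ in range(iterations):
        # derivative of the loss with respect to a, derived by hand
        gradient = 2.0 * (a * b - target) * b
        # move a a small step against the gradient
        a -= learning_rate * gradient
    return a


results = {}
for rate in (0.00001, 0.001, 0.03):
    learned_a = learn_a(rate)
    results[rate] = learned_a * 7.0
    print(f"learning rate {rate:<8} -> a * b = {results[rate]:.4g}")
```

With the lesson's rate of 0.001 the product settles at the target. With the much smaller rate it has barely moved after 100 iterations, and with the larger rate each step overshoots by more than the last, so the result runs away from the target instead of approaching it.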


## Lesson 5 - In Reverse with Gradient

- Same as before but let's look at the gradient

In [13]:
# we want to see what the optimizer is doing in this example,
# so we will get the computed gradients (error correction) from the optimizer and display them.
# usually we don't do this, but it will help us see what is going on
gradient = optimizer.compute_gradients(loss, var_list=[a])

# reinitialize the variables
sess.run(global_variables_initializer())

# let's do the multiplication
print("The result of ", sess.run(a), "x", sess.run(b), "is", sess.run(c), ", but we want it to =", target)

print()
print("We will use tensorflow to 'learn' the variable a.")
print()

print('-'*60)
print('Iteration  | gradient * learning rate | Result a * b = c       ')
print('           |     = correction to a    |')
print('-'*60)
print(f'{0:9}  |                          | {sess.run(a):.2f} * {sess.run(b):.2f} = {sess.run(c):.2f}')
for iteration in range(1, iterations + 1):

    # learn a
    sess.run(learn)

    # print out what is going on
    if iteration < 10 or (iteration % 10 == 0 and iteration < 100) or (iteration % 100 == 0 and iteration < 1000) or iteration % 1000 == 0:
        print(f'{iteration:9}  | {sess.run(gradient)[0][0]:5.2f} * {learning_rate:.4f} = {sess.run(gradient)[0][0]*learning_rate:.4f}  | {sess.run(a):.2f} * {sess.run(b):.2f} = {sess.run(c):.2f}')


The result of  5.0 x 7.0 is 35.0 , but we want it to = 30

We will use tensorflow to 'learn' the variable a.

------------------------------------------------------------
Iteration  | gradient * learning rate | Result a * b = c       
           |     = correction to a    |
------------------------------------------------------------
        0  |                          | 5.00 * 7.00 = 35.00
        1  | 63.14 * 0.0010 = 0.0631  | 4.93 * 7.00 = 34.51
        2  | 56.95 * 0.0010 = 0.0570  | 4.87 * 7.00 = 34.07
        3  | 51.37 * 0.0010 = 0.0514  | 4.81 * 7.00 = 33.67
        4  | 46.34 * 0.0010 = 0.0463  | 4.76 * 7.00 = 33.31
        5  | 41.80 * 0.0010 = 0.0418  | 4.71 * 7.00 = 32.99
        6  | 37.70 * 0.0010 = 0.0377  | 4.67 * 7.00 = 32.69
        7  | 34.00 * 0.0010 = 0.0340  | 4.63 * 7.00 = 32.43
        8  | 30.67 * 0.0010 = 0.0307  | 4.60 * 7.00 = 32.19
        9  | 27.67 * 0.0010 = 0.0277  | 4.57 * 7.00 = 31.98
       10  | 24.96 * 0.0010 = 0.0250  | 4.54 * 7.00 = 31.78
    

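As you can see, the value of 'a' decreases by gradient * learning_rate at each iteration.  Each row shows the gradient evaluated after that iteration's update, so it is the correction that will be applied in the next iteration (for example, 4.93 - 0.0631 ≈ 4.87).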
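The first rows of the table can be reproduced by hand. The sketch below is plain Python, not TensorFlow, and assumes the gradient of `(a * b - target) ** 2` with respect to `a` is `2 * (a * b - target) * b`, which follows from the chain rule. As in the lesson's loop, the gradient is reported after the update has been applied.

```python
learning_rate = 0.001
target = 30.0
a, b = 5.0, 7.0

rows = []
for iteration in range(1, 4):
    # gradient of the loss at the current value of a
    gradient = 2.0 * (a * b - target) * b
    # the update step: move a against the gradient
    a -= learning_rate * gradient
    # gradient at the updated a, which is what the table above displays
    next_gradient = 2.0 * (a * b - target) * b
    rows.append((iteration, next_gradient, a, a * b))
    print(f"{iteration:9}  | {next_gradient:5.2f} * {learning_rate:.4f} = "
          f"{next_gradient * learning_rate:.4f}  | {a:.2f} * {b:.2f} = {a * b:.2f}")
```

The printed values should line up with iterations 1 to 3 in the table, which suggests the optimizer is doing nothing more mysterious here than this subtraction.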

The code:

```
optimizer = GradientDescentOptimizer(learning_rate)
learn = optimizer.minimize(loss)
```

tells TensorFlow to use the error function (loss) to derive the gradient and
to create an operation (learn) that reduces the error a little each time it is run.
We could use the value of the gradient to do this ourselves, but we
will see that for more complicated graphs the above 2 lines are more concise,
and in fact we can reduce them to one line:

```
learn = GradientDescentOptimizer(learning_rate).minimize(loss)
```

## Lesson 6 - Estimation

We will try to fit a line to the following data:
```
data_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
data_y = [0.1, 0.2, 0.4, 0.4, 0.6, 0.5, 0.7, 0.7, 0.9]
```
Note that no single line passes through every point, so we will find the line that has the least error.

Concepts:
- Estimate an imperfect line *y = ax + b* given *y* and *x*.
- *y = ax + b* is called our *model*

In [14]:
# -- variables --
# y = a * x + b
y = placeholder(dtype=tf.float32)
a = tf.Variable(0.0)
x = placeholder(dtype=tf.float32)
b = tf.Variable(0.0)

# -- induction --
# f(x) = a * x + b
fx = tf.add(tf.multiply(a, x), b)

# -- loss --
# but we want f(x) to equal y
# so we calculate a loss accordingly
loss = tf.square(fx - y)
learn = GradientDescentOptimizer(.001).minimize(loss)

In [15]:
# start a session
sess = Session()

# initialize the variables
sess.run(global_variables_initializer())

# define the training data
data_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
data_y = [0.1, 0.2, 0.4, 0.4, 0.6, 0.5, 0.7, 0.7, 0.9]

In [16]:
# let's calculate the total loss
def printTotalLoss():
    total_loss = 0
    for index in range(len(data_x)):
        total_loss += sess.run(loss, feed_dict={x: data_x[index], y: data_y[index]})
    print("the total loss is", total_loss)

print("\nBefore training",end=" ")
printTotalLoss()


Before training the total loss is 2.769999942742288


In [17]:
def getBetter():
    for index in range(len(data_x)):
        sess.run(learn, feed_dict={x: data_x[index], y: data_y[index]})


print("\nCalling get better")
for iteration in range(1, 1001):
    getBetter()
    if iteration == 1 or iteration == 10 or iteration == 100 or iteration == 1000:
        print("iteration", iteration, end=" ")
        printTotalLoss()


Calling get better
iteration 1 the total loss is 0.8336960133165121
iteration 10 the total loss is 0.036129510181126534
iteration 100 the total loss is 0.035100822218666394
iteration 1000 the total loss is 0.03401105559896678


In [18]:
print("\nWe calculated the equation is y =", sess.run(a), "* x +", sess.run(b))


We calculated the equation is y = 0.090352 * x + 0.047606114


We can look at the data for x and y and see that it is close!

We were able to use Tensorflow to fit a line to some data.
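Because this problem is small, the result can be verified another way. The sketch below uses the standard closed-form least squares formulas for a straight line in plain Python. This is a different technique from the gradient descent used in the lesson and is included only as a cross-check.

```python
data_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
data_y = [0.1, 0.2, 0.4, 0.4, 0.6, 0.5, 0.7, 0.7, 0.9]

n = len(data_x)
mean_x = sum(data_x) / n
mean_y = sum(data_y) / n

# slope = covariance(x, y) / variance(x)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y))
variance = sum((x - mean_x) ** 2 for x in data_x)
slope = covariance / variance

# the least squares line always passes through (mean_x, mean_y)
intercept = mean_y - slope * mean_x

print(f"least squares line: y = {slope:.4f} * x + {intercept:.4f}")
```

The closed-form answer is a slope of about 0.09 and an intercept of about 0.05, so the values learned by gradient descent are close to the best possible line.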

## Lesson 7 - Same thing but a lot better

The difference between this lesson and the previous one
is that we will be using a matrix for 'x' and 'y'.

Previously 'x' and 'y' were each a single value.  For training
we fed in 1 value for each at a time.  TensorFlow allows
for, and excels at, working with matrices.

In this example 'x' and 'y' will each be a 1D matrix (a vector) of variable length.

Instead of feeding each pair of values of x and y to TensorFlow one at a time,
we can feed all the values at once.

Concept:
- use the matrix

In [19]:
# -- variables --
# y = ax + b
y = placeholder(dtype=tf.float32, shape=[None])  
a = tf.Variable(0.0)
x = placeholder(dtype=tf.float32, shape=[None])
b = tf.Variable(0.0)

Notice that we define y and x to have a shape of `[None]`.

This means that y and x are each a matrix of 1 dimension with unknown length.

In [20]:
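# this is the same as the previous lesson, except that the loss is averaged
# over all the points with tf.reduce_mean and the learning rate is .01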

# -- induction --
# f(x) = ax + b
fx = tf.add(tf.multiply(a, x), b)

# -- loss --
# but we want f(x) to equal y
# so we calculate a loss accordingly
loss = tf.reduce_mean(tf.square(fx - y))
learn = GradientDescentOptimizer(.01).minimize(loss)

# start a session
sess = Session()

# initialize the variables
sess.run(global_variables_initializer())

In [21]:
# But here for training we can send ALL of data_x and data_y in one call

for iteration in range(1, 1001):
    sess.run(learn, feed_dict={x: data_x, y: data_y})
    if iteration == 1 or iteration == 10 or iteration == 100 or iteration % 1000 == 0:
        print("iteration", iteration,", total loss =", sess.run(loss, feed_dict={x: data_x, y: data_y}))

print("The equation is approximately f(x) =", sess.run(a), "* x +", sess.run(b))

print()

iteration 1 , total loss = 0.04140444
iteration 10 , total loss = 0.0040128585
iteration 100 , total loss = 0.00388984
iteration 1000 , total loss = 0.0037778467
The equation is approximately f(x) = 0.09009025 * x + 0.049432103



### In case you missed it
This went A LOT faster than feeding 1 pair of values at a time.  Why is that important?
Well, for this toy problem it really doesn't make a difference.  For larger problems
you'll want your training to *only* take hours instead of weeks.  Seriously, it's that much faster.
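To see what a single batched update computes, here is a plain-Python sketch that averages the gradient over all the points before changing `a` and `b`. It assumes the same mean squared error loss and learning rate of .01 used above, with the derivatives worked out by hand. It illustrates the arithmetic only and says nothing about speed, since the speed-up comes from TensorFlow processing the whole array at once.

```python
data_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
data_y = [0.1, 0.2, 0.4, 0.4, 0.6, 0.5, 0.7, 0.7, 0.9]

a, b = 0.0, 0.0
learning_rate = 0.01
n = len(data_x)

for iteration in range(1, 1001):
    # error of the current line at every point
    errors = [a * x + b - y for x, y in zip(data_x, data_y)]
    # gradients of mean(error ** 2) with respect to a and b
    grad_a = sum(2.0 * e * x for e, x in zip(errors, data_x)) / n
    grad_b = sum(2.0 * e for e in errors) / n
    # one update uses information from all the points
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

mean_loss = sum((a * x + b - y) ** 2 for x, y in zip(data_x, data_y)) / n
print(f"f(x) = {a:.4f} * x + {b:.4f}, mean loss = {mean_loss:.6f}")
```

Under these assumptions the loop should land very close to the equation TensorFlow reported above.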

## Lesson 8 - Same thing with a curve

Okay, let's try something a little more complicated than a line.  Let's fit the equation `y = ax^2 + bx + c` to some new data:

```
data_x = [1., 2., 3., 4., 5., 6., 7., 8., 9., 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
data_y = [1., 2., 4., 4., 6., 5., 7., 7., 9., 10.0, 9.50, 8.00, 7.00, 6.00, 4.00, 5.00, 3.00, 2.00, 1.00]
```

Notice that as data_x keeps increasing, data_y increases and then decreases.

In [22]:

# -- variables --
y = placeholder(dtype=tf.float32, shape=[None])
a = tf.Variable(0.0)
x = placeholder(dtype=tf.float32, shape=[None])
b = tf.Variable(0.0)
c = tf.Variable(0.0)


# -- induction --
fx = a * tf.square(x) + b * x + c

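# -- loss --
# we use the root mean squared error here;
# the plain mean squared error is left commented out for comparison
# loss = tf.reduce_mean(tf.square(fx - y))
loss = tf.sqrt(tf.reduce_mean(tf.square(fx - y)))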
learn = GradientDescentOptimizer(.0001).minimize(loss)

# start a session
sess = Session()

# initialize the variables
sess.run(global_variables_initializer())

data_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
data_y = [1.0, 2.0, 4.0, 4.0, 6.0, 5.0, 7.0, 7.0, 9.0, 10.0, 9.50, 8.00, 7.00, 6.00, 4.00, 5.00, 3.00, 2.00, 1.00]

print()
for iteration in range(1, 5001):
    sess.run(learn, feed_dict={x: data_x, y: data_y})
    if iteration == 1 or iteration == 10 or iteration == 100 or iteration % 1000 == 0:
        print(f"iteration {iteration:5}, the average loss = {sess.run(loss, feed_dict={x: data_x, y: data_y}):.2f}")

print()
print(f"The approximate equation is f(x) = {sess.run(a):.3f} * x^2 + {sess.run(b):.3f} * x + {sess.run(c):.3f}")
print()



iteration     1, the average loss = 5.07
iteration    10, the average loss = 4.74
iteration   100, the average loss = 4.67
iteration  1000, the average loss = 3.95
iteration  2000, the average loss = 3.18
iteration  3000, the average loss = 2.45
iteration  4000, the average loss = 1.81
iteration  5000, the average loss = 1.49

The approximate equation is f(x) = -0.058 * x^2 + 1.275 * x + 0.235
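A quick sanity check is to plug the printed equation back into the data and recompute the error in plain Python. The coefficients below are the rounded values from the output above, so the result is only expected to be near the reported loss, not identical to it.

```python
import math

data_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0,
          11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
data_y = [1.0, 2.0, 4.0, 4.0, 6.0, 5.0, 7.0, 7.0, 9.0, 10.0,
          9.5, 8.0, 7.0, 6.0, 4.0, 5.0, 3.0, 2.0, 1.0]

# rounded coefficients copied from the printed equation
a, b, c = -0.058, 1.275, 0.235


def fx(x):
    """The fitted curve f(x) = a * x^2 + b * x + c."""
    return a * x ** 2 + b * x + c


# root mean squared error, the same kind of loss the lesson minimizes
squared_errors = [(fx(x) - y) ** 2 for x, y in zip(data_x, data_y)]
rmse = math.sqrt(sum(squared_errors) / len(data_x))

print(f"root mean squared error of the printed equation: {rmse:.2f}")
```

The loss in the training output was still dropping at iteration 5000, so running more iterations would likely improve the fit further.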

