# Inference in Code
- TensorFlow is one of the leading frameworks for implementing deep learning algorithms.
- One of the remarkable things about neural networks is that the same algorithm can be applied to so many different applications.
- How can we do inference in a neural network to get it to tell us whether a given temperature and duration setting will result in good coffee?
- ![image.png](attachment:image.png)
- ![image-2.png](attachment:image-2.png)
    - Here we create Layer 1, the first hidden layer of the neural network, as Dense(units=3, activation='sigmoid'): a layer with 3 units (hidden units) that uses the sigmoid function as its activation function. Dense is another name for the type of layer we have learned about so far.
    - Next, we compute a1 by taking Layer 1, which is actually a function, and applying it to the values of x. a1 is a list of 3 numbers because Layer 1 has 3 units.
    - Next, the second layer, Layer 2 (the output layer), is also dense. This time it has one unit and again uses the sigmoid activation function. We compute a2 by applying this Layer 2 function to the activation values from Layer 1, that is, to a1.
- These are the key steps of forward propagation: computing a1 from x, and then a2 from a1.
- Handwritten digit classification problem
    ![image-3.png](attachment:image-3.png)
    - x is a NumPy array containing the list of pixel intensity values.
    - To initialize and carry out the first step of forward propagation, Layer 1 is a dense layer with 25 units and the sigmoid activation function. We then compute a1 by applying the Layer 1 function to x.
    - Next, a2 is computed by applying Layer 2 to a1.
    - Finally, Layer 3 is the third and final dense layer; applying it to a2 gives a3, the output of the network.
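Both examples follow the same pattern: each dense layer computes a_out = g(a_in W + b), with one column of W and one entry of b per unit. Below is a minimal NumPy sketch of that pattern, not TensorFlow's actual implementation. The weights are random placeholders rather than trained values, the 64-pixel input size is an assumption (the image size is not given in the text), and so is the 15-unit second layer of the digit network; only the shapes of the activations are meaningful.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b):
    # One dense layer with a sigmoid activation.
    # a_in: (m, n_in), W: (n_in, n_units), b: (n_units,) -> output: (m, n_units)
    return sigmoid(a_in @ W + b)

def random_layer(rng, n_in, n_units):
    # Placeholder (untrained) parameters; only the shapes matter here
    return rng.normal(scale=0.1, size=(n_in, n_units)), np.zeros(n_units)

rng = np.random.default_rng(0)

# Coffee example: 2 inputs -> 3 units -> 1 unit
x = np.array([[200.0, 17.0]])          # temperature (Celsius), duration (minutes)
W1, b1 = random_layer(rng, 2, 3)
W2, b2 = random_layer(rng, 3, 1)
a1 = dense(x, W1, b1)                  # shape (1, 3)
a2 = dense(a1, W2, b2)                 # shape (1, 1): a probability
yhat = int(a2[0, 0] >= 0.5)            # threshold into a 0/1 decision

# Digit example: assumed 64 pixels -> 25 units -> 15 units (assumed) -> 1 unit
x_digit = rng.uniform(0.0, 1.0, size=(1, 64))   # hypothetical pixel intensities
d1 = dense(x_digit, *random_layer(rng, 64, 25))
d2 = dense(d1, *random_layer(rng, 25, 15))
d3 = dense(d2, *random_layer(rng, 15, 1))

print(a1.shape, a2.shape, yhat)
print(d1.shape, d2.shape, d3.shape)
```

In TensorFlow, the layer objects hold W and b internally, which is why the notes describe Layer 1 as "actually a function" that is simply applied to x.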

# Data in TensorFlow
- ![image.png](attachment:image.png)
- ![image-2.png](attachment:image-2.png)
- ![image-3.png](attachment:image-3.png)
- A tensor is a data type that the TensorFlow team created in order to store and carry out computations on matrices efficiently. So whenever you see "tensor", just think of it as a matrix. In fact, a1 is a tensor.
    ![image-4.png](attachment:image-4.png)
    ![image-5.png](attachment:image-5.png)
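The convention in these notes is to pass data to TensorFlow as 2D arrays (matrices), with one row per example. The NumPy sketch below contrasts a 1 x 2 row matrix, a 2 x 1 column matrix, and a 1D vector built from the coffee input [200, 17]. A TensorFlow tensor such as a1 can be converted back to a NumPy array with its .numpy() method.

```python
import numpy as np

x_row = np.array([[200.0, 17.0]])     # 1 x 2 matrix: one example, two features
x_col = np.array([[200.0], [17.0]])   # 2 x 1 matrix: a column vector
x_1d = np.array([200.0, 17.0])        # 1D array: no row/column orientation

print(x_row.shape, x_col.shape, x_1d.shape)   # (1, 2) (2, 1) (2,)

# Turning the 1D array into a single-example row matrix
x_as_row = x_1d.reshape(1, -1)
print(x_as_row.shape)                          # (1, 2)
```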

# Building a neural network
- ![image.png](attachment:image.png)
- Instead of manually taking the data and passing it to layer 1, then taking the activations from layer 1 and passing them to layer 2, we can tell TensorFlow to take layer 1 and layer 2 and string them together to form a neural network. That is what the Sequential function in TensorFlow does.
    ![image-2.png](attachment:image-2.png)
- So model.predict carries out forward propagation and performs inference for us, using the neural network that we built with the Sequential function.
- By convention, the code is usually written more compactly, like this:
    ![image-3.png](attachment:image-3.png)
- Digit classification model example:
    ![image-4.png](attachment:image-4.png)
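The idea behind Sequential, chaining layers so that each layer's output becomes the next layer's input, can be sketched in plain NumPy. DenseLayer and SequentialModel below are hypothetical stand-ins written for illustration; they are not the Keras classes, and the random weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenseLayer:
    """Hypothetical stand-in for a dense layer: a_out = sigmoid(a_in @ W + b)."""
    def __init__(self, W, b):
        self.W = W  # shape (n_in, n_units)
        self.b = b  # shape (n_units,)

    def __call__(self, a_in):
        return sigmoid(a_in @ self.W + self.b)

class SequentialModel:
    """Hypothetical stand-in for Sequential: applies its layers in order."""
    def __init__(self, layers):
        self.layers = layers

    def predict(self, x):
        a = x
        for layer in self.layers:
            a = layer(a)   # the output of one layer feeds the next
        return a

rng = np.random.default_rng(0)
layer1 = DenseLayer(rng.normal(size=(2, 3)), np.zeros(3))
layer2 = DenseLayer(rng.normal(size=(3, 1)), np.zeros(1))
model = SequentialModel([layer1, layer2])

x = np.array([[0.5, -1.0], [1.2, 0.3]])   # two hypothetical (normalized) examples
print(model.predict(x).shape)              # (2, 1)
```

Calling model.predict(x) gives the same result as writing layer2(layer1(x)) by hand; the container just removes the bookkeeping.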

# Lab: Coffee Roasting in TensorFlow
- Building a small neural network using TensorFlow
    ![image.png](attachment:image.png)

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
# X, Y = load_coffee_data()  # lab helper not defined in this notebook; the data is entered directly in the next cell instead

In [8]:
X = np.array([[185.32,  12.69],
       [259.92,  11.87],
       [231.01,  14.41],
       [175.37,  11.72],
       [187.12,  14.13],
       [225.91,  12.1 ],
       [208.41,  14.18],
       [207.08,  14.03],
       [280.6 ,  14.23],
       [202.87,  12.25],
       [196.7 ,  13.54],
       [270.31,  14.6 ],
       [192.95,  15.2 ],
       [213.57,  14.28],
       [164.47,  11.92],
       [177.26,  15.04],
       [241.77,  14.9 ],
       [237.  ,  13.13],
       [219.74,  13.87],
       [266.39,  13.25],
       [270.45,  13.95],
       [261.96,  13.49],
       [243.49,  12.86],
       [220.58,  12.36],
       [163.59,  11.65],
       [244.76,  13.33],
       [271.19,  14.84],
       [201.99,  15.39],
       [229.93,  14.56],
       [204.97,  12.28],
       [173.19,  12.22],
       [231.51,  11.95],
       [152.69,  14.83],
       [163.42,  13.3 ],
       [215.95,  13.98],
       [218.04,  15.25],
       [251.3 ,  13.8 ],
       [233.33,  13.53],
       [280.24,  12.41],
       [243.02,  13.72],
       [155.67,  12.68],
       [275.17,  14.64],
       [151.73,  12.69],
       [151.32,  14.81],
       [164.9 ,  11.73],
       [282.55,  13.28],
       [192.98,  11.7 ],
       [202.6 ,  12.96],
       [220.67,  11.53],
       [169.97,  12.34],
       [209.47,  12.71],
       [232.8 ,  12.64],
       [272.8 ,  15.35],
       [158.02,  12.34],
       [226.01,  14.58],
       [158.64,  12.24],
       [211.66,  14.17],
       [271.95,  14.97],
       [257.16,  11.71],
       [281.85,  13.96],
       [161.63,  12.52],
       [233.8 ,  13.04],
       [210.29,  14.72],
       [261.24,  13.69],
       [256.98,  13.12],
       [281.56,  13.92],
       [280.64,  11.68],
       [269.16,  13.74],
       [246.34,  12.27],
       [224.07,  12.66],
       [164.24,  11.51],
       [272.42,  14.18],
       [177.68,  12.53],
       [212.86,  14.77],
       [165.88,  15.37],
       [277.43,  12.48],
       [236.51,  12.94],
       [244.14,  11.85],
       [213.45,  13.85],
       [234.57,  14.27],
       [270.34,  12.47],
       [170.68,  13.06],
       [226.79,  15.34],
       [245.92,  14.45],
       [281.32,  12.57],
       [185.03,  13.19],
       [189.88,  14.1 ],
       [278.48,  12.11],
       [219.92,  14.21],
       [216.58,  15.15],
       [249.48,  15.03],
       [165.09,  12.28],
       [158.87,  14.82],
       [279.98,  11.56],
       [256.55,  14.41],
       [272.61,  12.58],
       [246.49,  12.45],
       [160.26,  14.48],
       [155.7 ,  14.3 ],
       [188.27,  13.45],
       [270.36,  12.47],
       [213.22,  12.92],
       [175.7 ,  13.39],
       [174.52,  14.7 ],
       [233.  ,  12.63],
       [281.37,  12.88],
       [240.62,  14.43],
       [185.81,  11.55],
       [270.5 ,  15.33],
       [172.98,  12.11],
       [208.41,  13.89],
       [283.51,  15.35],
       [283.36,  12.48],
       [230.85,  13.24],
       [181.24,  11.76],
       [172.78,  12.93],
       [161.88,  12.1 ],
       [156.03,  13.99],
       [216.52,  12.47],
       [221.06,  13.2 ],
       [238.99,  15.23],
       [197.69,  14.08],
       [179.55,  15.26],
       [233.39,  12.13],
       [184.7 ,  12.14],
       [174.18,  12.73],
       [261.11,  13.33],
       [187.42,  13.18],
       [186.1 ,  14.43],
       [157.94,  12.66],
       [193.64,  12.23],
       [249.65,  12.22],
       [190.56,  11.73],
       [252.  ,  12.96],
       [238.55,  12.37],
       [152.94,  12.79],
       [255.17,  14.85],
       [197.09,  14.89],
       [156.8 ,  13.59],
       [184.75,  13.26],
       [179.92,  15.07],
       [190.79,  15.28],
       [164.73,  13.22],
       [209.87,  14.34],
       [196.58,  13.47],
       [159.51,  12.74],
       [247.87,  11.92],
       [212.44,  12.45],
       [172.34,  11.99],
       [259.87,  14.25],
       [201.23,  13.07],
       [248.34,  13.92],
       [273.66,  15.18],
       [215.09,  14.14],
       [223.53,  12.74],
       [211.22,  14.38],
       [224.61,  14.03],
       [215.75,  15.31],
       [254.82,  12.02],
       [259.9 ,  15.17],
       [260.25,  12.87],
       [199.67,  12.47],
       [157.52,  13.39],
       [264.81,  14.58],
       [239.4 ,  14.89],
       [238.98,  12.39],
       [258.43,  12.97],
       [270.16,  12.81],
       [162.41,  14.42],
       [164.53,  14.98],
       [205.61,  14.62],
       [157.1 ,  13.68],
       [241.38,  12.02],
       [232.13,  12.07],
       [191.04,  12.96],
       [233.64,  12.02],
       [174.95,  14.63],
       [246.64,  13.32],
       [188.07,  14.27],
       [213.16,  12.75],
       [268.08,  12.31],
       [258.58,  13.97],
       [237.21,  14.23],
       [251.02,  15.02],
       [274.28,  12.52],
       [172.12,  15.09],
       [177.52,  12.39],
       [258.71,  15.36],
       [264.01,  13.57],
       [200.71,  15.45],
       [249.37,  14.02],
       [151.5 ,  12.28],
       [151.82,  15.13],
       [181.92,  12.18],
       [228.65,  12.31],
       [223.78,  15.3 ],
       [266.63,  12.48],
       [273.68,  13.1 ],
       [220.61,  12.8 ],
       [284.99,  12.73]])
Y = np.array([[1.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [1.],
       [1.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [1.],
       [1.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.]])

In [10]:
print(X.shape, Y.shape)

(200, 2) (200, 1)


- Let's plot the coffee roasting data. The two features are temperature (in Celsius) and duration (in minutes).
- ![image.png](attachment:image.png)

#### Normalize Data
- Fitting the weights to the data (backpropagation) will proceed more quickly if the data is normalized. This is the same feature-scaling procedure seen earlier, where each feature in the data is normalized to have a similar range. The procedure below uses a Keras normalization layer.
1. Create a 'Normalization Layer'. Note that, as applied here, this is not a layer in our model.
2. Adapt the data. This learns the mean and variance of the data set and saves the values internally.
3. Normalize the data.
- It is important to apply normalization to any future data that utilizes the learned model.

In [14]:
print(f"Temperature Max, Min pre normalization: {np.max(X[:,0]):0.2f}, {np.min(X[:,0]):0.2f}")
print(f"Duration    Max, Min pre normalization: {np.max(X[:,1]):0.2f}, {np.min(X[:,1]):0.2f}")

norm_l = tf.keras.layers.Normalization(axis=-1)
norm_l.adapt(X) # Learns mean, variance
Xn = norm_l(X)

print(f"Temperature Max, Min post normalization: {np.max(Xn[:,0]):0.2f}, {np.min(Xn[:,0]):0.2f}")
print(f"Duration    Max, Min post normalization: {np.max(Xn[:,1]):0.2f}, {np.min(Xn[:,1]):0.2f}")

Temperature Max, Min pre normalization: 284.99, 151.32
Duration    Max, Min pre normalization: 15.45, 11.51
Temperature Max, Min post normalization: 1.66, -1.69
Duration    Max, Min post normalization: 1.79, -1.70
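What adapt and the call do can be sketched by hand: learn a per-feature mean and standard deviation once, then map every feature to (x - mean) / std, reusing the same statistics for any future data. SimpleNormalizer is a hypothetical class for illustration. Keras's Normalization layer also guards against division by zero, and the use of the population standard deviation (ddof=0) here is an assumption about its exact convention. The four rows are taken from the coffee data above.

```python
import numpy as np

class SimpleNormalizer:
    """Hypothetical sketch of the adapt-then-normalize pattern (z-score per feature)."""
    def adapt(self, X):
        # Learn per-feature statistics from the training data only
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0)   # population standard deviation (ddof=0)
        return self

    def __call__(self, X):
        # Reuse the stored statistics for training data and any future data
        return (X - self.mean) / self.std

# A few rows from the coffee data set above: (temperature, duration)
X_small = np.array([[185.32, 12.69],
                    [259.92, 11.87],
                    [231.01, 14.41],
                    [175.37, 11.72]])

norm = SimpleNormalizer().adapt(X_small)
Xn_small = norm(X_small)
print(Xn_small.mean(axis=0), Xn_small.std(axis=0))   # approximately [0 0] and [1 1]

# New data must go through the same learned statistics
X_new = np.array([[200.0, 13.9]])
print(norm(X_new))
```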


Tile (copy) our data to increase the training set size and reduce the number of training epochs.

In [20]:
Xt = np.tile(Xn, (1000, 1))
Yt = np.tile(Y, (1000, 1))
print(Xt.shape, Yt.shape)

(200000, 2) (200000, 1)
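np.tile(A, (1000, 1)) stacks 1000 copies of the whole array vertically, so the example order is preserved and the labels stay aligned with their features. A small sketch, with np.repeat shown for contrast since it repeats each row in place instead:

```python
import numpy as np

X_small = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
Y_small = np.array([[1.0],
                    [0.0]])

X_tiled = np.tile(X_small, (3, 1))   # 3 copies along rows, 1 along columns
Y_tiled = np.tile(Y_small, (3, 1))
print(X_tiled.shape, Y_tiled.shape)  # (6, 2) (6, 1)
print(Y_tiled.ravel())               # [1. 0. 1. 0. 1. 0.]

# np.repeat repeats each row consecutively instead of the whole block
print(np.repeat(Y_small, 3, axis=0).ravel())   # [1. 1. 1. 0. 0. 0.]
```

One epoch over the tiled arrays therefore sees every original example 1000 times.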


### Tensorflow Model
- Model:
    ![image.png](attachment:image.png)
- Let's build the 'Coffee Roasting Network'. There are 2 layers with sigmoid activations.

In [22]:
tf.random.set_seed(1234)  # Applied to achieve consistent results

model = Sequential(
    [
        tf.keras.Input(shape=(2,)),
        Dense(3, activation='sigmoid', name='layer1'),
        Dense(1, activation='sigmoid', name='layer2')
    ]
)

- Note 1: The tf.keras.Input(shape=(2,)) statement specifies the expected shape of the input. This allows TensorFlow to size the weight and bias parameters at this point, which is useful when exploring TensorFlow models. In practice, this statement can be omitted, and TensorFlow will size the network parameters when the input data is specified in the model.fit statement.
- Note 2: Including the sigmoid activation in the final layer is not considered best practice. It would instead be accounted for in the loss, which improves numerical stability.
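The stability point in Note 2 can be illustrated without TensorFlow. In Keras, the usual alternative is a final Dense(1) layer with a linear activation combined with tf.keras.losses.BinaryCrossentropy(from_logits=True), so the loss receives the raw value z instead of sigmoid(z). The NumPy sketch below, an illustration rather than TensorFlow's internal code, compares binary cross-entropy computed from a probability with an algebraically equivalent form computed from the logit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probability(y, p):
    # Loss from the sigmoid output p; fails once p rounds to exactly 0 or 1
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_from_logit(y, z):
    # Same loss written directly in terms of the logit z:
    # max(z, 0) - z*y + log(1 + exp(-|z|)), which never evaluates log(0)
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

# Moderate logit: both forms agree
print(bce_from_probability(1.0, sigmoid(0.5)), bce_from_logit(1.0, 0.5))

# Large logit with label 0: sigmoid(40) rounds to 1.0 in float64, so log(1 - p) = log(0)
with np.errstate(divide="ignore"):
    naive = bce_from_probability(0.0, sigmoid(40.0))
stable = bce_from_logit(0.0, 40.0)
print(naive, stable)   # inf 40.0
```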

The model.summary() method provides a description of the network.

In [23]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (None, 3)                 9         
                                                                 
 layer2 (Dense)              (None, 1)                 4         
                                                                 
Total params: 13 (52.00 Byte)
Trainable params: 13 (52.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


- The parameter counts shown in the summary correspond to the number of elements in the weight and bias arrays, as shown below.

In [25]:
L1_num_params = 2*3 + 3 # W1 parameters + b1 parameters
L2_num_params = 3*1 + 1 # W2 parameters + b2 parameters
print("L1 params = ", L1_num_params, ", L2 params = ", L2_num_params)

L1 params =  9 , L2 params =  4
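The same count generalizes to any dense layer: n_in * n_units weights plus n_units biases. The sketch also multiplies the total by 4 bytes per float32 parameter, which matches the 52.00 Byte figure in the summary. The 64-input digit network at the end is hypothetical, used only to show how quickly the count grows.

```python
def dense_param_count(n_in, n_units):
    # W contributes n_in * n_units weights; b contributes one bias per unit
    return n_in * n_units + n_units

coffee = [dense_param_count(2, 3), dense_param_count(3, 1)]
total = sum(coffee)
print(coffee, total, total * 4)   # [9, 4] 13 52  (4 bytes per float32 parameter)

# Hypothetical digit network: 64 inputs -> 25 -> 15 -> 1
digit_total = (dense_param_count(64, 25)
               + dense_param_count(25, 15)
               + dense_param_count(15, 1))
print(digit_total)                # 2031
```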


Let's examine the weights and biases TensorFlow has instantiated. The weights W should be of size (number of features in input, number of units in the layer), while the size of the bias b should match the number of units in the layer.
- In the 1st layer with 3 units, we expect W to have a size of (2, 3), and b should have 3 elements.
- In the 2nd layer with 1 unit, we expect W to have a size of (3, 1), and b should have 1 element.

In [26]:
W1, b1 = model.get_layer("layer1").get_weights()
W2, b2 = model.get_layer("layer2").get_weights()
print(f"W1{W1.shape}:\n", W1, f"\nb1{b1.shape}:", b1)
print(f"W2{W2.shape}:\n", W2, f"\nb2{b2.shape}:", b2)

W1(2, 3):
 [[ 0.05343294  0.1238482  -0.49690902]
 [ 0.3276831  -0.9176431   0.94333005]] 
b1(3,): [0. 0. 0.]
W2(3, 1):
 [[ 1.1658789e+00]
 [-6.6375732e-04]
 [-5.1988494e-01]] 
b2(1,): [0.]


- The model.compile statement defines a loss function and specifies an optimizer.
- The model.fit statement runs gradient descent and fits the weights to the data.

In [28]:
model.compile(
    loss = tf.keras.losses.BinaryCrossentropy(),
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01),
)

model.fit(
    Xt, Yt,
    epochs=10,
)

Epoch 1/10

Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x203459c8510>

#### Epochs and batches
- In the fit, the number of epochs was set to 10. This specifies that the entire data set should be applied during training 10 times.
- For efficiency, the training data set is broken into 'batches'. The default size of a batch in TensorFlow is 32. There are 200,000 examples in our expanded data set, or 6,250 batches (200,000 / 32).
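The batch arithmetic can be checked directly. With mini-batch training, each batch produces one gradient-descent update, so 10 epochs over the expanded data set mean 10 passes of 6,250 updates each. The ceiling covers data sets whose size is not a multiple of the batch size, where the last batch is smaller.

```python
import math

n_examples = 200_000
batch_size = 32      # TensorFlow's default batch size for model.fit
epochs = 10

batches_per_epoch = math.ceil(n_examples / batch_size)  # a final partial batch still counts
total_updates = batches_per_epoch * epochs              # one gradient step per batch

print(batches_per_epoch, total_updates)   # 6250 62500
```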

#### Updated weights
- After fitting, the weights have been updated

In [29]:
W1, b1 = model.get_layer("layer1").get_weights()
W2, b2 = model.get_layer("layer2").get_weights()
print("W1:\n", W1, "\nb1:", b1)
print("W2:\n", W2, "\nb2:", b2)

W1:
 [[12.70874    14.551171    0.06258822]
 [ 0.35390717 12.121884   -8.821391  ]] 
b1: [ 13.236463    1.8003536 -11.073933 ]
W2:
 [[ 38.378876]
 [-42.603134]
 [-44.794643]] 
b2: [-12.264797]


#### Predictions
- Once we have a trained model, we can use it to make predictions. Recall that the output of our model is a probability.
    ![image.png](attachment:image.png)
- Recall, we have normalized the input features, so we must normalize our test data as well.
- To make a prediction, we apply the predict method. 

In [30]:
X_test = np.array([
    [200, 13.9],  # positive example
    [200, 17]     # negative example
])
X_testn = norm_l(X_test)
predictions = model.predict(X_testn)
print("predictions = \n", predictions)

predictions = 
 [[9.865704e-01]
 [6.842548e-08]]


- To convert the probabilities to a decision, we apply a threshold:

In [32]:
yhat = np.zeros_like(predictions)
for i in range(len(predictions)):
    if predictions[i] >=0.5:
        yhat[i] = 1
    else:
        yhat[i] = 0
        
print(f"decisions = \n{yhat}")

decisions = 
[[1.]
 [0.]]


In [33]:
yhat = (predictions >= 0.5).astype(int)
print(f"decisions = \n{yhat}")

decisions = 
[[1]
 [0]]


### Layer functions
- Let's examine the functions of the units to determine their role in the coffee roasting decision. We will plot the output of each node for all values of the inputs (duration, temp). Each unit is a logistic function whose output can range from zero to one. The shading in the graph represents the output value.
    ![image.png](attachment:image.png)
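- The shading shows that each unit is responsible for a different 'bad roast' region. Which unit plays which role depends on the training run; with the layer-1 weights learned above:
    - Unit 0 drops to near zero when the temperature is too low. Its layer-2 weight is positive, so a low value pushes the prediction toward a bad roast.
    - Unit 1 has larger values for bad combinations of time/temp (both too high).
    - Unit 2 has larger values when the duration is too short.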
- It is worth noting that the network learned these functions on its own through the process of gradient descent. They are very much the same sort of functions a person might choose to make the same decisions.
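The unit roles can also be checked numerically. This sketch plugs the layer-1 weights and biases printed in the "Updated weights" cell into sigmoid(x W1 + b1) at a few hand-picked points. The points are given in normalized units (0 is the mean of each feature), a choice made so that no normalization statistics are needed; the values would differ for another training run.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer-1 weights and biases as printed after training above (inputs are normalized)
W1 = np.array([[12.70874,    14.551171,   0.06258822],
               [ 0.35390717, 12.121884,  -8.821391  ]])
b1 = np.array([13.236463, 1.8003536, -11.073933])

# Normalized (temperature, duration) points
points = np.array([
    [-1.5,  0.0],   # temperature well below average
    [ 0.0,  0.0],   # average temperature and duration
    [ 1.5,  1.5],   # temperature and duration both well above average
    [ 0.0, -1.5],   # duration well below average
])
units = sigmoid(points @ W1 + b1)   # one row per point, one column per unit
print(np.round(units, 3))
```

Reading down a column shows where each unit switches on and off, which is the same information the shaded plots convey.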