## Predicting coffee roasting temperature and duration

In [1]:
import numpy as np

In [2]:
rng = np.random.default_rng(2)
X = rng.random(400).reshape(-1,2)

In [3]:
## X now has 200 rows and 2 columns, with every value in the range 0 to 1

In [4]:
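X[:,1] = X[:,1] * 4 + 11.5          # duration in minutes; 12-15 min is best
X[:,0] = X[:,0] * (285-150) + 150   # temperature in Celsius; 350-500 F (175-260 C) is best

Y = np.zeros(len(X))

# Label a roast as good (1) when temperature and duration are inside the
# target ranges and the duration is at or below a line that falls with temperature.
for i, (t, d) in enumerate(X):
    y = -3/(260-175)*t + 21
    if 175 < t < 260 and 12 < d < 15 and d <= y:
        Y[i] = 1
    else:
        Y[i] = 0

Y = Y.reshape(-1,1)   # column vector, one label per example
print(X)
print(Y)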

[[185.31763812  12.69396457]
 [259.92047498  11.86766377]
 [231.01357101  14.41424211]
 [175.3666449   11.72058651]
 [187.12086467  14.12973206]
 [225.90586448  12.10024905]
 [208.40515676  14.17718919]
 [207.07593089  14.0327376 ]
 [280.60385359  14.23225929]
 [202.86935247  12.24901028]
 [196.70468985  13.54426389]
 [270.31327028  14.60225577]
 [192.94979108  15.19686759]
 [213.57283453  14.27503537]
 [164.47298664  11.91817423]
 [177.25750542  15.03779869]
 [241.7745473   14.89694529]
 [236.99889634  13.12616959]
 [219.73805621  13.87377407]
 [266.38592796  13.25274466]
 [270.45241485  13.95486775]
 [261.96307698  13.49222422]
 [243.4899478   12.8561015 ]
 [220.58184803  12.36489356]
 [163.59498627  11.65441652]
 [244.76317931  13.32572248]
 [271.19410986  14.84073282]
 [201.98784315  15.39471508]
 [229.9283715   14.56353326]
 [204.97123839  12.28467965]
 [173.18989704  12.2248249 ]
 [231.51374483  11.95053142]
 [152.68795109  14.83198786]
 [163.42050092  13.30233814]
 [215.94730737

In [5]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense,Input
from tensorflow.keras import Sequential


### Normalize Data
Fitting the weights to the data (back-propagation, covered in next week's lectures) will proceed more quickly if the data is normalized. This is the same procedure you used in Course 1 where features in the data are each normalized to have a similar range. 
The procedure below uses a Keras [normalization layer](https://keras.io/api/layers/preprocessing_layers/numerical/normalization/). It has the following steps:
- create a "Normalization Layer". Note, as applied here, this is not a layer in your model.
- 'adapt' the data. This learns the mean and variance of the data set and saves the values internally.
- normalize the data.  
It is important to apply the same normalization, with the mean and variance learned here, to any future data passed to the trained model.
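
The adapt-then-normalize steps above amount to z-score scaling. The following is a rough NumPy sketch of that idea, not the Keras layer's actual implementation; the helper names `fit_norm` and `apply_norm` are made up for illustration, and the sample rows are the first three rows of `X` printed earlier.

```python
import numpy as np

def fit_norm(data):
    """Learn the per-feature mean and variance (the 'adapt' step)."""
    return data.mean(axis=0), data.var(axis=0)

def apply_norm(data, mean, var):
    """Shift and scale each feature using previously learned statistics."""
    return (data - mean) / np.sqrt(var)

# First three rows of X as printed earlier: (temperature, duration)
X_sample = np.array([
    [185.31763812, 12.69396457],
    [259.92047498, 11.86766377],
    [231.01357101, 14.41424211]])

mean, var = fit_norm(X_sample)
X_sample_n = apply_norm(X_sample, mean, var)

# Future data reuses the statistics learned above; it is not re-adapted.
X_new = np.array([[200.0, 13.9]])
X_new_n = apply_norm(X_new, mean, var)
print(X_sample_n)
print(X_new_n)
```

After scaling, each column of `X_sample_n` has zero mean and unit variance, while `X_new_n` is expressed relative to the training statistics.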

In [6]:
norm_l = tf.keras.layers.Normalization(axis=-1)
norm_l.adapt(X)  # learns mean, variance
Xn = norm_l(X)

In [7]:
## Increase the number of training examples by tiling (repeating) the data 1000 times
Xt = np.tile(Xn,(1000,1))
Yt= np.tile(Y,(1000,1))   
print(Xt.shape, Yt.shape)   

(200000, 2) (200000, 1)


In [8]:
tf.random.set_seed(0)

In [9]:
model = Sequential(
    [
        tf.keras.Input(shape=(2,)),
        Dense(3, activation='sigmoid', name = 'layer1'),
        Dense(1, activation='sigmoid', name = 'layer2')
     ]
)

>**Note 1:** `tf.keras.Input(shape=(2,))` specifies the expected shape of the input. This allows TensorFlow to size the weight and bias parameters at this point, which is useful when exploring TensorFlow models. In practice this statement can be omitted, and TensorFlow will size the network parameters when the input data is specified in the `model.fit` statement.  
>**Note 2:** Including the sigmoid activation in the final layer is not considered best practice. The sigmoid would instead be accounted for in the loss, which improves numerical stability. This will be described in more detail in a later lab.
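
To give a feel for the numerical-stability point in Note 2, here is a small NumPy sketch. It is an illustration under stated assumptions, not TensorFlow's implementation: the function names are hypothetical, and the rearranged formula is the standard identity `loss = max(z, 0) - z*y + log(1 + exp(-|z|))` for binary cross-entropy on a pre-sigmoid value `z`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probability(p, y):
    """Binary cross-entropy computed from a sigmoid output p."""
    # Warnings are silenced so that log(0) shows up as inf instead of a message.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_from_logit(z, y):
    """The same loss computed from z directly, rearranged to avoid log(0)."""
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

z = np.float64(40.0)   # a confident but wrong pre-activation
y = 0.0                # the true label

naive = bce_from_probability(sigmoid(z), y)   # sigmoid(40) rounds to exactly 1.0
stable = bce_from_logit(z, y)
print(naive, stable)
```

For moderate values of `z` the two functions agree; they only diverge once the sigmoid output saturates to exactly 0 or 1 in floating point.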

The `model.summary()` provides a description of the network:

In [10]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (None, 3)                 9         
                                                                 
 layer2 (Dense)              (None, 1)                 4         
                                                                 
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________


In [12]:
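### The number of parameters depends on the input size and the number of neurons in each layer

## Example: layer1 has 2 inputs and 3 neurons; layer2 has 3 inputs and 1 neuron

w_layer = 2 * 3 + 3 * 1   # 9 weights in total: 6 in layer1, 3 in layer2
b_layer = 1 * 3 + 1 * 1   # 4 biases in total: 3 in layer1, 1 in layer2
# Per layer: layer1 has 6 + 3 = 9 parameters and layer2 has 3 + 1 = 4, giving 13 overall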

Let's examine the weights and biases Tensorflow has instantiated.  The weights $W$ should be of size (number of features in input, number of units in the layer) while the bias $b$ size should match the number of units in the layer:
- In the first layer with 3 units, we expect W to have a size of (2,3) and $b$ should have 3 elements.
- In the second layer with 1 unit, we expect W to have a size of (3,1) and $b$ should have 1 element.
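
The shape rule above can be written as a small helper. This is only an illustrative sketch; `dense_shapes` is a hypothetical name, not a Keras function.

```python
def dense_shapes(n_inputs, n_units):
    """Return (weight shape, bias shape, parameter count) for a dense layer."""
    w_shape = (n_inputs, n_units)   # one column of weights per unit
    b_shape = (n_units,)            # one bias per unit
    return w_shape, b_shape, n_inputs * n_units + n_units

layer1 = dense_shapes(2, 3)   # 2 input features feed 3 units
layer2 = dense_shapes(3, 1)   # 3 activations feed 1 unit
total_params = layer1[2] + layer2[2]
print(layer1, layer2, total_params)
```

The per-layer counts and the total can be compared against the `Param #` column and `Total params` line in the summary output.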

In [15]:
W1, b1 = model.get_layer("layer1").get_weights()
print(W1)
print(b1)

[[-0.19685084 -0.71965426 -0.45032644]
 [-0.01329792  1.0376117  -0.5560321 ]]
[0. 0. 0.]


In [17]:
W2, b2 = model.get_layer("layer2").get_weights()
print(W2)
print(b2)

[[-0.67317605]
 [ 0.32792222]
 [ 1.0070678 ]]
[0.]


In [18]:
## Compile the model
model.compile(
    loss = tf.keras.losses.BinaryCrossentropy(),
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
)

In [20]:
model.fit(Xt,Yt, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1dcfae91880>

#### Epochs and batches
In the `fit` statement above, the number of `epochs` was set to 10. This specifies that the entire data set should be applied during training 10 times.  During training, you see output describing the progress of training that looks like this:
```
Epoch 1/10
6250/6250 [==============================] - 6s 910us/step - loss: 0.1782
```
The first line, `Epoch 1/10`, describes which epoch the model is currently running. For efficiency, the training data set is broken into 'batches'. The default batch size in TensorFlow is 32. The expanded data set has 200000 examples, so each epoch runs 200000 / 32 = 6250 batches. The notation on the second line, `6250/6250 [====`, shows how many batches have been executed.
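
The batch arithmetic can be checked with a few lines of plain Python. The helper name `batches_per_epoch` is made up for this sketch; rounding up reflects that a final, smaller batch is still processed when the data set size is not a multiple of the batch size.

```python
import math

def batches_per_epoch(n_examples, batch_size=32):
    """Number of batches needed to pass over the data set once."""
    return math.ceil(n_examples / batch_size)

n_batches = batches_per_epoch(200_000)   # the tiled data set
total_updates = n_batches * 10           # 10 epochs, one update per batch
n_batches_untiled = batches_per_epoch(200)   # the original 200 examples
print(n_batches, total_updates, n_batches_untiled)
```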

In [21]:
W1, b1 = model.get_layer("layer1").get_weights()
print(W1)
print(b1)

[[  0.4147566  -21.81393    -26.398855  ]
 [-17.372026    -0.47571805 -21.975174  ]]
[-20.760496 -22.34693   -4.144626]


In [22]:
W2, b2 = model.get_layer("layer2").get_weights()
print(W2)
print(b2)

[[-62.24599 ]
 [-62.254955]
 [ 61.625168]]
[-9.784358]


In [23]:
## Load weights saved from an earlier training run so the predictions below do not depend on retraining the model each time

W1 = np.array([
    [-8.94,  0.29, 12.89],
    [-0.17, -7.34, 10.79]] )
b1 = np.array([-9.87, -9.28,  1.01])
W2 = np.array([
    [-31.38],
    [-27.86],
    [-32.79]])
b2 = np.array([15.54])
model.get_layer("layer1").set_weights([W1,b1])
model.get_layer("layer2").set_weights([W2,b2])

In [24]:
## Making predictions
X_test = np.array([
    [200,13.9],  # positive example
    [200,17]])   # negative example
X_testn = norm_l(X_test)
predictions = model.predict(X_testn)
print("predictions = \n", predictions)


predictions = 
 [[9.625139e-01]
 [3.031606e-08]]
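
To see what `model.predict` computes, the same forward pass can be sketched in plain NumPy. This is an illustration, not Keras internals: the `sigmoid` and `forward` helpers are hypothetical names, the data is rebuilt with the same seed as the cells above, and normalization is assumed to be a subtraction of the mean followed by division by the standard deviation. Results should be close to the Keras output, though small differences are possible because Keras works in float32.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x_norm, W1, b1, W2, b2):
    """Two dense layers with sigmoid activations."""
    a1 = sigmoid(x_norm @ W1 + b1)    # layer1: (m, 2) -> (m, 3)
    return sigmoid(a1 @ W2 + b2)      # layer2: (m, 3) -> (m, 1)

# Rebuild the training data as in the earlier cells to recover mean and std
rng = np.random.default_rng(2)
X = rng.random(400).reshape(-1, 2)
X[:, 1] = X[:, 1] * 4 + 11.5
X[:, 0] = X[:, 0] * (285 - 150) + 150
mean, std = X.mean(axis=0), X.std(axis=0)

# Saved weights from the cell above
W1 = np.array([[-8.94, 0.29, 12.89], [-0.17, -7.34, 10.79]])
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([[-31.38], [-27.86], [-32.79]])
b2 = np.array([15.54])

X_test = np.array([[200, 13.9], [200, 17]])
p = forward((X_test - mean) / std, W1, b1, W2, b2)
print(p)
```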


To convert the probabilities to decisions, apply a threshold of 0.5:

In [25]:
yhat = np.zeros_like(predictions)
for i in range(len(predictions)):
    if predictions[i] >= 0.5:
        yhat[i] = 1
    else:
        yhat[i] = 0
print(f"decisions = \n{yhat}")

decisions = 
[[1.]
 [0.]]
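
The thresholding loop can also be written as a single vectorized comparison. A minimal sketch, using the two prediction values printed above as stand-in input:

```python
import numpy as np

# Prediction values copied from the output above
predictions = np.array([[9.625139e-01], [3.031606e-08]])

# Compare every element to the threshold at once, then convert booleans to 0/1
yhat = (predictions >= 0.5).astype(int)
print(f"decisions = \n{yhat}")
```

Unlike the loop version, which keeps the float dtype of `predictions`, this produces an integer array.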
