## Coffee Roasting neural network

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Normalization # type: ignore
from tensorflow.keras.models import Sequential # type: ignore
from tensorflow.keras.losses import BinaryCrossentropy # type: ignore
from tensorflow.keras.optimizers import Adam # type: ignore
from lab_coffee_utils import load_coffee_data, plt_roast, plt_layer, plt_network, plt_output_unit

# Allows us to manage and control the log messages
import logging
# Sets the logging level for tensorflow to show only errors
logging.getLogger("tensorflow").setLevel(logging.ERROR)
# Sets the verbosity of TensorFlow AutoGraph (the conversion of Python code to TensorFlow graph code)
# to 0, which disables its log messages
tf.autograph.set_verbosity(0)

In [None]:
x_train, y_train=load_coffee_data()
print(x_train.shape, y_train.shape)

Plot the coffee roasting data below. The two features are temperature in degrees Celsius and duration in minutes. `Coffee Roasting at Home` suggests that the duration is best kept between 12 and 15 minutes, while the temperature should be between 175 and 260 degrees Celsius.

In [None]:
plt_roast(x_train, y_train)

### Normalizing the data

| `axis` | Axis that keeps its own statistics       | Example Shape       | Mean and Variance Computed                                                   |
| ------ | ---------------------------------------- | ------------------- | ---------------------------------------------------------------------------- |
| `-1`   | Last axis → typically **features**       | `(batch, features)` | one pair per **column/feature**, computed over the samples                   |
| `0`    | First axis → typically **samples/batch** | `(batch, features)` | one pair per **row/sample**, computed over the features (rarely useful here) |
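
As a rough illustration of the `axis=-1` row, the NumPy sketch below standardizes each column with its own mean and standard deviation, which is approximately what the `Normalization` layer does after `adapt`. The toy array `x` is hypothetical and is not the lab's data.

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns are (temperature, duration)
x = np.array([
    [200.0, 13.0],
    [250.0, 15.0],
    [180.0, 12.0],
    [230.0, 14.0],
])

# One mean and one (population) standard deviation per feature column
mean = x.mean(axis=0)
std = x.std(axis=0)

# Standardize every column independently
x_norm = (x - mean) / std

print("per-feature mean after normalization:", x_norm.mean(axis=0))
print("per-feature std after normalization: ", x_norm.std(axis=0))
```

After this transform, each feature has mean close to 0 and standard deviation close to 1, so temperature and duration end up on comparable scales.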

In [None]:
print(f"Temperature max, min pre normalization: {np.max(x_train[:,0]):0.2f}, {np.min(x_train[:,0]):0.2f}")
print(f"Duration max, min pre normalization: {np.max(x_train[:,1]):0.2f}, {np.min(x_train[:,1]):0.2f}")

norm_l=Normalization(axis=-1)
norm_l.adapt(x_train)
x_norm=norm_l(x_train)

print(f"Temperature max, min post normalization: {np.max(x_norm[:,0]):0.2f}, {np.min(x_norm[:,0]):0.2f}")
print(f"Duration max, min post normalization: {np.max(x_norm[:,1]):0.2f}, {np.min(x_norm[:,1]):0.2f}")

### Tiling the data

In [None]:
x_tiled=np.tile(x_norm, (1000, 1))
y_tiled=np.tile(y_train, (1000, 1))
print(x_tiled.shape, y_tiled.shape)
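
To see what `np.tile` does with the `(1000, 1)` argument, here is a small sketch on a hypothetical two-row array, repeated 3 times instead of 1000. The rows are stacked repeatedly along the first axis while the number of columns stays the same.

```python
import numpy as np

# Hypothetical miniature data set: 2 samples, 2 features, 1 label each
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([[0],
              [1]])

# Repeat the rows 3 times along axis 0, and once (no change) along axis 1
x_tiled = np.tile(x, (3, 1))
y_tiled = np.tile(y, (3, 1))

print(x_tiled.shape, y_tiled.shape)  # (6, 2) (6, 1)
print(x_tiled)
```

Because `x` and `y` are tiled with the same repetition pattern, every copied sample keeps its matching label.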

### TensorFlow model

In [None]:
tf.random.set_seed(1234)

model=Sequential(
  [
    Input(shape=(2,)), # specifies the expected shape of the input
    Dense(units=3, activation='sigmoid', name="L1"),
    Dense(units=1, activation='sigmoid', name="L2")
  ]
)

In [None]:
model.summary()

### 📊 Model Summary Breakdown

#### 🔹 Layer L1 (Dense with 3 units, input shape = 2)

* Each of the 3 neurons receives input from **2 features**.
* Each neuron has:

  * **2 weights** (one per input)
  * **1 bias**
* So per neuron: `2 (weights) + 1 (bias) = 3`
* For 3 neurons: `3 × 3 = 9` parameters

✅ **L1 total parameters: 9**

---

#### 🔹 Layer L2 (Dense with 1 unit, input from 3 outputs of L1)

* The single neuron in L2 receives input from **3 neurons** in L1.
* It has:

  * **3 weights**
  * **1 bias**
* Total: `3 + 1 = 4`

✅ **L2 total parameters: 4**

---

### 🧠 Summary Table

| Layer | Inputs | Neurons | Weights per Neuron | Biases | Total Parameters |
| ----- | ------ | ------- | ------------------ | ------ | ---------------- |
| L1    | 2      | 3       | 2                  | 3      | 2×3 + 3 = **9**  |
| L2    | 3      | 1       | 3                  | 1      | 3×1 + 1 = **4**  |
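
The counts in the table can be reproduced with a few lines of plain Python. The helper name `dense_params` is made up for this sketch and is not part of Keras.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Parameters in a fully connected layer: one weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

l1 = dense_params(n_inputs=2, n_units=3)  # layer L1
l2 = dense_params(n_inputs=3, n_units=1)  # layer L2

print(f"L1: {l1}, L2: {l2}, total: {l1 + l2}")  # L1: 9, L2: 4, total: 13
```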


In [None]:
W1, b1=model.get_layer("L1").get_weights()
W2, b2=model.get_layer("L2").get_weights()

print(W1, W1.shape, b1, b1.shape)
print(W2, W2.shape, b2, b2.shape)

# Shape of W will be (number of input features, number of units)

- The `model.compile` statement defines a loss function and specifies an optimizer.
- The `model.fit` statement runs gradient descent and fits the weights to the data.
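
For intuition about the loss named in `compile`, the sketch below computes binary cross-entropy by hand in NumPy. The labels and probabilities are hypothetical, and the clipping constant `eps` is an assumption added to avoid `log(0)`; this is not the Keras implementation itself.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # keep log() finite
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Hypothetical labels and predicted probabilities
y_true = np.array([1.0, 0.0, 1.0])
y_prob = np.array([0.9, 0.2, 0.6])

loss = binary_cross_entropy(y_true, y_prob)
print(f"loss = {loss:.4f}")
```

Confident, correct predictions contribute little to the loss, while the least confident correct prediction (0.6 for a label of 1) contributes the most.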

In [None]:
model.compile(
  loss=BinaryCrossentropy(),
  optimizer=Adam(learning_rate=0.01)
)

In [None]:
model.fit(x_tiled, y_tiled, epochs=10)

### Epochs and batches
In the `fit` statement above, the number of `epochs` was set to 10. This specifies that the entire data set should be applied during training 10 times. During training, you see output describing the progress of training that looks like this:
```
Epoch 1/10
6250/6250 [==============================] - 6s 910us/step - loss: 0.1782
```
The first line, `Epoch 1/10`, shows which epoch the model is currently running. For efficiency, the training data set is broken into 'batches'. The default batch size in TensorFlow is 32. Our expanded data set has 200000 examples, which gives 6250 batches. The notation on the second line, `6250/6250 [====`, shows which batch has been executed.
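
The batch count follows directly from the data set size and the batch size. The sketch below assumes 200 original examples, which is implied by the 200000 tiled examples and the tiling factor of 1000.

```python
import math

n_original = 200   # assumed size of the original data set (200000 / 1000)
n_tiles = 1000     # tiling factor used with np.tile
batch_size = 32    # default batch size used by model.fit

n_examples = n_original * n_tiles
# A final partial batch would still count as one step, hence the ceiling
steps_per_epoch = math.ceil(n_examples / batch_size)

print(n_examples, steps_per_epoch)  # 200000 6250
```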

In [None]:
W1, b1=model.get_layer("L1").get_weights()
W2, b2=model.get_layer("L2").get_weights()

print("Updated parameters:")
print(W1, b1)
print(W2, b2)

In [None]:
W1=np.array([
  [-8.94, 0.29, 12.89],
  [-0.17, -7.34, 10.79]])
b1=np.array([-9.87, -9.28,  1.01])

W2=np.array([
  [-31.38],
  [-27.86],
  [-32.79]])
b2 = np.array([15.54])

model.get_layer("L1").set_weights([W1, b1])
model.get_layer("L2").set_weights([W2, b2])

In [None]:
x_test=np.array([
  [200, 13.9],
  [200, 17]
])

x_test_norm=norm_l(x_test)
predictions=model.predict(x_test_norm)
print(f"Predictions: {predictions}")
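
To make the prediction step less of a black box, here is a minimal NumPy sketch of the same two-layer forward pass, using the weights that were loaded with `set_weights`. The input `x` is a hypothetical, already-normalized sample, and the helper names `sigmoid` and `dense` are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b):
    """One fully connected layer with sigmoid activation: sigmoid(a_in @ W + b)."""
    return sigmoid(a_in @ W + b)

# Weights copied from the cell that calls set_weights
W1 = np.array([[-8.94, 0.29, 12.89],
               [-0.17, -7.34, 10.79]])
b1 = np.array([-9.87, -9.28, 1.01])
W2 = np.array([[-31.38], [-27.86], [-32.79]])
b2 = np.array([15.54])

# Hypothetical input that is assumed to be normalized already (1 sample, 2 features)
x = np.array([[-0.47, 0.42]])

a1 = dense(x, W1, b1)   # activations of layer L1, shape (1, 3)
a2 = dense(a1, W2, b2)  # output of layer L2, shape (1, 1)

print("L1 activations:", a1)
print("network output:", a2)
```

The shapes line up because `W1` is `(2, 3)` (input features by units) and `W2` is `(3, 1)`, so each layer's output width matches the next layer's input width.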

In [None]:
y_pred=np.zeros_like(predictions)

for i in range(len(predictions)):
  if predictions[i]>=0.5:
    y_pred[i]=1
  else:
    y_pred[i]=0

print(f"Decisions: {y_pred}")
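
The thresholding loop can also be written as a single vectorized comparison. The sketch below uses a hypothetical `predictions` array so that it runs without the model.

```python
import numpy as np

# Hypothetical model outputs: one probability per row
predictions = np.array([[0.97],
                        [0.03],
                        [0.50]])

# Compare every element to the threshold at once, then convert booleans to 0/1
y_pred = (predictions >= 0.5).astype(int)

print(f"Decisions:\n{y_pred}")
```

The comparison is applied elementwise, so the result has the same shape as `predictions`, and a value of exactly 0.5 maps to 1, matching the `>=` in the loop.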

In [None]:
plt_layer(x_train, y_train.reshape(-1,), W1, b1, norm_l)

The shading shows that each unit is responsible for a different `bad roast` region. Unit 0 has larger values when the temperature is too low. Unit 1 has larger values when the duration is too short. Unit 2 has larger values for bad combinations of duration and temperature.

In [None]:
plt_output_unit(W2, b2)

In the layer 1 plots, high output values correspond to `bad roast` areas. For the output unit, the maximum output occurs where all three inputs are small, which corresponds to the `good roast` area.

In [None]:
netf=lambda x: model.predict(norm_l(x))
plt_network(x_train, y_train, netf)

The left graph is the raw output of the final layer represented by the blue shading. This is overlaid on the training data represented by the X's and O's.   

The right graph is the output of the network after a decision threshold. The X's and O's here correspond to decisions made by the network.  
The following takes a moment to run.