# Working with Neural Networks using TensorFlow

<a target="_blank" href="https://colab.research.google.com/github/LuWidme/uk259/blob/05f7e58e35048d2ee227109791520f41d34b7343/demos/NN%20in%20tensowrflow.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [27]:
import tensorflow.keras as ks
import tensorflow as tf
mnist = ks.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data() # 70'000 handwritten digits
# scale data to [0,1] range
x_train, x_test = x_train / 255.0, x_test / 255.0
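
To see what the scaling line does, here is a minimal NumPy sketch on a tiny made-up array. The `pixels` values are illustrative and not taken from MNIST; the only thing carried over is the `uint8` dtype that Keras uses for the MNIST images.

```python
import numpy as np

# Hypothetical 2x2 "image" with the same dtype as the MNIST pixel arrays
pixels = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Dividing by a Python float promotes the result to float64 in the [0, 1] range
scaled = pixels / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```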

In [28]:
# Attention: using matplotlib and TensorFlow in the same notebook may cause crashes in some cases

# from matplotlib import pyplot as plt
# plt.imshow(x_train[0], interpolation='nearest')
# plt.show()

# print("Label: ", y_train[0])

## Build a machine learning model

Build a `tf.keras.Sequential` model by stacking layers.

In [29]:
model = ks.models.Sequential([
  # The input layer is created implicitly.
  # Flatten: reshape each 28 x 28 input array into a 1-D array of 784 nodes, one per pixel
  ks.layers.Flatten(input_shape=(28, 28)),
  # Dense: each of the 128 nodes is connected to all nodes of the preceding layer
  ks.layers.Dense(128, activation='relu'),
  # Output layer with one node for each digit
  ks.layers.Dense(10)
])

model.summary()
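
The parameter counts that `model.summary()` reports can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small sketch of that arithmetic (the helper name `dense_params` is made up for illustration):

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Number of trainable parameters in a fully connected layer with biases."""
    return n_in * n_out + n_out

flatten_units = 28 * 28                      # Flatten itself has no parameters
hidden = dense_params(flatten_units, 128)    # 784 * 128 weights + 128 biases
output = dense_params(128, 10)               # 128 * 10 weights + 10 biases
total = hidden + output

print(hidden, output, total)
```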


For each example, the model returns a vector of [logits](https://developers.google.com/machine-learning/glossary#logits) or [log-odds](https://developers.google.com/machine-learning/glossary#log-odds) scores, one for each class.

In [30]:
predictions = model(x_train[:1]).numpy()
predictions

array([[ 0.12999497, -0.09825356, -0.45652062,  0.32020533, -0.3374755 ,
        -0.22148943, -0.42035022, -1.0699991 ,  0.05281308,  0.30377463]],
      dtype=float32)

The `tf.nn.softmax` function converts these logits to *probabilities* for each class:

In [31]:
tf.nn.softmax(predictions).numpy()

array([[0.12681693, 0.10093693, 0.07054346, 0.15338558, 0.07946161,
        0.08923382, 0.07314175, 0.03819675, 0.11739715, 0.15088594]],
      dtype=float32)
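
As a cross-check, the same conversion can be written in plain NumPy. This is a minimal sketch of the softmax formula, not TensorFlow's implementation; the logits are copied from the output of `model(x_train[:1])` shown above.

```python
import numpy as np

def softmax(logits):
    """Softmax along the last axis; subtracting the max avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Logits copied from the notebook output above
logits = np.array([[0.12999497, -0.09825356, -0.45652062, 0.32020533, -0.3374755,
                    -0.22148943, -0.42035022, -1.0699991, 0.05281308, 0.30377463]])

probs = softmax(logits)
print(probs.round(4))
print(probs.sum())
```

The result should agree with the `tf.nn.softmax` output to within floating-point rounding, and the entries sum to 1.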

Note: It is possible to bake the `tf.nn.softmax` function into the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.
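
The stability concern can be illustrated with a small NumPy sketch. It compares a naive "softmax, then log" loss with a log-sum-exp formulation that works directly on logits. Both functions and the extreme logit values are illustrative assumptions; this is not how TensorFlow implements the loss internally.

```python
import numpy as np

def naive_cross_entropy(logits, label):
    """Softmax first, then log: breaks down when exp() overflows."""
    with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
        probs = np.exp(logits) / np.sum(np.exp(logits))
        return -np.log(probs[label])

def stable_cross_entropy(logits, label):
    """Log-softmax computed from logits with the max subtracted first."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

# Deliberately extreme logits (made up) to trigger overflow in exp()
logits = np.array([1000.0, 0.0, -1000.0])

naive_loss = naive_cross_entropy(logits, 0)
stable_loss = stable_cross_entropy(logits, 0)

print(naive_loss, stable_loss)
```

With these inputs the naive version yields `nan`, while the logit-based version returns a finite loss close to zero.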

Define a loss function for training using `losses.SparseCategoricalCrossentropy`, which takes a vector of logits and the index of the true class and returns a scalar loss for each example.

In [32]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

This loss is equal to the negative log probability of the true class: The loss is zero if the model is sure of the correct class.

This *untrained model* gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.

In [33]:
loss_fn(y_train[:1], predictions).numpy()

np.float32(2.416495)
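
The reported loss can be reproduced by hand as the negative log of the probability assigned to the true class. The sketch below copies the softmax probabilities from the earlier output and assumes that the first training label, `y_train[0]`, is the digit 5; that assumption is consistent with the loss value shown above.

```python
import math

# Softmax probabilities copied from the notebook output above
probs = [0.12681693, 0.10093693, 0.07054346, 0.15338558, 0.07946161,
         0.08923382, 0.07314175, 0.03819675, 0.11739715, 0.15088594]

true_label = 5  # assumption: y_train[0] is the digit 5

loss = -math.log(probs[true_label])     # negative log probability of the true class
uniform_loss = -math.log(1 / 10)        # loss of a model that guesses uniformly

print(round(loss, 4), round(uniform_loss, 4))
```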

Before you start training, configure and compile the model using Keras `Model.compile`. Set the [`optimizer`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) class to `adam`, set the `loss` to the `loss_fn` function you defined earlier, and specify a metric to be evaluated for the model by setting the `metrics` parameter to `accuracy`.

In [34]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])


## Train and evaluate your model
Now it's time to **train** our model, meaning we use labeled data to tune the model so that the predicted labels match the actual labels as closely as possible. We do this over many *steps* in multiple *epochs*:

* Epoch: One complete pass over all training data, used to compute gradients and update the model.

* Step: One update of the model using a single batch of training data.
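
The relationship between epochs, steps, and batch size is simple arithmetic. The sketch below assumes the Keras default batch size of 32 (used when `batch_size` is not passed to `Model.fit`) and the standard MNIST split sizes; the results match the `1875/1875` and `313/313` counters in the training and evaluation logs of this notebook.

```python
import math

train_examples = 60_000   # size of the MNIST training split
test_examples = 10_000    # size of the MNIST test split
batch_size = 32           # Keras default when batch_size is not specified
epochs = 5

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
eval_steps = math.ceil(test_examples / batch_size)

print(steps_per_epoch, total_steps, eval_steps)
```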

Use the `Model.fit` method to adjust your model parameters and minimize the loss (i.e. *train* the model). Training takes up to a minute, and you can observe how the loss and accuracy on the training data change from epoch to epoch:

In [22]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.8797 - loss: 0.4293
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.9654 - loss: 0.1175
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 5ms/step - accuracy: 0.9780 - loss: 0.0764
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9838 - loss: 0.0542
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9870 - loss: 0.0429


<keras.src.callbacks.history.History at 0x7fa43ec8dc10>


Congratulations! You just trained your first neural network model!


The `Model.evaluate` method checks the model's performance, usually on a "[Validation-set](https://developers.google.com/machine-learning/glossary#validation-set)" or "[Test-set](https://developers.google.com/machine-learning/glossary#test-set)".

For this, we let the model predict the labels of a dataset it was not trained on and check how accurate it is.

In [23]:
model.evaluate(x_test,  y_test, verbose=2)

313/313 - 1s - 3ms/step - accuracy: 0.9751 - loss: 0.0822


[0.08216213434934616, 0.9750999808311462]
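
Accuracy here is the fraction of examples whose highest-scoring class equals the true label, so 0.9751 on the 10,000 test images corresponds to roughly 9,751 correct predictions. A minimal NumPy sketch of that computation on made-up logits and labels (illustrative values only, not model output):

```python
import numpy as np

# Made-up logits for 4 examples and 3 classes
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 1.5, 0.3],
                   [0.0, 0.4, 0.9],
                   [1.2, 1.1, -0.5]])
labels = np.array([0, 1, 1, 0])

predicted = np.argmax(logits, axis=1)       # highest-scoring class per example
accuracy = np.mean(predicted == labels)     # fraction of correct predictions

print(predicted, accuracy)
```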

The image classifier is now trained to about 97.5% accuracy on the test set. To learn more, read the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/).

If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it:

In [24]:
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

For example, let's predict the label of the first entry in the test dataset:

In [25]:
import numpy as np

prediction = probability_model(x_test[:1])
print(prediction)

# Index of the class with the highest probability
predicted_label = np.argmax(prediction)
print("True Label: {} \nPredicted Label: {} with probability: {:0.3f} %".format(
    y_test[0], predicted_label, prediction[0, predicted_label] * 100))

tf.Tensor(
[[1.2858467e-07 1.3589779e-08 4.1818730e-06 6.2185507e-05 2.1534606e-10
  4.1019405e-07 3.7132336e-13 9.9993229e-01 1.2060902e-07 6.4210042e-07]], shape=(1, 10), dtype=float32)
True Label: 7 
Predicted Label: 7 with probability: 99.993 %
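
The predicted label and its probability can be read off the printed tensor with `np.argmax`. A small sketch that uses the probabilities copied from the output above, so it runs without the trained model:

```python
import numpy as np

# Probabilities copied from the printed tensor above
probs = np.array([[1.2858467e-07, 1.3589779e-08, 4.1818730e-06, 6.2185507e-05, 2.1534606e-10,
                   4.1019405e-07, 3.7132336e-13, 9.9993229e-01, 1.2060902e-07, 6.4210042e-07]])

predicted_label = int(np.argmax(probs, axis=1)[0])   # class with the highest probability
confidence = float(probs[0, predicted_label])

print(predicted_label, f"{confidence * 100:0.3f} %")
```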


## Conclusion

Congratulations! You have trained a machine learning model using a prebuilt dataset using the [Keras](https://www.tensorflow.org/guide/keras/overview) API.

For more examples of using Keras, check out the [tutorials](https://www.tensorflow.org/tutorials/keras/). To learn more about building models with Keras, read the [guides](https://www.tensorflow.org/guide/keras). If you want to learn more about loading and preparing data, see the tutorials on [image data loading](https://www.tensorflow.org/tutorials/load_data/images) or [CSV data loading](https://www.tensorflow.org/tutorials/load_data/csv).


*Adapted from: https://www.tensorflow.org/tutorials/quickstart/beginner*

## Exercise

Load the [Fashion MNIST dataset](https://www.tensorflow.org/datasets/catalog/fashion_mnist) and repeat the steps above to train a neural network model that can categorize different clothing items based on images. You may need to use a more complex network, or you can try [convolution layers](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) instead of fully connected (`Dense`) layers, as they usually perform better on image-recognition tasks.
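
If you try convolution layers, it helps to track how each layer changes the image size before the result is flattened. The sketch below assumes two 3x3 `Conv2D` layers with the default "valid" padding and stride 1, each followed by 2x2 max pooling, as in the solution cell that follows. The helper names `conv_out` and `pool_out` are made up for illustration.

```python
def conv_out(size: int, kernel: int) -> int:
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size: int, pool: int) -> int:
    """Output width/height of max pooling with stride equal to the pool size."""
    return size // pool

size = 28
size = conv_out(size, 3)   # 26 after the first 3x3 convolution
size = pool_out(size, 2)   # 13 after 2x2 max pooling
size = conv_out(size, 3)   # 11 after the second 3x3 convolution
size = pool_out(size, 2)   # 5 after 2x2 max pooling

flattened = size * size * 64   # 64 feature maps from the second Conv2D layer
print(size, flattened)
```

Under these assumptions the `Flatten` layer hands 1600 values to the first `Dense` layer.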

In [1]:
# Imports repeated so this cell also runs in a fresh kernel
import tensorflow.keras as ks
import tensorflow as tf

# Load Fashion MNIST as the exercise asks and scale it to the [0,1] range
# (assumption: the original cell reused x_train/y_train without showing this step)
(x_train, y_train), (x_test, y_test) = ks.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define a model
model_cnn = ks.models.Sequential([
    # Add a Conv2D layer: 32 filters, kernel size of 3x3, ReLU activation
    # Input shape needs to include color channel (28, 28, 1) for grayscale
    ks.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # Add a MaxPooling2D layer: reduces spatial dimensions
    ks.layers.MaxPooling2D((2, 2)),
    # Add another Conv2D layer: 64 filters, kernel size of 3x3, ReLU activation
    ks.layers.Conv2D(64, (3, 3), activation='relu'),
    # Add another MaxPooling2D layer
    ks.layers.MaxPooling2D((2, 2)),
    # Flatten the output from the convolutional layers
    ks.layers.Flatten(),
    # Add a Dense layer with 128 nodes and ReLU activation
    ks.layers.Dense(128, activation='relu'),
    # Add the output Dense layer with 10 nodes (for 10 classes)
    ks.layers.Dense(10)
])

# Compile the CNN model
model_cnn.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])

# Print the summary of the CNN model
model_cnn.summary()

In [60]:
# Train the CNN model
# Reshape x_train and x_test to include the channel dimension
x_train_cnn = x_train[..., tf.newaxis]
x_test_cnn = x_test[..., tf.newaxis]

model_cnn.fit(x_train_cnn, y_train, epochs=5)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 64s 33ms/step - accuracy: 0.7793 - loss: 0.6138
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 83s 34ms/step - accuracy: 0.8832 - loss: 0.3152
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 79s 32ms/step - accuracy: 0.9039 - loss: 0.2600
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 83s 33ms/step - accuracy: 0.9165 - loss: 0.2249
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 79s 31ms/step - accuracy: 0.9248 - loss: 0.1969


<keras.src.callbacks.history.History at 0x7fa399314390>
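
The reshape in the cell above appends a channel axis of length 1, because `Conv2D` expects inputs shaped `(batch, height, width, channels)`. A minimal NumPy sketch of the same indexing on a stand-in array (`tf.newaxis` and `np.newaxis` are both aliases for `None`; the array contents here are placeholders):

```python
import numpy as np

# Stand-in batch with the same layout as the image arrays: (batch, height, width)
images = np.zeros((4, 28, 28), dtype=np.float32)

# Append a trailing channel axis, giving (batch, height, width, channels)
images_cnn = images[..., np.newaxis]

print(images.shape, images_cnn.shape)
```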

In [None]:
# Evaluate the CNN model
model_cnn.evaluate(x_test_cnn, y_test, verbose=2)