# Introduction to TensorFlow
This notebook walks through the fundamentals of the TensorFlow platform, using its Python API, in a follow-along tutorial style.
## 1. Handle Imports
First, import TensorFlow. By convention it is imported as `tf`. It is also good practice to print the version you are using, because APIs change between releases.

In [1]:
import tensorflow as tf
print(tf.__version__)

2.10.0


## 2. Load a dataset
Load and prepare the MNIST dataset. Pixel values range from 0 to 255, so we scale them to the range 0 to 1 by dividing by 255.0. The division also converts the data from integers to floating point.

In [2]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
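The following is a minimal NumPy sketch of what the division does to the array's type and range. It uses a tiny hypothetical 2×2 "image" (`fake_images`) instead of the real download. The real `x_train` has shape `(60000, 28, 28)`.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: uint8 pixels in [0, 255].
fake_images = np.array([[[0, 64], [128, 255]]], dtype=np.uint8)

# True division by a Python float promotes uint8 to float64 and maps 0..255 to 0..1.
scaled = fake_images / 255.0

print(fake_images.dtype, scaled.dtype)  # uint8 float64
print(scaled.min(), scaled.max())       # 0.0 1.0
```

If memory matters, `(x_train / 255.0).astype("float32")` halves the footprint compared with float64. That is an optional tweak, not part of the original tutorial.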


## 3. Build a machine learning model
Now we can build a sequential model using `tf.keras.Sequential`:

In [3]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

`Sequential` models are useful for stacking layers where each layer has exactly one input tensor and one output tensor. Layers are functions with a known mathematical structure that can be reused and that have trainable variables. This model uses `Flatten`, `Dense`, and `Dropout` layers.
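To make the layer descriptions concrete, here is a rough NumPy re-implementation of the same forward pass. It is a sketch that rests on several assumptions. The weights are random normal draws rather than Keras's default initializers, and the helper names (`flatten`, `dense`, `relu`, `dropout`, `forward`) are hypothetical. The sketch shows that each 28×28 image becomes 10 logits, and that the two `Dense` layers hold 784·128 + 128 + 128·10 + 10 = 101,770 trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def flatten(x):
    # (batch, 28, 28) -> (batch, 784), like Flatten(input_shape=(28, 28))
    return x.reshape(x.shape[0], -1)

def relu(x):
    return np.maximum(x, 0.0)

def dense(x, w, b, activation=None):
    # y = activation(x @ w + b), like Dense(units, activation=...)
    y = x @ w + b
    return activation(y) if activation is not None else y

def dropout(x, rate, training):
    # Inverted dropout: during training, zero roughly a fraction `rate` of the
    # units and scale the survivors by 1 / (1 - rate); identity otherwise.
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

# Hypothetical random weights (Keras would use its own initializers).
w1, b1 = rng.normal(0.0, 0.05, (784, 128)), np.zeros(128)
w2, b2 = rng.normal(0.0, 0.05, (128, 10)), np.zeros(10)

def forward(images, training=False):
    h = flatten(images)
    h = dense(h, w1, b1, relu)
    h = dropout(h, 0.2, training)
    return dense(h, w2, b2)  # raw logits: no softmax on the last layer

batch = rng.random((1, 28, 28))
logits = forward(batch)
print(logits.shape)  # (1, 10)

n_params = w1.size + b1.size + w2.size + b2.size
print(n_params)  # 101770
```

`Dropout` is active only when `training=True`. Keras likewise disables dropout at inference time, which is why calling `model(x_train[:1])` in the next cell gives deterministic logits.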

For each example, the model returns a vector of logits or log-odds scores, one for each class.

In [8]:
predictions = model(x_train[:1]).numpy()
predictions

array([[ 0.5344286 , -0.14030938,  0.49713415,  0.26344085,  0.45553884,
         0.16627784, -0.26635483,  0.7346546 ,  0.1916005 , -0.11848602]],
      dtype=float32)

The `tf.nn.softmax` function converts these logits to probabilities for each class:

In [6]:
tf.nn.softmax(predictions).numpy()

array([[0.12899223, 0.06569443, 0.12427013, 0.09837279, 0.1192071 ,
        0.08926427, 0.05791455, 0.15758707, 0.09155355, 0.06714386]],
      dtype=float32)
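A minimal pure-Python softmax, applied to the logits printed above, should reproduce the `tf.nn.softmax` output to within float32 rounding. Subtracting the largest logit before exponentiating is a standard overflow guard. It is an implementation choice for this sketch, not necessarily how TensorFlow computes softmax internally.

```python
import math

def softmax(logits):
    # Subtracting the max logit leaves the result unchanged mathematically
    # but keeps math.exp from overflowing on large logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the untrained model's output above.
logits = [0.5344286, -0.14030938, 0.49713415, 0.26344085, 0.45553884,
          0.16627784, -0.26635483, 0.7346546, 0.1916005, -0.11848602]

probs = softmax(logits)
print([round(p, 4) for p in probs])
print(sum(probs))  # 1.0, up to floating-point rounding
```

Softmax preserves ordering, so the largest logit (index 7) also gets the largest probability.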

Note that it is possible to bake `tf.nn.softmax` directly into the activation function of the network's last layer. This makes the model output more directly interpretable, but the approach is discouraged. With a softmax output, it is impossible to provide an exact and numerically stable loss calculation for all models.
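The following small pure-Python illustration shows why the logits route is preferred. It is a sketch with hypothetical two-class logits, not TensorFlow's internal implementation. If extreme logits are first turned into probabilities, the smaller probability can underflow to exactly 0, and its log is then undefined. Computing the log-softmax directly with the log-sum-exp trick keeps the loss finite.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    # log(softmax(z)) computed as z - logsumexp(z), never taking log(0)
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

logits = [0.0, -800.0]  # hypothetical, very confident (and wrong) prediction
true_class = 1

# Route 1: probabilities first, then log. exp(-800) underflows to 0.0.
probs = softmax(logits)
try:
    naive_loss = -math.log(probs[true_class])
except ValueError:  # math.log(0.0) raises "math domain error"
    naive_loss = float("inf")

# Route 2: work from the logits directly.
stable_loss = -log_softmax(logits)[true_class]

print(naive_loss, stable_loss)  # inf 800.0
```

This is the motivation for passing raw logits to the loss together with `from_logits=True`.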

Define a loss function for training using `losses.SparseCategoricalCrossentropy`:

In [9]:
loss_function = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

The loss function takes a vector of ground-truth class labels and a batch of logits, and computes a scalar loss for each example. By default, it then returns the mean of these per-example losses over the batch. Each example's loss is the negative log probability of its true class, so the loss is zero when the model is certain of the correct class.
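The following is a minimal pure-Python sketch of sparse categorical cross-entropy computed from logits. The function name `sparse_cce_from_logits` is hypothetical, and the averaging step mirrors Keras's default reduction, which takes the mean over the batch. The sketch shows that a confident correct prediction has a loss near zero. An evenly spread prediction over 10 classes has a loss of ln(10).

```python
import math

def log_softmax(logits):
    # Numerically stable log(softmax(z)) = z - logsumexp(z)
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def sparse_cce_from_logits(labels, batch_logits):
    # Per-example loss: negative log probability of the true class.
    per_example = [-log_softmax(logits)[label]
                   for label, logits in zip(labels, batch_logits)]
    # Mean over the batch, like Keras's default reduction.
    return per_example, sum(per_example) / len(per_example)

confident = [20.0] + [0.0] * 9  # strongly predicts class 0
uniform = [0.0] * 10            # no preference between the 10 classes

per_example, mean_loss = sparse_cce_from_logits([0, 0], [confident, uniform])
print(per_example)  # first value is close to 0, second is ln(10)
print(mean_loss)
```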

This untrained model gives probabilities close to uniform (about 1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.

In [10]:
loss_function(y_train[:1], predictions).numpy()

2.416154
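As a quick check, the value above can be recomputed by hand from the probability the model gave the true class. In the standard MNIST split, the first training label, `y_train[0]`, is 5. The relevant probability is therefore entry 5 of the softmax output printed earlier. The sketch uses plain Python `math` rather than TensorFlow.

```python
import math

# Probability the untrained model assigned to the true class (index 5),
# copied from the tf.nn.softmax output above.
p_true = 0.08926427
loss = -math.log(p_true)
print(round(loss, 4))  # 2.4162, matching the loss printed above

# Baseline for a model that spreads probability evenly over 10 classes.
baseline = -math.log(1 / 10)
print(round(baseline, 4))  # 2.3026
```

The observed loss is slightly above the baseline because the untrained model gives the true class a little less than 1/10 probability.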