<a href="https://colab.research.google.com/github/iamomtiwari/Tensorflow-Basics/blob/main/BasicTF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import tensorflow as tf

In [2]:
tf.constant([[1,2],[3,4]])

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>

In [3]:
tf.constant([[1,2,3],[4,5,6],[6,7,8]])

<tf.Tensor: shape=(3, 3), dtype=int32, numpy=
array([[1, 2, 3],
       [4, 5, 6],
       [6, 7, 8]], dtype=int32)>

This is how to create a matrix in TensorFlow: call tf.constant and pass a nested list with one inner list per row (three rows here).

In [4]:
tf.zeros((3,2))

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)>

This is how to create a zero matrix: call tf.zeros and pass the shape of the matrix, here (3, 2) for 3 rows and 2 columns.

In [5]:
tf.zeros((2,3))

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0., 0., 0.],
       [0., 0., 0.]], dtype=float32)>

In [6]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

In [7]:
(x_train,y_train),(x_test,y_test)=mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [8]:
x_train=x_train.reshape(-1,28*28).astype("float32")/255
x_test=x_test.reshape(-1,28*28).astype("float32")/255

🔹 x_train / x_test
These are NumPy arrays containing image data.

For MNIST, each image is 28x28 pixels in grayscale, so initially, each image has a shape of (28, 28) and the dataset has shape (60000, 28, 28) for training and (10000, 28, 28) for test.

🔹 .reshape(-1, 28*28)
This reshapes the image data from 2D (28x28) into 1D (784).

-1 automatically infers the number of samples (like 60000 for training or 10000 for testing).

So it converts x_train from shape (60000, 28, 28) → (60000, 784).

This is done because many neural networks expect flat input vectors, not 2D image matrices.

🔹 .astype("float32")
Converts the data type to float32 (from uint8).

Original MNIST pixel values are integers from 0 to 255 (since it's grayscale).

Casting to float32 lets the pixel values hold fractions after scaling and matches the float32 dtype that Keras layers use by default.

🔹 / 255
Normalizes the pixel values to a range of 0 to 1.

This helps the neural network train faster and more accurately, as large input values can lead to unstable training.

✅ Summary of What It Does:
Flattens each MNIST image from 28x28 to 784, converts it to float32, and normalizes the pixel values from 0–255 to 0–1.
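
The three steps above can be tried without downloading MNIST. This is a minimal sketch that assumes a small random array (`fake_images`, a made-up stand-in for `x_train`) and only NumPy:

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random grayscale "images", 28x28, uint8.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Same three steps as the cell above: flatten, cast, scale.
flat = fake_images.reshape(-1, 28 * 28)   # (5, 28, 28) -> (5, 784)
as_float = flat.astype("float32")         # uint8 -> float32
scaled = as_float / 255                   # 0..255 -> 0.0..1.0

print(fake_images.shape, fake_images.dtype)  # (5, 28, 28) uint8
print(scaled.shape, scaled.dtype)            # (5, 784) float32
print(scaled.min(), scaled.max())            # both stay inside [0, 1]
```

The -1 in reshape is filled in by NumPy (5 here), which is why the same line works for both the 60000-image training set and the 10000-image test set.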

In [11]:
model=keras.Sequential(
    [
        # Input: a flattened 28x28 image (784 values)
        # keras.Input(shape=(28*28,)),
        layers.Dense(512,input_dim=28*28,activation="relu"),
        layers.Dense(256,activation="relu"),
        layers.Dense(10,activation="softmax"),
     ]
)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)



---

### ✅ **Full Model Code:**

```python
model = keras.Sequential([
    layers.Dense(512, input_dim=28*28, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```

---

### 🔷 `keras.Sequential([...])`

* This means you're building a **feed-forward neural network**, layer by layer.
* **Sequential** is used when each layer feeds directly into the next one—no branching or merging.
* This is perfect for a basic classification model like MNIST.

---

### 🔹 **First Layer:**

```python
layers.Dense(512, input_dim=28*28, activation="relu")
```

* **`Dense(512)`**: This creates a fully connected (dense) layer with **512 neurons**.
* **`input_dim=28*28`**: Each input is a flattened image of size **784** (since 28×28 = 784).
* **`activation="relu"`**: The **ReLU** activation adds non-linearity. It replaces negative values with 0.

📌 Think of this as the first processing layer that takes all the 784 pixel values and learns 512 useful patterns/features from them.
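
To make "fully connected + ReLU" concrete, here is a tiny NumPy sketch of what one dense layer computes. The sizes (4 inputs, 3 neurons) and the weight values are made up for illustration. The real layer has 784 inputs, 512 neurons, and learned weights.

```python
import numpy as np

def relu(z):
    # ReLU: keep positive values, replace negative values with 0.
    return np.maximum(z, 0.0)

# Illustrative input with 4 features (the real layer sees 784).
x = np.array([1.0, 2.0, -1.0, 0.5])

# Illustrative weights: one column per neuron (3 neurons here, 512 in the model).
W = np.array([[ 1.0, -1.0,  0.0],
              [ 0.5,  0.0, -1.0],
              [ 0.0,  2.0,  1.0],
              [-2.0,  1.0,  0.0]])
b = np.array([0.0, 0.5, 1.0])

pre_activation = x @ W + b     # weighted sum per neuron
output = relu(pre_activation)  # negatives become 0

print(pre_activation)  # [ 1. -2. -2.]
print(output)          # [1. 0. 0.]
```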

---

### 🔹 **Second Layer:**

```python
layers.Dense(256, activation="relu")
```

* A dense layer with **256 neurons**.
* It takes the 512 features from the previous layer and learns 256 more abstract patterns.
* **ReLU** is again used to introduce non-linearity.

📌 This deepens the learning—more layers allow the network to learn more complex relationships.

---

### 🔹 **Output Layer:**

```python
layers.Dense(10, activation="softmax")
```

* This is the **output layer**, with **10 neurons** (since MNIST has **10 classes**, digits 0–9).
* **`activation="softmax"`**: Converts raw output scores into **probabilities** that sum to 1.

📌 This layer tells you **which digit (0–9)** the model thinks the image represents, with probabilities.
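
Softmax itself is short enough to write by hand. The sketch below uses plain Python and three made-up scores instead of ten, just to show that the outputs are positive and sum to 1:

```python
import math

def softmax(scores):
    # Subtracting the max first avoids overflow in exp() and does not
    # change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [2.0, 1.0, 0.1]   # illustrative raw outputs for 3 classes
probs = softmax(raw_scores)
predicted_class = probs.index(max(probs))

print(probs)
print(sum(probs))        # 1.0, up to floating-point rounding
print(predicted_class)   # 0, the class with the largest raw score
```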

---

### ✅ **Summary in Simple Terms:**

* Input: Flattened image of **784 pixels**.
* First Layer: Learns **512 features** from input using ReLU.
* Second Layer: Refines that into **256 features**, again using ReLU.
* Output Layer: Gives **probabilities for each digit (0–9)** using softmax.
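
The layer sizes above also fix how many trainable parameters the model has. Each dense layer has one weight per (input, neuron) pair plus one bias per neuron. This small sketch does that arithmetic in plain Python, and the total should match what model.summary() reports for this architecture:

```python
# Layer widths: input, first hidden, second hidden, output.
layer_sizes = [784, 512, 256, 10]

# A dense layer with n_in inputs and n_out neurons has
# n_in * n_out weights plus n_out biases.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [401920, 131328, 2570]
print(total_params)      # 535818
```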

---


In [12]:
from keras.models import Sequential
from keras.layers import Dense
model=Sequential()
model.add(Dense(512,input_dim=28*28,activation="relu"))
model.add(Dense(256,activation="relu"))
model.add(Dense(10,activation="softmax"))

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


🔷 Step-by-Step Explanation:
1️⃣ model = Sequential()
This creates an empty neural network model.

You're going to add layers to this model in sequence, one after another.

2️⃣ model.add(Dense(512, input_dim=28*28, activation="relu"))
Adds the first hidden layer with:

512 neurons.

input_dim=28*28 → accepts 784 input features (a flattened 28x28 MNIST image).

activation="relu" → applies the ReLU (Rectified Linear Unit) function to introduce non-linearity.

3️⃣ model.add(Dense(256, activation="relu"))
Adds a second hidden layer with:

256 neurons.

No need to specify input_dim here because it automatically takes input from the previous layer.

activation="relu" → again uses ReLU to help the model learn complex features.

4️⃣ model.add(Dense(10, activation="softmax"))
Adds the output layer:

10 neurons → one for each MNIST class (digits 0 through 9).

activation="softmax" → converts outputs into probabilities that sum to 1.

📌 What This Model Does:
It builds a neural network for digit classification (0–9) using:

Input: Flattened 784-dimensional vectors.

Hidden Layers: Two dense layers to learn features (512 → 256).

Output: 10-class softmax for digit prediction.
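
To see how the shapes flow through 784 → 512 → 256 → 10, here is a NumPy-only sketch of the forward pass. It is not the Keras implementation: the `dense` helper is hypothetical and draws fresh random weights on every call, so the probabilities are meaningless. Only the shapes and the sum-to-1 property are the point.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Row-wise softmax; subtracting the row max keeps exp() stable.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense(x, n_out, activation):
    # Hypothetical helper: an untrained dense layer with random weights.
    n_in = x.shape[-1]
    W = rng.normal(0.0, 0.05, size=(n_in, n_out))
    b = np.zeros(n_out)
    return activation(x @ W + b)

batch = rng.random((3, 784))       # 3 fake flattened images
h1 = dense(batch, 512, relu)       # (3, 512)
h2 = dense(h1, 256, relu)          # (3, 256)
probs = dense(h2, 10, softmax)     # (3, 10)

print(h1.shape, h2.shape, probs.shape)  # (3, 512) (3, 256) (3, 10)
print(probs.sum(axis=1))                # each row sums to 1
```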



In [13]:
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),  # from_logits defaults to False
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    metrics=["accuracy"],
)

1️⃣ loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False)
You're using SparseCategoricalCrossentropy as your loss function.

It is used when:

You have integer-encoded labels (like 0, 1, 2,..., 9 for MNIST).

Your model output uses softmax (i.e., from_logits=False).

🔸 If you had used activation=None in the output layer, then you'd set from_logits=True.

2️⃣ optimizer=keras.optimizers.Adam(learning_rate=0.001)
This sets the optimizer to Adam, a very effective and commonly used optimizer.

learning_rate=0.001 is Adam's default in Keras and a standard starting value. It controls the size of each weight update.

3️⃣ metrics=["accuracy"]
This tells Keras to display accuracy during training and evaluation.

Accuracy is appropriate for classification problems like MNIST.

✅ Summary:
Loss: SparseCategoricalCrossentropy → For integer labels + softmax output.

Optimizer: Adam with LR = 0.001.

Metric: Accuracy to monitor how well the model is learning.
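
For a single sample, sparse categorical crossentropy with softmax outputs reduces to "minus the log of the probability given to the true class". The sketch below shows only that core formula in plain Python. The helper name `sparse_cce` and the numbers are made up, and the Keras loss additionally averages over the batch.

```python
import math

def sparse_cce(probs, label):
    # probs: predicted probabilities for one sample (already softmaxed).
    # label: the true class as an integer index, e.g. 0..9 for MNIST.
    return -math.log(probs[label])

probs = [0.1, 0.7, 0.2]   # illustrative 3-class prediction

loss_if_label_is_1 = sparse_cce(probs, 1)   # model put 0.7 on the true class
loss_if_label_is_0 = sparse_cce(probs, 0)   # model put only 0.1 on the true class

print(round(loss_if_label_is_1, 4))  # 0.3567
print(round(loss_if_label_is_0, 4))  # 2.3026
```

The more probability the model puts on the correct digit, the closer the loss gets to 0.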



---

## ✅ What Are **Logits**?

* **Logits** are the **raw, unnormalized output scores** from the last layer of your neural network **before applying softmax**.
* For example, your model might output something like:

  ```python
  [3.2, 1.1, -2.4, 0.7, ..., 4.8]  # 10 values for MNIST
  ```
* These values are called **logits**.

---

## ✅ What Does `from_logits=False` Mean?

When you use:

```python
loss = SparseCategoricalCrossentropy(from_logits=False)
```

You're telling Keras:

> “**My model already applied softmax**, so the output is a **probability distribution**. Don't apply softmax again.”

In your model, the output layer is:

```python
Dense(10, activation="softmax")
```

So the softmax is **already applied** inside the model. Hence, `from_logits=False` is correct.

---

## ❗ What If `from_logits=True` But You Use Softmax in the Model?

That would be a **mistake**.

* If you use `from_logits=True`, the loss function **expects raw logits** (i.e., no softmax).
* But if your model **already applied softmax**, the loss treats those probabilities as logits and applies softmax **again internally** during loss computation.
* This **double softmax** squeezes the outputs toward a uniform distribution, which distorts the loss and weakens the gradients. Training can become very slow or **fail to learn**.

---

### ✅ Correct Pairings:

| Model Output Activation | Loss Setting                  | Explanation                      |
| ----------------------- | ----------------------------- | -------------------------------- |
| `activation="softmax"`  | `from_logits=False` ✅         | Already softmaxed, no reapply.   |
| `activation=None`       | `from_logits=True` ✅          | Raw logits, softmax inside loss. |
| `activation="softmax"`  | `from_logits=True` ❌ (Wrong)  | Double softmax — will break it.  |
| `activation=None`       | `from_logits=False` ❌ (Wrong) | No softmax applied at all.       |
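
The pairings in the table can be checked numerically. This sketch is a simplified model of the arithmetic only, written in plain Python with made-up logits. It does not claim to reproduce what any particular Keras version does internally.

```python
import math

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_from_probs(probs, label):
    # Corresponds to from_logits=False: inputs are already probabilities.
    return -math.log(probs[label])

def loss_from_logits(logits, label):
    # Corresponds to from_logits=True: softmax is applied inside the loss.
    return loss_from_probs(softmax(logits), label)

logits = [3.0, 1.0, -2.0]   # illustrative raw scores for 3 classes
label = 0                   # true class

# Correct pairing 1: softmax in the model, from_logits=False.
softmax_then_plain = loss_from_probs(softmax(logits), label)
# Correct pairing 2: no activation in the model, from_logits=True.
raw_then_logits = loss_from_logits(logits, label)
# Wrong pairing: softmax in the model AND from_logits=True (double softmax).
double_softmax = loss_from_logits(softmax(logits), label)

print(softmax_then_plain, raw_then_logits)  # identical values
print(double_softmax)                       # noticeably larger
```

Both correct pairings give the same loss, while the double-softmax version stays high even though the model is fairly confident in the right class.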

---

### 🔍 Summary:

* **Logits** = raw scores before softmax.
* `from_logits=False` → use this when your model ends in `softmax`.
* `from_logits=True` → use this when your model ends in **no activation** (i.e., raw output).

---




In [14]:
model.summary()

In [15]:
model.fit(x_train,y_train,batch_size=32,epochs=5,verbose=1)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 19s 9ms/step - accuracy: 0.9043 - loss: 0.3177
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 21s 9ms/step - accuracy: 0.9756 - loss: 0.0791
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 20s 9ms/step - accuracy: 0.9828 - loss: 0.0518
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 17s 9ms/step - accuracy: 0.9889 - loss: 0.0352
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 18s 9ms/step - accuracy: 0.9912 - loss: 0.0263


<keras.src.callbacks.history.History at 0x7d1e28730c10>
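
The 1875 in each progress line is the number of batches per epoch. A quick check in plain Python, using the 60000 training images and batch_size=32 from the cells above:

```python
import math

num_train_images = 60000   # size of the MNIST training set
batch_size = 32            # value passed to model.fit

# One step processes one batch; a final partial batch would still count as a step.
steps_per_epoch = math.ceil(num_train_images / batch_size)

print(steps_per_epoch)  # 1875
```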

In [16]:
model.fit(x_train,y_train,batch_size=32,epochs=5,verbose=1)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 19s 10ms/step - accuracy: 0.9915 - loss: 0.0253
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 18s 10ms/step - accuracy: 0.9940 - loss: 0.0196
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 16s 9ms/step - accuracy: 0.9951 - loss: 0.0152
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 22s 9ms/step - accuracy: 0.9948 - loss: 0.0164
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 16s 9ms/step - accuracy: 0.9953 - loss: 0.0139


<keras.src.callbacks.history.History at 0x7d1e2876ac50>

In [17]:
model.predict(x_test[0].reshape(1,-1))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 77ms/step


array([[1.2543754e-17, 5.9871968e-11, 9.8572425e-18, 1.4730708e-13,
        7.1460714e-14, 9.7226511e-18, 4.7434740e-16, 1.0000000e+00,
        9.4799309e-13, 1.3828294e-10]], dtype=float32)

1️⃣ x_test[0]
Picks the first test image from your x_test dataset.

After the flattening step earlier, x_test[0] has shape (784,) instead of the original (28, 28).

2️⃣ .reshape(1, -1)
This reshapes the image from shape (784,) to (1, 784).

Why? Because:

Keras models expect a batch of samples as input, even if it's only 1 sample.

Shape (1, 784) = 1 image, 784 features.

3️⃣ model.predict(...)
This sends the reshaped image through your trained neural network.

The output will be a 1x10 vector like:

[[0.01, 0.02, 0.85, 0.03, ..., 0.01]]
Each number is the predicted probability for digits 0 through 9.

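In this illustrative vector, the most likely digit is ‘2’ because 0.85 is the highest value and it sits at index 2. In the actual output above, the highest value (1.0) is at index 7, so the model predicts a ‘7’.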
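
The batch-dimension step can be checked with NumPy alone. In this sketch, `sample` is a made-up stand-in for x_test[0]. The two alternatives shown are standard NumPy ways to add a leading axis, not something the notebook itself uses.

```python
import numpy as np

# Stand-in for x_test[0]: one flattened image with 784 values.
sample = np.zeros(784, dtype="float32")

# What the notebook does: 1 row, and let NumPy infer the 784 columns.
batch_of_one = sample.reshape(1, -1)

# Equivalent alternatives for adding a leading batch axis.
with_newaxis = sample[np.newaxis, :]
with_expand_dims = np.expand_dims(sample, axis=0)

print(sample.shape)            # (784,)
print(batch_of_one.shape)      # (1, 784)
print(with_newaxis.shape)      # (1, 784)
print(with_expand_dims.shape)  # (1, 784)
```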



In [19]:
prediction=model.predict(x_test[0].reshape(1,-1))
np.argmax(prediction[0])

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step


np.int64(7)

🔷 Step-by-Step Explanation:
1️⃣ x_test[0]
This selects the first image from your test dataset.

Shape: (784,) because MNIST images were flattened from 28×28 to 1D.

2️⃣ .reshape(1, -1)
Reshapes the 1D array to 2D: from (784,) to (1, 784).

The model expects batch input, even for just 1 image.

-1 is a flexible dimension—it means "infer automatically" (in this case, 784 features).

3️⃣ model.predict(...)
Feeds the reshaped image into the model.

Returns a probability distribution over the 10 possible digit classes.

For example:

prediction = model.predict(...)
# prediction might be:
[[0.01, 0.03, 0.85, 0.02, 0.01, 0.01, 0.02, 0.02, 0.02, 0.01]]
This is a 1x10 array:

Each value is the model’s confidence that the input image is digit 0, 1, ..., 9.

In this made-up example, 0.85 at index 2 → the model thinks it's most likely a "2". For the real x_test[0], the cell output above is 7.

4️⃣ prediction[0]
Accesses the first row (only one sample was predicted).

This gives the actual 10-class probability array.

5️⃣ np.argmax(prediction[0])
Finds the index of the highest probability.

That index corresponds to the predicted digit.
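
The last two steps can be replayed without the model by pasting in the probability vector printed by model.predict earlier in this notebook. The variable names `confidence` and `batch_digits` are only for illustration.

```python
import numpy as np

# Output of model.predict(x_test[0].reshape(1, -1)) copied from the cell above.
prediction = np.array(
    [[1.2543754e-17, 5.9871968e-11, 9.8572425e-18, 1.4730708e-13,
      7.1460714e-14, 9.7226511e-18, 4.7434740e-16, 1.0000000e+00,
      9.4799309e-13, 1.3828294e-10]],
    dtype=np.float32,
)

# prediction[0] is the single row; argmax gives the index of its largest value.
predicted_digit = int(np.argmax(prediction[0]))
confidence = float(prediction[0][predicted_digit])

# For a batch with many rows, axis=1 returns one predicted digit per row.
batch_digits = np.argmax(prediction, axis=1)

print(predicted_digit)  # 7
print(confidence)       # 1.0
print(batch_digits)     # [7]
```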