# Neural Network

In [1]:
from keras.datasets import mnist

In [17]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(f"Shape of training images is {train_images.shape} \nShape of testing Images is {test_images.shape}")

Shape of training images is (60000, 28, 28) 
Shape of testing Images is (10000, 28, 28)


There are 60,000 training images (rows). Each one is a 28x28 2D matrix of pixel intensities, so it is a grayscale image of shape 28x28.

In [None]:
print(train_labels[0:5])

[5 0 4 1 9]


In MNIST, each label is simply the digit shown in the image, stored as an integer from 0 to 9.

In [12]:
from keras import models 
from keras import layers 

network = models.Sequential()
network.add(layers.Dense(512, activation="relu", input_shape = (28*28,)))
network.add(layers.Dense(10, activation="softmax"))

#Small note: (28*28) will throw an error here since Python evaluates it to the integer 784, while (28*28,) with the trailing comma is the one-element tuple (784,)
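The tuple-versus-integer point above can be checked in plain Python, without Keras. A minimal sketch; the variable names `as_int` and `as_tuple` are made up for illustration:

```python
# Parentheses alone only group an expression; the trailing comma makes a tuple.
as_int = (28 * 28)
as_tuple = (28 * 28,)

print(type(as_int).__name__, as_int)      # int 784
print(type(as_tuple).__name__, as_tuple)  # tuple (784,)
```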

* Layers are the fundamental building blocks of data processing. Think of them as filters for the data.
* Data goes in in one form and comes out in another, hopefully a more meaningful and useful representation for the problem at hand.
* Simple layers are chained one after the other, implementing a form of progressive **distillation**.
* A model is like a sieve for data processing, made up of a succession of increasingly refined data filters: the layers.

* **Loss function** - Measures how the network performs on the training/eval data so that it can steer itself in the right direction
* **Optimizer** - Updates the weights according to the loss
* **Monitoring metrics** - What to track during training and testing; here it is accuracy
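To make the loss and the metric less abstract, here is a rough NumPy sketch of categorical cross-entropy and accuracy on two made-up predictions. The helper names and the `eps` clipping value are assumptions for illustration; Keras' own implementation may differ in details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an arbitrary small constant (assumption).
    y_pred = np.clip(y_pred, eps, 1.0)
    # Per-sample loss is -sum(true * log(pred)); average over the batch.
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # A sample counts as correct when the most probable class matches the label.
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Two toy samples with 3 classes: one-hot labels and softmax-like predictions.
y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.3, 0.6, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The first sample is predicted correctly and the second is not, so the accuracy is 0.5, while the loss also reflects how confident each prediction was.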

In [13]:
#Compilation

network.compile(optimizer="rmsprop",
                loss="categorical_crossentropy",
                metrics=["accuracy"])

In [18]:
#Reshape and change data type

train_images = train_images.reshape(60000, 28*28)
train_images = train_images.astype("float32")/255.0 #Normalize it too

test_images = test_images.reshape(10000, 28*28)
test_images = test_images.astype("float32")/255.0 

print(f"Shape of training images is {train_images.shape} \nShape of testing Images is {test_images.shape}")

Shape of training images is (60000, 784) 
Shape of testing Images is (10000, 784)


The reshape is nothing fancy: think of it as unpacking each 28 by 28 pixel image into a flat row of 784 pixels. So you have 60000 rows, and each row has 784 columns, or pixel data points.
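The same flatten-and-scale step is easier to see on a tiny array. A sketch using two fake 3x3 "images" instead of MNIST; the values are just a counter so the order is visible:

```python
import numpy as np

# Two fake 3x3 "images" holding the values 0..17.
images = np.arange(2 * 3 * 3, dtype="uint8").reshape(2, 3, 3)

# Unpack each 3x3 image into one flat row of 9 pixels.
flat = images.reshape(2, 3 * 3)

# Cast to float32 and scale by 255, as done for MNIST above.
scaled = flat.astype("float32") / 255.0

print(flat.shape)  # (2, 9)
print(flat[0])     # [0 1 2 3 4 5 6 7 8]
print(scaled.dtype)
```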

In [19]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

#Encode the labels with Keras' built-in categorical (one-hot) encoder
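What the categorical encoding produces can be sketched by hand in NumPy. The `one_hot` helper below is hypothetical and only mimics the idea; it is not how Keras implements `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    # One row per label, one column per class, all zeros to start with.
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    # Put a 1 in the column that matches each label.
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4, 1, 9])  # the first five MNIST labels printed earlier
encoded = one_hot(labels, num_classes=10)

print(encoded.shape)  # (5, 10)
print(encoded[0])     # 1.0 at index 5, zeros elsewhere
```

This is why the output layer has 10 units: each unit lines up with one column of the encoded label.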

In [None]:
network.fit(train_images,
            train_labels,
            epochs=3,
            batch_size=8)

#fit is the built-in Keras method for training models

Epoch 1/3
7500/7500 ━━━━━━━━━━━━━━━━━━━━ 12s 1ms/step - accuracy: 0.9093 - loss: 0.3083
Epoch 2/3
7500/7500 ━━━━━━━━━━━━━━━━━━━━ 12s 2ms/step - accuracy: 0.9726 - loss: 0.1037
Epoch 3/3
7500/7500 ━━━━━━━━━━━━━━━━━━━━ 10s 1ms/step - accuracy: 0.9800 - loss: 0.0810


<keras.src.callbacks.history.History at 0x778a8c8d0c10>
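The 7500 in the progress bar is the number of batches (weight updates) per epoch, which follows from the dataset size and `batch_size`. A quick check of the arithmetic; the batch size of 32 used for evaluation is an assumption (the Keras default when none is passed):

```python
import math

# Training: 60000 samples in batches of 8.
train_steps = math.ceil(60000 / 8)
print(train_steps)  # 7500

# Evaluation: 10000 samples, assuming a batch size of 32.
eval_steps = math.ceil(10000 / 32)
print(eval_steps)  # 313
```

A larger `batch_size` would mean fewer, bigger steps per epoch.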

In [23]:
test_loss, test_accuracy = network.evaluate(test_images, test_labels)
print(f"Test accuracy = {test_accuracy} \nTest Loss {test_loss}")

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9730 - loss: 0.1361
Test accuracy = 0.9763000011444092 
Test Loss 0.11742827296257019


The test accuracy (about 97.6%) is slightly lower than the training accuracy (98.0% in the last epoch). A small gap like this is normal; a larger gap would usually point to overfitting or to some other problem with the training setup or the data.

# Data Representation

* Tensors are the default representation of data in neural networks.
* A tensor is a container for data.
* Tensors generalize matrices to an arbitrary number of dimensions. A dimension is often called an axis.

## Scalar

In [2]:
#Tensor containing only one number
# 0 dimensional

import numpy as np

x = np.array(12)
print(x)
print("Dimensions are ",x.ndim)

12
Dimensions are  0


## Vector

In [None]:
# Array of numbers
# 1 D Tensor
# 1 Axis

x = np.array([1, 2, 3, 4])
print(x)
print("Dimensions are ", x.ndim)

#There are 4 values here, so we call it a 4-dimensional vector. Not to be confused with a 4D tensor, which has 4 axes

[1 2 3 4]
Dimensions are  1


## Matrices

In [9]:
# Array of vectors
# 2D Tensor
# 2 Axes
# commonly referred to as rows and columns
x = np.array([[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]])

print(x)
print("Dimensions are ", x.ndim)
print("Shape is ", x.shape)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
Dimensions are  2
Shape is  (3, 3)


In [10]:
# Array of matrices
# You can keep packing arrays within arrays within arrays
# Each time we do this we increase the number of dimensions
# Higher-dimensional tensors

x = np.array([[[5, 2, 3],
               [6, 7, 8],
               [10, 11, 12]]])

print(x)
print("Dimensions are ",x.ndim)
print("Shape is ",x.shape)

[[[ 5  2  3]
  [ 6  7  8]
  [10 11 12]]]
Dimensions are  3
Shape is  (1, 3, 3)


* **Axes** = Number of axes of the tensor (its `ndim`)
* **Shape** = Tuple of integers describing how many entries the tensor has along each axis
* **Data Type** = Type of data contained within the container (Tensor); uint8, float16, float32

In [16]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(train_images.shape)
print(train_images.ndim)
print(train_images.dtype)

# For the train images tensor
# Number of dimensions (axes) is 3
# Size along each axis is respectively 60000, 28 and 28
# Data type of the tensor is uint8

(60000, 28, 28)
3
uint8


In [17]:
#We can manipulate it like any NumPy array, e.g. by slicing
x = train_images[244:322] #Images 244 to 321, shape (78, 28, 28)
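A few more slicing patterns, sketched on a zero-filled stand-in array so the cell runs without downloading MNIST. The array `images` and its size of 400 are made up for illustration:

```python
import numpy as np

# Stand-in for train_images: 400 blank 28x28 images.
images = np.zeros((400, 28, 28), dtype="uint8")

# Select a range of images along the first axis.
batch = images[244:322]

# The same selection with every axis written out explicitly.
same_batch = images[244:322, :, :]

# Bottom-right 14x14 corner of every image.
corner = images[:, 14:, 14:]

# Centered 14x14 crop, using negative indices relative to the end of the axis.
center = images[:, 7:-7, 7:-7]

print(batch.shape, corner.shape, center.shape)
```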

Real world examples:
* 2D Tensor - Vector Data
* 3D Tensor - Timeseries Data
* 4D Tensor - Images
* 5D Tensor - Video
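The list above can be sketched as concrete array shapes. All sizes here are made up for illustration, and the axis order (samples first, channels last) is one common convention, not the only one:

```python
import numpy as np

# 2D: (samples, features), e.g. 1000 records with 3 attributes each.
vector_data = np.zeros((1000, 3))

# 3D: (samples, timesteps, features), e.g. 250 sequences of 390 steps.
timeseries = np.zeros((250, 390, 3))

# 4D: (samples, height, width, channels), e.g. 32 grayscale 28x28 images.
image_batch = np.zeros((32, 28, 28, 1))

# 5D: (samples, frames, height, width, channels), e.g. 4 clips of 16 RGB frames.
video_batch = np.zeros((4, 16, 28, 28, 3))

for name, t in [("vector", vector_data), ("timeseries", timeseries),
                ("images", image_batch), ("video", video_batch)]:
    print(name, t.ndim, t.shape)
```

The MNIST array loaded earlier is only 3D because the images are grayscale and the single channel axis is left out.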

# Tensor Operations

Every layer of a neural network ultimately boils down to a handful of mathematical operations. Each layer takes in a tensor, performs some operation on it, and returns another tensor. In the end it is all about transforming representations.
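As a rough illustration, a fully connected layer with a relu activation can be written as a dot product, an addition, and an element-wise maximum. This is a sketch with random made-up weights `W` and bias `b`, not the actual Keras implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 4 flattened "images" with 784 values each (random stand-in data).
x = rng.random((4, 784)).astype("float32")

# Hypothetical layer parameters: 784 inputs -> 512 outputs.
W = rng.normal(0.0, 0.01, size=(784, 512)).astype("float32")
b = np.zeros(512, dtype="float32")

# output = relu(dot(x, W) + b)
output = np.maximum(np.dot(x, W) + b, 0.0)

print(output.shape)  # (4, 512)
```

The tensor goes in with shape (4, 784) and comes out with shape (4, 512): a new representation of the same 4 samples.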

## Element Wise operations

In [26]:
# Relu
# Mathematically, relu(x) = max(x, 0)


def naive_relu(x):
    assert len(x.shape) == 2 # X Should be a 2D Tensor

    x = x.copy()
    for i in range(x.shape[0]): #Rows
        for j in range(x.shape[1]): #Columns
            x[i, j] = max(x[i, j], 0)
    return x

In [27]:
x = np.array([[1, 2, -3],
             [4, -5, 6],
             [7, -8, 9]])

naive_relu(x)

array([[1, 2, 0],
       [4, 0, 6],
       [7, 0, 9]])

In [28]:
#Prefer NumPy's built-in operations over hand-written loops like the one above:
# element-wise operations run in optimized compiled code, and matrix products are
# typically delegated to BLAS (Basic Linear Algebra Subprograms)
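For relu, the vectorized NumPy equivalent is a single call to `np.maximum`. A small sketch comparing it against an explicit loop on the same matrix used above:

```python
import numpy as np

x = np.array([[1, 2, -3],
              [4, -5, 6],
              [7, -8, 9]])

# Vectorized relu: element-wise maximum against 0.
vectorized = np.maximum(x, 0)

# The same result with explicit Python loops, for comparison.
looped = x.copy()
for i in range(looped.shape[0]):
    for j in range(looped.shape[1]):
        looped[i, j] = max(looped[i, j], 0)

print(vectorized)
print(np.array_equal(vectorized, looped))  # True
```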

## Broadcasting

When two tensors of different shapes are added and there is no ambiguity, the smaller tensor is broadcast to match the shape of the larger one.

Broadcasting happens in two steps:

* Axes (broadcast axes) are added to the smaller tensor to match the ndim of the larger one
* The smaller tensor is repeated along these new axes to match the full shape of the larger tensor
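Both steps can be spelled out by hand and compared with what NumPy does automatically. A minimal sketch with made-up values (NumPy does not actually copy the data when broadcasting; the repeat here is only conceptual):

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
y = np.array([10, 20, 30])     # shape (3,)

# NumPy broadcasts y across the rows of x automatically.
z = x + y

# Step 1: add a broadcast axis so y has the same ndim as x -> shape (1, 3).
y_expanded = np.expand_dims(y, axis=0)

# Step 2: repeat y along the new axis to match x's full shape -> shape (2, 3).
y_repeated = np.repeat(y_expanded, x.shape[0], axis=0)

z_manual = x + y_repeated

print(z)
print(np.array_equal(z, z_manual))  # True
```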

## Dot Product

* Vector.Vector = Scalar (both vectors must have the same number of elements)
* Matrix.Vector = Vector
* (a, b, c, d).(d,) -> (a, b, c)
* (a, b, c, d).(d, e) -> (a, b, c, e)
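The shape rules above can be checked with `np.dot`. A short sketch; the concrete sizes 2, 3, 4, 5, 6 stand in for a, b, c, d, e and are chosen arbitrarily:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# Vector . Vector -> scalar: 1*4 + 2*5 + 3*6
vv = np.dot(v, w)
print(vv)  # 32.0

# Matrix . Vector -> vector with one entry per matrix row.
m = np.ones((2, 3))
mv = np.dot(m, v)
print(mv.shape)  # (2,)

# Higher-dimensional cases: the last axis of the first tensor is contracted.
t = np.ones((2, 3, 4, 5))
print(np.dot(t, np.ones(5)).shape)       # (2, 3, 4)
print(np.dot(t, np.ones((5, 6))).shape)  # (2, 3, 4, 6)
```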

## Reshaping

Changing the shape of a tensor to match a target shape. The total number of elements stays the same.

In [32]:
x = np.array([[0, 2],
              [1, 4],
              [7,8]])
print(x.shape)

x = x.reshape([6,1])
print(x)

x = x.reshape([2, 3])
print(x)

(3, 2)
[[0]
 [2]
 [1]
 [4]
 [7]
 [8]]
[[0 2 1]
 [4 7 8]]


In [33]:
# Transposing is a special case of reshaping
# Exchanging rows and columns
# x[i, :] -> x[:, i]

x = np.zeros((300, 20))
x = np.transpose(x)
print(x.shape)

(20, 300)
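One detail worth keeping in mind: transposing changes the shape much like reshaping does, but it also changes which element sits where, so the result is generally not the same as calling `reshape` with the swapped dimensions. A small sketch on the 3x2 matrix from the reshaping example:

```python
import numpy as np

x = np.array([[0, 2],
              [1, 4],
              [7, 8]])          # shape (3, 2)

# Transpose: element [i, j] moves to [j, i].
xt = np.transpose(x)
print(xt)
# [[0 1 7]
#  [2 4 8]]

# Reshape to the same target shape keeps the row-by-row reading order instead.
xr = x.reshape(2, 3)
print(xr)
# [[0 2 1]
#  [4 7 8]]

print(np.array_equal(xt, xr))  # False
```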


# Gradient based optimization