<a href="https://colab.research.google.com/github/saffarizadeh/INSY5378/blob/main/Assignments/Assignment_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://kambizsaffari.com/Logo/College_of_Business.cmyk-hz-lg.png" width="500px"/>

# *INSY 5378*

# **The Mathematical Building Blocks of Neural Networks**

Instructor: Dr. Kambiz Saffari

---

**Instructions**
- This assignment covers concepts from **Chapter 2: The Mathematical Building Blocks of Neural Networks**.
- Topics include: tensors, tensor operations, broadcasting, gradient descent, and a complete neural network workflow.
- You may use **NumPy** and **Keras** where indicated.
- Write your answers directly in the provided cells.
- You may add cells to test ideas, but only answers in the marked cells will be graded.


## Question 1: Tensors - Rank, Shape, and Slicing

Understanding how data is represented as tensors is fundamental to deep learning.

**Part A - Creating and Inspecting Tensors**

Using NumPy, create **each** of the following and print its `ndim` (rank), `shape`, and `dtype`:

1. A **scalar** (rank-0 tensor) with the value `7`.
2. A **vector** (rank-1 tensor) with the values `[3, 14, 15, 92, 65]`.
3. A **matrix** (rank-2 tensor) with shape `(3, 4)` filled with zeros.
4. A **rank-3 tensor** with shape `(2, 3, 4)` filled with ones.
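
As a reminder of the three attributes involved, here is a minimal sketch on an unrelated array. The name `example` and its values are made up for illustration and are not one of the four tensors requested above.

```python
import numpy as np

# Hypothetical example array, unrelated to the four tensors requested above.
example = np.array([[1.5, 2.5], [3.5, 4.5], [5.5, 6.5]])

print(example.ndim)   # number of axes (rank)
print(example.shape)  # size along each axis
print(example.dtype)  # element data type
```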

**Part B - Tensor Slicing**

Consider the following rank-3 tensor:

```python
import numpy as np
data = np.arange(60).reshape((3, 4, 5))
```

Without running the code first, **predict** the shape of each slice below, then verify by running it. Write each prediction as a comment above the corresponding line.

1. `data[0]` - What is its shape?
2. `data[:, 1, :]` - What is its shape?
3. `data[1:, :2, 3:]` - What is its shape?
4. `data[:, -1, :]` - What is its shape?
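
If the indexing rules feel unfamiliar, the sketch below may help. It uses a deliberately smaller tensor named `small` (a hypothetical name, not part of the assignment) to show that an integer index removes an axis while a slice keeps it.

```python
import numpy as np

# Hypothetical tensor, deliberately smaller than `data` above.
small = np.arange(24).reshape((2, 3, 4))

# An integer index removes that axis; a slice keeps it (possibly shortened).
print(small[1].shape)           # integer on axis 0
print(small[:, 0:2, :].shape)   # slice on axis 1
print(small[:, -1, 1:3].shape)  # integer on axis 1, slice on axis 2
```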

**Part C - Real-World Tensor Shapes (Comment Only)**

In a comment, describe the tensor shape (rank and what each dimension represents) you would use to represent:
- A batch of 128 grayscale images of size 256×256 (refer to the image data conventions discussed in the chapter).
- A dataset of 250 stock trading days, where each day has 390 one-minute readings with 3 features (current price, high, low).
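
For comparison, here is a sketch of a different kind of data than the two cases above: a small batch of color images. The sizes and the channels-last layout are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical batch: 4 RGB images of 32x32 pixels, channels-last layout.
# Axes: (samples, height, width, color_channels)
rgb_batch = np.zeros((4, 32, 32, 3), dtype=np.uint8)

print(rgb_batch.ndim, rgb_batch.shape)
```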


In [None]:
# Write your answer to Question 1 here

## Question 2: Tensor Operations and Broadcasting

The core computations inside a neural network layer are tensor operations. This question asks you to implement them by hand to build intuition.

**Part A - Naive Element-wise Operations**

Without using NumPy's vectorized operations (no `np.add`, `np.maximum`, `+` on whole arrays, etc.), write the following two functions that operate on **rank-2 tensors** (2D lists or arrays). You may use `for` loops.

1. `naive_add(x, y)` - returns the element-wise sum of two same-shape matrices.
2. `naive_relu(x)` - returns a copy of the matrix where all negative values are replaced with `0`.

Test both functions on the following inputs and print the results:
```python
x = [[1, -2, 3], [-4, 5, -6]]
y = [[10, 20, 30], [40, 50, 60]]
```
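
The loop pattern these functions rely on can be sketched with a different operation. The function `naive_multiply` below is a hypothetical example, not one of the two required functions; it assumes both inputs are same-shape nested lists.

```python
def naive_multiply(x, y):
    """Element-wise product of two same-shape rank-2 nested lists."""
    assert len(x) == len(y), "row counts differ"
    result = []
    for row_x, row_y in zip(x, y):
        assert len(row_x) == len(row_y), "column counts differ"
        new_row = []
        for a, b in zip(row_x, row_y):
            new_row.append(a * b)
        result.append(new_row)
    return result

print(naive_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Building a new nested list, rather than modifying `x` in place, keeps the inputs unchanged.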

**Part B - Broadcasting by Hand**

Without using NumPy broadcasting (no `+` on arrays of different shapes), write a function:

- `naive_add_matrix_and_vector(x, v)` - adds a vector `v` of length `n` to every row of a matrix `x` with shape `(m, n)`.

Test it with:
```python
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [100, 200, 300]
```

Then, verify your result is correct by converting to NumPy arrays and using NumPy's built-in broadcasting (`x + v`). Print both outputs.
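
One way to picture what NumPy does during broadcasting is sketched below with `np.broadcast_to`. The arrays `m` and `row` are hypothetical values chosen for illustration and differ from the test inputs above.

```python
import numpy as np

# Hypothetical values, different from the test inputs above.
m = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)

# Conceptually, broadcasting repeats `row` along a new first axis
# until it matches the shape of `m`; no data is actually copied.
expanded = np.broadcast_to(row, m.shape)  # shape (2, 3)

print(expanded)
print(np.array_equal(m + row, m + expanded))
```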

**Part C - The Dense Layer Computation (Comment Only)**

The core computation of a Dense layer is: `output = relu(matmul(input, W) + b)`

In a comment, answer:
1. Which operation in this expression uses broadcasting?
2. Why is the `relu` activation function necessary? What would happen if a multi-layer network used only Dense layers without any activation function?
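
A numerical experiment can help you think about the second question. The sketch below stacks two layers with no activation between them and compares the result with a single layer built from merged weights. All sizes and random values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 samples, 3 input features, 5 hidden units, 2 outputs.
inputs = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=(5,))
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=(2,))

# Two stacked layers with no activation in between...
two_layers = (inputs @ W1 + b1) @ W2 + b2

# ...compared with one layer that uses merged weights and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_layer = inputs @ W_merged + b_merged

print(np.allclose(two_layers, one_layer))
```

Consider what the printed result implies about the set of functions such a network can represent.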


In [None]:
# Write your answer to Question 2 here

## Question 3: Gradient Descent - Step by Step

Before using frameworks that handle optimization automatically, it is important to understand what gradient descent is actually doing.

**Part A - Manual Gradient Descent on a Simple Function**

Consider the function: `f(x) = (x - 3)² + 1`

Its derivative is: `f'(x) = 2 * (x - 3)`

Starting at `x = 10.0`, with a `learning_rate = 0.1`, perform **20 steps** of gradient descent in an explicit loop, without using any optimizer library. In each step:
1. Compute the gradient: `grad = 2 * (x - 3)`
2. Update x: `x = x - learning_rate * grad`

Print the value of `x` and `f(x)` at each step. In a comment, state:
- What value does `x` converge to?
- Why does it converge there? (Think about where the derivative equals zero.)
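
The structure of such a loop can be sketched on a different function. The function `g(x) = (x + 2)²`, the starting point, and the learning rate below are hypothetical choices for illustration and are not the values required in this part.

```python
def g(x):
    """Hypothetical example function with its minimum at x = -2."""
    return (x + 2) ** 2

def g_prime(x):
    """Derivative of g."""
    return 2 * (x + 2)

x = 5.0
learning_rate = 0.25
for step in range(20):
    grad = g_prime(x)              # slope at the current position
    x = x - learning_rate * grad   # move against the slope
    print(f"step {step + 1:2d}: x = {x:.6f}, g(x) = {g(x):.6f}")
```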

**Part B - Effect of Learning Rate**

Repeat the same experiment from Part A **three times**, using these learning rates:
1. `learning_rate = 0.01` (small)
2. `learning_rate = 0.1`  (moderate)
3. `learning_rate = 1.0`  (large)

For each, print the final value of `x` after 20 steps.

In a comment, explain:
- What happens when the learning rate is too small?
- What happens when the learning rate is too large?
- Why choosing an appropriate learning rate matters for training neural networks.
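
One way to reason about these three runs, offered as a sketch rather than a required answer format, is to substitute the derivative into the update rule. Here $\eta$ stands for `learning_rate` and $x_k$ for the value of `x` after $k$ steps; both symbols are notation introduced for this note.

$$
x_{k+1} = x_k - 2\eta\,(x_k - 3)
\quad\Longrightarrow\quad
x_{k+1} - 3 = (1 - 2\eta)\,(x_k - 3)
\quad\Longrightarrow\quad
x_k - 3 = (1 - 2\eta)^k\,(x_0 - 3)
$$

The distance from 3 is multiplied by $(1 - 2\eta)$ at every step, so the magnitude of that factor determines whether the iterates shrink toward 3, hold their distance, or move away. Evaluating the factor for each learning rate should help you interpret the values you print, including any run where the final `x` on its own looks misleading.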

**Part C - Connecting to Neural Networks (Comment Only)**

In a comment, answer:
1. In a real neural network, what plays the role of `x` (the variable being updated)?
2. What plays the role of `f(x)` (the function being minimized)?
3. Why do we use "mini-batch" gradient descent instead of computing the gradient over the entire dataset at once?
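
To make the idea of a mini-batch concrete, the sketch below splits a dataset into consecutive batches. The dataset `samples` and its size are hypothetical and much smaller than a real training set.

```python
import math
import numpy as np

# Hypothetical dataset: 1,000 samples with 4 features each.
samples = np.arange(4000, dtype=np.float32).reshape((1000, 4))
batch_size = 128

num_batches = math.ceil(len(samples) / batch_size)
batch_sizes = []
for i in range(num_batches):
    batch = samples[i * batch_size : (i + 1) * batch_size]
    batch_sizes.append(len(batch))
    # In training, one gradient update would be computed from `batch` here.

print(num_batches, batch_sizes[-1])
```

Note that the last batch is smaller whenever the batch size does not divide the number of samples evenly.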


In [None]:
# Write your answer to Question 3 here

## Question 4: Putting It All Together - The MNIST Pipeline

This question walks you through the complete neural network workflow from Chapter 2 and asks you to **explain every step**.

**Part A - Load and Explore the Data**

Load the MNIST dataset using Keras:
```python
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
```

Print the following for `train_images`:
- `ndim` (rank)
- `shape`
- `dtype`

In a comment, explain what each dimension of the shape represents.

**Part B - Preprocess the Data**

Apply the two preprocessing steps from the chapter:
1. Reshape the images from `(60000, 28, 28)` to `(60000, 784)`.
2. Convert the data type to `float32` and scale pixel values to the range `[0, 1]`.

Apply the same two steps to `test_images`, which has shape `(10000, 28, 28)`.

In a comment, explain:
- Why do we reshape 28×28 images into flat vectors of length 784?
- Why do we scale pixel values from `[0, 255]` to `[0, 1]`?
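
The two steps can be tried out on a small stand-in array before applying them to MNIST. The array `fake_images` below is randomly generated and hypothetical; it only mimics the `uint8` pixel format.

```python
import numpy as np

# Hypothetical stand-in for image data: 6 fake 28x28 images with uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)

# Flatten each image, then convert to float32 and rescale to [0, 1].
flat = fake_images.reshape((6, 28 * 28))
scaled = flat.astype("float32") / 255

print(scaled.shape, scaled.dtype, scaled.min(), scaled.max())
```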

**Part C - Build, Compile, and Train the Model**

Build the model from the chapter:
```python
import keras
from keras import layers

model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```

Compile it with optimizer `"adam"`, loss `"sparse_categorical_crossentropy"`, and metrics `["accuracy"]`.

Train the model with `model.fit()` for 5 epochs with a batch size of 128.

In a comment, explain:
1. What does the first Dense layer with 512 units and `relu` activation do?
2. What does the second Dense layer with 10 units and `softmax` activation do?
3. What is the role of the loss function during training?
4. During training with `batch_size=128` and 60,000 samples, how many gradient updates happen per epoch?
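
To build intuition for questions 2 and 3, the sketch below computes a softmax and the per-sample sparse categorical crossentropy by hand in NumPy. The scores in `logits`, the 3-class setup, and `true_label` are hypothetical; this is not how Keras is implemented internally, only the same arithmetic for a single sample.

```python
import numpy as np

# Hypothetical raw scores for one sample over 3 classes.
logits = np.array([2.0, 1.0, 0.1])
true_label = 0  # hypothetical integer class label

# Softmax: exponentiate (shifted by the max for numerical stability),
# then normalize so the values sum to 1.
exps = np.exp(logits - logits.max())
probs = exps / exps.sum()

# Sparse categorical crossentropy for one sample:
# negative log of the probability assigned to the true class.
loss = -np.log(probs[true_label])

print(probs, probs.sum(), loss)
```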

**Part D - Evaluate and Interpret**

Evaluate the model on the test set using `model.evaluate()` and print the test accuracy.

Then, use `model.predict()` on the first 5 test images. For each:
- Print the predicted label (using `argmax`).
- Print the actual label.
- Print the model's confidence (probability) for its predicted class.

In a comment, explain:
- Why is the test accuracy lower than the training accuracy? (The chapter gives a specific term for this - use it.)
- In one sentence, what does this term mean?
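
Reading a predicted label and its confidence from a probability array can be sketched as follows. The `predictions` and `actual` arrays are made-up values with 4 classes, standing in for real `model.predict()` output.

```python
import numpy as np

# Hypothetical model output: 2 samples, 4 classes, each row sums to 1.
predictions = np.array([
    [0.05, 0.80, 0.10, 0.05],
    [0.60, 0.10, 0.20, 0.10],
])
actual = np.array([1, 2])  # hypothetical true labels

for probs, label in zip(predictions, actual):
    predicted = int(np.argmax(probs))     # index of the largest probability
    confidence = float(probs[predicted])  # probability of that class
    print(f"predicted={predicted}, actual={label}, confidence={confidence:.2f}")
```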


In [None]:
# Write your answer to Question 4 here

---
**Submission Reminder**
- Make sure your notebook runs from top to bottom without errors.
- Clearly label all answers.
- Include all required comments and explanations - they are part of your grade.
