
# Labib Kamran - 467183 - BSCS13D

# Lab 1: Introduction to Machine Learning



In [8]:
# First Python cell
print("Hello, Machine Learning World!")

Hello, Machine Learning World!


In [9]:
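# Imports and mounting
import numpy as np
import pandas as pd
from google.colab import drive

# Mount Google Drive (not required for the bundled sample data read below)
drive.mount('/content/drive')

# Load the California housing training set that ships with Colab
data = pd.read_csv('/content/sample_data/california_housing_train.csv')

# Add a column of uniform random numbers in [0, 1)
data["Extra Column"] = np.random.rand(len(data))

# Save the modified DataFrame; index=False keeps the row index out of the file
data.to_csv('/content/Sample-Superstore-Modified.csv', index=False)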

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## <a id="why-numpy"></a> Why NumPy?

NumPy provides fast, vectorized array operations that ML workflows rely on for preprocessing, linear algebra, and batching. Libraries such as pandas and scikit-learn are built on top of its arrays.
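
To make "vectorized" concrete, here is a minimal sketch that computes the same quantity twice: once with a Python loop and once with a single array expression. The arrays `y_true` and `y_hat` and the choice of mean squared error are illustrative assumptions, not part of the lab.

```python
import numpy as np

# Hypothetical targets and predictions (made-up values for illustration)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.0, 9.0])

# Pure-Python loop: handles one pair of elements at a time
mse_loop = sum((t - p) ** 2 for t, p in zip(y_true, y_hat)) / len(y_true)

# Vectorized: subtraction, squaring, and the mean each act on whole arrays
mse_vec = np.mean((y_true - y_hat) ** 2)

print("Loop MSE:", mse_loop)
print("Vectorized MSE:", mse_vec)
```

Both lines print the same value; the vectorized form pushes the per-element work into NumPy's compiled routines, which is where the speed advantage on large arrays comes from.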

### Example: BMI (vectorized)


In [10]:
import numpy as np
heights = np.array([1.75, 1.80, 1.65])  # meters
weights = np.array([65, 78, 50])       # kg
bmi = weights / (heights ** 2)
print("BMI:", bmi)

BMI: [21.2244898  24.07407407 18.36547291]


### Example: Z-score normalization


In [11]:
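# Use a new name so the DataFrame loaded earlier as `data` is not overwritten
values = np.array([4.0, 5.0, 6.0, 8.0, 10.0])
# Z-score: subtract the mean, divide by the population std (np.std default, ddof=0)
normalized_data = (values - np.mean(values)) / np.std(values)
print("Z-score normalized:", normalized_data)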

Z-score normalized: [-1.2070197  -0.74278135 -0.27854301  0.64993368  1.57841037]
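
A quick way to sanity-check a z-score transform is to confirm that the result has mean 0 and standard deviation 1. The sketch below does that, and also shows how the result shifts if the sample standard deviation (`ddof=1`, the default in pandas) is used instead of NumPy's population default. The variable names are illustrative.

```python
import numpy as np

values = np.array([4.0, 5.0, 6.0, 8.0, 10.0])

# Population standard deviation (ddof=0), NumPy's default
z_pop = (values - values.mean()) / values.std()

# Sample standard deviation (ddof=1), the default in pandas
z_sample = (values - values.mean()) / values.std(ddof=1)

print("mean of z_pop:", z_pop.mean())   # approximately 0
print("std of z_pop:", z_pop.std())     # approximately 1
print("z_sample:", z_sample)
```

With only five values the two conventions differ noticeably, so it is worth stating which one a pipeline uses.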


### Matrix multiplication (Linear Regression prediction)


In [12]:
X = np.array([[1, 2], [3, 4], [5, 6]])  # 3 samples, 2 features
beta = np.array([0.5, 1.5])             # one weight per feature
b = 0.1                                 # intercept
y_pred = X @ beta + b                   # same as np.dot(X, beta) + b
print("y_pred:", y_pred)

y_pred: [ 3.6  7.6 11.6]
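
The intercept can also be folded into the matrix product by prepending a column of ones to `X`, which is a common way to write linear regression as a single multiplication. A minimal sketch, where `X_aug` and `beta_aug` are names introduced here for illustration:

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])
beta = np.array([0.5, 1.5])
b = 0.1

# Prediction with a separate intercept term
y_pred = X @ beta + b

# Fold the intercept into the weights by prepending a column of ones
X_aug = np.column_stack([np.ones(len(X)), X])   # shape (3, 3)
beta_aug = np.concatenate([[b], beta])          # [b, beta_1, beta_2]
y_pred_aug = X_aug @ beta_aug

print("y_pred:", y_pred)
print("Same result with augmented X:", np.allclose(y_pred, y_pred_aug))
```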


## <a id="key-numpy-concepts"></a> Key NumPy Concepts

### Arrays vs Lists


In [13]:
# Arrays vs Lists
# Element-wise sum with Python lists requires iteration
list_a = [1, 2, 3]
list_b = [4, 5, 6]
result_list = [a + b for a, b in zip(list_a, list_b)]
print("List addition:", result_list)

# With NumPy, operations are vectorized
import numpy as np
array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])
result_np = array_a + array_b
print("NumPy array addition:", result_np)

List addition: [5, 7, 9]
NumPy array addition: [5 7 9]


In [14]:
# Creating NumPy arrays
print("From list:", np.array([1, 2, 3, 4]))
print("Zeros (3x3):\n", np.zeros((3, 3)))
print("Ones (2x4):\n", np.ones((2, 4)))

# Random values (2x3) between 0 and 1
rand_mat = np.random.random((2, 3))
print("Random (2x3):\n", rand_mat)

# Element-wise operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("a + b:", a + b)
print("a * b:", a * b)

# Broadcasting example
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([1, 2, 3])
print("Broadcasted A + B:\n", A + B)

# Reshape and flatten
c = np.array([1, 2, 3, 4, 5, 6])
c_reshaped = c.reshape(2, 3)
print("Reshaped (2x3):\n", c_reshaped)
print("Flattened:", c_reshaped.flatten())

From list: [1 2 3 4]
Zeros (3x3):
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Ones (2x4):
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
Random (2x3):
 [[0.4705466  0.15976708 0.91891765]
 [0.78227536 0.40037965 0.32503768]]
a + b: [5 7 9]
a * b: [ 4 10 18]
Broadcasted A + B:
 [[2 4 6]
 [5 7 9]]
Reshaped (2x3):
 [[1 2 3]
 [4 5 6]]
Flattened: [1 2 3 4 5 6]
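
Broadcasting compares shapes starting from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The sketch below, using made-up arrays `row` and `col`, shows a row vector, a column vector, and a shape that fails to broadcast.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

row = np.array([10, 20, 30])           # shape (3,): added to every row of A
col = np.array([[100], [200]])         # shape (2, 1): added to every column of A

print("A + row:\n", A + row)
print("A + col:\n", A + col)

# (2, 3) and (2,) are incompatible: the trailing dimensions are 3 and 2
try:
    A + np.array([1, 2])
except ValueError as err:
    print("Incompatible shapes:", err)

# reshape(-1) lets NumPy infer the length and yields a 1-D array
print("A.reshape(-1):", A.reshape(-1))
```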


## <a id="hands-on-coding"></a> Hands-On Coding

### Creating and Manipulating Arrays


In [15]:
# 1D and 2D arrays
import numpy as np
array_1d = np.array([10, 20, 30, 40])
print("1D:", array_1d)

array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D:\n", array_2d)

# Basic operations
array_sum = array_1d + 5
print("1D + 5:", array_sum)

# Random matrix and scaling
random_matrix = np.random.random((3, 3))
print("Random (3x3):\n", random_matrix)
random_matrix_scaled = random_matrix * 100
print("Scaled (*100):\n", random_matrix_scaled)

# Reshape and slicing
reshaped_array = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
print("Reshaped (2x3):\n", reshaped_array)

sliced_array = reshaped_array[:2, :2]
print("Sliced (first 2 rows/cols):\n", sliced_array)

1D: [10 20 30 40]
2D:
 [[1 2 3]
 [4 5 6]]
1D + 5: [15 25 35 45]
Random (3x3):
 [[0.5063342  0.08165597 0.63955002]
 [0.11842205 0.78540649 0.49251904]
 [0.30638387 0.95368432 0.60116113]]
Scaled (*100):
 [[50.63342022  8.16559665 63.95500228]
 [11.84220481 78.54064881 49.25190393]
 [30.63838678 95.36843187 60.11611334]]
Reshaped (2x3):
 [[1 2 3]
 [4 5 6]]
Sliced (first 2 rows/cols):
 [[1 2]
 [4 5]]
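
One detail worth knowing about slicing: a basic slice is a view onto the original array, so writing to the slice also changes the source. A small sketch (the names `base`, `view`, and `safe` are illustrative):

```python
import numpy as np

base = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)

view = base[:2, :2]          # basic slicing returns a view
view[0, 0] = 99              # writes through to base
print("base after editing the view:\n", base)

safe = base[:2, :2].copy()   # an explicit copy owns its own data
safe[0, 0] = -1              # base is unaffected
print("base after editing the copy:\n", base)

print("view shares memory with base:", np.shares_memory(view, base))
print("copy shares memory with base:", np.shares_memory(safe, base))
```

Calling `.copy()` is the usual safeguard when a slice will be modified but the original data must stay intact.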


## <a id="mini-challenge"></a> Mini Challenge

Write a function that takes an array of any shape and returns a min-max normalized version (values in [0, 1]). Then test it with a random array. Finally, explain why normalization is critical in ML preprocessing.


In [16]:
import numpy as np

def min_max_normalize(x, axis=None):
    """
    Min-max normalize the array to [0, 1].
    - axis=None normalizes across the whole array (default).
    - axis can be an int/tuple to normalize along those axes.
    The result always has the same shape as the input.
    Constant slices (max == min) are returned as zeros.
    """
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the reduced axes so min/max broadcast against x
    x_min = np.min(x, axis=axis, keepdims=True)
    x_max = np.max(x, axis=axis, keepdims=True)
    denom = x_max - x_min
    # Avoid division by zero for constant slices
    denom = np.where(denom == 0, 1, denom)
    return (x - x_min) / denom

# Test with random array
np.random.seed(42)
test = np.random.randn(3, 4)
print("Original:\n", test)
print("Min-max normalized (global):\n", min_max_normalize(test))
print("Min-max normalized (per-row):\n", min_max_normalize(test, axis=1))

Original:
 [[ 0.49671415 -0.1382643   0.64768854  1.52302986]
 [-0.23415337 -0.23413696  1.57921282  0.76743473]
 [-0.46947439  0.54256004 -0.46341769 -0.46572975]]
Min-max normalized (global):
 [[0.4716135  0.16166943 0.54530673 0.97257612]
 [0.1148643  0.11487231 1.         0.60375694]
 [0.         0.49399168 0.00295638 0.00182782]]
Min-max normalized (per-row):
 [[3.82219158e-01 0.00000000e+00 4.73096733e-01 1.00000000e+00]
 [0.00000000e+00 9.05375552e-06 1.00000000e+00 5.52336373e-01]
 [0.00000000e+00 1.00000000e+00 5.98467102e-03 3.70010373e-03]]
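
When rows are samples and columns are features, scaling is usually done per column (`axis=0`) so that each feature gets its own minimum and maximum. The following standalone sketch uses a hypothetical helper `min_max_scale` and made-up feature values, including one constant column to show the zero-range case.

```python
import numpy as np

def min_max_scale(x, axis=None):
    """Min-max scale to [0, 1]; constant slices map to 0."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=axis, keepdims=True)
    span = x.max(axis=axis, keepdims=True) - x_min
    span = np.where(span == 0, 1.0, span)   # avoid division by zero
    return (x - x_min) / span

# Hypothetical feature matrix: rows are samples, columns are features
features = np.array([
    [1.0, 200.0, 7.0],
    [2.0, 400.0, 7.0],
    [3.0, 800.0, 7.0],
])

scaled = min_max_scale(features, axis=0)   # one min/max per column
print(scaled)
```

The third column is constant, so its range is zero and every entry is mapped to 0 rather than producing a division error.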


> Why normalization is critical in ML preprocessing:
>
> - It stabilizes and speeds up gradient-descent training by keeping features on similar scales.
> - It prevents large-scale features from dominating distance-based models (kNN, clustering) and regularization penalties.
> - It improves numerical stability in matrix operations.
> - Applying the same scaling to training and test data keeps model comparisons fair and results reproducible.
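
The point about distance-based models can be seen in a small sketch. The data below is invented for illustration (age in years, income in dollars): on the raw values, income dominates the Euclidean distance, so the nearest neighbour of the first person changes once both columns are scaled to [0, 1].

```python
import numpy as np

# Hypothetical samples: [age in years, income in dollars]
people = np.array([
    [25.0, 50_000.0],
    [60.0, 51_000.0],
    [26.0, 54_000.0],
    [45.0, 90_000.0],
])

def distances_from_first(x):
    """Euclidean distance from row 0 to every row."""
    return np.linalg.norm(x - x[0], axis=1)

raw_dist = distances_from_first(people)

# Min-max scale each column to [0, 1] before measuring distance
col_min = people.min(axis=0)
col_range = people.max(axis=0) - col_min
scaled_dist = distances_from_first((people - col_min) / col_range)

# Index 0 of argsort is the person themselves (distance 0), so take index 1
print("Nearest to person 0 (raw):   ", np.argsort(raw_dist)[1])
print("Nearest to person 0 (scaled):", np.argsort(scaled_dist)[1])
```

On the raw data the 60-year-old with a similar income is closest; after scaling, the 26-year-old is, because the age difference is no longer drowned out by the income scale.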
