
# 🧑‍💻 NumPy Complete Guided Project
**Instructor / Student Colab Notebook** – covers *all* key concepts from `Numpy‑1` to `Numpy‑5`.

*Generated: 08 Aug 2025*


**Table of Contents**

1. [Setup](#setup)  
2. [Array Creation & Dtypes](#creation)  
3. [Array Attributes & Inspection](#attributes)  
4. [Indexing, Slicing & Fancy Indexing](#indexing)  
5. [Reshaping, Transpose & Copies vs Views](#reshape)  
6. [Joining, Splitting, Set & Sorting Ops](#join)  
7. [Arithmetic Ops & Universal Functions](#arithmetic)  
8. [Broadcasting (Rules + Examples)](#broadcast)  
9. [Statistics & Aggregations](#stats)  
10. [Random Numbers & Reproducibility](#random)  
11. [Structured / Record Arrays](#structured)  
12. [Linear Algebra Essentials](#linalg)  
13. [File I/O (`npy`, `npz`, `txt`)](#io)  
14. [Datetime64 & Timedelta64](#datetime)  
15. [Masked Arrays & NaNs](#mask)  
16. [Mini‑Project — Fitness Data Analysis](#project)  
17. [Conclusion & Further Practice](#conclusion)  


## <a name='setup'></a>1️⃣ Setup

In [8]:
import numpy as np  # the only module the NumPy cells below rely on
print('NumPy version:', np.__version__)

NumPy version: 2.0.2


## <a name='creation'></a>2️⃣ Array Creation & Dtypes

Key functions: `np.array`, `np.arange`, `np.linspace`, `np.zeros`, `np.ones`, `np.full`, `np.eye`, `np.identity`, `np.diag`, `np.empty`

In [9]:
# EXAMPLE
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.linspace(0, 1, 6)
arr3 = np.full((2,3), 7.5)
print(arr1, arr2, arr3, sep="\n")
print("dtypes:", arr1.dtype, arr2.dtype)


[1 2 3]
[0.  0.2 0.4 0.6 0.8 1. ]
[[7.5 7.5 7.5]
 [7.5 7.5 7.5]]
dtypes: int32 float64
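
The example above covers `np.array`, `np.linspace` and `np.full`. The sketch below touches the remaining functions from the list; all variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Evenly spaced integers: start inclusive, stop exclusive, step 2.
evens = np.arange(0, 10, 2)

# Constant-filled arrays of a given shape.
zeros = np.zeros((2, 3))
ones = np.ones((2, 3), dtype=np.int64)

# `eye` can shift the diagonal with k; `identity` always uses the main diagonal.
upper = np.eye(3, k=1)
ident = np.identity(3)

# `diag` builds a diagonal matrix from 1-D input and extracts the diagonal from 2-D input.
d = np.diag([1, 2, 3])
extracted = np.diag(d)

# `empty` allocates without initialising, so contents are arbitrary until filled.
scratch = np.empty((2, 2))
scratch[:] = 0.0

print(evens, upper, d, extracted, sep="\n")
```

Because `np.empty` skips initialisation, it is only worth using when every element is overwritten before being read.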


In [10]:
# 🖊️ TODO: create a 10×10 chessboard pattern using zeros & ones
chess_board = np.zeros((10,10), dtype=int)
chess_board[1::2, ::2] = 1
chess_board[::2, 1::2] = 1
print(chess_board)

[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]]


## <a name='attributes'></a>3️⃣ Array Attributes & Inspection

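Key attributes: `shape` (length of each axis), `ndim` (number of axes), `size` (total element count), `dtype` (element type), `itemsize` (bytes per element), `nbytes` (`size × itemsize`)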

In [11]:
M = np.arange(12).reshape(3,4)
print('shape', M.shape, 'ndim', M.ndim, 'size', M.size, 'itemsize', M.itemsize, 'total bytes', M.nbytes)


shape (3, 4) ndim 2 size 12 itemsize 8 total bytes 96


In [12]:
# 🖊️ TODO: check memory footprint of a 1000×1000 float64 array
arr_large = np.zeros((1000, 1000), dtype=np.float64)
print("Memory footprint of 1000x1000 float64 array:", arr_large.nbytes, "bytes")

Memory footprint of 1000x1000 float64 array: 8000000 bytes


## <a name='indexing'></a>4️⃣ Indexing, Slicing & Fancy Indexing

In [13]:
a = np.arange(1,26).reshape(5,5)
print(a[:, 0])     # first column
print(a[::2, ::2]) # every 2nd row/col
mask = (a % 3 == 0)
print('multiples of 3:', a[mask])


[ 1  6 11 16 21]
[[ 1  3  5]
 [11 13 15]
 [21 23 25]]
multiples of 3: [ 3  6  9 12 15 18 21 24]
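
The cell above shows slicing and boolean masks. As a complement, here is a small sketch of integer-array (fancy) indexing; `grid` is a hypothetical stand-in with the same contents as `a`, so the notebook's own variable is left untouched.

```python
import numpy as np

grid = np.arange(1, 26).reshape(5, 5)  # hypothetical stand-in for `a`

# A list of row indices selects rows in that order; the result is a copy.
picked_rows = grid[[4, 0, 2]]

# Paired row/column index lists pick single elements: (0,0), (1,1), (2,2).
diag_vals = grid[[0, 1, 2], [0, 1, 2]]

# np.ix_ builds an open mesh, so rows [0, 4] x columns [1, 3] gives a 2x2 block.
block = grid[np.ix_([0, 4], [1, 3])]

picked_rows[0, 0] = -1  # modifies the copy only, `grid` keeps its value
print(picked_rows, diag_vals, block, sep="\n")
```

Unlike basic slices, fancy indexing always returns a copy, which is why writing into `picked_rows` leaves `grid` unchanged.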


In [14]:
# 🖊️ TODO: use fancy indexing to swap first and last rows of `a`
a[[0, -1], :] = a[[-1, 0], :]
print(a)

[[21 22 23 24 25]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [ 1  2  3  4  5]]


## <a name='reshape'></a>5️⃣ Reshaping, Transpose & Copies vs Views

In [15]:
b = np.arange(8)
B = b.reshape(2,4)
B[0,0] = 99
print('b is modified:', b)
C = b.reshape(2,4).copy()
C[0,0] = -1
print('b unchanged with copy:', b)


b is modified: [99  1  2  3  4  5  6  7]
b unchanged with copy: [99  1  2  3  4  5  6  7]
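
Rather than modifying an element to find out whether two arrays share data, it can be checked directly. The sketch below uses `np.shares_memory` and the `.base` attribute; the names `base`, `view` and `copy` are illustrative only.

```python
import numpy as np

base = np.arange(8)
view = base.reshape(2, 4)          # reshaping a contiguous array returns a view
copy = base.reshape(2, 4).copy()   # an explicit copy owns its own buffer

shares_view = np.shares_memory(base, view)
shares_copy = np.shares_memory(base, copy)

print("view shares memory:", shares_view)
print("copy shares memory:", shares_copy)
# A view records the array it was derived from; an owning array has base None.
print(view.base is base, copy.base is None)
```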


In [16]:
# 🖊️ TODO: Flatten a 3‑D array into 1‑D using both `ravel` and `flatten`; observe copy vs view.
arr_3d = np.arange(24).reshape(2, 3, 4)
print("Original 3D array:\n", arr_3d)

# Using ravel
ravelled_arr = arr_3d.ravel()
ravelled_arr[0] = -1 # Modify the raveled array
print("\nRavelled array (modified):\n", ravelled_arr)
print("Original 3D array after modifying raveled array:\n", arr_3d) # Check if original is modified

# Using flatten
flattened_arr = arr_3d.flatten()
flattened_arr[0] = 99 # Modify the flattened array
print("\nFlattened array (modified):\n", flattened_arr)
print("Original 3D array after modifying flattened array:\n", arr_3d) # Check if original is modified

Original 3D array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Ravelled array (modified):
 [-1  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Original 3D array after modifying raveled array:
 [[[-1  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Flattened array (modified):
 [99  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Original 3D array after modifying flattened array:
 [[[-1  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


## <a name='join'></a>6️⃣ Joining, Splitting, Set & Sorting Ops

In [17]:
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print('union', np.union1d(x,y))
print('intersect', np.intersect1d(xy,[1,2,10]))
print('sorted descending', np.sort(xy)[::-1])


union [1 2 3 4 5 6]
intersect [1 2]
sorted descending [6 5 4 3 2 1]


In [21]:
# 🖊️ TODO: split `xy` back into two equal halves using `np.array_split`
x_split, y_split = np.array_split(xy, 2)
print('split:', x_split, y_split)

split: [1 3 5] [2 4 6]
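
The cells above use `np.concatenate`, `np.union1d`, `np.intersect1d`, `np.sort` and `np.array_split`. A few related operations from the same family are sketched below, with hypothetical arrays `p`, `q` and `scores`.

```python
import numpy as np

p = np.array([1, 3, 5])
q = np.array([2, 4, 6])

# Stack 1-D arrays as rows or as columns of a 2-D array.
stacked_rows = np.vstack([p, q])        # shape (2, 3)
stacked_cols = np.column_stack([p, q])  # shape (3, 2)

# np.unique returns sorted distinct values; return_counts adds their frequencies.
values, counts = np.unique(np.array([3, 1, 3, 2, 1, 3]), return_counts=True)

# argsort returns the indices that would sort the array.
scores = np.array([40, 10, 30])
order = np.argsort(scores)

# setdiff1d keeps the elements of the first array that are absent from the second.
only_p = np.setdiff1d(p, [1, 2, 3])

print(stacked_rows, stacked_cols, values, counts, order, only_p, sep="\n")
```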


## <a name='arithmetic'></a>7️⃣ Arithmetic Ops & Universal Functions

In [22]:
v = np.arange(5)
print('exp', np.exp(v))
print('sin', np.sin(v))
print('vectorised addition', v + 10)


exp [ 1.          2.71828183  7.3890561  20.08553692 54.59815003]
sin [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
vectorised addition [10 11 12 13 14]


In [24]:
# 🖊️ TODO: given degrees [0,30,45,60,90], compute radians and sin values.
degrees = np.array([0, 30, 45, 60, 90])
radians = np.deg2rad(degrees)
sin_values = np.sin(radians)
print("Degrees:", degrees)
print("Radians:", radians)
print("Sin values:", sin_values)

Degrees: [ 0 30 45 60 90]
Radians: [0.         0.52359878 0.78539816 1.04719755 1.57079633]
Sin values: [0.         0.5        0.70710678 0.8660254  1.        ]


## <a name='broadcast'></a>8️⃣ Broadcasting Rules

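Rules: align the shapes on their rightmost dimension and compare leftward; two dimensions are compatible when they are equal or one of them is 1; a size‑1 (or missing) dimension is stretched to match the other; any other mismatch raises a `ValueError`.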

In [25]:
row = np.arange(5)
col = np.arange(3).reshape(3,1)
matrix = row + col  # broadcast to 3×5
print(matrix)


[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]]
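
The broadcasting rules can also be checked without building any arrays. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later) and then triggers the mismatch error on purpose; the shapes are arbitrary examples.

```python
import numpy as np

# Shapes are aligned on the right; a missing or size-1 dimension is stretched.
shape_a = np.broadcast_shapes((3, 1), (5,))       # (3, 5)
shape_b = np.broadcast_shapes((4, 1, 2), (3, 1))  # (4, 3, 2)
print(shape_a, shape_b)

# Trailing dimensions 3 and 4 differ and neither is 1, so the addition fails.
try:
    np.ones((2, 3)) + np.ones(4)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("broadcast error:", exc)
```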


In [33]:
# 🖊️ TODO: use broadcasting to create a 10×10 multiplication table.
table = np.arange(1,11).reshape(10,1) * np.arange(1, 11)
print(table)

[[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]


## <a name='stats'></a>9️⃣ Statistics & Aggregations

In [34]:
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
print('data\n', data)
print('row sums', data.sum(axis=1))
print('col means', data.mean(axis=0))


data
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
row sums [227  46 255 281 266]
col means [49.4 53.  55.4 57.2]
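
Aggregations along an axis combine naturally with broadcasting. As one possible illustration, the sketch below standardises each column of a small hypothetical array `sample`, using `keepdims=True` so the results broadcast back against the original shape.

```python
import numpy as np

sample = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# keepdims=True keeps the reduced axis with length 1, giving shape (1, 3).
col_mean = sample.mean(axis=0, keepdims=True)
col_std = sample.std(axis=0, keepdims=True)   # population std (ddof=0)
z = (sample - col_mean) / col_std

# argmax without an axis returns an index into the flattened array.
flat_argmax = sample.argmax()
row_medians = np.median(sample, axis=1)

print(z, flat_argmax, row_medians, sep="\n")
```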


In [35]:
# 🖊️ TODO: compute `np.percentile` (25th, 50th, 75th) of flattened `data`.
percentiles = np.percentile(data.flatten(), [25, 50, 75])
print("25th, 50th, and 75th percentiles of flattened data:", percentiles)

25th, 50th, and 75th percentiles of flattened data: [30.  58.5 75. ]


## <a name='random'></a>🔟 Random Numbers & Reproducibility

In [41]:
rng = np.random.default_rng(42)
rand_floats = rng.random(5)
rand_ints = rng.integers(low=10, high=50, size=5)
print(rand_floats, rand_ints)
rng2 = np.random.default_rng(42)
assert np.allclose(rand_floats, rng2.random(5))


[0.77395605 0.43887844 0.85859792 0.69736803 0.09417735] [31 49 39 40 38]
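
The same seeding idea extends to other `Generator` methods. The sketch below wraps a few draws in a hypothetical helper `draw(seed)` and checks that two generators with the same seed agree; the seed value 123 is arbitrary.

```python
import numpy as np

def draw(seed):
    """Return a few draws from a fresh generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return {
        "normal": rng.normal(loc=0.0, scale=1.0, size=3),
        "choice": rng.choice(["red", "green", "blue"], size=4),
        "perm": rng.permutation(5),
    }

first, second = draw(123), draw(123)
same = all(np.array_equal(first[k], second[k]) for k in first)
print("identical draws:", same)
```

Creating the generator inside the function keeps each call independent of any draws made elsewhere in the notebook.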


In [42]:
# 🖊️ TODO: simulate rolling a fair six‑sided die 100 times; estimate proportion of 6s.
die_rolls = rng.integers(1, 7, size=100)
proportion_of_sixes = np.sum(die_rolls == 6) / len(die_rolls)
print("Proportion of 6s:", proportion_of_sixes)

Proportion of 6s: 0.12


## <a name='structured'></a>1️⃣1️⃣ Structured / Record Arrays

In [44]:
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
print(people['name'], people['age'].mean())


['Alice' 'Bob'] 27.5
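
Fields of a structured array can also drive sorting and filtering. The following sketch uses a hypothetical array `staff` with the same dtype layout as `people`.

```python
import numpy as np

staff = np.array([('Cara', 41, 62.0), ('Dev', 29, 78.5), ('Eli', 35, 70.0)],
                 dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Sort whole records by one field.
by_age = np.sort(staff, order='age')

# A boolean mask built from one field filters whole records.
over_30 = staff[staff['age'] > 30]

print(by_age['name'], over_30['name'])
```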


In [45]:
# 🖊️ TODO: add a new field 'height' to the structured array using `np.lib.recfunctions.append_fields` (hint: no pip install needed; `recfunctions` ships with NumPy).
from numpy.lib.recfunctions import append_fields
heights = np.array([1.65, 1.80], dtype='f4')
people_with_height = append_fields(people, 'height', heights, usemask=False)
print(people_with_height)

[('Alice', 25, 55. , 1.65) ('Bob', 30, 85.5, 1.8 )]


## <a name='linalg'></a>1️⃣2️⃣ Linear Algebra Essentials

In [46]:
A = np.random.random((3,3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))


A·x ≈ b? True


In [47]:
# 🖊️ TODO: compute eigenvalues of `A` using `np.linalg.eig`.
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues of A:", eigenvalues)
print("Eigenvectors of A:\n", eigenvectors)

Eigenvalues of A: [ 1.81994565 -0.24667861  0.37791831]
Eigenvectors of A:
 [[-0.80857375 -0.77400386  0.40943307]
 [-0.45103114  0.38061824 -0.76518389]
 [-0.37786161  0.50601164  0.49684824]]
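
One way to sanity-check the output of `np.linalg.eig` is to verify the defining relation on a matrix whose eigenvalues are known. The sketch below uses a hypothetical symmetric matrix `S` rather than the random `A` above.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # hypothetical symmetric matrix with eigenvalues 1 and 3

w, V = np.linalg.eig(S)

# Column V[:, i] pairs with w[i], so S @ V equals V with each column scaled by w.
relation_holds = np.allclose(S @ V, V * w)

# For symmetric input, eigh returns real eigenvalues in ascending order.
w_sym, _ = np.linalg.eigh(S)

# The determinant equals the product of the eigenvalues.
det_matches = np.isclose(np.linalg.det(S), w.prod())

print(relation_holds, w_sym, det_matches)
```

`np.linalg.eig` does not guarantee any ordering of the eigenvalues, so compare sorted values or use `eigh` when the matrix is symmetric.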


## <a name='io'></a>1️⃣3️⃣ File I/O (`npy`, `npz`, `txt`)

In [48]:
np.save('array.npy', A)
loaded = np.load('array.npy')
print('loaded equals A?', np.allclose(loaded, A))
np.savez('multi_arrays.npz', A=A, b=b)


loaded equals A? True
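
The cell above writes a `.npz` archive but does not read it back. A minimal sketch of the round trip follows; the file name `bundle.npz` and the arrays `M` and `v` are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile
import numpy as np

M = np.arange(6.0).reshape(2, 3)
v = np.array([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bundle.npz")
    np.savez(path, M=M, v=v)  # keyword names become the keys in the archive

    # np.load on an .npz file returns a dict-like archive; arrays load on access.
    with np.load(path) as archive:
        names = sorted(archive.files)
        M_back = archive["M"]
        v_back = archive["v"]

print(names, np.array_equal(M_back, M), np.array_equal(v_back, v))
```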


In [51]:
# 🖊️ TODO: Use `np.savetxt` to write `data` (from stats section) to CSV then reload with `np.loadtxt`.
np.savetxt('data.csv', data, delimiter=',')
loaded_data = np.loadtxt('data.csv', delimiter=',')
print('Loaded data equals original data?', np.allclose(loaded_data, data))

Loaded data equals original data? True


## <a name='datetime'></a>1️⃣4️⃣ Datetime64 & Timedelta64

In [59]:
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
delta = dates[1:] - dates[:-1]
print(dates[:5], delta[0])


['2023-01-01' '2023-01-02' '2023-01-03' '2023-01-04' '2023-01-05'] 1 days


In [60]:
# 🖊️ TODO: find how many Mondays appear in `dates` array.
# `np.busday_count(begin, end)` excludes `end`, so test every date in the array instead.
mondays = np.is_busday(dates, weekmask='Mon').sum()
print("Number of Mondays in the dates array:", mondays)

Number of Mondays in the dates array: 13
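
Weekdays can also be derived with plain integer arithmetic, which helps when every day of the week is needed rather than a single one. The sketch below rebuilds the same date range under the hypothetical name `days` and relies on 1970-01-01 having been a Thursday.

```python
import numpy as np

days = np.arange('2023-01', '2023-04', dtype='datetime64[D]')

# Casting datetime64[D] to int64 gives days since 1970-01-01 (a Thursday).
# Adding 3 before taking the remainder makes Monday map to 0 and Sunday to 6.
weekday = (days.astype('int64') + 3) % 7

monday_count = int((weekday == 0).sum())
first_monday = days[weekday == 0][0]

print(monday_count, first_monday)
```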


## <a name='mask'></a>1️⃣5️⃣ Masked Arrays & NaNs

In [61]:
arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)
print(masked.mean())


2.3333333333333335


In [62]:
# 🖊️ TODO: replace NaNs with column means in a 2‑D array containing NaNs.
arr_2d_nan = np.array([[1, 2, np.nan, 4],
                       [5, np.nan, 7, 8],
                       [9, 10, 11, np.nan],
                       [13, 14, 15, 16]])
col_means = np.nanmean(arr_2d_nan, axis=0)
nan_indices = np.isnan(arr_2d_nan)
arr_2d_nan[nan_indices] = np.take(col_means, np.where(nan_indices)[1])
print("Array with NaNs replaced by column means:\n", arr_2d_nan)

Array with NaNs replaced by column means:
 [[ 1.          2.         11.          4.        ]
 [ 5.          8.66666667  7.          8.        ]
 [ 9.         10.         11.          9.33333333]
 [13.         14.         15.         16.        ]]
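
An alternative to indexed assignment is `np.where`, which returns a new array and lets broadcasting line the column means up with each row. This is a sketch on a small hypothetical array `grid`, not a replacement for the solution above.

```python
import numpy as np

grid = np.array([[1.0, np.nan],
                 [3.0, 4.0],
                 [np.nan, 8.0]])

# nanmean ignores NaNs when averaging each column.
col_means = np.nanmean(grid, axis=0)

# col_means has shape (2,) and broadcasts across the rows of the (3, 2) array.
filled = np.where(np.isnan(grid), col_means, grid)

print(filled)
```

Because `np.where` builds a new array, `grid` keeps its NaNs, which is useful when the original data must stay intact.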


## <a name='project'></a>1️⃣6️⃣ Mini‑Project: Fitness Data Analysis

Load `fitness.txt` (tab‑separated) then follow prompts.

In [64]:
fitness = np.genfromtxt('fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)
print('columns:', fitness.dtype.names, 'rows:', len(fitness))


columns: ('date', 'step_count', 'mood', 'calories_burned', 'hours_of_sleep', 'bool_of_active', 'weight_kg') rows: 96


In [65]:
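# 🖊️ TODO: Monthly step count, sleep vs mood correlation, weekly summary, etc.
import pandas as pd
# Reload with delimiter=None so any run of whitespace (tabs included) separates fields.
fitness = np.genfromtxt('fitness.txt', delimiter=None, dtype=None, encoding=None, names=True)
# Drop any leading '#' left on the header names.
cols = [c.lstrip('#') for c in fitness.dtype.names]
fitness.dtype.names = tuple(cols)
df = pd.DataFrame(fitness)
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df['month'] = df['date'].dt.to_period('M')
# ISO week number only: the same week number from different years would be grouped together.
df['week'] = df['date'].dt.isocalendar().week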
monthly_steps = df.groupby('month')['step_count'].sum()
sleep_mood_corr = df['hours_of_sleep'].corr(df['mood'])
weekly_summary = df.groupby('week').agg({
    'step_count': 'sum',
    'hours_of_sleep': 'mean',
    'mood': 'mean',
    'calories_burned': 'sum'
})
print('Monthly Step Count:\n', monthly_steps)
print('\nSleep vs Mood Correlation:', sleep_mood_corr)
print('\nWeekly Summary:\n', weekly_summary)

Monthly Step Count:
 month
2017-10     79051
2017-11    103071
2017-12     89565
2018-01     10163
Freq: M, Name: step_count, dtype: int64

Sleep vs Mood Correlation: 0.2104166644730015

Weekly Summary:
       step_count  hours_of_sleep        mood  calories_burned
week                                                         
1           5833        3.714286  171.428571              168
2           4330        5.000000  250.000000                0
40         11530        6.000000  133.333333              378
41         23810        5.571429  114.285714              765
42         20669        6.428571  128.571429              657
43         16283        5.571429  214.285714              521
44         20598        5.142857  285.714286              665
45         31217        4.714286  285.714286             1012
46         17714        5.571429  285.714286              589
47         27859        5.714286  300.000000              917
48         18701        6.428571  257.142857        

## <a name='conclusion'></a>1️⃣7️⃣ Conclusion & Further Practice
Congrats on covering **all core NumPy topics** from your five lecture notebooks!

*Keep experimenting, read the official docs, and try converting your NumPy pipelines into Pandas or JAX for more fun.*