<a href="https://colab.research.google.com/github/karthikeya-2005/Machine-Learning-lab/blob/main/Lab1_AP23110011400.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧑‍💻 NumPy Complete Guided Project
**Instructor / Student Colab Notebook** – covers *all* key concepts from `Numpy‑1` to `Numpy‑5`.

*Generated: 08 Aug 2025*


**Table of Contents**

1. [Setup](#setup)  
2. [Array Creation & Dtypes](#creation)  
3. [Array Attributes & Inspection](#attributes)  
4. [Indexing, Slicing, Fancy Indexing](#indexing)  
5. [Reshaping, Transpose & Copies vs Views](#reshape)  
6. [Joining, Splitting, Set & Sorting Ops](#join)  
7. [Arithmetic Ops, Universal Functions](#arithmetic)  
8. [Broadcasting (Rules + Examples)](#broadcast)  
9. [Statistics & Aggregations](#stats)  
10. [Random Numbers & Reproducibility](#random)  
11. [Structured / Record Arrays](#structured)  
12. [Linear Algebra Essentials](#linalg)  
13. [File I/O (`npy`, `npz`, `txt`)](#io)  
14. [Datetime64 & Timedelta64](#datetime)  
15. [Masked Arrays & NaNs](#mask)  
16. [Mini‑Project — Fitness Data Analysis](#project)  
17. [Conclusion & Further Practice](#conclusion)  


## <a name='setup'></a>1️⃣ Setup

In [None]:
import numpy as np  # the only import the NumPy cells below rely on
print('NumPy version:', np.__version__)

NumPy version: 2.0.2


## <a name='creation'></a>2️⃣ Array Creation & Dtypes

Key functions: `np.array`, `np.arange`, `np.linspace`, `zeros`, `ones`, `full`, `eye`, `identity`, `diag`, `empty`

In [None]:
# EXAMPLE
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.linspace(0, 1, 6)
arr3 = np.full((2,3), 7.5)
print(arr1, arr2, arr3, sep="\n")
print("dtypes:", arr1.dtype, arr2.dtype)
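The key-functions list also names `eye`, `identity`, `diag`, `arange`, and `empty`, which the example cell does not exercise. A minimal sketch of those calls follows; the variable names are illustrative only.

```python
import numpy as np

I3 = np.identity(3)            # 3×3 identity matrix, float64 by default
shifted = np.eye(3, k=1)       # ones on the first superdiagonal instead of the main one
D = np.diag([1, 2, 3])         # 1-D input: build a diagonal matrix
diag_back = np.diag(D)         # 2-D input: extract the diagonal
steps = np.arange(0, 10, 3)    # start, stop (exclusive), step
scratch = np.empty((2, 2))     # uninitialised memory: contents are arbitrary

print(I3, shifted, D, diag_back, steps, sep="\n")
print("empty shape:", scratch.shape)
```

`np.empty` skips initialisation, so it is best reserved for buffers that are fully overwritten before being read.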


In [None]:
# 🖊️ TODO: create a 10×10 chessboard pattern using zeros & ones

chessboard = np.zeros((10, 10), dtype=int)
chessboard[1::2, ::2] = 1
chessboard[::2, 1::2] = 1

print(chessboard)



[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]]


## <a name='attributes'></a>3️⃣ Array Attributes & Inspection

`shape`, `ndim`, `size`, `dtype`, `itemsize`, `nbytes`

In [None]:
M = np.arange(12).reshape(3,4)
print('shape', M.shape, 'ndim', M.ndim, 'size', M.size, 'itemsize', M.itemsize, 'total bytes', M.nbytes)


In [None]:
# 🖊️ TODO: check memory footprint of a 1000×1000 float64 array

arr = np.zeros((1000, 1000), dtype=np.float64)
print(arr.nbytes, "bytes")
print(arr.nbytes / (1024**2), "MB")


8000000 bytes
7.62939453125 MB
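Since `nbytes` is `size * itemsize`, the footprint depends directly on the dtype. The sketch below compares the same 1000×1000 array stored as `float64` and `float32`, and prints `strides` (the byte step along each axis), an attribute not listed above.

```python
import numpy as np

arr64 = np.zeros((1000, 1000), dtype=np.float64)
arr32 = arr64.astype(np.float32)     # astype returns a new array with the new dtype

print(arr64.itemsize, arr64.nbytes)  # bytes per element, total bytes
print(arr32.itemsize, arr32.nbytes)
print(arr64.strides)                 # bytes to move one step along each axis
```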


## <a name='indexing'></a>4️⃣ Indexing, Slicing & Fancy Indexing

In [None]:
a = np.arange(1,26).reshape(5,5)
print(a[:, 0])     # first column
print(a[::2, ::2]) # every 2nd row/col
mask = (a % 3 == 0)
print('multiples of 3:', a[mask])


In [None]:
# 🖊️ TODO: use fancy indexing to swap first and last rows of `a`

a = np.arange(1, 26).reshape(5, 5)
print(a)
a[[0, -1]] = a[[-1, 0]]
print("\nAfter swapping first and last rows:\n", a)


[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]

After swapping first and last rows:
 [[21 22 23 24 25]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [ 1  2  3  4  5]]


## <a name='reshape'></a>5️⃣ Reshaping, Transpose & Copies vs Views

In [None]:
b = np.arange(8)
B = b.reshape(2,4)
B[0,0] = 99
print('b is modified:', b)
C = b.reshape(2,4).copy()
C[0,0] = -1
print('b unchanged with copy:', b)


In [None]:
# 🖊️ TODO: Flatten a 3‑D array into 1‑D using both `ravel` and `flatten`; observe copy vs view.

arr3d = np.arange(1, 13).reshape(2, 2, 3)
print("Original 3D array:\n", arr3d)

r = arr3d.ravel()
print("\nUsing ravel():", r)

f = arr3d.flatten()
print("Using flatten():", f)

r[0] = 999
print("\nAfter modifying ravel() output:")
print("ravel view:", r)
print("Original array after ravel change:\n", arr3d)

f[1] = 555
print("\nAfter modifying flatten() output:")
print("flatten copy:", f)
print("Original array after flatten change:\n", arr3d)



Original 3D array:
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]

Using ravel(): [ 1  2  3  4  5  6  7  8  9 10 11 12]
Using flatten(): [ 1  2  3  4  5  6  7  8  9 10 11 12]

After modifying ravel() output:
ravel view: [999   2   3   4   5   6   7   8   9  10  11  12]
Original array after ravel change:
 [[[999   2   3]
  [  4   5   6]]

 [[  7   8   9]
  [ 10  11  12]]]

After modifying flatten() output:
flatten copy: [  1 555   3   4   5   6   7   8   9  10  11  12]
Original array after flatten change:
 [[[999   2   3]
  [  4   5   6]]

 [[  7   8   9]
  [ 10  11  12]]]
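The copy-versus-view behaviour can also be checked without mutating anything. The sketch below uses `np.shares_memory` and the `.base` attribute. Note that `ravel` only returns a view when the data is already contiguous in the requested order; the transposed case shows it falling back to a copy.

```python
import numpy as np

arr3d = np.arange(1, 13).reshape(2, 2, 3)

r = arr3d.ravel()      # contiguous data: a view
f = arr3d.flatten()    # always a copy
t = arr3d.T.ravel()    # transposed data is not C-contiguous, so ravel must copy

r_shares = np.shares_memory(arr3d, r)
f_shares = np.shares_memory(arr3d, f)
t_shares = np.shares_memory(arr3d, t)
print(r_shares, f_shares, t_shares)

# A view keeps a reference to the array that owns the data; a fresh copy does not.
print(r.base is not None, f.base is None)
```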


## <a name='join'></a>6️⃣ Joining, Splitting, Set & Sorting Ops

In [None]:
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print('union', np.union1d(x,y))
print('intersect', np.intersect1d(xy,[1,2,10]))
print('sorted descending', np.sort(xy)[::-1])


In [None]:
# 🖊️ TODO: split `xy` back into two equal halves using `np.array_split`

# Note: `xy` is redefined here as 0..9, so the halves are not the original `x` and `y`.
xy = np.arange(10)

halves = np.array_split(xy, 2)

print("First half:", halves[0])
print("Second half:", halves[1])



First half: [0 1 2 3 4]
Second half: [5 6 7 8 9]
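For comparison, here is a sketch that splits the concatenated `xy` from the example cell back into its two source arrays, and stacks them as rows, one of the joining operations this section's title covers.

```python
import numpy as np

x = np.array([1, 3, 5])
y = np.array([2, 4, 6])
xy = np.concatenate([x, y])

first, second = np.array_split(xy, 2)   # two halves of equal length
stacked = np.vstack([x, y])             # join as rows: shape (2, 3)

print(first, second)
print(stacked.shape)
```

`np.array_split` also accepts sizes that do not divide evenly, whereas `np.split` raises an error in that case.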


## <a name='arithmetic'></a>7️⃣ Arithmetic Ops & Universal Functions

In [None]:
v = np.arange(5)
print('exp', np.exp(v))
print('sin', np.sin(v))
print('vectorised addition', v + 10)


In [None]:
# 🖊️ TODO: given degrees [0,30,45,60,90], compute radians and sin values.

degrees = np.array([0, 30, 45, 60, 90])

radians = np.deg2rad(degrees)

sin_values = np.sin(radians)

print("Degrees:", degrees)
print("Radians:", radians)
print("Sine values:", sin_values)


Degrees: [ 0 30 45 60 90]
Radians: [0.         0.52359878 0.78539816 1.04719755 1.57079633]
Sine values: [0.         0.5        0.70710678 0.8660254  1.        ]


## <a name='broadcast'></a>8️⃣ Broadcasting Rules

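Rules: align the shapes on the right and compare dimensions pairwise, right → left; two dimensions are compatible when they are equal or one of them is 1 (the size‑1 dimension is stretched to match); a missing leading dimension is treated as 1; any other mismatch raises a `ValueError`.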

In [None]:
row = np.arange(5)
col = np.arange(3).reshape(3,1)
matrix = row + col  # broadcast to 3×5
print(matrix)
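The rules can be checked without allocating any data. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later) on two compatible pairs, then triggers the mismatch error on purpose.

```python
import numpy as np

# Shapes are aligned on the right; a missing leading dimension counts as 1.
shape_a = np.broadcast_shapes((5,), (3, 1))
shape_b = np.broadcast_shapes((4, 1, 3), (2, 3))
print(shape_a, shape_b)

# Trailing sizes 3 and 4 differ and neither is 1, so NumPy raises ValueError.
try:
    np.arange(3) + np.arange(4)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("broadcast error:", err)
```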


In [None]:
# 🖊️ TODO: use broadcasting to create a 10×10 multiplication table.

row = np.arange(1, 11).reshape(1, 10)
col = np.arange(1, 11).reshape(10, 1)

table = col * row

print(table)


[[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]


## <a name='stats'></a>9️⃣ Statistics & Aggregations

In [None]:
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
print('data\n', data)
print('row sums', data.sum(axis=1))
print('col means', data.mean(axis=0))


data
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
row sums [227  46 255 281 266]
col means [49.4 53.  55.4 57.2]


In [None]:
# 🖊️ TODO: compute `np.percentile` (25th, 50th, 75th) of flattened `data`.

data = np.random.default_rng(0).integers(1, 100, size=(5, 4))
print('data\n', data)

print('row sums', data.sum(axis=1))

print('col means', data.mean(axis=0))

percentiles = np.percentile(data, [25, 50, 75])
print('25th, 50th, 75th percentiles:', percentiles)


data
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
row sums [227  46 255 281 266]
col means [49.4 53.  55.4 57.2]
25th, 50th, 75th percentiles: [30.  58.5 75. ]


## <a name='random'></a>🔟 Random Numbers & Reproducibility

In [None]:
rng = np.random.default_rng(42)
rand_floats = rng.random(5)
rand_ints = rng.integers(low=10, high=50, size=5)
print(rand_floats, rand_ints)
rng2 = np.random.default_rng(42)
assert np.allclose(rand_floats, rng2.random(5))


In [None]:
# 🖊️ TODO: simulate rolling a fair six‑sided die 100 times; estimate proportion of 6s.

rolls = np.random.default_rng(0).integers(1, 7, size=100)

proportion_sixes = np.mean(rolls == 6)

print("Rolls:", rolls)
print("Proportion of 6s:", proportion_sixes)


Rolls: [6 4 4 2 2 1 1 1 2 5 4 6 4 4 6 5 4 4 4 6 2 5 5 1 3 6 4 1 5 5 6 2 1 6 1 4 1
 2 3 3 3 1 1 1 1 5 4 4 2 4 5 3 3 6 5 6 3 5 6 4 6 5 5 3 6 1 4 5 6 4 3 2 3 3
 5 6 1 6 4 3 5 4 2 2 5 4 4 3 5 3 2 6 2 2 5 4 1 1 3 5]
Proportion of 6s: 0.16
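With only 100 rolls the estimate is noisy. As a rough sketch, the same simulation with a larger sample (the size 100,000 is an arbitrary choice) tallies every face with `np.bincount` and compares each proportion to the theoretical 1/6.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)       # high is exclusive, so faces are 1..6

counts = np.bincount(rolls, minlength=7)[1:]   # drop the unused bin for value 0
proportions = counts / rolls.size

print("counts:", counts)
print("proportions:", proportions)
print("largest deviation from 1/6:", np.abs(proportions - 1 / 6).max())
```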


## <a name='structured'></a>1️⃣1️⃣ Structured / Record Arrays

In [None]:
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
print(people['name'], people['age'].mean())


In [None]:
# 🖊️ TODO: add a new field 'height' to the structured array using `np.lib.recfunctions.append_fields` (hint: no pip install needed, `recfunctions` ships with NumPy).
from numpy.lib import recfunctions as rfn

a = np.array([(1, 'Alice'), (2, 'Bob')],
             dtype=[('id', 'i4'), ('name', 'U10')])

heights = np.array([5.5, 6.0])

a_new = rfn.append_fields(a, 'height', heights, dtypes=float, usemask=False)

print(a_new)
print(a_new.dtype)


[(1, 'Alice', 5.5) (2, 'Bob', 6. )]
[('id', '<i4'), ('name', '<U10'), ('height', '<f8')]
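Structured arrays can also be sorted and filtered by field. The sketch below reuses the `people` layout from the example cell; the third record (`'Cara'`) is a made-up row added only so the sort has something to reorder.

```python
import numpy as np

people = np.array(
    [('Alice', 25, 55.0), ('Bob', 30, 85.5), ('Cara', 22, 61.0)],  # 'Cara' is hypothetical
    dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')],
)

by_age = np.sort(people, order='age')   # sort whole records by one field
over_24 = people[people['age'] > 24]    # boolean mask built from a field

print(by_age['name'])
print(over_24['name'])
```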


## <a name='linalg'></a>1️⃣2️⃣ Linear Algebra Essentials

In [None]:
A = np.random.random((3,3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))


In [None]:
# 🖊️ TODO: compute eigenvalues of `A` using `np.linalg.eig`.

A = np.random.random((3, 3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))

eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues of A:", eigenvalues)



A·x ≈ b? True
Eigenvalues of A: [ 1.78052945  0.25072844 -0.25258037]
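One way to sanity-check an eigendecomposition is to confirm that `A @ v = λ v` holds for every returned pair. The sketch below does that with a seeded generator (the seed is an arbitrary choice, so the values differ from the output above), and also shows `np.linalg.eigh`, the variant intended for symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3))

# Columns of V are eigenvectors; w may be complex for a non-symmetric matrix.
w, V = np.linalg.eig(A)
eig_ok = np.allclose(A @ V, V * w)   # V * w scales column j by w[j]

S = A + A.T                          # symmetric by construction
w_sym, V_sym = np.linalg.eigh(S)     # real eigenvalues in ascending order
eigh_ok = np.allclose(S @ V_sym, V_sym * w_sym)

print(eig_ok, eigh_ok)
```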


## <a name='io'></a>1️⃣3️⃣ File I/O (`npy`, `npz`, `txt`)

In [None]:
np.save('array.npy', A)
loaded = np.load('array.npy')
print('loaded equals A?', np.allclose(loaded, A))
np.savez('multi_arrays.npz', A=A, b=b)


In [None]:
# 🖊️ TODO: Use `np.savetxt` to write `data` (from stats section) to CSV then reload with `np.loadtxt`.

data = np.random.default_rng(0).integers(1, 100, size=(5, 4))

np.savetxt('data.csv', data, delimiter=',', fmt='%d')

loaded_data = np.loadtxt('data.csv', delimiter=',', dtype=int)

print("Original data:\n", data)
print("Loaded data:\n", loaded_data)
print("Data equal after reload?", np.array_equal(data, loaded_data))


Original data:
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
Loaded data:
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
Data equal after reload? True
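The example cell writes `multi_arrays.npz` but never reads it back. A minimal round-trip sketch follows; it writes to a temporary directory (an assumption made to keep the example self-contained) and uses freshly generated arrays rather than the notebook's `A` and `b`.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3))
b = rng.random(3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'multi_arrays.npz')
    np.savez(path, A=A, b=b)          # keyword names become the keys in the archive

    with np.load(path) as archive:    # the archive is closed when the block exits
        names = sorted(archive.files)
        A_loaded = archive['A']
        b_loaded = archive['b']

print(names)
print(np.array_equal(A, A_loaded), np.array_equal(b, b_loaded))
```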


## <a name='datetime'></a>1️⃣4️⃣ Datetime64 & Timedelta64

In [None]:
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
delta = dates[1:] - dates[:-1]
print(dates[:5], delta[0])


In [None]:
# 🖊️ TODO: find how many Mondays appear in `dates` array.
# Days since 1970-01-01, which was a Thursday; with Monday = 0 that day is weekday 3.
day_of_week = (dates.astype('datetime64[D]').view('int64') + 3) % 7
num_mondays = np.sum(day_of_week == 0)

print("Number of Mondays:", num_mondays)


Number of Mondays: 13
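As a cross-check that avoids epoch arithmetic, the sketch below counts Mondays with `np.is_busday`, treating Monday as the only "business day" via `weekmask`.

```python
import numpy as np

dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')

is_monday = np.is_busday(dates, weekmask='Mon')   # True only where the date is a Monday
num_mondays = int(is_monday.sum())
first_monday = dates[is_monday][0]

print("Number of Mondays:", num_mondays)
print("First Monday:", first_monday)
```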


## <a name='mask'></a>1️⃣5️⃣ Masked Arrays & NaNs

In [None]:
arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)
print(masked.mean())


In [None]:
# 🖊️ TODO: replace NaNs with column means in a 2‑D array containing NaNs.
import numpy as np

arr2d = np.array([
    [1,   2,  np.nan],
    [4, np.nan, 6],
    [7,   8,  9]
], dtype=float)

col_means = np.nanmean(arr2d, axis=0)

inds = np.where(np.isnan(arr2d))

arr2d[inds] = np.take(col_means, inds[1])

print("Column means:", col_means)
print("Array after replacing NaNs:\n", arr2d)


Column means: [4.  5.  7.5]
Array after replacing NaNs:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [7.  8.  9. ]]
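An alternative sketch of the same imputation uses `np.where` and broadcasting: the 1-D `col_means` is stretched across rows, so no index bookkeeping is needed and the input array is left unmodified.

```python
import numpy as np

arr2d = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
    [7, 8, 9],
], dtype=float)

col_means = np.nanmean(arr2d, axis=0)                  # shape (3,), NaNs ignored
filled = np.where(np.isnan(arr2d), col_means, arr2d)   # col_means broadcasts over rows

print(filled)
```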


## <a name='project'></a>1️⃣6️⃣ Mini‑Project: Fitness Data Analysis

Load `fitness.txt` (tab‑separated) then follow prompts.

In [None]:
import numpy as np
fitness = np.genfromtxt('fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)
print('columns:', fitness.dtype.names, 'rows:', len(fitness))


columns: ('date', 'step_count', 'mood', 'calories_burned', 'hours_of_sleep', 'bool_of_active', 'weight_kg') rows: 96


In [None]:
# 🖊️ TODO: Monthly step count, sleep vs mood correlation, weekly summary, etc.

import pandas as pd  # required for this cell; the Setup cell does not import it

fitness = pd.read_csv('fitness.txt', sep='\t', parse_dates=['#date'], dayfirst=True)

monthly_steps = fitness.groupby(fitness['#date'].dt.to_period('M'))['step_count'].sum()

sleep_mood_corr = fitness['hours_of_sleep'].corr(fitness['mood'])

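# ISO week numbers repeat every year, so grouping by week alone lists weeks 1 and 2
# (January 2018) ahead of weeks 40 to 52 (2017).
weekly_summary = fitness.groupby(fitness['#date'].dt.isocalendar().week).agg(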
    total_steps=('step_count', 'sum'),
    avg_mood=('mood', 'mean'),
    avg_sleep=('hours_of_sleep', 'mean')
)

print("=== Monthly Step Count ===")
print(monthly_steps)

print("\n=== Sleep vs Mood Correlation ===")
print(sleep_mood_corr)

print("\n=== Weekly Summary ===")
print(weekly_summary)


=== Monthly Step Count ===
#date
2017-10     79051
2017-11    103071
2017-12     89565
2018-01     10163
Freq: M, Name: step_count, dtype: int64

=== Sleep vs Mood Correlation ===
0.2104166644730015

=== Weekly Summary ===
      total_steps    avg_mood  avg_sleep
week                                    
1            5833  171.428571   3.714286
2            4330  250.000000   5.000000
40          11530  133.333333   6.000000
41          23810  114.285714   5.571429
42          20669  128.571429   6.428571
43          16283  214.285714   5.571429
44          20598  285.714286   5.142857
45          31217  285.714286   4.714286
46          17714  285.714286   5.571429
47          27859  300.000000   5.714286
48          18701  257.142857   6.428571
49          21178  214.285714   5.428571
50          22628  185.714286   4.285714
51          20682  142.857143   4.857143
52          18818  185.714286   4.142857


## <a name='conclusion'></a>1️⃣7️⃣ Conclusion & Further Practice
Congrats on covering **all core NumPy topics** from your five lecture notebooks!

*Keep experimenting, read the official docs, and try converting your NumPy pipelines into Pandas or JAX for more fun.*