# 🧑‍💻 NumPy Complete Guided Project
**Instructor / Student Colab Notebook** – covers *all* key concepts from `Numpy‑1` to `Numpy‑5`.

*Generated: 08 Aug 2025*


**Table of Contents**

1. [Setup](#setup)  
2. [Array Creation & Dtypes](#creation)  
3. [Array Attributes & Inspection](#attributes)  
4. [Indexing, Slicing, Fancy Indexing](#indexing)  
5. [Reshaping, Transpose & Copies vs Views](#reshape)  
6. [Joining, Splitting, Set & Sorting Ops](#join)  
7. [Arithmetic Ops, Universal Functions](#arithmetic)  
8. [Broadcasting (Rules + Examples)](#broadcast)  
9. [Statistics & Aggregations](#stats)  
10. [Random Numbers & Reproducibility](#random)  
11. [Structured / Recarrays](#structured)  
12. [Linear Algebra Essentials](#linalg)  
13. [File I/O (`npy`, `npz`, `txt`)](#io)  
14. [Datetime64 & Timedelta64](#datetime)  
15. [Masked Arrays & NaNs](#mask)  
16. [Mini‑Project — Fitness Data Analysis](#project)  
17. [Conclusion & Next Steps](#conclusion)  


## <a name='setup'></a>1️⃣ Setup

In [None]:
import numpy as np
print('NumPy version:', np.__version__)

## <a name='creation'></a>2️⃣ Array Creation & Dtypes

Key functions: `np.array`, `np.arange`, `np.linspace`, `zeros`, `ones`, `full`, `eye`, `identity`, `diag`, `empty`
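The example cell in this section exercises only `np.array`, `np.linspace`, and `np.full`. As a minimal sketch of the other helpers in the list (the shapes and values are arbitrary choices for illustration):

```python
import numpy as np

evens = np.arange(0, 10, 2)          # start, stop (exclusive), step
zeros = np.zeros((2, 3))             # float64 by default
ones = np.ones(4, dtype=np.int64)    # explicit integer dtype
eye = np.eye(3, k=1)                 # ones on the first superdiagonal
ident = np.identity(3)               # square identity matrix
diag = np.diag([1, 2, 3])            # 1-D input -> 2-D diagonal matrix
scratch = np.empty((2, 2))           # uninitialised: contents are arbitrary

print(evens)
print(eye)
print(diag)
```

`np.empty` skips initialisation, so it should only be used when every element is overwritten afterwards.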

In [13]:
# EXAMPLE
import numpy as np
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.linspace(0, 1, 6)
arr3 = np.full((2,3), 7.5)
print(arr1, arr2, arr3, sep="\n")
print("dtypes:", arr1.dtype, arr2.dtype)


[1 2 3]
[0.  0.2 0.4 0.6 0.8 1. ]
[[7.5 7.5 7.5]
 [7.5 7.5 7.5]]
dtypes: int32 float64


In [17]:
# 🖊️ TODO: create a 10×10 chessboard pattern using zeros & ones
import numpy as np
arr = (np.indices((10,10)).sum(axis=0)) % 2
print(arr)

[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]]


## <a name='attributes'></a>3️⃣ Array Attributes & Inspection

`shape`, `ndim`, `size`, `dtype`, `itemsize`, `nbytes`

In [12]:
M = np.arange(12).reshape(3,4)
print('shape', M.shape, 'ndim', M.ndim, 'size', M.size, 'itemsize', M.itemsize, 'total bytes', M.nbytes)


shape (3, 4) ndim 2 size 12 itemsize 8 total bytes 96


In [19]:
# 🖊️ TODO: check memory footprint of a 1000×1000 float64 array
import numpy as np
arr = np.ones((1000,1000), dtype=np.float64)
print('size',arr.size)
print('itemsize',arr.itemsize)
print('bytes', arr.nbytes)

size 1000000
itemsize 8
bytes 8000000


## <a name='indexing'></a>4️⃣ Indexing, Slicing & Fancy Indexing

In [20]:
import numpy as np
a = np.arange(1,26).reshape(5,5)
print(a)
print(a[:, 0])     # first column
print(a[::2, ::2]) # every 2nd row/col
mask = (a % 3 == 0)
print('multiples of 3:', a[mask])


[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]
[ 1  6 11 16 21]
[[ 1  3  5]
 [11 13 15]
 [21 23 25]]
multiples of 3: [ 3  6  9 12 15 18 21 24]


In [21]:
# 🖊️ TODO: use fancy indexing to swap first and last rows of `a`
import numpy as np
a = np.arange(1,26).reshape(5,5)
a[[0,-1]] = a[[-1, 0]]
print(a)

[[21 22 23 24 25]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [ 1  2  3  4  5]]


## <a name='reshape'></a>5️⃣ Reshaping, Transpose & Copies vs Views

In [22]:
import numpy as np
b = np.arange(8)
B = b.reshape(2,4)
B[0,0] = 99
print('b is modified:', b)
C = b.reshape(2,4).copy()
C[0,0] = -1
print('b unchanged with copy:', b)


b is modified: [99  1  2  3  4  5  6  7]
b unchanged with copy: [99  1  2  3  4  5  6  7]


In [24]:
# 🖊️ TODO: Flatten a 3‑D array into 1‑D using both `ravel` and `flatten`; observe copy vs view.
import numpy as np
arr = np.arange(24).reshape(2, 3, 4)
ARR = arr.flatten()
ARR[0] = 1001
print('arr is not modified with flatten:', arr)
ARR2 = arr.ravel()
ARR2[0] = 1001
print('arr is modified with ravel:', arr)

arr is not modified with flatten: [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
arr is modified with ravel: [[[1001    1    2    3]
  [   4    5    6    7]
  [   8    9   10   11]]

 [[  12   13   14   15]
  [  16   17   18   19]
  [  20   21   22   23]]]
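Mutating an element is one way to tell a view from a copy; a non-destructive check is `np.shares_memory`. The sketch below uses illustrative names and also shows that `ravel` returns a view only when it can: on a transposed (non-contiguous) array it has to copy.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

flat_copy = arr.flatten()   # always a new array
flat_view = arr.ravel()     # a view here, because arr is C-contiguous

print(np.shares_memory(arr, flat_copy))   # False
print(np.shares_memory(arr, flat_view))   # True

# ravel on a non-contiguous array (here: a transpose) must copy
t_flat = arr.T.ravel()
print(np.shares_memory(arr, t_flat))      # False
```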


## <a name='join'></a>6️⃣ Joining, Splitting, Set & Sorting Ops

In [25]:
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print('union', np.union1d(x,y))
print('intersect', np.intersect1d(xy,[1,2,10]))
print('sorted descending', np.sort(xy)[::-1])


union [1 2 3 4 5 6]
intersect [1 2]
sorted descending [6 5 4 3 2 1]


In [28]:
# 🖊️ TODO: split `xy` back into two equal halves using `np.array_split`
import numpy as np
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print(xy)
x, y = np.array_split(xy,2)
print(x)
print(y)

[1 3 5 2 4 6]
[1 3 5]
[2 4 6]


## <a name='arithmetic'></a>7️⃣ Arithmetic Ops & Universal Functions

In [None]:
v = np.arange(5)
print('exp', np.exp(v))
print('sin', np.sin(v))
print('vectorised addition', v + 10)


In [32]:
# 🖊️ TODO: given degrees [0,30,45,60,90], compute radians and sin values.
import numpy as np
arr = np.array([0,30,45,60,90])
radians = np.radians(arr)
sin_values = np.sin(radians)
print("radians: ", radians)
print("sin values: ", sin_values)

radians:  [0.         0.52359878 0.78539816 1.04719755 1.57079633]
sin values:  [0.         0.5        0.70710678 0.8660254  1.        ]


## <a name='broadcast'></a>8️⃣ Broadcasting Rules

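Rules: align the shapes and compare dimensions from right → left; two dimensions are compatible when they are equal or one of them is 1 (the size‑1 dimension is stretched to match); missing leading dimensions are treated as 1; any other mismatch raises a `ValueError`.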
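The rules can be checked without building any arrays by using `np.broadcast_shapes` (available in NumPy 1.20 and later). A small sketch with arbitrary example shapes:

```python
import numpy as np

# Shapes are aligned from the right; missing leading dims count as 1.
shape_a = np.broadcast_shapes((3, 1), (5,))
shape_b = np.broadcast_shapes((4, 1, 6), (5, 1))
print(shape_a)   # (3, 5)
print(shape_b)   # (4, 5, 6)

# Trailing dims 3 and 4 differ and neither is 1, so this fails.
try:
    np.ones((2, 3)) + np.ones((2, 4))
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("incompatible:", err)
```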

In [None]:
row = np.arange(5)
col = np.arange(3).reshape(3,1)
matrix = row + col  # broadcast to 3×5
print(matrix)


In [34]:
# 🖊️ TODO: use broadcasting to create a 10×10 multiplication table.
import numpy as np
row = np.arange(1, 11)                  # factors 1..10
col = np.arange(1, 11).reshape(10, 1)   # same factors as a column vector
mat = row * col                         # broadcasts to 10×10
print(mat)


[[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]


## <a name='stats'></a>9️⃣ Statistics & Aggregations

In [35]:
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
print('data\n', data)
print('row sums', data.sum(axis=1))
print('col means', data.mean(axis=0))


data
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
row sums [227  46 255 281 266]
col means [49.4 53.  55.4 57.2]


In [37]:
# 🖊️ TODO: compute `np.percentile` (25th, 50th, 75th) of flattened `data`.
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
data2 = data.flatten()
p25 = np.percentile(data2, 25)
p50 = np.percentile(data2, 50)
p75 = np.percentile(data2, 75)

print("25th Percentile:", p25)
print("50th Percentile :", p50)
print("75th Percentile:", p75)

25th Percentile: 30.0
50th Percentile : 58.5
75th Percentile: 75.0


## <a name='random'></a>🔟 Random Numbers & Reproducibility

In [None]:
rng = np.random.default_rng(42)
rand_floats = rng.random(5)
rand_ints = rng.integers(low=10, high=50, size=5)
print(rand_floats, rand_ints)
rng2 = np.random.default_rng(42)
assert np.allclose(rand_floats, rng2.random(5))


In [1]:
# 🖊️ TODO: simulate rolling a fair six‑sided die 100 times; estimate proportion of 6s.
import numpy as np
rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=100)
proportion_of_6s = np.count_nonzero(rolls == 6) / len(rolls)

print("Number of 6s:", np.count_nonzero(rolls == 6))
print("Proportion of 6s:", proportion_of_6s)


Number of 6s: 14
Proportion of 6s: 0.14


## <a name='structured'></a>1️⃣1️⃣ Structured / Record Arrays

In [None]:
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
print(people['name'], people['age'].mean())


In [2]:
# 🖊️ TODO: add a new field 'height' to the structured array using `np.lib.recfunctions.append_fields` (ships with NumPy, so no extra install is needed).
import numpy as np
from numpy.lib import recfunctions as rfn  # required import
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
heights = [160.0, 175.0]
people_with_height = rfn.append_fields(people, 'height', heights, dtypes='f4', usemask=False)
print(people_with_height)

[('Alice', 25, 55. , 160.) ('Bob', 30, 85.5, 175.)]


## <a name='linalg'></a>1️⃣2️⃣ Linear Algebra Essentials

In [None]:
A = np.random.random((3,3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))


In [3]:
# 🖊️ TODO: compute eigenvalues of `A` using `np.linalg.eig`.
import numpy as np
A = np.random.random((3, 3))
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Matrix A:", A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)

Matrix A: [[0.63957598 0.56769183 0.94158222]
 [0.53183168 0.01240355 0.6476112 ]
 [0.39873619 0.49125154 0.57577016]]
Eigenvalues: [ 1.61309363 -0.02623082 -0.35911312]
Eigenvectors: [[-0.74264988 -0.75825981  0.1097154 ]
 [-0.44809705 -0.15926526 -0.89838738]
 [-0.49767479  0.63219984  0.42527949]]


## <a name='io'></a>1️⃣3️⃣ File I/O (`npy`, `npz`, `txt`)

In [None]:
np.save('array.npy', A)
loaded = np.load('array.npy')
print('loaded equals A?', np.allclose(loaded, A))
np.savez('multi_arrays.npz', A=A, b=b)
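`np.savez` bundles several arrays into one `.npz` archive, and `np.load` returns a dict-like object keyed by the keyword names used when saving. A minimal sketch of the round trip, using a temporary directory and stand-in arrays (`A_demo` and `b_demo` are illustrative, not the arrays from the cell above):

```python
import os
import tempfile
import numpy as np

A_demo = np.arange(9.0).reshape(3, 3)
b_demo = np.array([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multi_arrays.npz")
    np.savez(path, A=A_demo, b=b_demo)

    # NpzFile loads lazily; the context manager closes the underlying file
    with np.load(path) as archive:
        names = sorted(archive.files)
        A_back = archive["A"]
        b_back = archive["b"]

print(names)
print(np.array_equal(A_back, A_demo), np.array_equal(b_back, b_demo))
```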


In [4]:
# 🖊️ TODO: Use `np.savetxt` to write `data` (from stats section) to CSV then reload with `np.loadtxt`.
import numpy as np
data = np.random.default_rng(0).integers(1, 100, size=(5, 4))
np.savetxt('data.csv', data, delimiter=',', fmt='%d')
loaded_data = np.loadtxt('data.csv', delimiter=',')
print("equals original?", np.allclose(loaded_data, data))

equals original? True


## <a name='datetime'></a>1️⃣4️⃣ Datetime64 & Timedelta64

In [None]:
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
delta = dates[1:] - dates[:-1]
print(dates[:5], delta[0])


In [5]:
# 🖊️ TODO: find how many Mondays appear in `dates` array.
import numpy as np
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
# Days since 1970-01-01, which was a Thursday; adding 3 makes Monday map to 0
days_since_epoch = dates.astype('datetime64[D]').astype(int)
weekday_numbers = (days_since_epoch + 3) % 7   # 0 = Monday ... 6 = Sunday
num_mondays = np.sum(weekday_numbers == 0)
print(f"Number of Mondays from Jan 1 to Mar 31, 2023: {num_mondays}")

Number of Mondays from Jan 1 to Mar 31, 2023: 13
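A weekday count like this can be cross-checked with `np.is_busday`, whose `weekmask` argument selects which days of the week count as valid. A short sketch over the same date range:

```python
import numpy as np

dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')

# weekmask='Mon' marks only Mondays as "business days"
is_monday = np.is_busday(dates, weekmask='Mon')
monday_count = int(is_monday.sum())
first_mondays = dates[is_monday][:3]

print(monday_count)     # 13
print(first_mondays)    # first three Mondays of 2023
```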


## <a name='mask'></a>1️⃣5️⃣ Masked Arrays & NaNs

In [None]:
arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)
print(masked.mean())
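`np.ma.masked_invalid` hides NaN and infinite entries so that aggregations skip them. The sketch below compares it with the NaN-aware function `np.nanmean` on the same values and shows `filled`, which turns a masked array back into a plain one:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)

print(masked.mask)           # True where the value was NaN
print(masked.mean())         # mean of 1, 2, 4
print(np.nanmean(arr))       # same result without a masked array
print(masked.filled(0.0))    # replace masked entries with a fill value
```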


In [9]:
# 🖊️ TODO: replace NaNs with column means in a 2‑D array containing NaNs.
import numpy as np
arr = np.array([
    [1.0, 2.0, np.nan],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0]
])
col_means = np.nanmean(arr, axis=0)
nan_mask = np.isnan(arr)
arr[nan_mask] = np.take(col_means, np.where(nan_mask)[1])
print("Filled array:", arr)




## <a name='project'></a>1️⃣6️⃣ Mini‑Project: Fitness Data Analysis

Load `fitness.txt` (tab‑separated) then follow prompts.
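For readers without the file at hand, the loading step can be tried on an in-memory stand-in. The rows below are made up purely for illustration, and the column names are assumed from the fields used in this section's analysis code (`date`, `step_count`, `mood`, `hours_of_sleep`); the real file may contain more columns.

```python
import io
import numpy as np

# Made-up rows for illustration; NOT taken from the real fitness.txt
sample = io.StringIO(
    "date\tstep_count\tmood\thours_of_sleep\n"
    "06-10-2017\t1000\t1\t5\n"
    "07-10-2017\t2000\t2\t8\n"
)

# names=True reads field names from the header row;
# dtype=None lets genfromtxt infer a type per column
demo = np.genfromtxt(sample, delimiter='\t', dtype=None,
                     encoding=None, names=True)
print(demo.dtype.names)
print(demo['step_count'].sum())
```

The result is a structured array, so each column is accessed by field name rather than by position.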

In [10]:
from google.colab import files  # Colab-only upload helper
uploaded = files.upload()       # upload fitness.txt before reading it
fitness = np.genfromtxt('fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)
print('columns:', fitness.dtype.names, 'rows:', len(fitness))


Saving fitness.txt to fitness.txt


In [14]:
# 🖊 TODO: Monthly step count, sleep vs mood correlation, weekly summary, etc.

import numpy as np
from scipy.stats import pearsonr
from datetime import datetime
fitness = np.genfromtxt('fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)

if '#date' in fitness.dtype.names:
    fitness.dtype.names = tuple(['date' if name == '#date' else name for name in fitness.dtype.names])

# Convert to numpy datetime64
date_strs = [datetime.strptime(d, "%d-%m-%Y").strftime("%Y-%m-%d") for d in fitness['date']]
dates = np.array(date_strs, dtype='datetime64[D]')

# step count monthly
print("Monthly Step Counts:")
months = dates.astype('datetime64[M]')
for month in np.unique(months):
    mask = months == month
    total = fitness['step_count'][mask].sum()
    print(f"{month}: {total} steps")

sleep = np.where(fitness['hours_of_sleep'] == 500, 0, fitness['hours_of_sleep'])

# sleep vs mood
print("Sleep vs Mood Correlation:")
mood = fitness['mood']
corr, _ = pearsonr(sleep, mood)
print(f"Pearson correlation: {corr:.2f}")

# weekly average
print("Weekly Averages (Steps, Sleep):")
weeks = dates.astype('datetime64[W]')
for week in np.unique(weeks):
    mask = weeks == week
    avg_steps = fitness['step_count'][mask].mean()
    avg_sleep = sleep[mask].mean()
    print(f"{week}: Avg Steps = {avg_steps:.0f}, Avg Sleep = {avg_sleep:.1f} hrs")

Monthly Step Counts:
2017-10: 79051 steps
2017-11: 103071 steps
2017-12: 89565 steps
2018-01: 10163 steps
Sleep vs Mood Correlation:
Pearson correlation: 0.21
Weekly Averages (Steps, Sleep):
2017-10-05: Avg Steps = 4742, Avg Sleep = 5.5 hrs
2017-10-12: Avg Steps = 2779, Avg Sleep = 6.1 hrs
2017-10-19: Avg Steps = 2789, Avg Sleep = 5.7 hrs
2017-10-26: Avg Steps = 2294, Avg Sleep = 6.1 hrs
2017-11-02: Avg Steps = 3568, Avg Sleep = 3.9 hrs
2017-11-09: Avg Steps = 3954, Avg Sleep = 5.9 hrs
2017-11-16: Avg Steps = 2911, Avg Sleep = 5.4 hrs
2017-11-23: Avg Steps = 3143, Avg Sleep = 6.1 hrs
2017-11-30: Avg Steps = 2913, Avg Sleep = 7.0 hrs
2017-12-07: Avg Steps = 3213, Avg Sleep = 4.1 hrs
2017-12-14: Avg Steps = 3204, Avg Sleep = 4.4 hrs
2017-12-21: Avg Steps = 2743, Avg Sleep = 4.4 hrs
2017-12-28: Avg Steps = 1857, Avg Sleep = 3.6 hrs
2018-01-04: Avg Steps = 970, Avg Sleep = 4.7 hrs


## <a name='conclusion'></a>1️⃣7️⃣ Conclusion & Further Practice
Congrats on covering **all core NumPy topics** from your five lecture notebooks!

*Keep experimenting, read the official docs, and try converting your NumPy pipelines into Pandas or JAX for more fun.*