<a href="https://colab.research.google.com/github/mahesh-babu-chittem/Machine-Learning-MaheshBabuChittem/blob/main/Lab1_AP23110010084_NumPy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧑‍💻 NumPy Complete Guided Project
**Instructor / Student Colab Notebook** – covers *all* key concepts from `Numpy‑1` to `Numpy‑5`.

*Generated: 08 Aug 2025*


**Table of Contents**

1. [Setup](#setup)  
2. [Array Creation & Dtypes](#creation)  
3. [Array Attributes & Inspection](#attributes)  
4. [Indexing, Slicing, Fancy Indexing](#indexing)  
5. [Reshaping, Transpose & Copies vs Views](#reshape)  
6. [Joining, Splitting, Set & Sorting Ops](#join)  
7. [Arithmetic Ops, Universal Functions](#arithmetic)  
8. [Broadcasting (Rules + Examples)](#broadcast)  
9. [Statistics & Aggregations](#stats)  
10. [Random Numbers & Reproducibility](#random)  
11. [Structured / Record Arrays](#structured)  
12. [Linear Algebra Essentials](#linalg)  
13. [File I/O (`npy`, `npz`, `txt`)](#io)  
14. [Datetime64 & Timedelta64](#datetime)  
15. [Masked Arrays & NaNs](#mask)  
16. [Mini‑Project — Fitness Data Analysis](#project)  
17. [Conclusion & Further Practice](#conclusion)  


## <a name='setup'></a>1️⃣ Setup

In [1]:
import numpy as np, math, os, pathlib, types, textwrap, random
print('NumPy version:', np.__version__)

NumPy version: 2.0.2


## <a name='creation'></a>2️⃣ Array Creation & Dtypes

Key functions: `np.array`, `np.arange`, `np.linspace`, `np.zeros`, `np.ones`, `np.full`, `np.eye`, `np.identity`, `np.diag`, `np.empty`

In [None]:
# EXAMPLE
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.linspace(0, 1, 6)
arr3 = np.full((2,3), 7.5)
print(arr1, arr2, arr3, sep="\n")
print("dtypes:", arr1.dtype, arr2.dtype)
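
The example above uses only three of the listed constructors. A minimal sketch of a few of the others follows; the variable names (`steps`, `ident`, `upper`, `d`, `main_diag`, `scratch`) are illustrative and not part of the original notebook.

```python
import numpy as np

# Evenly spaced integers: start is inclusive, stop is exclusive
steps = np.arange(0, 10, 2)

# Identity-style matrices; np.eye can also shift the diagonal with k
ident = np.identity(3)
upper = np.eye(3, k=1)

# np.diag builds a diagonal matrix from a 1-D input
# and extracts the diagonal from a 2-D input
d = np.diag([1, 2, 3])
main_diag = np.diag(d)

# np.empty allocates memory without initialising it,
# so only the shape and dtype are meaningful here
scratch = np.empty((2, 2))

print(steps, ident, upper, d, main_diag, scratch.shape, sep="\n")
```

Because `np.empty` leaves its contents arbitrary, it is only appropriate when every element will be overwritten before it is read.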


In [3]:
# 🖊️ TODO: create a 10×10 chessboard pattern using zeros & ones
board = np.zeros((10,10), dtype=int)
board[::2, ::2] = 1
board[1::2, 1::2] = 1
print(board)

[[1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]]


## <a name='attributes'></a>3️⃣ Array Attributes & Inspection

Key attributes: `shape`, `ndim`, `size`, `dtype`, `itemsize`, `nbytes` (`nbytes` equals `size × itemsize`).

In [6]:
M = np.arange(12).reshape(3,4)
print('shape', M.shape, 'ndim', M.ndim, 'size', M.size, 'itemsize', M.itemsize, 'total bytes', M.nbytes)


shape (3, 4) ndim 2 size 12 itemsize 8 total bytes 96
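
The output above reports `itemsize 8` because `np.arange` picked a 64-bit integer on that machine. The sketch below fixes the dtype explicitly so the numbers do not depend on the platform default, and checks that `nbytes` is `size * itemsize`. The `footprints` dictionary is an illustrative name.

```python
import numpy as np

footprints = {}
for dtype in (np.int8, np.int32, np.float64):
    arr = np.zeros((3, 4), dtype=dtype)
    # nbytes counts only the data buffer: elements × bytes per element
    assert arr.nbytes == arr.size * arr.itemsize
    footprints[arr.dtype.name] = arr.nbytes
    print(arr.dtype, "itemsize", arr.itemsize, "nbytes", arr.nbytes)
```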


In [4]:
# 🖊️ TODO: check memory footprint of a 1000×1000 float64 array
# Create array
arr = np.zeros((1000, 1000), dtype=np.float64)

# Check memory usage in bytes
print(arr.nbytes, "bytes")

# Or in MB
print(arr.nbytes / (1024**2), "MB")

8000000 bytes
7.62939453125 MB


## <a name='indexing'></a>4️⃣ Indexing, Slicing & Fancy Indexing

In [5]:
a = np.arange(1,26).reshape(5,5)
print(a[:, 0])     # first column
print(a[::2, ::2]) # every 2nd row/col
mask = (a % 3 == 0)
print('multiples of 3:', a[mask])


[ 1  6 11 16 21]
[[ 1  3  5]
 [11 13 15]
 [21 23 25]]
multiples of 3: [ 3  6  9 12 15 18 21 24]
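
One detail worth checking alongside the example above: basic slices return views, while fancy and boolean indexing return copies. The sketch below uses `np.shares_memory` to show this; the names `row_view`, `rows_copy` and `mask_copy` are illustrative.

```python
import numpy as np

a = np.arange(1, 26).reshape(5, 5)

row_view = a[0, :]          # basic slice: a view onto a's buffer
rows_copy = a[[0, 1]]       # fancy (integer-list) indexing: a copy
mask_copy = a[a % 3 == 0]   # boolean indexing: a copy

print(np.shares_memory(a, row_view))   # True
print(np.shares_memory(a, rows_copy))  # False
print(np.shares_memory(a, mask_copy))  # False

# Assigning *through* a mask still modifies the original in place
a[a % 3 == 0] = 0
print(a)
```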


In [7]:
# 🖊️ TODO: use fancy indexing to swap first and last rows of `a`

# Example array
a = np.arange(1, 13).reshape(4, 3)
print("Original:\n", a)

# Swap first and last rows using fancy indexing
a[[0, -1]] = a[[-1, 0]]

print("\nAfter swapping first and last rows:\n", a)


Original:
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

After swapping first and last rows:
 [[10 11 12]
 [ 4  5  6]
 [ 7  8  9]
 [ 1  2  3]]


## <a name='reshape'></a>5️⃣ Reshaping, Transpose & Copies vs Views

In [8]:
b = np.arange(8)
B = b.reshape(2,4)
B[0,0] = 99
print('b is modified:', b)
C = b.reshape(2,4).copy()
C[0,0] = -1
print('b unchanged with copy:', b)


b is modified: [99  1  2  3  4  5  6  7]
b unchanged with copy: [99  1  2  3  4  5  6  7]


In [10]:
# 🖊️ TODO: Flatten a 3‑D array into 1‑D using both `ravel` and `flatten`; observe copy vs view.
# Create a 3-D array
arr = np.arange(24).reshape(2, 3, 4)
print("Original array:\n", arr)

# Flatten using ravel (returns a view when possible, as it does here)
r = arr.ravel()
print("\nFlattened with ravel():\n", r)

# Flatten using flatten (copy)
f = arr.flatten()
print("\nFlattened with flatten():\n", f)

# Modify the ravel view
r[0] = 999
print("\nAfter modifying ravel()[0] = 999:")
print("Original array:\n", arr)    # Changes reflect here
print("Flattened ravel:\n", r)
print("Flattened flatten:\n", f)  # No change here


Original array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Flattened with ravel():
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

Flattened with flatten():
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

After modifying ravel()[0] = 999:
Original array:
 [[[999   1   2   3]
  [  4   5   6   7]
  [  8   9  10  11]]

 [[ 12  13  14  15]
  [ 16  17  18  19]
  [ 20  21  22  23]]]
Flattened ravel:
 [999   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  23]
Flattened flatten:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
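
`ravel` returned a view in the run above because the array was contiguous in memory. As a small sketch of the other case, the transposed array below is not C-contiguous, so `ravel` has to copy; `flatten` copies in every case.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

contiguous_ravel = arr.ravel()    # view: memory layout already matches
transposed_ravel = arr.T.ravel()  # copy: the transpose is not C-contiguous
flat_copy = arr.flatten()         # always a copy

print(np.shares_memory(arr, contiguous_ravel))  # True
print(np.shares_memory(arr, transposed_ravel))  # False
print(np.shares_memory(arr, flat_copy))         # False
```

When code relies on in-place modification through `ravel`, checking `np.shares_memory` first avoids silently editing a copy.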


## <a name='join'></a>6️⃣ Joining, Splitting, Set & Sorting Ops

In [9]:
x = np.array([1,3,5]); y = np.array([2,4,6])
xy = np.concatenate([x,y])
print('union', np.union1d(x,y))
print('intersect', np.intersect1d(xy,[1,2,10]))
print('sorted descending', np.sort(xy)[::-1])


union [1 2 3 4 5 6]
intersect [1 2]
sorted descending [6 5 4 3 2 1]


In [11]:
# 🖊️ TODO: split `xy` back into two equal halves using `np.array_split`
import numpy as np

# Example array
xy = np.arange(10)  # Let's say this is our array
print("Original array:\n", xy)

# Split into two equal halves
x, y = np.array_split(xy, 2)

print("\nFirst half (x):", x)
print("Second half (y):", y)

Original array:
 [0 1 2 3 4 5 6 7 8 9]

First half (x): [0 1 2 3 4]
Second half (y): [5 6 7 8 9]
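
The split above works with either `np.split` or `np.array_split` because 10 divides evenly by 2. A short sketch of where they differ, using a 7-element array (the names `parts` and `split_failed` are illustrative):

```python
import numpy as np

odd = np.arange(7)

# array_split tolerates uneven division: earlier chunks get the extra elements
parts = np.array_split(odd, 2)
print([p.tolist() for p in parts])

# np.split insists on equal-sized chunks and raises otherwise
split_failed = False
try:
    np.split(odd, 2)
except ValueError as exc:
    split_failed = True
    print("np.split failed:", exc)
```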


## <a name='arithmetic'></a>7️⃣ Arithmetic Ops & Universal Functions

In [12]:
v = np.arange(5)
print('exp', np.exp(v))
print('sin', np.sin(v))
print('vectorised addition', v + 10)


exp [ 1.          2.71828183  7.3890561  20.08553692 54.59815003]
sin [ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
vectorised addition [10 11 12 13 14]


In [13]:
# 🖊️ TODO: given degrees [0,30,45,60,90], compute radians and sin values.
# Given degrees
degrees = np.array([0, 30, 45, 60, 90])
print("Degrees:", degrees)

# Convert to radians
radians = np.deg2rad(degrees)
print("Radians:", radians)

# Compute sine values
sine_values = np.sin(radians)
print("Sine values:", sine_values)

Degrees: [ 0 30 45 60 90]
Radians: [0.         0.52359878 0.78539816 1.04719755 1.57079633]
Sine values: [0.         0.5        0.70710678 0.8660254  1.        ]
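
The printed sine values are rounded for display. When checking results like these against known exact values, a tolerance-based comparison is the safer habit; the sketch below uses `np.allclose`, with `expected` as an illustrative name.

```python
import numpy as np

degrees = np.array([0, 30, 45, 60, 90])
sine_values = np.sin(np.deg2rad(degrees))

# Closed-form values of sin at the standard angles
expected = np.array([0.0, 0.5, np.sqrt(2) / 2, np.sqrt(3) / 2, 1.0])

# Compare within floating-point tolerance instead of testing exact equality
print(np.allclose(sine_values, expected))
```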


## <a name='broadcast'></a>8️⃣ Broadcasting Rules

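Rules: compare shapes dimension by dimension from right to left; missing leading dimensions count as size 1; two dimensions are compatible when they are equal or one of them is 1 (the size-1 dimension is stretched to match); any other mismatch raises a `ValueError`.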

In [14]:
row = np.arange(5)
col = np.arange(3).reshape(3,1)
matrix = row + col  # broadcast to 3×5
print(matrix)


[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]]
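
To check the rules without building any arrays, `np.broadcast_shapes` (available in NumPy 1.20 and later) reports the result shape directly. The sketch below also triggers the mismatch error; the variable names are illustrative.

```python
import numpy as np

# (3, 1) with (5,): the trailing 1 stretches to 5, the missing leading dim to 3
shape_a = np.broadcast_shapes((3, 1), (5,))
# (10, 1) with (1, 10): each size-1 dimension stretches to 10
shape_b = np.broadcast_shapes((10, 1), (1, 10))
print(shape_a, shape_b)  # (3, 5) (10, 10)

# Trailing dimensions 3 and 4 are unequal and neither is 1: incompatible
mismatch_raised = False
try:
    np.arange(3) + np.arange(4)
except ValueError as exc:
    mismatch_raised = True
    print("mismatch:", exc)
```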


In [15]:
# 🖊️ TODO: use broadcasting to create a 10×10 multiplication table.
# Create row and column vectors
row = np.arange(1, 11).reshape(1, 10)   # Shape (1, 10)
col = np.arange(1, 11).reshape(10, 1)   # Shape (10, 1)

# Broadcasting multiplication
table = row * col

print("10×10 Multiplication Table:\n", table)

10×10 Multiplication Table:
 [[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]


## <a name='stats'></a>9️⃣ Statistics & Aggregations

In [16]:
data = np.random.default_rng(0).integers(1, 100, size=(5,4))
print('data\n', data)
print('row sums', data.sum(axis=1))
print('col means', data.mean(axis=0))


data
 [[85 64 51 27]
 [31  5  8  2]
 [18 81 65 91]
 [50 61 97 73]
 [63 54 56 93]]
row sums [227  46 255 281 266]
col means [49.4 53.  55.4 57.2]


In [17]:
# 🖊️ TODO: compute `np.percentile` (25th, 50th, 75th) of flattened `data`.
# Example data
data = np.array([[10, 20, 30],
                 [40, 50, 60],
                 [70, 80, 90]])

# np.percentile flattens its input by default (axis=None), so no explicit ravel is needed
p25 = np.percentile(data, 25)
p50 = np.percentile(data, 50)  # Median
p75 = np.percentile(data, 75)

print("25th percentile:", p25)
print("50th percentile (median):", p50)
print("75th percentile:", p75)

25th percentile: 30.0
50th percentile (median): 50.0
75th percentile: 70.0
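
`np.percentile` also accepts a sequence of percentiles and an `axis` argument, which avoids repeated calls. A minimal sketch on the same 3×3 example data:

```python
import numpy as np

data = np.arange(10, 100, 10).reshape(3, 3)  # same values as the example above

# All three quartiles of the flattened data in one call
q25, q50, q75 = np.percentile(data, [25, 50, 75])
print(q25, q50, q75)  # 30.0 50.0 70.0

# Per-column medians: axis=0 collapses the rows
col_medians = np.percentile(data, 50, axis=0)
print(col_medians)
```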


## <a name='random'></a>🔟 Random Numbers & Reproducibility

In [18]:
rng = np.random.default_rng(42)
rand_floats = rng.random(5)
rand_ints = rng.integers(low=10, high=50, size=5)
print(rand_floats, rand_ints)
rng2 = np.random.default_rng(42)
assert np.allclose(rand_floats, rng2.random(5))


[0.77395605 0.43887844 0.85859792 0.69736803 0.09417735] [31 49 39 40 38]


In [19]:
# 🖊️ TODO: simulate rolling a fair six‑sided die 100 times; estimate proportion of 6s.
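# Simulate 100 rolls (values 1 to 6; the upper bound 7 is exclusive).
# This uses the legacy global RNG without a seed, so the output differs on every run.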
rolls = np.random.randint(1, 7, size=100)

# Estimate proportion of 6s
proportion_six = np.mean(rolls == 6)

print("Rolls:", rolls)
print("Proportion of 6s:", proportion_six)

Rolls: [1 1 5 1 6 5 1 6 2 6 4 6 5 5 6 6 3 2 3 4 1 2 2 6 2 3 4 5 1 3 6 3 5 5 5 5 4
 4 6 3 4 1 6 6 1 4 5 2 5 5 4 3 3 2 6 4 1 3 1 4 1 6 1 1 3 4 6 3 6 3 5 5 6 4
 6 6 5 5 1 6 4 6 5 5 6 3 3 3 2 6 1 3 2 2 2 5 2 4 1 5]
Proportion of 6s: 0.22
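
Since this section is about reproducibility, here is a hedged sketch of the same simulation using a seeded `Generator`. The helper name `proportion_of_sixes` and the seed value `123` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def proportion_of_sixes(seed, n_rolls=100):
    """Roll a fair die n_rolls times with a seeded Generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    rolls = rng.integers(1, 7, size=n_rolls)  # high=7 is exclusive
    return rolls, np.mean(rolls == 6)

rolls_a, p_a = proportion_of_sixes(seed=123)
rolls_b, p_b = proportion_of_sixes(seed=123)

# Same seed, same rolls: the estimate is reproducible
print(np.array_equal(rolls_a, rolls_b), p_a)
```

With only 100 rolls the estimate can sit well away from 1/6; increasing `n_rolls` narrows the spread.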


## <a name='structured'></a>1️⃣1️⃣ Structured / Record Arrays

In [20]:
people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                   dtype=[('name','U10'), ('age','i4'), ('weight','f4')])
print(people['name'], people['age'].mean())


['Alice' 'Bob'] 27.5
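
Field access combines naturally with boolean masks and with sorting by field name. A small sketch on the same `people` array (the names `older` and `by_weight_desc` are illustrative):

```python
import numpy as np

people = np.array([('Alice', 25, 55.0), ('Bob', 30, 85.5)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Filter records with a mask built from one field
older = people[people['age'] > 26]
print(older['name'])

# Sort by a named field, then reverse for descending order
by_weight_desc = np.sort(people, order='weight')[::-1]
print(by_weight_desc['name'])
```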


In [41]:
# 🖊️ TODO: add a new field 'height' to the structured array using `np.lib.recfunctions.append_fields` (hint: pip install?).
from numpy.lib import recfunctions as rfn  # ships with NumPy, so no extra pip install is needed
# Example structured array
data = np.array([(1, 'Alice'),
                 (2, 'Bob')],
                dtype=[('id', 'i4'), ('name', 'U10')])

print("Original array:")
print(data)

# New heights to add
heights = [165, 180]

# Append new field 'height'
data_with_height = rfn.append_fields(
    data,                      # original structured array
    'height',                  # new field name
    heights,                    # data for new field
    usemask=False
)

print("\nWith 'height' field added:")
print(data_with_height)

Original array:
[(1, 'Alice') (2, 'Bob')]

With 'height' field added:
[(1, 'Alice', 165) (2, 'Bob', 180)]


## <a name='linalg'></a>1️⃣2️⃣ Linear Algebra Essentials

In [None]:
A = np.random.random((3,3))
b = np.random.random(3)
x = np.linalg.solve(A, b)
print('A·x ≈ b?', np.allclose(A.dot(x), b))


In [42]:
# 🖊️ TODO: compute eigenvalues of `A` using `np.linalg.eig`.
A = np.array([[4, 2],
              [1, 3]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)
print("\nEigenvectors:\n", eigenvectors)

Eigenvalues: [5. 2.]

Eigenvectors:
 [[ 0.89442719 -0.70710678]
 [ 0.4472136   0.70710678]]
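
A quick way to sanity-check an eigendecomposition is to confirm that `A v = λ v` holds for every column of the eigenvector matrix. The sketch below does that for the same 2×2 matrix and also checks the trace and determinant identities; `lhs` and `rhs` are illustrative names.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i]
lhs = A @ eigenvectors
rhs = eigenvectors * eigenvalues  # broadcasting scales column i by eigenvalue i
print("A v = λ v for all pairs?", np.allclose(lhs, rhs))

# The eigenvalues sum to the trace and multiply to the determinant
print(np.isclose(eigenvalues.sum(), np.trace(A)),
      np.isclose(eigenvalues.prod(), np.linalg.det(A)))
```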


## <a name='io'></a>1️⃣3️⃣ File I/O (`npy`, `npz`, `txt`)

In [25]:
# Create example arrays
A = np.arange(9).reshape(3, 3)
b = np.linspace(0, 1, 5)

# Save A to a .npy file
np.save('array.npy', A)

# Load A back
loaded = np.load('array.npy')
print('loaded equals A?', np.allclose(loaded, A))

# Save multiple arrays to a .npz file
np.savez('multi_arrays.npz', A=A, b=b)

# Load multiple arrays back
loaded_multi = np.load('multi_arrays.npz')
print("\nKeys in loaded file:", loaded_multi.files)
print("A:\n", loaded_multi['A'])
print("b:", loaded_multi['b'])

loaded equals A? True

Keys in loaded file: ['A', 'b']
A:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
b: [0.   0.25 0.5  0.75 1.  ]


In [26]:
# 🖊️ TODO: Use `np.savetxt` to write `data` (from stats section) to CSV then reload with `np.loadtxt`.

# Example "data" from a stats section
data = np.array([[1.5, 2.3, 3.1],
                 [4.2, 5.8, 6.4],
                 [7.9, 8.6, 9.0]])

# Save to CSV
np.savetxt("stats_data.csv", data, delimiter=",", fmt="%.2f")

print("Data saved to stats_data.csv")

# Load back from CSV
loaded_data = np.loadtxt("stats_data.csv", delimiter=",")

print("\nLoaded data:")
print(loaded_data)

# Verify it matches
print("\nLoaded equals original?", np.allclose(data, loaded_data))


Data saved to stats_data.csv

Loaded data:
[[1.5 2.3 3.1]
 [4.2 5.8 6.4]
 [7.9 8.6 9. ]]

Loaded equals original? True
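
The round trip above compares equal only because the example values have at most two decimal places. As a sketch of what `fmt="%.2f"` does to more precise data, the block below writes to a temporary directory (the file name `values.csv` and the array `values` are illustrative):

```python
import os
import tempfile
import numpy as np

values = np.array([[1.23456, 2.0],
                   [3.14159, 4.5]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.csv")

    # Two decimal places: compact, but the extra digits are discarded
    np.savetxt(path, values, delimiter=",", fmt="%.2f")
    rounded = np.loadtxt(path, delimiter=",")

    # Default format keeps enough digits for float64 data
    np.savetxt(path, values, delimiter=",")
    full = np.loadtxt(path, delimiter=",")

print("rounded matches original?", np.allclose(rounded, values))
print("full precision matches original?", np.allclose(full, values))
```

For lossless storage of numeric arrays, the binary `np.save` / `np.savez` formats shown earlier in this section are usually the simpler choice.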


## <a name='datetime'></a>1️⃣4️⃣ Datetime64 & Timedelta64

In [27]:
dates = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
delta = dates[1:] - dates[:-1]
print(dates[:5], delta[0])


['2023-01-01' '2023-01-02' '2023-01-03' '2023-01-04' '2023-01-05'] 1 days


In [29]:
# 🖊️ TODO: find how many Mondays appear in `dates` array.
dates = np.array([
    '2025-08-04',  # Monday
    '2025-08-05',  # Tuesday
    '2025-08-11',  # Monday
    '2025-08-13',  # Wednesday
    '2025-08-18'   # Monday
], dtype='datetime64[D]')

# Days since 1970-01-05 (a Monday), modulo 7: Monday → 0, Tuesday → 1, ..., Sunday → 6
weekday_numbers = (dates - np.datetime64('1970-01-05')).astype(int) % 7

mondays_count = np.sum(weekday_numbers == 0)
print("Number of Mondays:", mondays_count)

Number of Mondays: 3
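
An alternative that avoids the epoch arithmetic is `np.is_busday` with a `weekmask` that marks only Monday as a valid day. This is a sketch of that approach, applied both to the five sample dates and to the January to March 2023 range from the earlier example; `is_monday` and `quarter` are illustrative names.

```python
import numpy as np

dates = np.array(['2025-08-04', '2025-08-05', '2025-08-11',
                  '2025-08-13', '2025-08-18'], dtype='datetime64[D]')

# With weekmask='Mon', only Mondays count as "business days"
is_monday = np.is_busday(dates, weekmask='Mon')
print("Number of Mondays:", np.count_nonzero(is_monday))  # 3

# Same idea on a contiguous daily range
quarter = np.arange('2023-01', '2023-04', dtype='datetime64[D]')
mondays_in_quarter = np.count_nonzero(np.is_busday(quarter, weekmask='Mon'))
print("Mondays in Jan-Mar 2023:", mondays_in_quarter)
```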


## <a name='mask'></a>1️⃣5️⃣ Masked Arrays & NaNs

In [30]:
arr = np.array([1, 2, np.nan, 4, np.nan])
masked = np.ma.masked_invalid(arr)
print(masked.mean())


2.3333333333333335


In [31]:
# 🖊️ TODO: replace NaNs with column means in a 2‑D array containing NaNs.
arr = np.array([
    [1,  np.nan, 3],
    [4,  5,      np.nan],
    [7,  8,      9]
], dtype=float)

print("Original array:\n", arr)

# Compute column means ignoring NaNs
col_means = np.nanmean(arr, axis=0)

# Find indices where NaNs are present
inds = np.where(np.isnan(arr))

# Replace NaNs with corresponding column means
arr[inds] = np.take(col_means, inds[1])

print("\nAfter replacing NaNs with column means:\n", arr)

Original array:
 [[ 1. nan  3.]
 [ 4.  5. nan]
 [ 7.  8.  9.]]

After replacing NaNs with column means:
 [[1.  6.5 3. ]
 [4.  5.  6. ]
 [7.  8.  9. ]]
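
The same imputation can be written without index bookkeeping by letting `np.where` broadcast the column means across the rows. A minimal sketch, producing a new array `filled` (an illustrative name) rather than modifying `arr` in place:

```python
import numpy as np

arr = np.array([
    [1, np.nan, 3],
    [4, 5, np.nan],
    [7, 8, 9],
], dtype=float)

col_means = np.nanmean(arr, axis=0)  # shape (3,), one mean per column

# col_means broadcasts along the rows, so column j is filled with col_means[j]
filled = np.where(np.isnan(arr), col_means, arr)
print(filled)
```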


## <a name='project'></a>1️⃣6️⃣ Mini‑Project: Fitness Data Analysis

Load `fitness.txt` (tab-separated), then follow the prompts.

In [35]:
from google.colab import files
uploaded = files.upload()

Saving fitness.txt to fitness.txt


In [36]:
fitness = np.genfromtxt('fitness.txt', delimiter='\t', dtype=None, encoding=None, names=True)
print('columns:', fitness.dtype.names, 'rows:', len(fitness))


columns: ('date', 'step_count', 'mood', 'calories_burned', 'hours_of_sleep', 'bool_of_active', 'weight_kg') rows: 96


In [40]:
# 🖊️ TODO: Monthly step count, sleep vs mood correlation, weekly summary, etc.
# Load file
data = np.genfromtxt("fitness.txt", delimiter="\t", names=True, dtype=None, encoding=None)

# Convert 'date' from DD-MM-YYYY → YYYY-MM-DD for datetime64
date_str = np.array(data['date'], dtype=str)
dates = np.array([f"{d[6:]}-{d[3:5]}-{d[:2]}" for d in date_str], dtype='datetime64[D]')

# Extract other columns
steps = data['step_count']
mood = data['mood']
sleep = data['hours_of_sleep']
calories = data['calories_burned']

# ===== 1. Monthly step count =====
months = dates.astype('datetime64[M]')
unique_months = np.unique(months)
monthly_steps = [steps[months == m].sum() for m in unique_months]

print("📅 Monthly Step Count:")
for m, total in zip(unique_months, monthly_steps):
    print(f"{m}: {total}")

# ===== 2. Sleep vs Mood correlation =====
# Pearson's r written out by hand; np.corrcoef(sleep, mood)[0, 1] computes the same quantity
sleep_mean = sleep.mean()
mood_mean = mood.mean()
corr = np.sum((sleep - sleep_mean) * (mood - mood_mean)) / (
    np.sqrt(np.sum((sleep - sleep_mean)**2) * np.sum((mood - mood_mean)**2))
)
print("\n😴 Sleep vs Mood correlation:", corr)

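# ===== 3. Weekly summary =====
# Note: datetime64[W] bins are aligned to the Unix epoch (1970-01-01, a Thursday),
# so every "week start" printed below is a Thursday, not a Monday.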
weeks = dates.astype('datetime64[W]')
unique_weeks = np.unique(weeks)

print("\n📊 Weekly Summary (week start, total_steps, avg_sleep, avg_mood, total_calories):")
for w in unique_weeks:
    mask = weeks == w
    print(f"{w}: {steps[mask].sum()}, {sleep[mask].mean():.2f}, {mood[mask].mean():.2f}, {calories[mask].sum()}")

📅 Monthly Step Count:
2017-10: 79051
2017-11: 103071
2017-12: 89565
2018-01: 10163

😴 Sleep vs Mood correlation: 0.21041666447300156

📊 Weekly Summary (week start, total_steps, avg_sleep, avg_mood, total_calories):
2017-10-05: 28451, 5.50, 133.33, 924
2017-10-12: 19456, 6.14, 128.57, 622
2017-10-19: 19524, 5.71, 142.86, 621
2017-10-26: 16055, 6.14, 242.86, 517
2017-11-02: 24977, 3.86, 285.71, 811
2017-11-09: 27678, 5.86, 300.00, 907
2017-11-16: 20375, 5.43, 285.71, 668
2017-11-23: 21998, 6.14, 257.14, 721
2017-11-30: 20393, 7.00, 300.00, 654
2017-12-07: 22494, 4.14, 157.14, 720
2017-12-14: 22428, 4.43, 185.71, 729
2017-12-21: 19203, 4.43, 157.14, 628
2017-12-28: 13000, 3.57, 171.43, 422
2018-01-04: 5818, 4.67, 200.00, 27


## <a name='conclusion'></a>1️⃣7️⃣ Conclusion & Further Practice
Congrats on covering **all core NumPy topics** from your five lecture notebooks!

*Keep experimenting, read the official docs, and try converting your NumPy pipelines into Pandas or JAX for more fun.*