# Unit 2 Lab: Data Processing with Python (NumPy)

**Focus:** NumPy arrays, dimensions, reshaping, slicing, boolean masking, and broadcasting, plus a short simulation with random data.

In [None]:
import numpy as np
np.set_printoptions(precision=2)

# Create arrays
arr1 = np.arange(12)
arr2 = arr1.reshape(3,4)
print('1D array:', arr1)
print('\n2D array:\n', arr2)

## Task 1 — Array properties
- Check `ndim`, `shape`, `size`, `dtype` for `arr2`.

In [None]:
print('ndim:', arr2.ndim)
print('shape:', arr2.shape)
print('size:', arr2.size)
print('dtype:', arr2.dtype)

## Task 2 — Slicing & Masking
- Extract the second row, last column, and a submatrix `arr2[0:2, 1:3]`.
- Create a mask for elements greater than 5 and apply it.

In [None]:
print('Second row:', arr2[1])
print('Last column:', arr2[:, -1])
print('Submatrix:\n', arr2[0:2, 1:3])

mask = arr2 > 5
print('\nMask >5:\n', mask)
print('\nElements >5:', arr2[mask])

## Task 3 — Broadcasting & Arithmetic
- Add a vector `[1,10,100,1000]` to each row of `arr2` using broadcasting.
- Compute row-wise and column-wise sums.

In [None]:
vec = np.array([1,10,100,1000])
res = arr2 + vec
print('Broadcasted addition result:\n', res)

row_sum = arr2.sum(axis=1)
col_sum = arr2.sum(axis=0)
print('\nRow sums:', row_sum)
print('Col sums:', col_sum)

## Task 4 — Random Data Simulation
- Use `np.random` to simulate 1000 samples of 'daily steps' with mean 7000 and standard deviation 1500. Plot a histogram and compute the sample mean and standard deviation.

In [None]:
import matplotlib.pyplot as plt

# Draw 1000 values from N(7000, 1500), truncate to whole steps, and floor at 0
steps = np.random.normal(7000, 1500, 1000).astype(int).clip(min=0)
print('Simulated mean steps:', steps.mean())
print('Simulated std steps:', steps.std())

plt.figure(figsize=(8,4))
plt.hist(steps, bins=30)
plt.title('Histogram of Simulated Daily Steps')
plt.xlabel('Steps')
plt.ylabel('Count')
plt.show()

## Short Reflection
- How can broadcasting help avoid writing explicit loops? Give one practical example.


---
## Trainer's Answers & Expected Outputs — Unit 2 Lab

**Task 1 — Array properties**  
- `arr2.ndim` → 2, `arr2.shape` → (3, 4), `arr2.size` → 12, `arr2.dtype` → int64 on most platforms (int32 where the default integer is 32-bit, for example Windows with NumPy releases before 2.0).
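
If students on different machines report different dtypes, one option is to request the dtype explicitly. The following is a minimal sketch; the name `arr2_fixed` is illustrative and not part of the lab.

```python
import numpy as np

# Request a 64-bit integer dtype explicitly instead of relying on the platform default
arr2_fixed = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr2_fixed.ndim)      # 2
print(arr2_fixed.shape)     # (3, 4)
print(arr2_fixed.size)      # 12
print(arr2_fixed.dtype)     # int64
print(arr2_fixed.itemsize)  # 8 (bytes per element)
```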

**Task 2 — Slicing & Masking**  
- Second row: `[4 5 6 7]` (`reshape` fills rows first by default, so row 1 holds the values 4 to 7).
- Last column: `[ 3  7 11]`, the last element of each row.
- Submatrix `arr2[0:2,1:3]` → the 2x2 block `[[1 2] [5 6]]` (rows 0 and 1, columns 1 and 2).
- Mask `arr2 > 5` returns a boolean array of shape (3, 4); `arr2[mask]` returns the matching elements as a 1D array, `[ 6  7  8  9 10 11]`.
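
A point that may come up in discussion is that a basic slice and a boolean mask return different kinds of results. The sketch below rebuilds the lab's `arr2` and uses `np.shares_memory` to show the difference; the name `capped` is illustrative only.

```python
import numpy as np

arr2 = np.arange(12).reshape(3, 4)  # same array as in the lab

sub = arr2[0:2, 1:3]       # basic slice: a view onto arr2's data
mask = arr2 > 5
selected = arr2[mask]      # boolean indexing: a new 1D array (copy)

print(sub.shape, np.shares_memory(sub, arr2))            # (2, 2) True
print(selected.shape, np.shares_memory(selected, arr2))  # (6,) False

# Masks can also be used for assignment; work on a copy to keep arr2 intact
capped = arr2.copy()
capped[capped > 5] = 5
print(capped)
```

Because `sub` is a view, writing into it would also change `arr2`, whereas changes to `selected` leave `arr2` untouched.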

**Task 3 — Broadcasting & Arithmetic**  
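- Broadcasting adds the vector element-wise to each row: shapes (3, 4) and (4,) are compatible, so the result has shape (3, 4) with rows `[1 11 102 1003]`, `[5 15 106 1007]`, `[9 19 110 1011]`.
- Row sums (`axis=1`) → `[ 6 22 38]`, one value per row; column sums (`axis=0`) → `[12 15 18 21]`, one value per column.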
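
To connect this task with the reflection question, the sketch below compares an explicit loop with the broadcast expression and then shows a case where the shapes do not line up until the vector is reshaped. The names `looped`, `col`, and `per_row` are illustrative and not part of the lab.

```python
import numpy as np

arr2 = np.arange(12).reshape(3, 4)
vec = np.array([1, 10, 100, 1000])

# Explicit loop: add vec to one row at a time
looped = np.empty_like(arr2)
for i in range(arr2.shape[0]):
    looped[i] = arr2[i] + vec

# Broadcasting: shapes (3, 4) and (4,) align on the trailing axis
broadcast = arr2 + vec
print(np.array_equal(looped, broadcast))  # True

# A length-3 vector does not align with the last axis (length 4);
# reshaping it to (3, 1) adds one value per row instead
col = np.array([10, 20, 30])
per_row = arr2 + col[:, np.newaxis]
print(per_row)
```

A practical reading of the second case is adding a per-row offset, such as a different baseline for each subject in a table of measurements.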

**Task 4 — Random Data Simulation**  
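- The simulated mean should land near 7000 and the standard deviation near 1500; exact values differ on every run because no seed is set. The histogram should look roughly bell-shaped. The clip at 0 only guards against negative step counts and will rarely, if ever, change a value, since 0 lies about 4.7 standard deviations below the mean.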
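
For grading against fixed numbers, a seeded generator makes the run repeatable. This is a sketch that uses `np.random.default_rng` rather than the lab's `np.random.normal` call; the seed value 42 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed 42 is an arbitrary choice
steps = rng.normal(loc=7000, scale=1500, size=1000).astype(int).clip(min=0)

mean, std = steps.mean(), steps.std()
print(f'mean={mean:.1f}, std={std:.1f}')

# The standard error of the mean is roughly 1500 / sqrt(1000), about 47 steps,
# so a sample mean within a few hundred steps of 7000 is expected
sem = 1500 / np.sqrt(1000)
print(f'expected sampling error of the mean: about {sem:.0f} steps')
print('values equal to 0 after clipping:', int((steps == 0).sum()))
```

Rerunning the cell with the same seed reproduces the same `steps` array, so students and trainers can compare outputs exactly.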

**Grading / Discussion Tips:**  
- Ensure students demonstrate correct use of `reshape`, boolean masks, and broadcasting instead of loops.
- Look for a correct interpretation of the histogram and the numeric summary.
---
