# NumPy - Part 4: Statistics and Linear Algebra

Explore NumPy's powerful statistical functions and linear algebra capabilities for data analysis and scientific computing.

## What You'll Learn
- Statistical functions (mean, std, var, min, max)
- Percentiles and quantiles
- Random number generation
- Sorting and finding unique values
- Basic linear algebra operations
- Matrix properties (transpose, inverse, determinant)

## How to Use This Notebook
1. Read each problem description carefully
2. Write your solution in the code cell (replace `None` with your answer)
3. Run the check cell to verify your solution
4. If incorrect, review the hint and try again

**Problems:** 15 (Easy: 1-5, Medium: 6-10, Hard: 11-15)

In [None]:
# ============================================
# SETUP - Run this cell first!
# ============================================
import numpy as np
import sys
sys.path.insert(0, '..')
from utils.checker import check

np.random.seed(42)  # For reproducibility
print("Setup complete! NumPy version:", np.__version__)

---
## Problem 1: Find Maximum Value

### Difficulty: Easy

### Concept
`np.max()` or `.max()` finds the largest value in an array. This is a fundamental operation for data analysis, finding peaks, or identifying outliers.

### Syntax
```python
np.max(arr)         # Maximum of all elements
arr.max()           # Method syntax
np.max(arr, axis=0) # Maximum along axis 0
```

### Example
```python
arr = np.array([3, 7, 1, 9, 2])
np.max(arr)    # 9

arr = np.array([[1, 5], [3, 2]])
np.max(arr, axis=0)   # [3, 5] - max of each column
```
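The `axis` argument is easier to follow with both directions side by side. The sketch below uses a small made-up table of readings (the `temps` array is an illustrative assumption, not exercise data).

```python
import numpy as np

# Hypothetical readings: 2 rows (days) x 3 columns (sensors)
temps = np.array([[20, 25, 22],
                  [18, 30, 21]])

overall_max = np.max(temps)          # largest value in the whole array
column_max = np.max(temps, axis=0)   # collapse rows: one max per column
row_max = np.max(temps, axis=1)      # collapse columns: one max per row

print(overall_max)  # 30
print(column_max)   # [20 30 22]
print(row_max)      # [25 30]
```

A way to remember it: the axis you pass is the one that disappears from the result's shape.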

### Task
Find the maximum value in the given array.

### Expected Properties
- Should be a single integer
- Value should be 9

In [None]:
# Given data
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Your solution:
max_val = None

In [None]:
# Verification
check.is_not_none(max_val, "P1: Value check")
check.is_true(max_val == 9, "P1: Maximum value", "Should be 9")

---
## Problem 2: Find Index of Maximum

### Difficulty: Easy

### Concept
`np.argmax()` returns the index (position) of the maximum value, not the value itself. This is useful when you need to know where the maximum occurs, not just what it is. If the maximum occurs more than once, the index of its first occurrence is returned.

### Syntax
```python
np.argmax(arr)         # Index of maximum
arr.argmax()           # Method syntax
np.argmin(arr)         # Index of minimum
```

### Example
```python
arr = np.array([10, 5, 30, 15])
np.argmax(arr)    # 2 (30 is at index 2)
np.argmin(arr)    # 1 (5 is at index 1)
```
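On a 2D array, `np.argmax()` without an `axis` returns an index into the flattened array. The sketch below, using a made-up `grid`, shows one way to turn that flat index back into a (row, column) pair with `np.unravel_index`.

```python
import numpy as np

grid = np.array([[1, 9, 3],
                 [4, 5, 6]])  # illustrative data

flat_idx = np.argmax(grid)                          # index into grid.ravel()
row, col = np.unravel_index(flat_idx, grid.shape)   # convert to 2D coordinates

print(flat_idx)   # 1
print(row, col)   # 0 1

# With ties, the first occurrence wins
first_of_ties = np.argmax(np.array([7, 2, 7]))
print(first_of_ties)  # 0
```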

### Task
Find the index of the maximum value in the given array.

### Expected Properties
- Should be a single integer
- Value should be 5 (the position of 9)

In [None]:
# Given data
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Your solution:
max_idx = None

In [None]:
# Verification
check.is_not_none(max_idx, "P2: Value check")
check.is_true(max_idx == 5, "P2: Index of maximum", "Should be 5")

---
## Problem 3: Calculate Standard Deviation

### Difficulty: Easy

### Concept
Standard deviation measures how spread out the values are from the mean. A low standard deviation means values are close to the mean; a high value means they're spread out. This is crucial for understanding data variability.

### Formula
```
σ = √(Σ(x - μ)² / N)
where μ is the mean, N is the count
```
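As a rough check of the formula, the sketch below computes it by hand on a small made-up array and compares it with `np.std()`. It also shows the `ddof` parameter: the default `ddof=0` divides by N (population standard deviation, as in the formula above), while `ddof=1` divides by N - 1 (sample standard deviation).

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative values, not the exercise data

mu = data.mean()
manual_std = np.sqrt(np.sum((data - mu) ** 2) / data.size)  # divide by N

population_std = np.std(data)       # default ddof=0, divides by N
sample_std = np.std(data, ddof=1)   # divides by N - 1

print(np.isclose(manual_std, population_std))  # True
print(sample_std > population_std)             # True
```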

### Syntax
```python
np.std(arr)           # Standard deviation
arr.std()             # Method syntax
np.var(arr)           # Variance (std squared)
```

### Example
```python
arr = np.array([2, 4, 4, 4, 5, 5, 7, 9])
np.std(arr)    # 2.0
np.var(arr)    # 4.0
```

### Task
Calculate the standard deviation of the given array.

### Expected Properties
- Should be a float value
- Value should be approximately 2.0

In [None]:
# Given data
arr = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Your solution:
std_val = None

In [None]:
# Verification
check.is_not_none(std_val, "P3: Value check")
check.is_true(np.isclose(std_val, 2.0), "P3: Standard deviation", "Should be approximately 2.0")

---
## Problem 4: Sort Array

### Difficulty: Easy

### Concept
`np.sort()` returns a sorted copy of an array in ascending order. The original array remains unchanged. For descending order, you can reverse the sorted result.

### Syntax
```python
np.sort(arr)          # Sorted copy (ascending)
arr.sort()            # In-place sort (modifies arr)
np.sort(arr)[::-1]    # Descending order
```

### Example
```python
arr = np.array([3, 1, 4, 1, 5])
np.sort(arr)          # [1, 1, 3, 4, 5]
np.sort(arr)[::-1]    # [5, 4, 3, 1, 1]
```
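The difference between the copy and in-place forms is a common source of bugs, so here is a small sketch with made-up values. Note that `arr.sort()` returns `None`, so assigning its result to a variable loses the data.

```python
import numpy as np

values = np.array([8, 3, 5])  # illustrative data

sorted_copy = np.sort(values)       # new array
before_inplace = values.tolist()    # original order is still intact here

result = values.sort()              # sorts `values` itself, returns None

print(sorted_copy)      # [3 5 8]
print(before_inplace)   # [8, 3, 5]
print(values)           # [3 5 8]
print(result)           # None
```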

### Task
Sort the given array in ascending order.

### Expected Properties
- Should be an array of length 8
- First element should be 1
- Last element should be 9
- Should be sorted in ascending order

In [None]:
# Given data
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Your solution:
sorted_arr = None

In [None]:
# Verification
check.is_type(sorted_arr, np.ndarray, "P4: Type check")
check.has_length(sorted_arr, 8, "P4: Length check")
check.first_element_is(sorted_arr, 1, "P4: First element")
check.last_element_is(sorted_arr, 9, "P4: Last element")
check.is_sorted(sorted_arr, "P4: Should be sorted")

---
## Problem 5: Find Unique Values

### Difficulty: Easy

### Concept
`np.unique()` returns the sorted unique elements of an array, removing all duplicates. This is useful for finding distinct categories, values, or labels in your data.

### Syntax
```python
np.unique(arr)                    # Unique values only
np.unique(arr, return_counts=True)  # Also return counts of each
```

### Example
```python
arr = np.array([1, 2, 2, 3, 3, 3])
np.unique(arr)    # [1, 2, 3]

values, counts = np.unique(arr, return_counts=True)
# values: [1, 2, 3]
# counts: [1, 2, 3]
```
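One typical use of `return_counts=True` is finding the most frequent value. The sketch below combines it with `np.argmax()` on a hypothetical array of labels.

```python
import numpy as np

labels = np.array(["cat", "dog", "cat", "bird", "cat", "dog"])  # made-up labels

values, counts = np.unique(labels, return_counts=True)
most_common = values[np.argmax(counts)]  # value with the highest count

print(values)       # ['bird' 'cat' 'dog']
print(counts)       # [1 3 2]
print(most_common)  # cat
```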

### Task
Find the unique values in the given array.

### Expected Properties
- Should be an array of length 4
- Should contain values 1, 2, 3, 4
- Should be sorted

In [None]:
# Given data
arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Your solution:
unique = None

In [None]:
# Verification
check.is_type(unique, np.ndarray, "P5: Type check")
check.has_length(unique, 4, "P5: Length check")
check.first_element_is(unique, 1, "P5: First element")
check.last_element_is(unique, 4, "P5: Last element")
check.is_sorted(unique, "P5: Should be sorted")

---
## Problem 6: Calculate Median

### Difficulty: Medium

### Concept
The median is the middle value when data is sorted. If there's an even number of values, it's the average of the two middle values. Unlike the mean, the median is resistant to outliers.

### Syntax
```python
np.median(arr)         # Median of all elements
np.median(arr, axis=0) # Median along axis
```

### Example
```python
arr = np.array([1, 3, 5, 7, 9])
np.median(arr)    # 5.0 (middle value)

arr = np.array([1, 2, 3, 4])
np.median(arr)    # 2.5 (average of 2 and 3)
```
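To see what "resistant to outliers" means in practice, the sketch below adds a single extreme value to a made-up array and compares how far the mean and the median move.

```python
import numpy as np

salaries = np.array([40, 42, 45, 47, 50])   # hypothetical values, in thousands
with_outlier = np.append(salaries, 1000)    # one extreme value added

mean_before, median_before = np.mean(salaries), np.median(salaries)
mean_after, median_after = np.mean(with_outlier), np.median(with_outlier)

print(mean_before, median_before)  # 44.8 45.0
print(mean_after, median_after)    # 204.0 46.0
```

The mean jumps from 44.8 to 204.0, while the median only moves from 45.0 to 46.0.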

### Task
Calculate the median of the given array.

### Expected Properties
- Should be a float value
- Value should be 5.0

In [None]:
# Given data
arr = np.array([1, 3, 5, 7, 9])

# Your solution:
median_val = None

In [None]:
# Verification
check.is_not_none(median_val, "P6: Value check")
check.is_true(np.isclose(median_val, 5.0), "P6: Median value", "Should be 5.0")

---
## Problem 7: Calculate Percentile

### Difficulty: Medium

### Concept
A percentile indicates the value below which a given percentage of observations fall. The 75th percentile (also called Q3 or upper quartile) is the value below which 75% of the data falls.

### Common Percentiles
- 25th (Q1): First quartile
- 50th (Q2): Median
- 75th (Q3): Third quartile

### Syntax
```python
np.percentile(arr, q)     # q-th percentile
np.percentile(arr, 50)    # Same as median
np.percentile(arr, [25, 50, 75])  # Multiple percentiles
```

### Example
```python
arr = np.arange(1, 11)    # [1, 2, ..., 10]
np.percentile(arr, 50)    # 5.5 (median)
np.percentile(arr, 75)    # 7.75 (75th percentile)
```
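Results like 7.75 come from linear interpolation between neighbouring sorted values, which is the default method of `np.percentile()`. The sketch below reproduces that calculation by hand on a small made-up array; the helper variable names are illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40])  # illustrative data
q = 75

sorted_data = np.sort(data)
position = (q / 100) * (sorted_data.size - 1)  # fractional index: 2.25
lower = int(np.floor(position))                # index just below: 2
fraction = position - lower                    # distance past it: 0.25

# Move `fraction` of the way from sorted_data[lower] to the next value
manual = sorted_data[lower] + fraction * (sorted_data[lower + 1] - sorted_data[lower])

print(manual)                   # 32.5
print(np.percentile(data, q))   # 32.5
```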

### Task
Calculate the 75th percentile of an array containing values 1 to 100.

### Expected Properties
- Should be a float value
- Value should be approximately 75.25

In [None]:
# Given data
arr = np.arange(1, 101)

# Your solution:
p75 = None

In [None]:
# Verification
check.is_not_none(p75, "P7: Value check")
check.is_true(np.isclose(p75, 75.25), "P7: 75th percentile", "Should be approximately 75.25")

---
## Problem 8: Generate Random Integers

### Difficulty: Medium

### Concept
`np.random.randint()` generates random integers from the half-open range `[low, high)`: `low` is included, `high` is excluded. This is useful for sampling, shuffling data, or creating test datasets.

### Syntax
```python
np.random.randint(low, high, size)  # high is exclusive
np.random.randint(10, size=5)       # 5 random ints from 0-9
np.random.randint(1, 101, size=10)  # 10 random ints from 1-100
```

### Example
```python
np.random.seed(42)  # For reproducibility
np.random.randint(1, 7, size=5)    # five values, each in 1-6 (dice rolls)
```
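Because `high` is exclusive, an inclusive range needs `high + 1`. As an aside, NumPy (1.17 and later) also offers a Generator API created with `np.random.default_rng()`. This notebook does not use it, so treat the sketch below as optional background; its `integers()` method accepts `endpoint=True` to make the upper bound inclusive.

```python
import numpy as np

rng = np.random.default_rng(42)  # independent generator, does not touch np.random.seed

rolls = rng.integers(1, 7, size=5)                            # high exclusive: 1..6
rolls_inclusive = rng.integers(1, 6, size=5, endpoint=True)   # high inclusive: 1..6

print(rolls.min() >= 1 and rolls.max() <= 6)                      # True
print(rolls_inclusive.min() >= 1 and rolls_inclusive.max() <= 6)  # True
```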

### Task
Generate 10 random integers between 1 and 100 (inclusive).

### Expected Properties
- Should be an array of length 10
- All values should be between 1 and 100 (inclusive)
- Should be integer type

In [None]:
# Your solution:
np.random.seed(42)
random_ints = None

In [None]:
# Verification
check.is_type(random_ints, np.ndarray, "P8: Type check")
check.has_length(random_ints, 10, "P8: Length check")
check.all_values_in_range(random_ints, 1, 100, "P8: Range check")
check.is_true(np.issubdtype(random_ints.dtype, np.integer), "P8: Integer type", "Should be integer type")

---
## Problem 9: Generate Normal Distribution

### Difficulty: Medium

### Concept
`np.random.normal()` generates samples from a normal (Gaussian) distribution. This is fundamental for statistical modeling and simulations. The normal distribution is characterized by its mean (μ) and standard deviation (σ).

### Syntax
```python
np.random.normal(mean, std, size)
np.random.normal(0, 1, 1000)       # Standard normal (μ=0, σ=1)
np.random.normal(100, 15, 500)     # IQ scores
```

### Example
```python
np.random.seed(42)
samples = np.random.normal(50, 10, 1000)
np.mean(samples)   # Close to 50
np.std(samples)    # Close to 10
```
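A quick way to get a feel for σ is the rule of thumb that roughly 68% of normally distributed values lie within one standard deviation of the mean. The sketch below checks this empirically. It uses a separate generator from `np.random.default_rng()` (an assumption of this example, chosen so the notebook's global seed is left alone), and the exact fraction will vary with the seed and sample size.

```python
import numpy as np

rng = np.random.default_rng(0)       # local generator for this sketch only
mean, std = 50, 10
samples = rng.normal(mean, std, 10_000)

# Boolean mask -> mean gives the fraction of True values
within_one_std = np.mean(np.abs(samples - mean) <= std)

print(round(float(within_one_std), 2))  # expected to be near 0.68
```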

### Task
Generate 1000 samples from a normal distribution with mean=50 and standard deviation=10.

### Expected Properties
- Should be an array of length 1000
- Mean should be close to 50 (within ±2)
- Standard deviation should be close to 10 (within ±1)

In [None]:
# Your solution:
np.random.seed(42)
normal_samples = None

In [None]:
# Verification
check.is_type(normal_samples, np.ndarray, "P9: Type check")
check.has_length(normal_samples, 1000, "P9: Length check")
check.mean_is_close(normal_samples, 50, "P9: Mean check", tolerance=2)
check.std_is_close(normal_samples, 10, "P9: Std check", tolerance=1)

---
## Problem 10: Argsort

### Difficulty: Medium

### Concept
`np.argsort()` returns the indices that would sort an array, not the sorted values themselves. This is useful when you need to sort multiple related arrays in the same order or track the original positions.

### Syntax
```python
np.argsort(arr)          # Indices for ascending sort
np.argsort(arr)[::-1]    # Indices for descending sort
```

### Example
```python
arr = np.array([30, 10, 40, 20])
indices = np.argsort(arr)    # [1, 3, 0, 2]
arr[indices]                 # [10, 20, 30, 40] - sorted
```

### Use Case
```python
# Sort students by scores, keeping names aligned
scores = np.array([85, 92, 78])
names = np.array(['Alice', 'Bob', 'Carol'])
idx = np.argsort(scores)[::-1]  # Descending
names[idx]  # ['Bob', 'Alice', 'Carol']
```
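The same idea extends to picking the top k entries. The sketch below, using hypothetical names and scores, takes the last k indices of the ascending `argsort` result and reverses them so the best score comes first.

```python
import numpy as np

scores = np.array([85, 92, 78, 95])                 # made-up scores
names = np.array(["Alice", "Bob", "Carol", "Dave"])

k = 2
order = np.argsort(scores)        # ascending: [2 0 1 3]
top_k = order[-k:][::-1]          # last k indices, best first: [3 1]

print(names[top_k])    # ['Dave' 'Bob']
print(scores[top_k])   # [95 92]
```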

### Task
Get the indices that would sort the given array in ascending order.

### Expected Properties
- Should be an array of length 4
- First element should be 1 (index of 10)
- Last element should be 2 (index of 40)

In [None]:
# Given data
arr = np.array([30, 10, 40, 20])

# Your solution:
sort_indices = None

In [None]:
# Verification
check.is_type(sort_indices, np.ndarray, "P10: Type check")
check.has_length(sort_indices, 4, "P10: Length check")
check.first_element_is(sort_indices, 1, "P10: First element")
check.last_element_is(sort_indices, 2, "P10: Last element")
# Verify it actually sorts the array
check.is_sorted(arr[sort_indices], "P10: Should produce sorted array")

---
## Problem 11: Matrix Transpose

### Difficulty: Hard

### Concept
The transpose of a matrix flips it over its diagonal, converting rows to columns and vice versa. For a matrix A with shape (m, n), the transpose A^T has shape (n, m).

### Syntax
```python
arr.T              # Transpose property
np.transpose(arr)  # Transpose function
```

### Example
```python
A = np.array([[1, 2, 3],
              [4, 5, 6]])
A.T
# [[1, 4],
#  [2, 5],
#  [3, 6]]
```
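Two details about `.T` are easy to miss, and the sketch below illustrates both on made-up arrays: transposing a 1D array does nothing (use `reshape` to get a column vector), and the transpose of a 2D array is a view, so writing to it also changes the original.

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)              # (3,)  - unchanged for 1D arrays
column = v.reshape(-1, 1)     # explicit column vector
print(column.shape)           # (3, 1)

M = np.arange(6).reshape(2, 3)
Mt = M.T                      # a view onto the same data, not a copy
Mt[0, 1] = 99                 # same element as M[1, 0]
print(M[1, 0])                # 99
print(np.shares_memory(M, Mt))  # True
```

Call `M.T.copy()` if you need a transposed array that is independent of the original.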

### Use Cases
- Converting row vectors to column vectors
- Matrix operations in linear algebra
- Data reshaping (features as rows vs columns)

### Task
Compute the transpose of a 3x4 matrix.

### Expected Properties
- Should be a 2D array with shape (4, 3)
- Element at [0, 0] should equal element at [0, 0] of original
- Element at [1, 2] should equal element at [2, 1] of original

In [None]:
# Given data
matrix = np.arange(12).reshape(3, 4)
print("Original matrix:")
print(matrix)

# Your solution:
transposed = None

In [None]:
# Verification
check.is_type(transposed, np.ndarray, "P11: Type check")
check.has_shape(transposed, (4, 3), "P11: Shape check")
check.first_element_is(transposed.flatten(), 0, "P11: First element")
check.is_true(transposed[1, 2] == matrix[2, 1], "P11: Transpose property", "T[i,j] should equal original[j,i]")

---
## Problem 12: Matrix Inverse

### Difficulty: Hard

### Concept
The inverse of a matrix A (denoted A⁻¹) is a matrix such that A × A⁻¹ = I (identity matrix). Not all matrices have inverses - only square matrices with non-zero determinant.

### Properties
- Only square matrices can have inverses
- A × A⁻¹ = A⁻¹ × A = I
- Used to solve systems of linear equations

### Syntax
```python
np.linalg.inv(A)     # Matrix inverse
np.linalg.det(A)     # Determinant (must be non-zero)
```

### Example
```python
A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)
A @ A_inv   # Should be close to [[1, 0], [0, 1]]
```
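The sketch below shows two related points on small example matrices: solving A·x = b with the inverse, and what happens when the determinant is zero. `np.linalg.solve()` is included for comparison since it solves the system without forming the inverse explicitly, which is generally the preferred approach. The vector `b` and the `singular` matrix are illustrative assumptions.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])   # hypothetical right-hand side

x_inv = np.linalg.inv(A) @ b       # solve via the inverse
x_solve = np.linalg.solve(A, b)    # solve directly
print(x_inv, x_solve)              # both close to [-4.   4.5]

# Second row is 2x the first, so the determinant is 0
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
try:
    np.linalg.inv(singular)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
print(invertible)                  # False
```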

### Task
Calculate the inverse of the given 2x2 matrix.

### Expected Properties
- Should be a 2x2 array
- When multiplied with original, should give identity matrix
- Product should have 1s on diagonal and ~0s elsewhere

In [None]:
# Given data
A = np.array([[1, 2], [3, 4]])

# Your solution:
A_inv = None

In [None]:
# Verification
check.is_type(A_inv, np.ndarray, "P12: Type check")
check.has_shape(A_inv, (2, 2), "P12: Shape check")
# Verify A @ A_inv ≈ I
identity = A @ A_inv
check.is_true(np.allclose(identity, np.eye(2)), "P12: Inverse property", "A @ A_inv should equal identity matrix")

---
## Problem 13: Correlation Coefficient

### Difficulty: Hard

### Concept
The Pearson correlation coefficient measures the linear relationship between two variables. It ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear correlation.

### Interpretation
- +1: Perfect positive correlation (x increases → y increases)
- 0: No linear correlation
- -1: Perfect negative correlation (x increases → y decreases)

### Syntax
```python
np.corrcoef(x, y)      # Returns 2x2 correlation matrix
np.corrcoef(x, y)[0, 1]  # Extract correlation value
```

### Example
```python
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])  # y = 2x
corr_matrix = np.corrcoef(x, y)
corr = corr_matrix[0, 1]  # 1.0 (perfect correlation)
```
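For data that is not perfectly linear, it can help to see where the number comes from. The sketch below computes Pearson's r from its definition (sum of products of deviations, divided by the square root of the product of the summed squared deviations) on made-up data and compares it with `np.corrcoef()`.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])   # roughly increasing, not perfectly linear

dx = x - x.mean()   # deviations from the mean
dy = y - y.mean()

r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual)   # close to 0.8
print(r_numpy)    # close to 0.8
```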

### Task
Calculate the Pearson correlation coefficient between x and y. Extract just the correlation value from the matrix.

### Expected Properties
- Should be a single float value
- Value should be 1.0 (perfect positive correlation)

In [None]:
# Given data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Your solution:
corr = None

In [None]:
# Verification
check.is_not_none(corr, "P13: Value check")
check.is_true(np.isclose(corr, 1.0), "P13: Correlation value", "Should be 1.0")

---
## Problem 14: Covariance Matrix

### Difficulty: Hard

### Concept
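The covariance matrix shows how variables vary together. For two variables, it's a 2x2 matrix where diagonal elements are variances and off-diagonal elements are covariances. Note that `np.cov()` divides by N - 1 by default (sample covariance), whereas `np.var()` divides by N by default, so the diagonal matches `np.var(x, ddof=1)` rather than `np.var(x)`.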

### Matrix Structure
```
[[var(x),    cov(x,y)],
 [cov(y,x),  var(y)  ]]
```

### Interpretation
- Positive covariance: Variables increase together
- Negative covariance: One increases as other decreases
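- Covariance magnitude depends on the units and scale of the variables, so it is hard to interpret on its own; use the correlation coefficient to compare the strength of relationships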

### Syntax
```python
np.cov(x, y)      # Covariance matrix
np.cov(x, y)[0, 1]  # Extract covariance value
```

### Example
```python
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])  # Opposite direction
cov_matrix = np.cov(x, y)
cov_matrix[0, 1]  # Negative value
```
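The sketch below, on made-up data, computes the sample covariance by hand (dividing by N - 1, which is `np.cov()`'s default) and then rescales one variable to show that covariance changes with units while correlation does not.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 9.0])   # illustrative data

# Sample covariance by hand: divide by N - 1
manual_cov = np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1)
cov_xy = np.cov(x, y)[0, 1]
print(np.isclose(manual_cov, cov_xy))        # True

# Rescale y (e.g. metres -> centimetres): covariance scales, correlation does not
y_scaled = y * 100
cov_scaled = np.cov(x, y_scaled)[0, 1]
print(np.isclose(cov_scaled, 100 * cov_xy))  # True
print(np.isclose(np.corrcoef(x, y)[0, 1],
                 np.corrcoef(x, y_scaled)[0, 1]))  # True
```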

### Task
Calculate the covariance matrix of x and y (perfectly negatively correlated).

### Expected Properties
- Should be a 2x2 array
- Off-diagonal element should be -2.5 (negative covariance)

In [None]:
# Given data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Your solution:
cov_matrix = None

In [None]:
# Verification
check.is_type(cov_matrix, np.ndarray, "P14: Type check")
check.has_shape(cov_matrix, (2, 2), "P14: Shape check")
check.is_true(np.isclose(cov_matrix[0, 1], -2.5), "P14: Covariance value", "Should be -2.5")

---
## Problem 15: Eigenvalues and Eigenvectors

### Difficulty: Hard

### Concept
For a matrix A, an eigenvector v and its eigenvalue λ satisfy: A×v = λ×v. Eigenvectors point in directions that are only scaled (not rotated) by the matrix. Eigenvalues tell how much scaling occurs.

### Applications
- Principal Component Analysis (PCA)
- Stability analysis in differential equations
- Google's PageRank algorithm
- Quantum mechanics

### Properties
- Sum of eigenvalues = trace (sum of diagonal)
- Product of eigenvalues = determinant

### Syntax
```python
eigenvalues, eigenvectors = np.linalg.eig(A)
```

### Example
```python
A = np.array([[2, 0], [0, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
# eigenvalues: [2, 3]
# eigenvectors: columns are the eigenvectors
```
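The defining equation and the two properties listed above can be checked numerically. The sketch below does so for a small symmetric matrix chosen for illustration. The order in which `np.linalg.eig()` returns eigenvalues is not guaranteed, so the checks avoid relying on it.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # illustrative matrix

eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i]; check A v = λ v for each
checks = [
    np.allclose(A @ eigenvectors[:, i], eigenvalues[i] * eigenvectors[:, i])
    for i in range(len(eigenvalues))
]
print(all(checks))   # True

print(np.isclose(eigenvalues.sum(), np.trace(A)))        # True (sum = trace)
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))  # True (product = determinant)
```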

### Task
Calculate the eigenvalues of the given 2x2 matrix.

### Expected Properties
- Should be an array of length 2
- Sum of eigenvalues should be 7 (trace of matrix)

In [None]:
# Given data
A = np.array([[4, 2], [1, 3]])
print("Matrix:")
print(A)
print("Trace (sum of diagonal):", np.trace(A))

# Your solution:
eigenvalues = None

In [None]:
# Verification
check.is_type(eigenvalues, np.ndarray, "P15: Type check")
check.has_length(eigenvalues, 2, "P15: Length check")
check.is_true(np.isclose(np.sum(eigenvalues), 7), "P15: Sum of eigenvalues", "Should equal trace (7)")

---
## Summary

Run the cell below to see your overall progress!

In [None]:
check.summary()

---
## Key Takeaways

### Statistical Functions
| Function | Purpose | Example |
|----------|---------|----------|
| `np.mean()` | Average | `arr.mean()` |
| `np.median()` | Middle value | `np.median(arr)` |
| `np.std()` | Standard deviation | `arr.std()` |
| `np.var()` | Variance | `arr.var()` |
| `np.percentile()` | Percentile value | `np.percentile(arr, 75)` |
| `np.max()` / `np.min()` | Maximum/minimum | `arr.max()` |
| `np.argmax()` / `np.argmin()` | Index of max/min | `arr.argmax()` |

### Array Manipulation
| Function | Purpose | Example |
|----------|---------|----------|
| `np.sort()` | Sort array | `np.sort(arr)` |
| `np.argsort()` | Sorting indices | `np.argsort(arr)` |
| `np.unique()` | Unique values | `np.unique(arr)` |

### Random Number Generation
| Function | Purpose | Example |
|----------|---------|----------|
| `np.random.rand()` | Uniform [0,1) | `np.random.rand(5)` |
| `np.random.randint()` | Random integers | `np.random.randint(1, 10, 5)` |
| `np.random.normal()` | Normal distribution | `np.random.normal(0, 1, 100)` |
| `np.random.seed()` | Set random seed | `np.random.seed(42)` |

### Linear Algebra
| Function | Purpose | Example |
|----------|---------|----------|
| `arr.T` | Transpose | `matrix.T` |
| `np.linalg.inv()` | Matrix inverse | `np.linalg.inv(A)` |
| `np.linalg.det()` | Determinant | `np.linalg.det(A)` |
| `np.linalg.eig()` | Eigenvalues/vectors | `np.linalg.eig(A)` |
| `np.corrcoef()` | Correlation matrix | `np.corrcoef(x, y)` |
| `np.cov()` | Covariance matrix | `np.cov(x, y)` |

### Important Concepts
- **Standard Deviation**: Measures spread of data
- **Median**: Resistant to outliers (unlike mean)
- **Percentiles**: Useful for understanding data distribution
- **Correlation**: Measures linear relationship (-1 to +1)
- **Eigenvalues**: Fundamental in dimensionality reduction and PCA

### Congratulations!
You've completed the NumPy module! You now have a solid foundation in:
- Array creation and manipulation
- Indexing and slicing
- Mathematical operations and broadcasting
- Statistical analysis
- Linear algebra operations

### Next Steps
Continue to the **Pandas module** to learn data manipulation with DataFrames!