# Week 2b: NumPy Fundamentals
## ISM 6251: Introduction to Machine Learning

### Learning Objectives
By the end of this notebook, you will be able to:
1. Create and manipulate NumPy arrays
2. Perform array indexing and slicing
3. Use array broadcasting
4. Apply mathematical operations on arrays
5. Understand array shapes and reshaping
6. Generate random numbers with NumPy

## 1. Introduction to NumPy

NumPy (Numerical Python) is the foundation for scientific computing in Python. It provides:
- Efficient array operations
- Mathematical functions
- Tools for linear algebra
- Random number generation

In [None]:
import numpy as np

# Check NumPy version
print(f"NumPy version: {np.__version__}")

## 2. Creating NumPy Arrays

### From Python Lists

In [None]:
# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print(f"1D array: {arr1d}")
print(f"Shape: {arr1d.shape}")
print(f"Data type: {arr1d.dtype}")

# 2D array
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print(f"\n2D array:\n{arr2d}")
print(f"Shape: {arr2d.shape}")
print(f"Dimensions: {arr2d.ndim}")
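
Every element of an array shares a single dtype, so when the input list mixes types NumPy picks a common one. This supplementary sketch shows that inference and how to override it with `dtype=`:

```python
import numpy as np

# Mixed inputs are upcast to one common dtype
ints = np.array([1, 2, 3])                         # platform default integer, typically int64
mixed = np.array([1, 2.5, 3])                      # one float makes the whole array float64
as_text = np.array([1, "a"])                       # a string turns every element into a string
explicit = np.array([1, 2, 3], dtype=np.float32)   # request a dtype explicitly

print(ints.dtype, mixed.dtype, as_text.dtype, explicit.dtype)
```

The string case matters when loading data: a single non-numeric entry in a column silently turns the whole array into text.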

### Using NumPy Functions

In [None]:
# Zeros
zeros = np.zeros((3, 4))
print(f"Zeros:\n{zeros}")

# Ones
ones = np.ones((2, 3))
print(f"\nOnes:\n{ones}")

# Full (filled with specific value)
full = np.full((2, 2), 7)
print(f"\nFull:\n{full}")

# Identity matrix
identity = np.eye(3)
print(f"\nIdentity:\n{identity}")
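
The constructors above do not all produce the same dtype. This sketch shows the defaults and the `*_like` helpers that copy the shape and dtype of an existing array:

```python
import numpy as np

zeros = np.zeros((3, 4))   # zeros/ones/eye default to float64
full = np.full((2, 2), 7)  # np.full infers the dtype from the fill value (integer here)
print(zeros.dtype, full.dtype)

# Override the default, or reuse the shape and dtype of another array
int_zeros = np.zeros((3, 4), dtype=int)
like = np.ones_like(full)  # shape (2, 2), same dtype as `full`, filled with 1
print(int_zeros.dtype, like.shape, like.dtype)
```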

### Range Arrays

In [None]:
# arange(start, stop, step): like Python's range, the stop value is excluded
arr_range = np.arange(0, 10, 2)
print(f"Arange: {arr_range}")

# linspace(start, stop, num): num evenly spaced values, stop included
arr_linspace = np.linspace(0, 1, 5)
print(f"Linspace: {arr_linspace}")

# logspace(start, stop, num): num values from 10**start to 10**stop
arr_logspace = np.logspace(0, 2, 5)
print(f"Logspace: {arr_logspace}")
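
A short sketch of how the three range functions relate, including a floating-point pitfall with `arange` that makes `linspace` the safer choice for float ranges:

```python
import numpy as np

print(np.arange(0, 10, 2))    # [0 2 4 6 8]: stop value excluded
print(np.linspace(0, 1, 5))   # 5 values, both 0 and 1 included

# With a float step, rounding can change the element count:
# (1.3 - 1) / 0.1 evaluates to slightly more than 3, so a value of about 1.3 sneaks in
tricky = np.arange(1, 1.3, 0.1)
print(len(tricky))            # 4

# linspace fixes the count instead of the step
safe = np.linspace(1, 1.2, 3)

# logspace(a, b, n) equals 10 raised to linspace(a, b, n)
log_pts = np.logspace(0, 2, 5)
print(np.allclose(log_pts, 10 ** np.linspace(0, 2, 5)))  # True
```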

## 3. Array Properties and Attributes

In [None]:
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print(f"Array:\n{arr}")
print(f"\nShape: {arr.shape}")
print(f"Size (total elements): {arr.size}")
print(f"Dimensions: {arr.ndim}")
print(f"Data type: {arr.dtype}")
print(f"Item size (bytes): {arr.itemsize}")
print(f"Total bytes: {arr.nbytes}")
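
The memory attributes are related: `nbytes` is the element count times the bytes per element. The sketch below checks that identity and shows how `astype` to a smaller dtype (an illustrative choice here) reduces memory at the cost of range:

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)
# nbytes = size (number of elements) * itemsize (bytes per element)
print(arr.nbytes == arr.size * arr.itemsize)  # True

# int8 stores each value in 1 byte but can only hold -128..127
small = arr.astype(np.int8)
print(small.itemsize, small.nbytes)           # 1 12
```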

## 4. Array Indexing and Slicing

### 1D Array Indexing

In [None]:
arr1d = np.array([10, 20, 30, 40, 50])

print(f"Array: {arr1d}")
print(f"First element: {arr1d[0]}")
print(f"Last element: {arr1d[-1]}")
print(f"Slice [1:4]: {arr1d[1:4]}")
print(f"Every other element: {arr1d[::2]}")
print(f"Reversed: {arr1d[::-1]}")

### 2D Array Indexing

In [None]:
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(f"Array:\n{arr2d}")
print(f"\nElement at [1,2]: {arr2d[1, 2]}")
print(f"First row: {arr2d[0]}")
print(f"First column: {arr2d[:, 0]}")
print(f"Sub-array [0:2, 1:3]:\n{arr2d[0:2, 1:3]}")
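
One consequence of slicing that often surprises newcomers: basic slices return a *view* of the same data rather than a copy. This sketch shows that writing through a slice changes the original array, and how `.copy()` avoids it:

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# A basic slice is a view: writing to it writes to arr2d
first_row = arr2d[0]
first_row[0] = 100
print(arr2d[0, 0])   # 100

# .copy() gives an independent array
row_copy = arr2d[1].copy()
row_copy[0] = -1
print(arr2d[1, 0])   # 4, unchanged

print(np.shares_memory(first_row, arr2d), np.shares_memory(row_copy, arr2d))  # True False
```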

### Boolean Indexing

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Boolean mask
mask = arr > 5
print(f"Array: {arr}")
print(f"Mask (arr > 5): {mask}")
print(f"Elements > 5: {arr[mask]}")

# Direct boolean indexing
print(f"Even numbers: {arr[arr % 2 == 0]}")
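
Masks can be combined element-wise with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` do not work on arrays, and the parentheses are required because `&` and `|` bind more tightly than comparisons. A brief sketch, plus `np.where` for choosing values instead of filtering them:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

between = arr[(arr > 2) & (arr < 7)]
print(between)   # [3 4 5 6]
outside = arr[(arr < 3) | (arr > 7)]
print(outside)   # [1 2 8 9]

# np.where(condition, a, b) picks from a where True and from b where False
labels = np.where(arr > 5, "high", "low")
print(labels)
```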

## 5. Array Operations

### Element-wise Operations

In [None]:
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

print(f"arr1: {arr1}")
print(f"arr2: {arr2}")
print(f"\nAddition: {arr1 + arr2}")
print(f"Subtraction: {arr2 - arr1}")
print(f"Multiplication: {arr1 * arr2}")
print(f"Division: {arr2 / arr1}")
print(f"Power: {arr1 ** 2}")
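
Division deserves a closer look: `/` always produces floats, while `//` and `%` keep integers. Element-wise operations also require compatible shapes, as this sketch shows:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

true_div = arr2 / arr1    # float64 result
floor_div = arr2 // arr1  # [5 3 2 2]
remainder = arr2 % arr1   # [0 0 1 0]
print(true_div, floor_div, remainder)

# Arrays of length 4 and 3 cannot be combined element-wise
caught = False
try:
    arr1 + np.array([1, 2, 3])
except ValueError as err:
    caught = True
    print("Shape mismatch:", err)
```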

### Scalar Operations

In [None]:
arr = np.array([1, 2, 3, 4, 5])

print(f"Original: {arr}")
print(f"Add 10: {arr + 10}")
print(f"Multiply by 2: {arr * 2}")
print(f"Square root: {np.sqrt(arr)}")
print(f"Exponential: {np.exp(arr)}")

## 6. Broadcasting

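Broadcasting lets NumPy combine arrays of different shapes without copying data. Shapes are compared starting from the trailing (rightmost) dimension: two dimensions are compatible when they are equal or when one of them is 1, and a size-1 or missing dimension is stretched to match the other.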

In [None]:
# 2D array and 1D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
vector = np.array([1, 0, 1])

print(f"Matrix:\n{matrix}")
print(f"\nVector: {vector}")
print(f"\nMatrix + Vector (broadcasting):\n{matrix + vector}")

# Column vector
col_vector = np.array([[10], [20], [30]])
print(f"\nColumn vector:\n{col_vector}")
print(f"\nMatrix + Column vector:\n{matrix + col_vector}")
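
The shape rules can be checked without building arrays using `np.broadcast_shapes` (available in NumPy 1.20 and later). This sketch also shows a typical machine-learning use: centering each feature column by subtracting its mean.

```python
import numpy as np

print(np.broadcast_shapes((3, 3), (3,)))    # (3, 3): vector repeated for every row
print(np.broadcast_shapes((3, 3), (3, 1)))  # (3, 3): column repeated across columns
print(np.broadcast_shapes((3, 1), (4,)))    # (3, 4): both operands stretch

# Center each column: (3, 2) minus the (2,) vector of column means
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
X_centered = X - X.mean(axis=0)
print(X_centered)

# Trailing dimensions 3 and 4 are incompatible
caught = False
try:
    np.ones((3, 3)) + np.ones(4)
except ValueError as err:
    caught = True
    print("Cannot broadcast:", err)
```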

## 7. Array Reshaping

In [None]:
# Create a 1D array
arr = np.arange(12)
print(f"Original: {arr}")
print(f"Shape: {arr.shape}")

# Reshape to 2D
reshaped_2d = arr.reshape(3, 4)
print(f"\nReshaped to (3,4):\n{reshaped_2d}")

# Reshape to 3D
reshaped_3d = arr.reshape(2, 2, 3)
print(f"\nReshaped to (2,2,3):\n{reshaped_3d}")

# Flatten back to 1D
flattened = reshaped_2d.flatten()
print(f"\nFlattened: {flattened}")
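
Two reshaping details worth knowing: passing `-1` lets NumPy infer one dimension, and `flatten()` always copies while `ravel()` returns a view when it can. A short sketch:

```python
import numpy as np

arr = np.arange(12)

print(arr.reshape(3, -1).shape)  # (3, 4): -1 is inferred from the 12 elements
print(arr.reshape(-1, 2).shape)  # (6, 2)

m = arr.reshape(3, 4)
flat_copy = m.flatten()          # always a new array
flat_view = m.ravel()            # a view here, because m is contiguous
print(np.shares_memory(flat_copy, m), np.shares_memory(flat_view, m))  # False True

# The total number of elements must stay the same (12 != 15)
caught = False
try:
    arr.reshape(5, 3)
except ValueError as err:
    caught = True
    print("Cannot reshape:", err)
```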

### Transpose

In [None]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(f"Original:\n{matrix}")
print(f"Shape: {matrix.shape}")

transposed = matrix.T
print(f"\nTransposed:\n{transposed}")
print(f"Shape: {transposed.shape}")
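
Transposing only swaps existing axes, so a 1D array is unchanged by `.T`. This sketch shows how to add an axis explicitly to get a row or column vector:

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)              # (3,): transposing a 1D array does nothing

col = v[:, np.newaxis]        # same as v.reshape(-1, 1)
row = v[np.newaxis, :]        # same as v.reshape(1, -1)
print(col.shape, row.shape)   # (3, 1) (1, 3)
print(col.T.shape)            # (1, 3)
```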

## 8. Mathematical Functions

### Trigonometric Functions

In [None]:
angles = np.array([0, np.pi/4, np.pi/2, np.pi])

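print(f"Angles (radians): {angles}")
# Floating-point rounding means sin(pi) prints as a tiny number (about 1.2e-16)
# rather than 0, and tan(pi/2) is a huge finite value rather than undefined
print(f"Sin: {np.sin(angles)}")
print(f"Cos: {np.cos(angles)}")
print(f"Tan: {np.tan(angles)}")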

### Statistical Functions

In [None]:
data = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

print(f"Data: {data}")
print(f"\nMean: {np.mean(data)}")
print(f"Median: {np.median(data)}")
# np.std and np.var divide by n (population formula) unless you pass ddof=1
print(f"Standard deviation: {np.std(data)}")
print(f"Variance: {np.var(data)}")
print(f"Min: {np.min(data)}")
print(f"Max: {np.max(data)}")
print(f"Sum: {np.sum(data)}")
print(f"Cumulative sum: {np.cumsum(data)}")
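
For this data the squared deviations from the mean (11) sum to 330, so the population variance is 330 / 10 = 33 and the sample variance is 330 / 9. The `ddof` ("delta degrees of freedom") argument switches between the two:

```python
import numpy as np

data = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

pop_var = np.var(data)              # divides by n: 33.0
sample_var = np.var(data, ddof=1)   # divides by n - 1: about 36.67
sample_std = np.std(data, ddof=1)   # pandas' .std() uses ddof=1 by default
print(pop_var, sample_var, sample_std)
```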

### Aggregation Along Axes

In [None]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(f"Matrix:\n{matrix}")
print(f"\nSum of all elements: {np.sum(matrix)}")
print(f"Column sums (axis=0 collapses the rows): {np.sum(matrix, axis=0)}")
print(f"Row sums (axis=1 collapses the columns): {np.sum(matrix, axis=1)}")
print(f"\nColumn means (axis=0): {np.mean(matrix, axis=0)}")
print(f"Row means (axis=1): {np.mean(matrix, axis=1)}")
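
Aggregating removes an axis, which can make the result hard to combine with the original. `keepdims=True` keeps that axis with size 1 so the result broadcasts back. A sketch that turns each row into proportions:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)

row_sums = matrix.sum(axis=1, keepdims=True)  # shape (3, 1) instead of (3,)
print(row_sums.shape)

# (3, 3) / (3, 1) divides each row by its own total
row_props = matrix / row_sums
print(row_props.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Without `keepdims`, `matrix / matrix.sum(axis=1)` would broadcast the (3,) vector across rows and divide column-wise instead.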

## 9. Array Concatenation and Splitting

### Concatenation

In [None]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenate 1D arrays
concat = np.concatenate([arr1, arr2])
print(f"Concatenated: {concat}")

# 2D arrays
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Vertical stack
vstack = np.vstack([matrix1, matrix2])
print(f"\nVertical stack:\n{vstack}")

# Horizontal stack
hstack = np.hstack([matrix1, matrix2])
print(f"\nHorizontal stack:\n{hstack}")
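
For 2D arrays, `vstack` and `hstack` are equivalent to `np.concatenate` along axis 0 and axis 1. `np.stack` is different: it joins arrays along a *new* axis. A brief comparison:

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

same_v = np.array_equal(np.vstack([matrix1, matrix2]),
                        np.concatenate([matrix1, matrix2], axis=0))
same_h = np.array_equal(np.hstack([matrix1, matrix2]),
                        np.concatenate([matrix1, matrix2], axis=1))
print(same_v, same_h)  # True True

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
stacked_rows = np.stack([arr1, arr2])          # shape (2, 3)
stacked_cols = np.stack([arr1, arr2], axis=1)  # shape (3, 2)
print(stacked_rows.shape, stacked_cols.shape)
```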

### Splitting

In [None]:
arr = np.arange(12)
print(f"Original: {arr}")

# Split into 3 equal parts
splits = np.split(arr, 3)
for i, split in enumerate(splits):
    print(f"Split {i+1}: {split}")

# 2D splitting
matrix = np.arange(16).reshape(4, 4)
print(f"\nMatrix:\n{matrix}")

# Split into left and right column blocks (hsplit splits along axis=1)
hsplits = np.hsplit(matrix, 2)
print("\nHorizontal splits:")
for i, split in enumerate(hsplits):
    print(f"Split {i+1}:\n{split}")
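
`np.split` with an integer requires the array to divide evenly. This sketch shows the error it raises otherwise, the `np.array_split` alternative, and splitting at explicit indices:

```python
import numpy as np

arr = np.arange(10)

caught = False
try:
    np.split(arr, 3)  # 10 elements cannot be split into 3 equal parts
except ValueError as err:
    caught = True
    print("np.split failed:", err)

# array_split allows unequal pieces; the first pieces get the extra elements
parts = np.array_split(arr, 3)
print([len(p) for p in parts])  # [4, 3, 3]

# A list of indices splits at those positions: arr[:2], arr[2:5], arr[5:]
pieces = np.split(arr, [2, 5])
print(pieces)
```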

## 10. Random Number Generation

### Basic Random Numbers

In [None]:
# Set seed for reproducibility
np.random.seed(42)

# Random floats between 0 and 1
rand_uniform = np.random.random(5)
print(f"Uniform [0,1): {rand_uniform}")

# Random integers
rand_int = np.random.randint(1, 11, size=5)
print(f"Random integers [1,10]: {rand_int}")

# Random choice
choices = np.random.choice(['A', 'B', 'C', 'D'], size=10)
print(f"Random choices: {choices}")
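
The `np.random.seed` / `np.random.*` functions above use a single global state. NumPy's documentation recommends the `Generator` API from `np.random.default_rng` for new code. A sketch of the same three operations with it (the numbers will differ from the legacy calls even with the same seed):

```python
import numpy as np

rng = np.random.default_rng(42)              # each Generator carries its own state

rand_uniform = rng.random(5)                 # floats in [0, 1)
rand_int = rng.integers(1, 11, size=5)       # high is exclusive: values 1..10
choices = rng.choice(['A', 'B', 'C', 'D'], size=10)
print(rand_uniform, rand_int, choices, sep="\n")

# A new generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(42)
print(np.array_equal(rng_again.random(5), rand_uniform))  # True
```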

### Random Distributions

In [None]:
# Normal distribution
normal = np.random.normal(loc=0, scale=1, size=5)
print(f"Normal (mean=0, std=1): {normal}")

# Uniform distribution
uniform = np.random.uniform(low=0, high=10, size=5)
print(f"Uniform [0,10): {uniform}")

# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=5)
print(f"Binomial (n=10, p=0.5): {binomial}")

# 2D random array
rand_2d = np.random.randn(3, 3)
print(f"\n2D Normal distribution:\n{rand_2d}")
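
With only five draws the samples above can look far from their parameters. A quick sanity check, sketched with the `Generator` API and an arbitrary seed, is to draw many samples and compare the sample statistics with the distribution's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

normal = rng.normal(loc=5, scale=2, size=100_000)
print(normal.mean(), normal.std())   # close to 5 and 2

binomial = rng.binomial(n=10, p=0.5, size=100_000)
print(binomial.mean())               # close to n * p = 5
```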

## 11. Practical Examples

### Example 1: Data Normalization

In [None]:
# Generate sample data: 100 draws from a normal distribution (mean 100, std 50)
data = np.random.randn(100) * 50 + 100

# Min-max normalization
data_minmax = (data - data.min()) / (data.max() - data.min())

# Z-score normalization
data_zscore = (data - data.mean()) / data.std()

print(f"Original - Mean: {data.mean():.2f}, Std: {data.std():.2f}")
print(f"Min-Max - Min: {data_minmax.min():.2f}, Max: {data_minmax.max():.2f}")
print(f"Z-score - Mean: {data_zscore.mean():.2e}, Std: {data_zscore.std():.2f}")
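
In machine learning the same z-score formula is usually applied to each feature (column) of a 2D matrix separately. With `axis=0` the means and standard deviations are per column, and broadcasting applies them to every row. The feature matrix below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 100 samples, 3 features on very different scales
X = rng.normal(loc=[10, 200, 0.5], scale=[2, 50, 0.1], size=(100, 3))

# (100, 3) minus (3,) and divided by (3,): one mean/std per column
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0).round(6))  # approximately [0, 0, 0]
print(X_scaled.std(axis=0))            # approximately [1, 1, 1]
```

When a model is evaluated on held-out data, the means and standard deviations are normally computed on the training set only and then reused for the test set.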

### Example 2: Distance Calculation

In [None]:
# Two points in 3D space
point1 = np.array([1, 2, 3])
point2 = np.array([4, 6, 8])

# Euclidean distance
euclidean = np.sqrt(np.sum((point2 - point1)**2))
# Or using np.linalg.norm
euclidean2 = np.linalg.norm(point2 - point1)

# Manhattan distance
manhattan = np.sum(np.abs(point2 - point1))

print(f"Point 1: {point1}")
print(f"Point 2: {point2}")
print(f"Euclidean distance (manual): {euclidean:.2f}")
print(f"Euclidean distance (np.linalg.norm): {euclidean2:.2f}")
print(f"Manhattan distance: {manhattan}")
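
Algorithms such as k-nearest neighbors need distances between *every* pair of points. Broadcasting computes all of them at once without a Python loop; a sketch with three illustrative points:

```python
import numpy as np

points = np.array([[1, 2, 3],
                   [4, 6, 8],
                   [0, 0, 0]])

# (3, 1, 3) minus (1, 3, 3) broadcasts to (3, 3, 3): one difference vector per pair
diffs = points[:, None, :] - points[None, :, :]
dist_matrix = np.sqrt((diffs ** 2).sum(axis=-1))  # shape (3, 3)

print(dist_matrix.round(2))
print(np.isclose(dist_matrix[0, 1], np.linalg.norm(points[1] - points[0])))  # True
```

The result is symmetric with zeros on the diagonal. For large datasets the (n, n, d) intermediate array can use a lot of memory.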

### Example 3: Moving Average

In [None]:
# Generate time series data
time_series = np.cumsum(np.random.randn(20))

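# Calculate 3-point moving average; mode='valid' keeps only windows that fit
# entirely inside the data, giving len(time_series) - window + 1 = 18 values
window = 3
moving_avg = np.convolve(time_series, np.ones(window)/window, mode='valid')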

print(f"Original data: {time_series[:10]}")
print(f"Moving average: {moving_avg[:8]}")
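
An alternative to convolution uses running totals: each window sum is the difference of two cumulative sums. This sketch, with its own seeded random walk, checks that both approaches agree:

```python
import numpy as np

rng = np.random.default_rng(7)
time_series = np.cumsum(rng.standard_normal(20))
window = 3

conv_avg = np.convolve(time_series, np.ones(window) / window, mode='valid')

# csum[k] is the sum of the first k values, so csum[i + window] - csum[i]
# is the sum of time_series[i : i + window]
csum = np.cumsum(np.insert(time_series, 0, 0.0))
cumsum_avg = (csum[window:] - csum[:-window]) / window

print(len(conv_avg), len(cumsum_avg))     # 18 18
print(np.allclose(conv_avg, cumsum_avg))  # True
```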

## 12. Performance Comparison: Lists vs NumPy

In [None]:
import time

# Create large lists and arrays
size = 1000000
list1 = list(range(size))
list2 = list(range(size))
arr1 = np.arange(size)
arr2 = np.arange(size)

# List addition (time.perf_counter is a high-resolution clock intended for timing)
start = time.perf_counter()
result_list = [a + b for a, b in zip(list1, list2)]
list_time = time.perf_counter() - start

# NumPy addition
start = time.perf_counter()
result_arr = arr1 + arr2
numpy_time = time.perf_counter() - start

print(f"List operation time: {list_time:.4f} seconds")
print(f"NumPy operation time: {numpy_time:.4f} seconds")
# A single run is noisy; the exact speedup varies by machine
print(f"NumPy was about {list_time/numpy_time:.1f}x faster in this run")

## Practice Exercises

### Exercise 1: Array Manipulation

In [None]:
# Create a 5x5 matrix with values 1-25
matrix = np.arange(1, 26).reshape(5, 5)
print(f"Matrix:\n{matrix}")

# Extract the diagonal
diagonal = np.diag(matrix)
print(f"\nDiagonal: {diagonal}")

# Get elements greater than 15
greater_15 = matrix[matrix > 15]
print(f"\nElements > 15: {greater_15}")

# Replace all odd numbers with -1
matrix_copy = matrix.copy()
matrix_copy[matrix_copy % 2 == 1] = -1
print(f"\nOdd replaced with -1:\n{matrix_copy}")
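
Two variations on this exercise: `np.where` produces the replaced matrix without modifying the original, so no explicit `.copy()` is needed, and flipping the columns with `np.fliplr` gives access to the anti-diagonal:

```python
import numpy as np

matrix = np.arange(1, 26).reshape(5, 5)

# New array: -1 where the value is odd, the original value elsewhere
odd_replaced = np.where(matrix % 2 == 1, -1, matrix)
print(odd_replaced)

# Anti-diagonal: reverse the column order, then take the main diagonal
anti_diag = np.diag(np.fliplr(matrix))
print(anti_diag)  # [ 5  9 13 17 21]
```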

### Exercise 2: Statistical Analysis

In [None]:
# Generate random integer scores from 60 to 100 (inclusive) for 10 students in 3 subjects
np.random.seed(42)
scores = np.random.randint(60, 101, size=(10, 3))
subjects = ['Math', 'Science', 'English']

print("Student Scores:")
print(f"{'Student':<10} {subjects[0]:<10} {subjects[1]:<10} {subjects[2]:<10}")
for i, row in enumerate(scores):
    label = f"Student {i+1}"
    print(f"{label:<10} {row[0]:<10} {row[1]:<10} {row[2]:<10}")

# Calculate statistics
print(f"\nSubject averages: {np.mean(scores, axis=0)}")
print(f"Student averages: {np.mean(scores, axis=1)}")
print(f"Highest score: {np.max(scores)}")
print(f"Lowest score: {np.min(scores)}")

# Find best student
student_avgs = np.mean(scores, axis=1)
best_student = np.argmax(student_avgs) + 1
print(f"\nBest student: Student {best_student} with average {student_avgs[best_student-1]:.1f}")
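
`np.argmax` returns only the *first* maximum, so a tie for best student would go unreported. This sketch, using hypothetical averages rather than the random scores above, ranks all students with `np.argsort` and lists every student tied for first:

```python
import numpy as np

# Hypothetical averages for five students
student_avgs = np.array([78.3, 91.0, 85.7, 66.0, 91.0])

# argsort sorts ascending, so negate to rank from highest to lowest;
# kind='stable' keeps tied students in their original order
ranking = np.argsort(-student_avgs, kind='stable') + 1  # 1-based student numbers
print(ranking)  # [2 5 3 1 4]

top = np.flatnonzero(student_avgs == student_avgs.max()) + 1
print(top)      # [2 5]
```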

## Summary

In this notebook, we covered:

1. **Array Creation**: Various methods to create NumPy arrays
2. **Indexing and Slicing**: Accessing and modifying array elements
3. **Array Operations**: Element-wise and scalar operations
4. **Broadcasting**: Operations on arrays of different shapes
5. **Reshaping**: Changing array dimensions
6. **Mathematical Functions**: Trigonometric and statistical functions
7. **Aggregation**: Computing statistics along axes
8. **Concatenation and Splitting**: Combining and dividing arrays
9. **Random Numbers**: Generating random data
10. **Performance**: NumPy's efficiency advantage

NumPy is essential for:
- Efficient numerical computations
- Data preprocessing in machine learning
- Scientific computing
- Image and signal processing