# NumPy Operations and Statistics

Learn how to perform mathematical operations and statistical analysis efficiently with NumPy arrays.

## What You'll Learn
- Element-wise arithmetic operations
- Array-to-array mathematical operations
- Trigonometric and mathematical functions
- Finding minimum and maximum values
- Calculating sums and other statistics
- Understanding axis-based operations

---

In [None]:
import numpy as np

print(f"NumPy version: {np.__version__}")

---

## The Problem: Loops Are Slow and Verbose

Without NumPy, applying an operation to every value in a list takes an explicit loop (or a comprehension):

In [None]:
# ❌ Python list approach - slow and verbose
prices = [10.0, 20.0, 30.0, 40.0]
discounted = []
for price in prices:
    discounted.append(price * 0.9)
print(f"Discounted (list): {discounted}")
print()

# ✅ NumPy approach - fast and concise
prices_arr = np.array([10.0, 20.0, 30.0, 40.0])
discounted_arr = prices_arr * 0.9
print(f"Discounted (NumPy): {discounted_arr}")

---

## Element-Wise Arithmetic Operations

NumPy applies arithmetic operators to every element of an array at once, with no explicit Python loop. This is called vectorization.

### Example 1: Basic Arithmetic with Scalars

In [None]:
temperatures = np.array([20, 25, 30, 22, 28])
print(f"Original temperatures (°C): {temperatures}")
print()

# Addition
adjusted_up = temperatures + 5
print(f"Add 5°C: {adjusted_up}")

# Subtraction
adjusted_down = temperatures - 3
print(f"Subtract 3°C: {adjusted_down}")

# Multiplication
doubled = temperatures * 2
print(f"Double: {doubled}")

# Division
halved = temperatures / 2
print(f"Halve: {halved}")

# Power
squared = temperatures ** 2
print(f"Squared: {squared}")

**What happens:**
- Each operation is applied to every element
- No explicit loop is needed; NumPy runs the loop internally in compiled code
- Much faster than Python loops for large arrays
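
One detail worth checking in the operations above is the dtype of the result. The sketch below reuses the `temperatures` array from Example 1 and assumes it holds integers, as it does there. It shows that true division (`/`) returns floats, while multiplication and floor division (`//`) keep the integer dtype.

```python
import numpy as np

temperatures = np.array([20, 25, 30, 22, 28])

# Multiplying integers by an integer keeps an integer dtype
print((temperatures * 2).dtype)

# True division of integers returns float64
print((temperatures / 2).dtype)

# Floor division keeps an integer dtype (and drops the fractional part)
print((temperatures // 2).dtype)
print(temperatures // 2)
```

If a later step expects whole numbers, `//` or an explicit `.astype(int)` avoids a silent switch to floats.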

### Example 2: Array-to-Array Operations

In [None]:
# Sales data from two weeks
week1_sales = np.array([100, 120, 110, 130, 125])
week2_sales = np.array([105, 115, 125, 135, 130])

print(f"Week 1 sales: {week1_sales}")
print(f"Week 2 sales: {week2_sales}")
print()

# Element-wise addition
total_sales = week1_sales + week2_sales
print(f"Total sales (both weeks): {total_sales}")

# Element-wise difference
growth = week2_sales - week1_sales
print(f"Growth (Week 2 - Week 1): {growth}")

# Element-wise percentage change (rounded for display)
percent_change = ((week2_sales - week1_sales) / week1_sales) * 100
print(f"Percent change (%): {np.round(percent_change, 2)}")

# Element-wise multiplication (not typical, but possible)
product = week1_sales * week2_sales
print(f"Element-wise product: {product}")

**Key Point:**
- Arrays must have compatible shapes (same size or broadcastable)
- Operations are performed element-by-element
- Index 0 of array1 operates with index 0 of array2, etc.
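
The "broadcastable" case in the first point can be sketched as follows. The data and the names `unit_sales` and `unit_prices` are hypothetical, chosen only for illustration. A `(3, 4)` array is multiplied by a `(3, 1)` array, and NumPy stretches the single column across all four columns. Shapes that cannot be aligned raise a `ValueError`.

```python
import numpy as np

# Hypothetical data: 3 products x 4 weeks of unit sales
unit_sales = np.array([
    [10, 12, 11, 13],
    [20, 18, 22, 21],
    [5, 7, 6, 8],
])

# One price per product, shaped (3, 1) so it lines up with the rows
unit_prices = np.array([[2.0], [1.5], [4.0]])

# (3, 4) * (3, 1) -> (3, 4): each row is multiplied by its own price
revenue = unit_sales * unit_prices
print(revenue)
print(f"Shape: {revenue.shape}")

# Shapes that cannot be broadcast raise a ValueError
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    print(f"Incompatible shapes: {err}")
```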

### Example 3: Operations on 2D Arrays

In [None]:
# Product prices in two stores
store1_prices = np.array([
    [10.99, 15.99, 20.99],
    [5.99, 8.99, 12.99]
])

store2_prices = np.array([
    [11.49, 15.49, 21.99],
    [5.49, 9.49, 12.49]
])

print("Store 1 prices:")
print(store1_prices)
print()
print("Store 2 prices:")
print(store2_prices)
print()

# Average prices
average_prices = (store1_prices + store2_prices) / 2
print("Average prices:")
print(average_prices)
print()

# Price difference
price_diff = store2_prices - store1_prices
print("Price difference (Store 2 - Store 1):")
print(price_diff)

---

## Mathematical Functions

### Example 4: Trigonometric Functions

In [None]:
# Angles in radians
angles_rad = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])
print(f"Angles (radians): {angles_rad}")
print()

# Sine
sine_values = np.sin(angles_rad)
print(f"sin(angles): {sine_values}")

# Cosine
cosine_values = np.cos(angles_rad)
print(f"cos(angles): {cosine_values}")

# Tangent (tan(pi/2) is mathematically undefined; floating point gives a very large number instead)
tangent_values = np.tan(angles_rad)
print(f"tan(angles): {tangent_values}")
print()

# Convert degrees to radians
angles_deg = np.array([0, 30, 45, 60, 90])
angles_rad_converted = np.radians(angles_deg)
print(f"Angles in degrees: {angles_deg}")
print(f"Converted to radians: {angles_rad_converted}")
print(f"Sine values: {np.sin(angles_rad_converted)}")

### Example 5: Other Mathematical Functions

In [None]:
values = np.array([1, 4, 9, 16, 25])
print(f"Original values: {values}")
print()

# Square root
sqrt_values = np.sqrt(values)
print(f"Square roots: {sqrt_values}")

# Exponential (e^x)
exp_values = np.exp(np.array([0, 1, 2]))
print(f"Exponential (e^[0,1,2]): {exp_values}")

# Natural logarithm
log_values = np.log(values)
print(f"Natural log: {log_values}")

# Base-10 logarithm
log10_values = np.log10(values)
print(f"Base-10 log: {log10_values}")
print()

# Absolute value
negative_values = np.array([-5, -2, 3, -7, 1])
abs_values = np.abs(negative_values)
print(f"Original: {negative_values}")
print(f"Absolute: {abs_values}")
print()

# Rounding
decimals = np.array([1.234, 5.678, 9.012])
print(f"Original: {decimals}")
print(f"Rounded: {np.round(decimals, 2)}")
print(f"Floor: {np.floor(decimals)}")
print(f"Ceiling: {np.ceil(decimals)}")

---

## Statistical Operations

### Example 6: Min and Max Values

In [None]:
daily_temps = np.array([22, 25, 19, 28, 24, 21, 26])
print(f"Daily temperatures: {daily_temps}")
print()

# Find min and max
min_temp = daily_temps.min()
max_temp = daily_temps.max()
print(f"Minimum temperature: {min_temp}°C")
print(f"Maximum temperature: {max_temp}°C")
print()

# Find indices of min and max (indices start at 0, so day number = index + 1)
min_index = daily_temps.argmin()
max_index = daily_temps.argmax()
print(f"Min occurred on day {min_index + 1} (index {min_index})")
print(f"Max occurred on day {max_index + 1} (index {max_index})")

### Example 7: Sum Operations

In [None]:
sales = np.array([100, 150, 120, 180, 200, 175, 160])
print(f"Weekly sales: {sales}")
print()

# Total sum
total = sales.sum()
print(f"Total sales: ${total}")

# Cumulative sum
cumsum = sales.cumsum()
print(f"Cumulative sales: {cumsum}")
print()

# Product of all elements
small_numbers = np.array([2, 3, 4])
product = small_numbers.prod()
print(f"Product of {small_numbers}: {product}")

### Example 8: Mean, Median, and Standard Deviation

In [None]:
exam_scores = np.array([75, 82, 90, 68, 95, 78, 85, 92, 70, 88])
print(f"Exam scores: {exam_scores}")
print()

# Mean (average)
mean = exam_scores.mean()
print(f"Mean score: {mean:.2f}")

# Median (middle value)
median = np.median(exam_scores)
print(f"Median score: {median:.2f}")

# Standard deviation (spread); NumPy uses the population formula by default (ddof=0)
std = exam_scores.std()
print(f"Standard deviation: {std:.2f}")

# Variance
variance = exam_scores.var()
print(f"Variance: {variance:.2f}")
print()

# Percentiles
percentile_25 = np.percentile(exam_scores, 25)
percentile_75 = np.percentile(exam_scores, 75)
print(f"25th percentile: {percentile_25}")
print(f"75th percentile: {percentile_75}")
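
By default, `std()` and `var()` divide by `n` (the population formula). If the scores are treated as a sample from a larger group, the usual choice is to divide by `n - 1`, which NumPy exposes through the `ddof` parameter. The sketch below reuses `exam_scores` from Example 8. Whether these scores count as a sample or a full population is an assumption the analyst has to make.

```python
import numpy as np

exam_scores = np.array([75, 82, 90, 68, 95, 78, 85, 92, 70, 88])

# ddof=0 (default): divide by n, the population standard deviation
population_std = exam_scores.std()

# ddof=1: divide by n - 1, the sample standard deviation
sample_std = exam_scores.std(ddof=1)

print(f"Population std (ddof=0): {population_std:.2f}")
print(f"Sample std (ddof=1): {sample_std:.2f}")
```

The sample value is always slightly larger, and the gap shrinks as the array grows.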

---

## Axis-Based Operations

For 2D arrays, an operation can run along axis 0 (down the rows, giving one result per column) or along axis 1 (across the columns, giving one result per row).

### Understanding Axes

In [None]:
# Sample data: 3 students, 4 exams
grades = np.array([
    [85, 90, 78, 92],  # Student 1
    [78, 85, 82, 88],  # Student 2
    [92, 88, 95, 90]   # Student 3
])

print("Grades (3 students × 4 exams):")
print(grades)
print(f"Shape: {grades.shape}")
print()

# Visualization of axes:
# axis=0 runs DOWN the rows (one result per column)
# axis=1 runs ACROSS the columns (one result per row)
#
#         axis=1 →
#       [85, 90, 78, 92]
# axis=0 [78, 85, 82, 88]
#   ↓    [92, 88, 95, 90]

### Example 9: Sum Along Axes

In [None]:
print("Grades:")
print(grades)
print()

# Sum along axis 0 (down columns) - sum each exam across all students
exam_totals = grades.sum(axis=0)
print(f"Total per exam (axis=0): {exam_totals}")
print("↑ Adds down each column")
print()

# Sum along axis 1 (across rows) - sum all exams for each student
student_totals = grades.sum(axis=1)
print(f"Total per student (axis=1): {student_totals}")
print("↑ Adds across each row")
print()

# Sum all elements (no axis specified)
total_sum = grades.sum()
print(f"Grand total (all grades): {total_sum}")

### Example 10: Mean Along Axes

In [None]:
print("Grades:")
print(grades)
print()

# Average per exam (axis=0)
exam_averages = grades.mean(axis=0)
print(f"Average per exam (axis=0): {exam_averages}")
print("↑ Mean of each column (how all students did on each exam)")
print()

# Average per student (axis=1)
student_averages = grades.mean(axis=1)
print(f"Average per student (axis=1): {student_averages}")
print("↑ Mean of each row (each student's average across all exams)")

### Example 11: Min/Max Along Axes

In [None]:
print("Grades:")
print(grades)
print()

# Lowest score per exam
exam_mins = grades.min(axis=0)
print(f"Lowest score per exam (axis=0): {exam_mins}")

# Highest score per exam
exam_maxs = grades.max(axis=0)
print(f"Highest score per exam (axis=0): {exam_maxs}")
print()

# Lowest score per student
student_mins = grades.min(axis=1)
print(f"Lowest score per student (axis=1): {student_mins}")

# Highest score per student
student_maxs = grades.max(axis=1)
print(f"Highest score per student (axis=1): {student_maxs}")

### Example 12: Real-World Analysis

In [None]:
# Monthly sales for 4 products over 6 months
monthly_sales = np.array([
    [120, 135, 142, 130, 155, 160],  # Product A
    [200, 210, 205, 220, 215, 230],  # Product B
    [80, 85, 90, 88, 95, 100],       # Product C
    [150, 145, 160, 155, 170, 165]   # Product D
])

print("Monthly sales (4 products × 6 months):")
print(monthly_sales)
print()

# Total sales per product (across all months)
product_totals = monthly_sales.sum(axis=1)
print("Total sales per product:")
for i, total in enumerate(product_totals):
    print(f"  Product {chr(65+i)}: ${total}")
print()

# Total sales per month (across all products)
month_totals = monthly_sales.sum(axis=0)
print(f"Total sales per month: {month_totals}")
print()

# Average monthly sales per product
product_averages = monthly_sales.mean(axis=1)
print("Average monthly sales per product:")
for i, avg in enumerate(product_averages):
    print(f"  Product {chr(65+i)}: ${avg:.2f}")
print()

# Best performing month for each product
best_months = monthly_sales.argmax(axis=1)
print("Best month for each product:")
for i, month in enumerate(best_months):
    print(f"  Product {chr(65+i)}: Month {month+1} (${monthly_sales[i, month]})")

**Axis Summary:**
- **axis=0**: Operations run **down** the rows → the row dimension is collapsed, leaving one result per column
- **axis=1**: Operations run **across** the columns → the column dimension is collapsed, leaving one result per row
- **No axis**: Operations on entire array → single value result

**Memory Trick:**
- axis=0 is the **first** dimension (rows)
- axis=1 is the **second** dimension (columns)
- Operation along an axis **collapses** that dimension
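
The "collapses that dimension" rule is easiest to see in the result shapes. The sketch below reuses the `grades` array from the examples above. The last part uses `keepdims=True`, a standard NumPy option not covered earlier in this notebook, to keep the collapsed axis with length 1 so the result broadcasts back against the original array.

```python
import numpy as np

grades = np.array([
    [85, 90, 78, 92],  # Student 1
    [78, 85, 82, 88],  # Student 2
    [92, 88, 95, 90],  # Student 3
])

print(grades.shape)               # (3, 4)
print(grades.sum(axis=0).shape)   # (4,): rows collapsed, one value per exam
print(grades.sum(axis=1).shape)   # (3,): columns collapsed, one value per student

# keepdims=True keeps the collapsed axis as length 1: shape (3, 1)
row_means = grades.mean(axis=1, keepdims=True)

# (3, 4) - (3, 1) broadcasts: each score minus that student's own mean
centered = grades - row_means
print(centered)
```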

---

## Best Practices

### ✅ Do:
- Use vectorized operations instead of loops for better performance
- Specify the `axis` parameter explicitly for clarity in multi-dimensional operations
- Use appropriate statistical functions (`mean`, `median`, `std`) based on data distribution
- Check array shapes before performing operations
- Use `np.round()` for cleaner output when displaying floats

### ❌ Don't:
- Use Python loops when NumPy operations are available
- Forget to handle edge cases (e.g., division by zero)
- Apply operations to incompatible array shapes
- Rely on the mean alone when the data has outliers (the median is more robust)
- Perform operations that modify original arrays unintentionally
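
Two of the pitfalls above can be sketched in a few lines. The data is hypothetical and the variable names are chosen only for illustration. The first part guards a percent-change calculation against a zero denominator using the `out` and `where` arguments of `np.divide`. The second part shows how a single outlier pulls the mean away from the median.

```python
import numpy as np

# Hypothetical data: week 1 contains a zero, so a naive division would divide by zero
week1 = np.array([100.0, 0.0, 50.0])
week2 = np.array([110.0, 20.0, 45.0])

# Divide only where the denominator is non-zero; the other slots keep the NaN fill
percent_change = np.divide(
    (week2 - week1) * 100,
    week1,
    out=np.full_like(week1, np.nan),
    where=week1 != 0,
)
print(f"Percent change: {percent_change}")

# Hypothetical salaries with one extreme outlier
salaries = np.array([42_000, 45_000, 47_000, 50_000, 1_000_000])
print(f"Mean:   {salaries.mean():.0f}")
print(f"Median: {np.median(salaries):.0f}")
```

Filling the undefined slots with NaN is one design choice among several. Zero or a masked array may suit other analyses better.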

### Performance Comparison: Loops vs. Vectorization

In [None]:
import time

# Create large array
large_array = np.random.rand(1_000_000)

# ❌ Loop approach
start = time.perf_counter()  # perf_counter is the clock intended for timing short intervals
result_loop = []
for x in large_array:
    result_loop.append(x * 2 + 5)
loop_time = time.perf_counter() - start

# ✅ Vectorized approach
start = time.perf_counter()
result_vectorized = large_array * 2 + 5
vector_time = time.perf_counter() - start

print(f"Loop time: {loop_time:.4f} seconds")
print(f"Vectorized time: {vector_time:.4f} seconds")
print(f"Speedup: {loop_time/vector_time:.1f}x faster!")

---

## Summary

### Key Concepts:
- **Vectorization**: NumPy applies operations to all elements automatically (no loops needed)
- **Element-wise operations**: Arithmetic works on corresponding elements
- **Mathematical functions**: `sin`, `cos`, `sqrt`, `exp`, `log`, etc.
- **Statistical functions**: `min`, `max`, `mean`, `median`, `std`, `sum`
- **Axis operations**: `axis=0` reduces down the rows (one result per column); `axis=1` reduces across the columns (one result per row)

### Syntax Reference:

**Arithmetic Operations:**
```python
arr + 5                # Add scalar
arr1 + arr2            # Element-wise addition
arr * 2                # Multiply by scalar
arr ** 2               # Element-wise power
```

**Mathematical Functions:**
```python
np.sin(arr)            # Sine
np.cos(arr)            # Cosine
np.sqrt(arr)           # Square root
np.exp(arr)            # Exponential
np.log(arr)            # Natural logarithm
np.abs(arr)            # Absolute value
np.round(arr, n)       # Round to n decimals
```

**Statistical Operations:**
```python
arr.min()              # Minimum value
arr.max()              # Maximum value
arr.mean()             # Average
np.median(arr)         # Median
arr.std()              # Standard deviation
arr.sum()              # Sum of all elements
```

**Axis-Based Operations:**
```python
arr.sum(axis=0)        # Sum down columns
arr.sum(axis=1)        # Sum across rows
arr.mean(axis=0)       # Average per column
arr.max(axis=1)        # Max per row
```

### Next Steps:
You now have a solid foundation in NumPy! Continue exploring:
- Boolean indexing and fancy indexing
- Broadcasting rules for array operations
- Linear algebra operations (`np.linalg`)
- Reading/writing data with NumPy
- Integration with pandas for data analysis