# Day 9: NumPy and Pandas Practice Exercises

This notebook contains exercises to practice NumPy array operations and Pandas data analysis. Work through each exercise in order, as they build on concepts from the learning materials.

**Instructions:**
- Complete each exercise in the provided code cells
- Run your code to verify it works correctly
- Compare your output with the expected result shown
- Don't hesitate to experiment and try different approaches

---

## Part 1: NumPy Exercises

These exercises cover array creation, indexing, slicing, manipulation, and operations.

In [None]:
# Import NumPy
import numpy as np

---

### Exercise 1: Create a Checkerboard Pattern

**Goal:** Create an 8×8 array with a checkerboard pattern (alternating 0s and 1s).

**Expected output:**
```
[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]]
```

**Hint:** Think about how even and odd positions differ. Use slicing with step sizes.
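
As a warm-up for the step-slicing idea in the hint, here is a minimal sketch on a different pattern (horizontal stripes, not the checkerboard itself). The name `stripes` and the 4×6 shape are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# Start from zeros, then set every second row to 1 using a step of 2.
stripes = np.zeros((4, 6), dtype=int)
stripes[1::2, :] = 1  # rows 1 and 3, all columns

print(stripes)
```

The same `start::step` syntax works on the column axis, and the two can be combined in a single index expression.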

In [None]:
# Your code here

---

### Exercise 2: Create Concentric Squares

**Goal:** Create a 7×7 array with concentric squares of increasing values.

**Expected output:**
```
[[0 0 0 0 0 0 0]
 [0 1 1 1 1 1 0]
 [0 1 2 2 2 1 0]
 [0 1 2 3 2 1 0]
 [0 1 2 2 2 1 0]
 [0 1 1 1 1 1 0]
 [0 0 0 0 0 0 0]]
```

**Hint:** Start with zeros, then use slicing to fill inner squares with increasing values.
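
The slicing idea in the hint can be sketched on a smaller case. This example fills everything except the outer border of a 5×5 array with a single value; the name `frame` and the value 7 are chosen only for illustration.

```python
import numpy as np

frame = np.zeros((5, 5), dtype=int)

# Negative stop indices count from the end, so 1:-1 skips the first
# and last row (or column) and selects the interior block.
frame[1:-1, 1:-1] = 7

print(frame)
```

Repeating this with a tighter slice each time (for example `2:-2`) is one way to build nested squares.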

In [None]:
# Your code here

---

### Exercise 3: Create a Diagonal Pattern

**Goal:** Create a 6×6 array where:
- Main diagonal has values 5
- Diagonals one step away have values 3
- All other positions have values 1

**Expected output:**
```
[[5 3 1 1 1 1]
 [3 5 3 1 1 1]
 [1 3 5 3 1 1]
 [1 1 3 5 3 1]
 [1 1 1 3 5 3]
 [1 1 1 1 3 5]]
```

**Hint:** Use `np.full()` to create the base array, then `np.fill_diagonal()` or indexing to set the diagonal values.
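
A minimal sketch of the two tools named in the hint, on a 4×4 array with different values from the exercise. The name `band` and the index helper `i` are illustrative assumptions; only the diagonal above the main one is set here.

```python
import numpy as np

band = np.full((4, 4), 0)      # base array filled with one value
np.fill_diagonal(band, 9)      # main diagonal, modified in place

# Paired index arrays select elements (0,1), (1,2), (2,3):
# the diagonal one step above the main diagonal.
i = np.arange(3)
band[i, i + 1] = 4

print(band)
```

Swapping the two index arrays (`band[i + 1, i]`) addresses the diagonal one step below the main one.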

In [None]:
# Your code here

---

### Exercise 4: Advanced Indexing and Slicing

**Given array:**
```python
data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160]
])
```

**Tasks:**
1. Extract the center 2×2 subarray (should be `[[60, 70], [100, 110]]`)
2. Extract all corner values (should be `[10, 40, 130, 160]`)
3. Extract every other row and every other column (should be `[[10, 30], [90, 110]]`)
4. Replace all values greater than 100 with 999
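
The three indexing styles these tasks rely on (basic slicing, integer-array indexing, and boolean masking) can be sketched on a small 3×3 array. The array `m` and the thresholds below are illustrative and deliberately differ from the exercise data.

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)  # [[1 2 3] [4 5 6] [7 8 9]]

# Basic slicing: rows 0-1, columns 1-2 (a view into m).
block = m[0:2, 1:3]

# Integer-array indexing: row and column lists are paired element-wise,
# so this picks m[0, 1] and m[2, 0] and returns a 1D array.
picked = m[[0, 2], [1, 0]]

# Boolean masking: assign to every element where the condition holds.
masked = m.copy()
masked[masked > 6] = -1

print(block, picked, masked, sep="\n")
```

Working on a copy, as in the last step, leaves the original array unchanged.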

In [None]:
# Given array
data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160]
])

# Task 1: Extract center 2×2
center = None  # Your code here
print("Center 2×2:")
print(center)
print()

# Task 2: Extract corners
corners = None  # Your code here
print("Corners:", corners)
print()

# Task 3: Every other row and column
sparse = None  # Your code here
print("Every other row/column:")
print(sparse)
print()

# Task 4: Replace values > 100 with 999
data_modified = data.copy()
# Your code here
print("After replacement:")
print(data_modified)

---

### Exercise 5: Array Manipulation

**Goal:** Practice reshaping, stacking, and copying.

**Tasks:**
1. Create a 1D array with values 1 to 12
2. Reshape it into a 3×4 array
3. Reshape it into a 4×3 array
4. Create two 2×3 arrays with random integers (0-9)
5. Stack them vertically (vstack) to create a 4×3 array
6. Stack them horizontally (hstack) to create a 2×6 array
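
A short sketch of the same operations on smaller 2×2 arrays, so the resulting shapes are easy to check by hand. The names `seq`, `grid`, `rand`, `tall`, and `wide` are illustrative, and the seeded `default_rng` generator is one of several ways to draw random integers.

```python
import numpy as np

seq = np.arange(1, 5)      # 1D array: 1, 2, 3, 4
grid = seq.reshape(2, 2)   # total element count must stay the same

rng = np.random.default_rng(seed=0)
rand = rng.integers(0, 10, size=(2, 2))  # integers in [0, 10), i.e. 0-9

tall = np.vstack([grid, rand])  # rows are appended: shape (4, 2)
wide = np.hstack([grid, rand])  # columns are appended: shape (2, 4)

print(tall.shape, wide.shape)
```

`vstack` requires matching column counts and `hstack` requires matching row counts.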

In [None]:
# Task 1: Create 1D array
arr = None  # Your code here
print("Original array:", arr)
print()

# Task 2: Reshape to 3×4
arr_3x4 = None  # Your code here
print("Reshaped to 3×4:")
print(arr_3x4)
print()

# Task 3: Reshape to 4×3
arr_4x3 = None  # Your code here
print("Reshaped to 4×3:")
print(arr_4x3)
print()

# Task 4: Create two 2×3 random arrays
arr_a = None  # Your code here
arr_b = None  # Your code here
print("Array A:")
print(arr_a)
print("Array B:")
print(arr_b)
print()

# Task 5: Vertical stack
vstacked = None  # Your code here
print("Vertically stacked (4×3):")
print(vstacked)
print()

# Task 6: Horizontal stack
hstacked = None  # Your code here
print("Horizontally stacked (2×6):")
print(hstacked)

---

### Exercise 6: Mathematical Operations

**Given arrays:**
```python
prices = np.array([100, 250, 75, 180, 320])
quantities = np.array([3, 2, 5, 1, 4])
```

**Tasks:**
1. Calculate total revenue for each item (price × quantity)
2. Apply a 15% discount to all prices (multiply by 0.85)
3. Find the total revenue across all items
4. Calculate the average price per item
5. Find the item with the maximum revenue
6. Add a $10 shipping fee to all prices using broadcasting
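
Element-wise arithmetic and scalar broadcasting, which these tasks depend on, can be sketched with unrelated data. The arrays `weights_kg` and `counts` and the 0.1 kg packaging allowance are made up for illustration.

```python
import numpy as np

weights_kg = np.array([2.0, 0.5, 1.5])
counts = np.array([4, 10, 2])

# Arrays of the same shape combine element by element.
total_per_item = weights_kg * counts

# A scalar is broadcast across every element of the array.
with_packaging = weights_kg + 0.1

# argmax returns the position of the largest value, not the value itself.
heaviest = np.argmax(total_per_item)

print(total_per_item, total_per_item.sum(), heaviest)
```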

In [None]:
# Given arrays
prices = np.array([100, 250, 75, 180, 320])
quantities = np.array([3, 2, 5, 1, 4])

# Your code here

---

### Exercise 7: Statistical Analysis

**Given:** A 2D array representing test scores (rows = students, columns = subjects)
```python
scores = np.array([
    [85, 92, 78, 88],  # Student 1
    [90, 85, 95, 87],  # Student 2
    [78, 88, 82, 91],  # Student 3
    [92, 79, 88, 84],  # Student 4
    [88, 91, 86, 89]   # Student 5
])
```

**Tasks:**
1. Calculate the average score for each student (axis=1)
2. Calculate the average score for each subject (axis=0)
3. Find the highest score in the entire array
4. Find which subject has the highest average
5. Find which student has the highest average
6. Calculate the standard deviation for each subject
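
The meaning of the `axis` argument is easiest to see on a tiny array. This sketch uses a 2×3 array `t` (an illustrative name) and is not the exercise data.

```python
import numpy as np

t = np.array([[1, 2, 3],
              [4, 5, 6]])

row_means = t.mean(axis=1)     # collapse columns: one value per row
col_means = t.mean(axis=0)     # collapse rows: one value per column
best_col = col_means.argmax()  # index of the column with the highest mean
col_std = t.std(axis=0)        # population std (ddof=0) by default

print(row_means, col_means, best_col, col_std)
```

Note that `np.std` defaults to `ddof=0`; pass `ddof=1` if a sample standard deviation is wanted.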

In [None]:
# Given array
scores = np.array([
    [85, 92, 78, 88],  # Student 1
    [90, 85, 95, 87],  # Student 2
    [78, 88, 82, 91],  # Student 3
    [92, 79, 88, 84],  # Student 4
    [88, 91, 86, 89]   # Student 5
])

# Your code here

---

## Part 2: Pandas Exercises

These exercises cover DataFrame creation, data manipulation, filtering, and analysis.

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from pathlib import Path

sns.set_theme(style="whitegrid")

---

### Exercise 8: Create and Explore a DataFrame

**Goal:** Create a DataFrame from scratch and perform basic exploration.

**Given data:**
```python
employee_data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'department': ['Engineering', 'Sales', 'Engineering', 'HR', 'Sales', 'Engineering'],
    'salary': [95000, 65000, 88000, 72000, 70000, 91000],
    'years_experience': [5, 3, 4, 6, 3, 7],
    'performance_score': [8.5, 7.2, 9.1, 7.8, 8.0, 8.8]
}
```

**Tasks:**
1. Create a DataFrame from the dictionary
2. Display the first 3 rows
3. Show the shape of the DataFrame
4. Display DataFrame info (data types, non-null counts)
5. Get statistical summary of numeric columns
6. Show value counts for the 'department' column
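
The exploration methods listed above can be tried first on a toy DataFrame. The data and the name `toy` below are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "fruit": ["apple", "pear", "apple", "plum"],
    "weight_g": [150.0, 180.0, 160.0, 60.0],
})

print(toy.head(2))                   # first 2 rows
print(toy.shape)                     # (rows, columns)
toy.info()                           # dtypes and non-null counts (prints directly)
print(toy.describe())                # summary statistics for numeric columns
print(toy["fruit"].value_counts())   # frequency of each category
```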

In [None]:
# Given data
employee_data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'department': ['Engineering', 'Sales', 'Engineering', 'HR', 'Sales', 'Engineering'],
    'salary': [95000, 65000, 88000, 72000, 70000, 91000],
    'years_experience': [5, 3, 4, 6, 3, 7],
    'performance_score': [8.5, 7.2, 9.1, 7.8, 8.0, 8.8]
}

# Your code here

---

### Exercise 9: Filtering and Selection

**Goal:** Practice filtering data using boolean indexing and the query method.

**Using the employee DataFrame from Exercise 8:**

**Tasks:**
1. Find all employees in the Engineering department
2. Find employees with salary greater than 80,000
3. Find employees with more than 4 years of experience AND performance score above 8.0
4. Use the `.query()` method to find Sales department employees with salary < 75,000
5. Select only the 'name' and 'salary' columns for all employees
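
A brief sketch of boolean indexing, combined conditions, `.query()`, and column selection on an unrelated toy DataFrame. The `pets` data is hypothetical.

```python
import pandas as pd

pets = pd.DataFrame({
    "name": ["Rex", "Tom", "Ada", "Kit"],
    "kind": ["dog", "cat", "dog", "cat"],
    "age": [3, 7, 9, 2],
    "weight_kg": [20.0, 4.5, 25.0, 3.0],
})

# Single condition: a boolean Series selects the matching rows.
dogs = pets[pets["kind"] == "dog"]

# Combined conditions: use & (not `and`) and parenthesize each condition.
old_and_heavy = pets[(pets["age"] > 5) & (pets["weight_kg"] > 10)]

# The same kind of filter written as a query string.
young_cats = pets.query("kind == 'cat' and age < 5")

# A list of column names selects a sub-DataFrame.
subset = pets[["name", "age"]]

print(dogs, old_and_heavy, young_cats, subset, sep="\n\n")
```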

In [None]:
# Your code here

---

### Exercise 10: Create New Columns

**Goal:** Practice feature engineering by creating derived columns.

**Using the employee DataFrame:**

**Tasks:**
1. Create a 'salary_per_year' column (salary ÷ years_experience)
2. Create a 'high_performer' column (True if performance_score >= 8.5, False otherwise)
3. Create a 'salary_category' column:
   - 'Low' if salary < 70,000
   - 'Medium' if 70,000 <= salary < 90,000
   - 'High' if salary >= 90,000
4. Display the updated DataFrame
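
One possible way to build ratio, boolean, and categorical columns, sketched on a hypothetical `orders` table. `np.select` is used here for the categories; `pd.cut` or `.apply()` with a function would be reasonable alternatives.

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({"total": [12.0, 55.0, 130.0], "items": [3, 5, 4]})

# Ratio of two columns, computed element-wise.
orders["per_item"] = orders["total"] / orders["items"]

# A comparison yields a boolean column directly.
orders["is_large"] = orders["total"] >= 100

# Conditions are checked in order; the first match wins,
# and rows matching none get the default.
orders["size"] = np.select(
    [orders["total"] < 20, orders["total"] < 100],
    ["small", "medium"],
    default="large",
)

print(orders)
```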

In [None]:
# Your code here

---

### Exercise 11: Grouping and Aggregation

**Goal:** Practice groupby operations and aggregations.

**Using the employee DataFrame:**

**Tasks:**
1. Calculate the average salary by department
2. Find the total number of employees in each department
3. Calculate multiple statistics by department:
   - Average salary
   - Average performance score
   - Average years of experience
   - Count of employees
4. Create a bar plot showing average salary by department
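
A minimal sketch of single and multi-statistic aggregation, using a made-up `sales` table. The output column names `mean_amount` and `n_sales` are arbitrary choices made possible by named aggregation.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["N", "S", "N", "S", "N"],
    "amount": [10.0, 20.0, 30.0, 60.0, 50.0],
})

# One statistic for one column, grouped by a key.
mean_by_region = sales.groupby("region")["amount"].mean()

# Several statistics at once: new_name=(source_column, function).
summary = sales.groupby("region").agg(
    mean_amount=("amount", "mean"),
    n_sales=("amount", "count"),
)

print(mean_by_region)
print(summary)
```

A Series like `mean_by_region` can be plotted directly with `mean_by_region.plot(kind="bar")` when matplotlib is installed.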

In [None]:
# Your code here

---

### Exercises 12-15: Complete Data Analysis Workflow

**Goal:** Apply everything you've learned to analyze the hardware testing dataset.

You'll work through the same workflow from the Pandas learning material:
1. Load the data
2. Explore the data
3. Transform and create new features
4. Filter and analyze
5. Create visualizations
6. Aggregate and draw insights

**Dataset:** `hw_measurements.csv` (should be in the root directory)

Work through each exercise step-by-step.

---

### Exercise 12: Load and Explore

**Tasks:**
1. Load the CSV file into a DataFrame
2. Display the first 5 rows
3. Check the shape (rows and columns)
4. Display DataFrame info
5. Get statistical summary
6. Check for missing values
7. Show value counts for 'result' column
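
The loading and missing-value checks can be rehearsed without the real file. This sketch reads a tiny in-memory CSV through `io.StringIO`; the columns `id`, `value`, and `status` are hypothetical and are not the columns of `hw_measurements.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content with one missing value in the second row.
csv_text = """id,value,status
1,0.5,OK
2,,OK
3,0.9,BAD
"""

# read_csv accepts a file path or any file-like object.
df_demo = pd.read_csv(io.StringIO(csv_text))

missing = df_demo.isna().sum()             # missing values per column
counts = df_demo["status"].value_counts()  # frequency of each status

print(df_demo.head())
print(missing)
print(counts)
```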

In [None]:
# Your code here

---

### Exercise 13: Transform

**Tasks:**
1. Convert 'timestamp' column to datetime type
2. Create a 'power_w' column (supply_v × current_a)
3. Create an 'is_fail' column (1 if result=='FAIL', 0 otherwise)
4. Display the first few rows with the new columns
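
The three kinds of transformation in this exercise (datetime parsing, a product of two columns, and a 0/1 flag) can be sketched on a hypothetical `log` table with unrelated column names.

```python
import pandas as pd

log = pd.DataFrame({
    "when": ["2024-01-01 10:00:00", "2024-01-01 10:05:00"],
    "status": ["OK", "BAD"],
    "width": [2.0, 3.0],
    "height": [4.0, 5.0],
})

# Parse strings into a datetime64 column.
log["when"] = pd.to_datetime(log["when"])

# Derived numeric column from two existing ones.
log["area"] = log["width"] * log["height"]

# A boolean comparison cast to int gives a 0/1 indicator.
log["is_bad"] = (log["status"] == "BAD").astype(int)

print(log.dtypes)
print(log.head())
```

Storing the flag as 0/1 means that its mean over a group is the fraction of flagged rows.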

In [None]:
# Your code here

---

### Exercise 14: Filtering and Visualization

**Tasks:**
1. Filter to find all FAIL results
2. Find measurements where temperature >= 40°C
3. Create a line plot: Current vs Temperature (colored by board_rev)
4. Create a scatter plot: SNR vs Current (colored by result)
5. Create temperature bands using `pd.cut()` with bins [15, 30, 45, 60, 75] and appropriate labels
6. Create a box plot: Power distribution by temperature band and board revision
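
How `pd.cut()` maps values to labeled bins can be sketched on unrelated data. The `ages` Series, the bin edges, and the labels below are illustrative only.

```python
import pandas as pd

ages = pd.Series([5, 17, 30, 64])

# Three intervals need four edges: (0, 12], (12, 18], (18, 65].
bands = pd.cut(ages, bins=[0, 12, 18, 65], labels=["child", "teen", "adult"])

print(bands)
```

By default each bin includes its right edge and excludes its left edge, so a value equal to the lowest edge becomes NaN unless `include_lowest=True` is passed. The number of labels must be one less than the number of bin edges.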

In [None]:
# Your code here

---

### Exercise 15: Aggregation and Insights

**Tasks:**
1. Group by 'board_rev' and 'temp_c', then aggregate:
   - Mean power (mean of power_w)
   - Mean SNR (mean of snr_db)
   - Failure rate (mean of is_fail)
   - Count of measurements
2. Create a line plot: Mean power vs Temperature by board revision (using aggregated data)
3. Create a line plot: Failure rate vs Temperature by board revision (using aggregated data)
4. Calculate correlation matrix for numeric columns
5. Create a heatmap of the correlation matrix
6. **Bonus:** Based on your analysis, answer these questions:
   - At what temperature do failures start to occur?
   - Which board revision performs better?
   - What is the relationship between temperature and SNR?
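
Grouping by two keys and computing a correlation matrix can be sketched on a small hypothetical `runs` table. The column names and aggregate names are invented for illustration; `numeric_only=True` assumes a reasonably recent pandas version.

```python
import pandas as pd

runs = pd.DataFrame({
    "machine": ["A", "A", "B", "B"],
    "speed": [1, 2, 1, 2],
    "output": [10.0, 20.0, 12.0, 24.0],
    "defect": [0, 1, 0, 0],
})

# Group by two keys, compute several named aggregates,
# then flatten the group keys back into ordinary columns.
agg = (
    runs.groupby(["machine", "speed"])
        .agg(
            mean_output=("output", "mean"),
            defect_rate=("defect", "mean"),  # mean of a 0/1 flag is a rate
            n=("output", "count"),
        )
        .reset_index()
)

# Pairwise Pearson correlation of the numeric columns only.
corr = runs.corr(numeric_only=True)

print(agg)
print(corr)
```

After `reset_index()`, the aggregated table has one row per key combination, which is a convenient shape for line plots with one line per group.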

In [None]:
# Your code here

---

## Congratulations! 🎉

You've completed the Day 9 exercises covering:
- NumPy array creation, manipulation, and operations
- Pandas DataFrame operations and analysis
- Complete data analysis workflow
- Data visualization techniques

**Next steps:**
- Review any exercises where you struggled
- Compare your solutions with the solution notebook
- Try variations of the exercises with different data
- Apply these skills to your own datasets