# NumPy and Pandas - Exercises

This notebook contains exercises from Lecture 2: NumPy and Pandas.
Try to solve them yourself before looking at the solutions!

## Part 1: NumPy

### ✏️ Challenge: Create Identity Matrix

**Problem:** Create a 4x4 identity matrix (1s on the main diagonal, 0s everywhere else).

Expected output:
```
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
```

In [None]:
# Your solution here
import numpy as np

# Create a 4x4 identity matrix
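
One possible approach, offered as a sketch rather than the reference solution (try the exercise yourself first). It assumes `np.eye` is the intended tool; the variable name `identity` is only illustrative.

```python
import numpy as np

# np.eye(n) returns an n x n float array with ones on the main diagonal
identity = np.eye(4)
print(identity)
```

`np.identity(4)` would give the same array; `np.eye` is used here only because it also supports non-square shapes and offset diagonals.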

### ✏️ Challenge: Temperature Conversion

**Problem:** Convert an array of Celsius temperatures to Fahrenheit.

Formula: F = (C × 9/5) + 32

Input: celsius = [0, 10, 20, 30, 37, 100]

Expected output: [32.0, 50.0, 68.0, 86.0, 98.6, 212.0]

In [None]:
# Your solution here
celsius = np.array([0, 10, 20, 30, 37, 100])

# Convert to Fahrenheit using NumPy operations
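
A minimal sketch of one way to do this, assuming plain element-wise arithmetic is what the exercise is after. The name `fahrenheit` is an illustrative choice.

```python
import numpy as np

celsius = np.array([0, 10, 20, 30, 37, 100])

# Arithmetic on an array is applied element by element, so no loop is needed.
# Dividing by 5 promotes the integer array to floats.
fahrenheit = celsius * 9 / 5 + 32
print(fahrenheit)
```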

### ✏️ Challenge: Filter Valid Exam Scores

**Problem:** From a list of exam scores, filter out invalid ones.

Valid scores are between 0 and 100 (inclusive).

Input: all_scores = [95, -5, 102, 88, 150, 76, 0, 100]

Expected output: [95, 88, 76, 0, 100]

In [None]:
# Your solution here
all_scores = np.array([95, -5, 102, 88, 150, 76, 0, 100])

# Use boolean indexing to filter valid scores
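
One possible approach with boolean indexing, sketched under the assumption that both bounds are inclusive as the problem states. The names `is_valid` and `valid_scores` are illustrative.

```python
import numpy as np

all_scores = np.array([95, -5, 102, 88, 150, 76, 0, 100])

# Each comparison yields a boolean array; & combines them element-wise.
# The parentheses are required because & binds tighter than >= and <=.
is_valid = (all_scores >= 0) & (all_scores <= 100)

# Indexing with a boolean array keeps only the positions that are True
valid_scores = all_scores[is_valid]
print(valid_scores)
```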

### 📊 Real-World Case: Student Grade Analysis

**Problem:** Analyze student test scores to find:
1. Class average
2. Students who scored above average
3. Highest and lowest scores

Input: scores = [85, 92, 78, 90, 88]

Expected output:
- Average: 86.60
- Above average: [92, 90, 88]
- Highest: 92, Lowest: 78

In [None]:
# Your solution here
scores = np.array([85, 92, 78, 90, 88])

# Calculate statistics
# Find above average scores
# Find highest and lowest
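
A sketch of one way to get the three results, assuming NumPy's built-in reductions are acceptable. The variable names are illustrative.

```python
import numpy as np

scores = np.array([85, 92, 78, 90, 88])

# Class average
average = scores.mean()

# Boolean indexing: keep only the scores strictly above the average
above_average = scores[scores > average]

# Highest and lowest scores
highest = scores.max()
lowest = scores.min()

print(f"Average: {average:.2f}")
print(f"Above average: {above_average}")
print(f"Highest: {highest}, Lowest: {lowest}")
```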

## Part 2: Pandas

### ✏️ Challenge: Calculate Discounted Prices

**Problem:** Apply a 20% discount to fruits that cost more than $1.00.

Input: prices = {'apple': 1.5, 'banana': 0.8, 'orange': 1.2, 'grape': 2.0}

Expected output:
- apple: $1.20 (discounted)
- banana: $0.80 (no discount)
- orange: $0.96 (discounted)
- grape: $1.60 (discounted)

In [None]:
# Your solution here
import pandas as pd

prices = pd.Series([1.5, 0.8, 1.2, 2.0],
                   index=['apple', 'banana', 'orange', 'grape'])

# Apply 20% discount to items over $1.00
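
One possible approach, sketched with `Series.where`; boolean-mask assignment would work equally well. The name `discounted` is illustrative, and rounding to two decimals is an assumption made only for display.

```python
import pandas as pd

prices = pd.Series([1.5, 0.8, 1.2, 2.0],
                   index=['apple', 'banana', 'orange', 'grape'])

# where() keeps the original value where the condition is True
# (price is $1.00 or less) and takes the discounted price everywhere else.
discounted = prices.where(prices <= 1.00, prices * 0.8)

print(discounted.round(2))
```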

### 📊 Real-World Case: Employee Salary Analysis

**Problem:** Find employees who meet both of these conditions:
1. Are younger than 30 years old
2. Have a salary greater than $75,000

Display only their names and salaries.

Input: Employee DataFrame with Name, Age, City, Salary columns

In [None]:
# Your solution here
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'Paris', 'London', 'Tokyo'],
    'Salary': [70000, 80000, 75000, 85000]
}
df = pd.DataFrame(data)

# Filter employees
# Select only Name and Salary columns
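
A minimal sketch of one way to filter and select in a single step, assuming both conditions must hold at once. The names `mask` and `result` are illustrative.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'Paris', 'London', 'Tokyo'],
    'Salary': [70000, 80000, 75000, 85000]
}
df = pd.DataFrame(data)

# Both conditions must be True; each comparison needs its own parentheses
mask = (df['Age'] < 30) & (df['Salary'] > 75000)

# .loc takes the row mask and the list of columns to keep
result = df.loc[mask, ['Name', 'Salary']]
print(result)
```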

### ✏️ Challenge: Identify Honor Students

**Problem:** Find students who qualify for the honor roll.

Criteria: Average score >= 90 AND all individual scores >= 85

Input: Students DataFrame with Math, Science, English scores

In [None]:
# Your solution here
students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Math': [85, 92, 78, 95, 88],
    'Science': [90, 85, 82, 98, 91],
    'English': [88, 79, 85, 92, 87]
})

# Calculate average
# Find honor students based on criteria
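
One possible approach, sketched with row-wise reductions. The helper names `subjects`, `all_at_least_85`, and `honor_roll`, and the added `Average` column, are illustrative choices rather than part of the exercise.

```python
import pandas as pd

students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Math': [85, 92, 78, 95, 88],
    'Science': [90, 85, 82, 98, 91],
    'English': [88, 79, 85, 92, 87]
})

subjects = ['Math', 'Science', 'English']

# axis=1 computes the mean across the subject columns for each student
students['Average'] = students[subjects].mean(axis=1)

# True only for rows where every subject score is at least 85
all_at_least_85 = (students[subjects] >= 85).all(axis=1)

honor_roll = students[(students['Average'] >= 90) & all_at_least_85]
print(honor_roll[['Name', 'Average']])
```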

### 📊 Real-World Case: Regional Sales Performance

**Problem:** Calculate the total quantity sold in the West region and in the East region,
then compare the two totals to identify which region is performing better.

Input: Sales DataFrame with Product, Region, Sales, Quantity columns

In [None]:
# Your solution here
sales = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Region': ['East', 'East', 'East', 'West', 'West', 'West', 'East', 'West'],
    'Sales': [1200, 800, 500, 1100, 850, 480, 1250, 790],
    'Quantity': [3, 5, 2, 2, 6, 3, 4, 4]
})

# Calculate total quantity for each region
# Compare and identify better performing region
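
A sketch of one way to do the comparison with `groupby`. It assumes "performing better" means the larger total `Quantity`, which is how the problem is worded; the names `quantity_by_region` and `best_region` are illustrative.

```python
import pandas as pd

sales = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Region': ['East', 'East', 'East', 'West', 'West', 'West', 'East', 'West'],
    'Sales': [1200, 800, 500, 1100, 850, 480, 1250, 790],
    'Quantity': [3, 5, 2, 2, 6, 3, 4, 4]
})

# Total quantity sold per region
quantity_by_region = sales.groupby('Region')['Quantity'].sum()

# idxmax returns the index label (here the region) with the largest total
best_region = quantity_by_region.idxmax()

print(quantity_by_region)
print(f"Better performing region by quantity: {best_region}")
```

Comparing on the `Sales` column instead could give a different answer, so it is worth stating which metric a conclusion is based on.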

## Final Capstone Project

### 🎓 Final Capstone Project: Sales Data Analysis

**Problem:**
You work as a data analyst for a retail company. Your task is to analyze
quarterly sales data and provide insights to management.

**Dataset:**
Sales data with columns: Date, Product, Category, Region, Sales (the sale amount), Quantity

**Requirements:**
1. Load and clean the data
2. Calculate total revenue per product category
3. Find top 3 products by revenue
4. Calculate average sales by region
5. Identify trends (which month had the highest sales)
6. Create a summary report

In [None]:
# Your solution here
import numpy as np
import pandas as pd

# Step 1: Create sample data
np.random.seed(42)

dates = pd.date_range('2024-01-01', periods=100, freq='D')
products = ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard']
categories = ['Electronics', 'Electronics', 'Electronics', 'Accessories', 'Accessories']
regions = ['North', 'South', 'East', 'West']

sales_data = pd.DataFrame({
    'Date': np.random.choice(dates, 100),
    'Product': np.random.choice(products, 100),
    'Region': np.random.choice(regions, 100),
    'Sales': np.random.randint(100, 2000, 100),
    'Quantity': np.random.randint(1, 10, 100)
})

# Add category based on product
category_map = dict(zip(products, categories))
sales_data['Category'] = sales_data['Product'].map(category_map)

# Step 2: Data Overview
# Step 3: Total revenue per category
# Step 4: Top 3 products by revenue
# Step 5: Average sales by region
# Step 6: Monthly trend analysis
# Step 7: Summary Statistics
# Step 8: Key Insights
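
A hedged sketch of how Steps 3 to 6 might look. It rebuilds the sample data from the starter cell so it can run on its own, and it assumes "revenue" means the `Sales` column as given (not `Sales` multiplied by `Quantity`). All result variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data exactly as in the starter cell
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=100, freq='D')
products = ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard']
categories = ['Electronics', 'Electronics', 'Electronics', 'Accessories', 'Accessories']
regions = ['North', 'South', 'East', 'West']

sales_data = pd.DataFrame({
    'Date': np.random.choice(dates, 100),
    'Product': np.random.choice(products, 100),
    'Region': np.random.choice(regions, 100),
    'Sales': np.random.randint(100, 2000, 100),
    'Quantity': np.random.randint(1, 10, 100)
})
sales_data['Category'] = sales_data['Product'].map(dict(zip(products, categories)))

# Step 3: total revenue per category (assumption: revenue == Sales column)
revenue_by_category = sales_data.groupby('Category')['Sales'].sum()

# Step 4: top 3 products by total revenue
top_products = sales_data.groupby('Product')['Sales'].sum().nlargest(3)

# Step 5: average sale amount per region
avg_sales_by_region = sales_data.groupby('Region')['Sales'].mean()

# Step 6: total sales per calendar month, and the month with the largest total
monthly_sales = sales_data.groupby(sales_data['Date'].dt.to_period('M'))['Sales'].sum()
best_month = monthly_sales.idxmax()

print(revenue_by_category)
print(top_products)
print(avg_sales_by_region.round(2))
print(f"Month with highest sales: {best_month}")
```

Because the data are random draws, the printed figures only describe this synthetic sample; the grouping pattern is the part that carries over to real data.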

**✏️ Your Turn:**

Try to answer these questions with the data:
1. What's the total quantity sold for 'Laptop'?
2. Which region has the most transactions?
3. What's the average quantity per transaction?

In [None]:
# Your answers here
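
The three questions map onto three common pandas patterns. The sketch below shows them on a tiny hand-made frame (`toy`, hypothetical data that is not part of the exercise) so the results can be checked by hand; applying the same lines to `sales_data` is left as the exercise.

```python
import pandas as pd

# Hypothetical toy data with the same column names as sales_data
toy = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Laptop', 'Tablet'],
    'Region': ['North', 'South', 'North', 'East'],
    'Quantity': [2, 5, 3, 1]
})

# 1. Total quantity for one product: filter the rows, then sum the column
laptop_quantity = toy.loc[toy['Product'] == 'Laptop', 'Quantity'].sum()

# 2. Region with the most transactions: count rows per region, take the largest
busiest_region = toy['Region'].value_counts().idxmax()

# 3. Average quantity per transaction: mean over all rows
avg_quantity = toy['Quantity'].mean()

print(laptop_quantity, busiest_region, avg_quantity)
```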