# Data Visualization with Pandas and Matplotlib - Exercises

This notebook contains exercises from Lecture 4: Data Visualization.
Try to solve them yourself before looking at the solutions!


## Part 1: Pandas Plotting Basics


### Exercise: Create Your First Plot Step by Step

**Problem:** Create a simple line plot following these steps:

1. Create sample data (Month and Sales)
2. Create a basic plot
3. Add a title
4. Add axis labels
5. Add a grid
6. Save the plot

**Data:**
- Months: ['Jan', 'Feb', 'Mar', 'Apr', 'May']
- Sales: [100, 120, 140, 130, 150]
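
The six steps map onto a handful of pandas and Matplotlib calls. Below is a minimal sketch on made-up data, so it is not the solution itself: the `Day` and `Visitors` columns and the output file name are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data, deliberately different from the exercise data
df = pd.DataFrame({
    "Day": ["Mon", "Tue", "Wed", "Thu"],
    "Visitors": [12, 18, 15, 21],
})

# DataFrame.plot returns a Matplotlib Axes that the later steps customize
ax = df.plot(x="Day", y="Visitors", kind="line", marker="o")
ax.set_title("Visitors per Day")
ax.set_xlabel("Day")
ax.set_ylabel("Visitors")
ax.grid(True)

# Save the figure; the temp-directory path is an arbitrary choice
out_path = os.path.join(tempfile.gettempdir(), "visitors_per_day.png")
ax.figure.savefig(out_path, dpi=100)
```

In a notebook the figure also renders at the end of the cell; in a plain script, call `plt.show()` after saving.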


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Create sample data
# Step 2: Create basic plot
# Step 3: Add title
# Step 4: Add axis labels
# Step 5: Add grid
# Step 6: Save the plot


## Part 2: Line Plots


### ✏️ Challenge: Create Monthly Sales Line Plot

**Problem:** Create a line plot showing monthly total sales

**Steps:**
1. Load sales_data.csv
2. Convert Date column to datetime
3. Create a 'Month' column from Date
4. Group sales by month and sum
5. Create a line plot with title, labels, and grid

**Hint:** Use `.dt.to_period('M')` for monthly grouping and `.plot(kind='line')`
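
To see what `.dt.to_period('M')` does before applying it to the real file, here is a small sketch on an invented five-row table. The `Date` and `Sales` column names are assumed to match sales_data.csv; the values are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented rows; the real exercise loads sales_data.csv instead
df = pd.DataFrame({
    "Date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-28", "2024-03-15"],
    "Sales": [100, 50, 80, 20, 60],
})

# Strings must become datetimes before the .dt accessor is available
df["Date"] = pd.to_datetime(df["Date"])

# to_period("M") collapses every date to its calendar month (e.g. 2024-01)
df["Month"] = df["Date"].dt.to_period("M")
monthly = df.groupby("Month")["Sales"].sum()

ax = monthly.plot(kind="line", marker="o")
ax.set_title("Monthly Total Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.grid(True)
```

Grouping on the period rather than on the raw date is what merges the two January rows into a single point.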


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
sales_file = os.path.join(data_dir, 'sales_data.csv')

# Load and prepare data
# Create monthly sales
# Create line plot
# Add title, labels, grid
# Display plot


## Part 3: Bar Charts


### ✏️ Challenge: Create Top 5 Products Bar Chart

**Problem:** Create a bar chart showing top 5 products by sales with value labels on bars

**Steps:**
1. Load sales_data.csv
2. Group sales by Product and sum
3. Get top 5 products
4. Create a bar chart
5. Add value labels on top of each bar
6. Add title, labels, and grid

**Hint:** Use `.plot(kind='bar')` and `ax.text()` for value labels
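
The value-label step is the least obvious one, so here is a sketch of it on invented products (top 3 rather than top 5, to keep the table small). It relies on pandas drawing bars at x positions 0, 1, 2, ..., which is why `enumerate` gives the right x coordinate for each label.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented rows; Product and Sales are assumed column names
df = pd.DataFrame({
    "Product": ["Pen", "Book", "Pen", "Lamp", "Book", "Mug"],
    "Sales": [10, 40, 15, 30, 20, 5],
})

# Sum per product, then keep the largest totals (sorted descending)
top = df.groupby("Product")["Sales"].sum().nlargest(3)

ax = top.plot(kind="bar", color="steelblue")

# Bar i sits at x = i; place each label just above the bar top
for i, value in enumerate(top):
    ax.text(i, value, f"{value}", ha="center", va="bottom")

ax.set_title("Top 3 Products by Sales")
ax.set_xlabel("Product")
ax.set_ylabel("Total Sales")
ax.grid(axis="y", alpha=0.3)
```

`va="bottom"` anchors the bottom of the text at the bar height, so the label sits on top of the bar instead of overlapping it.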


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
sales_file = os.path.join(data_dir, 'sales_data.csv')

# Load data
# Group by Product and get top 5
# Create bar chart
# Add value labels on bars
# Add title, labels, grid
# Display plot


## Part 4: Histograms


### ✏️ Challenge: Compare Sales Distributions

**Problem:** Create histograms comparing sales distributions for two regions (North and South)

**Steps:**
1. Load sales_data.csv
2. Filter data for North and South regions
3. Create overlapping histograms with different colors
4. Add legend, title, labels, and grid

**Hint:** Use `.plot(kind='hist')` with `alpha` parameter for transparency and `label` for legend
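
A sketch of the overlap technique on ten invented rows: both histograms are drawn on the same Axes, `alpha` keeps the one underneath visible, and `label` feeds the legend. The `Region` and `Sales` column names are assumptions about sales_data.csv.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented rows standing in for the real file
df = pd.DataFrame({
    "Region": ["North"] * 5 + ["South"] * 5,
    "Sales": [10, 12, 15, 20, 22, 18, 25, 27, 30, 35],
})

# Boolean masks select the Sales values of each region
north = df.loc[df["Region"] == "North", "Sales"]
south = df.loc[df["Region"] == "South", "Sales"]

# Passing the same ax to both calls overlays the histograms
fig, ax = plt.subplots()
north.plot(kind="hist", bins=5, alpha=0.5, color="tab:blue", label="North", ax=ax)
south.plot(kind="hist", bins=5, alpha=0.5, color="tab:orange", label="South", ax=ax)

ax.legend()
ax.set_title("Sales Distribution: North vs South")
ax.set_xlabel("Sales")
ax.set_ylabel("Frequency")
ax.grid(alpha=0.3)
```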


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
sales_file = os.path.join(data_dir, 'sales_data.csv')

# Load data
# Filter North and South regions
# Create overlapping histograms
# Add legend, title, labels
# Display plot


## Part 5: Scatter Plots


### ✏️ Challenge: Create Correlation Scatter Plot

**Problem:** Create a scatter plot showing correlation between Age and TotalSpent with a trend line

**Steps:**
1. Load customer_data.csv
2. Remove missing values for Age and TotalSpent
3. Create a scatter plot
4. Calculate correlation coefficient
5. Add a trend line using `np.polyfit()` and `np.poly1d()`
6. Display correlation in the title

**Hint:** Use `.plot(kind='scatter')` and `ax.plot()` for trend line
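
The trend line is a degree-1 polynomial fit. The following sketch uses six invented customers, two of them with missing values, to show how `np.polyfit()` and `np.poly1d()` fit together; the numbers carry no meaning beyond illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented customers; None marks the missing values to be removed
df = pd.DataFrame({
    "Age": [22, 30, None, 41, 50, 63],
    "TotalSpent": [200, 320, 150, 400, None, 640],
})

# Keep only rows where both columns are present
clean = df.dropna(subset=["Age", "TotalSpent"])

# Pearson correlation coefficient between the two columns
r = clean["Age"].corr(clean["TotalSpent"])

# Degree-1 fit returns [slope, intercept]; poly1d turns it into a callable
slope, intercept = np.polyfit(clean["Age"], clean["TotalSpent"], 1)
trend = np.poly1d([slope, intercept])

ax = clean.plot(kind="scatter", x="Age", y="TotalSpent")

# Evaluate the fitted line on sorted x values so it draws left to right
xs = np.sort(clean["Age"].to_numpy())
ax.plot(xs, trend(xs), color="red", label="Trend line")

ax.set_title(f"Age vs TotalSpent (r = {r:.2f})")
ax.legend()
ax.grid(alpha=0.3)
```

Dropping missing rows first matters because `np.polyfit()` cannot fit data that contains NaN.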


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os

data_dir = os.path.join('..', 'data')
customer_file = os.path.join(data_dir, 'customer_data.csv')

# Load and clean data
# Calculate correlation
# Create scatter plot
# Add trend line
# Add title with correlation value
# Display plot


## Part 6: Box Plots


### ✏️ Challenge: Compare Sales by Region

**Problem:** Create box plots comparing sales distributions across regions

**Steps:**
1. Load sales_data.csv
2. Clean data (remove missing Sales)
3. Create box plots grouped by Region
4. Customize colors and styling
5. Add title, labels, and grid

**Hint:** Use `.boxplot(column='Sales', by='Region')` and remember to remove default suptitle with `plt.suptitle('')`
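
When `by` is given, pandas adds its own figure-level title ("Boxplot grouped by Region") on top of the Axes title, which is why the hint asks you to clear it. A sketch on invented rows, with styling left out:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented rows; one missing Sales value to clean away
df = pd.DataFrame({
    "Region": ["North", "North", "North", "South", "South", "South", "East", "East"],
    "Sales": [10, 14, None, 20, 25, 30, 8, 12],
})

clean = df.dropna(subset=["Sales"])

# Draw onto an Axes we created, so we keep a handle for customization
fig, ax = plt.subplots()
clean.boxplot(column="Sales", by="Region", ax=ax, grid=True)

# Blank out the automatic figure-level title, then set our own Axes title
plt.suptitle("")
ax.set_title("Sales by Region")
ax.set_xlabel("Region")
ax.set_ylabel("Sales")
```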


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
sales_file = os.path.join(data_dir, 'sales_data.csv')

# Load and clean data
# Create box plots by Region
# Customize styling
# Remove default suptitle
# Add title, labels
# Display plot


## Part 7: Pie Charts


### ✏️ Challenge: Create Product Sales Pie Chart

**Problem:** Create a pie chart showing sales share of top 3 products with exploded slices

**Steps:**
1. Load sales_data.csv
2. Group sales by Product and sum
3. Get top 3 products
4. Create a pie chart with:
   - Percentage labels (`autopct='%1.1f%%'`)
   - Exploded slices (`explode=(0.05, 0.05, 0.05)`)
   - Shadow effect (`shadow=True`)
   - Custom colors
5. Add title

**Hint:** Use `.plot(kind='pie')` with `explode` and `shadow` parameters
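
The extra keyword arguments are passed through to Matplotlib's `pie`. A sketch with invented products follows; the colors and `startangle` are arbitrary choices, not requirements of the exercise.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented rows; Product and Sales are assumed column names
df = pd.DataFrame({
    "Product": ["Book", "Lamp", "Pen", "Mug", "Book"],
    "Sales": [40, 30, 10, 5, 20],
})

top = df.groupby("Product")["Sales"].sum().nlargest(3)

ax = top.plot(
    kind="pie",
    autopct="%1.1f%%",           # percentage label inside each slice
    explode=(0.05, 0.05, 0.05),  # one offset per slice
    shadow=True,
    colors=["#66b3ff", "#99ff99", "#ffcc99"],
    startangle=90,
)

# pandas uses the Series name as the y label; remove it for a cleaner pie
ax.set_ylabel("")
ax.set_title("Sales Share of Top 3 Products")
```

Note that the percentages are shares of the three plotted products only, not of all sales.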


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
sales_file = os.path.join(data_dir, 'sales_data.csv')

# Load data
# Group by Product and get top 3
# Create pie chart with exploded slices
# Add title
# Display plot


## Part 8: Time Series Visualization


### ✏️ Challenge: Create Sales Trend with Moving Average

**Problem:** Create a time series plot with original daily sales data and 7-day moving average

**Steps:**
1. Load sales_data.csv
2. Convert Date to datetime
3. Group sales by Date and sum
4. Calculate 7-day moving average using `.rolling(window=7).mean()`
5. Plot both original data and moving average on the same plot
6. Add legend, title, labels, and grid

**Hint:** Use `.rolling()` for moving average and plot both series on the same axes
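
A sketch of the rolling calculation on ten invented days, chosen so the averages are easy to check by hand:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Ten invented days of sales; dates start as strings, as they would in a CSV
df = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=10, freq="D").strftime("%Y-%m-%d"),
    "Sales": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})
df["Date"] = pd.to_datetime(df["Date"])

# One total per day, indexed by date
daily = df.groupby("Date")["Sales"].sum()

# Each value is the mean of that day and the six days before it
moving_avg = daily.rolling(window=7).mean()

# Draw both series on the same Axes
fig, ax = plt.subplots()
daily.plot(ax=ax, label="Daily sales", alpha=0.6)
moving_avg.plot(ax=ax, label="7-day moving average", linewidth=2)

ax.legend()
ax.set_title("Daily Sales with 7-Day Moving Average")
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
ax.grid(alpha=0.3)
```

The first six values of `moving_avg` are NaN because a full 7-day window is not yet available, so the smoothed line starts a week after the raw one.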


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
sales_file = os.path.join(data_dir, 'sales_data.csv')

# Load and prepare data
# Calculate daily sales
# Calculate 7-day moving average
# Plot both on same axes
# Add legend, title, labels
# Display plot


## Part 9: Subplots and Layouts


### ✏️ Challenge: Create Customer Analysis Dashboard

**Problem:** Create a 2x2 dashboard showing customer demographics

**Requirements:**
1. Top-left: Age distribution histogram
2. Top-right: TotalSpent distribution histogram
3. Bottom-left: Top 5 cities by spending (bar chart)
4. Bottom-right: Age vs TotalSpent scatter plot

**Steps:**
1. Load customer_data.csv
2. Clean data (remove missing Age and TotalSpent)
3. Create 2x2 subplot layout using `plt.subplots(2, 2)`
4. Create each visualization in its subplot
5. Add overall title using `plt.suptitle()`

**Hint:** Use `axes[0, 0]`, `axes[0, 1]`, `axes[1, 0]`, `axes[1, 1]` to access each subplot
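
The key mechanic is passing `ax=axes[row, col]` to each pandas plot call. This sketch uses seven invented customers; the `City` column name is an assumption, so check the real header in customer_data.csv.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented customers; City is an assumed column name
df = pd.DataFrame({
    "Age": [23, 35, None, 47, 52, 31, 60],
    "TotalSpent": [120, 340, 200, 560, 410, 290, 700],
    "City": ["Paris", "Lyon", "Nice", "Paris", "Nice", "Lyon", "Paris"],
})
clean = df.dropna(subset=["Age", "TotalSpent"])

# axes is a 2x2 NumPy array of Axes objects
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

clean["Age"].plot(kind="hist", ax=axes[0, 0], title="Age Distribution")
clean["TotalSpent"].plot(kind="hist", ax=axes[0, 1], title="TotalSpent Distribution")

city_totals = clean.groupby("City")["TotalSpent"].sum().nlargest(5)
city_totals.plot(kind="bar", ax=axes[1, 0], title="Top Cities by Spending")

clean.plot(kind="scatter", x="Age", y="TotalSpent", ax=axes[1, 1],
           title="Age vs TotalSpent")

plt.suptitle("Customer Analysis Dashboard")
fig.tight_layout()  # prevents titles and tick labels from overlapping
```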


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
customer_file = os.path.join(data_dir, 'customer_data.csv')

# Load and clean data
# Create 2x2 subplot layout
# Plot 1: Age distribution histogram
# Plot 2: TotalSpent distribution histogram
# Plot 3: Top 5 cities bar chart
# Plot 4: Age vs TotalSpent scatter plot
# Add overall title
# Display dashboard


## 🎓 Project 1: Sales Analysis Dashboard


### Comprehensive Sales Analysis Dashboard

**Problem:** Create a comprehensive sales dashboard with multiple visualizations

**Requirements:**
Create a dashboard with at least 5 different visualizations showing:
1. Daily sales trend (time series)
2. Top 5 products (bar chart)
3. Sales by region (pie chart)
4. Sales distribution (box plot or histogram)
5. Sales vs quantity (scatter plot)

**Your Task:** 
- Use subplots or custom layout
- Add titles and labels to all plots
- Create a professional-looking dashboard
- Save the dashboard as an image


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
sales_file = os.path.join(data_dir, 'sales_data.csv')

# Load and prepare data
# Create dashboard layout
# Add visualizations
# Customize styling
# Add overall title
# Save dashboard


## 🎓 Project 2: Customer Demographics Analysis


### Customer Demographics Analysis Dashboard

**Problem:** Visualize customer demographics and spending patterns

**Requirements:**
Create a dashboard showing:
1. Age distribution (histogram)
2. TotalSpent distribution (histogram)
3. Top cities by spending (bar chart)
4. Age vs TotalSpent relationship (scatter plot)
5. Spending by age groups (bar chart)
6. Spending patterns by city (box plot)

**Your Task:**
- Use customer_data.csv
- Create age groups (Young: <30, Middle: 30-50, Senior: >50)
- Create comprehensive dashboard with custom layout
- Add insights and observations
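
The age-group boundaries are asymmetric (30 and 50 both belong to Middle), so explicit conditions are one way to get the edges exactly right. Here is a sketch on invented customers using `np.select()`. It reads "spending by age groups" as the sum of `TotalSpent` per group, which is an interpretation; a mean would be equally defensible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented customers, chosen to hit the boundary ages 29, 30, 50 and 51
df = pd.DataFrame({
    "Age": [25, 30, 50, 51, 29, 67],
    "TotalSpent": [100, 200, 300, 400, 50, 250],
})

# Drop missing ages first: NaN would otherwise fall through to the default
clean = df.dropna(subset=["Age"]).copy()

# Conditions are checked in order; the first match wins
age = clean["Age"].to_numpy()
clean["AgeGroup"] = np.select(
    [age < 30, age <= 50],
    ["Young", "Middle"],
    default="Senior",
)

# reindex fixes the bar order, which would otherwise be alphabetical
by_group = (
    clean.groupby("AgeGroup")["TotalSpent"].sum()
    .reindex(["Young", "Middle", "Senior"])
)

ax = by_group.plot(kind="bar", color="teal")
ax.set_title("Total Spending by Age Group")
ax.set_xlabel("Age Group")
ax.set_ylabel("TotalSpent")
```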


In [None]:
# Your solution here
import pandas as pd
import matplotlib.pyplot as plt
import os

data_dir = os.path.join('..', 'data')
customer_file = os.path.join(data_dir, 'customer_data.csv')

# Load and prepare data
# Create age groups
# Create dashboard layout
# Add all visualizations
# Add insights
# Save dashboard
