# Unit 3 Lab: Advanced Data Handling & Visualization (Pandas, Matplotlib, Seaborn, Plotly)

**Focus:** Pandas DataFrame operations, grouping, merging, basic plotting, and a small interactive Plotly example.

In [None]:
import pandas as pd
import numpy as np
np.random.seed(0)

# Simulate sales dataset
dates = pd.date_range(start='2025-01-01', periods=30)
store = ['A','B','C']
# Fixed per-store offset added to each Poisson(200) draw: B runs higher, C lower
offset = {'A': 0, 'B': 20, 'C': -10}
rows = []
for d in dates:
    for s in store:
        rows.append({'date': d, 'store': s, 'sales': int(np.random.poisson(200) + offset[s])})

df = pd.DataFrame(rows)

df.head()

## Task 1 — Explore the DataFrame
- Use `df.info()`, `df.describe()`, and `df.head()` to inspect the data.

In [None]:
df.info()

df.describe()

## Task 2 — Grouping & Aggregation
- Compute the total, mean, and median of daily sales for each store.
- Which store has the highest average sales?

In [None]:
sales_by_store = df.groupby('store')['sales'].agg(['sum','mean','median']).reset_index()
sales_by_store
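
The unit's focus also lists merging, so here is a minimal sketch of joining the per-store summary to a lookup table. The `store_info` table and its `region` values are hypothetical (they are not part of the lab data), and store C is deliberately left out to show how a left join handles an unmatched key.

```python
import numpy as np
import pandas as pd

# Rebuild the lab's simulated data so this sketch runs on its own
np.random.seed(0)
offset = {'A': 0, 'B': 20, 'C': -10}
rows = [
    {'date': d, 'store': s, 'sales': int(np.random.poisson(200) + offset[s])}
    for d in pd.date_range(start='2025-01-01', periods=30)
    for s in ['A', 'B', 'C']
]
df = pd.DataFrame(rows)
sales_by_store = df.groupby('store')['sales'].agg(['sum', 'mean', 'median']).reset_index()

# Hypothetical lookup table: regions are invented, and store C is omitted on purpose
store_info = pd.DataFrame({'store': ['A', 'B'], 'region': ['North', 'South']})

# A left join keeps every row of the summary; `indicator` adds a `_merge` column
# that records whether each row found a match in `store_info`
merged = sales_by_store.merge(store_info, on='store', how='left', indicator=True)
print(merged[['store', 'mean', 'region', '_merge']])
```

Store C should appear with a missing `region` and `_merge` equal to `left_only`, which is a quick way to spot keys that failed to match.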

## Task 3 — Time Series Plot
- Plot daily total sales (sum across stores) as a line chart.

In [None]:
import matplotlib.pyplot as plt

daily = df.groupby('date')['sales'].sum().reset_index()
plt.figure(figsize=(10,4))
plt.plot(daily['date'], daily['sales'], marker='o')
plt.title('Daily Total Sales')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## Task 4 — Seaborn Visualization
- Create a boxplot showing sales distribution per store and a heatmap of daily sales per store.

In [None]:
import seaborn as sns
plt.figure(figsize=(6,4))
sns.boxplot(x='store', y='sales', data=df)
plt.title('Sales Distribution by Store')
plt.show()

# Pivot for heatmap
pivot = df.pivot_table(index='date', columns='store', values='sales')
plt.figure(figsize=(8,6))
sns.heatmap(pivot.T, cmap='YlGnBu')
plt.title('Heatmap of Sales (Store vs Date)')
plt.show()

## Task 5 — Plotly Interactive Example (optional)
- Create an interactive scatter plot of daily sales for store A. (Depending on the Plotly renderer, the notebook may need internet access to load plotly.js from a CDN; renderers that embed the library work offline.)

In [None]:
import plotly.express as px
storeA = df[df['store']=='A'].copy()
fig = px.scatter(storeA, x='date', y='sales', title='Store A Daily Sales (Interactive)')
fig.show()

## Task 6 — Short Exercise
- Export the summary `sales_by_store` to CSV using `to_csv()` and share one insight from the aggregated table.

In [None]:
sales_by_store.to_csv('sales_by_store_summary.csv', index=False)
print('Saved sales_by_store_summary.csv')

# Show the content for quick review
sales_by_store
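
As a possible follow-up check, the exported file can be read back to confirm its shape and columns. This sketch rebuilds the lab data so it runs on its own and writes to a temporary directory (an assumption made here to avoid overwriting the file saved above).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild the lab's simulated data and summary table
np.random.seed(0)
offset = {'A': 0, 'B': 20, 'C': -10}
rows = [
    {'date': d, 'store': s, 'sales': int(np.random.poisson(200) + offset[s])}
    for d in pd.date_range(start='2025-01-01', periods=30)
    for s in ['A', 'B', 'C']
]
df = pd.DataFrame(rows)
sales_by_store = df.groupby('store')['sales'].agg(['sum', 'mean', 'median']).reset_index()

# Write the summary to a temporary folder, then load it again
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sales_by_store_summary.csv')
    sales_by_store.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.shape)          # rows, columns
print(list(reloaded.columns))  # column names as stored in the CSV header
```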


---
## Trainer's Answers & Expected Outputs — Unit 3 Lab

**Task 1 — Explore the DataFrame**  
- `df.info()` should show 90 rows (30 dates × 3 stores) and columns: date, store, sales.  
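- `df.describe()` shows sales statistics; the mean should be close to 203 (the Poisson mean of 200 plus an average store offset of about 3.3). Results are reproducible across runs because of `np.random.seed(0)`.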
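
The expected row count and mean can be checked with a short sketch like the following. It rebuilds the lab data, and `expected_mean` is a name introduced here for the value implied by the generation logic.

```python
import numpy as np
import pandas as pd

# Rebuild the lab's simulated data
np.random.seed(0)
offset = {'A': 0, 'B': 20, 'C': -10}
rows = [
    {'date': d, 'store': s, 'sales': int(np.random.poisson(200) + offset[s])}
    for d in pd.date_range(start='2025-01-01', periods=30)
    for s in ['A', 'B', 'C']
]
df = pd.DataFrame(rows)

# 30 dates x 3 stores, with columns date, store, sales
print(df.shape)

# Poisson mean plus the average of the three store offsets
expected_mean = 200 + sum(offset.values()) / len(offset)
print(round(expected_mean, 1), round(df['sales'].mean(), 1))
```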

**Task 2 — Grouping & Aggregation**  
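- `sales_by_store` displays the sum, mean, and median per store. Expect store B to have the highest mean (about 220), followed by A (about 200) and C (about 190), because the generator adds 20 to B's draws and subtracts 10 from C's.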
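
One way to answer "which store has the highest average sales?" programmatically is `idxmax()` on the grouped means. This is a sketch that rebuilds the lab data; `mean_by_store` and `top_store` are names introduced here.

```python
import numpy as np
import pandas as pd

# Rebuild the lab's simulated data
np.random.seed(0)
offset = {'A': 0, 'B': 20, 'C': -10}
rows = [
    {'date': d, 'store': s, 'sales': int(np.random.poisson(200) + offset[s])}
    for d in pd.date_range(start='2025-01-01', periods=30)
    for s in ['A', 'B', 'C']
]
df = pd.DataFrame(rows)

# Mean daily sales per store, indexed by store label
mean_by_store = df.groupby('store')['sales'].mean()

# idxmax returns the index label (the store) of the largest mean
top_store = mean_by_store.idxmax()
print(top_store, round(mean_by_store[top_store], 1))
```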

**Task 3 — Time Series Plot**  
- The line plot of daily total sales should fluctuate around roughly 610 per day (3 × 200 plus the net store offset of 10) with no trend, because each day's values are independent random draws.

**Task 4 — Seaborn Visualization**  
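- Boxplot: shows the sales distribution per store, with B's box sitting highest and C's lowest. Heatmap: because the pivot is transposed, stores appear as rows (y-axis) and dates as columns (x-axis), with color encoding daily sales.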
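
The heatmap orientation follows from the pivot's shape, which can be inspected without plotting. This sketch rebuilds the lab data and also confirms that each date/store cell holds a single observation, so the default mean aggregation of `pivot_table` leaves values unchanged.

```python
import numpy as np
import pandas as pd

# Rebuild the lab's simulated data
np.random.seed(0)
offset = {'A': 0, 'B': 20, 'C': -10}
rows = [
    {'date': d, 'store': s, 'sales': int(np.random.poisson(200) + offset[s])}
    for d in pd.date_range(start='2025-01-01', periods=30)
    for s in ['A', 'B', 'C']
]
df = pd.DataFrame(rows)

# Dates as rows, stores as columns
pivot = df.pivot_table(index='date', columns='store', values='sales')
print(pivot.shape)

# Transposed, as passed to sns.heatmap: stores as rows, dates as columns
print(pivot.T.shape)

# One row per (date, store) pair, so averaging within a cell changes nothing
one_obs_per_cell = bool((df.groupby(['date', 'store']).size() == 1).all())
print(one_obs_per_cell)
```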

**Task 5 — Plotly Interactive Example**  
- The interactive scatter renders wherever a Plotly renderer is available. In Colab or JupyterLab it displays inline; when run as a plain Python script, `fig.show()` usually opens the figure in a browser tab.

**Task 6 — Export CSV**  
- `sales_by_store_summary.csv` is saved and should contain a header row plus three data rows (one per store) with the columns store, sum, mean, and median.

**Grading / Discussion Tips:**  
- Confirm students can read and describe groupby results and create meaningful plots.  
- Check that the CSV file was created and that students can state one insight, e.g., which store sells the most on average.
---
