# ☕ Mini Project: Coffee Shop Analytics Challenge

## Welcome, Data Analyst!

You've just been hired by **"Brewed Awakening"**, a popular coffee shop in your city. The owner wants to:

1. 📈 **Predict** next week's coffee sales
2. 🔍 **Discover** recurring customer traffic patterns
3. 💡 **Recommend** optimal staffing times

Your task is to analyze historical data and provide actionable insights!

---

## 🎯 Project Objectives

By the end of this project, you will:
- Clean messy sales data
- Apply preprocessing techniques
- Build a forecasting model
- Detect traffic patterns
- Present findings to the coffee shop owner

---

## 📋 Instructions

This is a **guided project**. Follow each step and copy/modify code from your lesson notebooks:
- 📓 **Lesson 1**: For data preprocessing (cleaning, smoothing, etc.)
- 📓 **Lesson 2**: For forecasting and pattern detection

**Let's begin!** ☕✨

---
## Step 1: Setup

**📝 TODO:** Copy the import statements from **Lesson 1, cell 2** to import all necessary libraries.

In [None]:
# TODO: Copy import statements from Lesson 1 here
# You'll need: numpy, pandas, matplotlib, seaborn, scipy



# TODO: Also copy imports from Lesson 2 for ARIMA
# You'll need: ARIMA and related functions



# Run this cell to verify imports work
print("✅ All libraries imported!")

---
## Step 2: Load the Data

The coffee shop has been tracking hourly coffee sales for the past 60 days. Let's load this data!

**Run the cell below** to generate the dataset:

In [None]:
# Generate coffee shop sales data (DO NOT MODIFY - just run this cell)
np.random.seed(123)

# 60 days of hourly data
hours = pd.date_range('2024-01-01', periods=24*60, freq='h')
n = len(hours)

# Base pattern: more sales during day, peaks at morning and afternoon
hour_of_day = np.array([h.hour for h in hours])
day_of_week = np.array([h.dayofweek for h in hours])  # 0=Monday, 6=Sunday

# Morning rush (7-9am) and afternoon rush (2-4pm)
morning_rush = 30 * np.exp(-((hour_of_day - 8)**2) / 4)
afternoon_rush = 25 * np.exp(-((hour_of_day - 15)**2) / 4)

# Weekend boost (+10 cups per hour on Sat/Sun)
weekend_boost = np.where((day_of_week == 5) | (day_of_week == 6), 10, 0)

# Overall trend (business growing)
trend = 0.02 * np.arange(n)

# Base sales + components
base = 20
clean_sales = base + trend + morning_rush + afternoon_rush + weekend_boost

# Add PROBLEMS to make it realistic!
dirty_sales = clean_sales.copy()

# Problem 1: Add noise
dirty_sales += np.random.normal(0, 5, n)

# Problem 2: Missing values (15% - register malfunctions!)
missing_idx = np.random.choice(n, size=int(n * 0.15), replace=False)
dirty_sales[missing_idx] = np.nan

# Problem 3: Outliers (8% - data entry errors!)
outlier_idx = np.random.choice(n, size=int(n * 0.08), replace=False)
dirty_sales[outlier_idx] += np.random.choice([-1, 1], size=len(outlier_idx)) * np.random.uniform(40, 80, size=len(outlier_idx))

# Problem 4: Clip negative values to zero (can't have negative sales!)
dirty_sales = np.maximum(dirty_sales, 0)

# Create DataFrame
df_coffee = pd.DataFrame({
    'timestamp': hours,
    'sales': dirty_sales,
    'hour': hour_of_day,
    'day_of_week': day_of_week
})

print("☕ Coffee Shop Sales Data Loaded!")
print(f"📊 Dataset: {len(df_coffee)} hours ({len(df_coffee)/24:.0f} days)")
print(f"📅 Period: {df_coffee['timestamp'].min()} to {df_coffee['timestamp'].max()}")
print(f"\n⚠️  Data Quality Issues Detected:")
print(f"   - Missing values: {df_coffee['sales'].isna().sum()} ({df_coffee['sales'].isna().sum()/len(df_coffee)*100:.1f}%)")
print(f"   - Suspicious outliers: ~{int(n * 0.08)} points")
print(f"\n🎯 Your mission: Clean this data and make predictions!")

# Show first few rows
df_coffee.head(10)

---
## Step 3: Visualize the Raw Data

Before cleaning, let's see what we're working with!

**📝 TODO:** Copy the visualization code from **Lesson 1, cell 23** and modify it to plot the coffee sales data.

In [None]:
# TODO: Create a plot showing the coffee sales over time
# Hint: Copy from Lesson 1, cell 23 and adapt for df_coffee



# Your observations:
print("\n🤔 What do you notice?")
print("1. Are there any obvious patterns? _________________")
print("2. Do you see outliers? _________________")
print("3. Are there gaps (missing data)? _________________")

---
## Step 4: Handle Missing Data

The cash register sometimes malfunctions! We need to fill in missing sales data.

**📝 TODO:**
1. Copy the missing data handling code from **Lesson 1, cells 25 and 27**
2. Decide which method is best for coffee shop sales
3. Apply it to `df_coffee['sales']`

In [None]:
# TODO: Copy missing data handling code from Lesson 1, cell 25
# Try different methods: forward fill, backward fill, interpolation



# TODO: Choose the best method and apply it
# Create a new column 'sales_clean' with filled values
df_coffee['sales_clean'] = df_coffee['sales']  # Replace this line with your solution



# Verify missing values are handled
print(f"✅ Missing values remaining: {df_coffee['sales_clean'].isna().sum()}")

# Why did you choose this method?
print("\n💭 I chose __________ because: __________________")
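
To see how the three filling methods differ, here is a minimal sketch on a small made-up series. It is not the Lesson 1 code: the `toy` values and the `filled` table are hypothetical, chosen only so the gaps are easy to inspect by eye.

```python
import numpy as np
import pandas as pd

# Toy hourly series with gaps (hypothetical values, not the project data)
idx = pd.date_range("2024-01-01", periods=12, freq="h")
toy = pd.Series([20, 22, np.nan, np.nan, 35, 50, np.nan, 41, 30, np.nan, 24, 21],
                index=idx, dtype=float)

filled = pd.DataFrame({
    "raw": toy,
    "ffill": toy.ffill(),                      # repeat the last known value
    "bfill": toy.bfill(),                      # copy the next known value
    "interp": toy.interpolate(method="time"),  # straight line between neighbours
})
# Interpolation can leave NaNs at the very start, so patch the edges explicitly
filled["interp"] = filled["interp"].bfill().ffill()

print(filled.round(1))
```

For sales that rise and fall smoothly through the day, interpolation usually tracks the curve better than repeating a stale value, but compare the columns yourself before deciding.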

---
## Step 5: Detect and Handle Outliers

Some data points are suspiciously high or low (data entry errors). Let's find and fix them!

**📝 TODO:**
1. Copy the outlier detection code from **Lesson 1, cell 37**
2. Copy the outlier handling code from **Lesson 1, cell 39**
3. Apply the IQR method to detect outliers in coffee sales
4. Choose a strategy to handle outliers

In [None]:
# TODO: Copy outlier detection code from Lesson 1, cell 37
# Use the IQR method on df_coffee['sales_clean']



# TODO: Visualize the outliers



# TODO: Handle outliers using one of the strategies from cell 39
# Options: remove (set to NaN), cap at bounds, or replace with median



# Note: this assumes your detection code above stored a boolean mask named outliers_iqr
print(f"\n📊 Outliers detected: {outliers_iqr.sum()}")
print(f"💭 I chose to handle them by: __________________")
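
As a reference for the IQR rule, the sketch below flags and caps outliers in a made-up series. It is an illustration under assumptions, not the Lesson 1 implementation: the `toy` data is hypothetical, and capping with `clip` is just one of the three strategies listed above.

```python
import numpy as np
import pandas as pd

# Toy data: steady sales around 30 cups with two injected errors (hypothetical)
rng = np.random.default_rng(0)
toy = pd.Series(30 + rng.normal(0, 3, 200))
toy.iloc[[10, 150]] = [120.0, 0.0]   # one spike, one drop

q1, q3 = toy.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers_iqr = (toy < lower) | (toy > upper)   # boolean mask, True = outlier
capped = toy.clip(lower=lower, upper=upper)    # strategy: cap at the bounds

print(f"Bounds: [{lower:.1f}, {upper:.1f}]  Outliers flagged: {outliers_iqr.sum()}")
```

On the coffee data, keep in mind that a single global IQR may also flag genuine rush-hour peaks, so check which points are marked before you change them.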

---
## Step 6: Smooth the Data

Sales data is noisy! Let's smooth it to see patterns more clearly.

**📝 TODO:**
1. Copy the smoothing functions from **Lesson 1, cell 29**
2. Experiment with different window sizes
3. Apply smoothing to create `df_coffee['sales_smoothed']`

In [None]:
# TODO: Copy the moving_average and exponential_smoothing functions from Lesson 1, cell 29



# TODO: Experiment with different parameters (copy from cell 31)
window_size = 6  # Try different values: 3, 6, 12, 24



# TODO: Apply your chosen smoothing method
df_coffee['sales_smoothed'] = df_coffee['sales_clean']  # Replace with smoothed version



# Visualize before and after
plt.figure(figsize=(15, 6))
plt.plot(df_coffee['timestamp'], df_coffee['sales'], alpha=0.3, label='Raw (with problems)', linewidth=0.5)
plt.plot(df_coffee['timestamp'], df_coffee['sales_smoothed'], linewidth=2, label='Cleaned & Smoothed', color='green')
plt.title('☕ Coffee Sales: Before vs After Preprocessing', fontsize=16, fontweight='bold')
plt.xlabel('Time')
plt.ylabel('Coffee Sales (cups)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n✅ Data preprocessing complete!")
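
If you want to check your smoothing functions against a baseline, the following sketch uses the built-in pandas `rolling` and `ewm` methods on a made-up noisy signal. The `toy` series and the `alpha=0.3` value are assumptions for illustration; the Lesson 1 functions may be written differently.

```python
import numpy as np
import pandas as pd

# Noisy toy signal: one daily cycle repeated three times (hypothetical data)
rng = np.random.default_rng(1)
t = np.arange(72)
toy = pd.Series(30 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 4, t.size))

window_size = 6
# Centered moving average; min_periods=1 avoids NaNs at the edges
moving_avg = toy.rolling(window=window_size, center=True, min_periods=1).mean()
# Exponential smoothing: a larger alpha follows the raw data more closely
exp_smooth = toy.ewm(alpha=0.3, adjust=False).mean()

# Hour-to-hour jumpiness before and after smoothing
print(f"Roughness raw: {toy.diff().std():.2f}")
print(f"Roughness moving average: {moving_avg.diff().std():.2f}")
print(f"Roughness exponential: {exp_smooth.diff().std():.2f}")
```

A wider window removes more noise but also flattens the morning and afternoon peaks, which is the trade-off to watch when you try 3, 6, 12, and 24.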

---
## Step 7: Test for Stationarity

Before forecasting, we need to check if our data is stationary!

**📝 TODO:** Copy the ADF test function from **Lesson 2, cell 19** and test the smoothed sales data.

In [None]:
# TODO: Copy the adf_test function from Lesson 2, cell 19



# TODO: Test the smoothed sales data
# Fill any remaining NaN values first
df_coffee['sales_smoothed'] = df_coffee['sales_smoothed'].bfill().ffill()



# Is the data stationary?
print("\n💭 The data is: ☐ Stationary  ☐ Non-stationary")
print("   This means we need d = ___ for our ARIMA model")

---
## Step 8: Determine ARIMA Parameters

Use ACF and PACF plots to choose p and q (d comes from the Step 7 stationarity test)!

**📝 TODO:** Copy the ACF/PACF plotting code from **Lesson 2, cell 21**

In [None]:
# TODO: Copy ACF/PACF plotting code from Lesson 2, cell 21
# Adjust for df_coffee['sales_smoothed']



# TODO: Based on the plots, choose your ARIMA parameters
print("\n🎯 Based on the ACF/PACF plots:")
print("   p (from PACF cutoff) = ___")
print("   d (from stationarity test) = ___")
print("   q (from ACF cutoff) = ___")

---
## Step 9: Fit ARIMA Model and Forecast

Now the exciting part - predicting next week's sales!

**📝 TODO:** Copy the ARIMA fitting and forecasting code from **Lesson 2, cells 23 and 25**

In [None]:
# TODO: Copy ARIMA model fitting code from Lesson 2, cell 23
# Set your chosen (p, d, q) parameters
p = 1  # Change based on your analysis
d = 1  # Change based on your analysis
q = 1  # Change based on your analysis



# TODO: Make a 7-day (168 hour) forecast using code from cell 25
forecast_hours = 24 * 7  # One week



# Visualize the forecast



print(f"\n✅ Forecast complete! Your ARIMA({p},{d},{q}) model predicts:")
print(f"   Average daily sales next week: _____ cups")
print(f"   Peak hour sales: _____ cups")
print(f"   Model confidence: Check the confidence interval width!")

---
## Step 10: Find Traffic Patterns (Motifs)

Let's discover when the coffee shop is busiest!

**📝 TODO:**
1. Aggregate hourly sales by hour of day (0-23)
2. Visualize the average sales per hour
3. Identify peak times

In [None]:
# TODO: Calculate average sales for each hour of the day
# Hint: Use groupby on the 'hour' column

hourly_avg = df_coffee.groupby('hour')['sales_smoothed'].mean()

# Visualize hourly patterns
plt.figure(figsize=(15, 6))
plt.bar(hourly_avg.index, hourly_avg.values, color='brown', alpha=0.7, edgecolor='black')
plt.axhline(y=hourly_avg.mean(), color='red', linestyle='--', linewidth=2, label=f'Average: {hourly_avg.mean():.1f} cups/hour')
plt.title('☕ Average Coffee Sales by Hour of Day', fontsize=16, fontweight='bold')
plt.xlabel('Hour of Day (0 = Midnight, 12 = Noon)')
plt.ylabel('Average Sales (cups)')
plt.xticks(range(0, 24))
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

# Find peak hours
peak_hours = hourly_avg.nlargest(3)
print("\n🔥 Top 3 Busiest Hours:")
for hour, sales in peak_hours.items():
    # Wrap the end of the slot so hour 23 prints as 23:00 - 0:00, not 24:00
    print(f"   {hour}:00 - {(hour + 1) % 24}:00 → {sales:.1f} cups (avg)")

# Find slowest hours
slow_hours = hourly_avg.nsmallest(3)
print("\n😴 Top 3 Slowest Hours:")
for hour, sales in slow_hours.items():
    # Wrap the end of the slot so hour 23 prints as 23:00 - 0:00, not 24:00
    print(f"   {hour}:00 - {(hour + 1) % 24}:00 → {sales:.1f} cups (avg)")

**📝 TODO:** Now analyze weekend vs weekday patterns

In [None]:
# TODO: Compare weekday vs weekend sales
# Hint: Create a new column 'is_weekend' based on day_of_week

df_coffee['is_weekend'] = df_coffee['day_of_week'].isin([5, 6])

# Calculate averages
weekday_avg = df_coffee[~df_coffee['is_weekend']]['sales_smoothed'].mean()
weekend_avg = df_coffee[df_coffee['is_weekend']]['sales_smoothed'].mean()

# Visualize
plt.figure(figsize=(10, 6))
categories = ['Weekday', 'Weekend']
values = [weekday_avg, weekend_avg]
colors = ['skyblue', 'coral']
plt.bar(categories, values, color=colors, edgecolor='black', linewidth=2)
plt.title('☕ Weekday vs Weekend Sales', fontsize=16, fontweight='bold')
plt.ylabel('Average Sales per Hour (cups)')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for i, v in enumerate(values):
    plt.text(i, v + 1, f'{v:.1f}', ha='center', fontweight='bold', fontsize=12)

plt.tight_layout()
plt.show()

print(f"\n📊 Analysis:")
print(f"   Weekday average: {weekday_avg:.1f} cups/hour")
print(f"   Weekend average: {weekend_avg:.1f} cups/hour")
print(f"   Difference: {abs(weekend_avg - weekday_avg):.1f} cups/hour ({abs(weekend_avg - weekday_avg)/weekday_avg*100:.1f}%)")

if weekend_avg > weekday_avg:
    print(f"\n💡 Insight: Weekends are busier! Consider extra staff.")
else:
    print(f"\n💡 Insight: Weekdays are busier! Likely due to morning commuters.")

---
## Step 11: Business Recommendations

Time to turn your analysis into actionable insights for the coffee shop owner!

**📝 TODO:** Fill in your recommendations based on your analysis

### 📋 Your Report to "Brewed Awakening" Owner

---

#### 1. 📈 Sales Forecast (Next 7 Days)

**Model Used:** ARIMA(__, __, __)

**Key Predictions:**
- Expected daily average: _____ cups
- Peak day: _______________
- Slowest day: _____________
- Confidence level: _________

---

#### 2. 🔍 Traffic Patterns Discovered

**Busiest Hours:** (When do we need most staff?)
1. ___:00 - ___:00
2. ___:00 - ___:00
3. ___:00 - ___:00

**Slowest Hours:** (When can we reduce staff?)
1. ___:00 - ___:00
2. ___:00 - ___:00
3. ___:00 - ___:00

**Weekend vs Weekday:**
- Weekend sales are ___% (higher/lower) than weekdays

---

#### 3. 💡 Top 3 Recommendations

**Recommendation 1: Staffing**

_______________________________________________________________

_______________________________________________________________

**Recommendation 2: Inventory**

_______________________________________________________________

_______________________________________________________________

**Recommendation 3: Promotions**

_______________________________________________________________

_______________________________________________________________

---

#### 4. 🎓 What I Learned

**Most challenging part of this project:**

_______________________________________________________________

**Most interesting insight:**

_______________________________________________________________

**How could this analysis be improved?**

_______________________________________________________________

---
## 🎉 Bonus Challenge (Optional)

Want to go further? Try these extensions:

### Challenge 1: Model Comparison
Try 3 different ARIMA parameter combinations and compare their MAE/RMSE. Which performs best?

### Challenge 2: Seasonal Patterns
Try to detect repeating daily patterns using the motif detection code from Lesson 2, cell 39. Can you find the "morning rush" pattern?

### Challenge 3: What-If Analysis
What if sales increase by 20% next month due to a marketing campaign? How would that affect staffing needs?
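
A what-if like this comes down to scaling the hourly averages and converting cups into headcount. The sketch below assumes a made-up `hourly_avg` table and a hypothetical capacity of 25 cups per barista per hour; replace both with your own numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical average cups per hour for a few hours of the day
hourly_avg = pd.Series({7: 45.0, 8: 62.0, 9: 48.0, 12: 30.0, 15: 55.0, 20: 22.0})

CUPS_PER_BARISTA = 25   # assumed capacity per barista per hour, not from the data
uplift = 1.20           # 20% more sales

def staff_needed(cups):
    """Baristas needed to cover the load, with at least one on shift."""
    return np.maximum(1, np.ceil(cups / CUPS_PER_BARISTA)).astype(int)

plan = pd.DataFrame({
    "cups_now": hourly_avg,
    "staff_now": staff_needed(hourly_avg),
    "cups_uplift": hourly_avg * uplift,
    "staff_uplift": staff_needed(hourly_avg * uplift),
})
plan["extra_staff"] = plan["staff_uplift"] - plan["staff_now"]
print(plan)
```

Because headcount is rounded up to whole people, a 20% rise in sales only adds staff in the hours that were already close to a capacity threshold.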

### Challenge 4: Create a Dashboard
Make an executive summary with 3-4 key visualizations that tell the complete story!

In [None]:
# Your bonus challenge code here!



---
## 🎯 Project Checklist

Before submitting, make sure you completed:

- [ ] ✅ Imported all necessary libraries
- [ ] ✅ Visualized raw data
- [ ] ✅ Handled missing values
- [ ] ✅ Detected and handled outliers
- [ ] ✅ Smoothed the data
- [ ] ✅ Tested for stationarity
- [ ] ✅ Analyzed ACF/PACF plots
- [ ] ✅ Chose ARIMA parameters
- [ ] ✅ Fit ARIMA model
- [ ] ✅ Made 7-day forecast
- [ ] ✅ Found traffic patterns
- [ ] ✅ Analyzed weekday vs weekend
- [ ] ✅ Completed recommendations report
- [ ] ✅ Reflected on what I learned

---

## 🎓 Congratulations!

You've successfully completed the Coffee Shop Analytics project! You've applied:

- ✅ Time series preprocessing techniques
- ✅ ARIMA forecasting
- ✅ Pattern recognition
- ✅ Business insights generation

**These skills are directly applicable to:**
- Sales forecasting
- Demand planning
- Resource optimization
- Anomaly detection
- Business intelligence

**Great work! ☕✨**