# Capstone 3: Approach Overview
## Restaurant Sales Forecasting with Time Series Analysis

This notebook provides a high-level overview of how to approach the Capstone 3 project. It outlines the key steps, decisions, and methodologies without providing complete code solutions.

**Goal:** Help you understand the problem-solving framework and key considerations for time series forecasting.

---

## Project Overview

### Business Problem
You have 3 years of daily sales data (2019-2021) from 6 restaurants selling 100 different menu items. Your task is to:
1. Analyze historical sales patterns
2. Build forecasting models to predict future sales
3. Provide business insights and recommendations

### Why This Matters
- **Inventory Management:** Order the right amount of ingredients
- **Staffing:** Schedule appropriate number of employees
- **Financial Planning:** Forecast revenue for budgeting
- **Marketing:** Plan promotions based on predicted slow periods

### Datasets
1. **sales.csv** - Daily sales records (~110K rows)
   - date, item_id, price, item_count
2. **items.csv** - Menu item details (100 items)
   - id, store_id, name, kcal, cost
3. **resturants.csv** - Restaurant information (6 locations)
   - id, name

---

## Part 1: Data Understanding and Exploration

### 1.1 Initial Data Loading

**First Steps:**
- Load all three CSV files using pandas
- Check the shape of each dataset
- Examine first few rows
- Understand column data types

**Key Questions to Answer:**
- How many sales records do you have?
- What is the date range of the data?
- Are there any missing values?
- Do the datasets link together properly?

**Critical Step: Date Conversion**
- Convert the 'date' column to datetime format
- This is ESSENTIAL for time series analysis
- Use `pd.to_datetime()`
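A minimal sketch of these first steps. So that it runs anywhere, it parses a few made-up rows in the documented `sales.csv` layout from an in-memory string. In the project you would pass the file path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical made-up rows in the sales.csv layout (date, item_id, price, item_count)
raw = io.StringIO(
    "date,item_id,price,item_count\n"
    "2019-01-01,1,0.50,10\n"
    "2019-01-01,2,2.75,0\n"
    "2019-01-02,1,0.50,7\n"
    "2019-01-02,2,2.75,3\n"
)
sales = pd.read_csv(raw)

print(sales.shape)          # (rows, columns)
print(sales.dtypes)         # 'date' is read as text, not as a datetime
print(sales.isna().sum())   # missing values per column

# Convert 'date' so .dt accessors, date ranges and resampling work
sales['date'] = pd.to_datetime(sales['date'])
print(sales['date'].min(), sales['date'].max())   # date range of the data
```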

### 1.2 Data Quality Checks

**Important Issues to Look For:**

**1. Zero Sales Records:**
- You'll likely see many rows with `item_count = 0`
- This means the item was available but not sold that day
- **Decision:** Keep or remove these records?
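  - For aggregate analysis: Usually OK to keep (zeros do not change daily totals)
  - For item-level summaries (e.g., average units sold per selling day): May want to filter out
  - For item-level forecasting: Keep them, because dropping zero days breaks the daily sequence that lag and rolling features assume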
  - Document your choice!

**2. Data Structure:**
- Each day should have entries for all 100 items (some with 0 sales)
- Total records = days × items ≈ 1,096 days (2020 is a leap year) × 100 items ≈ 109,600
- Verify this makes sense

**3. Missing Dates:**
- Are there any gaps in the date sequence?
- Check: `date_range = pd.date_range(start=min_date, end=max_date)`
- Compare expected vs actual dates

**4. Outliers:**
- Unusually high sales days?
- Negative values? (shouldn't exist)
- Use `.describe()` to spot anomalies
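The four checks can be sketched in a few lines of pandas. The toy frame below is hypothetical: it covers two items and deliberately omits 2019-01-02, so the missing-date check has something to find.

```python
import pandas as pd

# Hypothetical toy data: 2 items, with 2019-01-02 missing entirely
sales = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01', '2019-01-01',
                            '2019-01-03', '2019-01-03']),
    'item_id': [1, 2, 1, 2],
    'item_count': [10, 0, 7, 0],
})

# 1. Share of zero-sales rows
zero_share = (sales['item_count'] == 0).mean()

# 2. Rows per day (one per item if the date x item grid is complete)
rows_per_day = sales.groupby('date').size()

# 3. Calendar days that never appear in the data
expected = pd.date_range(start=sales['date'].min(), end=sales['date'].max(), freq='D')
missing_dates = expected.difference(pd.DatetimeIndex(sales['date']))

# 4. Impossible values
n_negative = (sales['item_count'] < 0).sum()

print(f"zero-sales rows: {zero_share:.0%}")
print(f"missing dates: {list(missing_dates.date)}")
print(f"negative counts: {n_negative}")
print(sales['item_count'].describe())
```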

### 1.3 Merging Datasets

**Why Merge?**
- Sales data only has IDs (item_id)
- Need item names and restaurant names for interpretation
- Enables more meaningful analysis

**How to Merge:**
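```python
import pandas as pd

sales = pd.read_csv('sales.csv', parse_dates=['date'])
items = pd.read_csv('items.csv').rename(columns={'id': 'item_id', 'name': 'item_name'})
restaurants = pd.read_csv('resturants.csv').rename(columns={'id': 'store_id', 'name': 'store_name'})

# 1. Merge sales with items on item_id, 2. then with restaurants on store_id
df = (sales.merge(items, on='item_id', how='left')
           .merge(restaurants, on='store_id', how='left'))
# 3. Revenue = price × item_count
df['revenue'] = df['price'] * df['item_count']
```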

**Validation:**
- Check merged dataset has same number of rows as sales
- Verify no NaN values in restaurant/item names
- If NaNs appear, investigate data quality issues
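One way to automate these checks uses two real `DataFrame.merge` options. `validate='many_to_one'` raises if a key repeats in the lookup table, and `indicator=True` adds a `_merge` column showing where each row matched. The toy tables are hypothetical, and item 3 is intentionally absent from `items` so the check fires.

```python
import pandas as pd

# Hypothetical toy tables in the documented layouts; item 3 has no row in items
sales = pd.DataFrame({'item_id': [1, 2, 3], 'price': [0.5, 2.75, 1.0], 'item_count': [10, 3, 4]})
items = pd.DataFrame({'id': [1, 2], 'store_id': [1, 2], 'name': ['Tea', 'Burger']})
restaurants = pd.DataFrame({'id': [1, 2], 'name': ['Store A', 'Store B']})

merged = (
    sales.merge(items.rename(columns={'id': 'item_id', 'name': 'item_name'}),
                on='item_id', how='left',
                validate='many_to_one',   # raises if an item_id repeats in items
                indicator=True)           # '_merge' = 'both' or 'left_only'
         .merge(restaurants.rename(columns={'id': 'store_id', 'name': 'store_name'}),
                on='store_id', how='left', validate='many_to_one')
)

assert len(merged) == len(sales), "merge changed the row count"
unmatched = merged[merged['_merge'] == 'left_only']
print(merged[['item_name', 'store_name']].isna().sum())   # NaNs per name column
print(unmatched[['item_id']])                             # sales rows with no item match
```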

### 1.4 Exploratory Data Analysis (EDA)

**Overall Trends:**

**Daily Aggregation:**
- Group by date, sum all items and revenue
- Plot time series: date vs total items sold
- Plot time series: date vs total revenue
- Look for:
  - Trends (increasing/decreasing over time?)
  - Seasonality (weekly, monthly, yearly patterns?)
  - Outliers (unusual spikes or drops?)
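A minimal sketch of the daily aggregation and the two plots. It assumes a merged frame with `date`, `item_count` and `revenue` columns, and the numbers below are hypothetical.

```python
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical merged data: one row per (date, item) with a revenue column
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01', '2019-01-01', '2019-01-02', '2019-01-02']),
    'item_count': [10, 3, 7, 5],
    'revenue': [5.0, 8.25, 3.5, 13.75],
})

# One row per day: total items sold and total revenue across all restaurants
daily = df.groupby('date', as_index=False)[['item_count', 'revenue']].sum()

fig, axes = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
axes[0].plot(daily['date'], daily['item_count'])
axes[0].set_ylabel('Items sold')
axes[1].plot(daily['date'], daily['revenue'])
axes[1].set_ylabel('Revenue')
fig.suptitle('Daily totals across all restaurants')
```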

**Questions to Explore:**
- Is business growing or declining?
- Are there seasonal effects?
- Any impact from 2020 events (e.g., COVID-19 restrictions)?

---

**Restaurant-Level Analysis:**

- Which restaurant sells the most items?
- Which generates the most revenue?
- Are some restaurants underperforming?

**Visualizations:**
- Bar charts: Total sales by restaurant
- Bar charts: Total revenue by restaurant
- Time series: Individual restaurant trends
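The restaurant totals behind both bar charts come from a single `groupby`. This sketch assumes a merged frame with a `store_name` column, uses hypothetical numbers, and adds revenue per item sold so you can see whether a store leads on volume or on price.

```python
import pandas as pd

# Hypothetical merged rows with a store_name column
df = pd.DataFrame({
    'store_name': ['Store A', 'Store A', 'Store B', 'Store C'],
    'item_count': [10, 5, 30, 2],
    'revenue': [50.0, 25.0, 45.0, 20.0],
})

by_store = (df.groupby('store_name')[['item_count', 'revenue']]
              .sum()
              .sort_values('revenue', ascending=False))
# Revenue per item sold separates "many cheap items" from "few expensive items"
by_store['revenue_per_item'] = by_store['revenue'] / by_store['item_count']
print(by_store)
# In a notebook: by_store[['item_count', 'revenue']].plot.bar(subplots=True)
```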

---

**Item-Level Analysis:**

**Best Sellers:**
- Top 20 items by quantity sold
- Top 20 items by revenue generated
- Compare: Do high-volume items = high-revenue items?

**Insights to Find:**
- Which items drive the business?
- Are expensive items selling well?
- Any items with zero/minimal sales (candidates for removal)?
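A sketch of the best-seller comparison using `DataFrame.nlargest`. The data is hypothetical, and `N` is set to 2 only so the toy example shows a difference between the two lists. Use 20 on the real data.

```python
import pandas as pd

# Hypothetical per-row data with item names
df = pd.DataFrame({
    'item_name': ['Tea', 'Tea', 'Steak', 'Fries', 'Soup'],
    'item_count': [40, 35, 4, 30, 0],
    'revenue': [20.0, 17.5, 60.0, 45.0, 0.0],
})
N = 2  # use 20 on the real data

by_item = df.groupby('item_name')[['item_count', 'revenue']].sum()
top_qty = by_item.nlargest(N, 'item_count')   # top N by quantity
top_rev = by_item.nlargest(N, 'revenue')      # top N by revenue

# Items in the quantity list but not the revenue list: high volume != high revenue
qty_only = top_qty.index.difference(top_rev.index)
# Candidates for removal from the menu
never_sold = by_item[by_item['item_count'] == 0].index
```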

---

**Temporal Patterns:**

**Monthly Trends:**
- Aggregate sales by year and month
- Plot monthly sales over 3 years
- Look for seasonal patterns (e.g., higher sales in certain months)

**Day of Week Patterns:**
- Calculate average sales for each day (Mon-Sun)
- Are weekends busier than weekdays?
- This is crucial for staffing decisions!

**Visualizations:**
- Line plots for monthly trends
- Bar charts for day-of-week averages
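Both temporal views can be built from daily totals. The sketch below uses two weeks of hypothetical data in which weekends are busier. It averages (rather than sums) by weekday, because some weekdays occur more often than others in a given date range.

```python
import numpy as np
import pandas as pd

# Hypothetical daily totals for two weeks (2019-01-07 is a Monday)
daily = pd.DataFrame({'date': pd.date_range('2019-01-07', periods=14, freq='D')})
daily['item_count'] = np.where(daily['date'].dt.dayofweek >= 5, 150, 100)

# Monthly totals indexed by (year, month), ready for a line plot
monthly = daily.groupby([daily['date'].dt.year.rename('year'),
                         daily['date'].dt.month.rename('month')])['item_count'].sum()

# Average per weekday: dayofweek 0 = Monday ... 6 = Sunday
dow_avg = daily.groupby(daily['date'].dt.dayofweek)['item_count'].mean()
dow_avg.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(dow_avg)
```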

---

## Part 2: Feature Engineering for Time Series

### 2.1 Understanding Time Series Features

**Why Feature Engineering?**

Unlike regular machine learning, time series has special characteristics:
- **Temporal dependency:** Today's sales depend on yesterday's
- **Seasonality:** Patterns repeat (weekly, monthly, yearly)
- **Trends:** Long-term increases or decreases

Machine learning models don't automatically understand "time" - we must create features that capture these patterns!

**Three Categories of Features:**
1. **Time-based features** (from date)
2. **Lag features** (past values)
3. **Rolling window features** (moving statistics)

### 2.2 Time-Based Features

**Extract from Date Column:**

**Basic Temporal Features:**
- `year`: 2019, 2020, 2021
- `month`: 1-12
- `day`: 1-31
- `day_of_week`: 0 (Monday) to 6 (Sunday)
- `day_of_year`: 1-366 (2020 is a leap year)
- `week_of_year`: 1-53 (ISO weeks; 2020 has 53)
- `quarter`: 1-4

**Why Each Feature Matters:**
- `month`: Captures yearly seasonality (summer vs winter)
- `day_of_week`: Captures weekly patterns (weekend vs weekday)
- `quarter`: Captures quarterly business cycles
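A sketch of these features using the pandas `.dt` accessor. It is named after the `create_time_features(df)` helper suggested under Code Organization. The `date_col` parameter is an assumption, and so is the choice of ISO weeks via `dt.isocalendar()`, which is why week 53 can appear.

```python
import pandas as pd

def create_time_features(df, date_col='date'):
    """Add calendar features from a datetime column (returns a copy).

    date_col is an assumed parameter name, not something the guide specifies.
    """
    out = df.copy()
    d = out[date_col].dt
    out['year'] = d.year
    out['month'] = d.month
    out['day'] = d.day
    out['day_of_week'] = d.dayofweek                       # 0 = Monday, 6 = Sunday
    out['day_of_year'] = d.dayofyear                       # 1-366 in leap years
    out['week_of_year'] = d.isocalendar().week.astype(int) # ISO week, 1-53
    out['quarter'] = d.quarter
    return out

feats = create_time_features(pd.DataFrame({'date': pd.to_datetime(['2020-12-31', '2021-01-04'])}))
print(feats)
```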

---

**Binary Flags:**
- `is_weekend`: 1 if Saturday/Sunday, 0 otherwise
- `is_month_start`: 1 if first day of month
- `is_month_end`: 1 if last day of month

**Why Binary Flags?**
- Highlight special periods that may have different patterns
- Month-end might see a rush if customers are paid around then
- Weekends typically have different patterns
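The three flags map directly onto pandas `.dt` properties. A minimal sketch on four consecutive days spanning a weekend and a month boundary (2021-01-29 is a Friday):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2021-01-29', '2021-01-30',
                                           '2021-01-31', '2021-02-01'])})
d = df['date'].dt
df['is_weekend'] = (d.dayofweek >= 5).astype(int)    # Saturday = 5, Sunday = 6
df['is_month_start'] = d.is_month_start.astype(int)
df['is_month_end'] = d.is_month_end.astype(int)
print(df)
```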

---

**Cyclical Features (Advanced but Important!):**

**Problem:** Month=1 and Month=12 are actually close (January and December)
- But numerically, 1 and 12 look far apart
- This confuses machine learning models!

**Solution:** Sine/Cosine Encoding
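```python
import numpy as np

# Assumes df['month'] holds 1-12; dividing by the period (12) closes the circle
df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
```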

**Why This Works:**
- Creates circular representation
- December (12) and January (1) are now mathematically close
- Apply to: month, day_of_week, hour (if you had hourly data)
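The claim is easy to check numerically. This sketch uses a hypothetical helper `month_to_circle` that maps a month to its (sin, cos) point and compares Euclidean distances. December and January become neighbours, while January and July end up on opposite sides of the circle. For `day_of_week`, use a period of 7 instead of 12.

```python
import numpy as np

def month_to_circle(month):
    """Hypothetical helper: map month 1-12 to its (sin, cos) point on the unit circle."""
    angle = 2 * np.pi * month / 12
    return np.array([np.sin(angle), np.cos(angle)])

# Raw numbers: |12 - 1| = 11 but |7 - 1| = 6, so December looks farther from January
dist_dec_jan = np.linalg.norm(month_to_circle(12) - month_to_circle(1))
dist_jan_jul = np.linalg.norm(month_to_circle(1) - month_to_circle(7))
print(round(dist_dec_jan, 3), round(dist_jan_jul, 3))
```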

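**When to Use:**
- Most important for models that treat inputs as continuous distances (linear regression, k-NN, neural networks)
- Tree models such as Random Forest and XGBoost can split on raw `month` values directly, so the gain there is often smaller; compare both encodings on your validation set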

### 2.3 Lag Features

**Concept:**
- Use past values as predictors for future values
- "Yesterday's sales help predict today's sales"

**Common Lags to Create:**
- **Lag 1:** Previous day's sales
- **Lag 7:** Sales from 1 week ago (same day of week!)
- **Lag 14:** Sales from 2 weeks ago
- **Lag 30:** Sales from ~1 month ago

**Why These Specific Lags?**
- **Lag 1:** Immediate past is often most predictive
- **Lag 7:** Captures weekly seasonality (e.g., every Monday is similar)
- **Lag 14, 30:** Captures longer-term patterns

**How to Create:**
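```python
# data: one row per day with a 'sales' column
data = data.sort_values('date')               # shift() follows row order
data['sales_lag_1'] = data['sales'].shift(1)  # previous day
data['sales_lag_7'] = data['sales'].shift(7)  # same weekday, one week earlier
```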

**Important Caveat:**
- Lag features create NaN values at the beginning
- First row has no lag_1 (no previous day)
- First 7 rows have no lag_7
- **You'll need to drop these NaN rows later!**
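A plain `shift` assumes a single series with one row per day and no gaps. If you forecast per item (or per restaurant), it would carry the last day of one item into the first day of the next. Grouping first keeps each series separate. This sketch is named after the `create_lag_features(df, lags)` helper suggested under Code Organization. The `target` and `group_col` parameters are assumptions, and the data is hypothetical.

```python
import pandas as pd

def create_lag_features(df, lags, target='sales', group_col=None):
    """Add target shifted by each lag, within groups if group_col is given.

    target and group_col are assumed parameters. Assumes one row per day per
    group with no missing dates, so a shift of k rows equals k days.
    """
    out = df.sort_values([c for c in (group_col, 'date') if c]).copy()
    series = out.groupby(group_col)[target] if group_col else out[target]
    for lag in lags:
        out[f'{target}_lag_{lag}'] = series.shift(lag)
    return out

# Hypothetical item-level data: 2 items x 3 days
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-03'] * 2),
    'item_id': [1, 1, 1, 2, 2, 2],
    'item_count': [10, 11, 12, 0, 1, 2],
})
lagged = create_lag_features(df, lags=[1], target='item_count', group_col='item_id')
print(lagged)  # item 2's first lag is NaN, not item 1's last value
```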

### 2.4 Rolling Window Features

**Concept:**
- Calculate statistics over a moving window
- Captures recent trends and patterns

**Common Rolling Features:**

**7-day rolling mean:**
- Average of last 7 days
- Smooths out daily noise
- Represents "recent trend"

**7-day rolling std:**
- Standard deviation of last 7 days
- Captures volatility/variability
- High std = unstable sales

**7-day rolling min/max:**
- Minimum and maximum in last 7 days
- Shows range of recent performance

**Common Windows:**
- **7 days:** Recent week's pattern
- **14 days:** Two-week trend
- **30 days:** Monthly trend

**How to Create:**
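```python
# shift(1) first so each window covers only the days *before* the row;
# without it the window includes the day being predicted (target leakage)
data['sales_rolling_mean_7'] = data['sales'].shift(1).rolling(window=7).mean()
data['sales_rolling_std_7'] = data['sales'].shift(1).rolling(window=7).std()
```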

**Important Notes:**
- Also creates NaN values at the beginning
- First 7 rows won't have 7-day rolling mean
- Be consistent with how you handle NaNs

### 2.5 Handling Missing Values from Feature Engineering

**After creating lag and rolling features, you'll have NaNs!**

**Approaches:**

**Option 1: Drop Rows with NaN (Recommended)**
- Simple and clean
- Use `df.dropna()`
- You'll lose ~30-40 days of data (from the beginning)
- With 3 years of data, this is acceptable

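**Option 2: Forward Fill**
- Fill NaN with previous valid value
- Does not help here: lag/rolling NaNs sit at the very start of the series, where there is no earlier value to carry forward (and back-filling instead would copy future values into the past, which is leakage)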

**Option 3: Use Shorter Lags Initially**
- Start with only lag 1, 2, 3 to minimize NaNs
- Trade-off: Less feature richness

**Best Practice:**
- Drop NaN rows
- Document how many rows removed
- Ensure you still have sufficient training data
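A sketch of the recommended approach that also reports how many rows were removed. It passes `subset=` to `dropna` so that only the engineered feature columns decide which rows go. The 100-day series and the feature set are hypothetical. The longest look-back (lag 30) determines the loss.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with lag/rolling features up to 30 days back
data = pd.DataFrame({'sales': np.arange(100, dtype=float)})
data['sales_lag_1'] = data['sales'].shift(1)
data['sales_lag_30'] = data['sales'].shift(30)
data['sales_rolling_mean_7'] = data['sales'].shift(1).rolling(7).mean()

feature_cols = ['sales_lag_1', 'sales_lag_30', 'sales_rolling_mean_7']
before = len(data)
model_data = data.dropna(subset=feature_cols)
dropped = before - len(model_data)
print(f"Dropped {dropped} of {before} rows ({dropped / before:.1%}) because of NaN features")
```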

---

## Part 3: Time Series Forecasting Models

### 3.1 Train/Test Split - TIME SERIES IS DIFFERENT!

**Critical Rule: NEVER use random splitting for time series!**

**Why Not Random?**
- Would leak future information into training
- Violates temporal causality
- Gives artificially good (but useless) results

**Correct Approach: Temporal Split**

```
Timeline: |-------- Training --------|--- Test ---|
          2019-01      ...        2021-09  2021-12
```

**Common Split Strategies:**

**1. Last N Days as Test:**
- Last 90 days (3 months) for testing
- Everything before for training
- Simple and effective

**2. Percentage-Based:**
- Last 20% of data as test
- First 80% as train
- Ensures temporal order

**3. Specific Date Split:**
- Train: 2019-2020
- Test: 2021
- Clean yearly boundary

**How to Implement:**
```python
import pandas as pd

split_date = pd.Timestamp('2021-10-01')
train = data[data['date'] < split_date]   # through 2021-09-30
test = data[data['date'] >= split_date]   # 2021-10-01 onward
```

**Validation:**
- Verify train dates < test dates
- Check no overlap
- Ensure sufficient data in both sets
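Strategy 1 (last N days) with the validation checks above built in might look like this. `split_last_n_days` is a hypothetical helper, and the date grid stands in for your daily data.

```python
import pandas as pd

def split_last_n_days(df, n_days=90, date_col='date'):
    """Hypothetical helper: temporal split where the last n_days calendar days are the test set."""
    cutoff = df[date_col].max() - pd.Timedelta(days=n_days)
    train = df[df[date_col] <= cutoff]
    test = df[df[date_col] > cutoff]
    # Validation checks: strict temporal order and non-empty sets
    assert train[date_col].max() < test[date_col].min(), "train/test overlap"
    assert len(train) > 0 and len(test) > 0, "empty split"
    return train, test

data = pd.DataFrame({'date': pd.date_range('2019-01-01', '2021-12-31', freq='D')})
train, test = split_last_n_days(data, n_days=90)
print(train['date'].max().date(), test['date'].min().date(), len(test))
```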

### 3.2 Baseline Model: Moving Average

**Why Start with a Baseline?**
- Establishes minimum performance
- Simple to understand and implement
- Provides comparison point for complex models

**Simple Moving Average:**

**Concept:**
- Predict tomorrow's sales = average of last N days
- Common windows: 7, 14, 30 days

**Example (7-day MA):**
```
Past 7 days sales: [100, 105, 98, 110, 95, 102, 108]
Prediction for tomorrow: Average = 102.6
```

**Pros:**
- Extremely simple
- No training needed
- Interpretable to non-technical stakeholders

**Cons:**
- Doesn't capture complex patterns
- Lags behind trends
- No seasonality awareness

**Expected Performance:**
- R² score: 0.3 - 0.6 (rough estimate)
- Good enough for very stable data
- Usually outperformed by ML models

**Implementation Approach:**
- For each test day:
  - Take last 7 days of actual sales
  - Calculate average
  - That's your prediction
- Important: Use actual values (not predictions) for rolling window
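The loop above can be vectorised. Shifting the actual series by one day and taking a 7-day rolling mean gives, for each day, the average of the previous seven actuals. This is exactly the baseline prediction, computed from actual values only. The sketch reuses the seven values from the example above and then adds three hypothetical test days. `moving_average_baseline` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def moving_average_baseline(series, window=7):
    """Hypothetical helper: predict each day as the mean of the previous `window` actual values."""
    return series.shift(1).rolling(window=window).mean()

# Example values from above, plus 3 hypothetical days that act as the test set
sales = pd.Series([100, 105, 98, 110, 95, 102, 108, 104, 99, 120], dtype=float)
pred = moving_average_baseline(sales, window=7)

test_idx = sales.index[-3:]
mae = np.mean(np.abs(sales[test_idx] - pred[test_idx]))
print(pred[test_idx].round(1).tolist(), round(mae, 2))
```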

### 3.3 Machine Learning Model: Random Forest

**Why Random Forest for Time Series?**
- Handles non-linear patterns well
- Robust to outliers
- Provides feature importance (interpretability)
- No assumptions about data distribution
- Generally good out-of-the-box performance

**How It Works (Simplified):**
- Builds many decision trees
- Each tree learns different patterns
- Final prediction = average of all trees
- Reduces overfitting through ensemble

---

**Key Hyperparameters:**

**1. n_estimators (Number of trees)**
- Range: 50-500
- Recommendation: 100 is good starting point
- More trees = more computation, but usually better performance
- Diminishing returns after ~200

**2. max_depth (Tree depth)**
- Range: 5-20
- Recommendation: 10 for this problem
- Deeper = more complex patterns, but risk overfitting
- Shallower = simpler, more generalized

**3. min_samples_split**
- Minimum samples to split a node
- Recommendation: 5
- Prevents overfitting on small groups

**4. min_samples_leaf**
- Minimum samples in each leaf
- Recommendation: 2-4
- Smooths predictions

---

**Training Process:**

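```python
from sklearn.ensemble import RandomForestRegressor

# 1. Prepare features and target (feature_columns: list of engineered column names)
X_train = train_data[feature_columns]
y_train = train_data['sales']

# 2. Initialize model with the starting hyperparameters suggested above
model = RandomForestRegressor(n_estimators=100, max_depth=10,
                              min_samples_split=5, min_samples_leaf=2,
                              random_state=42, n_jobs=-1)

# 3. Train
model.fit(X_train, y_train)

# 4. Predict
X_test = test_data[feature_columns]
predictions = model.predict(X_test)
```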

---

**Feature Importance:**

**Why It Matters:**
- Tells you which features drive predictions
- Business insights (what actually affects sales?)
- Feature selection (remove unimportant features)

**How to Extract:**
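```python
import pandas as pd

# Pair each importance with its feature name and sort, largest first
importances = pd.Series(model.feature_importances_, index=feature_columns)
importances = importances.sort_values(ascending=False)
importances.head(20).plot.barh()   # horizontal bar chart of the top 20
```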

**Expected Important Features:**
- Lag features (especially lag_1, lag_7)
- Rolling means
- Day of week
- Month (seasonality)

---

**Expected Performance:**
- R² score: 0.6 - 0.85 (depending on data quality)
- Should significantly outperform baseline
- MAE: Typically 15-25% of average daily sales

### 3.4 Advanced Model: XGBoost (Optional)

**What is XGBoost?**
- Extreme Gradient Boosting
- State-of-the-art machine learning algorithm
- Often wins Kaggle competitions
- Similar to Random Forest but builds trees sequentially

**How It Differs from Random Forest:**
- **Random Forest:** Build all trees independently, average results
- **XGBoost:** Build trees sequentially, each fixes previous tree's errors

**Pros:**
- Often better performance than Random Forest
- Built-in regularization (prevents overfitting)
- Handles missing values well
- Fast and efficient

**Cons:**
- More hyperparameters to tune
- Slightly more complex to understand
- Requires separate installation (`pip install xgboost`)

---

**Key Hyperparameters:**

**1. n_estimators**
- Same as Random Forest
- Recommendation: 100-200

**2. learning_rate (unique to boosting)**
- Range: 0.01 - 0.3
- Recommendation: 0.1
- Lower = slower learning, more trees needed, but often better
- Higher = faster, but risk of overfitting

**3. max_depth**
- Recommendation: 5-8 (shallower than Random Forest)
- Boosting compensates with more trees

**4. subsample**
- Fraction of samples used per tree
- Recommendation: 0.8
- Prevents overfitting

**5. colsample_bytree**
- Fraction of features used per tree
- Recommendation: 0.8
- Adds randomness, reduces overfitting

---

**When to Use XGBoost:**
- Want to squeeze out extra performance
- Have time to tune hyperparameters
- Working on a competition or critical application

**When Random Forest is Fine:**
- Good enough performance for business needs
- Easier to explain to stakeholders
- Less tuning required

**Expected Performance:**
- Often 2-5% better R² than Random Forest
- Marginal but measurable improvement
- Most improvement on complex patterns

### 3.5 Model Evaluation Metrics

**Why Multiple Metrics?**
- Each metric reveals different aspects of performance
- No single metric tells the whole story
- Different stakeholders care about different metrics

---

**1. Mean Absolute Error (MAE)**

**Formula:** Average of |actual - predicted|

**What It Means:**
- "On average, predictions are off by X items"
- In same units as target (number of items)
- Easy to interpret for business

**Example:**
- MAE = 50 items
- "Our predictions are off by about 50 items per day"

**Pros:**
- Intuitive
- Less sensitive to outliers than RMSE
- Direct business meaning

**When to Use:**
- Communicating with non-technical stakeholders
- When all errors matter equally

---

**2. Root Mean Squared Error (RMSE)**

**Formula:** Square root of average of (actual - predicted)²

**What It Means:**
- Similar to MAE but penalizes large errors more
- Also in same units as target

**Difference from MAE:**
- RMSE always ≥ MAE
- Gap between RMSE and MAE indicates outliers
- Large gap = you have some big misses

**When to Use:**
- When large errors are particularly bad
- Standard in many ML applications

---

**3. R² Score (Coefficient of Determination)**

**Formula:** 1 - (Sum of squared residuals / Total variance)

**Range:** -∞ to 1 (usually 0 to 1)

**Interpretation:**
- R² = 0.75 → "Model explains 75% of variance"
- R² = 1.0 → Perfect predictions
- R² = 0.0 → No better than predicting the mean
- R² < 0 → Worse than predicting the mean

**What's Good?**
- Time series: 0.6+ is decent, 0.8+ is excellent
- Context-dependent

**Pros:**
- Scale-independent (compare across datasets)
- Percentage of variance explained

**Cons:**
- Less intuitive than MAE
- Can be misleading with outliers

---

**4. Mean Absolute Percentage Error (MAPE) - Optional**

**Formula:** Average of |actual - predicted| / |actual| × 100

**What It Means:**
- "On average, predictions are off by X%"
- Percentage error

**Example:**
- MAPE = 15%
- "Predictions are typically within 15% of actual"

**When to Use:**
- Comparing across different scales
- Business-friendly metric

**Caution:**
- Undefined when actual = 0
- Sensitive to small denominators
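All four metrics follow directly from the formulas above. This sketch is named after the `evaluate_model(y_true, y_pred)` helper suggested under Code Organization. It skips zero-actual days when computing MAPE, which is one possible way of handling the caution above. The sample values are hypothetical. scikit-learn's `mean_absolute_error` and `r2_score` give the same MAE and R².

```python
import numpy as np

def evaluate_model(y_true, y_pred):
    """Return MAE, RMSE, R² and MAPE (MAPE ignores days where actual == 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    nonzero = y_true != 0                      # MAPE is undefined where actual == 0
    mape = (np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100
            if nonzero.any() else np.nan)
    return {'MAE': mae, 'RMSE': rmse, 'R2': r2, 'MAPE': mape}

metrics = evaluate_model([100, 120, 80, 0], [110, 115, 85, 5])
print({k: round(v, 3) for k, v in metrics.items()})
```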

---

**Which Metrics to Report?**

**Minimum:**
- MAE (business interpretation)
- R² (model quality)

**Comprehensive:**
- MAE, RMSE, R², MAPE
- Shows complete picture

**For Stakeholders:**
- Focus on MAE or MAPE
- "We predict sales within ±50 items on average"
- Much clearer than R²!

### 3.6 Model Comparison and Selection

**Create a Comparison Table:**

```
Model              MAE    RMSE    R²     Training Time
──────────────────────────────────────────────────────
Moving Average     X      X       X      < 1 sec
Random Forest      X      X       X      30 sec
XGBoost            X      X       X      45 sec
```

**Selection Criteria:**

**Best Performance:**
- Choose model with highest R²
- Or lowest MAE/RMSE
- Usually XGBoost or Random Forest

**Simplicity vs Performance:**
- If Random Forest is 1% worse than XGBoost, but much simpler?
- Consider the trade-off
- Easier to explain and maintain

**Production Considerations:**
- Training time important for daily retraining?
- Prediction speed critical?
- Available computational resources?

**Recommendation:**
- Start with Random Forest
- Try XGBoost if you want best performance
- Keep baseline for sanity check

---

## Part 4: Model Interpretation and Visualization

### 4.1 Prediction Visualization

**Time Series Plot:**

Essential visualization:
- X-axis: Date
- Y-axis: Sales (items sold)
- Two lines:
  - Actual sales (solid line)
  - Predicted sales (dashed line)

**What to Look For:**
- Do predictions follow actual trends?
- Are predictions consistently high or low (bias)?
- Do predictions capture peaks and valleys?
- How far off are they typically?

**Good Signs:**
- Lines track each other closely
- Predictions capture general trend
- Errors are random (not systematic)

**Warning Signs:**
- Predictions always lag behind actuals
- Predictions miss major spikes/drops
- Consistent over/under-prediction
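This sketch implements the essential plot, with a solid line for actuals and a dashed line for predictions. It is named after the `plot_predictions(dates, actual, predicted)` helper suggested under Code Organization, and the `title` parameter is an assumption. For the zoomed views, pass slices, for example the first or last 14 test days. The sample values are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_predictions(dates, actual, predicted, title='Actual vs predicted daily sales'):
    """Line plot of actual (solid) and predicted (dashed) sales over time."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(dates, actual, linestyle='-', label='Actual')
    ax.plot(dates, predicted, linestyle='--', label='Predicted')
    ax.set_xlabel('Date')
    ax.set_ylabel('Items sold')
    ax.set_title(title)
    ax.legend()
    fig.autofmt_xdate()
    return fig, ax

dates = pd.date_range('2021-10-01', periods=5, freq='D')
fig, ax = plot_predictions(dates, [100, 120, 90, 110, 130], [105, 115, 95, 108, 120])
```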

---

**Zoomed Views:**

Create multiple plots:
- **Full test period:** Overall performance
- **First 2 weeks:** Detailed view
- **Last 2 weeks:** Recent performance
- **Worst period:** Where model struggles

Helps identify specific issues

### 4.2 Residual Analysis

**What are Residuals?**
- Residual = Actual - Predicted
- The "error" for each prediction
- Positive = under-predicted
- Negative = over-predicted

**Why Analyze Residuals?**
- Reveals patterns in errors
- Indicates model deficiencies
- Suggests improvements

---

**Residual Plot:**

**X-axis:** Predicted values  
**Y-axis:** Residuals  
**Add:** Horizontal line at y=0

**Ideal Pattern:**
- Random scatter around y=0
- No clear pattern
- Constant variance (homoscedasticity)

**Problem Patterns:**

**1. Funnel Shape:**
```
   |     *   *
   |   *   *
 0 |──*──*────
   |   *   *
   |     *   *
   └──────────→ predicted
```
- Variance increases with prediction
- Model less reliable for high values
- Solution: Log transformation

**2. Curved Pattern:**
```
   |   *     *
   |  *  *  *
 0 |──*───*────
   | *     *
   |*       *
   └──────────→
```
- Non-linear relationship missed
- Need more complex features

**3. Clusters:**
- Distinct groups in residuals
- Missing categorical variable?
- Different restaurants/items behave differently

---

**Residual Distribution:**

**Histogram of Residuals:**

**Ideal:**
- Bell-shaped (normal distribution)
- Centered at 0
- Symmetric

**Problems:**
- **Skewed:** Systematic over/under-prediction
- **Bimodal:** Two different regimes
- **Heavy tails:** Many large errors (outliers)

**Statistics to Calculate:**
- Mean residual (should be ~0)
- Std dev (spread of errors)
- Min/max (worst errors)
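The summary statistics take only a few lines. `residual_summary` is a hypothetical helper, and it uses the sign convention above (residual = actual − predicted). The sample values are hypothetical.

```python
import numpy as np

def residual_summary(actual, predicted):
    """Hypothetical helper: residual stats, where residual = actual - predicted."""
    resid = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return {
        'mean': resid.mean(),        # ~0 if the model is unbiased
        'std': resid.std(ddof=1),    # typical spread of errors
        'min': resid.min(),          # largest over-prediction (most negative)
        'max': resid.max(),          # largest under-prediction (most positive)
    }

summary = residual_summary([100, 120, 90, 110], [104, 115, 95, 106])
print(summary)
# For the plots, compute resid = actual - predicted, then
# plt.scatter(predicted, resid); plt.axhline(0)  and  plt.hist(resid, bins=30)
```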

### 4.3 Feature Importance Analysis

**What is Feature Importance?**
- Measures how much each feature contributes to predictions
- Higher importance = more influential
- Helps understand what drives sales

**How to Interpret:**

**Top Features Likely Include:**
1. **Lag features** (sales_lag_1, sales_lag_7)
   - Past sales are strong predictors
   - Lag_1 often most important

2. **Rolling means** (sales_rolling_mean_7)
   - Recent trend matters
   - Smoothed signal useful

3. **Day of week**
   - Weekly seasonality strong
   - Weekends vs weekdays

4. **Month** (or month_sin/cos)
   - Yearly seasonality
   - Holiday seasons

**Surprising Low Importance:**
- If day-of-month has low importance: Good! It shouldn't matter much
- If lag_1 has low importance: Concerning - may indicate data quality issues

---

**Visualization:**

Horizontal bar chart:
- Top 10-20 features
- Sorted by importance
- Easy to see what matters

**Business Insights:**
- High importance for lag_7 → Weekly patterns strong
  - **Action:** Plan weekly promotions consistently
  
- High importance for day_of_week → Different staffing needed
  - **Action:** Adjust schedules by day
  
- High importance for month → Seasonal trends
  - **Action:** Adjust inventory seasonally

---

**Feature Selection:**

**If you have many low-importance features:**
- Consider removing them
- Simplifies model
- Faster training
- Sometimes improves performance (reduces noise)

**Threshold approach:**
- Keep features with importance > 0.01
- Or top 80% cumulative importance
- Retrain and compare
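The two threshold rules can be sketched as follows, assuming the importances come as a pandas Series indexed by feature name that sums to 1. Random Forest's `feature_importances_` are normalised this way. `select_features` is a hypothetical helper, and the importance values are made up.

```python
import pandas as pd

def select_features(importances, min_importance=0.01, cumulative=0.80):
    """Hypothetical helper: feature names kept by each threshold rule."""
    imp = importances.sort_values(ascending=False)
    by_threshold = imp[imp > min_importance].index.tolist()
    # Keep features until cumulative importance first reaches the target share
    n_keep = int((imp.cumsum() < cumulative).sum()) + 1
    by_cumulative = imp.index[:n_keep].tolist()
    return by_threshold, by_cumulative

# Hypothetical importances
imp = pd.Series({'sales_lag_1': 0.45, 'sales_lag_7': 0.25, 'sales_rolling_mean_7': 0.15,
                 'day_of_week': 0.10, 'month': 0.045, 'day': 0.005})
by_threshold, by_cumulative = select_features(imp)
print(by_threshold)
print(by_cumulative)
```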

---

## Part 5: Business Insights and Recommendations

### 5.1 Translating Model Performance to Business Value

**Connect Metrics to Business Impact:**

**Example:**
- MAE = 50 items per day
- Average daily sales = 800 items
- Prediction accuracy = 1 - (50/800) = 93.75%

**Business Translation:**
- "Our forecasts are 94% accurate on average"
- "We can predict daily sales within ±50 items"
- "This lets us cut inventory waste" (estimate the size of the reduction from your own cost data rather than quoting a fixed percentage)

**Quantify Value:**
- Cost of overstocking: waste, storage
- Cost of understocking: lost sales, customer dissatisfaction
- Forecasting reduces both!

**Example Calculation:**
```
Without forecasting:
- Average inventory error: 100 items/day
- Waste cost: $500/day

With forecasting:
- Average inventory error: 50 items/day
- Waste cost: $250/day
- Savings: $250/day × 365 = $91,250/year
```

**ROI Story:**
- Implementation cost: X
- Annual savings: Y
- Payback period: X / Y years (= 12 × X / Y months)

### 5.2 Actionable Recommendations

**1. Inventory Management**

**Current Problem:**
- Ordering same amount every day
- Leads to waste on slow days
- Stockouts on busy days

**Recommendation:**
- Use daily forecasts to adjust orders
- Add safety stock = 1.5 × MAE
- Different orders for weekdays vs weekends

**Implementation:**
- Run model daily to get next-day forecast
- Automated ordering system
- Alert if forecast > threshold

**Expected Impact:**
- 20-30% reduction in waste
- 15-20% reduction in stockouts

---

**2. Staffing Optimization**

**Example Insights (illustrative figures; replace with your own results):**
- Weekends 30% busier than weekdays
- Month-end 15% higher sales
- Predictable weekly patterns

**Recommendations:**
- Schedule 30% more staff on Fridays-Sundays
- Add part-time staff for peak days
- Reduce staffing on slowest days (e.g., Tuesday)

**Implementation:**
- Use 7-day forecast for next week's schedule
- Build shift templates based on day-of-week patterns
- Flexibility for predicted unusual days

**Expected Impact:**
- Better customer service (reduced wait times)
- Lower labor costs (right-sized staffing)
- Improved employee satisfaction (predictable schedules)

---

**3. Menu Optimization**

**Example Findings (verify against your data):**
- Top 20% of items drive 80% of revenue
- Some items never sell (always 0)
- High-price items have inconsistent sales

**Recommendations:**
- **Promote top sellers:** Feature on menu, marketing
- **Remove zero-sellers:** Free up kitchen capacity
- **Analyze slow-movers:** 
  - Price too high?
  - Poor placement on menu?
  - Quality issues?
- **Seasonal menu:** Align with sales patterns

**Implementation:**
- Quarterly menu review based on sales data
- A/B test menu changes
- Track impact on overall sales

---

**4. Marketing & Promotions**

**Strategic Timing:**
- Run promotions on predicted slow days
- Boost naturally busy days with targeted offers
- Seasonal campaigns aligned with forecasts

**Example:**
- Tuesday is slowest day
- Recommendation: "Tuesday Special - 20% off"
- Goal: Smooth demand across week

**Targeted Promotions:**
- Item-specific: Push slow-moving items
- Time-based: Happy hour during slow periods
- Location-specific: Focus on underperforming restaurants

---

**5. Financial Planning**

**Revenue Forecasting:**
- 30-day revenue forecast
- Budget planning
- Cash flow management

**Scenario Analysis:**
- Best case: +10% from forecast
- Expected case: Forecast
- Worst case: -10% from forecast
- Plan for all scenarios

**Investment Decisions:**
- Expansion timing based on growth trends
- Equipment purchases aligned with forecast volume
- Data-driven business case

### 5.3 Model Limitations and Future Improvements

**Current Limitations:**

**1. Aggregate-Only Forecasting:**
- Currently predicting total sales
- Not item-level or restaurant-level
- Less actionable for specific decisions

**2. Missing External Factors:**
- Weather (rain reduces sales?)
- Holidays (Christmas, Thanksgiving)
- Local events (concerts, sports games)
- Competitor actions
- Economic indicators

**3. No Price Elasticity:**
- Doesn't model impact of price changes
- Can't predict effect of discounts
- Assumes constant pricing

**4. Short History:**
- Only 3 years of data
- Limited for capturing rare events
- More data = better models

---

**Future Improvements:**

**Phase 2: Item-Level Forecasting**
- Build separate models for each item
- Or hierarchical forecasting
- More granular insights
- Better inventory management

**Phase 3: External Data Integration**
- Weather data from API
- Holiday calendar
- Event calendar (local)
- Foot traffic data

**Phase 4: Advanced Models**
- LSTM/RNN (deep learning for time series)
- Prophet (Facebook's forecasting tool)
- Ensemble of multiple models
- Bayesian methods for uncertainty quantification

**Phase 5: Real-Time System**
- Automated daily retraining
- Dashboard for stakeholders
- API for integration with ordering system
- Mobile app for managers
- Alerting for anomalies

---

**Quick Wins (Low Effort, High Impact):**

1. **Add holiday indicators** (0/1 flag)
   - Major holidays known in advance
   - Easy to implement
   - Likely significant impact

2. **Restaurant-specific models**
   - Filter data by restaurant
   - Train separate models
   - Captures location-specific patterns

3. **Weekend/Weekday separate models**
   - Different patterns
   - Two simpler models may outperform one complex model

4. **Automated reporting**
   - Daily email with forecast
   - Weekly performance review
   - Builds trust in system
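For quick win 1, pandas ships a US federal holiday calendar in `pandas.tseries.holiday`. This sketch assumes the restaurants are in the US (the guide mentions Thanksgiving), and `add_holiday_flag` is a hypothetical helper. Federal holidays leave out dates that matter to restaurants, such as Christmas Eve or Valentine's Day, so you may want to append your own dates.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

def add_holiday_flag(df, date_col='date'):
    """Hypothetical helper: is_holiday = 1 on US federal holidays (assumed calendar)."""
    out = df.copy()
    holidays = USFederalHolidayCalendar().holidays(start=out[date_col].min(),
                                                   end=out[date_col].max())
    out['is_holiday'] = out[date_col].isin(holidays).astype(int)
    return out

df = pd.DataFrame({'date': pd.date_range('2019-12-23', '2019-12-27', freq='D')})
flagged = add_holiday_flag(df)
print(flagged)  # only 2019-12-25 is flagged
```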

---

## Overall Best Practices

### Time Series Golden Rules

**1. Never Shuffle Time Series Data**
- ALWAYS use temporal train/test split
- Past predicts future, not vice versa
- Violating this gives meaningless results

**2. Be Careful with Leakage**
- Don't use future information in features
- Rolling windows must only use past data
- Example bad practice: Using an average over days 1-30 as a feature when predicting day 15 (days 16-30 are still in the future)

**3. Start Simple, Then Add Complexity**
- Baseline → Random Forest → XGBoost → Deep Learning
- Each step should improve performance
- If it does not, the previous model was sufficient

**4. Validate on Realistic Time Periods**
- Don't test on 1 week (could be unusual)
- Use at least 1-3 months
- Captures different patterns

**5. Monitor Model Drift**
- Models degrade over time
- Business patterns change
- Re-train regularly (monthly/quarterly)

### Code Organization

**Recommended Structure:**

```
1. Imports and Setup
2. Data Loading
3. Data Cleaning
4. Exploratory Data Analysis
5. Feature Engineering
6. Train/Test Split
7. Baseline Model
8. ML Models (Random Forest, XGBoost)
9. Model Comparison
10. Visualization and Interpretation
11. Business Insights
12. Conclusions
```

**Functions to Create:**
- `create_time_features(df)` - Add temporal features
- `create_lag_features(df, lags)` - Add lag features
- `create_rolling_features(df, windows)` - Add rolling stats
- `evaluate_model(y_true, y_pred)` - Calculate metrics
- `plot_predictions(dates, actual, predicted)` - Visualization

**Why Functions?**
- Reusable code
- Easier to test
- Cleaner notebooks
- Professional practice

### Documentation and Communication

**Throughout Your Notebook:**

**Markdown Cells for:**
- Section explanations
- Interpretation of results
- Business insights
- Decisions made and why

**Code Comments for:**
- Complex logic
- Non-obvious choices
- Parameter selections

**Visualizations Should:**
- Have clear titles
- Labeled axes with units
- Legends when needed
- Annotations for key points

**Final Summary Should Include:**
- Key findings from EDA
- Model performance comparison
- Best model and why
- Actionable recommendations
- Limitations and next steps

---

## Key Takeaways

### Technical Skills

**You will learn:**
- Time series data handling and preprocessing
- Feature engineering for temporal data
- Proper train/test splitting for time series
- Multiple forecasting approaches (statistical and ML)
- Model evaluation specific to time series
- Visualization of temporal patterns
- Interpretation of model predictions and errors

**Key Concepts:**
- Temporal dependency and autocorrelation
- Seasonality (weekly, monthly, yearly; daily seasonality would need intraday data)
- Trends and cycles
- Lag effects
- Moving averages and smoothing
- Feature importance in time series context

### Business Skills

**You will demonstrate:**
- Translating data patterns into business insights
- Quantifying business value of models
- Creating actionable recommendations
- Communicating technical results to non-technical stakeholders
- Understanding operational applications (inventory, staffing)
- Recognizing model limitations and risks
- Planning phased implementation

**Real-World Applications:**
- Inventory optimization
- Workforce planning
- Revenue forecasting
- Marketing campaign timing
- Financial budgeting
- Strategic decision support

### Critical Thinking Questions

**Throughout the project, ask yourself:**

**Data Quality:**
- Why are there so many zero sales records?
- Are there any suspicious patterns or outliers?
- How complete and accurate is the data?

**Modeling:**
- Why did I choose these lag periods?
- Which features matter most and why?
- Is my model overfitting or underfitting?
- Could I be leaking future information?

**Business:**
- How would a restaurant actually use these forecasts?
- What's the cost of being wrong?
- Is the model improvement worth the implementation cost?
- What could cause the model to fail in production?

**Ethics and Risks:**
- Could this lead to understaffing and poor service?
- What if the model is wrong during a critical period?
- How do we monitor and maintain model quality?
- What's the fallback if the system fails?

---

## Next Steps

1. **Load and explore the data** - Understand what you're working with
2. **Start with EDA** - Find patterns before modeling
3. **Create basic features first** - Time-based features
4. **Add lag and rolling features** - Capture temporal dependencies
5. **Build baseline model** - Establish minimum performance
6. **Try Random Forest** - Likely your best model
7. **Experiment with XGBoost** - If you want optimal performance
8. **Visualize extensively** - Understand model behavior
9. **Extract business insights** - Make it actionable
10. **Document everything** - Tell the complete story

**Remember:**
- Time series is different from regular ML - respect temporal order!
- Feature engineering is critical for good performance
- Interpretability matters - explain your model's decisions
- Business value > Technical sophistication
- Simple models that work > Complex models that don't

**You're building a real forecasting system that could:**
- Reduce food waste
- Improve customer service
- Optimize labor costs
- Increase profitability

That's impactful work! Good luck!