## 1. Introduction & Project Overview
**Objective:**
The goal of this project is to predict annual coffee production for the world's top five coffee-producing nations (Brazil, Vietnam, Colombia, Indonesia, and Ethiopia). Accurate production forecasts matter for global supply chain planning and price stability.

**Dataset:**
We use the **USDA Production, Supply and Distribution (PSD)** dataset, specifically the `psd_coffee.csv` file. It provides a history of coffee market attributes (Production, Consumption, Trade, Stocks) from 1960 to the present.

**Methodology:**
This project treats the problem as a **Time-Series Regression** task.
- **Evaluation Metric:** Root Mean Squared Error (RMSE), which reports prediction error in the same units as production (thousands of 60 kg bags).
- **Modeling Strategy:** We benchmark four algorithms (Ridge, Lasso, Random Forest, XGBoost) using strictly temporal cross-validation (`TimeSeriesSplit`): each model is validated only on data that comes after its training data, which prevents leakage of future information.
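
The evaluation setup described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the series is synthetic (a linear trend plus noise standing in for one country's yearly production), and `n_splits=5`, `alpha=1.0`, and the variable names are assumptions chosen for the example. It shows that `TimeSeriesSplit` always places the validation years after the training years and that RMSE is the square root of the mean squared error.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for one country's yearly series (NOT the PSD data).
rng = np.random.default_rng(0)
years = np.arange(1960, 2020)
X = (years - years[0]).reshape(-1, 1).astype(float)  # single feature: years since 1960
y = 1000.0 + 25.0 * X.ravel() + rng.normal(0.0, 50.0, size=len(years))

# Rows must already be sorted by time; TimeSeriesSplit splits on row order.
tscv = TimeSeriesSplit(n_splits=5)  # assumed value for illustration
fold_rmse = []

for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    # Every training row precedes every validation row, so no future data leaks in.
    assert train_idx.max() < test_idx.min()

    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])

    # RMSE = sqrt(MSE), expressed in the same units as y.
    rmse = float(np.sqrt(mean_squared_error(y[test_idx], pred)))
    fold_rmse.append(rmse)
    print(f"Fold {fold}: train through {years[train_idx.max()]}, "
          f"validate {years[test_idx.min()]}-{years[test_idx.max()]}, RMSE = {rmse:.1f}")

print(f"Mean RMSE across folds: {np.mean(fold_rmse):.1f}")
```

Because the training window grows with each fold, early folds are fit on fewer years than later ones, so per-fold RMSE values are not directly comparable in difficulty; averaging them gives a single summary score.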

In [1]:
# ==========================================
# 1. Library Imports
# ==========================================

# Standard Data Science Stack
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import warnings

# Machine Learning: Preprocessing & Selection
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Machine Learning: Metrics
from sklearn.metrics import mean_squared_error, r2_score

# Machine Learning: Models (The 4 Selected Algorithms)
from sklearn.linear_model import Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb  # The Non-linear Challenger

# Interpretability
from sklearn.inspection import permutation_importance
import shap

# Verify installation of critical packages
print(f"Pandas version: {pd.__version__}")
print(f"XGBoost version: {xgb.__version__}")

Pandas version: 2.3.3
XGBoost version: 3.1.2
