# Truck Allocation Forecast Model

## Background and Objective
This project builds a model that predicts the number of trucks required per day for shipments. An accurate daily forecast helps streamline logistics planning and improve delivery efficiency.

## Logic and Formulas

### 1. Shipment Units Calculation
The number of shipment units is calculated using the following formulas:

#### 1.1 Single Shipment Units

$$S_{\text{single}} = U_d \times R_{\text{single}}$$

where:

- $U_d$ is the daily shipment volume (in units)
- $R_{\text{single}}$ is the single shipment ratio, i.e. the share of daily units shipped as single-unit shipments

#### 1.2 Multi Shipment Units

$$S_{\text{multi}} = \frac{U_d \times R_{\text{multi}}}{U_{\text{multi}}}$$

where:

- $U_{\text{multi}}$ is the average number of units per multi shipment
- $R_{\text{multi}}$ is the multi shipment ratio, i.e. the share of daily units shipped in multi-unit shipments
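As a quick check of the two formulas, the sketch below computes both quantities for one day. The input values are illustrative assumptions (chosen inside the ranges of the notebook's sample data), and the function name `shipment_units` is hypothetical.

```python
def shipment_units(u_d, r_single, r_multi, u_multi):
    """Return (S_single, S_multi) for one day."""
    s_single = u_d * r_single            # units shipped one per shipment
    s_multi = (u_d * r_multi) / u_multi  # multi-unit volume converted to shipments
    return s_single, s_multi


# Illustrative inputs (assumed values, not real data)
s_single, s_multi = shipment_units(u_d=200_000, r_single=0.3, r_multi=0.7, u_multi=2.0)
print(s_single, s_multi)
```

With these inputs, 60,000 units go out as single shipments and the remaining 140,000 units form about 70,000 multi shipments.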

### 2. Mail and Box Shipment Ratios
The total shipment units, $S_{\text{total}} = S_{\text{single}} + S_{\text{multi}}$, are divided into mail and box shipments according to the following ratios:

#### 2.1 Mail Shipments

$$S_M = S_{\text{total}} \times S_{M\text{ratio}}$$

#### 2.2 Box Shipments

$$S_B = S_{\text{total}} \times S_{B\text{ratio}}$$

where:

- $S_M$ is the total mail shipments
- $S_B$ is the total box shipments
- $S_{\text{total}}$ is the total shipment units
- $S_{M\text{ratio}}$ is the mail shipment ratio
- $S_{B\text{ratio}}$ is the box shipment ratio
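The split into $S_M$ and $S_B$ can be sketched as follows. The 0.4 and 0.6 ratios are the constants used in the notebook's sample data; the `s_total` value and the function name `split_shipments` are assumptions for illustration.

```python
def split_shipments(s_total, s_m_ratio, s_b_ratio):
    """Split the total shipment units into S_M and S_B using the given ratios."""
    s_m = s_total * s_m_ratio
    s_b = s_total * s_b_ratio
    return s_m, s_b


# Assumed total for one day; ratios as in the sample data
s_m, s_b = split_shipments(s_total=130_000, s_m_ratio=0.4, s_b_ratio=0.6)
print(s_m, s_b)
```

When the two ratios sum to 1, as they do here, $S_M + S_B$ equals $S_{\text{total}}$.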

### 3. Truck Allocation Calculation
The number of trucks is obtained in two steps. First, the mail and box shipments are each divided by the number of shipments that fit in one cargo unit, and the two results are summed to give the total number of cargo units. Second, that total is divided by the number of cargo units a truck can carry and rounded up to a whole truck:

$$\text{Total Trucks} = \left\lceil \frac{\frac{S_M}{S_{M\text{capacity}}} + \frac{S_B}{S_{B\text{capacity}}}}{\text{cargo\_per\_truck}} \right\rceil$$

where:

- $S_{M\text{capacity}} = 400$ is the number of mail shipments per cargo unit
- $S_{B\text{capacity}} = 75$ is the number of box shipments per cargo unit
- $\text{cargo\_per\_truck} = 22$ is the number of cargo units per truck
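A worked instance of the formula, using the capacities listed above. The $S_M$ and $S_B$ inputs are assumed values for illustration, and the function name `total_trucks` is hypothetical.

```python
import math


def total_trucks(s_m, s_b, s_m_capacity=400, s_b_capacity=75, cargo_per_truck=22):
    """Apply the truck allocation formula and round up to a whole truck."""
    cargo_total = s_m / s_m_capacity + s_b / s_b_capacity
    return math.ceil(cargo_total / cargo_per_truck)


# Assumed inputs: 52,000 S_M shipments and 78,000 S_B shipments
# 52000 / 400 = 130 and 78000 / 75 = 1040, so 1170 in total; 1170 / 22 ≈ 53.18
print(total_trucks(52_000, 78_000))  # 54
```

The ceiling matters here: 53 trucks would cover only 1,166 of the 1,170 cargo units, so a 54th truck is needed.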

### 4. Moving Average
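A 7-day moving average of the daily truck counts is added as a feature to give the model information about recent demand. For the first six days, where fewer than seven values exist, the notebook code averages over the days available (`min_periods=1`). For day $t$ with a full window it is calculated as follows: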

$$\text{Moving Average of Trucks} = \frac{1}{7}\sum_{i=t-6}^t T_i$$

where:

- $T_i$ is the truck number for day $i$
- $t$ is the current day
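The formula can be sketched in plain Python as below. With `include_current=True` it handles short early windows the same way as the pandas call `rolling(window=7, min_periods=1).mean()` used in the notebook. The `include_current=False` variant is an assumption added for illustration, not part of the original formula: it averages only the days before $t$, which is the information that would actually be available when forecasting day $t$. The function name and the sample values are hypothetical.

```python
def moving_average(trucks, window=7, include_current=True):
    """Trailing mean of truck counts for each day.

    include_current=True:  window for day t is T_{t-window+1} .. T_t (as in the formula);
                           early days average over however many values exist.
    include_current=False: assumed variant whose window ends at day t-1;
                           the first day has no history, so its value is None.
    """
    averages = []
    for t in range(len(trucks)):
        end = t + 1 if include_current else t
        start = max(0, end - window)
        values = trucks[start:end]
        averages.append(sum(values) / len(values) if values else None)
    return averages


daily_trucks = [50, 52, 54, 53, 51, 55, 56, 60]  # assumed sample values

print(moving_average(daily_trucks)[6])                         # 53.0, mean of days 0..6
print(moving_average(daily_trucks, include_current=False)[7])  # 53.0, mean of days 0..6
```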

## Models Used

### 1. Linear Regression
Linear regression assumes a linear relationship between the explanatory variables and the target variable. While simple and interpretable, linear regression may struggle to capture complex nonlinear relationships in the data.

$$y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_nX_n$$

where:

- $y$ is the target variable (truck number)
- $X_1, X_2, \ldots, X_n$ are the explanatory variables (daily shipment volume, shipment ratios, moving average, etc.)
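The coefficients $\beta_0, \ldots, \beta_n$ are estimated by ordinary least squares. The sketch below shows this on toy data with NumPy; the data values are assumptions chosen so that the target is exactly linear in the two features, which lets the fit recover the coefficients.

```python
import numpy as np

# Assumed toy data: two features and a target that is exactly linear in them
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 10.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept beta_0
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(beta)  # approximately [10.0, 2.0, 0.5]
```

scikit-learn's `LinearRegression`, used in the notebook, solves the same least-squares problem and handles the intercept internally.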

### 2. Random Forest
Random Forest is an ensemble learning method that builds multiple decision trees and aggregates their predictions. It is capable of capturing complex nonlinear relationships between the features and the target variable.

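Four hyperparameters were tuned with `GridSearchCV` (3-fold cross-validation, scored by negative MAE). The best values found in the recorded run were:

- $\text{max\_depth} = \text{None}$
- $\text{min\_samples\_split} = 2$
- $\text{min\_samples\_leaf} = 1$
- $\text{n\_estimators} = 100$

The selected values can differ from run to run, because the sample data is generated randomly.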
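To give a sense of the search cost, the sketch below enumerates the candidate grid from the notebook code (three values for each of four hyperparameters) and counts the model fits under 3-fold cross-validation. It only counts combinations and does not train anything.

```python
from itertools import product

# Candidate values, as listed in the notebook's grid
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Every combination of one value per hyperparameter
combinations = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
cv_folds = 3

print(len(combinations))             # 81
print(len(combinations) * cv_folds)  # 243 fits during the search
```

`GridSearchCV` then refits the best combination once more on the full training set by default, which is the model returned as `best_estimator_`.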

## Results
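MAE on the held-out test set (20% of the 30 sample days), from the recorded run:

Linear Regression MAE: 0.29

Random Forest MAE (after hyperparameter tuning): 2.06

On this synthetic data the linear baseline has the lower error. The figures vary from run to run because the sample data is generated without a fixed random seed.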

## Next Steps

### 1. Model Improvement
Improve the Random Forest model by refining the data preprocessing steps, or explore other models (e.g., Gradient Boosting) to improve accuracy.

### 2. Data Collection
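Replace the randomly generated sample data with real shipment records to assess the model's accuracy. Real external data, such as weather forecasts or public holiday schedules, may improve prediction accuracy; in the current sample these columns are random placeholders.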

### 3. Model Evaluation
Evaluate the model using additional metrics such as RMSE and $R^2$ to gain a more comprehensive understanding of its performance.
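Both metrics follow directly from their definitions, as the sketch below shows. The `y_true` and `y_pred` values are assumed for illustration; in the notebook they would be `y_test` and a model's predictions. scikit-learn's `mean_squared_error` (followed by a square root) and `r2_score` compute the same quantities.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more than MAE does."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [50, 54, 52, 58, 56]  # assumed actual truck counts
y_pred = [51, 53, 52, 57, 58]  # assumed predictions

print(round(rmse(y_true, y_pred), 3))       # 1.183
print(round(r_squared(y_true, y_pred), 3))  # 0.825
```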

## Explanation for the Models and Methods

### Linear Regression
Linear regression assumes a simple linear relationship between the input features and the target variable. In real-world problems, relationships are often more complex and nonlinear. Linear regression therefore serves as a baseline against which more complex models such as Random Forest can be compared.

### Random Forest
Random Forest is an ensemble technique that helps reduce overfitting compared to a single decision tree by averaging the predictions of multiple trees. This makes it more robust and suitable for complex datasets with nonlinear relationships. The model's performance is further enhanced by hyperparameter tuning, which helps find the best settings for the trees.

### Moving Average
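The moving average of the truck counts over the past 7 days smooths day-to-day fluctuations and gives the model information on the recent level and trend of demand. As defined above, the window includes the current day $t$, so the feature contains part of the value being predicted; when forecasting days whose truck count is not yet known, the window would need to end at day $t-1$.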

## Why We Chose These Approaches
The linear regression model was chosen to provide a baseline comparison. Random Forest, with its ability to capture complex patterns in data, was selected as a more powerful model. Hyperparameter tuning was essential for optimizing the Random Forest model's performance. 

```python
import math
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Create sample data
data = {
    "Date": pd.date_range(start="2024-04-01", periods=30, freq='D'),
    "U_d": np.random.randint(180000, 220000, size=30),
    "R_single": np.random.uniform(0.25, 0.35, size=30),
    "R_multi": np.random.uniform(0.65, 0.75, size=30),
    "U_multi": np.random.uniform(2.0, 2.2, size=30),
    "S_M_ratio": np.full(30, 0.4),
    "S_B_ratio": np.full(30, 0.6),
    "Weather": np.random.choice(["Sunny", "Rainy", "Cloudy"], size=30),
    "Holiday": np.random.choice([0, 1], size=30),
    "Sale_Flag": np.random.choice([0, 1], size=30)
}

df = pd.DataFrame(data)

# Encode categorical variables
df = pd.get_dummies(df, columns=["Weather"], drop_first=True)

# Calculate target variable (number of trucks needed)
def calculate_trucks(row, S_M_capacity=400, S_B_capacity=75, cargo_per_truck=22):
    S_single = row["U_d"] * row["R_single"]
    S_multi = (row["U_d"] * row["R_multi"]) / row["U_multi"]
    S_total = S_single + S_multi
    S_M = S_total * row["S_M_ratio"]
    S_B = S_total * row["S_B_ratio"]
    C_total = (S_M / S_M_capacity) + (S_B / S_B_capacity)
    return math.ceil(C_total / cargo_per_truck)

df["Total_Trucks"] = df.apply(calculate_trucks, axis=1)

# Add moving average (7-day average of truck numbers)
df["Moving_Avg_Trucks"] = df["Total_Trucks"].rolling(window=7, min_periods=1).mean()

# Split features and target variable
X = df.drop(columns=["Date", "Total_Trucks"])
y = df["Total_Trucks"]

# Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train and predict with linear regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)
lr_mae = mean_absolute_error(y_test, y_pred_lr)

# Grid search for Random Forest
gr_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

gr_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    gr_param_grid,
    cv=3,
    scoring='neg_mean_absolute_error',
    n_jobs=-1,
)
gr_search.fit(X_train, y_train)

# Train and predict with best model
best_rf_model = gr_search.best_estimator_
y_pred_rf = best_rf_model.predict(X_test)
rf_mae = mean_absolute_error(y_test, y_pred_rf)

# Display result
print(f"Linear Regression MAE: {lr_mae}")
print(f"Random Forest MAE (Tuned): {rf_mae}")
print(f"Best Random Forest Parameters: {gr_search.best_params_}")
```

Output:

```text
Linear Regression MAE: 0.28616298755118424
Random Forest MAE (Tuned): 2.061666666666666
Best Random Forest Parameters: {'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
```

