## **Food Demand Forecasting**
### **Background information:**
A meal delivery company operates in multiple cities and runs fulfillment centers in each of them to dispatch food orders to customers. The client needs demand forecasts for the upcoming weeks so that each center can plan its raw-material stock. Most raw materials are replenished weekly and are perishable, so procurement planning is critical. Accurate demand forecasts also help with staffing the centers. Given the dataset below, the task is to predict demand for the next 10 weeks for the center-meal combinations in the test set.

### **Dataset source:**
https://www.kaggle.com/datasets/kannanaikkal/food-demand-forecasting


In [3]:
# Import relevant python libraries
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
# Load the training dataset (created during the data wrangling section) into a pandas DataFrame
train_df = pd.read_csv('drive/MyDrive/DSC Capstone 2/train_revised.csv')

# Displaying the loaded dataset
train_df.head()

| | id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured | num_orders | city_code | region_code | center_type | op_area | category | cuisine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1379560 | 1 | 55 | 1885 | 136.83 | 152.29 | 0 | 0 | 177 | 647 | 56 | TYPE_C | 2.0 | Beverages | Thai |
| 1 | 1466964 | 1 | 55 | 1993 | 136.83 | 135.83 | 0 | 0 | 270 | 647 | 56 | TYPE_C | 2.0 | Beverages | Thai |
| 2 | 1346989 | 1 | 55 | 2539 | 134.86 | 135.86 | 0 | 0 | 189 | 647 | 56 | TYPE_C | 2.0 | Beverages | Thai |
| 3 | 1338232 | 1 | 55 | 2139 | 339.5 | 437.53 | 0 | 0 | 54 | 647 | 56 | TYPE_C | 2.0 | Beverages | Indian |
| 4 | 1448490 | 1 | 55 | 2631 | 243.5 | 242.5 | 0 | 0 | 40 | 647 | 56 | TYPE_C | 2.0 | Beverages | Indian |


In [5]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 456548 entries, 0 to 456547
Data columns (total 15 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   id                     456548 non-null  int64  
 1   week                   456548 non-null  int64  
 2   center_id              456548 non-null  int64  
 3   meal_id                456548 non-null  int64  
 4   checkout_price         456548 non-null  float64
 5   base_price             456548 non-null  float64
 6   emailer_for_promotion  456548 non-null  int64  
 7   homepage_featured      456548 non-null  int64  
 8   num_orders             456548 non-null  int64  
 9   city_code              456548 non-null  int64  
 10  region_code            456548 non-null  int64  
 11  center_type            456548 non-null  object 
 12  op_area                456548 non-null  float64
 13  category               456548 non-null  object 
 14  cuisine                456548 non-null  object 

### **1. One-hot encoding of the categorical features: center_type, category and cuisine**

In [7]:
# Selecting categorical columns
categorical_cols = ['center_type', 'category', 'cuisine']

# Perform one-hot encoding
data_df_encoded = pd.get_dummies(train_df, columns=categorical_cols)

# Display the first few rows of the one-hot encoded DataFrame
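print(data_df_encoded.head())
print()
# info() prints its summary itself and returns None, so call it directly instead of wrapping it in print()
data_df_encoded.info()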

        id  week  center_id  meal_id  checkout_price  base_price  \
0  1379560     1         55     1885          136.83      152.29   
1  1466964     1         55     1993          136.83      135.83   
2  1346989     1         55     2539          134.86      135.86   
3  1338232     1         55     2139          339.50      437.53   
4  1448490     1         55     2631          243.50      242.50   

   emailer_for_promotion  homepage_featured  num_orders  city_code  ...  \
0                      0                  0         177        647  ...   
1                      0                  0         270        647  ...   
2                      0                  0         189        647  ...   
3                      0                  0          54        647  ...   
4                      0                  0          40        647  ...   

   category_Rice Bowl  category_Salad  category_Sandwich  category_Seafood  \
0                   0               0                  0      
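To make the effect of `pd.get_dummies` easier to see than in the truncated output above, here is a minimal sketch on a tiny hand-made frame. The frame `toy_df` and its three rows are illustrative assumptions, not rows taken from the dataset, and `dtype=int` is passed only so the indicator columns show as 0/1 regardless of the pandas version.

```python
import pandas as pd

# Hypothetical toy data: three rows with two categorical columns
toy_df = pd.DataFrame({
    "meal_id": [1885, 2139, 2631],
    "center_type": ["TYPE_C", "TYPE_A", "TYPE_C"],
    "cuisine": ["Thai", "Indian", "Indian"],
})

# Each listed column is replaced by one indicator column per distinct value,
# named "<column>_<value>"; the other columns are left untouched.
toy_encoded = pd.get_dummies(toy_df, columns=["center_type", "cuisine"], dtype=int)

encoded_columns = list(toy_encoded.columns)
print(encoded_columns)
print(toy_encoded)
```

Every row has exactly one 1 within each group of indicator columns, which is why the 3 categorical columns of the full dataset expand into many more columns after encoding.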

### **2. Splitting the encoded dataset into training and testing sets**

In [8]:
from sklearn.model_selection import train_test_split

# Separate features (X) and target variable (Y)
X = data_df_encoded.drop('num_orders', axis=1)
Y = data_df_encoded['num_orders']

# Splitting the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Display the shapes of the resulting sets
print("X_train shape:", X_train.shape)
print("Y_train shape:", Y_train.shape)
print("X_test shape:", X_test.shape)
print("Y_test shape:", Y_test.shape)

X_train shape: (365238, 32)
Y_train shape: (365238,)
X_test shape: (91310, 32)
Y_test shape: (91310,)
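The shapes above can be checked by hand. The sketch below reproduces the row counts with plain arithmetic, assuming that for a fractional `test_size` scikit-learn rounds the test count up and gives the remaining rows to the training set. The variable names are illustrative only.

```python
import math

n_samples = 456548   # rows reported by train_df.info()
test_size = 0.2      # fraction passed to train_test_split

# Assumption: the test count is the ceiling of test_size * n_samples,
# and the training set receives whatever is left.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 365238 91310
```

The 32 feature columns follow the same kind of bookkeeping: the 15 original columns minus the 3 categorical ones and the `num_orders` target leave 11, so the remaining 21 are the indicator columns created by the one-hot encoding.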
