Module 01: Exploratory Data Analysis for Demand & Inventory

This notebook performs exploratory data analysis (EDA) for Module 01 of the **"Intelligent System for Supply Chain Management"** project.  

The primary goal is to optimize inventory and purchasing management, with a target of **reducing overstocking by 20%** within six months.

---

## Data Generation
### Import Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import json
import plotly.express as px
import plotly.io as pio
import ast

from plotly.subplots import make_subplots

from smart_supply_chain_ai.utils import create_data_functions

import warnings
warnings.filterwarnings('ignore')

# Set up display options and plotting template
pd.set_option('display.max_columns', None)
pio.templates.default = "plotly_white"
px.defaults.width = 800
px.defaults.height = 600

### Paths

In [2]:
# Define data paths
raw_data_path = os.path.join('../data', 'raw')

json_path = os.path.join('../src','smart_supply_chain_ai' , 'utils/')

## Create Synthetic Dates

Features of the synthetic data:

- **Specific categories:** only the categories present in the source data are used.
- **Realistic distribution:** category frequencies follow those observed in the source data.

Realistic parameters per category:

🥦 **Produce**
- **Lead Time:** 1–3 days (locally sourced), 5–10 days (imported)
- **Shelf Life:** 3–10 days (most fresh items), up to 2 weeks for hardy vegetables like carrots or potatoes

🌾 **Grains and Flours**
- **Lead Time:** 3–7 days (domestic), 10–15 days (imported specialty grains)
- **Shelf Life:** 6 months to 1 year (dry, sealed), up to 2 years for rice and flour stored properly

🧀 **Dairy and Cold Cuts**
- **Lead Time:** 2–5 days (regional suppliers), 7–10 days (specialty cheeses)
- **Shelf Life:**
  - Milk & cream: 7–14 days refrigerated
  - Yogurt & soft cheeses: 2–3 weeks
  - Hard cheeses: 1–3 months
  - Cold cuts: 1–2 weeks sealed

☕ **Beverages**
- **Lead Time:** 2–7 days (coffee/tea distributors)
- **Shelf Life:**
  - Tea: 1–2 years (dry)
  - Coffee beans: 6–12 months (sealed), 1–2 weeks after grinding
  - Brewed drinks: 1–3 days refrigerated

🥚 **Eggs and Poultry**
- **Lead Time:** 1–3 days (local farms), 5–7 days (wholesale)
- **Shelf Life:**
  - Eggs: 3–5 weeks refrigerated
  - Fresh poultry: 1–2 days raw, 3–4 days cooked

🐟 **Meats and Fish**
- **Lead Time:** 1–5 days (fresh), 7–10 days (frozen or imported)
- **Shelf Life:**
  - Fresh fish: 1–2 days
  - Frozen fish: 3–6 months
  - Cured fish (e.g., sardines): up to 1 year

🛢️ **Oils and Fats**
- **Lead Time:** 3–7 days (bulk suppliers)
- **Shelf Life:**
  - Vegetable oils: 6–12 months
  - Butter: 1 month refrigerated, 6 months frozen
  - Coconut oil: up to 2 years

🍬 **Sugars and Sweets**
- **Lead Time:** 2–5 days
- **Shelf Life:**
  - Sugars: indefinite if dry and sealed
  - Dried fruits (e.g., plum): 6–12 months

🍪 **Miscellaneous and Biscuits**
- **Lead Time:** 2–6 days
- **Shelf Life:**
  - Biscuits: 3–6 months sealed

Seasonal patterns:

- Fruits/vegetables with reduced shelf life in summer

- Dairy with shorter lead time in winter

Realistic temporal distribution:

- 80% of deliveries on weekdays

Controlled outliers: only 3% of records represent unusual situations.

The synthetic data preserve the category-specific characteristics of the original dataset, with realistic temporal relationships for supply chain analysis.
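One way to make the ranges above usable in code is a per-category lookup table. The sketch below uses the key names `lead_min`, `lead_max`, `shelf_min`, and `shelf_max` (in days) that the date-generation loop later in this notebook reads. Collapsing each category's sub-ranges into a single min/max pair is a simplification made here for illustration, and `sample_lead_and_shelf` is a hypothetical helper, not part of the project code.

```python
import numpy as np

# Ranges in days, taken from the lists above. Each category's sub-ranges are
# collapsed into one (min, max) pair, which is an illustrative simplification.
category_params = {
    "Produce":                    {"lead_min": 1, "lead_max": 10, "shelf_min": 3,   "shelf_max": 14},
    "Grains and Flours":          {"lead_min": 3, "lead_max": 15, "shelf_min": 180, "shelf_max": 730},
    "Dairy and Cold Cuts":        {"lead_min": 2, "lead_max": 10, "shelf_min": 7,   "shelf_max": 90},
    "Beverages":                  {"lead_min": 2, "lead_max": 7,  "shelf_min": 1,   "shelf_max": 730},
    "Eggs and Poultry":           {"lead_min": 1, "lead_max": 7,  "shelf_min": 1,   "shelf_max": 35},
    "Meats and Fish":             {"lead_min": 1, "lead_max": 10, "shelf_min": 1,   "shelf_max": 365},
    "Oils and Fats":              {"lead_min": 3, "lead_max": 7,  "shelf_min": 30,  "shelf_max": 730},
    # Sugar itself is listed as indefinite; the dried-fruit range is used here.
    "Sugars and Sweets":          {"lead_min": 2, "lead_max": 5,  "shelf_min": 180, "shelf_max": 365},
    "Miscellaneous and Biscuits": {"lead_min": 2, "lead_max": 6,  "shelf_min": 90,  "shelf_max": 180},
}

def sample_lead_and_shelf(category, rng):
    """Draw one (lead_time_days, shelf_life_days) pair uniformly within the category's ranges."""
    p = category_params[category]
    lead = rng.uniform(p["lead_min"], p["lead_max"])
    shelf = rng.uniform(p["shelf_min"], p["shelf_max"])
    return lead, shelf

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
lead_days, shelf_days = sample_lead_and_shelf("Produce", rng)
```

Uniform sampling is the simplest choice; a skewed distribution may be more realistic for categories whose items cluster at one end of the range.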

In [3]:
# List of JSON filenames (without extension) to be loaded
arch_json = ['products','products_categories', 'suppliers']

# Dictionary to store the loaded JSON content
store_catalog = {}

# Loop through each filename, build the full path, and load the JSON data
for name in arch_json:
    file_path = os.path.join(json_path, f"{name}.json")  # Construct full file path
    with open(file_path, "r", encoding="utf-8") as f:     # Open the JSON file
        store_catalog[name] = json.load(f)                        # Load and store the data under its name

# Product catalog information

In [4]:
# Create a DataFrame of products with product names as a column
products = pd.DataFrame.from_dict(store_catalog['products']).T.reset_index().rename(columns={'index': 'product'})


In [5]:
# Add a generated ID column for each product, using 'P' as the suffix
products['product_id'] = create_data_functions.create_IDs(products.shape[0], suffix='P')

# Supplier catalog and distribution details

In [6]:
# Create a DataFrame of suppliers with supplier names as a column
suppliers = pd.DataFrame.from_dict(store_catalog['suppliers']).T.reset_index().rename(columns={'index': 'supplier'})

In [7]:
# Insert supplier IDs as the second column
suppliers.insert(1, 'supplier_id', create_data_functions.create_IDs(suppliers.shape[0], suffix='S'))

In [8]:
# Remove 'category' and 'subcategories' columns from the suppliers DataFrame
suppliers.drop(columns=['category', 'subcategories'], inplace=True)


In [9]:
# Split each supplier's product list into separate rows and reset the index
suppliers = suppliers.explode('products').reset_index(drop=True)
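To illustrate what `explode` does here, the following minimal sketch uses a toy frame with hypothetical supplier names. Each element of a list-valued `products` cell becomes its own row, and the other columns are repeated.

```python
import pandas as pd

# Toy data: supplier names are hypothetical, product lists are illustrative
toy_suppliers = pd.DataFrame({
    "supplier": ["Supplier A", "Supplier B"],
    "products": [["Strawberries", "Spinach"], ["Milk"]],
})

# One row per (supplier, product) pair; reset_index gives a clean 0..n-1 index
exploded = toy_suppliers.explode("products").reset_index(drop=True)
print(exploded)
```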


# Merge product and supplier tables to consolidate supply chain information

In [10]:
# Merge product and supplier data on matching product names, then drop duplicate 'products' column from suppliers
supply_df = pd.merge(products, suppliers, left_on='product', right_on='products').drop(columns='products')
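Because the merge above uses the default inner join, products without a supplier are silently dropped. A quick way to check for that, sketched here on toy data with hypothetical supplier names, is a left join with `indicator=True`, which adds a `_merge` column marking rows found only on the left side.

```python
import pandas as pd

# Toy data: "Kiwi" deliberately has no supplier
toy_products = pd.DataFrame({"product": ["Strawberries", "Spinach", "Kiwi"]})
toy_suppliers = pd.DataFrame({
    "supplier": ["Supplier A", "Supplier B"],
    "products": ["Strawberries", "Spinach"],
})

# Left join keeps every product; _merge tells us which ones found no match
check = pd.merge(
    toy_products, toy_suppliers,
    left_on="product", right_on="products",
    how="left", indicator=True,
)
unmatched = check.loc[check["_merge"] == "left_only", "product"].tolist()
print(unmatched)
```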


In [11]:
supply_df

Unnamed: 0,product,product_id,category,sub_category,shelf_life_days,min_stock,max_stock,seasonality,storage_recommendation,unit_of_measurement,barcode_ean,reorder_point,supplier,supplier_id,distance_km
0,Strawberries,1500394|P,Fresh Foods,Fruits,5,10,25,"[July, August, September, October, November]",Refrigerated,unit,8712345000018,10,FreshHarvest Ltd.,1638531|S,84
1,Strawberries,1500394|P,Fresh Foods,Fruits,5,10,25,"[July, August, September, October, November]",Refrigerated,unit,8712345000018,10,PrimeProduce,1904179|S,238
2,Strawberries,1500394|P,Fresh Foods,Fruits,5,10,25,"[July, August, September, October, November]",Refrigerated,unit,8712345000018,10,AgroPrime Foods,1824201|S,101
3,Spinach,1113533|P,Fresh Foods,Leafy Greens,5,10,25,[],Refrigerated,bunch,8712345000025,8,GreenFields Co.,1126591|S,127
4,Spinach,1113533|P,Fresh Foods,Leafy Greens,5,10,25,[],Refrigerated,bunch,8712345000025,8,UrbanFarmers,1178653|S,95
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
178,Coconut Sugar,1085517|P,Oils & Condiments,Condiments,9999,100,250,[],"Cool, dry place in an airtight container",kg,8712345001114,80,GlobalFoods,1887101|S,1450
179,Coconut Sugar,1085517|P,Oils & Condiments,Condiments,9999,100,250,[],"Cool, dry place in an airtight container",kg,8712345001114,80,North Brazil Distributor,1745532|S,1943
180,Oatmeal Biscuit,1711392|P,Breads & Biscuits,Biscuits,240,100,250,[],"Cool, dry place in an airtight container",box,8712345001121,100,Sunrise Traders,1573694|S,1890
181,Butter Biscuit,1982588|P,Breads & Biscuits,Biscuits,240,100,250,[],"Cool, dry place in an airtight container",box,8712345001138,100,Plain Distributor,1112920|S,1254


In [None]:
# Extract the unique product categories from the product DataFrame
categories = products_df.category.unique().tolist()

# Define realistic probability distribution for each category based on supermarket supply chain dynamics
categories_probabilities = [0.20, 0.10, 0.10, 0.10, 0.08, 0.08, 0.07, 0.12, 0.15]

# List of supplier names representing a diverse pool of distributors and producers
suppliers = suppliers_df['supplier']

# Assign each supplier a category based on the defined probabilities using the custom function
suppliers_cat = create_data_functions.create_supplier_cat(
    categories,
    categories_prob=categories_probabilities,
    supplier_pool=suppliers
)


In [None]:
# Convert the supplier-category dictionary into a DataFrame and reset the index to expose supplier names
suppliers_cat_df = pd.DataFrame.from_dict(suppliers_cat, orient='index', columns=['category']).reset_index()

# Rename the default index column to 'supplier' for clarity
suppliers_cat_df.rename(columns={'index': 'supplier'}, inplace=True)

# Add supplier details to the category DataFrame
suppliers_df = pd.merge(suppliers_cat_df, suppliers_df, on='supplier', how='left')

# Generate a unique supplier ID for each supplier using a custom function, with 'S' as the suffix
suppliers_df['supplier_id'] = create_data_functions.create_IDs(suppliers_df.shape[0], suffix='S')


In [None]:
# Generate enriched supplier DataFrame by linking suppliers to products and categories,
# using storage, pricing, and product data as input
suppliers_info = create_data_functions.create_suppliers(
    storage_df=storage_df,
    cost_price_df=cost_price_df,
    products_df=products_df,
    suppliers_df=suppliers_df
)


In [None]:
# Merge supplier info into the main DataFrame and remove the 'category' column.
# Rows with missing values are kept; chain .dropna() here to drop them.
suppliers_df = pd.merge(suppliers_df, suppliers_info, on='supplier_id', how='left').drop(columns='category')


# Connecting Information

In [None]:
# Merge product and supplier data on 'product_id', keeping all products even if they have no matching supplier
base_df = pd.merge(products_df, suppliers_df, on='product_id', how='left')


In [None]:
base_df.head(3)

In [None]:
# Convert the 'product_mean_shelf_life' dictionary from store_catalog into a DataFrame,
# reset the index to turn product names into a column, and rename columns for clarity.
shelf_life = pd.DataFrame.from_dict(store_catalog['product_mean_shelf_life'], orient='index')\
    .reset_index().rename(columns={'index': 'product', 0: 'shelf_life'})


In [None]:
# Merge shelf life data into the base_df DataFrame using 'product' as the key,
# ensuring all products are retained even if shelf life info is missing.
base_df = pd.merge(base_df, shelf_life, on='product', how='left')


In [None]:
# Merge cost and price data into the base_df DataFrame using 'product' as the key,
# keeping all products even if cost/price info is missing.
base_df = pd.merge(base_df, cost_price_df, on='product', how='left')

# Remove 'price' column from the DataFrame.
base_df.drop(columns=['price'], inplace=True)



In [None]:
base_df

In [None]:
# Function to calculate the suggested selling price
def calculate_selling_price(product):
    """
        Calculate the suggested selling price for a product based on its supply cost and category-specific rates.

        Parameters:
        ----------
        product : object
            An object representing a product, expected to have the attributes:
            - supply_price (float): The purchase cost of the product.
            - category (str): The product's category, used to look up rates.

        Returns:
        -------
        float
            The suggested selling price, calculated using logistics, loss, and markup rates
            specific to the product's category.

        Calculation:
        -----------
        - Actual Unit Cost = supply_price * (1 + logistics_rate) / (1 - loss_rate)
        - Suggested Price = Actual Unit Cost * markup

        Reference Tables:
        ----------------
        - logistics_table: % increase due to logistics per category.
        - loss_table: % expected loss per category.
        - markup_table: multiplier to determine final selling price per category.
    """
    
    # Retrieve purchase cost and category-specific rates from reference tables
    purchase_cost = product.supply_price
    category = product.category
    logistics_rate = logistics_table[category]
    loss_rate = loss_table[category]
    markup = markup_table[category]

    # Calculate the Actual Unit Cost
    actual_unit_cost = purchase_cost * (1 + logistics_rate) / (1 - loss_rate)

    # Calculate the Suggested Selling Price
    suggested_price = actual_unit_cost * markup

    return suggested_price



# Reference tables (implemented as Python dictionaries)
logistics_table = {
    'Produce': 0.07,
    'Meats and Fish': 0.06,
    'Dairy and Cold Cuts': 0.06,
    'Grains and Flours': 0.01,
    'Beverages': 0.01,
    'Oils and Fats': 0.01,
    'Eggs and Poultry': 0.05,
    'Sugars and Sweets': 0.01,
    'Miscellaneous and Biscuits': 0.01
}

loss_table = {
    'Produce': 0.0610,
    'Meats and Fish': 0.0375,
    'Grains and Flours': 0.0153,
    'Beverages': 0.0153,
    'Oils and Fats': 0.0153,
    'Dairy and Cold Cuts': 0.01,  # Assumed value for example
    'Eggs and Poultry': 0.01,     # Assumed value for example
    'Sugars and Sweets': 0.0153,
    'Miscellaneous and Biscuits': 0.0153
}

markup_table = {
    'Produce': 2.50,
    'Meats and Fish': 1.43,
    'Dairy and Cold Cuts': 1.39,
    'Grains and Flours': 1.25,
    'Beverages': 1.25,
    'Oils and Fats': 1.25,
    'Eggs and Poultry': 1.33,
    'Sugars and Sweets': 1.54,
    'Miscellaneous and Biscuits': 1.43
}
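As a worked example of the pricing formula, the sketch below plugs in the 'Produce' rates from the tables above (logistics 0.07, loss 0.0610, markup 2.50). The purchase cost of 10.00 is a hypothetical value chosen for illustration, and `selling_price` is a standalone restatement of the formula, not the notebook's function.

```python
def selling_price(purchase_cost, logistics_rate, loss_rate, markup):
    """Suggested price = cost * (1 + logistics) / (1 - loss) * markup."""
    # Dividing by (1 - loss_rate) spreads the cost of lost units over the sellable ones
    actual_unit_cost = purchase_cost * (1 + logistics_rate) / (1 - loss_rate)
    return actual_unit_cost * markup

# Hypothetical purchase cost of 10.00 with the 'Produce' rates
price = selling_price(10.00, logistics_rate=0.07, loss_rate=0.0610, markup=2.50)
print(round(price, 2))  # 28.49
```

The actual unit cost in this case is about 11.40, so the 2.50 markup accounts for most of the difference between cost and selling price.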


In [None]:
# Calculate and round the suggested sell price for each row using the calculate_selling_price function
base_df['sell_price'] = base_df.apply(calculate_selling_price, axis=1).round()

In [None]:
base_df

In [None]:
# Columns still to be generated for the synthetic inventory table
planned_columns = [
    "stock_quantity",
    "reorder_level",
    "reorder_quantity",
    "date_received",
    "last_order_date",
    "expiration_date",
    "sales_volume",
    "status",
    "is_weekend",
    "is_holiday",
    "economic_index",
    "weather_impact_score",
    "promotion_active",
    "goods_received_today",
    "goods_received_quantity",
]

In [None]:
# Reorder point: add a reorder_point field (typically 20-30% of maximum stock)
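A minimal sketch of that rule, assuming a fraction of 25% (the midpoint of the stated range) and rounding up so the reorder point never falls below the chosen fraction. `REORDER_FRACTION` is a hypothetical name; the `max_stock` values are the ones shown for these two products in the merged table above.

```python
import numpy as np
import pandas as pd

REORDER_FRACTION = 0.25  # assumed midpoint of the 20-30% range

toy = pd.DataFrame({
    "product": ["Strawberries", "Coconut Sugar"],
    "max_stock": [25, 250],
})

# Round up to a whole unit so the threshold is never below the chosen fraction
toy["reorder_point"] = np.ceil(toy["max_stock"] * REORDER_FRACTION).astype(int)
print(toy)
```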

In [None]:
import random

def generate_stock_quantity(product_name):
    # Logic based on product type, matched by keywords in the lowercased name
    name = product_name.lower()
    if any(word in name for word in ['berr', 'spinach', 'lettuce', 'kale']):  # 'berr' also matches 'Strawberries'
        return random.randint(80, 200)
    elif any(word in name for word in ['milk', 'yogurt', 'cream', 'cheese']):
        return random.randint(150, 400)
    elif any(word in name for word in ['rice', 'flour', 'grain']):
        return random.randint(500, 1500)
    elif any(word in name for word in ['oil', 'sugar']):
        return random.randint(400, 900)
    elif any(word in name for word in ['bread', 'biscuit']):
        return random.randint(200, 450)
    elif any(word in name for word in ['egg', 'fish']):
        return random.randint(100, 300)
    else:
        # Default value for uncategorized products
        return random.randint(100, 500)

# Apply the function to all products
products = ['Strawberries', 'Spinach', 'Cabbage', 'Mushrooms', 'Cucumber', 
            'Zucchini', 'Mango', 'Lemon', 'Kiwi', 'Orange', 'Carrot', 
            'Broccoli', 'Bell Pepper', 'Potato', 'Peas', 'Lettuce', 'Coconut', 
            'Banana', 'Pomegranate', 'Peach', 'Cauliflower', 'Watermelon', 
            'Grapes', 'Apricot', 'Papaya', 'Apple', 'Tomato', 'Sweet Potato', 
            'Cherry', 'Lime', 'Kale', 'Asparagus', 'Green Beans', 'Onion', 
            'Garlic', 'Sushi Rice', 'Black Rice', 'Long Grain Rice', 
            'All-Purpose Flour', 'Rye Bread', 'Bread Flour', 'Sourdough Bread', 
            'Whole Wheat Flour', 'Basmati Rice', 'Brown Rice', 'Jasmine Rice', 
            'Rice Flour', 'Wild Rice', 'Short Grain Rice', 'White Rice', 
            'Whole Wheat Bread', 'White Bread', 'Arborio Rice', 
            'Multigrain Bread', 'Almond Flour', 'Greek Yogurt', 'Feta Cheese', 
            'Swiss Cheese', 'Parmesan Cheese', 'Ricotta Cheese', 
            'Mozzarella Cheese', 'Heavy Cream', 'Cream', 'Whipped Cream', 
            'Cottage Cheese', 'Milk', 'Gouda Cheese', 'Buttermilk', 
            'Sour Cream', 'Yogurt', 'Evaporated Milk', 'Cheese', 
            'Arabica Coffee', 'Herbal Tea', 'Black Coffee', 'Black Tea', 
            'White Tea', 'Green Tea', 'Green Coffee', 'Robusta Coffee', 
            'Egg (Goose)', 'Egg (Duck)', 'Egg (Quail)', 'Egg (Turkey)', 
            'Egg (Chicken)', 'Trout', 'Haddock', 'Sardines', 'Anchovies', 
            'Mackerel', 'Halibut', 'Salmon', 'Tilapia', 'Cod', 'Tuna', 
            'Corn Oil', 'Olive Oil', 'Peanut Oil', 'Palm Oil', 'Avocado Oil', 
            'Canola Oil', 'Sesame Oil', 'Sunflower Oil', 'Coconut Oil', 
            'Vegetable Oil', 'Butter', 'White Sugar', 'Raw Sugar', 
            'Powdered Sugar', 'Coconut Sugar', 'Plum', 'Oatmeal Biscuit', 
            'Butter Biscuit', 'Chocolate Biscuit', 'Digestive Biscuit', 
            'Vanilla Biscuit']

stock_data = []
for product in products:
    stock_quantity = generate_stock_quantity(product)
    stock_data.append({'product_name': product, 'stock_quantity': stock_quantity})

# Create DataFrame
df = pd.DataFrame(stock_data)
print(df)

In [None]:
len(products)

In [None]:
def calculate_reorder_quantity(product):
    # Based on weekly demand, lead time, and shelf life
    base_quantity = product['weekly_demand_avg'] * (product['lead_time_days'] / 7 + 1)
    
    # Adjust for perishability
    if product['shelf_life_days'] < 14:
        # Highly perishable products: smaller, more frequent orders
        reorder_qty = base_quantity * 0.7
    elif product['shelf_life_days'] < 30:
        # Moderately perishable products
        reorder_qty = base_quantity * 1.0
    else:
        # Non-perishable products: larger orders
        reorder_qty = base_quantity * 1.5
    
    return round(reorder_qty / 5) * 5  # Round to a multiple of 5
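To see how the perishability factor changes the result, here is a standalone restatement of the same rule applied to two hypothetical products with identical demand (70 units per week) and lead time (7 days). The function name `reorder_quantity` and the input values are assumptions for illustration.

```python
def reorder_quantity(weekly_demand_avg, lead_time_days, shelf_life_days):
    """Cover demand over the lead time plus one week, scaled by perishability."""
    base = weekly_demand_avg * (lead_time_days / 7 + 1)
    if shelf_life_days < 14:
        factor = 0.7   # highly perishable: order less, more often
    elif shelf_life_days < 30:
        factor = 1.0   # moderately perishable
    else:
        factor = 1.5   # shelf-stable: order more
    return round(base * factor / 5) * 5  # nearest multiple of 5

# Same demand and lead time, different shelf life
perishable = reorder_quantity(70, 7, 5)      # base 140, factor 0.7
shelf_stable = reorder_quantity(70, 7, 240)  # base 140, factor 1.5
print(perishable, shelf_stable)  # 100 210
```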

In [None]:
# Number of rows in the dataset (i.e., total number of records)
n_rows = 300 

# Start date for the time range used in the analysis or simulation
start_date = '2023-09-01'  

# End date for the time range used in the analysis or simulation
end_date = '2025-09-02'    

In [None]:
# Convert start date to Unix timestamp (in seconds)
start_ts = pd.Timestamp(start_date).value // 10**9

# Convert end date to Unix timestamp (in seconds)
end_ts = pd.Timestamp(end_date).value // 10**9
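The conversion relies on `Timestamp.value`, which is the number of nanoseconds since the Unix epoch; integer division by 10**9 gives seconds. The short check below round-trips the start date and also shows that 2023-09-01 falls on a Friday, which matters whenever day offsets from this date are read as 0=Monday.

```python
import pandas as pd

start = pd.Timestamp("2023-09-01")

# .value is nanoseconds since 1970-01-01; floor-divide to get whole seconds
start_seconds = start.value // 10**9

# Converting back with unit="s" recovers the original timestamp
round_trip = pd.to_datetime(start_seconds, unit="s")

weekday_name = start.day_name()
print(start_seconds, round_trip, weekday_name)
```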


In [None]:
# Generate received dates with a realistic distribution (about 80% of deliveries on weekdays)
date_received_ts = np.zeros(n_rows, dtype=np.int64)

# Align offsets to the first Monday on or after start_date so that 0=Mon ... 6=Sun
days_to_monday = (7 - pd.Timestamp(start_date).weekday()) % 7
first_monday_ts = start_ts + days_to_monday * 86400

# Number of whole weeks available between the first Monday and end_date
n_weeks = (end_ts - first_monday_ts) // (7 * 86400)

for i in range(n_rows):
    # 80% chance of being a weekday (Monday to Friday)
    if np.random.random() < 0.8:
        # Weekday: normal distribution centered on Wednesday, rounded to the nearest day
        day_offset = int(round(np.random.normal(2, 1.5)))  # 0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri
        day_offset = max(0, min(4, day_offset))  # Clamp between 0 and 4
    else:
        # Weekend: Saturday or Sunday
        day_offset = int(np.random.choice([5, 6]))

    # Select a random week within the date range
    week_offset = np.random.randint(0, n_weeks) * 7
    base_date_ts = first_monday_ts + (week_offset + day_offset) * 86400

    # Add hour variation (deliveries usually arrive in the morning)
    hour = int(round(np.random.normal(10, 2)))  # Mean 10am, standard deviation 2h
    hour = max(6, min(18, hour))  # Clamp between 6am and 6pm

    date_received_ts[i] = base_date_ts + hour * 3600
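The same sampling scheme can also be written without a Python loop. The sketch below is one possible vectorized variant, not the notebook's method: it assumes a Monday anchor date (2023-09-04), a 52-week window, and NumPy's `default_rng` with a fixed seed, then checks that roughly 80% of the generated dates land on weekdays.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)            # fixed seed only for reproducibility
n = 10_000                                 # sample size chosen for a stable estimate
first_monday = pd.Timestamp("2023-09-04")  # assumed anchor; this date is a Monday
n_weeks = 52                               # assumed window length

# 80% weekdays centered on Wednesday, 20% Saturday or Sunday
is_weekday = rng.random(n) < 0.8
weekday_offsets = np.clip(np.rint(rng.normal(2, 1.5, n)), 0, 4).astype(int)
weekend_offsets = rng.integers(5, 7, n)    # upper bound is exclusive: 5 or 6
day_offsets = np.where(is_weekday, weekday_offsets, weekend_offsets)

# Random week in the window, plus a delivery hour clamped to 6am-6pm
week_offsets = rng.integers(0, n_weeks, n) * 7
hours = np.clip(np.rint(rng.normal(10, 2, n)), 6, 18).astype(int)

dates = (
    first_monday
    + pd.to_timedelta(week_offsets + day_offsets, unit="D")
    + pd.to_timedelta(hours, unit="h")
)
weekday_share = (dates.dayofweek < 5).mean()
print(f"weekday share: {weekday_share:.3f}")
```

Anchoring on a Monday is what makes the 0=Mon to 6=Sun offsets line up with real calendar days.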


In [None]:
# Initialize array to store last order timestamps for each product
last_order_ts = np.zeros(n_rows, dtype=np.int64)

# Initialize array to store expiration timestamps for each product
expiration_ts = np.zeros(n_rows, dtype=np.int64)


In [None]:
# Loop through each product and retrieve its category
for i, category in enumerate(product_categories):
    params = category_params[category]
    
    # Calculate lead time in seconds based on category parameters
    lead_time_days = np.random.uniform(params['lead_min'], params['lead_max'])
    lead_time_seconds = int(lead_time_days * 86400)
    
    # Calculate shelf life in seconds based on category parameters
    shelf_life_days = np.random.uniform(params['shelf_min'], params['shelf_max'])
    shelf_life_seconds = int(shelf_life_days * 86400)
    
    # Compute last order and expiration timestamps
    last_order_ts[i] = date_received_ts[i] - lead_time_seconds
    expiration_ts[i] = date_received_ts[i] + shelf_life_seconds


In [None]:
# Convert received timestamps to datetime format
date_received = pd.to_datetime(date_received_ts, unit='s')

# Convert last order timestamps to datetime format
last_order = pd.to_datetime(last_order_ts, unit='s')

# Convert expiration timestamps to datetime format
expiration = pd.to_datetime(expiration_ts, unit='s')


In [None]:
# Create synthetic DataFrame with category and date information
df_synthetic = pd.DataFrame({
    'Category': product_categories,
    'Date_Received': date_received,
    'Last_Order_Date': last_order,
    'Expiration_Date': expiration
})


In [None]:
# Adjust for seasonal patterns

# Fruits and vegetables have shorter shelf life during summer (due to heat)
summer_mask = (df_synthetic['Date_Received'].dt.month.isin([6, 7, 8])) & (df_synthetic['Category'] == 'Fruits & Vegetables')
df_synthetic.loc[summer_mask, 'Expiration_Date'] -= pd.to_timedelta(np.random.randint(2, 5), unit='d')

# Dairy products have shorter lead time during winter (lower spoilage risk)
winter_mask = (df_synthetic['Date_Received'].dt.month.isin([12, 1, 2])) & (df_synthetic['Category'] == 'Dairy')
df_synthetic.loc[winter_mask, 'Last_Order_Date'] += pd.to_timedelta(np.random.randint(1, 3), unit='d')


In [None]:
# Add some outliers (3% of the data) to represent unusual situations
outlier_mask = np.random.random(n_rows) < 0.03

# Apply early order dates for outlier records
df_synthetic.loc[outlier_mask, 'Last_Order_Date'] -= pd.to_timedelta(np.random.randint(15, 30), unit='d')

# Apply reduced shelf life for perishable outlier products
df_synthetic.loc[outlier_mask & (df_synthetic['Category'].isin(['Fruits & Vegetables', 'Seafood'])), 
       'Expiration_Date'] -= pd.to_timedelta(np.random.randint(3, 7), unit='d')

# Ensure Last_Order_Date is always earlier than Date_Received
date_inconsistency = df_synthetic['Last_Order_Date'] > df_synthetic['Date_Received']
df_synthetic.loc[date_inconsistency, 'Last_Order_Date'] = df_synthetic.loc[date_inconsistency, 'Date_Received'] - pd.to_timedelta(
    np.random.randint(1, 5), unit='d')

# Ensure Expiration_Date is always later than Date_Received
exp_inconsistency = df_synthetic['Expiration_Date'] <= df_synthetic['Date_Received']
df_synthetic.loc[exp_inconsistency, 'Expiration_Date'] = df_synthetic.loc[exp_inconsistency, 'Date_Received'] + pd.to_timedelta(
    np.random.randint(1, 10), unit='d')
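Note that `np.random.randint(a, b)` without a `size` argument returns a single number, so every selected row in the cell above is shifted by the same amount. If a different shift per row is wanted, one option is to draw an array of the right length. The sketch below shows this on toy data, with hypothetical column values and exactly 3% of rows chosen as outliers, and then verifies the date ordering.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
n = 200

# Toy data: received dates in 2024, ordered 1-9 days earlier, expiring 1-29 days later
received = pd.Timestamp("2024-01-01") + pd.to_timedelta(rng.integers(0, 365, n), unit="D")
toy = pd.DataFrame({
    "Date_Received": received,
    "Last_Order_Date": received - pd.to_timedelta(rng.integers(1, 10, n), unit="D"),
    "Expiration_Date": received + pd.to_timedelta(rng.integers(1, 30, n), unit="D"),
})

# Pick exactly 3% of the rows as outliers and give each its own 15-29 day shift
n_outliers = round(0.03 * n)
idx = rng.choice(n, size=n_outliers, replace=False)
shift = pd.to_timedelta(rng.integers(15, 30, n_outliers), unit="D")
toy.loc[idx, "Last_Order_Date"] = toy.loc[idx, "Last_Order_Date"].to_numpy() - shift.to_numpy()

# Lead time in days after the adjustment
lead_days = (toy["Date_Received"] - toy["Last_Order_Date"]).dt.days
print((lead_days > 14).sum(), "rows with unusually long lead times")
```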


In [None]:
# Format received date for display (YYYY-MM-DD)
df_synthetic['Date_Received'] = df_synthetic['Date_Received'].dt.strftime('%Y-%m-%d')

df_synthetic['Last_Order_Date'] = df_synthetic['Last_Order_Date'].dt.strftime('%Y-%m-%d')

df_synthetic['Expiration_Date'] = df_synthetic['Expiration_Date'].dt.strftime('%Y-%m-%d')


In [None]:
# Identify columns present in df but not in df_raw
columns_drop = list(set(df.columns.tolist()) - set(df_raw.columns.tolist()))

# Drop the extra columns from df to align with df_raw structure
df.drop(columns=columns_drop, inplace=True)


In [None]:
# Overwrite matching columns in df with values from the synthetic dataset
df[df_synthetic.columns] = df_synthetic


In [None]:
# Convert 'Date_Received' column to datetime format, assuming year comes first
df['Date_Received'] = pd.to_datetime(df['Date_Received'], yearfirst=True)

# Convert 'Last_Order_Date' column to datetime format, assuming year comes first
df['Last_Order_Date'] = pd.to_datetime(df['Last_Order_Date'], yearfirst=True)

# Convert 'Expiration_Date' column to datetime format, assuming year comes first
df['Expiration_Date'] = pd.to_datetime(df['Expiration_Date'], yearfirst=True)


In [None]:
# Show data information
df.info()

In [None]:
# # Define data paths
# processed_data_path = os.path.join('../data', 'processed')

# utils_data_path = os.path.join('../docs/column_descriptions.json')

In [None]:
# Sort DataFrame by Date_Received in ascending order
# df = df.sort_values(by='Date_Received').reset_index(drop=True)

In [None]:
# # Save Data
# df.to_pickle(processed_data_path + '/grocery.pkl')

# # save Dictionary JSON archive
# with open(utils_data_path, 'w') as f:
#     json.dump(column_inventory, f, indent=4)

In [None]:
import matplotlib.pyplot as plt

class Supplier:
    def __init__(self, supply_rate):
        self.supply_rate = supply_rate

    def supply(self):
        return self.supply_rate

class Manufacturer:
    def __init__(self, production_rate, supplier):
        self.production_rate = production_rate
        self.supplier = supplier
        self.inventory = 0

    def produce(self):
        # Production is limited by both incoming supply and production capacity
        supply = self.supplier.supply()
        production = min(supply, self.production_rate)
        self.inventory += production
        return production

class Warehouse:
    def __init__(self, capacity):
        self.capacity = capacity
        self.inventory = 0

    def store(self, products):
        # Store only what fits in the remaining space
        space_available = self.capacity - self.inventory
        stored = min(products, space_available)
        self.inventory += stored
        return stored

    def ship(self, demand):
        # Ship no more than what is on hand
        shipped = min(demand, self.inventory)
        self.inventory -= shipped
        return shipped

class Customer:
    def __init__(self, demand_rate):
        self.demand_rate = demand_rate

    def demand(self):
        return self.demand_rate

# Simulation parameters
days = 30
supplier_rate = 100
production_rate = 80
warehouse_capacity = 500
customer_demand_rate = 70

# Create supply chain components
supplier = Supplier(supplier_rate)
manufacturer = Manufacturer(production_rate, supplier)
warehouse = Warehouse(warehouse_capacity)
customer = Customer(customer_demand_rate)

# Arrays to store results
supplier_inventory = np.zeros(days)
manufacturer_inventory = np.zeros(days)
warehouse_inventory = np.zeros(days)
customer_demand = np.zeros(days)
customer_fulfilled = np.zeros(days)

# Simulation loop
for day in range(days):
    # Supplier supplies raw materials to the manufacturer
    supplier_inventory[day] = supplier.supply()

    # Manufacturer produces products
    produced = manufacturer.produce()
    manufacturer_inventory[day] = manufacturer.inventory

    # Warehouse stores produced products; stored units leave the manufacturer
    # so they are not counted in both inventories
    stored = warehouse.store(produced)
    manufacturer.inventory -= stored
    warehouse_inventory[day] = warehouse.inventory

    # Customer demands products
    demand = customer.demand()
    customer_demand[day] = demand

    # Warehouse ships products to the customer
    fulfilled = warehouse.ship(demand)
    customer_fulfilled[day] = fulfilled

# Plot results
plt.figure(figsize=(12, 8))

plt.subplot(3, 1, 1)
plt.plot(supplier_inventory, label='Supplier Inventory')
plt.plot(manufacturer_inventory, label='Manufacturer Inventory')
plt.legend()
plt.title('Supply Chain Simulation')
plt.ylabel('Inventory Level')

plt.subplot(3, 1, 2)
plt.plot(warehouse_inventory, label='Warehouse Inventory')
plt.legend()
plt.ylabel('Inventory Level')

plt.subplot(3, 1, 3)
plt.plot(customer_demand, label='Customer Demand')
plt.plot(customer_fulfilled, label='Customer Fulfilled')
plt.legend()
plt.xlabel('Day')
plt.ylabel('Products')

plt.tight_layout()
plt.show()
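Besides the plots, a single summary number can help compare scenarios. The sketch below is a compact, class-free restatement of the same daily flow (produce, store, ship) that returns the fill rate, meaning the share of demand that was actually shipped, and the ending warehouse inventory. The function name `simulate` and its return format are assumptions for illustration.

```python
def simulate(days, supply_rate, production_rate, capacity, demand_rate):
    """Run the daily produce -> store -> ship flow and summarize the outcome."""
    inventory = 0
    fulfilled_total = 0
    for _ in range(days):
        produced = min(supply_rate, production_rate)      # limited by supply and capacity
        inventory += min(produced, capacity - inventory)  # store only what fits
        shipped = min(demand_rate, inventory)             # ship only what is on hand
        inventory -= shipped
        fulfilled_total += shipped
    fill_rate = fulfilled_total / (days * demand_rate)
    return fill_rate, inventory

# Same parameters as the simulation above
fill_rate, ending_inventory = simulate(30, 100, 80, 500, 70)
print(fill_rate, ending_inventory)  # 1.0 300
```

With production at 80 units per day against demand of 70, the warehouse gains 10 units daily, which is the kind of steady overstock build-up this project aims to reduce.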