## Create Synthetic Data for Grocery Supply Chain

Features of the synthetic data:

Specific categories: Only the categories present in the original dataset are used.

Realistic distribution: Category frequencies follow those observed in the original data.

Realistic parameters per category:

🥦 **Produce**
- **Lead Time:** 1–3 days (locally sourced), 5–10 days (imported)
- **Shelf Life:** 3–10 days (most fresh items), up to 2 weeks for hardy vegetables like carrots or potatoes

🌾 **Grains and Flours**
- **Lead Time:** 3–7 days (domestic), 10–15 days (imported specialty grains)
- **Shelf Life:** 6 months to 1 year (dry, sealed), up to 2 years for rice and flour stored properly

🧀 **Dairy and Cold Cuts**
- **Lead Time:** 2–5 days (regional suppliers), 7–10 days (specialty cheeses)
- **Shelf Life:**
  - Milk & cream: 7–14 days refrigerated
  - Yogurt & soft cheeses: 2–3 weeks
  - Hard cheeses: 1–3 months
  - Cold cuts: 1–2 weeks sealed

☕ **Beverages**
- **Lead Time:** 2–7 days (coffee/tea distributors)
- **Shelf Life:**
  - Tea: 1–2 years (dry)
  - Coffee beans: 6–12 months (sealed), 1–2 weeks after grinding
  - Brewed drinks: 1–3 days refrigerated

🥚 **Eggs and Poultry**
- **Lead Time:** 1–3 days (local farms), 5–7 days (wholesale)
- **Shelf Life:**
  - Eggs: 3–5 weeks refrigerated
  - Fresh poultry: 1–2 days raw, 3–4 days cooked

🐟 **Meats and Fish**
- **Lead Time:** 1–5 days (fresh), 7–10 days (frozen or imported)
- **Shelf Life:**
  - Fresh fish: 1–2 days
  - Frozen fish: 3–6 months
  - Cured fish (e.g., sardines): up to 1 year

🛢️ **Oils and Fats**
- **Lead Time:** 3–7 days (bulk suppliers)
- **Shelf Life:**
  - Vegetable oils: 6–12 months
  - Butter: 1 month refrigerated, 6 months frozen
  - Coconut oil: up to 2 years

🍬 **Sugars and Sweets**
- **Lead Time:** 2–5 days
- **Shelf Life:**
  - Sugars: indefinite if dry and sealed
  - Dried fruits (e.g., plum): 6–12 months

🍪 **Miscellaneous and Biscuits**
- **Lead Time:** 2–6 days
- **Shelf Life:**
  - Biscuits: 3–6 months sealed
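
The ranges listed above can be encoded as a lookup table and sampled per record. The following is only a minimal sketch that assumes uniform sampling within each range; the dictionary `CATEGORY_PARAMS`, its keys, and the helper `sample_lead_time_and_shelf_life` are hypothetical names that do not come from the project, and only a few categories are shown.

```python
import numpy as np

# Hypothetical lookup table: (min, max) in days, copied from the ranges listed above.
CATEGORY_PARAMS = {
    "Produce": {"lead_time": (1, 3), "shelf_life": (3, 10)},
    "Grains and Flours": {"lead_time": (3, 7), "shelf_life": (180, 365)},
    "Eggs": {"lead_time": (1, 3), "shelf_life": (21, 35)},
}

def sample_lead_time_and_shelf_life(category, rng):
    """Draw one (lead_time_days, shelf_life_days) pair uniformly from the category's ranges."""
    params = CATEGORY_PARAMS[category]
    lead_lo, lead_hi = params["lead_time"]
    shelf_lo, shelf_hi = params["shelf_life"]
    # endpoint=True makes the upper bound inclusive
    lead_time = int(rng.integers(lead_lo, lead_hi, endpoint=True))
    shelf_life = int(rng.integers(shelf_lo, shelf_hi, endpoint=True))
    return lead_time, shelf_life

demo_param_rng = np.random.default_rng(0)
lead_time, shelf_life = sample_lead_time_and_shelf_life("Produce", demo_param_rng)
```

A non-uniform distribution (for example, one skewed toward shorter lead times for local suppliers) could be swapped in without changing the lookup table.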


Seasonal patterns:

- Fruits and vegetables have a reduced shelf life in summer

- Dairy has a shorter lead time in winter

Realistic temporal distribution:

- 80% of deliveries fall on weekdays

Controlled outliers: only 3% of records represent unusual situations

These synthetic data preserve the specific characteristics of the categories in your original dataset, with realistic temporal relationships for supply chain analysis.
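
One way to obtain the 80% weekday share and the 3% outlier rate described above is weighted sampling. This is a sketch under stated assumptions, not the project's implementation: the variable names, the year 2024, and the sample size are all illustrative.

```python
import numpy as np
import pandas as pd

demo_rng = np.random.default_rng(42)
days = pd.date_range("2024-01-01", "2024-12-31", freq="D")

# Weight each day so that weekdays jointly receive 80% of the probability mass.
is_weekday = np.asarray(days.dayofweek) < 5
weights = np.where(is_weekday, 0.8 / is_weekday.sum(), 0.2 / (~is_weekday).sum())

n_records = 10_000
sampled = pd.Series(demo_rng.choice(days.to_numpy(), size=n_records, p=weights))

# Flag roughly 3% of the records as "unusual situations".
is_outlier = demo_rng.random(n_records) < 0.03

weekday_share = (sampled.dt.dayofweek < 5).mean()
outlier_share = is_outlier.mean()
```

Both shares are approximate for any finite sample; they converge to 0.80 and 0.03 as `n_records` grows.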

## Data Generation
### Import Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import os
import json

from smart_supply_chain_ai.utils import create_data_functions, combine_df_functions

import warnings
warnings.filterwarnings('ignore')

### Paths

In [2]:
# Define data paths
raw_data_path = os.path.join('../data', 'raw/')

external_data_path = os.path.join('../data', 'external/')

json_path = os.path.join('../src','smart_supply_chain_ai' , 'utils/')

In [3]:
# List of JSON filenames (without extension) to be loaded
arch_json = ['products','products_categories', 'suppliers']

# Dictionary to store the loaded JSON content
store_catalog = {}

# Loop through each filename, build the full path, and load the JSON data
for name in arch_json:
    file_path = os.path.join(json_path, f"{name}.json")  # Construct full file path
    with open(file_path, "r", encoding="utf-8") as f:     # Open the JSON file
        store_catalog[name] = json.load(f)                        # Load and store the data under its name

### Product catalog information

In [4]:
# Create a DataFrame of products with product names as a column
products = pd.DataFrame.from_dict(store_catalog['products']).T.reset_index().rename(columns={'index': 'product'})
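
To illustrate what the `from_dict(...).T.reset_index()` chain does, here is a small example on a toy dictionary. The schema of `toy_catalog` is an assumption made for illustration; the real `products.json` may be structured differently.

```python
import pandas as pd

# Hypothetical catalog fragment: product name -> attributes.
toy_catalog = {
    "Strawberries": {"category": "Fresh Foods", "shelf_life_days": 5},
    "Spinach": {"category": "Fresh Foods", "shelf_life_days": 5},
}

# from_dict puts the outer keys in the columns by default,
# so the frame is transposed to get one row per product.
toy_products = (
    pd.DataFrame.from_dict(toy_catalog)
    .T.reset_index()
    .rename(columns={"index": "product"})
)
```

Passing `orient="index"` to `from_dict` gives one row per product without the transpose and lets pandas infer a dtype per column, whereas transposing a mixed-type frame leaves every column as `object`.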


In [5]:
# Add a generated ID (suffix 'P') for each product
products['product_id'] = create_data_functions.create_IDs(products.shape[0], suffix='P')
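
The implementation of `create_IDs` is not shown in this notebook. Judging only from the IDs that appear in the outputs (for example `1256111|P`), it seems to produce a seven-digit number followed by `|` and the suffix. A hypothetical stand-in, named `make_ids` here to avoid any confusion with the real helper, might look like this:

```python
import numpy as np

def make_ids(n, suffix, seed=0):
    """Hypothetical stand-in for create_IDs: n unique 7-digit numbers followed by '|<suffix>'."""
    id_rng = np.random.default_rng(seed)
    # Sampling without replacement guarantees that the IDs are unique.
    numbers = id_rng.choice(np.arange(1_000_000, 2_000_000), size=n, replace=False)
    return [f"{number}|{suffix}" for number in numbers]

example_ids = make_ids(3, suffix="P")
```

The number range and the seeding are assumptions; the real function may generate IDs differently.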

### Supplier catalog and distribution details

In [6]:
# Create a DataFrame of suppliers with supplier names as a column
suppliers = pd.DataFrame.from_dict(store_catalog['suppliers']).T.reset_index().rename(columns={'index': 'supplier'})

In [7]:
# Insert supplier IDs as the second column
suppliers.insert(1, 'supplier_id', create_data_functions.create_IDs(suppliers.shape[0], suffix='S'))

In [8]:
# Remove 'category' and 'subcategories' columns from the suppliers DataFrame
suppliers.drop(columns=['category', 'subcategories'], inplace=True)


In [9]:
# Split each supplier's product list into separate rows and reset the index
suppliers = suppliers.explode('products').reset_index(drop=True)


In [10]:
# Merge product and supplier data on matching product names, then drop duplicate 'products' column from suppliers
supply_df = pd.merge(products, suppliers, left_on='product', right_on='products').drop(columns='products')
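
The explode-then-merge pattern used in the two cells above can be checked on a toy example. The frames below are illustrative only and use a subset of the real columns.

```python
import pandas as pd

toy_suppliers = pd.DataFrame({
    "supplier": ["FreshHarvest Ltd.", "GreenFields Co."],
    "products": [["Strawberries", "Spinach"], ["Spinach"]],
})
toy_items = pd.DataFrame({"product": ["Strawberries", "Spinach"], "shelf_life_days": [5, 5]})

# One row per (supplier, product) pair.
exploded = toy_suppliers.explode("products").reset_index(drop=True)

# The default inner join keeps only products that have at least one supplier.
toy_supply = toy_items.merge(exploded, left_on="product", right_on="products").drop(columns="products")
```

A product offered by two suppliers appears twice in the result, which is why `supply_df` has more rows than `products`.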


In [11]:
# Initialize a random number generator with a fixed seed for reproducibility
rng = np.random.default_rng(seed=43)
# Assign a random rating between 1 and 4 to every supplier-product row
supply_df['supplier_rating'] = rng.integers(1, 5, size=supply_df.shape[0])
# Randomly select 15 unique suppliers as "top suppliers"; drawing from the seeded generator
# (instead of the global np.random state) keeps the selection reproducible
suppliers_top = rng.choice(supply_df['supplier'].unique(), 15, replace=False)
# Set the rating to 5 for top suppliers; otherwise keep the random rating
supply_df['supplier_rating'] = np.where(supply_df['supplier'].isin(suppliers_top), 5, supply_df['supplier_rating'])
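
In the cell above, the ratings from 1 to 4 are drawn once per row of `supply_df`, so a supplier that offers several products can end up with different ratings on different rows. If a single rating per supplier is preferred, one option is to draw per unique supplier and map the result back. This is an alternative sketch on toy data, not what the notebook does.

```python
import numpy as np
import pandas as pd

rating_rng = np.random.default_rng(43)
toy_supply_df = pd.DataFrame({
    "product": ["Strawberries", "Spinach", "Spinach", "Carrots"],
    "supplier": ["FreshHarvest Ltd.", "FreshHarvest Ltd.", "GreenFields Co.", "GreenFields Co."],
})

# Draw one rating in 1..4 per unique supplier.
unique_suppliers = toy_supply_df["supplier"].unique()
rating_by_supplier = dict(
    zip(unique_suppliers, rating_rng.integers(1, 5, size=len(unique_suppliers)))
)

# Promote a seeded random subset (here, one supplier) to the top rating of 5.
for name in rating_rng.choice(unique_suppliers, size=1, replace=False):
    rating_by_supplier[name] = 5

toy_supply_df["supplier_rating"] = toy_supply_df["supplier"].map(rating_by_supplier)
```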

## Meteorological Data for Supply Chain Management

In [12]:
# Set the path to the weather CSV file
archive_csv = external_data_path + 'dados_83967_D_2015-01-01_2025-09-18.csv'

# Read the CSV file into a DataFrame
weather_df = pd.read_csv(archive_csv, sep=";", decimal=",", skiprows=9, engine="python")

# Show the first rows of the DataFrame
weather_df.head()

Unnamed: 0,Data Medicao,"PRECIPITACAO TOTAL, DIARIO(mm)","TEMPERATURA MAXIMA, DIARIA(°C)","TEMPERATURA MINIMA, DIARIA(°C)","VENTO, VELOCIDADE MEDIA DIARIA(m/s)",Unnamed: 5
0,2015-01-01,4.9,28.3,22.0,2.1,
1,2015-01-02,17.8,26.8,19.7,2.4,
2,2015-01-03,0.0,29.0,18.3,1.9,
3,2015-01-04,0.0,32.1,17.8,2.1,
4,2015-01-05,0.0,33.5,19.1,2.2,
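
The `sep`, `decimal`, and `skiprows` arguments matter because the file uses semicolons as separators, decimal commas, and a metadata preamble before the header. The following self-contained example reproduces that layout on a small in-memory string; the preamble text and the reduced column set are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-line preamble, then semicolon-separated data with decimal commas
# and a trailing separator on every line.
raw_text = (
    "Station: example\n"
    "Period: example\n"
    "Data Medicao;PRECIPITACAO TOTAL, DIARIO(mm);TEMPERATURA MAXIMA, DIARIA(°C);\n"
    "2015-01-01;4,9;28,3;\n"
    "2015-01-02;17,8;26,8;\n"
)

toy_weather = pd.read_csv(io.StringIO(raw_text), sep=";", decimal=",", skiprows=2)
n_columns_before = toy_weather.shape[1]

# The trailing ';' yields an all-NaN 'Unnamed: 3' column, which this call removes.
toy_weather = toy_weather.dropna(axis=1, how="all")
```

This is the same reason the real file shows an empty `Unnamed: 5` column before it is dropped.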


In [13]:
# Remove columns that contain only missing values
weather_df.dropna(axis=1, how='all', inplace=True)


In [14]:
# Rename columns to clear and descriptive English names
weather_df.columns = [
    "measurement_date",
    "daily_total_precipitation_mm",
    "daily_maximum_temperature_c",
    "daily_minimum_temperature_c",
    "daily_average_wind_speed_mps"
]


In [15]:
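# Drop rows in which every column is missing. 'measurement_date' stays a regular column
# because later cells reference it by name. Chaining set_index(...).dropna(inplace=True)
# would only modify a temporary copy and leave weather_df unchanged.
weather_df = weather_df.dropna(how='all')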

# Show the first rows of the DataFrame
weather_df.head()


Unnamed: 0,measurement_date,daily_total_precipitation_mm,daily_maximum_temperature_c,daily_minimum_temperature_c,daily_average_wind_speed_mps
0,2015-01-01,4.9,28.3,22.0,2.1
1,2015-01-02,17.8,26.8,19.7,2.4
2,2015-01-03,0.0,29.0,18.3,1.9
3,2015-01-04,0.0,32.1,17.8,2.1
4,2015-01-05,0.0,33.5,19.1,2.2
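
A general pitfall with `inplace=True` is worth keeping in mind here: when it is called on the result of another method such as `set_index`, it mutates that temporary object and not the original DataFrame. A short demonstration on a toy frame (the column names are illustrative):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "measurement_date": ["2015-01-01", "2015-01-02"],
    "precipitation_mm": [4.9, np.nan],
})

# set_index returns a new DataFrame; dropna(inplace=True) then mutates that temporary only.
toy.set_index("measurement_date").dropna(how="all", inplace=True)
rows_after_chained_call = len(toy)      # still 2: 'toy' is unchanged

# Assigning the result back makes the effect visible.
cleaned = toy.set_index("measurement_date").dropna(how="all")
rows_after_assignment = len(cleaned)    # 1: the all-NaN row is gone
```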


In [16]:
# Convert 'measurement_date' column to datetime format
weather_df['measurement_date'] = pd.to_datetime(weather_df['measurement_date'])


In [17]:
# Display summary information about the DataFrame
weather_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3914 entries, 0 to 3913
Data columns (total 5 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   measurement_date              3914 non-null   datetime64[ns]
 1   daily_total_precipitation_mm  3851 non-null   float64       
 2   daily_maximum_temperature_c   3628 non-null   float64       
 3   daily_minimum_temperature_c   3782 non-null   float64       
 4   daily_average_wind_speed_mps  3828 non-null   float64       
dtypes: datetime64[ns](1), float64(4)
memory usage: 153.0 KB


In [18]:
# Drop every row that contains at least one missing value
weather_df.dropna(inplace=True)

In [19]:
# Apply the weather classification function to the cleaned DataFrame to generate severity and category labels
weather_severity_df = create_data_functions.classify_weather(weather_df)

In [20]:
# Show 10 random sample rows of the DataFrame
weather_severity_df.sample(10)

Unnamed: 0,measurement_date,daily_total_precipitation_mm,daily_maximum_temperature_c,daily_minimum_temperature_c,daily_average_wind_speed_mps,daily_average_temperature_c,temperature_classification,precipitation_classification,wind_classification,weather_severity
1746,2019-10-13,0.0,30.4,20.6,3.6,25.5,Warm,No Precipitation,Gentle to Fresh Breeze,Moderate
2903,2022-12-13,21.5,26.2,17.4,1.5,21.8,Mild to Temperate,Heavy Rain,Calm / Light Breeze,Severe
2842,2022-10-13,0.3,26.5,14.8,1.4,20.65,Mild to Temperate,Light Rain,Calm / Light Breeze,Moderate
147,2015-05-28,57.7,18.2,15.0,2.4,16.6,Cool,Violent Rainfall,Gentle to Fresh Breeze,Severe
2609,2022-02-22,2.6,29.3,23.8,1.6,26.55,Warm,Moderate Rain,Gentle to Fresh Breeze,Moderate
3517,2024-08-18,0.0,26.5,16.7,1.7,21.6,Mild to Temperate,No Precipitation,Gentle to Fresh Breeze,Moderate
2269,2021-03-19,0.0,28.3,20.7,1.8,24.5,Warm,No Precipitation,Gentle to Fresh Breeze,Moderate
201,2015-07-21,29.8,17.5,11.9,1.3,14.7,Cool,Heavy Rain,Calm / Light Breeze,Severe
693,2016-11-24,0.0,28.8,13.1,2.8,20.95,Mild to Temperate,No Precipitation,Gentle to Fresh Breeze,Moderate
799,2017-03-10,52.1,26.4,20.4,1.4,23.4,Mild to Temperate,Violent Rainfall,Calm / Light Breeze,Severe
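
The source of `classify_weather` is not included in this notebook. The sketch below is a hypothetical re-creation: the average temperature is the mean of the daily maximum and minimum (which matches the sample rows), while every threshold is an assumption chosen only so that the labels agree with the handful of rows shown above. The real function may use different cut-offs and additional categories.

```python
import numpy as np
import pandas as pd

def classify_weather_sketch(df):
    """Hypothetical re-creation of the labels; every threshold below is an assumption."""
    out = df.copy()
    out["daily_average_temperature_c"] = (
        out["daily_maximum_temperature_c"] + out["daily_minimum_temperature_c"]
    ) / 2
    out["temperature_classification"] = pd.cut(
        out["daily_average_temperature_c"],
        bins=[-np.inf, 11, 17, 24, np.inf],
        labels=["Cold", "Cool", "Mild to Temperate", "Warm"],
    )
    out["precipitation_classification"] = pd.cut(
        out["daily_total_precipitation_mm"],
        bins=[-np.inf, 0, 2.5, 10, 50, np.inf],
        labels=["No Precipitation", "Light Rain", "Moderate Rain", "Heavy Rain", "Violent Rainfall"],
    )
    out["wind_classification"] = np.where(
        out["daily_average_wind_speed_mps"] <= 1.5, "Calm / Light Breeze", "Gentle to Fresh Breeze"
    )
    # Assumed severity rule: heavy rain is severe, dry and calm days are normal, the rest moderate.
    heavy = out["precipitation_classification"].isin(["Heavy Rain", "Violent Rainfall"])
    calm_and_dry = (out["daily_total_precipitation_mm"] == 0) & (
        out["wind_classification"] == "Calm / Light Breeze"
    )
    out["weather_severity"] = np.select([heavy, calm_and_dry], ["Severe", "Normal"], default="Moderate")
    return out

# Two rows taken from the sample output above (2019-10-13 and 2015-05-28).
sample = pd.DataFrame({
    "daily_total_precipitation_mm": [0.0, 57.7],
    "daily_maximum_temperature_c": [30.4, 18.2],
    "daily_minimum_temperature_c": [20.6, 15.0],
    "daily_average_wind_speed_mps": [3.6, 2.4],
})
labelled = classify_weather_sketch(sample)
```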


In [21]:
# Generate and transpose summary statistics for all numeric columns in the classified weather DataFrame
weather_severity_df.describe().T

Unnamed: 0,count,mean,min,25%,50%,75%,max,std
measurement_date,3617.0,2019-12-16 09:59:34.122200832,2015-01-01 00:00:00,2017-06-23 00:00:00,2019-12-14 00:00:00,2022-06-10 00:00:00,2025-01-05 00:00:00,
daily_total_precipitation_mm,3617.0,4.419795,0.0,0.0,0.0,2.0,141.7,11.644829
daily_maximum_temperature_c,3617.0,25.919021,8.6,21.7,26.4,30.1,40.3,5.674949
daily_minimum_temperature_c,3617.0,16.25817,2.1,13.2,16.7,20.0,26.5,4.681626
daily_average_wind_speed_mps,3617.0,1.873403,0.0,1.3,1.8,2.3,6.0,0.7797
daily_average_temperature_c,3617.0,21.088596,7.3,17.85,21.5,24.75,32.45,4.843685


In [22]:
# Define the start date of the time range
start_date = pd.to_datetime('2020-01-01')

# Define the end date
end_date = pd.to_datetime('2025-01-05')

# Create a daily date range from start_date to end_date
date_range = pd.date_range(start=start_date, end=end_date, freq='D')

# Specify the number of rows in the dataset (i.e., total number of records to generate)
n_rows = 6000

In [23]:
# Randomly sample dates from the date range, using the seeded generator so the draw is reproducible
random_dates = rng.choice(date_range, size=n_rows, replace=True)

# Create a DataFrame with the sampled dates
date_df = pd.DataFrame({
    'LPO': random_dates
})

In [24]:
# Classify each date in 'LPO' (latest purchase order) with day_classification and store the label in a new column
date_df['day_classification'] = date_df['LPO'].apply(create_data_functions.day_classification)
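
The outputs further down show four labels for `day_classification`: `Weekdays`, `Saturday`, `Sunday`, and `Holiday`. The function itself is not shown, so the following is a hypothetical sketch; the holiday set and the rule that holidays take precedence over weekends are assumptions.

```python
import pandas as pd

# Hypothetical holiday list; the project's function may use a full national calendar.
EXAMPLE_HOLIDAYS = {pd.Timestamp("2020-01-01"), pd.Timestamp("2020-12-25")}

def day_classification_sketch(date):
    """Return 'Holiday', 'Saturday', 'Sunday' or 'Weekdays' for a single date."""
    date = pd.Timestamp(date).normalize()
    if date in EXAMPLE_HOLIDAYS:
        return "Holiday"
    if date.dayofweek == 5:
        return "Saturday"
    if date.dayofweek == 6:
        return "Sunday"
    return "Weekdays"

labels = [
    day_classification_sketch(d)
    for d in ["2020-01-01", "2020-04-05", "2024-06-29", "2024-07-02"]
]
```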

In [25]:
# Inner-join the weather severity data onto the sampled dates ('LPO' matches 'measurement_date');
# sampled dates without a weather record are dropped. Then remove the redundant 'measurement_date' column
climate_date_df = pd.merge(date_df, weather_severity_df, left_on='LPO', right_on='measurement_date', how='inner').drop(columns='measurement_date')
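
Because the join is `inner`, any sampled date that has no row in `weather_severity_df` (for instance a day removed for missing measurements) silently disappears, so the merged frame can have fewer than `n_rows` rows. A left join with `indicator=True` is one way to count such dates before deciding how to handle them. The frames below are toy data.

```python
import pandas as pd

toy_dates = pd.DataFrame({"LPO": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])})
toy_severity = pd.DataFrame({
    "measurement_date": pd.to_datetime(["2020-01-01", "2020-01-03"]),
    "weather_severity": ["Moderate", "Severe"],
})

# indicator=True adds a '_merge' column that marks rows found only on the left side.
checked = toy_dates.merge(
    toy_severity, left_on="LPO", right_on="measurement_date", how="left", indicator=True
)
unmatched = int((checked["_merge"] == "left_only").sum())
```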


## Realistic Supply Chain Modeling Based on Weather and Product Data

In [26]:
# Create a copy of the climate data DataFrame to work with weather-specific analysis
df_weather = climate_date_df.copy()

# Create a copy of the supply data DataFrame to work with product-related operations
df_products = supply_df.copy()

In [27]:
# Display the first row of the weather DataFrame to preview its structure
df_weather.head(1)


Unnamed: 0,LPO,day_classification,daily_total_precipitation_mm,daily_maximum_temperature_c,daily_minimum_temperature_c,daily_average_wind_speed_mps,daily_average_temperature_c,temperature_classification,precipitation_classification,wind_classification,weather_severity
0,2024-07-02,Weekdays,0.0,24.2,5.2,0.0,14.7,Cool,No Precipitation,Calm / Light Breeze,Normal


In [28]:
# Preview the first row of the products DataFrame to check column names and initial data
df_products.head(1)

Unnamed: 0,product,product_id,category,sub_category,shelf_life_days,min_stock,max_stock,seasonality,storage_recommendation,unit_of_measurement,barcode_ean,reorder_point,supplier,supplier_id,distance_km,supplier_rating
0,Strawberries,1256111|P,Fresh Foods,Fruits,5,10,25,"[July, August, September, October, November]",Refrigerated,unit,8712345000018,10,FreshHarvest Ltd.,1689765|S,84,5


In [29]:
# Convert the 'LPO' column to datetime
df_weather['LPO'] = pd.to_datetime(df_weather['LPO'])

# Extract the month number from the date
df_weather['month'] = df_weather['LPO'].dt.month

# Extract the month name (e.g., January, February)
df_weather['month_name'] = df_weather['LPO'].dt.month_name()

# Extract the day-of-week name (e.g., Monday)
df_weather['day_of_week'] = df_weather['LPO'].dt.day_name()

# Extract the day of the month
df_weather['day_of_month'] = df_weather['LPO'].dt.day

# Boolean column: True when the day is a holiday
df_weather['is_holiday'] = df_weather['day_classification'] == 'Holiday'

# Boolean column: True when the day falls on a weekend (Saturday or Sunday)
df_weather['is_weekend'] = df_weather['LPO'].dt.dayofweek > 4


In [30]:
# Display the weather DataFrame (pandas shows a truncated view)
df_weather

Unnamed: 0,LPO,day_classification,daily_total_precipitation_mm,daily_maximum_temperature_c,daily_minimum_temperature_c,daily_average_wind_speed_mps,daily_average_temperature_c,temperature_classification,precipitation_classification,wind_classification,weather_severity,month,month_name,day_of_week,day_of_month,is_holiday,is_weekend
0,2024-07-02,Weekdays,0.0,24.2,5.2,0.0,14.70,Cool,No Precipitation,Calm / Light Breeze,Normal,7,July,Tuesday,2,False,False
1,2024-07-02,Weekdays,0.0,24.2,5.2,0.0,14.70,Cool,No Precipitation,Calm / Light Breeze,Normal,7,July,Tuesday,2,False,False
2,2024-07-02,Weekdays,0.0,24.2,5.2,0.0,14.70,Cool,No Precipitation,Calm / Light Breeze,Normal,7,July,Tuesday,2,False,False
3,2024-07-02,Weekdays,0.0,24.2,5.2,0.0,14.70,Cool,No Precipitation,Calm / Light Breeze,Normal,7,July,Tuesday,2,False,False
4,2024-07-02,Weekdays,0.0,24.2,5.2,0.0,14.70,Cool,No Precipitation,Calm / Light Breeze,Normal,7,July,Tuesday,2,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5871,2021-05-25,Weekdays,0.0,19.6,8.3,1.9,13.95,Cool,No Precipitation,Gentle to Fresh Breeze,Moderate,5,May,Tuesday,25,False,False
5872,2020-04-05,Sunday,0.0,30.1,14.5,1.5,22.30,Mild to Temperate,No Precipitation,Calm / Light Breeze,Normal,4,April,Sunday,5,False,True
5873,2024-06-29,Saturday,0.0,13.6,7.7,3.8,10.65,Cold,No Precipitation,Gentle to Fresh Breeze,Moderate,6,June,Saturday,29,False,True
5874,2023-02-06,Weekdays,0.1,33.3,21.8,1.3,27.55,Warm,Light Rain,Calm / Light Breeze,Moderate,2,February,Monday,6,False,False


In [31]:
# Display the products DataFrame (pandas shows a truncated view)
df_products

Unnamed: 0,product,product_id,category,sub_category,shelf_life_days,min_stock,max_stock,seasonality,storage_recommendation,unit_of_measurement,barcode_ean,reorder_point,supplier,supplier_id,distance_km,supplier_rating
0,Strawberries,1256111|P,Fresh Foods,Fruits,5,10,25,"[July, August, September, October, November]",Refrigerated,unit,8712345000018,10,FreshHarvest Ltd.,1689765|S,84,5
1,Strawberries,1256111|P,Fresh Foods,Fruits,5,10,25,"[July, August, September, October, November]",Refrigerated,unit,8712345000018,10,PrimeProduce,1699972|S,238,3
2,Strawberries,1256111|P,Fresh Foods,Fruits,5,10,25,"[July, August, September, October, November]",Refrigerated,unit,8712345000018,10,AgroPrime Foods,1533390|S,101,5
3,Spinach,1167149|P,Fresh Foods,Leafy Greens,5,10,25,[],Refrigerated,bunch,8712345000025,8,GreenFields Co.,1470887|S,127,1
4,Spinach,1167149|P,Fresh Foods,Leafy Greens,5,10,25,[],Refrigerated,bunch,8712345000025,8,UrbanFarmers,1059058|S,95,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
178,Coconut Sugar,1367507|P,Oils & Condiments,Condiments,9999,100,250,[],"Cool, dry place in an airtight container",kg,8712345001114,80,GlobalFoods,1762899|S,1450,3
179,Coconut Sugar,1367507|P,Oils & Condiments,Condiments,9999,100,250,[],"Cool, dry place in an airtight container",kg,8712345001114,80,North Brazil Distributor,1642678|S,1943,1
180,Oatmeal Biscuit,1855825|P,Breads & Biscuits,Biscuits,240,100,250,[],"Cool, dry place in an airtight container",box,8712345001121,100,Sunrise Traders,1006112|S,1890,2
181,Butter Biscuit,1533952|P,Breads & Biscuits,Biscuits,240,100,250,[],"Cool, dry place in an airtight container",box,8712345001138,100,Plain Distributor,1686189|S,1254,4


In [32]:
# Initialize the supply chain simulator with weather and product data,
# then run the simulation to generate a combined DataFrame with results.
simulator = combine_df_functions.SupplyChainSimulator(df_weather, df_products)
df_combined = simulator.run_simulation()


In [33]:
# Display DataFrame
df_combined

Unnamed: 0,date,product_id,product,category,sub_category,supplier_id,supplier,is_in_season,demand_factor,adjusted_demand,...,max_stock,reorder_point,distance_km,supplier_rating,temperature,precipitation,wind_speed,weather_severity,is_weekend,is_holiday
0,2024-07-02,1256111|P,Strawberries,Fresh Foods,Fruits,1689765|S,FreshHarvest Ltd.,True,1.297,8,...,25,10,84,5,14.70,0.0,0.0,Normal,False,False
1,2024-07-02,1256111|P,Strawberries,Fresh Foods,Fruits,1699972|S,PrimeProduce,True,1.297,21,...,25,10,238,3,14.70,0.0,0.0,Normal,False,False
2,2024-07-02,1256111|P,Strawberries,Fresh Foods,Fruits,1533390|S,AgroPrime Foods,True,1.297,18,...,25,10,101,5,14.70,0.0,0.0,Normal,False,False
3,2024-07-02,1167149|P,Spinach,Fresh Foods,Leafy Greens,1470887|S,GreenFields Co.,True,1.431,16,...,25,8,127,1,14.70,0.0,0.0,Normal,False,False
4,2024-07-02,1167149|P,Spinach,Fresh Foods,Leafy Greens,1059058|S,UrbanFarmers,True,1.431,16,...,25,8,95,3,14.70,0.0,0.0,Normal,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1075303,2023-11-18,1367507|P,Coconut Sugar,Oils & Condiments,Condiments,1762899|S,GlobalFoods,True,0.890,16,...,250,80,1450,3,21.45,51.1,2.4,Severe,True,False
1075304,2023-11-18,1367507|P,Coconut Sugar,Oils & Condiments,Condiments,1642678|S,North Brazil Distributor,True,0.890,7,...,250,80,1943,1,21.45,51.1,2.4,Severe,True,False
1075305,2023-11-18,1855825|P,Oatmeal Biscuit,Breads & Biscuits,Biscuits,1006112|S,Sunrise Traders,True,1.008,10,...,250,100,1890,2,21.45,51.1,2.4,Severe,True,False
1075306,2023-11-18,1533952|P,Butter Biscuit,Breads & Biscuits,Biscuits,1686189|S,Plain Distributor,True,0.538,3,...,250,100,1254,4,21.45,51.1,2.4,Severe,True,False
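
The internals of `SupplyChainSimulator` are not shown here. The output is consistent with pairing every sampled day with every product-supplier row and then flagging seasonality, so a rough sketch of that idea follows. The cross join, the column subset, and the rule that an empty `seasonality` list means "in season all year" are assumptions inferred from the output, not confirmed behavior.

```python
import pandas as pd

toy_days = pd.DataFrame({"date": pd.to_datetime(["2024-07-02", "2023-12-18"])})
toy_catalog_df = pd.DataFrame({
    "product": ["Strawberries", "Spinach"],
    "seasonality": [["July", "August", "September", "October", "November"], []],
})

# Cartesian product: every product on every day.
grid = toy_days.merge(toy_catalog_df, how="cross")

# Assumption: an empty seasonality list means the product is in season all year.
grid["is_in_season"] = [
    (not months) or (day.month_name() in months)
    for day, months in zip(grid["date"], grid["seasonality"])
]
```

A cross join grows as days times products, which explains why the combined frame is far larger than either input.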


In [34]:
# Create a delivery plan with up to 5 products per day.
# Allows 30% of products to be out-of-season.
deliveries, stats = simulator.create_balanced_delivery(max_products_per_day=5, out_of_season_percentage=0.3)

In [35]:
# Display DataFrame
stats

Unnamed: 0,date,products_delivered,in_season_delivered,out_of_season_delivered,is_holiday,is_weekend,total_in_season_available,total_out_of_season_available
0,2020-01-01,0,0,0,True,False,106,5
1,2020-01-02,4,3,1,False,False,106,5
2,2020-01-03,4,3,1,False,False,106,5
3,2020-01-05,2,1,1,False,True,106,5
4,2020-01-06,3,2,1,False,False,106,5
...,...,...,...,...,...,...,...,...
1715,2024-12-05,4,3,1,False,False,104,7
1716,2024-12-06,3,2,1,False,False,104,7
1717,2024-12-07,0,0,0,False,True,104,7
1718,2025-01-03,4,3,1,False,False,106,5


In [36]:
# Display DataFrame
deliveries

Unnamed: 0,product,date,product_id,category,sub_category,supplier_id,supplier,is_in_season,demand_factor,adjusted_demand,...,max_stock,reorder_point,distance_km,supplier_rating,temperature,precipitation,wind_speed,weather_severity,is_weekend,is_holiday
0,All-Purpose Flour,2020-01-02,1798524|P,Grains & Flours,Flours,1301922|S,FarmDirect,True,1.271,8,...,250,100,1098,5,24.70,0.7,4.0,Moderate,False,False
1,Almond Flour,2020-01-02,1640592|P,Grains & Flours,Flours,1529328|S,Supreme Supplier,True,1.594,22,...,100,50,1756,5,24.70,0.7,4.0,Moderate,False,False
2,Anchovies,2020-01-02,1463502|P,Meats & Fish,Fish,1262003|S,QualityMax Supplier,True,1.405,11,...,25,5,1173,3,24.70,0.7,4.0,Moderate,False,False
3,Asparagus,2020-01-02,1833804|P,Fresh Foods,Vegetables,1990733|S,AgroTop Supplies,False,0.831,7,...,25,7,150,4,24.70,0.7,4.0,Moderate,False,False
4,All-Purpose Flour,2020-01-03,1798524|P,Grains & Flours,Flours,1301922|S,FarmDirect,True,1.133,18,...,250,100,1098,5,23.75,0.8,3.8,Moderate,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4853,Asparagus,2024-12-06,1833804|P,Fresh Foods,Vegetables,1990733|S,AgroTop Supplies,False,0.887,4,...,25,7,150,4,27.40,0.0,1.4,Moderate,False,False
4854,All-Purpose Flour,2025-01-03,1798524|P,Grains & Flours,Flours,1301922|S,FarmDirect,True,1.523,29,...,250,100,1098,5,21.65,0.0,2.2,Moderate,False,False
4855,Almond Flour,2025-01-03,1640592|P,Grains & Flours,Flours,1529328|S,Supreme Supplier,True,1.178,12,...,100,50,1756,5,21.65,0.0,2.2,Moderate,False,False
4856,Anchovies,2025-01-03,1463502|P,Meats & Fish,Fish,1262003|S,QualityMax Supplier,True,1.201,22,...,25,5,1173,3,21.65,0.0,2.2,Moderate,False,False


In [37]:
# Save Data
deliveries.to_csv(raw_data_path + 'grocery_data.csv', index=False)
stats.to_csv(raw_data_path + 'grocery_stats.csv', index=False)

## Load the Data and Add Stock Quantity and Sales Volume

In [11]:
# Load grocery dataset into a pandas DataFrame
df = pd.read_csv(raw_data_path + 'grocery_data.csv')

In [12]:
# Show the first 5 rows
df.head()

Unnamed: 0,product,date,product_id,category,sub_category,supplier_id,supplier,is_in_season,demand_factor,adjusted_demand,...,distance_km,supplier_rating,temperature,precipitation,wind_speed,weather_severity,is_weekend,is_holiday,stock_quantity,sales_volume
0,All-Purpose Flour,2020-01-02,1798524|P,Grains & Flours,Flours,1301922|S,FarmDirect,True,1.271,8,...,1098,5,24.7,0.7,4.0,Moderate,False,False,77,2
1,Almond Flour,2020-01-02,1640592|P,Grains & Flours,Flours,1529328|S,Supreme Supplier,True,1.594,22,...,1756,5,24.7,0.7,4.0,Moderate,False,False,54,24
2,Anchovies,2020-01-02,1463502|P,Meats & Fish,Fish,1262003|S,QualityMax Supplier,True,1.405,11,...,1173,3,24.7,0.7,4.0,Moderate,False,False,34,13
3,Asparagus,2020-01-02,1833804|P,Fresh Foods,Vegetables,1990733|S,AgroTop Supplies,False,0.831,7,...,150,4,24.7,0.7,4.0,Moderate,False,False,26,8
4,All-Purpose Flour,2020-01-03,1798524|P,Grains & Flours,Flours,1301922|S,FarmDirect,True,1.133,18,...,1098,5,23.75,0.8,3.8,Moderate,False,False,186,19


In [13]:
# Apply the function to generate the new 'stock_quantity' column
df['stock_quantity'] = df.apply(create_data_functions.simulate_stock_quantity, axis=1)

# Display the first few rows with relevant columns
print(df[['product', 'min_stock', 'max_stock', 'adjusted_demand', 'stock_quantity']].head())


             product  min_stock  max_stock  adjusted_demand  stock_quantity
0  All-Purpose Flour        100        250                8             189
1       Almond Flour         50        100               22              55
2          Anchovies         10         25               11               6
3          Asparagus         10         25                7              30
4  All-Purpose Flour        100        250               18             342
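
`simulate_stock_quantity` is defined in the project's utilities and is not reproduced in this notebook. The printed values suggest that stock usually lies between `min_stock` and `max_stock` but can occasionally fall outside that band. A hypothetical row-wise function with that behavior could look like the following; the outlier rate and the 1.5 multiplier are invented for illustration.

```python
import numpy as np
import pandas as pd

stock_rng = np.random.default_rng(7)

def simulate_stock_quantity_sketch(row, outlier_rate=0.1):
    """Hypothetical rule: usually within [min_stock, max_stock], occasionally outside it."""
    low, high = int(row["min_stock"]), int(row["max_stock"])
    if stock_rng.random() < outlier_rate:
        # Either an understock (below min_stock) or an overstock (up to 1.5 x max_stock).
        if stock_rng.random() < 0.5:
            return int(stock_rng.integers(0, low))
        return int(stock_rng.integers(high + 1, int(high * 1.5) + 1))
    return int(stock_rng.integers(low, high, endpoint=True))

toy_stock = pd.DataFrame({"min_stock": [100, 50, 10], "max_stock": [250, 100, 25]})
toy_stock["stock_quantity"] = toy_stock.apply(simulate_stock_quantity_sketch, axis=1)
```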


In [14]:
# Apply the function to generate the new 'sales_volume' column
df['sales_volume'] = df.apply(create_data_functions.simulate_sales_volume, axis=1)

# Display the first few rows with relevant columns
print(df[['product', 'min_stock', 'max_stock', 'adjusted_demand', 'stock_quantity', 'sales_volume']].head())

             product  min_stock  max_stock  adjusted_demand  stock_quantity  \
0  All-Purpose Flour        100        250                8             189   
1       Almond Flour         50        100               22              55   
2          Anchovies         10         25               11               6   
3          Asparagus         10         25                7              30   
4  All-Purpose Flour        100        250               18             342   

   sales_volume  
0            11  
1            23  
2             6  
3             5  
4            14  
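
As with the stock column, the body of `simulate_sales_volume` is not shown. In the printed rows, sales stay close to `adjusted_demand` and never exceed `stock_quantity`, so one plausible sketch is noisy demand capped by the stock on hand. The noise level and the uniform distribution are assumptions.

```python
import numpy as np
import pandas as pd

sales_rng = np.random.default_rng(11)

def simulate_sales_volume_sketch(row, noise=0.4):
    """Hypothetical rule: noisy demand, never more than the stock on hand."""
    noisy_demand = row["adjusted_demand"] * sales_rng.uniform(1 - noise, 1 + noise)
    return int(min(row["stock_quantity"], max(0, round(noisy_demand))))

toy_sales = pd.DataFrame({"adjusted_demand": [8, 22, 11], "stock_quantity": [189, 55, 6]})
toy_sales["sales_volume"] = toy_sales.apply(simulate_sales_volume_sketch, axis=1)
```

Capping at the stock level keeps the synthetic data free of impossible records in which more units are sold than were available.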


In [15]:
# Save the updated DataFrame to CSV, excluding the index column
df.to_csv(raw_data_path + 'grocery_data.csv', index=False)

In [16]:
# Load and preview the saved data
df_test = pd.read_csv(raw_data_path + 'grocery_data.csv')
df_test.head()

Unnamed: 0,product,date,product_id,category,sub_category,supplier_id,supplier,is_in_season,demand_factor,adjusted_demand,...,distance_km,supplier_rating,temperature,precipitation,wind_speed,weather_severity,is_weekend,is_holiday,stock_quantity,sales_volume
0,All-Purpose Flour,2020-01-02,1798524|P,Grains & Flours,Flours,1301922|S,FarmDirect,True,1.271,8,...,1098,5,24.7,0.7,4.0,Moderate,False,False,189,11
1,Almond Flour,2020-01-02,1640592|P,Grains & Flours,Flours,1529328|S,Supreme Supplier,True,1.594,22,...,1756,5,24.7,0.7,4.0,Moderate,False,False,55,23
2,Anchovies,2020-01-02,1463502|P,Meats & Fish,Fish,1262003|S,QualityMax Supplier,True,1.405,11,...,1173,3,24.7,0.7,4.0,Moderate,False,False,6,6
3,Asparagus,2020-01-02,1833804|P,Fresh Foods,Vegetables,1990733|S,AgroTop Supplies,False,0.831,7,...,150,4,24.7,0.7,4.0,Moderate,False,False,30,5
4,All-Purpose Flour,2020-01-03,1798524|P,Grains & Flours,Flours,1301922|S,FarmDirect,True,1.133,18,...,1098,5,23.75,0.8,3.8,Moderate,False,False,342,14
