`Notebook Description:` This notebook contains different methods for generating inventory datasets:
- 1. Random Number Generation with Uniform Distribution
- 2. Random Number Generation with Normal Distribution
- 3. Generation Incorporating Seasonality and Trends

## Setup 

In [1]:
import os 
import pandas as pd
import numpy as np 
import random

In [None]:
# Defining directory paths
data_dir = '../data/generated/inventory/'

In [16]:
# Set parameters
num_products = 1000
start_date = '2024-08-01'
end_date = '2024-08-31'

min_batches_per_product = 5

date_range = pd.date_range(start=start_date, end=end_date)

start_expiry = '2024-08-01'
end_expiry = '2024-08-31'

INVENTORY_COLUMNS = ["PRODUCT_ID", "BATCH_ID", "EXPIRY_DATE", "INVENTORY"]

In [4]:
# Generate sequential IDs for the products
products_ids = np.arange(1, num_products + 1)

In [5]:
date_range

DatetimeIndex(['2024-08-01', '2024-08-02', '2024-08-03', '2024-08-04',
               '2024-08-05', '2024-08-06', '2024-08-07', '2024-08-08',
               '2024-08-09', '2024-08-10', '2024-08-11', '2024-08-12',
               '2024-08-13', '2024-08-14', '2024-08-15', '2024-08-16',
               '2024-08-17', '2024-08-18', '2024-08-19', '2024-08-20',
               '2024-08-21', '2024-08-22', '2024-08-23', '2024-08-24',
               '2024-08-25', '2024-08-26', '2024-08-27', '2024-08-28',
               '2024-08-29', '2024-08-30', '2024-08-31'],
              dtype='datetime64[ns]', freq='D')

## Generate a toy inventory dataset 
The dataset covers the same 1000 products, with at least 5 batches per product and a different expiry date for each batch.

### 1. Random Number Generation with Uniform Distribution
- Assign inventory levels using a uniform distribution, where each batch has an equal chance of receiving any integer quantity within a defined range.
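
A minimal, reproducible sketch of such a uniform draw, assuming NumPy's `Generator` API instead of the legacy `np.random.randint` call used in the cells of this notebook. The seed value and the names `rng` and `sample_inventories` are illustrative only.

```python
import numpy as np

# Illustrative bounds for the inventory range
min_inventory = 10
max_inventory = 100

# A seeded generator makes the draw repeatable; the seed 42 is arbitrary
rng = np.random.default_rng(42)

# endpoint=True makes the upper bound inclusive, so max_inventory itself can be drawn
sample_inventories = rng.integers(min_inventory, max_inventory, size=7, endpoint=True)
print(sample_inventories)
```

Seeding is optional, but it makes the generated dataset identical from run to run, which helps when comparing the three methods.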

In [None]:
# Set the minimum and maximum of the range from which inventory levels are drawn
min_inventory = 10
max_inventory = 100

In [14]:
random_inventory_generated_data = []
for product_id in products_ids:
    num_batches = np.random.randint(min_batches_per_product, min_batches_per_product + 3)  
    batch_ids = range(1, num_batches + 1)
    # Generate random expiry dates within the date range
    expiry_dates = pd.to_datetime(np.random.choice(pd.date_range(start_expiry, end_expiry), size=num_batches, replace=False))
    # Sort expiry dates to simulate realistic batch expiries
    expiry_dates = expiry_dates.sort_values()
    # Generate random inventory levels
    inventories = np.random.randint(min_inventory, max_inventory + 1, size=num_batches)
    
    for batch_id, expiry_date, inventory in zip(batch_ids, expiry_dates, inventories):
        random_inventory_generated_data.append([
            int(product_id),
            int(batch_id),
            expiry_date.strftime('%d/%m/%Y'),
            int(inventory)
        ])
random_inventory_generated_data[:5]

[[1, 1, '07/08/2024', 89],
 [1, 2, '12/08/2024', 33],
 [1, 3, '17/08/2024', 56],
 [1, 4, '28/08/2024', 58],
 [1, 5, '31/08/2024', 80]]

In [17]:
# Convert to DataFrame
random_inventory_df = pd.DataFrame(random_inventory_generated_data, columns=INVENTORY_COLUMNS)

In [18]:
random_inventory_df.head()

Unnamed: 0,PRODUCT_ID,BATCH_ID,EXPIRY_DATE,INVENTORY
0,1,1,07/08/2024,89
1,1,2,12/08/2024,33
2,1,3,17/08/2024,56
3,1,4,28/08/2024,58
4,1,5,31/08/2024,80


In [20]:
random_inventory_df.EXPIRY_DATE.min(), random_inventory_df.EXPIRY_DATE.max()

('01/08/2024', '31/08/2024')
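
Note that `EXPIRY_DATE` holds `dd/mm/YYYY` strings, so the minimum and maximum above are string comparisons. They match the calendar order here only because every date falls in the same month and year. The sketch below, with hypothetical dates spanning two months, shows how the two orderings can diverge and how parsing with an explicit format avoids the problem.

```python
import pandas as pd

# Hypothetical expiry dates spanning two months, stored as dd/mm/YYYY strings
expiry_strings = pd.Series(['15/08/2024', '02/09/2024', '31/08/2024'])

# String comparison is lexicographic, so '02/09/2024' sorts first because it starts with '02'
string_min = expiry_strings.min()

# Parsing with an explicit format compares the actual calendar dates
parsed = pd.to_datetime(expiry_strings, format='%d/%m/%Y')
date_min = parsed.min()

print(string_min)                      # 02/09/2024
print(date_min.strftime('%d/%m/%Y'))   # 15/08/2024
```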

In [21]:
random_inventory_df.INVENTORY.min()

np.int64(10)

In [19]:
random_inventory_df.describe()

Unnamed: 0,PRODUCT_ID,BATCH_ID,INVENTORY
count,6033.0,6033.0,6033.0
mean,501.759158,3.572684,55.118681
std,288.299403,1.81224,26.329493
min,1.0,1.0,10.0
25%,253.0,2.0,32.0
50%,502.0,4.0,55.0
75%,752.0,5.0,78.0
max,1000.0,7.0,100.0


**Constraints Verification:**
- At least 5 batches per product: Ensured by using `min_batches_per_product = 5` as the lower bound of the batch count.
- Expiry dates between August 1st and August 31st, 2024: Ensured by sampling dates from the range `start_expiry` to `end_expiry`.
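
These constraints can also be checked programmatically instead of by inspection. The following is a sketch under assumptions: `check_constraints` and `toy_df` are hypothetical names, and the toy frame only mimics the column layout of the generated data.

```python
import pandas as pd

# Tiny hypothetical frame with the same columns as the generated inventory data
toy_df = pd.DataFrame(
    {
        "PRODUCT_ID": [1] * 5 + [2] * 5,
        "BATCH_ID": list(range(1, 6)) * 2,
        "EXPIRY_DATE": ["01/08/2024", "05/08/2024", "09/08/2024",
                        "20/08/2024", "31/08/2024"] * 2,
        "INVENTORY": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    }
)

def check_constraints(df, min_batches=5, start="2024-08-01", end="2024-08-31"):
    """Return True when every product has enough batches and all expiry dates are in range."""
    batches_per_product = df.groupby("PRODUCT_ID")["BATCH_ID"].nunique()
    expiry = pd.to_datetime(df["EXPIRY_DATE"], format="%d/%m/%Y")
    enough_batches = (batches_per_product >= min_batches).all()
    # between() is inclusive on both ends by default
    in_range = expiry.between(pd.Timestamp(start), pd.Timestamp(end)).all()
    return bool(enough_batches and in_range)

print(check_constraints(toy_df))
```

The same function could be applied to each of the DataFrames produced in this notebook, since they share the same columns.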

### 2. Random Number Generation with Normal Distribution
- Use a normal (Gaussian) distribution to generate inventory levels, allowing for more realistic variation around a mean value.
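
Because a normal distribution is unbounded, raw draws can be fractional or even negative, so they need rounding and a lower bound before they can serve as inventory counts. The sketch below is a vectorised illustration under assumptions: it uses NumPy's `Generator` API with an arbitrary seed, and the name `share_clipped` is hypothetical.

```python
import numpy as np

mean_inventory = 50
std_dev_inventory = 15

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Draw continuous values, round to whole units, then floor at 1
raw = rng.normal(loc=mean_inventory, scale=std_dev_inventory, size=10_000)
inventories = np.clip(np.rint(raw), 1, None).astype(int)

# Share of draws that the floor actually changed; expected to be tiny here,
# since 1 lies more than three standard deviations below the mean
share_clipped = np.mean(np.rint(raw) < 1)
print(inventories.min(), round(inventories.mean(), 1), share_clipped)
```

With a lower mean or a larger standard deviation the floor would alter more values and pull the sample mean upward, which is worth keeping in mind when changing the parameters.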

In [22]:
mean_inventory = 50
std_dev_inventory = 15

In [25]:
random_ND_inventory_data = []

for product_id in products_ids:
    num_batches = np.random.randint(min_batches_per_product, min_batches_per_product + 3)
    batch_ids = range(1, num_batches + 1)
    expiry_dates = pd.to_datetime(np.random.choice(pd.date_range(start_expiry, end_expiry), size=num_batches, replace=False))
    expiry_dates = expiry_dates.sort_values()
    # Generate inventory levels using normal distribution
    inventories = np.random.normal(loc=mean_inventory, scale=std_dev_inventory, size=num_batches)
    # Round to whole units and ensure each inventory is at least 1
    inventories = [max(1, int(round(inv))) for inv in inventories]
    
    for batch_id, expiry_date, inventory in zip(batch_ids, expiry_dates, inventories):
        random_ND_inventory_data.append([
            int(product_id),
            int(batch_id),
            expiry_date.strftime('%d/%m/%Y'),
            int(inventory)
        ])
    
random_ND_inventory_data[:5]

[[1, 1, '09/08/2024', 32],
 [1, 2, '10/08/2024', 59],
 [1, 3, '11/08/2024', 34],
 [1, 4, '20/08/2024', 39],
 [1, 5, '21/08/2024', 37]]

In [27]:
# generate dataframe from it 
random_ND_inventory_df = pd.DataFrame(random_ND_inventory_data, columns=INVENTORY_COLUMNS)
random_ND_inventory_df.head()

Unnamed: 0,PRODUCT_ID,BATCH_ID,EXPIRY_DATE,INVENTORY
0,1,1,09/08/2024,32
1,1,2,10/08/2024,59
2,1,3,11/08/2024,34
3,1,4,20/08/2024,39
4,1,5,21/08/2024,37


In [28]:
random_ND_inventory_df.describe()

Unnamed: 0,PRODUCT_ID,BATCH_ID,INVENTORY
count,6040.0,6040.0,6040.0
mean,500.048013,3.575662,50.057119
std,288.653646,1.813339,14.843642
min,1.0,1.0,3.0
25%,250.0,2.0,40.0
50%,501.5,4.0,50.0
75%,750.0,5.0,60.0
max,1000.0,7.0,103.0


**Constraints Verification:**

- At least 5 batches per product: As before, ensured via `min_batches_per_product`.
- Expiry dates within range: Dates are selected from the specified range.

### 3. Generation Incorporating Seasonality and Trends
- Simulate patterns such as weekdays vs. weekends, holidays, or promotional periods.

##### Steps:

1. **Identify Seasonal Factors:**
   - Weekends: Inventory levels may be higher or lower due to anticipated changes in demand.
   - Specific Dates: Certain days (e.g., holidays) might influence inventory requirements.

2. **Define Baseline Inventory:**
   - Establish a base inventory level for each product, representing the typical batch size.

3. **Apply Seasonal Adjustments:**
   - Modify the baseline inventory using factors that represent seasonality (e.g., increase inventory by 20% for batches expiring on weekends).

4. **Add Random Variation:**
   - Introduce randomness to the adjusted inventory levels to mimic real-world unpredictability.
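
The four steps above can be sketched as a single helper. This is an illustration under assumptions, not the notebook's implementation: `seasonal_inventory` and its parameters `weekend_weekdays`, `uplift` and `noise_share` are hypothetical names, and the weekend is taken to be Friday and Saturday.

```python
import numpy as np
import pandas as pd

def seasonal_inventory(baseline, expiry_date, rng,
                       weekend_weekdays=(4, 5), uplift=1.2, noise_share=0.1):
    """Return an inventory level for one batch, adjusted for weekends plus noise."""
    # Step 2: start from the product's baseline
    level = float(baseline)
    # Steps 1 and 3: apply the uplift when the expiry date falls on a weekend day
    # (pandas weekday numbering: Monday = 0, ..., Friday = 4, Saturday = 5)
    if pd.Timestamp(expiry_date).weekday() in weekend_weekdays:
        level *= uplift
    # Step 4: add zero-mean Gaussian noise proportional to the baseline
    level += rng.normal(0, baseline * noise_share)
    # Inventory must be a whole number of at least 1
    return max(1, int(round(level)))

rng = np.random.default_rng(7)  # arbitrary seed
print(seasonal_inventory(100, "2024-08-02", rng))  # expiry on a Friday
print(seasonal_inventory(100, "2024-08-05", rng))  # expiry on a Monday
```

Setting `noise_share=0` isolates the seasonal effect, which is a convenient way to confirm that the uplift is applied to the intended days.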


In [29]:
dates = pd.date_range(start=start_expiry, end=end_expiry)
# Egyptian weekend days: Friday (weekday 4) and Saturday (weekday 5)
weekend_days = dates[dates.weekday.isin([4, 5])].strftime('%d/%m/%Y').tolist()

In [30]:
weekend_days

['02/08/2024',
 '03/08/2024',
 '09/08/2024',
 '10/08/2024',
 '16/08/2024',
 '17/08/2024',
 '23/08/2024',
 '24/08/2024',
 '30/08/2024',
 '31/08/2024']

In [33]:
inventory_seasonality_data = []
for product_id in range(1, num_products + 1):
    baseline_inventory = np.random.randint(50, 150)
    num_batches = np.random.randint(min_batches_per_product, min_batches_per_product + 3)
    
    # Generate expiry dates and sort them
    expiry_dates = np.random.choice(dates, size=num_batches, replace=False)
    expiry_dates.sort()
    
    # Generate batches
    for i, expiry_date in enumerate(expiry_dates):
        # Convert numpy.datetime64 to Python datetime object
        expiry_date = pd.to_datetime(expiry_date)
        expiry_date_str = expiry_date.strftime('%d/%m/%Y')
        
        # Step 3: Apply seasonal adjustments
        inventory_level = baseline_inventory
        
        # If the expiry date falls on the weekend (Friday or Saturday), increase inventory by 20%
        if expiry_date.weekday() in (4, 5):  # 4 = Friday, 5 = Saturday
            inventory_level *= 1.2
        
        # Step 4: Add random variation
        inventory_level += np.random.normal(0, baseline_inventory * 0.1)
        
        # Ensure inventory is at least 1 and round
        inventory_level = max(1, int(round(inventory_level)))
        
        # Assign batch ID
        batch_id = i + 1
        
        # Compile data
        inventory_seasonality_data.append([
            int(product_id),
            int(batch_id),
            expiry_date_str,
            int(inventory_level)
        ])

inventory_seasonality_data[:5]


[[1, 1, '01/08/2024', 158],
 [1, 2, '03/08/2024', 145],
 [1, 3, '06/08/2024', 121],
 [1, 4, '21/08/2024', 115],
 [1, 5, '22/08/2024', 114]]

In [35]:
# Create DataFrame
inventory_seasonality_df = pd.DataFrame(inventory_seasonality_data, columns=INVENTORY_COLUMNS)
inventory_seasonality_df.head()

Unnamed: 0,PRODUCT_ID,BATCH_ID,EXPIRY_DATE,INVENTORY
0,1,1,01/08/2024,158
1,1,2,03/08/2024,145
2,1,3,06/08/2024,121
3,1,4,21/08/2024,115
4,1,5,22/08/2024,114


**Constraints Verification:**

- At least 5 batches per product: As before, ensured via `min_batches_per_product`.
- Expiry dates within range: Dates are selected from the specified range.

### 4. Clustering Products into Categories with Different Sales Patterns
`(Not Implemented)`  **Future Enhancement**

- Group products into categories (e.g., high-demand, medium-demand, low-demand) and assign different sales behaviors.
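
One possible shape for this enhancement is sketched below. Everything in it is an assumption made for illustration: the category names, their (mean, standard deviation) parameters, the assignment probabilities and the `CATEGORY` column are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical categories: (mean inventory, standard deviation) per category
category_params = {
    "high": (120, 25),
    "medium": (60, 15),
    "low": (20, 5),
}
category_names = list(category_params)

product_ids = np.arange(1, 11)
# Assign each product a category; the probabilities are illustrative
product_categories = rng.choice(category_names, size=len(product_ids), p=[0.2, 0.5, 0.3])

rows = []
for product_id, category in zip(product_ids, product_categories):
    mean, std = category_params[category]
    # Draw from the category's distribution, keeping at least 1 unit
    inventory = max(1, int(round(rng.normal(mean, std))))
    rows.append([int(product_id), str(category), inventory])

categorised_df = pd.DataFrame(rows, columns=["PRODUCT_ID", "CATEGORY", "INVENTORY"])
print(categorised_df.head())
```

In a fuller version, the category could also drive the number of batches or the size of the seasonal uplift instead of only the inventory level.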

### Testing that the dataframes from the different methods have the same columns

In [37]:
# Ensure all three dataframes have identical columns
assert list(random_inventory_df.columns) == list(random_ND_inventory_df.columns) == list(inventory_seasonality_df.columns)

### Saving the results as CSV files

In [41]:
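# Create the output directory if it does not exist yet
os.makedirs(data_dir, exist_ok=True)
random_inventory_df.to_csv(os.path.join(data_dir, 'method_1_uniform_random_inventory.csv'), index=False)
random_ND_inventory_df.to_csv(os.path.join(data_dir, 'method_2_random_ND_inventory.csv'), index=False)
inventory_seasonality_df.to_csv(os.path.join(data_dir, 'method_3_seasonality_inventory.csv'), index=False)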
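
When the saved files are read back, `EXPIRY_DATE` arrives as plain `dd/mm/YYYY` text. The sketch below shows one way to restore proper dates, using an in-memory stand-in for a saved file so that it runs on its own. The rows in `csv_text` are illustrative.

```python
import io
import pandas as pd

# Stand-in for one of the saved files; the rows are illustrative
csv_text = (
    "PRODUCT_ID,BATCH_ID,EXPIRY_DATE,INVENTORY\n"
    "1,1,07/08/2024,89\n"
    "1,2,12/08/2024,33\n"
)

loaded_df = pd.read_csv(io.StringIO(csv_text))
# Parse explicitly as day/month/year so 07/08/2024 is read as 7 August, not 8 July
loaded_df["EXPIRY_DATE"] = pd.to_datetime(loaded_df["EXPIRY_DATE"], format="%d/%m/%Y")

print(loaded_df.dtypes)
print(loaded_df["EXPIRY_DATE"].dt.month.unique())
```

Passing the format explicitly avoids relying on pandas to infer whether the day or the month comes first.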