# SET UP

This notebook sets up the project environment and directory structure for the **Retail Stockout Risk Scoring** project. 

It defines the project environment and paths, imports the required libraries, and loads the initial dataset.

It also creates the validation and working datasets, plus a smaller sample of the working data, that will be used in later stages. The goal is to establish a clear and organized foundation before beginning the data quality, transformation, and modeling steps.

## PROJECT ENVIRONMENT

We will activate the environment we use for risk scoring projects (called 'riesgos').

This environment covers everything we will need during this project.

For reference, these are the steps to create the 'riesgos' environment:

**1. Create the environment (terminal):**

conda create --name riesgos -c conda-forge python=3.11.11 numpy pandas matplotlib seaborn scikit-learn=1.3.1 scipy=1.9.3 sqlalchemy xgboost=2.0.3

**2. Activate the environment:**

conda activate riesgos

**3. Install libraries from other channels:**

conda install -c conda-forge pyjanitor scikit-plot yellowbrick imbalanced-learn cloudpickle 

conda install -c districtdatalabs yellowbrick

conda install -c conda-forge streamlit

pip install notebook==5.7.8 jupyter_client jupyter_contrib_nbextensions category_encoders==2.6.2

pip install streamlit-echarts

pip install pipreqs

**4. Save the environment as a .yml file:**

conda env export > riesgos.yml
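
Once the environment is active, it can be useful to confirm from inside the notebook that the pinned libraries match. The following is a minimal sketch, not part of the original setup: the `PINNED` dictionary, `installed_version`, and `check_pins` are hypothetical names, and the pins are copied from the install commands above. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pins copied from the `conda create` and `pip install` commands above.
PINNED = {
    "scikit-learn": "1.3.1",
    "scipy": "1.9.3",
    "xgboost": "2.0.3",
    "category_encoders": "2.6.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each mismatching package to an (expected, installed) tuple."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


# Report any package whose installed version differs from the pin.
for package, (expected, found) in check_pins(PINNED).items():
    print(f"{package}: expected {expected}, found {found}")
```

An empty report suggests the active environment matches the pins; a `None` in the "found" position means the package is not installed in the current kernel.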

## IMPORT LIBRARIES

In [19]:
import os
import numpy as np
import pandas as pd

# Autocomplete
%config IPCompleter.greedy=True

## DIRECTORY

In [20]:
root = '/Users/rober/'

### Project name

In [21]:
dir_name = 'retail-stockout-risk-scoring'

### Create the directory and project structure

In [22]:
path = root + dir_name

In [23]:
try:
    os.mkdir(path)
    os.mkdir(path + '/01_Documents')
    os.mkdir(path + '/02_Data')
    os.mkdir(path + '/02_Data/01_Raw')
    os.mkdir(path + '/02_Data/02_Validation')
    os.mkdir(path + '/02_Data/03_Working')
    os.mkdir(path + '/02_Data/04_Caches')
    os.mkdir(path + '/03_Notebooks')
    os.mkdir(path + '/03_Notebooks/01_Functions')
    os.mkdir(path + '/03_Notebooks/02_Development')
    os.mkdir(path + '/03_Notebooks/03_System')
    os.mkdir(path + '/04_Models')
    os.mkdir(path + '/05_Outputs')
    os.mkdir(path + '/09_Other')
    
except OSError:
    # Raised by the first os.mkdir that fails, e.g. when the folder already exists
    print("The directory %s has NOT been created" % path)
else:
    print("The directory %s has been successfully created" % path)

The directory /Users/rober/retail-stockout-risk-scoring has NOT been created
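
The message above appears whenever any `os.mkdir` call fails, which includes the case where the project folder already exists from a previous run. In that case the remaining subfolders are not attempted. As a possible alternative, here is a minimal sketch that can be rerun safely. The function name `create_project_structure` and the `SUBDIRS` list are hypothetical; the folder names are copied from the cell above.

```python
from pathlib import Path

# Subfolders copied from the cell above; parent folders are created implicitly.
SUBDIRS = [
    "01_Documents",
    "02_Data/01_Raw",
    "02_Data/02_Validation",
    "02_Data/03_Working",
    "02_Data/04_Caches",
    "03_Notebooks/01_Functions",
    "03_Notebooks/02_Development",
    "03_Notebooks/03_System",
    "04_Models",
    "05_Outputs",
    "09_Other",
]


def create_project_structure(project_path, subdirs=SUBDIRS):
    """Create every missing folder and return the list of folders that were new."""
    project_path = Path(project_path)
    created = []
    for sub in subdirs:
        folder = project_path / sub
        if not folder.exists():
            # parents=True builds missing parents; exist_ok=True tolerates reruns.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling `create_project_structure(path)` a second time would return an empty list instead of stopping at the first folder that already exists.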


## IMPORT DATA

In [24]:
file_name_data = 'retail_store_inventory.csv'

In [25]:
path_data_raw = path + '/02_Data/01_Raw/' + file_name_data 

data = pd.read_csv(path_data_raw)
data

| Unnamed: 0 | Date | Store ID | Product ID | Category | Region | Inventory Level | Units Sold | Units Ordered | Demand Forecast | Price | Discount | Weather Condition | Holiday/Promotion | Competitor Pricing | Seasonality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-01-01 | S001 | P0001 | Groceries | North | 231 | 127 | 55 | 135.47 | 33.50 | 20 | Rainy | 0 | 29.69 | Autumn |
| 1 | 2022-01-01 | S001 | P0002 | Toys | South | 204 | 150 | 66 | 144.04 | 63.01 | 20 | Sunny | 0 | 66.16 | Autumn |
| 2 | 2022-01-01 | S001 | P0003 | Toys | West | 102 | 65 | 51 | 74.02 | 27.99 | 10 | Sunny | 1 | 31.32 | Summer |
| 3 | 2022-01-01 | S001 | P0004 | Toys | North | 469 | 61 | 164 | 62.18 | 32.72 | 10 | Cloudy | 1 | 34.74 | Autumn |
| 4 | 2022-01-01 | S001 | P0005 | Electronics | East | 166 | 14 | 135 | 9.26 | 73.64 | 0 | Sunny | 0 | 68.95 | Summer |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73095 | 2024-01-01 | S005 | P0016 | Furniture | East | 96 | 8 | 127 | 18.46 | 73.73 | 20 | Snowy | 0 | 72.45 | Winter |
| 73096 | 2024-01-01 | S005 | P0017 | Toys | North | 313 | 51 | 101 | 48.43 | 82.57 | 10 | Cloudy | 0 | 83.78 | Autumn |
| 73097 | 2024-01-01 | S005 | P0018 | Clothing | West | 278 | 36 | 151 | 39.65 | 11.11 | 10 | Rainy | 0 | 10.91 | Winter |
| 73098 | 2024-01-01 | S005 | P0019 | Toys | East | 374 | 264 | 21 | 270.52 | 53.14 | 20 | Rainy | 0 | 55.80 | Spring |


The dataset contains daily observations for each store–product pair from 2022 to 2024.
Each row represents one product in one store on one day.

Main variables:

📆 **Date**

🏪 **Store ID / Product ID** - Define the time series grouping: one sequence per store–product.

🏷️ **Category / Region** - Product category / Store region.

📦 **Inventory Level** - Stock available at the beginning of the day. **--> Core variable for detecting stockouts and modeling inventory behavior**.

🛒 **Units Sold** - Actual demand (sales) for that day.

📥 **Units Ordered** - Replenishment orders placed on that day.

📈 **Demand Forecast** - Provided demand prediction for the day.

💰 **Price, Discount, Competitor Pricing**

🌦️ **Weather Condition, Holiday/Promotion, Seasonality** - External factors influencing demand patterns.
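
The description above implies that the combination of date, store, and product identifies a row. A minimal sketch of how that could be checked is shown below. It uses a few rows copied from the displayed output as a stand-in for the full file, and the helper name `duplicated_keys` is hypothetical. It also parses `Date` as a datetime on load, which the original `pd.read_csv` call does not do.

```python
import io

import pandas as pd

# First rows of the displayed output, used here as a stand-in for the full file.
csv_text = """Date,Store ID,Product ID,Inventory Level,Units Sold
2022-01-01,S001,P0001,231,127
2022-01-01,S001,P0002,204,150
2022-01-01,S001,P0003,102,65
"""

# parse_dates turns the Date column into datetime64 instead of plain strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])


def duplicated_keys(frame, keys=("Date", "Store ID", "Product ID")):
    """Return all rows whose key combination appears more than once."""
    return frame[frame.duplicated(subset=list(keys), keep=False)]


dupes = duplicated_keys(df)
print(f"{len(dupes)} rows share a date/store/product key")
```

On the real dataset, a non-empty result would mean the one-row-per-store-product-day assumption does not hold and should be resolved before building time series features.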

### Rename columns

In [26]:
data = data.rename(columns={
    "Date": "date",
    "Store ID": "store_id",
    "Product ID": "product_id",
    "Category": "category",
    "Region": "region",
    "Inventory Level": "inventory_level",
    "Units Sold": "units_sold",
    "Units Ordered": "units_ordered",
    "Demand Forecast": "demand_forecast",
    "Price": "price",
    "Discount": "discount",
    "Weather Condition": "weather",
    "Holiday/Promotion": "holiday_promo",
    "Competitor Pricing": "competitor_pricing",
    "Seasonality": "seasonality",
})

data

| Unnamed: 0 | date | store_id | product_id | category | region | inventory_level | units_sold | units_ordered | demand_forecast | price | discount | weather | holiday_promo | competitor_pricing | seasonality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-01-01 | S001 | P0001 | Groceries | North | 231 | 127 | 55 | 135.47 | 33.50 | 20 | Rainy | 0 | 29.69 | Autumn |
| 1 | 2022-01-01 | S001 | P0002 | Toys | South | 204 | 150 | 66 | 144.04 | 63.01 | 20 | Sunny | 0 | 66.16 | Autumn |
| 2 | 2022-01-01 | S001 | P0003 | Toys | West | 102 | 65 | 51 | 74.02 | 27.99 | 10 | Sunny | 1 | 31.32 | Summer |
| 3 | 2022-01-01 | S001 | P0004 | Toys | North | 469 | 61 | 164 | 62.18 | 32.72 | 10 | Cloudy | 1 | 34.74 | Autumn |
| 4 | 2022-01-01 | S001 | P0005 | Electronics | East | 166 | 14 | 135 | 9.26 | 73.64 | 0 | Sunny | 0 | 68.95 | Summer |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73095 | 2024-01-01 | S005 | P0016 | Furniture | East | 96 | 8 | 127 | 18.46 | 73.73 | 20 | Snowy | 0 | 72.45 | Winter |
| 73096 | 2024-01-01 | S005 | P0017 | Toys | North | 313 | 51 | 101 | 48.43 | 82.57 | 10 | Cloudy | 0 | 83.78 | Autumn |
| 73097 | 2024-01-01 | S005 | P0018 | Clothing | West | 278 | 36 | 151 | 39.65 | 11.11 | 10 | Rainy | 0 | 10.91 | Winter |
| 73098 | 2024-01-01 | S005 | P0019 | Toys | East | 374 | 264 | 21 | 270.52 | 53.14 | 20 | Rainy | 0 | 55.80 | Spring |


### Create validation dataset

In [27]:
val = data.sample(frac=0.3, random_state=42)

validation_file = 'validation.csv'
validation_path = path + '/02_Data/02_Validation/' + validation_file

val.to_csv(validation_path, index=False)

### Create working dataset

In [28]:
work = data.loc[~data.index.isin(val.index)]

work_file = 'work.csv'
work_path = path + '/02_Data/03_Working/' + work_file

work.to_csv(work_path, index=False)
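
The working set is defined as every row whose index is not in the validation sample, so the two sets should be disjoint and together cover the full dataset. A minimal sketch of that sanity check follows. The helper `check_split` is a hypothetical name, the toy frame stands in for `data`, and the check assumes the DataFrame index is unique.

```python
import pandas as pd


def check_split(full, val, work):
    """Raise AssertionError unless val and work partition the rows of full."""
    overlap = val.index.intersection(work.index)
    assert overlap.empty, f"{len(overlap)} rows appear in both splits"
    assert len(val) + len(work) == len(full), "split sizes do not add up to the full set"
    # Share of rows that ended up in the validation set.
    return len(val) / len(full)


# Toy frame standing in for `data`; the split mirrors the cells above.
toy = pd.DataFrame({"units_sold": range(10)})
toy_val = toy.sample(frac=0.3, random_state=42)
toy_work = toy.loc[~toy.index.isin(toy_val.index)]

share = check_split(toy, toy_val, toy_work)
print(f"validation share: {share:.0%}")
```

In the notebook, the equivalent call would be `check_split(data, val, work)`, run before the CSV files are written.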

### Create a sample

In [29]:
sample = work.sample(n=20000, random_state=42)

sample_file = 'sample.csv'
sample_path = path + '/02_Data/03_Working/' + sample_file

sample.to_csv(sample_path, index=False)