# RETRAINING SCRIPT

- Models gradually lose predictive power over time because the data drifts, the market evolves, and so on.
- There is no fixed rule for how often to retrain. Insurance or energy companies may retrain every 5 years, while in digital advertising it can be every few seconds.
- As a rule of thumb, retraining is triggered when predictive performance drops by 5–10% from its level at deployment.
- We keep this script ready for when retraining is needed. The script that actually runs in day-to-day operation is the execution script.
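The 5–10% rule of thumb can be expressed as a simple check. This is a minimal sketch. It assumes performance is tracked as a single higher-is-better metric (for example AUC) and that the drop is measured relative to the value recorded at deployment. The function names `relative_drop` and `should_retrain`, the default threshold, and the metric values are illustrative only and are not part of the project.

```python
def relative_drop(baseline: float, current: float) -> float:
    """Fractional drop of a higher-is-better metric relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline


def should_retrain(baseline: float, current: float, threshold: float = 0.05) -> bool:
    """Return True when the metric has fallen by at least `threshold` (0.05 = 5%)."""
    return relative_drop(baseline, current) >= threshold


# Hypothetical monitoring values: metric at deployment vs. metric on recent data
print(should_retrain(0.80, 0.78))  # 2.5% drop -> False
print(should_retrain(0.80, 0.74))  # 7.5% drop -> True
```

Passing `threshold=0.10` corresponds to the upper end of the 5–10% range. A lower-is-better metric such as RMSE would need the sign of the drop reversed.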

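IMPORTANT: This code must be executed in the exact same environment in which it was originally created. The pipelines are serialized with cloudpickle, and pickled objects may fail to load or behave differently under other library versions.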

The environment can be recreated on a new machine using the *riesgos.yml* file generated during the project setup.

Copy the file *riesgos.yml* to your working directory and run this command in the terminal (or Anaconda Prompt): 

*conda env create --file riesgos.yml --name riesgos*
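Before retraining, it can help to confirm that the active environment really matches *riesgos.yml*. This is a minimal sketch using the standard-library `importlib.metadata`. The `check_versions` helper is hypothetical, and the version numbers in `expected` are placeholders, not the project's actual pins. Fill in the mapping by hand from the YAML file.

```python
from importlib.metadata import PackageNotFoundError, version


def check_versions(expected: dict[str, str]) -> dict[str, str]:
    """Compare installed package versions with expected ones.

    Returns {package: problem}; the dict is empty when everything matches.
    """
    problems = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if installed != wanted:
            problems[package] = f"installed {installed}, expected {wanted}"
    return problems


# Placeholder pins: replace with the versions listed in riesgos.yml
expected = {"pandas": "0.0.0", "cloudpickle": "0.0.0"}
issues = check_versions(expected)
if issues:
    print("Environment differs from riesgos.yml:", issues)
else:
    print("Environment matches riesgos.yml.")
```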

```python
import pandas as pd
import cloudpickle
import os
import warnings
warnings.filterwarnings("ignore")

# Import dataset
project_path = '/Users/rober/retail-stockout-risk-scoring/'
file_name_data = 'retail_store_inventory.csv'
# os.path.join avoids the double slash produced by plain string concatenation
path = os.path.join(project_path, '02_Data', '01_Raw', file_name_data)
df = pd.read_csv(path)

# --- Transformations outside the pipeline ---

# Rename columns to snake_case
df = df.rename(columns={
    "Date": "date",
    "Store ID": "store_id",
    "Product ID": "product_id",
    "Category": "category",
    "Region": "region",
    "Inventory Level": "inventory_level",
    "Units Sold": "units_sold",
    "Units Ordered": "units_ordered",
    "Demand Forecast": "demand_forecast",
    "Price": "price",
    "Discount": "discount",
    "Weather Condition": "weather",
    "Holiday/Promotion": "holiday_promo",
    "Competitor Pricing": "competitor_pricing",
    "Seasonality": "seasonality",
})

# Convert data types
df['holiday_promo'] = df['holiday_promo'].astype('category')
df['date'] = pd.to_datetime(df['date'])

# Create the target variable stockout_14d:
# 1 if current inventory is at most 14 times the demand forecast
# (i.e., it would not cover 14 days of demand, assuming a daily forecast), else 0
df['stockout_14d'] = (df['inventory_level'] <= df['demand_forecast'] * 14).astype(int)

# X/y split
y = df['stockout_14d']
X = df.drop(columns=['stockout_14d'])


# --- PIPELINE ---

# Load the UNTRAINED pipeline
pipe_retraining_path = os.path.join(project_path, '04_Models', 'pipe_retraining.pkl')
with open(pipe_retraining_path, 'rb') as f:
    pipeline = cloudpickle.load(f)

# Retrain the pipeline on the full, updated dataset
pipeline.fit(X, y)

# Save the newly trained pipeline as the execution pipeline
# (opening with 'wb' overwrites the previous pipe_execution.pkl)
pipe_execution_path = os.path.join(project_path, '04_Models', 'pipe_execution.pkl')
with open(pipe_execution_path, 'wb') as f:
    cloudpickle.dump(pipeline, f)
```
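The target rule flags a row as at risk when its current inventory is at most 14 times the demand forecast. The small example below shows how the comparison behaves, including the boundary case where inventory exactly equals 14 × forecast. The values are invented for illustration and are not taken from *retail_store_inventory.csv*.

```python
import pandas as pd

# Invented rows: 14 * forecast is 140, 140 and 280 respectively
toy = pd.DataFrame({
    "inventory_level": [100, 140, 300],
    "demand_forecast": [10.0, 10.0, 20.0],
})

# Same rule as the retraining script: stockout if inventory <= 14 x forecast
toy["stockout_14d"] = (toy["inventory_level"] <= toy["demand_forecast"] * 14).astype(int)
print(toy["stockout_14d"].tolist())  # [1, 1, 0]
```

Because the comparison uses `<=`, the boundary row is labeled 1. Switching to `<` would move such rows into the negative class.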