# DEVELOP APP 
Notebook to develop the app, focused on reusing the code of the notebook "5_optimization/2_multiple_ml_models"

# Gurobi optimization using multiple machine learning models
## Optimize for Price and Supply of Avocados
- In this example the models are multiple linear regressions, but Gurobi Machine Learning supports other kinds of models as well. See the **"gurobi-machinelearning"** documentation:

https://gurobi-machinelearning.readthedocs.io/en/stable/api.html


- In addition, **"gurobipy-pandas"** is used to define the decision variables, parameters, constraints, etc. of the optimization model. With this package the optimization model can be defined using pandas DataFrames

https://gurobipy-pandas.readthedocs.io/en/stable/

In [None]:
import pickle
import pandas as pd
import numpy as np
import os

#gurobi
import gurobipy_pandas as gppd
from gurobi_ml import add_predictor_constr
import gurobipy as gp

# USER INPUT
VALUES THAT THE USER ENTERS TO TEST THE OPTIMIZATION

In [None]:
### PRICES MIN AND MAX OF PRODUCT
input_product_price_min = 0
input_product_price_max = 2

### TOTAL SUPPLY OF PRODUCT TO DISTRIBUTE ACROSS THE REGIONS
input_supply_product = 25

### COSTS - TRANSPORT - WASTE - ETC
input_c_waste = 0.1

input_c_transport_Great_Lakes = 0.3
input_c_transport_Midsouth = 0.1
input_c_transport_Northeast = 0.4
input_c_transport_Northern_New_England = 0.5
input_c_transport_SouthCentral = 0.3
input_c_transport_Southeast = 0.2
input_c_transport_West = 0.2
input_c_transport_Plains = 0.2

### SEASONALITY 1: peak months, 0: not peak
input_seasonality_peak = 0

## PREPARATION

### 1. Load data needs to use
In this example the historical data is loaded because it is necessary to generate the parameters of the optimization model

In [None]:
def read_historical_data():
    ##### read data that have all the units sold for each region
    path_data_basic_features = 'data/data_basic_features.pkl'
    data_units_sold = pd.read_pickle(path_data_basic_features)
    
    ##### use data to generate parameters for optimization model
    # min, max delivery for each region
    data_min_delivery = data_units_sold.groupby("region")["units_sold"].min().rename('min_delivery')
    data_max_delivery = data_units_sold.groupby("region")["units_sold"].max().rename('max_delivery')
    
    # historical distribution of price each region
    data_historical_max_price = data_units_sold.groupby("region")["price"].max().rename('max_price')

    return data_min_delivery, data_max_delivery, data_historical_max_price

In [None]:
data_min_delivery, data_max_delivery, data_historical_max_price = read_historical_data()

In [None]:
data_min_delivery

In [None]:
data_max_delivery
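
As a quick illustration of what `read_historical_data` computes, the sketch below applies the same `groupby` aggregations to a tiny made-up table. The rows and the `toy_` names are hypothetical; only the column names `region`, `units_sold` and `price` are taken from the function above.

```python
import pandas as pd

# Hypothetical historical records (not the real avocado data)
toy_units_sold = pd.DataFrame(
    {
        "region": ["West", "West", "Plains", "Plains"],
        "units_sold": [5.0, 7.5, 1.0, 2.0],
        "price": [1.1, 0.9, 1.3, 1.6],
    }
)

# Same aggregations as in read_historical_data, one value per region
toy_min_delivery = toy_units_sold.groupby("region")["units_sold"].min().rename("min_delivery")
toy_max_delivery = toy_units_sold.groupby("region")["units_sold"].max().rename("max_delivery")
toy_max_price = toy_units_sold.groupby("region")["price"].max().rename("max_price")

print(pd.concat([toy_min_delivery, toy_max_delivery, toy_max_price], axis=1))
```

Each result is a Series indexed by region, which is why it can later be passed directly as the `lb` and `ub` of the supply variables.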

### 2. Load model machine learning
Load the models that, given an input (the prices of all regions and other features), predict the demand of a region. There is a separate model for each region.

The model was trained in the notebook "models/5_prices_regions_multiple_lr"

In [None]:
def read_ml_models_trained():
    # params
    path_folder_artifacts = 'models/'
    list_models_names = os.listdir(path_folder_artifacts)
    
    ### load models
    dict_models = {}
    for model_name in list_models_names:
        # params
        #print(f'loading model: {model_name}')
        path_model = path_folder_artifacts + model_name
        
        # load
        aux = model_name.split('.')[0].split('_')[1:]
        model_name_index = '_'.join(aux)
        with open(path_model, 'rb') as artifact:
            dict_models[model_name_index] = pickle.load(artifact)

    return dict_models

In [None]:
dict_models = read_ml_models_trained()
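
The dictionary key is derived from the file name by dropping the extension and the first underscore-separated token. The sketch below isolates that step; the helper name and the file names are hypothetical, since the actual contents of `models/` are not shown here.

```python
def model_key_from_filename(file_name):
    """Drop the extension and the first underscore-separated token."""
    stem = file_name.split(".")[0]
    return "_".join(stem.split("_")[1:])

# Hypothetical artifact names; the real file names in 'models/' may differ
toy_file_names = ["lr_Great_Lakes.pkl", "lr_West.pkl"]
toy_keys = [model_key_from_filename(name) for name in toy_file_names]
print(toy_keys)
```

The resulting keys have to match the region names used later, because the ML constraints look each model up with `dict_models[region]`.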

## RUN OPTIMIZATION

### 0. Load transversal params - sets of optimization model
These parameters are shared across all the code, not only this notebook. For example, the order of the features in the data.

Save the sets of the optimization model as a pandas Index

In [None]:
list_regions = ['Great_Lakes', 'Midsouth', 'Northeast', 'Northern_New_England', 'Plains', 'SouthCentral', 'Southeast', 'West']
regions = list_regions
index_regions = pd.Index(regions)

### 1. Create Gurobi optimization model
Documentation: https://www.gurobi.com/documentation/current/refman/py_model.html

In [None]:
model_opt = gp.Model(name = "Avocado_Price_Allocation")

### 2. Upper bounds and lower bounds of decision variables
Values that bound the decision variables. In Gurobi the upper and lower bounds can be set when the variables are created, so they do not need to be added explicitly as constraints.

- $a_{min},a_{max}$: minimum and maximum price ($\$$) per avocado (price is an input of the machine learning model)
- $b^r_{min},b^r_{max}$: minimum and maximum number of avocados allocated to region $r$

In [None]:
# product_price_min, product_price_max: min and max price of product A
product_price_min = input_product_price_min
product_price_max = input_product_price_max


# b_min(r), b_max(r): min and max historical quantity of product sent to each region (values obtained from historical data)
b_min = data_min_delivery
b_max = data_max_delivery

### 3. Input parameters of optimization model
##### (Parameters that are neither decision variables nor parameters of the machine learning model)

**Set**
- $r$ : will be used to denote each region


**Parameters Optimization Model**
- $B$: available avocados to be distributed across the regions. Total amount of avocado supply

- $c_{waste}$: cost ($\$$) per wasted avocado

- $c^r_{transport}$: cost ($\$$) of transporting an avocado to region $r$

In [None]:
# B: supply product
B = input_supply_product


# c_waste: cost of waste product
c_waste = input_c_waste


# c_transport(r): cost transport for each region
c_transport = pd.Series(
    {
        "Great_Lakes": input_c_transport_Great_Lakes,
        "Midsouth": input_c_transport_Midsouth,
        "Northeast": input_c_transport_Northeast,
        "Northern_New_England": input_c_transport_Northern_New_England,
        "SouthCentral": input_c_transport_SouthCentral,
        "Southeast": input_c_transport_Southeast,
        "West": input_c_transport_West,
        "Plains": input_c_transport_Plains,
    }, name='transport_cost')
c_transport = c_transport.loc[regions]
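
The last line reorders the costs so they follow the order of the set `regions`. A minimal sketch of that behaviour, using hypothetical `toy_` values, shows that `.loc` with a list of labels both reorders the Series and fails loudly when a region has no cost:

```python
import pandas as pd

toy_regions = ["Midsouth", "Plains", "West"]
toy_costs = pd.Series({"West": 0.2, "Midsouth": 0.1, "Plains": 0.2}, name="transport_cost")

# .loc with a list of labels returns the values in the order of that list
toy_costs_ordered = toy_costs.loc[toy_regions]
print(list(toy_costs_ordered.index))

# .loc raises KeyError when a label is missing, which catches a forgotten region early
try:
    toy_costs.loc[toy_regions + ["Northeast"]]
    missing_detected = False
except KeyError:
    missing_detected = True
print(missing_detected)
```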

### 4. Features input machine learning model fixed (that are not decision variables or parameters in optimization model)
Define the features that are inputs of the machine learning model but are not decision variables of the optimization model (so these values do not change). These features are also not parameters of the optimization model, so their values are not used in the constraints

In [None]:
peak_or_not = input_seasonality_peak
instance_ml_model = pd.DataFrame(
    data={
        "peak": peak_or_not
    },
    index=regions
)

### 5. Decision variables of optimization model

Let us now define the decision variables. In our model, we want to store the price and number of avocados allocated to each region. We also want variables that track how many avocados are predicted to be sold and how many are predicted to be wasted. 

- $p(r)$ the price of an avocado ($\$$) in each region, limited by the minimum and maximum price. It is a feature of the machine learning model
- $x(r)$ the number of avocados supplied to each region
- $s(r)$ the predicted number of avocados sold in each region
- $u(r)$ the predicted number of avocados unsold (wasted) in each region
- $d(r)$ the predicted demand in each region. It is the target of the machine learning model (because this value changes with the input, it is a decision variable)

All these variables are created using gurobipy-pandas, with the function `gppd.add_vars`. To use this function it is necessary to define:
- model: the Gurobi optimization model
- index: a pandas index, which defines the sets over which the decision variables are indexed
- name: the name of the decision variable
- Example: `x = gppd.add_vars(model, index, name="x")`

In [None]:
# p(r): price. feature of machine learning model
price = gppd.add_vars(model_opt, index_regions, name = "price", lb = product_price_min, ub = product_price_max) # bounds prices


# x(r): supply
supply = gppd.add_vars(model_opt, index_regions, name = "supply", lb = b_min, ub= b_max) # bounds supply - using historical data


# s(r): units sold given a certain price
sold = gppd.add_vars(model_opt, index_regions, name = "sold")


# u(r): inventory. units not sold. waste.
inventory = gppd.add_vars(model_opt, index_regions, name = "inventory") 


# d(r): demand. output of machine learning model
demand = gppd.add_vars(model_opt, index_regions, lb = -gp.GRB.INFINITY, name = "demand") # BY DEFAULT THE LOWER BOUND IS ZERO

### 6. Constraints (constraints that are not generated by a ml model)

#### 6.1 Add the Supply Constraint
Make sure that the total number of avocados supplied is equal to $B$
\begin{align*} 
\sum_{r} supply_r &= B 
\end{align*}

In [None]:
model_opt.addConstr(supply.sum() == B, name = 'supply')
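
Because every `supply` variable is bounded by the historical minimum and maximum delivery, this equality can only hold if $B$ lies between the sum of the lower bounds and the sum of the upper bounds. A minimal pre-check of that necessary condition, with a hypothetical helper name and hypothetical numbers, might look like this:

```python
import pandas as pd

def supply_is_feasible(total_supply, lower_bounds, upper_bounds):
    """Necessary condition for sum(supply) == total_supply with lb <= supply <= ub."""
    return bool(lower_bounds.sum() <= total_supply <= upper_bounds.sum())

# Hypothetical bounds per region (not the historical values)
toy_b_min = pd.Series({"West": 2.0, "Plains": 1.0})
toy_b_max = pd.Series({"West": 10.0, "Plains": 4.0})

print(supply_is_feasible(12.0, toy_b_min, toy_b_max))  # between 3 and 14
print(supply_is_feasible(25.0, toy_b_min, toy_b_max))  # above the total upper bound of 14
```

Running such a check on `b_min`, `b_max` and `B` before building the model can help explain an infeasible status after a user changes the supply input.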

#### 6.2 Add Constraints That Define Sales Quantity
The sales quantity is the minimum of the allocated quantity and the predicted demand, i.e., $s_r = \min \{x_r,d_r(p_r)\}$. This relationship can be modeled by the following two constraints for each region $r$.

\begin{align*} 
sold_r &\leq supply_r                \:\:\:\:\forall r\\
sold_r &\leq demand(p_r,r)                   \:\:\:\:\forall r
\end{align*}

In [None]:
# constraint names without spaces or operators, so the model can be written to an LP file
gppd.add_constrs(model_opt, sold, gp.GRB.LESS_EQUAL, supply, name = 'sold_le_supply')
gppd.add_constrs(model_opt, sold, gp.GRB.LESS_EQUAL, demand, name = 'sold_le_demand')
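
The two inequalities only bound `sold` from above. The minimum is recovered at the optimum because each extra unit sold adds its price to the revenue and removes one unit of waste, so the solver pushes `sold` up to the tighter of the two bounds. A small numeric check in plain Python (hypothetical values and helper names):

```python
def best_sold(supply, demand):
    """Largest value satisfying sold <= supply and sold <= demand."""
    return min(supply, demand)

def satisfies_constraints(sold, supply, demand):
    """Check the two sales constraints for one region."""
    return sold <= supply and sold <= demand

toy_supply, toy_demand = 4.0, 2.5
toy_sold = best_sold(toy_supply, toy_demand)

print(toy_sold, satisfies_constraints(toy_sold, toy_supply, toy_demand))
# Any larger value violates at least one of the two constraints
print(satisfies_constraints(toy_sold + 0.1, toy_supply, toy_demand))
```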

#### 6.3 Add the Wastage Constraints
Define the predicted unsold number of avocados in each region, given by the supplied quantity that is not sold. For each region $r$.

\begin{align*} 
inventory_r &= supply_r - sold_r                 \:\:\:\:\forall r
\end{align*}

In [None]:
gppd.add_constrs(model_opt, inventory, gp.GRB.EQUAL, supply - sold, name = 'waste')

#### 6.4 Model update - add the constraint to gurobi model

In [None]:
model_opt.update()

### 7. Add constraints that are machine learning models
To add constraints that embed machine learning models, it is necessary to define a dataframe that acts as the prediction instance (its columns can hold Gurobi decision variables) and then create the constraint in Gurobi.

In this example, where each region has its own model, the instance dataframe also needs to be defined individually. For decision variables that are defined over the set "regions", it is important to filter the instance dataframe with the correct element of that set.

**So, for each element of the set "regions", an instance dataframe and a constraint are defined. Each region has its own model.** Also, the instance has only one row, so it would be possible to define an optimization model with a set "time" in which each row of the dataframe is the instance for time t, t+1, t+2, etc.


**IMPORTANT: IN THIS NOTEBOOK THE CONSTRAINTS OF THE ML MODELS ARE DEFINED WITH A FOR LOOP OVER THE SET "REGIONS". THE INSTANCE IS STILL BUILT SEPARATELY FOR EACH REGION, WITH A VIEW TO EXAMPLES IN WHICH CONSTRAINTS HAVE TO BE DEFINED OVER DIFFERENT SETS**

In [None]:
############ create instance to predict demand for each region ############

for region in regions:

    # there is a dataframe with features fixed (no decision variables). filter it by region
    aux_features_fixed = instance_ml_model.loc[[region]]  
    
    # create a dataframe with decision variables gurobi. filter it by region. In this example the price of all regions are features of the ml model
    aux_features_decision =  pd.DataFrame(price).T
    aux_features_decision.index = [region]
    
    #name_columns_feature_decision = aux_features_decision.columns # CORRECTION NAME COLUMNS TO BE THE SAME COLUMNS NAMES IN DATAFRAME USED TO TRAIN
    name_columns_feature_decision = ['price_' + name_region for name_region in list_regions]
    name_columns_feature_decision = [column.lower() for column in name_columns_feature_decision]
    aux_features_decision.columns = name_columns_feature_decision
    
    # join into a dataframe instance
    instance = pd.concat([aux_features_fixed, aux_features_decision], axis=1) # generate instance
    
    
    ############ create constraint based in machine learning model ############
    # load model
    model_ml = dict_models[region]
    
    ## add the model of this region to predict its demand (each region has its OWN MODEL)
    pred_constr = add_predictor_constr(gp_model = model_opt, 
                                       predictor = model_ml, 
                                       input_vars = instance, 
                                       output_vars = demand[region], # filter decision variable for the element of the set region,
                                       name = f'model_predict_{region}'
                                      )
    #pred_constr.print_stats()
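
To make the instance construction easier to follow, the sketch below repeats the same dataframe steps with plain numbers in place of Gurobi variables. For a linear regression, the predictor constraint should then amount to the linear equation `demand = intercept + sum(coef * feature)`. The coefficients, intercept, prices and `toy_` names are hypothetical and are not taken from the trained models.

```python
import pandas as pd

toy_regions = ["Great_Lakes", "Midsouth"]
# Fixed numbers stand in for the Gurobi price variables
toy_price = pd.Series([1.0, 1.2], index=toy_regions, name="price")
toy_fixed = pd.DataFrame({"peak": 0}, index=toy_regions)

region = "Midsouth"
row_fixed = toy_fixed.loc[[region]]            # fixed features of this region
row_prices = pd.DataFrame(toy_price).T         # one row, one column per region
row_prices.index = [region]
row_prices.columns = [f"price_{name}".lower() for name in toy_regions]
toy_instance = pd.concat([row_fixed, row_prices], axis=1)

# Hypothetical coefficients of the Midsouth regression (not the trained model)
toy_coef = pd.Series({"peak": 0.5, "price_great_lakes": 0.1, "price_midsouth": -2.0})
toy_intercept = 5.0
toy_demand = toy_intercept + (toy_instance.iloc[0] * toy_coef).sum()

print(toy_instance)
print(round(toy_demand, 3))
```

The column names of the instance have to match the feature names used in training, which is why the loop above rebuilds them as lower-case `price_<region>`.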

### 8. Define Objective Function
The goal is to maximize the **net revenue**, which is the product of price and quantity, minus costs over all regions. This model assumes the purchase costs are fixed (since the amount $B$ is fixed) and are therefore not incorporated.

\begin{align} 
\textrm{maximize} &  \sum_{r}  (price_r * sold_r - c_{waste} * inventory_r -
c^r_{transport} * supply_r)& 
\end{align}

In [None]:
model_opt.setObjective((price * sold).sum() - c_waste * inventory.sum() - (c_transport * supply).sum(),
               gp.GRB.MAXIMIZE)
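
As a sanity check on the formula, the same expression can be evaluated with pandas for a fixed candidate plan. All numbers and `toy_` names below are hypothetical; the structure mirrors the objective above term by term.

```python
import pandas as pd

toy_index = ["West", "Plains"]
toy_price = pd.Series([1.0, 1.5], index=toy_index)
toy_supply = pd.Series([3.0, 3.0], index=toy_index)
toy_sold = pd.Series([2.0, 3.0], index=toy_index)
toy_inventory = toy_supply - toy_sold          # wastage constraint
toy_c_waste = 0.1
toy_c_transport = pd.Series([0.3, 0.1], index=toy_index)

toy_net_revenue = (
    (toy_price * toy_sold).sum()               # revenue: 2.0 + 4.5
    - toy_c_waste * toy_inventory.sum()        # waste cost: 0.1 * 1.0
    - (toy_c_transport * toy_supply).sum()     # transport cost: 0.9 + 0.3
)
print(round(toy_net_revenue, 2))
```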

### 9. Solve optimization problem
The objective is **quadratic** because it contains the product of price and predicted sales, both of which are decision variables. Such a product of two different variables is bilinear and therefore **non-convex**, so we set the [Gurobi NonConvex
parameter](https://www.gurobi.com/documentation/10.0/refman/nonconvex.html) to $2$, which tells Gurobi to solve the non-convex quadratic problem.
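
One way to see the non-convexity is to look at the Hessian of a single term $price \cdot sold$: it has one positive and one negative eigenvalue, so the term is neither convex nor concave. A short NumPy check (an illustration only, not part of the optimization model):

```python
import numpy as np

# Hessian of f(price, sold) = price * sold with respect to (price, sold)
hessian = np.array([[0.0, 1.0],
                    [1.0, 0.0]])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
eigenvalues = np.linalg.eigvalsh(hessian)
is_indefinite = eigenvalues[0] < 0 < eigenvalues[-1]
print(eigenvalues, is_indefinite)
```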

In [None]:
# allow non-convex quadratic problems
model_opt.Params.NonConvex = 2

In [None]:
# solve
model_opt.optimize()

In [None]:
#### check the status of the model - 2 means an optimal solution was found
# docu: https://www.gurobi.com/documentation/current/refman/optimization_status_codes.html#sec:StatusCodes
model_opt.Status
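
Since the status is returned as an integer, an app may want to turn it into a readable message. The sketch below hard-codes a small subset of the Gurobi status codes; the dictionary and the helper name are hypothetical conveniences, and the full list is in the status-code documentation linked above.

```python
# Subset of Gurobi optimization status codes
STATUS_NAMES = {
    2: "OPTIMAL",
    3: "INFEASIBLE",
    4: "INF_OR_UNBD",
    5: "UNBOUNDED",
    9: "TIME_LIMIT",
}

def describe_status(status_code):
    """Return a readable name for a status code, or a generic label if unknown."""
    return STATUS_NAMES.get(status_code, f"OTHER ({status_code})")

print(describe_status(2))
print(describe_status(3))
```

In the notebook this could be called as `describe_status(model_opt.Status)` before reading any solution values, since `var.gppd.X` is only available when a solution exists.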

### 10. Save optimal values in a dataframe
To get the optimal values of the decision variables it is necessary to call "var.gppd.X"

In [None]:
# create dataframe with index
solution = pd.DataFrame(index=index_regions)

# save optimal values
solution["Price"] = price.gppd.X
solution["Historical_Max_Price"] = data_historical_max_price  # this is informative value get from historical data
solution["Allocated(supply)"] = supply.gppd.X
solution["Sold"] = sold.gppd.X
solution["Inventory"] = inventory.gppd.X
solution["Pred_demand"] = demand.gppd.X
solution["Diff Demand - Supply"] = demand.gppd.X - supply.gppd.X

# sum values
total_sum = solution.sum()
total_sum["Price"] = np.nan  # np.NaN was removed in NumPy 2.0; a sum of prices has no meaning
total_sum["Historical_Max_Price"] = np.nan
solution.loc["Total", :] = total_sum

# round values
solution = solution.round(3)
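
The "Total" row pattern used above can be tried in isolation on a small made-up table (hypothetical values and `toy_` names). Columns where a sum is meaningless are set to NaN before the row is appended.

```python
import numpy as np
import pandas as pd

toy_solution = pd.DataFrame(
    {"Price": [1.0, 1.5], "Sold": [2.0, 3.0]},
    index=["West", "Plains"],
)

toy_total = toy_solution.sum()
toy_total["Price"] = np.nan                 # prices are not additive
toy_solution.loc["Total", :] = toy_total    # appends a new row labelled "Total"
print(toy_solution)
```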

In [None]:
# show value of the objective function
opt_revenue = model_opt.ObjVal
opt_revenue = np.round(opt_revenue, 2)
print(f"\n The optimal net revenue: ${opt_revenue} million")

In [None]:
# show value decision variables
solution

In [None]:
# to_excel returns None; keep the index so the region names are written to the file
#solution.to_excel('solution.xlsx')

# ANALYSIS SOLUTION
Analyze the solution given by the optimization.

For example, see the income generated for each segment of product, etc