# The Palatable Diet Problem  

**Problem Description.** The Diet Problem is the first large-scale optimization problem to be solved with the Simplex algorithm by Jack Laderman in [1947](https://www.mpi-inf.mpg.de/fileadmin/inf/d1/teaching/winter18/Ideen/Materialien/Dantzig-Diet.pdf). The basic formulation of this problem consists of minimizing the cost of a food basket while meeting the specified nutrient requirements. In this notebook, we solve The Palatable Diet Problem (TPDP), where the basic model is extended with a constraint on the food basket palatability. An explicit formula of the palatability constraint is unknown, but we have data on several food baskets and the respective palatability score. First, we define a conceptual model with the *known constraint*. Then, OptiCL is used to learn and embed the palatability constraint.  
(*TPDP is part of a larger optimization problem which simultaneously  optimizes  the  food  basket  to  be  delivered,  the  sourcing  plan,  the  delivery  plan, and  the  transfer  modality  of  a  month-long  food  supply in a Wolrd Food Program setting ([Maragno et al., 2021])*).

<font color='#808080'>**Objective function:** minimize the total cost of the food basket.</font>  
$\min_{\boldsymbol{x}} c^\top \boldsymbol{x}$

*subject to* 

<font color='#808080'>**Nutritional constraints:** for each nutrient $l\in\mathcal{L}$, at least meet the minimum required level.</font>  
$ \sum_{k \in \mathcal{K}} nutval_{kl} x_{k} \geq nutreq_{l}, \ \ \ \forall l\in\mathcal{L},$   
<font color='#808080'>**Constraints on sugar and salt.**</font>  
$ x_{salt} = 5,$   
$ x_{sugar} = 20,$  
<font color='#808080'>**Palatability constraints:** the food basket palatability has to be at least equal to $t$.</font>  
$ y \geq t,$  
<font color='#808080'>**Learned predictive model:** the palatability is defined using a predictive model.</font>  
$ y = \hat{h}(\boldsymbol{x}),$   
<font color='#808080'>**Non negativity constraints.**</font>  
$ x_{k} \geq 0, \ \ \ \forall k \in \mathcal{K}.$  

In [1]:
import pandas as pd
from imp import reload
import numpy as np
import math
from sklearn.utils.extmath import cartesian
import time
import sys
import os
sys.path.append(os.path.abspath('../../src'))  # TODO: has to be changed
import opticl
from pyomo import environ
from pyomo.environ import *
np.random.seed(0)

In [234]:
sys.path.append(os.path.abspath('../../opticl'))
import constraint_learning
import embed_mip

### Data Loading  
**nutr_val**: nutritional values for each of the 25 foods  
**nutr_req**: 11 nutrition requirements  
**cost_p**: vector of procurement costs  
**dataset**: dataframe of food basket instances and relative palatability score

In [2]:
nutr_val = pd.read_excel('processed-data/Syria_instance.xlsx', sheet_name='nutr_val', index_col='Food')
nutr_req = pd.read_excel('processed-data/Syria_instance.xlsx', sheet_name='nutr_req', index_col='Type')
cost_p = pd.read_excel('processed-data/Syria_instance.xlsx', sheet_name='FoodCost', index_col='Supplier').iloc[0,:]
dataset = pd.read_csv('processed-data/WFP_dataset.csv').sample(frac=1)
dataset.head()

Unnamed: 0,Beans,Bulgur,Cheese,Fish,Meat,CSB,Dates,DSM,Milk,Salt,...,Soya-fortified bulgur wheat,Soya-fortified maize meal,Soya-fortified sorghum grits,Soya-fortified wheat flour,Sugar,Oil,Wheat,Wheat flour,WSB,label
398,0.687675,1.257354,0.0,0.0,0.0,0.0,0.0,0.302104,0.0,0.05,...,0.0,0.0,0.0,0.0,0.2,0.357429,2.823603,0.0,0.637964,0.715428
3833,0.551125,0.0,0.0,0.0,0.0,0.0,0.0,0.11799,0.0,0.05,...,0.0,0.0,0.0,0.0,0.2,0.392274,2.540599,3.414615,0.7333,0.292719
4836,0.701614,0.0,0.0,0.0,0.0,0.09499,0.0,0.330808,0.0,0.05,...,0.0,0.0,0.0,0.0,0.2,0.221908,0.336647,0.0,0.545864,0.816616
4572,0.0,3.832166,0.0,0.0,0.0,0.0,0.626751,0.278648,0.132718,0.05,...,0.0,0.0,0.0,0.0,0.2,0.311117,0.0,0.0,0.694007,0.79468
636,0.039754,0.0,0.344293,0.0,0.0,0.0,0.0,0.106482,0.0,0.05,...,0.0,0.0,0.0,0.0,0.2,0.16022,0.0,0.0,0.78879,0.261417


# OptiCL: Optimization with Constraint Learning

## Step 1: Conceptual Model

In [3]:
def init_conceptual_model(cost_p):
    N = list(nutr_val.index)  # foods
    M = nutr_req.columns  # nutrient requirements

    model = ConcreteModel('TPDP')

    '''
    Decision variables
    '''
    model.x = Var(N, domain=NonNegativeReals)  # variables controlling the food basket

    '''
    Objective function.
    '''
    def obj_function(model):
        return sum(cost_p[food].item()*model.x[food] for food in N)

    model.OBJ = Objective(rule=obj_function, sense=minimize)

    '''
    Nutrients requirements constraint.
    '''
    def constraint_rule1(model, req):
        return sum(model.x[food] * nutr_val.loc[food, req] for food in N) >= nutr_req[req].item()
    model.Constraint1 = Constraint(M, rule=constraint_rule1)
    '''
    Sugar constraint
    '''
    def constraint_rule2(model):
        return model.x['Sugar'] == 0.2
    model.Constraint2 = Constraint(rule=constraint_rule2)
    '''
    Salt constraint
    '''
    def constraint_rule3(model):
        return model.x['Salt'] == 0.05
    model.Constraint3 = Constraint(rule=constraint_rule3)
    
    return model

## Step 2: Data Processing  
y is binarized using a user-defined threshold.

In [32]:
bin_threshold = float(input('Insert a threshold between 0 and 1 to binarize the label: '))
y = dataset['label']
# y_new = []
# for i in y:
#     if i < 0.3:
#         y_new.append(0)
#     elif i < 0.6:
#         y_new.append(1)
#     else:
#         y_new.append(2)
X = dataset.drop(['label'], axis=1, inplace=False)
y = y>=bin_threshold
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)

Insert a threshold between 0 and 1 to binarize the label: 0.5


## Part 3: Learn the predictive models
'alg_list' specifies the list of algorithms that you will consider in the training pipeline. If you have the InterpretableAI license, you can include **iai** (Optimal Trees with Hyperplanes) or **iai-single** (Optimal Trees with single feature splits) in the list. If using IAI, you must specify the metric as 'r2'. Otherwise, the default metric is 'neg_squared_mse'.

In [181]:
version = 'TPDP_v1'
alg_list = ['gbm']
outcome_list = ['palatability']  # Constraint to be learned

#### Train models (or skip if pre-saved)  
The training will use only regression models. 

In [213]:
reload(constraint_learning)
task_type = 'binary'
performance = pd.DataFrame()

if not os.path.exists('results/'):
    os.makedirs('results/')

for outcome in outcome_list:
    print(f'Learning a constraint for {outcome}')

    for alg in alg_list:
        if not os.path.exists('results/%s/' % alg):
            os.makedirs('results/%s/' % alg)
        print(f'Training {alg}')
        s = 1

        ## Run shallow/small version of RF
        alg_run = 'rf_shallow' if alg == 'rf' else alg

        m, perf = opticl.run_model(X_train, y_train, X_test, y_test, alg_run, outcome, task = task_type,
                               seed = s, cv_folds = 5, 
                               # metric = 'r2',
                               save = False,
                              )

        ## Save model
        constraintL = constraint_learning.ConstraintLearning(X_train, y_train, m, alg)
        constraint_add = constraintL.constraint_extrapolation(task_type)
        constraint_add.to_csv('results/%s/%s_%s_model.csv' % (alg, version, outcome), index = False)

        ## Extract performance metrics
        try:
            perf['auc_train'] = roc_auc_score(y_train >= threshold, m.predict(X_train))
            perf['auc_test'] = roc_auc_score(y_test >= threshold, m.predict(X_test))
        except: 
            perf['auc_train'] = np.nan
            perf['auc_test'] = np.nan

        perf['seed'] = s
        perf['outcome'] = outcome
        perf['alg'] = alg
        perf['task'] = task_type
        perf['save_path'] = 'results/%s/%s_%s_model.csv' % (alg, version, outcome)
        
            
        perf.to_csv('results/%s/%s_%s_performance.csv' % (alg, version, outcome), index = False)
        
        performance = performance.append(perf)
        print()
print('Saving the performance...')
performance.to_csv('results/%s_performance.csv' % version, index = False)
print('Done!')

Learning a constraint for palatability
Training gbm
------------- Initialize grid  ----------------
------------- Running model  ----------------
Algorithm = gbm, metric = None
------------- Model evaluation  ----------------
-------------------training evaluation-----------------------
Train Score: 0.9998952250847266
-------------------testing evaluation-----------------------
Test Score: 0.9563912551221156

Saving the performance...
Done!


## Step 4: Predictive model selection and Optimization

In [38]:
question2 = 0.5
print('What is the palatability threshold that you want to use in the constraint? The default is 0.5.')
question2 = input(' Choose in the range (0, 1): ')
if float(question2) > 1:
    question2 = 0.5
elif float(question2) < 0:
    question2 = 0.5
threshold = float(question2)

What is the palatability threshold that you want to use in the constraint? The default is 0.5.
 Choose in the range (0, 1): 0.5


In [191]:
outcome_list = ['palatability']
constraints_embed = ['palatability']
objectives_embed = {}
performance = pd.read_csv('results/%s_performance.csv' % version)
performance.dropna(axis='columns')

Unnamed: 0,save_path,seed,cv_folds,parameters,best_params,valid_score,train_score,test_score,outcome,alg,task
0,results/gbm/TPDP_v1_palatability_model.csv,1,5,"{'learning_rate': [0.01, 0.025, 0.05, 0.075, 0...","{'learning_rate': 0.2, 'max_depth': 5, 'n_esti...",0.956209,0.999895,0.956391,palatability,gbm,binary


In [251]:
reload(embed_mip)
model_master = embed_mip.model_selection(performance, constraints_embed, objectives_embed)
model_master['lb'] = 0.5
model_master['ub'] = None
embed_mip.check_model_master(model_master)
model_master

        outcome model_type                                   save_path  \
0  palatability        gbm  results/gbm/TPDP_v1_palatability_model.csv   

     task  objective  
0  binary          0  
No learned objective

Embedding constraint for palatability using gbm model.
0.5 <= palatability
The outcome 'palatability' is a probability


Unnamed: 0,outcome,model_type,save_path,task,objective,lb,ub
0,palatability,gbm,results/gbm/TPDP_v1_palatability_model.csv,binary,0,0.5,


In [255]:
trust_region = input('Do you want to use the trust region? True\False: ')

Do you want to use the trust region? True\False: False


In [256]:
def getSolution(model, X):
    solution = {}
    palatability = 0
    count = 0
    for v in model.getVars():
        if 'x[' in v.varName:
            solution[list(X.columns)[count]]=[v.x]
            print(v.varName)
            count += 1
    return solution

In [266]:
reload(embed_mip)
result = {}
conceptual_model= init_conceptual_model(cost_p)
MIP_final_model = embed_mip.optimization_MIP(conceptual_model, conceptual_model.x, model_master, X, tr=trust_region=='True')
opt = SolverFactory('gurobi')
print('---------------------Solving the optimization problem---------------------')
results = opt.solve(MIP_final_model) 
solution = {}
for food in  list(nutr_val.index):
    if value(MIP_final_model.x[food])*100 > 0.0000001:
        solution[food] = str(np.round(value(MIP_final_model.x[food])*100, 2))+'g'
print('The optimal solution is: \n', solution)
print(f"The predicted palatability of the optimal solution is {value(MIP_final_model.y['palatability'])}")

Embedding constraints for palatability
-0.0
---------------------Solving the optimization problem---------------------
The optimal solution is: 
 {'CSB': '2.19g', 'Milk': '50.78g', 'Salt': '5.0g', 'Maize': '135.75g', 'Sugar': '20.0g', 'Oil': '20.6g', 'Wheat': '276.89g', 'WSB': '69.65g'}
The predicted palatability of the optimal solution is 0.16500285796268693
