### Problem setting
We consider a three-stage newsvendor problem.
1) First stage: the agent requires a certain number of units of fixed investment from the vendor <br>
&emsp;variable:<br>
&emsp;&emsp; &emsp;$ q\in (0,10)$ - units of investment<br>
&emsp;coefficients:<br>
&emsp;&emsp; &emsp; 0.3 - cost per unit of investment<br>
            
2) Second stage: the vendor buys newspapers from the agent; the available quantity depends on the investment <br>
&emsp;  variable:<br>
&emsp; &emsp; &emsp; $x \in (0,\infty)$ - units of newspapers<br>
&emsp; coefficients:<br>
&emsp; &emsp; &emsp;-1 - reward per unit of newspapers bought (a cost of 1 per unit)<br>
&emsp; random variable:<br>
&emsp; &emsp; &emsp; $\mathrm{cap}$ - the available buying quantity is bounded above by $\mathrm{cap}\cdot q$

3) Third stage: the vendor sells newspapers to individuals<br>
&emsp;  variables:<br>
&emsp; &emsp; &emsp; $z \in (0,x)$ - quantity of newspapers sold<br>
&emsp;  coefficients:<br>
&emsp; &emsp; &emsp; 1.5 - revenue per unit sold (the value used in the code below)<br>
&emsp;  random variables:<br>
&emsp; &emsp; &emsp; $d \in (0,\infty)$ - quantity demanded<br> 
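To make the setting concrete, the sketch below evaluates the total reward of a single realization. The coefficients 0.3 and 1 come from the list above, and the selling price 1.5 is taken from the code later in this notebook. The helper name `realized_reward` is hypothetical and is not used elsewhere.

```python
def realized_reward(q, x, cap, d):
    """Total reward of one realization (illustrative helper, not part of the notebook code)."""
    if not 0 <= x <= cap * q:
        raise ValueError("x must satisfy 0 <= x <= cap*q")
    first = -0.3 * q      # first stage: investment cost
    second = -1.0 * x     # second stage: purchase cost
    z = min(x, d)         # third stage: sales are limited by stock and by demand
    third = 1.5 * z       # sales revenue at price 1.5
    return first + second + third


# q = 4 units of investment, cap = 2, so at most 8 newspapers can be bought
print(realized_reward(q=4, x=8, cap=2, d=10))
```

Selling `z = min(x, d)` is the best third-stage decision here because the revenue per unit is positive.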

#### Stage Scenario Data

In [2]:
'''
Template of the scenario data structure
Let x - first-stage r.v.
    y - second-stage r.v.
    z, d - third-stage r.v.

scenario['third_stage'] = {
    'random_variables': ['z', 'd'],  # list of variable names
    'conditions': [(x_1, y_1), ...],
        # list of realized values of the previous random variables
        # = second_stage:conditions x second_stage:values
    'values': [(z_1, d_1), ...],
        # list of all possible values of the current random variables;
        # for convenience, we do not decompose each value tuple by r.v.
        # and avoid using a joint-probability representation
    'probabilities': {
        i: [(j, probability), ...],
        ...
        num_conditions: [(j, probability), ...]
            # i indexes third_stage:conditions
            # j indexes third_stage:values
            # the key num_conditions holds the unconditional case
            # num_conditions: [] if not provided
    }
        # dict of the probabilities corresponding to third_stage:values
}
# The first two stages can omit keys accordingly,
# e.g. scenario['first_stage'] is empty because there is no randomness.
# Note: the data below spells the probability key as 'probabilies'.
'''
scenario = {}

scenario['second_stage'] = {
    'random_variables': ['cap'],
    'conditions': [],
    'values':[2, 4, 6],
    'probabilies':{
        0: [(0,0.5), (1,0.4),(2,0.1)]
    }
}

scenario['third_stage'] = {
    'random_variables': ['d'],
    'conditions': scenario['second_stage']['values'],
    'conditions_prob':[prob for _,prob in scenario['second_stage']['probabilies'][0] ],
    'values': [10, 14, 16, 18, 22],
    'probabilies':{
        0: [(0,0.5), (1,0.4), (3,0.1)],
        1: [(1,0.4), (2,0.3), (3,0.3)],
        2: [(2,0.4), (3,0.4), (4,0.2)]
    }
}
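The conditional tables above can be flattened into joint scenarios `(cap, d, probability)`. The sketch below copies the numbers into plain lists so that it runs on its own; the names `cap_values`, `d_given_cap` and `joint_scenarios` are illustrative and do not appear in the notebook.

```python
# Copy of the scenario data above, so that the sketch is self-contained
cap_values = [2, 4, 6]
cap_probs = [0.5, 0.4, 0.1]
d_values = [10, 14, 16, 18, 22]
d_given_cap = {
    0: [(0, 0.5), (1, 0.4), (3, 0.1)],
    1: [(1, 0.4), (2, 0.3), (3, 0.3)],
    2: [(2, 0.4), (3, 0.4), (4, 0.2)],
}


def joint_scenarios():
    """Return a list of (cap, d, probability) triples."""
    joint = []
    for c_idx, pairs in d_given_cap.items():
        for v_idx, p in pairs:
            # P(cap, d) = P(cap) * P(d | cap)
            joint.append((cap_values[c_idx], d_values[v_idx], cap_probs[c_idx] * p))
    return joint


scenarios = joint_scenarios()
for cap, d, p in scenarios:
    print(cap, d, round(p, 4))
```

Each inner pair `(j, probability)` stores an index into `values`, not the demand itself, which is why the lookup `d_values[v_idx]` is needed.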

#### Benchmark

In [3]:
import numpy as np
import math

# a decision policy induces a reward r.v.
# all variables take integer values here
q = 4
# 22 is the upper bound on the potential demand
# supply <= demand according to the problem setting
# q*cap <= d_max, where (cap,d_max) = (2,18), (4,18), (6,22)

# x = policy_x(q,cap)
# we use a table to represent the policy
x = { q:{cap: min(q*cap,10) for cap in [2,4,6]} for q in range(1,10)}
# policy: the vendor buys as much as possible if the supply is less than 10,
#  otherwise it buys 10 units


def ssd_benchmark_by_q_x( q, x, scenario):
    benchmark = {}
    benchmark['first_stage'] = {
        'r': [-q*0.3],
        's': [1.0]
    }
    
    # tem_num_condition = len(scenario['second_stage']['conditions'])
    benchmark['second_stage'] = {
        'r': [-x[q][cap] for cap in scenario['second_stage']['values']],
        's': [ prob for _,prob in scenario['second_stage']['probabilies'][0]]
    }
    
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    
    benchmark['third_stage'] = {
        'conditions': scenario['third_stage']['conditions'],
        'r': { 
                scenario['third_stage']['conditions'][c_idx]:\
                [ 1.5*min(x[q][conditions[c_idx]],\
                                    values[v_idx]) for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        },
        's': { 
                scenario['third_stage']['conditions'][c_idx]:\
                [ p for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        }
    }
    
    return benchmark
    
print(ssd_benchmark_by_q_x(q, x, scenario))

{'first_stage': {'r': [-1.2], 's': [1.0]}, 'second_stage': {'r': [-8, -10, -10], 's': [0.5, 0.4, 0.1]}, 'third_stage': {'conditions': [2, 4, 6], 'r': {2: [12.0, 12.0, 12.0], 4: [15.0, 15.0, 15.0], 6: [15.0, 15.0, 15.0]}, 's': {2: [0.5, 0.4, 0.1], 4: [0.4, 0.3, 0.3], 6: [0.4, 0.4, 0.2]}}}
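One way to read the printed benchmark is to combine the per-stage rewards `r` and probabilities `s` into a single distribution of total reward. The sketch below does this for the dictionary printed above. Summing the stages into one total is an assumption made for illustration; the algorithm later in the notebook works with the per-stage entries directly. The function name `total_reward_distribution` is hypothetical.

```python
from collections import defaultdict

# Benchmark printed above (q = 4, policy x = min(q*cap, 10))
benchmark = {
    'first_stage': {'r': [-1.2], 's': [1.0]},
    'second_stage': {'r': [-8, -10, -10], 's': [0.5, 0.4, 0.1]},
    'third_stage': {
        'conditions': [2, 4, 6],
        'r': {2: [12.0, 12.0, 12.0], 4: [15.0, 15.0, 15.0], 6: [15.0, 15.0, 15.0]},
        's': {2: [0.5, 0.4, 0.1], 4: [0.4, 0.3, 0.3], 6: [0.4, 0.4, 0.2]},
    },
}


def total_reward_distribution(benchmark):
    """Map each total reward to its probability, merging equal totals."""
    dist = defaultdict(float)
    r1 = benchmark['first_stage']['r'][0]
    third = benchmark['third_stage']
    for i, cap in enumerate(third['conditions']):
        r2 = benchmark['second_stage']['r'][i]
        p2 = benchmark['second_stage']['s'][i]
        for r3, p3 in zip(third['r'][cap], third['s'][cap]):
            # round the key so that float noise does not split equal totals
            dist[round(r1 + r2 + r3, 6)] += p2 * p3
    return dict(dist)


dist = total_reward_distribution(benchmark)
print(dist)
```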


#### Optimal policy of $x$ by solving two-stage subproblems
Given a fixed $q$<br>
The scenario distribution is as given above<br>
No SSD constraints are considered

In [4]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

def second_stage_policy(q, scenario)->(dict,float):
    policy_x = {}
    r_x = {}
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    # one subproblem per second-stage scenario (value of cap)
    for c_idx,list_v_p in scenario['third_stage']['probabilies'].items():
        cap = conditions[c_idx]
        upper_bound = q*cap
        # upper_bound for x
        m = gp.Model(f'second_stage_cap_{cap}')
        m.Params.LogToConsole = 0
        x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
        
        m.addConstr(x<=upper_bound)
        x_d_aug = {}
        
        for v_idx,prob in list_v_p:
            d = values[v_idx]
            tem_x_d_aug = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
            # an auxiliary variable for min(x,d)
            
            x_d_aug[d] = tem_x_d_aug
            # store in the dict 
            
            m.addConstr(tem_x_d_aug<=x)
            m.addConstr(tem_x_d_aug<=d)

        m.setObjective(-x+sum( 1.5*x_d_aug[values[v_idx]]*prob for v_idx, prob in list_v_p) , GRB.MAXIMIZE)
        m.optimize()
        policy_x[cap] = x.X
        r_x[cap] =m.getObjective().getValue()
    # expected reward over the second-stage scenarios (no hard-coded scenario count)
    return policy_x, sum( r_x[conditions[i]]*scenario['third_stage']['conditions_prob'][i] for i in range(len(conditions)))

print('q ||  optimal policy of x || total reward')
for q in range(1,10):
    policy_x, expected_r = second_stage_policy(q, scenario)
    print(q,'||', policy_x, '||',expected_r-0.3*q)

q ||  optimal policy of x || total reward
Restricted license - for non-production use only - expires 2023-10-25
1 || {2: 2.0, 4: 4.0, 6: 6.0} || 1.3000000000000005
2 || {2: 4.0, 4: 8.0, 6: 12.0} || 2.600000000000001
3 || {2: 6.0, 4: 12.0, 6: 16.0} || 3.800000000000001
4 || {2: 8.0, 4: 14.0, 6: 16.0} || 4.400000000000001
5 || {2: 10.0, 4: 14.0, 6: 16.0} || 4.600000000000001
6 || {2: 10.0, 4: 14.0, 6: 16.0} || 4.300000000000002
7 || {2: 10.0, 4: 14.0, 6: 16.0} || 4.000000000000002
8 || {2: 10.0, 4: 14.0, 6: 16.0} || 3.7000000000000015
9 || {2: 10.0, 4: 14.0, 6: 16.0} || 3.4000000000000017


For $q \ge 5$ the optimal policy no longer depends on $q$: the upper bound $q\cdot\mathrm{cap}$ no longer restricts the optimum of any second-stage subproblem, while each additional unit of $q$ still costs 0.3 <br>
Thus, $q=5$ with $x(q,\cdot)$ = { 2->10, 4->14, 6->16 } is the optimal solution when SSD is not considered 
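The table above can be cross-checked without a solver. For a fixed `cap`, the objective $-x + 1.5\,E[\min(x,d)]$ is concave and piecewise linear in $x$, so an optimum lies at 0, at a demand value, or at the bound $q\cdot\mathrm{cap}$. The sketch below enumerates these candidates; it is a verification aid under that observation, not the notebook's method, and the names `stage_reward`, `best_x` and `evaluate` are illustrative.

```python
cap_dist = [(2, 0.5), (4, 0.4), (6, 0.1)]
demand_dist = {
    2: [(10, 0.5), (14, 0.4), (18, 0.1)],
    4: [(14, 0.4), (16, 0.3), (18, 0.3)],
    6: [(16, 0.4), (18, 0.4), (22, 0.2)],
}


def stage_reward(x, dist):
    """Expected second- plus third-stage reward: -x + 1.5*E[min(x, d)]."""
    return -x + sum(p * 1.5 * min(x, d) for d, p in dist)


def best_x(upper_bound, dist):
    """Maximize stage_reward over 0 <= x <= upper_bound by checking breakpoints."""
    candidates = [0.0, float(upper_bound)]
    candidates += [float(d) for d, _ in dist if d <= upper_bound]
    return max(candidates, key=lambda x: stage_reward(x, dist))


def evaluate(q):
    """Return the optimal policy x(cap) and the expected total reward for a given q."""
    policy = {cap: best_x(q * cap, demand_dist[cap]) for cap, _ in cap_dist}
    total = -0.3 * q + sum(
        p * stage_reward(policy[cap], demand_dist[cap]) for cap, p in cap_dist
    )
    return policy, total


for q in range(1, 10):
    print(q, *evaluate(q))
```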

#### Non-SSD Optimal policy as benchmark

In [7]:
q = 4
x = {}
x[q],_ = second_stage_policy(q, scenario)

def ssd_benchmark_by_q_x( q, x, scenario):
    benchmark = {}
    benchmark['first_stage'] = {
        'r': [-q*0.3],
        's': [1.0]
    }
    
    # tem_num_condition = len(scenario['second_stage']['conditions'])
    benchmark['second_stage'] = {
        'r': {cap:-x[q][cap] for cap in scenario['second_stage']['values']},
        's': [ prob for _,prob in scenario['second_stage']['probabilies'][0]]
    }
    
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    
    benchmark['third_stage'] = {
        'conditions': scenario['third_stage']['conditions'],
        'r': { 
                conditions[c_idx]:[ 1.5*min(x[q][conditions[c_idx]],\
                                    values[v_idx]) for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        },
        's': { 
                conditions[c_idx]:[ p for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        }
    }
    return benchmark

optimal_benchmark = ssd_benchmark_by_q_x(q, x, scenario)
print(optimal_benchmark)

{'first_stage': {'r': [-1.2], 's': [1.0]}, 'second_stage': {'r': {2: -8.0, 4: -14.0, 6: -16.0}, 's': [0.5, 0.4, 0.1]}, 'third_stage': {'conditions': [2, 4, 6], 'r': {2: [12.0, 12.0, 12.0], 4: [21.0, 21.0, 21.0], 6: [24.0, 24.0, 24.0]}, 's': {2: [0.5, 0.4, 0.1], 4: [0.4, 0.3, 0.3], 6: [0.4, 0.4, 0.2]}}}


Let's check whether there is a policy better than this benchmark in the SSD (second-order stochastic dominance) sense
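Assuming SSD stands for second-order stochastic dominance, a discrete reward $X$ dominates a discrete benchmark $Y$ when $E[(y_j - X)_+] \le E[(y_j - Y)_+]$ holds at every benchmark realization $y_j$. The right-hand side is the quantity that the code below stores as `w`. The following sketch checks this condition directly; the function names and the tolerance `tol` are illustrative.

```python
def expected_shortfall(target, values, probs):
    """E[(target - X)_+] for a discrete random variable X."""
    return sum(p * max(target - v, 0.0) for v, p in zip(values, probs))


def dominates_ssd(x_vals, x_probs, y_vals, y_probs, tol=1e-9):
    """True if X dominates the benchmark Y in the second order."""
    return all(
        expected_shortfall(t, x_vals, x_probs)
        <= expected_shortfall(t, y_vals, y_probs) + tol
        for t in y_vals  # it suffices to test the benchmark realizations
    )


# Hypothetical rewards: a sure 5 versus a 50/50 gamble between 4 and 6
print(dominates_ssd([5.0], [1.0], [4.0, 6.0], [0.5, 0.5]))
# A wider gamble with the same mean does not dominate the narrower one
print(dominates_ssd([3.0, 7.0], [0.5, 0.5], [4.0, 6.0], [0.5, 0.5]))
```

The first call compares a sure reward with a gamble of the same mean, which a risk-averse decision maker prefers; the second call reverses the direction of the spread.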

In [5]:
# utility functions
import decimal

def round2(c):
    c = decimal.Decimal(c)
    return float(round(c,2))

#### Multi-cut algorithm
We use the above result from the non-SSD optimal policy as the benchmark.

In [13]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

# need to give a bound to q
# we use the above non-SSD optimal solution/policy as the start point
def multicut_three_stage_newsvendor(q_0, z_1_0, scenario, benchmark, max_itr = 100):
    
    # parameters
    # the scenario of second stage list[pair(cap,prob)]
    cap_list = [(scenario['second_stage']['values'][v_idx],prob) \
            for v_idx,prob in scenario['second_stage']['probabilies'][0]]
    
    # the scenario of second stage: dict[cap: prob]
    cap_prob = {c:p for c,p in cap_list}
    
    # the scenario of third stage conditioning on second stage 
    # dict[cap: list[pair(d,prob)]]
    prob_dist = {}
    for c_idx,prob_v in scenario['third_stage']['probabilies'].items():
        prob_dist[scenario['third_stage']['conditions'][c_idx]] =\
                [(scenario['third_stage']['values'][v_idx],prob) for v_idx,prob in prob_v ]
    
    y_1 = benchmark['first_stage']['r'][0]
    
    # rhs of event cuts, w and u
    w = {}
    y = {} 
    for cap,_ in cap_list:
        y[cap] = defaultdict(float)
        for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
            y[cap][r]+=prob

        w[cap] = {}
        for y_3_j in y[cap].keys():
            w[cap][y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y[cap].items())
    
    theta = {}
    for cap,_ in cap_list:
        theta[cap] = benchmark['second_stage']['r'][cap] + \
                        sum(  y_3_i*prob for y_3_i,prob in y[cap].items())
    u = {}
    for cap,_ in cap_list:
        u[cap] =  sum( max(theta[cap] -  theta[s], 0)*cap_prob[s] for s,prob in cap_list)
        
    
    # initialize, step 0
    itr = 0
    
    event_n = 0
    event_list = []
    event_cut = {}
    obj_cut = []
    fs_cut = []
    
    q_ = q_0
    z_1_ = z_1_0
    
    v_ = {}
    for cap,_ in cap_list:
        v_[cap]= float('inf')
    
    # init the master problem
    # we set the gurobi model outside the iteration loop
    master = gp.Model('master problem')
    
    # master.Params.LogToConsole = 0
    q = master.addVar(lb=0,ub=10, vtype = GRB.CONTINUOUS, name = 'q')
    z_1 = master.addVar(lb = -float('inf'), ub = float('inf'),\
                            vtype = GRB.CONTINUOUS, name = 'z_1')
    
    # X_1
    p = master.addConstr(z_1 + 0.3*q<=0)
    
    # variable v for each second stage scenario
    v = {}
    for cap,prob in cap_list:
        v[cap] = master.addVar(lb = -float('inf'), ub = 1000,\
                               vtype = GRB.CONTINUOUS, name = f'v_cap={cap}')

    
    master_obj = z_1 + sum( v[cap]*prob for cap,prob in cap_list )
    master.setObjective(master_obj, GRB.MAXIMIZE)
    
    # cut inequalities are added into master during iterations
    
    while(itr<max_itr):
        reward_second_stage = {}
        all_solvable_flag = True
        for cap,_ in cap_list:
            # each second stage subproblem in scenario cap = cap with q=q_k
            rst_obj = reward_problem_of_first_stage_ssd(q = q_, z_1 = z_1_, \
                                              cap = cap, prob_dist=prob_dist, benchmark=benchmark)
            
            if rst_obj is not None:
                # try objective cuts
                _, _, reward_second_stage[cap], pi_1_,pi_2_ = rst_obj
                tem_obj_cut = master.addConstr( v[cap] <=reward_second_stage[cap] +\
                                               pi_1_*cap*(q-q_) + pi_2_*(z_1_ - z_1),\
                                              name= f'obj_cut_{itr}_{cap}')
                obj_cut.append(tem_obj_cut)
            else:
                # try feasibility cuts
                rst_fs = feasibility_problem_of_first_stage_ssd(q = q_, z_1 = z_1_, \
                                              cap = cap, prob_dist=prob_dist, benchmark=benchmark)
                print(rst_fs)
                if rst_fs=='optimal': 
                    continue
                elif rst_fs=='beyond max_itr':
                    raise RuntimeError(rst_fs + f'in itr case cap = {cap}')
                all_solvable_flag = False
                _, _, fs_obj, pi_1_,pi_2_ = rst_fs
                tem_fs_cut = master.addConstr(  0 >= fs_obj + pi_1_*cap* (q-q_) +\
                                              pi_2_*(z_1 - z_1_), 
                                             name= f'fs_cut_{itr}_{cap}')
                fs_cut.append(tem_fs_cut)
                
        # update (add) event cuts to master problem
        event_n_ = event_n 
        if all_solvable_flag:
            A = {}
            tem_sup = {}
            for cap,_ in cap_list:
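                theta_j = theta[cap]
                # event set: scenarios s whose shifted reward does not exceed theta_j
                # (a separate loop name keeps the outer `cap` from being shadowed)
                A[cap] = [s for s,_ in cap_list if v_[s] + z_1_-y_1<=theta_j]

                # violation of the event cut at benchmark level theta_j
                tem_sup[cap] = sum((theta_j - v_[s]-z_1_+y_1)*cap_prob[s]\
                                       for s in A[cap]) - u[cap]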
                
            tem_sup_ = max(sup for key,sup in tem_sup.items())
            if tem_sup_ > 0:
                tem_sup_ = tem_sup_/2
                for cap,_ in cap_list:
                    if tem_sup[cap]>=tem_sup_:
                        event_cut[cap] = (A[cap],event_n)
                        # add new constraints
                        tot = 0
                        for cap_j in A[cap]:
                            tem_lhs = theta[cap] -v[cap_j] -z_1+y_1
                            tot += cap_prob[cap_j]*tem_lhs
                        master.addConstr(tot<=u[cap],
                                        name= f'event_cut_{itr}_{cap}')
                        event_n +=1

        
        if event_n==event_n_ and all_solvable_flag\
                and all(abs(v_[cap]- reward_second_stage[cap])<0.001\
                                     for cap,_ in cap_list):
            break
        
        
        
        # master solution
        master.optimize()
        master.display()
        # print(master.Status)
        # status 4 is GRB.INF_OR_UNBD (infeasible or unbounded)
        if master.Status==GRB.INF_OR_UNBD:
            return 'unbounded'
        
        # update values of variables
        q_ = round2(q.X)
        z_1_=round2(z_1.X)
        for cap,_ in cap_list:
            v_[cap]= round2(v[cap].X)
        print(itr,'q: ', q_,'z_1: ', z_1_)
            
        # increase k by 1
        itr+=1
    if (itr>=max_itr):
        return "terminated: max iteration"
    else:
        return q_,z_1_

# testing
multicut_three_stage_newsvendor(q_0 = 4, z_1_0 =-1, scenario=scenario, \
                                benchmark=optimal_benchmark, max_itr = 10)

Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (linux64)
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads
Optimize a model with 4 rows, 5 columns and 6 nonzeros
Model fingerprint: 0xee5aba09
Coefficient statistics:
  Matrix range     [3e-01, 1e+00]
  Objective range  [1e-01, 1e+00]
  Bounds range     [1e+01, 1e+03]
  RHS range        [9e-16, 8e+00]
Presolve removed 4 rows and 5 columns
Presolve time: 0.01s
Presolve: All rows and columns removed
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    5.6000000e+00   0.000000e+00   0.000000e+00      0s

Solved in 0 iterations and 0.01 seconds (0.00 work units)
Optimal objective  5.600000000e+00
Maximize
  <gurobi.LinExpr: z_1 + 0.5 v_cap=2 + 0.4 v_cap=4 + 0.1 v_cap=6>
Subject To
  R0: <gurobi.LinExpr: 0.3 q + z_1> <= 0
  obj_cut_0_2: <gurobi.LinExpr: -1.0 q + v_cap=2> <= 8.88178e-16
  obj_cut_0_4: <gurobi.LinExpr: v_cap=4> <= 7
  obj_cut_0_6: <gurobi.LinExpr: v_cap=6> <= 8
Bounds
  0 <= q 

(4.0, -1.2)

##### Solver for Subproblems

To construct the objective cuts and the feasibility cuts, the following two functions solve the corresponding subproblems.
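As a simplified illustration of an objective cut, the sketch below ignores the SSD constraints and the `z_1` term and builds the cut $v \le Q(q_k) + \pi\,\mathrm{cap}\,(q - q_k)$ for one scenario. Here $Q(q)=\max_{0\le x\le \mathrm{cap}\cdot q}\{-x + 1.5\,E[\min(x,d)]\}$, and $\pi$ plays the role of the dual of the constraint $x \le \mathrm{cap}\cdot q$. It is computed by enumeration and a one-sided derivative rather than from Gurobi, so this is an assumption-laden stand-in for the functions below, with hypothetical names.

```python
def q_value_and_slope(q_k, cap, dist):
    """Return Q(q_k) and a supergradient of Q with respect to q."""
    ub = cap * q_k

    def f(x):
        return -x + sum(p * 1.5 * min(x, d) for d, p in dist)

    # concave piecewise-linear f: check the bounds and the demand breakpoints
    candidates = [0.0, float(ub)] + [float(d) for d, _ in dist if d <= ub]
    x_star = max(candidates, key=f)
    # right derivative of f at x_star, clipped at 0 (zero when the bound is not binding)
    pi = max(0.0, -1.0 + 1.5 * sum(p for d, p in dist if d > x_star))
    return f(x_star), pi * cap


def objective_cut(q_k, cap, dist):
    """Return (intercept, slope) of the cut v <= intercept + slope * q."""
    value, slope = q_value_and_slope(q_k, cap, dist)
    return value - slope * q_k, slope


dist_cap2 = [(10, 0.5), (14, 0.4), (18, 0.1)]
dist_cap4 = [(14, 0.4), (16, 0.3), (18, 0.3)]
print(objective_cut(4, 2, dist_cap2))  # cut for cap = 2 built at q_k = 4
print(objective_cut(4, 4, dist_cap4))  # cut for cap = 4 built at q_k = 4
```

Because $Q$ is concave in $q$, a cut built at any $q_k$ bounds $Q$ from above everywhere, which is what allows the master problem to accumulate cuts across iterations.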

In [9]:
# subproblem solving algorithm based on current paper
# use the event cuts in the form

import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

def reward_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, \
                                      prob_dist:dict, benchmark:dict)->(float, dict):
    prob_dist = prob_dist[cap]
    prob_of_d = {d:p for d,p in prob_dist}
    itr = 0
    max_itr = 100
    
    m = gp.Model('reward_problem_two_stage_SSD')
    m.Params.LogToConsole = 0

    x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
    z_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'z_2')
    sigma = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'sigma')
    f_3 = {}
    for d,p in prob_dist:
        f_3_tem = m.addVar(vtype = GRB.CONTINUOUS, name = f'f_3_d={d}')
        f_3[d] = f_3_tem
        m.addConstr( f_3_tem<=1.5*d, name = f'stage_2_{d}' )
        m.addConstr( f_3_tem<=1.5*x, name = f'stage_2_x_{d}' )
    
    obj = z_2+sum(f_3[d]*p for d,p in prob_dist)
    m.setObjective(obj,GRB.MAXIMIZE)
        
    pi_1 = m.addConstr( x<=q*cap, name='cap_limit' )
    m.addConstr( z_2<=-x ,'z_2<=f(x)')
    
    pi_2 = m.addConstr(sigma ==z_1 + z_2 - \
                       benchmark['first_stage']['r'][0] - \
                       benchmark['second_stage']['r'][cap] , name = 'sigma')
    
    # merge identical rewards in the benchmark
    y = defaultdict(float)
    for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
        y[r]+=prob
    
    w = {}
    for y_3_j in y.keys():
        w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())
    
    f_3_ = {}
    for d,_ in prob_dist:
        f_3_[d] = - float('inf')
        
    while( itr< max_itr):
        m.optimize()
        # print(m.display())
        if m.Status!=2:
            print(m, 'status:',m.Status)
            return  None
        
        x_ = x.X
        z_2_ = z_2.X
        sigma_ = sigma.X
        for d,_ in prob_dist:
            f_3_[d] = f_3[d].X
        obj_ = obj.getValue()
        
        all_sat_flag = True
        for y_3_j,w_j in w.items():
            if sum(max(y_3_j-sigma_-f_3_[d_i],0)*p_i for d_i,p_i in prob_dist)>w_j:
                all_sat_flag = False
                A = [d_i for d_i,_ in prob_dist if y_3_j>sigma_+f_3_[d_i]]
                a = sum( (y_3_j-sigma-f_3[d])*prob_of_d[d] for d in A)
                m.addConstr( a<=w_j, name= f'event_cut_{itr}_{y_3_j}_{w_j}')

        if itr!=0 and all_sat_flag:
            pi_1_=pi_1.Pi
            pi_2_=pi_2.Pi
            # print(itr)
            return x_,z_2_,obj_,pi_1_,pi_2_
        itr +=1
    return None

In [253]:
prob_dist = {}
for c_idx,prob_v in scenario['third_stage']['probabilies'].items():
    prob_dist[scenario['third_stage']['conditions'][c_idx]] =\
            [(scenario['third_stage']['values'][v_idx],prob) for v_idx,prob in prob_v ]
# testing 
reward_problem_of_first_stage_ssd(q = 3.99, z_1 = -1.19, cap = 2,\
                                  prob_dist=prob_dist, benchmark=optimal_benchmark)

(7.98, -7.98, 3.99, 0.5, -0.0)
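The event-cut step inside the loop above can be isolated as follows: at the current solution, compute the expected shortfall at a benchmark level `y_j`; if it exceeds `w_j`, collect the demand values that contribute to it. The cut then requires $\sum_{d\in A} p_d\,(y_j - \sigma - f_3(d)) \le w_j$. The sketch uses plain Python with hypothetical numbers, and it adds a small tolerance `tol` that the notebook code does not use.

```python
def violated_event(y_j, w_j, sigma, f3, probs, tol=1e-9):
    """Return the event set A if the shortfall constraint at y_j is violated, else None.

    f3 maps each demand value d to the current third-stage reward,
    probs maps each demand value d to its probability.
    """
    shortfall = sum(p * max(y_j - sigma - f3[d], 0.0) for d, p in probs.items())
    if shortfall <= w_j + tol:
        return None
    # demand values at which the shifted reward falls below the level y_j
    return [d for d in probs if y_j > sigma + f3[d]]


def cut_lhs(A, y_j, sigma, f3, probs):
    """Left-hand side of the event cut evaluated at a given point."""
    return sum(probs[d] * (y_j - sigma - f3[d]) for d in A)


probs = {10: 0.5, 14: 0.4, 18: 0.1}
# hypothetical current solution: the shifted reward 1.0 + 10.5 is below y_j = 12
f3_low = {10: 10.5, 14: 10.5, 18: 10.5}
A = violated_event(12.0, 0.0, 1.0, f3_low, probs)
print(A, cut_lhs(A, 12.0, 1.0, f3_low, probs))

# hypothetical solution that already meets the benchmark level: no cut is needed
f3_ok = {10: 12.0, 14: 12.0, 18: 12.0}
print(violated_event(12.0, 0.0, 0.0, f3_ok, probs))
```

At the point where the cut is generated, its left-hand side equals the violated shortfall, so the new constraint removes the current solution.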

In [236]:

# import gurobipy as gp
# from gurobipy import GRB
# from collections import defaultdict

# def reward_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, prob_dist:dict, benchmark:dict)->(float, dict):
#     prob_dist = prob_dist[cap]
#     m = gp.Model('reward_problem_two_stage_SSD')
#     m.Params.LogToConsole = 0

#     x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
#     z_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
#                        vtype = GRB.CONTINUOUS, name = 'z_2')
#     sigma = m.addVar(lb = -float('inf'), ub = float('inf'),\
#                        vtype = GRB.CONTINUOUS, name = 'sigma')

#     pi_1 = m.addConstr( x<=q*cap )
#     m.addConstr( z_2<=-x )
    
#     pi_2 = m.addConstr(sigma ==z_1 + z_2 - \
#                        benchmark['first_stage']['r'][0] - \
#                        benchmark['second_stage']['r'][cap] )
    
#     # merge identical rewards in the benchmark
#     y = defaultdict(float)
#     for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
#         y[r]+=prob
    
#     w = {}
#     for y_3_j in y.keys():
#         w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())

#     x_d_min = {}
#     for d,p in prob_dist:
#         tem_x_d_min = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
#         # an auxiliary variable for min(x,d)
#         x_d_min[d] = tem_x_d_min
#         # store in the dict 
#         m.addConstr(tem_x_d_min<=d, name = f'min_x_{d}')
#         m.addConstr(tem_x_d_min<=x, name = f'min_x_{d}')

#     x_d_min_0_max = defaultdict(dict)
#     for y_3_j,w_j in w.items():
#         tot = 0
#         for d,p in prob_dist:
# #             lhs_c = m.addVar(lb = -float('inf'), ub = float('inf'),\
# #                                  vtype = GRB.CONTINUOUS,\
# #                                  name = f'({y_3_j}-sigma - Q^{d}_3(x) )')
#             # an auxiliary variable for (y_j-sigma - Q^d_i_2(x))_+
#             # f_X(d_i,x) in this case is 1.5*min(x,d_i) which is 1.5 x_d_aug[d_i]
#             lhs_c_pos = m.addVar(vtype = GRB.CONTINUOUS,\
#                                      name = f'({y_3_j}-sigma - Q^{d}_3(x) )_+')
#             x_d_min_0_max[d][y_3_j] = lhs_c_pos
#             # m.addConstr( lhs_c == y_3_j-sigma - 1.5*x_d_min[d])
#             m.addConstr( lhs_c_pos >= y_3_j-sigma - 1.5*x_d_min[d])
#             tot+=p*lhs_c_pos
#         m.addConstr( tot <= w_j)

#     #   problem.setObjective(z+sum( 1.5*x_d_aug[d]*p+ x_d_aug[d] -x_aug[d] for d,p in prob_dist ), GRB.MAXIMIZE)
#     obj = z_2+sum( 1.5*x_d_min[d]*p for d,p in prob_dist)
#     m.setObjective(obj-sum(sum(tem for _,tem in x_d_min_0_max[d].items() ) \
#                            for d,p in prob_dist) \
#                    +sum(x_d_min[d] for d,p in prob_dist),\
#                    GRB.MAXIMIZE)
#     m.optimize()
#     if m.Status!=2:
#         return None
#     x_ = x.X
#     z_2_ = z_2.X
#     sigma_ = sigma.X
#     obj_ = obj.getValue()
#     pi_1_=pi_1.Pi
#     pi_2_=pi_2.Pi
     
#     return x_, z_2_,obj_,pi_1_,pi_2_

In [228]:
# # testing 
# reward_problem_of_first_stage_ssd(q = 3.9999, z_1 = -1.1999999999999993, cap = 2, \
#                                        prob_dist=prob_dist, benchmark=optimal_benchmark)

<gurobi.Model Continuous instance reward_problem_two_stage_SSD: 10 constrs, 6 vars, Parameter changes: LogToConsole=0> status: 3


#### Subproblem for feasibility cuts

In [10]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

def feasibility_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, \
                                      prob_dist:dict, benchmark:dict)->(float, dict):
    prob_dist = prob_dist[cap]
    prob_of_d = {d:p for d,p in prob_dist}
    itr = 0
    max_itr = 100
    
    m = gp.Model('feasibility_problem_two_stage_SSD')
    m.Params.LogToConsole = 0

    x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
    z_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'z_2')
    sigma = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'sigma')
    
    u_1 = m.addVar(lb = -float('inf'), ub = float('inf'), \
                       vtype = GRB.CONTINUOUS, name = 'u_1')
    u_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'u_2')
    u_1_abs = m.addVar(vtype = GRB.CONTINUOUS, name = 'u_1_abs')
    u_2_abs = m.addVar(vtype = GRB.CONTINUOUS, name = 'u_2_abs')
    
    m.addConstr( u_1_abs >= u_1)
    m.addConstr( u_2_abs >= u_2)
    m.addConstr( u_1_abs >= -u_1)
    m.addConstr( u_2_abs >= -u_2)
    
    f_3 = {}
    for d,p in prob_dist:
        f_3_tem = m.addVar(vtype = GRB.CONTINUOUS, name = f'f_3_d={d}')
        f_3[d] = f_3_tem
        m.addConstr( f_3_tem<=1.5*d, name = f'stage_2_{d}' )
        m.addConstr( f_3_tem<=1.5*x, name = f'stage_2_x_{d}' )
    
    obj = u_1_abs + u_2_abs
    m.setObjective(obj,GRB.MINIMIZE)
        
    pi_1 = m.addConstr( x + u_1 - q*cap<=0, name = 'pi_1' )
    m.addConstr( z_2+x<=0 )
    
    pi_2 = m.addConstr(sigma + u_2 - z_1 - z_2 + benchmark['first_stage']['r'][0]\
                           + benchmark['second_stage']['r'][cap] == 0, \
               name = 'pi_2')
    
    # merge identical rewards in the benchmark
    y = defaultdict(float)
    for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
        y[r]+=prob
    
    w = {}
    for y_3_j in y.keys():
        w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())
    
    f_3_ = {}
    for d,_ in prob_dist:
        f_3_[d] = - float('inf')
        
    while( itr< max_itr):
        m.optimize()
        # print(m.display())
        if m.Status!=2:
            print(m, 'status:',m.Status)
            return  None
        
        x_ = x.X
        z_2_ = z_2.X
        sigma_ = sigma.X
        for d,_ in prob_dist:
            f_3_[d] = f_3[d].X
        obj_ = obj.getValue()
        
        all_sat_flag = True
        for y_3_j,w_j in w.items():
            if sum(max(y_3_j-sigma_-f_3_[d_i],0)*p_i for d_i,p_i in prob_dist)>w_j:
                all_sat_flag = False
                A = [d_i for d_i,_ in prob_dist if y_3_j>sigma_+f_3_[d_i]]
                a = sum( (y_3_j-sigma-f_3[d])*prob_of_d[d] for d in A)
                m.addConstr( a<=w_j, name= f'event_cut_{itr}_{y_3_j}_{w_j}')

        if itr!=0 and all_sat_flag:
            pi_1_=pi_1.Pi
            pi_2_=pi_2.Pi
            # print(f'iteration times: {itr+1}')
            return x_,z_2_, obj_,pi_1_,pi_2_
        itr +=1
    
    return 'beyond max_itr'

In [255]:
# test the new function
feasibility_problem_of_first_stage_ssd(q = 3.9999, z_1 = -1.1999999999999993, cap = 2, \
                                       prob_dist=prob_dist, benchmark=optimal_benchmark)

'optimal'

In [232]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

def feasibility_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, prob_dist:dict, benchmark:dict)->(float, dict):
    prob_dist = prob_dist[cap]
    m = gp.Model('reward_problem_two_stage_SSD')
    m.Params.LogToConsole = 0

    x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
    z_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'z_2')
    sigma = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'sigma')
    
    u_1 = m.addVar(lb = -float('inf'), ub = float('inf'), \
                       vtype = GRB.CONTINUOUS, name = 'u_1')
    u_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'u_2')
    u_1_abs = m.addVar(vtype = GRB.CONTINUOUS, name = 'u_1_abs')
    u_2_abs = m.addVar(vtype = GRB.CONTINUOUS, name = 'u_2_abs')

    m.addConstr( u_1_abs >= u_1)
    m.addConstr( u_2_abs >= u_2)
    m.addConstr( u_1_abs >= -u_1)
    m.addConstr( u_2_abs >= -u_2)
    
    pi_1 = m.addConstr( x + u_1 - q*cap<=0, name = 'pi_1' )
    m.addConstr( z_2+x<=0 )
    
    pi_2 = m.addConstr(sigma + u_2 - z_1 - z_2 + benchmark['first_stage']['r'][0]\
                           + benchmark['second_stage']['r'][cap] == 0, \
               name = 'pi_2')
    
    # merge identical rewards in the benchmark
    y = defaultdict(float)
    for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
        y[r]+=prob
    
    w = {}
    for y_3_j in y.keys():
        w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())

    x_d_min = {}
    for d,p in prob_dist:
        tem_x_d_min = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
        # an auxiliary variable for min(x,d)
        x_d_min[d] = tem_x_d_min
        # store in the dict 
        m.addConstr(tem_x_d_min<=d, name = f'min_x_{d}')
        m.addConstr(tem_x_d_min<=x, name = f'min_x_{d}')

    x_d_min_0_max = defaultdict(dict)
    for y_3_j,w_j in w.items():
        tot = 0
        for d,p in prob_dist:
#             lhs_c = m.addVar(lb = -float('inf'), ub = float('inf'),\
#                                  vtype = GRB.CONTINUOUS,\
#                                  name = f'({y_3_j}-sigma - Q^{d}_3(x) )')
            # an auxiliary variable for (y_j-sigma - Q^d_i_2(x))_+
            # f_X(d_i,x) in this case is 1.5*min(x,d_i) which is 1.5 x_d_aug[d_i]
            lhs_c_pos = m.addVar(vtype = GRB.CONTINUOUS,\
                                     name = f'({y_3_j}-sigma - Q^{d}_3(x) )_+')
            x_d_min_0_max[d][y_3_j] = lhs_c_pos
            # m.addConstr( lhs_c == y_3_j-sigma - 1.5*x_d_min[d])
            m.addConstr( lhs_c_pos >= y_3_j-sigma - 1.5*x_d_min[d])
            tot+=p*lhs_c_pos
        m.addConstr( tot <= w_j)

    #   problem.setObjective(z+sum( 1.5*x_d_aug[d]*p+ x_d_aug[d] -x_aug[d] for d,p in prob_dist ), GRB.MAXIMIZE)
    obj = u_1_abs + u_2_abs
    m.setObjective(obj+sum(sum(tem for _,tem in x_d_min_0_max[d].items() ) \
                           for d,p in prob_dist) \
                   -sum(x_d_min[d] for d,p in prob_dist),\
                   GRB.MINIMIZE)
    m.optimize()
    if m.Status!=2:
        print('no solution')
        return None
    x_ = x.X
    z_2_ = z_2.X
    sigma_ = sigma.X
    obj_ = obj.getValue()
    pi_1_=pi_1.Pi
    pi_2_=pi_2.Pi
     
    return x_, z_2_,sigma_,obj_,pi_1_,pi_2_

In [225]:
# test the new function
feasibility_problem_of_first_stage_ssd(q = 3.999999999999998, z_1 = -1.199999, cap = 2, \
                                       prob_dist=prob_dist, benchmark=optimal_benchmark)

(7.999999999999996, -7.999999999999996, 0.0, -0.0, -0.0)

In [384]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

def feasibility_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, prob_dist:dict, benchmark:dict)->(float, dict):

    prob_dist = prob_dist[cap]
    print(prob_dist)
    m = gp.Model('feasibility_problem')
    # m.params.Method = 4
    # m.Params.LogToConsole = 0
    
    x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
    z_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'z_2')
    sigma = m.addVar(lb = -float('inf'), ub = float('inf'),\
                         vtype = GRB.CONTINUOUS, name = 'sigma')
    u_1 = m.addVar(lb = -float('inf'), ub = float('inf'), \
                       vtype = GRB.CONTINUOUS, name = 'u_1')
    u_2 = m.addVar(lb = -float('inf'), ub = float('inf'),\
                       vtype = GRB.CONTINUOUS, name = 'u_2')
    u_1_abs = m.addVar(vtype = GRB.CONTINUOUS, name = 'u_1_abs')
    u_2_abs = m.addVar(vtype = GRB.CONTINUOUS, name = 'u_2_abs')
    
    m.addConstr( u_1_abs == gp.abs_(u_1))
    m.addConstr( u_2_abs == gp.abs_(u_2))
    
    m.addConstr( x + u_1 - q*cap<=0, name = 'pi_1' )
    m.addConstr( z_2+x<=0 )
    
    m.addConstr(sigma + u_2 - z_1 - z_2 + benchmark['first_stage']['r'][0]\
                           + benchmark['second_stage']['r'][cap] == 0, \
               name = 'pi_2')

    # merge identical rewards in the benchmark (sum their probabilities)
    y = defaultdict(float)
    for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
        y[r]+=prob
    
    w = {}
    for y_3_j in y.keys():
        w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())
    
    print(y)
    print(w)
    print(benchmark['first_stage']['r'][0])
    print(benchmark['second_stage']['r'][cap])

    x_d_min = {}
    for d,p in prob_dist:
        tem_x_d_min = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
        # an auxiliary variable for min(x,d)
        x_d_min[d] = tem_x_d_min
        # store in the dict 
        m.addConstr(tem_x_d_min== gp.min_(x,d), name = f'min_x_{d}')

    x_d_min_0_max = defaultdict(dict)
    for y_3_j,w_j in w.items():
        tot = 0
        for d,p in prob_dist:
            lhs_c = m.addVar(lb = -float('inf'), ub = float('inf'),\
                                 vtype = GRB.CONTINUOUS,\
                                 name = f'({y_3_j}-sigma - Q^{d}_3(x) )')
            # an auxiliary variable for (y_j-sigma - Q^d_i_2(x))_+
            # f_X(d_i,x) in this case is 1.5*min(x,d_i) which is 1.5 x_d_aug[d_i]
            lhs_c_pos = m.addVar(vtype = GRB.CONTINUOUS,\
                                     name = f'({y_3_j}-sigma - Q^{d}_3(x) )_+')
            x_d_min_0_max[d][y_3_j] = lhs_c_pos
            m.addConstr( lhs_c == y_3_j-sigma - 1.5*x_d_min[d])
            m.addConstr( lhs_c_pos == gp.max_(lhs_c,0))
            tot+=p*lhs_c_pos
        m.addConstr( tot <= w_j)
    
    #   problem.setObjective(z+sum( 1.5*x_d_aug[d]*p+ x_d_aug[d] -x_aug[d] for d,p in prob_dist ), GRB.MAXIMIZE)
    obj = u_1_abs + u_2_abs
    m.setObjective(obj, GRB.MINIMIZE)
    m.optimize()
  
    
    x_ = x.X
    z_2_ = z_2.X
    sigma_ = sigma.X
    obj_ = obj.getValue()
    
    return x_,z_2_,sigma_,obj_
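
The right-hand sides `w` used in the functions above are the expected shortfalls of the benchmark, $w_j = \sum_i s_i\,(y_j - y_i)_+$, computed after identical benchmark rewards are merged. A minimal standalone sketch of that step follows; the helper name `shortfall_thresholds` is an assumption introduced here, and the input is the third-stage benchmark for `cap = 4` as it appears in the printed benchmark further down.

```python
from collections import defaultdict

def shortfall_thresholds(rewards, probs):
    """Merge identical benchmark rewards and return (y, w).

    y maps each distinct reward to its total probability;
    w maps each distinct reward y_j to sum_i s_i * max(y_j - y_i, 0).
    """
    y = defaultdict(float)
    for r, p in zip(rewards, probs):
        y[r] += p  # identical rewards share one entry
    w = {y_j: sum(max(y_j - y_i, 0) * p for y_i, p in y.items()) for y_j in y}
    return dict(y), w

# third-stage benchmark rewards and probabilities for cap = 4
y, w = shortfall_thresholds([21.0, 24.0, 24.0], [0.4, 0.3, 0.3])
print(y)
print(w)
```

Only the distinct reward levels generate constraints, so merging duplicates reduces the number of shortfall constraints without changing the feasible set.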


In [458]:
import cplex
from collections import defaultdict
from docplex import mp
from docplex.mp.model import Model


def reward_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, prob_dist:dict, benchmark:dict)->(float, dict):
    n = 0
    prob_dist = prob_dist[cap]
    m = Model(name='reward_problem_two_stage_SSD')
    #m.Params.LogToConsole = 0

    x = m.continuous_var(name = 'x')
    z_2 = m.continuous_var(lb = -cplex.infinity, ub = cplex.infinity, name = 'z_2')
    sigma = m.continuous_var(lb = -cplex.infinity, ub = cplex.infinity, name = 'sigma')

    m.add_constraint( x<=q*cap, ctname = 'pi_1' )
    m.add_constraint( z_2<=-x )
    
    m.add_constraint(sigma ==z_1 + z_2 - \
                       benchmark['first_stage']['r'][0] - \
                       benchmark['second_stage']['r'][cap], ctname = 'pi_2')
    
    # merge identical rewards in the benchmark (sum their probabilities)
    y = defaultdict(float)
    for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
        y[r]+=prob
    
    w = {}
    for y_3_j in y.keys():
        w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())

    x_d_min = {}
    for d,p in prob_dist:
        x_d_min[d] = m.min(x,d)
        n+=1
        # store in the dict 

    x_d_min_0_max = defaultdict(dict)
    for y_3_j,w_j in w.items():
        tot = 0
        for d,p in prob_dist:
            lhs_c = y_3_j-sigma - 1.5*x_d_min[d]
            lhs_c_pos = m.max(lhs_c,0)
            n+=1
            x_d_min_0_max[d][y_3_j] = lhs_c_pos
            
            tot+=p*lhs_c_pos
        m.add_constraint( tot <= w_j)

    #   problem.setObjective(z+sum( 1.5*x_d_aug[d]*p+ x_d_aug[d] -x_aug[d] for d,p in prob_dist ), GRB.MAXIMIZE)
    obj = z_2+sum( 1.5*x_d_min[d]*p for d,p in prob_dist)
    m.maximize(obj)
    print(n)
    m.print_information()
    m.solve()
    
    if m.solution is None:
        return None
    x_ = x.solution_value
    z_2_ = z_2.solution_value
    sigma_ = sigma.solution_value
    obj_ = obj.solution_value
    
    #print(m.dual_values('pi_1'))
     
    return x_, z_2_,sigma_,obj_


AttributeError: module 'docplex.mp' has no attribute 'SolveDetails'
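
Independently of the solver, the shortfall constraints can be checked for any candidate `(x, sigma)` in plain Python, which helps when a solver call fails as above. The sketch below is a hedged illustration: the function name `ssd_violations` and the tolerance `tol` are assumptions, the reward `1.5*min(x,d)` is the one used in the models above, and the demand distribution is the `cap = 4` row of the scenario data.

```python
def ssd_violations(x, sigma, demand_dist, y, tol=1e-9):
    """Return the benchmark levels y_j whose shortfall constraint fails.

    demand_dist: list of (demand, probability) pairs
    y: dict mapping distinct benchmark rewards to probabilities
    """
    violated = []
    for y_j in y:
        # benchmark expected shortfall at level y_j
        w_j = sum(max(y_j - y_i, 0) * p for y_i, p in y.items())
        # candidate expected shortfall at the same level
        lhs = sum(p * max(y_j - sigma - 1.5 * min(x, d), 0) for d, p in demand_dist)
        if lhs > w_j + tol:
            violated.append(y_j)
    return violated

demand_dist = [(14, 0.4), (16, 0.3), (18, 0.3)]  # demand given cap = 4
y = {21.0: 0.4, 24.0: 0.6}                       # merged benchmark for cap = 4

ok = ssd_violations(16, 0.0, demand_dist, y)   # same rewards as the benchmark
bad = ssd_violations(10, 0.0, demand_dist, y)  # rewards are 15 in every scenario
print(ok, bad)
```

An empty list means the candidate satisfies every shortfall constraint at the given `sigma`.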

In [None]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

def multicut_three_stage_newsvendor(scenario, benchmark):
    # initialize, step 0
    event_cut = []
    obj_cut = []
    fs_cut = []
    
    m_0 = gp.Model('initial')
    
    q = m_0.addVar(vtype = GRB.CONTINUOUS, name = 'q')
    z_1 = m_0.addVar(vtype = GRB.CONTINUOUS, name = 'z_1')
    
    # each scenario of Q_2 requires a system of auxiliary variables 
    for in :
        tem_Q_2 = m_0.addVar(vtype = GRB.CONTINUOUS, name = 'Q_2'+f'cap={cap}')
        tem_x = m_0.addVar(vtype = GRB.CONTINUOUS, name = 'x'+f'cap={cap}')
        m_0.addConstr(tem_x<=cap*q)
        for in :
            tem_x_d_min = m_0.addVar(vtype = GRB.CONTINUOUS, name = 'x'+f'cap={cap}')
        m_0.addConstr(tem_Q_2==cap*q)
    
    
    cap_prob_dist = [(scenario['second_stage']['values'][v_i], p)
                     for v_i, p in scenario['second_stage']['probabilies'][0]]
    
    for cap,_ in cap_prob_dist:
        tem_qxcap_x = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
        # an auxiliary variable for min(x,d)
    x_d_aug[d] = tem_x_d_aug
    # store in the dict 

    m.addConstr(tem_x_d_aug<=x)
    
    
    
    m.setObjective(-0.3*q+sum( -x+reward_from_3 for cap,prob in cap_prob_dist) , GRB.MAXIMIZE)
    # z = m.addVar(vtype = GRB.CONTINUOUS, name = 'z')
    sigma = m.addVar(vtype = GRB.CONTINUOUS, name = 'sigma')
    obj = m.addVar(vtype = GRB.CONTINUOUS, name = 'obj_v')

In [None]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

max_itr = 100

# we use the above non-SSD optimal solution/policy as the start point
def multicut_three_stage_newsvendor(q_0, x_0, scenario, benchmark):
    
    prob_dist = {}
    for c_idx,prob_v in scenario['third_stage']['probabilies'].items():
        prob_dist[scenario['third_stage']['conditions'][c_idx]] =\
                [(scenario['third_stage']['values'][v_idx],prob) for v_idx,prob in prob_v ]
    
    # initialize, step 0
    itr = 0
    event_cut = []
    obj_cut = []
    fs_cut = []
    
    q_ = q_0
    x_ = x_0
    
    # init the master problem
    master = gp.Model('master problem')
    q = master.addVar(vtype = GRB.CONTINUOUS, name = 'q')
    z_1 = master.addVar(vtype = GRB.CONTINUOUS, name = 'z_1')
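    v = []
    cap_prob = []
    # (value, probability) pairs of the second-stage random variable cap
    cap_prob_dist = [(scenario['second_stage']['values'][v_i], p)
                     for v_i, p in scenario['second_stage']['probabilies'][0]]
    n_cap = len(cap_prob_dist)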
    for cap,prob in cap_prob_dist:
        tem_v = master.addVar(vtype = GRB.CONTINUOUS, name = f'v_cap={cap}')
        v.append(tem_v)
        cap_prob.append(prob)
    
    master.addConstr( z_1 <= -0.3*q)
    
    master_obj = z_1 + sum( v[i]*cap_prob[i] for i in range(n_cap) )
    
    # cut inequalities are added into master during iterations
    
    while(itr<max_itr): 
        # Step 1
        for cap,_ in cap_prob_dist:
            # each second stage subproblem in scenario cap = cap with q=q_k
            tem_m_2 = gp.Model('second stage subproblem')
            tem_x = tem_m_2.addVar(vtype = GRB.CONTINUOUS, name = 'x')
            tem_sigma = tem_m_2.addVar(vtype = GRB.CONTINUOUS, name = 'sigma')

            tem_m_2.addConstr(tem_sigma == -tem_x - benchmark['first_stage']['r'][0])

            # q is fixed at the current iterate q_ (q_k in the comment above)
            tem_m_2.addConstr(tem_x <= cap*q_)

            # merge identical benchmark rewards (sum their probabilities)
            y = defaultdict(float)
            for r, prob in zip(benchmark['second_stage']['r'].values(), benchmark['second_stage']['s']):
                y[r] += prob
            w = {}
            for y_j in y.keys():
                w[y_j] = sum(max(y_j - y_i, 0)*prob for y_i, prob in y.items())


            x_d_aug = {}

            # demand distribution conditional on the current scenario cap
            for d,p in prob_dist[cap]:
                tem_x_d_aug = tem_m_2.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
                # an auxiliary variable for min(x,d)
                x_d_aug[d] = tem_x_d_aug
                # store in the dict 

                tem_m_2.addConstr(tem_x_d_aug<=tem_x)
                tem_m_2.addConstr(tem_x_d_aug<=d)

            aug_f = defaultdict(dict)
            for y_j,w_j in w.items():
                tot = 0
                for d,p in prob_dist[cap]:
                    tem_aug_f = tem_m_2.addVar(vtype = GRB.CONTINUOUS, name = f'({y_j}-sigma - Q^{d}_2(x))_+')
                    # an auxiliary variable for (y_j-sigma - Q^d_i_2(x))_+
                    # f_X(d_i,x) in this case is 1.5*min(x,d_i) which is 1.5 x_d_aug[d_i]
                    aug_f[d][y_j] = tem_aug_f
                    tem_m_2.addConstr( y_j-tem_sigma - 1.5*x_d_aug[d]<=tem_aug_f)
                    tem_m_2.addConstr(tem_aug_f>= 0)
                    tot+=p*tem_aug_f
                tem_m_2.addConstr( tot <= w_j)

            #   problem.setObjective(z+sum( 1.5*x_d_aug[d]*p+ x_d_aug[d] -x_aug[d] for d,p in prob_dist ), GRB.MAXIMIZE)
            # expected second-stage reward, kept as an expression
            tem_obj = -tem_x+sum( 1.5*x_d_aug[d]*p for d,p in prob_dist[cap])
            tem_m_2.setObjective(-tem_x+sum( 1.5*x_d_aug[d]*p-sum(tem for _,tem in aug_f[d].items() ) for d,p in prob_dist[cap]) , GRB.MAXIMIZE)
            tem_m_2.optimize()

            if tem_m_2.:
                # try objective cuts
                pass
            else:
                # try feasibility cuts
                pass
        
        # update event cuts

        # add cuts into master problem

        # master solution
        
        master.optimize()
        v_ = { for cap in }
        
        if all(abs(v_[cap]- reward_second_stage[cap])<0.001 for cap in cap_list):
            break

        
        
        
    

True
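
The subproblem drafts in this notebook replace $\min(x,d)$ and the positive part $(\cdot)_+$ with auxiliary variables bounded by linear inequalities. A short sketch of why this is a valid reformulation, writing $t_d$ for `x_d_aug[d]` and $s_{j,d}$ for `aug_f[d][y_j]` (these symbols are introduced here only for illustration):

$$
\begin{aligned}
& t_d \le x, \quad t_d \le d && \text{for every demand value } d,\\
& s_{j,d} \ge y_j - \sigma - 1.5\,t_d, \quad s_{j,d} \ge 0 && \text{for every } j, d,\\
& \sum_d p_d\, s_{j,d} \le w_j && \text{for every } j.
\end{aligned}
$$

Since $t_d \le \min(x,d)$, any feasible $(t, s)$ gives $(y_j - \sigma - 1.5\min(x,d))_+ \le (y_j - \sigma - 1.5\,t_d)_+ \le s_{j,d}$, so the original constraint $\sum_d p_d\,(y_j - \sigma - 1.5\min(x,d))_+ \le w_j$ holds. Conversely, choosing $t_d = \min(x,d)$ and $s_{j,d} = (y_j - \sigma - 1.5\,t_d)_+$ is feasible whenever the original constraint holds, so both formulations admit the same $(x, \sigma)$. Under maximization, the positive objective coefficient $1.5\,p_d$ also pushes $t_d$ up to $\min(x,d)$ unless a tie makes a smaller value equally good.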

In [472]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

def reward_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, prob_dist:dict, benchmark:dict)->(float, dict):
    itr =0
    prob_dist = prob_dist[cap]
    m = gp.Model('reward_problem_two_stage_SSD')
    # m.Params.LogToConsole = 0
    
    x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
    z_2 = m.addVar(vtype = GRB.CONTINUOUS, name = 'z_2')
    sigma = m.addVar(vtype = GRB.CONTINUOUS, name = 'sigma')
    
    
    m.addConstr( x<=q*cap )
    m.addConstr( -z_2<=-x )
    
    m.addConstr(sigma ==z_1 + z_2 - benchmark['first_stage']['r'][0] - benchmark['second_stage']['r'][cap] )

    y = defaultdict(float)
    for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
        y[r]+=prob
    
    w = {}
    for y_3_j in y.keys():
        w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())
    
    
    x_d_min = {}
    for d,p in prob_dist:
        tem_x_d_min = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
        # an auxiliary variable for min(x,d)
        x_d_min[d] = tem_x_d_min
        # store in the dict 

        m.addConstr(tem_x_d_min<=x)
        m.addConstr(tem_x_d_min<=d)
        
    x_d_min_0_max = defaultdict(dict)
    for y_3_j,w_j in w.items():
        tot = 0
        for d,p in prob_dist:
            tem_x_d_min_0_max = m.addVar(vtype = GRB.CONTINUOUS, name = f'({y_3_j}-sigma - Q^{d}_3(x) )_+')
            # an auxiliary variable for (y_j-sigma - Q^d_i_2(x))_+
            # f_X(d_i,x) in this case is 1.5*min(x,d_i) which is 1.5 x_d_aug[d_i]
            x_d_min_0_max[d][y_3_j] = tem_x_d_min_0_max
            m.addConstr( y_3_j-sigma - 1.5*x_d_min[d]<=tem_x_d_min_0_max)
            m.addConstr(tem_x_d_min_0_max>= 0)
            tot+=p*tem_x_d_min_0_max
        m.addConstr( tot <= w_j)
    
    #   problem.setObjective(z+sum( 1.5*x_d_aug[d]*p+ x_d_aug[d] -x_aug[d] for d,p in prob_dist ), GRB.MAXIMIZE)
    obj = -x+sum( 1.5*x_d_min[d]*p for d,p in prob_dist)
    m.setObjective(-z_2+sum( 1.5*x_d_min[d]*p-sum(tem for _,tem in x_d_min_0_max[d].items() ) for d,p in prob_dist) ,\
                   GRB.MAXIMIZE)
    m.optimize()

    x_ = x.X
    z_2_ = z_2.X
    sigma_ = sigma.X
    obj_ = obj.getValue()
    
    #for c in m.getConstrs():
        #print(c.Pi)
    
#     while(itr<max_itr):
        
        
#         rewards = [ 1.5*min(x_,d) for d,_ in prob_dist]
#         new_events = []
#         for j,(y_j, w_j) in enumerate(zip(self.y_3,self.w)): # this for can be parallelism
#             if self.p*((y_j - s)*np.ones(self.n) - rewards) <= w_j:
#                 continue
#             event = [ i for i in range(L) if y_j -s > rewards[i]] 
#             new_events.append(event)
#         if not new_events:
#             break
#         else:
#             # add new event cuts as constr
            
#             problem.optimize()
#             # update x,z,sigma
#             x_ = x.X
#             z_ = z.X
#             sigma_ = sigma.X
#             itr+=1
    return x_,-z_2_,sigma_,obj_

############################################################
scenario = {}

scenario['second_stage'] = {
    'random_variables': ['cap'],
    'conditions': [],
    'values':[2, 4, 6],
    'probabilies':{
        0: [(0,0.5), (1,0.4),(2,0.1)]
    }
}

scenario['third_stage'] = {
    'random_variables': ['d'],
    'conditions': scenario['second_stage']['values'],
    'conditions_prob':[prob for _,prob in scenario['second_stage']['probabilies'][0] ],
    'values': [10, 14, 16, 18, 22],
    'probabilies':{
        0: [(0,0.5), (1,0.4), (3,0.1)],
        1: [(1,0.4), (2,0.3), (3,0.3)],
        2: [(2,0.4), (3,0.4), (4,0.2)]
    }
}

prob_dist = {}
for c_idx,prob_v in scenario['third_stage']['probabilies'].items():
    prob_dist[scenario['third_stage']['conditions'][c_idx]] =\
            [(scenario['third_stage']['values'][v_idx],prob) for v_idx,prob in prob_v ]

def ssd_benchmark_by_q_x( q, x, scenario):
    benchmark = {}
    benchmark['first_stage'] = {
        'r': [-q*0.3],
        's': [1.0]
    }
    
    # tem_num_condition = len(scenario['second_stage']['conditions'])
    benchmark['second_stage'] = {
        'r': {cap:-x[q][cap] for cap in scenario['second_stage']['values']},
        's': [ prob for _,prob in scenario['second_stage']['probabilies'][0]]
    }
    
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    
    benchmark['third_stage'] = {
        'conditions': scenario['third_stage']['conditions'],
        'r': { 
                conditions[c_idx]:[ 1.5*min(q*conditions[c_idx],\
                                    values[v_idx]) for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        },
        's': { 
                conditions[c_idx]:[ p for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        }
    }
    return benchmark

q = 4
x = { q:{cap: min(q*cap,10) for cap in [2,4,6]} for q in range(1,10)}
# policy: the agent buys as much as possible when the supply is below 10 units,
#  otherwise it buys 10 units
benchmark = ssd_benchmark_by_q_x( q, x, scenario)

print(benchmark)

reward_problem_of_first_stage_ssd(q = 4, z_1 = -1.2, cap = 4, prob_dist=prob_dist, benchmark=benchmark)

{'first_stage': {'r': [-1.2], 's': [1.0]}, 'second_stage': {'r': {2: -8, 4: -10, 6: -10}, 's': [0.5, 0.4, 0.1]}, 'third_stage': {'conditions': [2, 4, 6], 'r': {2: [12.0, 12.0, 12.0], 4: [21.0, 24.0, 24.0], 6: [24.0, 27.0, 33.0]}, 's': {2: [0.5, 0.4, 0.1], 4: [0.4, 0.3, 0.3], 6: [0.4, 0.4, 0.2]}}}
Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (linux64)
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads
Optimize a model with 23 rows, 12 columns and 44 nonzeros
Model fingerprint: 0x81e109b4
Coefficient statistics:
  Matrix range     [3e-01, 2e+00]
  Objective range  [4e-01, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 2e+01]
Presolve removed 15 rows and 6 columns
Presolve time: 0.01s
Presolved: 8 rows, 6 columns, 20 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    7.0000000e+00   0.000000e+00   0.000000e+00      0s
       0    7.0000000e+00   0.000000e+00   0.000000e+00      0s

Solved in 0 iterations 

(14.0, -14.0, 24.0, 7.0)
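
The last entry of the returned tuple is `obj = -x + sum(1.5*min(x,d)*p)`, which can be reproduced by hand as a sanity check. A small sketch, assuming the `cap = 4` demand distribution from the scenario data above; the function name `expected_reward` and its `price` and `cost` parameters are introduced here only for illustration.

```python
def expected_reward(x, demand_dist, price=1.5, cost=1.0):
    """Expected profit of buying x units: sales revenue minus purchase cost."""
    return -cost * x + sum(price * min(x, d) * p for d, p in demand_dist)

demand_cap4 = [(14, 0.4), (16, 0.3), (18, 0.3)]  # demand given cap = 4
value = expected_reward(14.0, demand_cap4)
print(value)
```

With `x = 14` every demand value is at least 14, so all 14 units are sold in each scenario and the value is 1.5*14 - 14, which agrees with the 7.0 reported by the solver up to floating-point rounding.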