### Problem setting
We consider a three-stage newsvendor problem.
1) First stage: the agent requires a certain number of units of fixed investment from the vendor <br>
&emsp;variable:<br>
&emsp;&emsp; &emsp;$ q\in (0,10)$ - units of investment<br>
&emsp;coefficients:<br>
&emsp;&emsp; &emsp; 0.3 - cost per unit of investment<br>

2) Second stage: the vendor buys newspapers from the agent; the available quantity depends on the investment <br>
&emsp;  variable:<br>
&emsp; &emsp; &emsp; $x \in (0,\infty)$ - units of newspaper<br>
&emsp; coefficients:<br>
&emsp; &emsp; &emsp; 1 - cost per unit of newspaper<br>
&emsp; random variable:<br>
&emsp; &emsp; &emsp; $\mathrm{cap}$ - the available buying quantity is bounded above by $\mathrm{cap} \cdot q$

3) Third stage: the vendor sells newspapers to individuals<br>
&emsp;  variables:<br>
&emsp; &emsp; &emsp; $z \in (0,x)$ - quantity of newspapers sold<br>
&emsp;  coefficients:<br>
&emsp; &emsp; &emsp; 1.5 - revenue per unit sold (the value used in the code below)<br>
&emsp;  random variables:<br>
&emsp; &emsp; &emsp; $d \in (0,\infty)$ - quantity demanded<br> 
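
To make the reward structure concrete, the sketch below evaluates the profit along a single scenario path. The selling price of 1.5 per unit is taken from the code later in this notebook; the function name `realized_profit` and its keyword arguments are hypothetical and only serve this illustration.

```python
def realized_profit(q, x, cap, d, invest_cost=0.3, buy_cost=1.0, price=1.5):
    """Profit along one scenario path (q, cap, x, d) of the three-stage problem."""
    if not 0 <= x <= cap * q:
        raise ValueError("x must lie in [0, cap*q]")
    z = min(x, d)  # the vendor cannot sell more than was bought or demanded
    return -invest_cost * q - buy_cost * x + price * z

# q = 5 units invested, cap = 2 allows up to 10 papers, demand 14 exceeds supply
print(realized_profit(q=5, x=10, cap=2, d=14))
```

The three terms are the investment cost, the purchase cost and the sales revenue, one per stage.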

#### Stage Scenario Data

In [27]:
'''
Template of the scenario data structure.
Let x    - first-stage r.v.,
    y    - second-stage r.v.,
    z, d - third-stage r.v.

scenario['third_stage'] = {
    'random_variables': ['z', 'd'],  # list of variable names
    'conditions': [(x_1, y_1), ...],
        # list of realized values of the previous random variables
        # = second_stage:conditions x second_stage:values
    'values': [(z_1, d_1), ...],
        # list of all possible values of the current random variables;
        # for convenience, the value tuples are not decomposed by r.v.,
        # which avoids a joint-probability representation
    'probabilies': {    # key spelled as in the code below
        i: [(j, probability), ...],
        .
        .
        .
        num_conditions: [(j, probability), ...]
            # i indexes third_stage:conditions
            # j indexes third_stage:values
            # the key num_conditions holds the unconditional case
            # num_conditions: [] if not provided
    }
        # dict of the probabilities of third_stage:values
}
# The first two stages may omit keys accordingly,
# e.g. scenario['first_stage'] is empty because there is no randomness.
'''
scenario = {}

scenario['second_stage'] = {
    'random_variables': ['cap'],
    'conditions': [],
    'values':[2, 4, 6],
    'probabilies':{
        0: [(0,0.5), (1,0.4),(2,0.1)]
    }
}

scenario['third_stage'] = {
    'random_variables': ['d'],
    'conditions': scenario['second_stage']['values'],
    'conditions_prob':[prob for _,prob in scenario['second_stage']['probabilies'][0] ],
    'values': [10, 14, 16, 18, 22],
    'probabilies':{
        0: [(0,0.5), (1,0.4), (3,0.1)],
        1: [(1,0.4), (2,0.3), (3,0.3)],
        2: [(2,0.4), (3,0.4), (4,0.2)]
    }
}
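
The index-based layout above is compact but easy to get wrong, so a small consistency check can help. The sketch below expands one stage into explicit `(value, probability)` lists and verifies that each list sums to one. The helper name `conditional_distributions` is hypothetical; the dictionaries repeat the data defined above so that the block runs on its own.

```python
import math

def conditional_distributions(stage):
    """Map each condition (None if unconditional) to a list of (value, prob)."""
    conditions = stage['conditions']
    values = stage['values']
    dists = {}
    for c_idx, pairs in stage['probabilies'].items():  # key spelling as in the notebook
        key = conditions[c_idx] if conditions else None
        dist = [(values[v_idx], p) for v_idx, p in pairs]
        total = sum(p for _, p in dist)
        if not math.isclose(total, 1.0):
            raise ValueError(f"probabilities for condition {key} sum to {total}")
        dists[key] = dist
    return dists

second_stage = {
    'random_variables': ['cap'], 'conditions': [], 'values': [2, 4, 6],
    'probabilies': {0: [(0, 0.5), (1, 0.4), (2, 0.1)]},
}
third_stage = {
    'random_variables': ['d'], 'conditions': [2, 4, 6],
    'values': [10, 14, 16, 18, 22],
    'probabilies': {
        0: [(0, 0.5), (1, 0.4), (3, 0.1)],
        1: [(1, 0.4), (2, 0.3), (3, 0.3)],
        2: [(2, 0.4), (3, 0.4), (4, 0.2)],
    },
}
print(conditional_distributions(second_stage))
print(conditional_distributions(third_stage)[4])
```

Keying the result by the realized `cap` rather than by position mirrors the `prob_dist` dictionary built later in the notebook.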

#### Benchmark

In [45]:
import numpy as np
import math

# A decision policy induces a reward random variable.
# All variables take integer values here.
q = 4
# 22 is the upper bound of the possible demands.
# Supply should not exceed demand in this problem setting:
# q*cap <= d_max, where (cap,d_max) = (2,18), (4,18), (6,22)

# x = policy_x(q,cap)
# The policy is stored as a table x[q][cap].
x = { q_i:{cap: min(q_i*cap,10) for cap in [2,4,6]} for q_i in range(1,10)}
# Policy: buy as much as possible if the supply is below 10,
# otherwise buy 10 units.


def ssd_benchmark_by_q_x( q, x, scenario):
    benchmark = {}
    benchmark['first_stage'] = {
        'r': [-q*0.3],
        's': [1.0]
    }
    
    # tem_num_condition = len(scenario['second_stage']['conditions'])
    benchmark['second_stage'] = {
        'r': [-x[q][cap] for cap in scenario['second_stage']['values']],
        's': [ prob for _,prob in scenario['second_stage']['probabilies'][0]]
    }
    
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    
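    # NOTE: third-stage sales are computed as min(q*cap, d), i.e. they are capped
    # by the available supply q*cap, not by the purchased quantity x[q][cap].
    benchmark['third_stage'] = {
        'conditions': scenario['third_stage']['conditions'],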
        'r': { 
                c_idx:[ 1.5*min(q*conditions[c_idx],\
                                    values[v_idx]) for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        },
        's': { 
                c_idx:[ p for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        }
    }
    
    return benchmark
    
print(ssd_benchmark_by_q_x(q, x, scenario))

{'first_stage': {'r': [-1.2], 's': [1.0]}, 'second_stage': {'r': [-8, -10, -10], 's': [0.5, 0.4, 0.1]}, 'third_stage': {'conditions': [2, 4, 6], 'r': {0: [12.0, 12.0, 12.0], 1: [21.0, 24.0, 24.0], 2: [24.0, 27.0, 33.0]}, 's': {0: [0.5, 0.4, 0.1], 1: [0.4, 0.3, 0.3], 2: [0.4, 0.4, 0.2]}}}


#### Optimal policy of $x$ by solving two-stage subproblems
Given a fixed $q$,<br>
the distribution of scenarios is as given above.<br>
No SSD constraints are considered.
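
For a fixed $q$ and a realized $\mathrm{cap}$, the subproblem solved in the next cell can be written as the linear program below. This is only a restatement of that code; $m_d$ is a hypothetical name for the auxiliary variable that stands for $\min(x,d)$, and $p_d$ is the conditional probability of demand $d$.

$$
\max_{x,\,m}\; -x + 1.5\sum_{d} p_d\, m_d
\quad\text{s.t.}\quad 0 \le x \le q\cdot\mathrm{cap},\qquad m_d \le x,\quad m_d \le d,\quad m_d \ge 0 \ \text{ for every demand value } d .
$$

Since every $p_d$ is positive and the objective is maximized, each $m_d$ equals $\min(x,d)$ at an optimum, so the linear program reproduces the newsvendor reward $-x + 1.5\,\mathbb{E}[\min(x,d)]$.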

In [44]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

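def second_stage_policy(q, scenario):
    """Return (policy_x, expected_reward) for a fixed q.

    policy_x maps each cap to the optimal x; expected_reward is the expected
    second- plus third-stage reward (the first-stage cost is not included).
    """
    policy_x = {}
    r_x = {}
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    # one second-stage subproblem per realization of cap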
    for c_idx,list_v_p in scenario['third_stage']['probabilies'].items():
        cap = conditions[c_idx]
        upper_bound = q*cap
        # upper_bound for x
        m = gp.Model(f'second_stage_cap_{cap}')
        m.Params.LogToConsole = 0
        x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
        
        m.addConstr(x<=upper_bound)
        x_d_aug = {}
        
        for v_idx,prob in list_v_p:
            d = values[v_idx]
            tem_x_d_aug = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
            # an auxiliary variable for min(x,d)
            
            x_d_aug[d] = tem_x_d_aug
            # store in the dict 
            
            m.addConstr(tem_x_d_aug<=x)
            m.addConstr(tem_x_d_aug<=d)

        m.setObjective(-x+sum( 1.5*x_d_aug[values[v_idx]]*prob for v_idx, prob in list_v_p) , GRB.MAXIMIZE)
        m.optimize()
        policy_x[cap] = x.X
        r_x[cap] =m.getObjective().getValue()
    return policy_x, sum( r_x[conditions[i]]*scenario['third_stage']['conditions_prob'][i] for i in range(len(conditions)))

print('q || optimal policy of x || total reward')
for q in range(1,10):
    policy_x, expected_r = second_stage_policy(q, scenario)
    print(q,'||', policy_x, '||',expected_r-0.3*q)

q || optimal policy of x || total reward
1 || {2: 2.0, 4: 4.0, 6: 6.0} || 1.3000000000000005
2 || {2: 4.0, 4: 8.0, 6: 12.0} || 2.600000000000001
3 || {2: 6.0, 4: 12.0, 6: 16.0} || 3.800000000000001
4 || {2: 8.0, 4: 14.0, 6: 16.0} || 4.400000000000001
5 || {2: 10.0, 4: 14.0, 6: 16.0} || 4.600000000000001
6 || {2: 10.0, 4: 14.0, 6: 16.0} || 4.300000000000002
7 || {2: 10.0, 4: 14.0, 6: 16.0} || 4.000000000000002
8 || {2: 10.0, 4: 14.0, 6: 16.0} || 3.7000000000000015
9 || {2: 10.0, 4: 14.0, 6: 16.0} || 3.4000000000000017


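For $q \ge 5$ the optimal second-stage policy no longer changes with $q$: relaxing the upper bound $q \cdot \mathrm{cap}$ further does not change the solution, while the investment cost $0.3q$ keeps growing, so the total reward decreases. <br>
Thus, among the tested values, $q=5$ with $x(q,\cdot)$ = { 2->10, 4->14, 6->16 } is the optimal solution when SSD constraints are ignored.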
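
The table above can be cross-checked without a solver. The expected reward $-x + 1.5\,\mathbb{E}[\min(x,d)]$ is concave and piecewise linear in $x$, so a maximum is attained at a breakpoint: $0$, the upper bound $q \cdot \mathrm{cap}$, or one of the demand values. The sketch below enumerates these candidates; the names `expected_reward`, `best_x` and `total_reward` are hypothetical, and the distributions are copied from the scenario data.

```python
CAP_DIST = [(2, 0.5), (4, 0.4), (6, 0.1)]
DEMAND_DIST = {
    2: [(10, 0.5), (14, 0.4), (18, 0.1)],
    4: [(14, 0.4), (16, 0.3), (18, 0.3)],
    6: [(16, 0.4), (18, 0.4), (22, 0.2)],
}

def expected_reward(x, dist):
    """Expected second- plus third-stage reward of buying x units."""
    return -x + sum(p * 1.5 * min(x, d) for d, p in dist)

def best_x(q, cap):
    """Best purchase for a given q and cap, found by checking the breakpoints."""
    bound = q * cap
    candidates = [0, bound] + [d for d, _ in DEMAND_DIST[cap] if d <= bound]
    return max(candidates, key=lambda x: expected_reward(x, DEMAND_DIST[cap]))

def total_reward(q):
    """Return (policy, total expected reward) for a fixed investment q."""
    policy = {cap: best_x(q, cap) for cap, _ in CAP_DIST}
    second = sum(p * expected_reward(policy[cap], DEMAND_DIST[cap])
                 for cap, p in CAP_DIST)
    return policy, second - 0.3 * q

for q in range(1, 10):
    print(q, total_reward(q))
```

Enumeration is enough here because each subproblem has a single decision variable; the LP formulation becomes necessary once the SSD constraints couple the variables.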

#### Non-SSD Optimal policy as benchmark

In [48]:
q = 5
x = {}
x[q],_ = second_stage_policy(q, scenario)

optimal_benchmark = ssd_benchmark_by_q_x(q, x, scenario)
print(optimal_benchmark)

{'first_stage': {'r': [-1.5], 's': [1.0]}, 'second_stage': {'r': [-10.0, -14.0, -16.0], 's': [0.5, 0.4, 0.1]}, 'third_stage': {'conditions': [2, 4, 6], 'r': {0: [15.0, 15.0, 15.0], 1: [21.0, 24.0, 27.0], 2: [24.0, 27.0, 33.0]}, 's': {0: [0.5, 0.4, 0.1], 1: [0.4, 0.3, 0.3], 2: [0.4, 0.4, 0.2]}}}


#### Multi-cut algorithm
We use the non-SSD optimal policy obtained above as the benchmark.
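
As background for the cells that follow, a reward $X$ dominates a benchmark $Y$ in the second order (SSD) when $\mathbb{E}[(\eta - X)_+] \le \mathbb{E}[(\eta - Y)_+]$ for every target $\eta$. For a benchmark with finitely many realizations it is enough to test $\eta$ at those realizations, which is what the dictionaries `w` in the code compute on the benchmark side. The sketch below is a minimal, solver-free check; the function names are hypothetical, and the benchmark values are the third-stage rewards printed above for $\mathrm{cap}=4$.

```python
def expected_shortfall(dist, eta):
    """E[(eta - X)_+] for a discrete distribution given as (value, prob) pairs."""
    return sum(p * max(eta - v, 0.0) for v, p in dist)

def dominates_ssd(candidate, benchmark, tol=1e-9):
    """True if `candidate` dominates `benchmark` in the second order.

    For a discrete benchmark it suffices to compare the expected shortfalls
    at the benchmark's own realizations.
    """
    return all(
        expected_shortfall(candidate, eta) <= expected_shortfall(benchmark, eta) + tol
        for eta, _ in benchmark
    )

# benchmark third-stage rewards for cap = 4 (non-SSD optimal policy)
benchmark = [(21.0, 0.4), (24.0, 0.3), (27.0, 0.3)]
safer = [(23.7, 1.0)]                  # same mean, no spread
riskier = [(14.4, 0.5), (33.0, 0.5)]   # same mean, wider spread

print(dominates_ssd(safer, benchmark))
print(dominates_ssd(riskier, benchmark))
```

The two test distributions share the benchmark's mean of 23.7, so the comparison isolates the effect of spread: SSD prefers the less dispersed reward.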

In [None]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

max_itr = 100

# we use the above non-SSD optimal solution/policy as the start point
def multicut_three_stage_newsvendor(q_0, x_0, scenario, benchmark):
    
    cap_list = [(scenario['second_stage']['values'][v_idx],prob) \
            for v_idx,prob in scenario['second_stage']['probabilies'][0]]
    cap_prob = {c:p for c,p in cap_list}
    
    prob_dist = {}
    for c_idx,prob_v in scenario['third_stage']['probabilies'].items():
        prob_dist[scenario['third_stage']['conditions'][c_idx]] =\
                [(scenario['third_stage']['values'][v_idx],prob) for v_idx,prob in prob_v ]
    
    # rhs of event cuts, w and u
    # y[cap]: distribution of the benchmark third-stage reward given cap
    # w[cap][y_j] = E[(y_j - Y)_+ | cap], the benchmark expected shortfall
    w = {}
    y = {} 
    for cap,_ in cap_list:
        y[cap] = defaultdict(float)
        for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
            y[cap][r]+=prob

        w[cap] = {}
        for y_3_j in y[cap].keys():
            w[cap][y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y[cap].items())
    
    theta = {}
    for cap,_ in cap_list:
        theta[cap] = benchmark['second_stage']['r'][cap] + sum(  y_3_i*prob for y_3_i,prob in y[cap].items())
    u = {}
    for cap,_ in cap_list:
        u[cap] =  sum( max(theta[cap] -  theta[s], 0)*cap_prob[s] for s,prob in cap_list)
        
    
    
    # initialize, step 0
    itr = 0
    event_cut = []
    obj_cut = []
    fs_cut = []
    
    q_ = q_0
    x_ = x_0
    
    # init the master problem
    master = gp.Model('master problem')
    q = master.addVar(vtype = GRB.CONTINUOUS, name = 'q')
    # z_1 <= -0.3*q is non-positive, so the default lower bound of 0 must be lifted
    z_1 = master.addVar(lb = -GRB.INFINITY, vtype = GRB.CONTINUOUS, name = 'z_1')
    v = {}
    for cap,_ in cap_list:
        v[cap] = master.addVar(vtype = GRB.CONTINUOUS, name = f'v_cap={cap}')
    
    master.addConstr( z_1 <= -0.3*q)
    
    master_obj = z_1 + sum( v[cap]*cap_prob[cap] for cap,_ in cap_list )
    master.setObjective(master_obj, GRB.MAXIMIZE)
    
    # auxiliary quantities for max( theta^{s_i} - v^{s_j} - z_1 + y_1, 0)
    # y_1 is assumed to be the benchmark first-stage reward
    y_1 = benchmark['first_stage']['r'][0]
    z_1_ = -0.3*q_
    v_ = {}
    reward_second_stage = {}
    
    # cut inequalities are added into master during iterations
    
    while(itr<max_itr): 
        # Step 1
        all_solvable_flag = True
        for cap,_ in cap_list:
            # each second stage subproblem in scenario cap = cap with q=q_k
            _,_,_,reward_second_stage[cap] = reward_problem_of_first_stage_ssd(q = q_, z_1 = z_1_, \
                                              cap = cap, prob_dist=prob_dist, benchmark=benchmark)
            
            if not math.isinf(reward_second_stage[cap]):
                # try objective cuts (placeholder, not yet a valid cut)
                master.addConstr( v[cap] <= (q-q_) + (z_1_ - z_1) )
            else:
                all_solvable_flag = False
                # try feasibility cuts
                pass
        if all_solvable_flag and v_ and all(abs(v_[cap]- reward_second_stage[cap])<0.001 for cap,_ in cap_list):
            break
        
        # update (add) event cuts to master problem
        if all_solvable_flag and v_:
            A = {}
            tem_sup = {}
            for cap,_ in cap_list:
                theta_j = theta[cap]
                A[cap] = [s for s,_ in cap_list if v_[s] + z_1_-y_1<=theta_j]
                tem_sup[cap] = sum(max(theta_j - v_[s]-z_1_+y_1, 0)*prob for s,prob in cap_list) - u[cap]
                
            tem_sup_ = max(tem_sup.values())
            if tem_sup_ > 0:
                tem_sup_ = tem_sup_/2
                for cap,_ in cap_list:
                    if tem_sup[cap]>tem_sup_:
                        # `bp` was undefined in the draft; it is assumed here to be
                        # theta_j - v_s - z_1 + y_1 written in the master variables
                        master.addConstr(sum( (theta[cap] - v[cap_A] - z_1 + y_1)*cap_prob[cap_A] for cap_A in A[cap])<=u[cap])
        # master solution
        master.optimize()
        
        q_ = q.X
        z_1_ = z_1.X
        v_ = {cap: v[cap].X for cap,_ in cap_list}
        itr+=1

In [None]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

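def multicut_three_stage_newsvendor(scenario, benchmark):
    # Unfinished draft: only step 0 (the initial model without SSD cuts) is built.
    # The loop headers and several names were missing; they are completed here
    # as the extensive form of the non-SSD problem, which is an assumption.
    # `benchmark` is not used yet.
    # initialize, step 0
    event_cut = []
    obj_cut = []
    fs_cut = []
    
    cap_prob_dist = [(scenario['second_stage']['values'][v_i],p) \
                     for v_i,p in scenario['second_stage']['probabilies'][0]]
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    demand_dist = {conditions[c_idx]: [(values[v_idx],p) for v_idx,p in prob_v] \
                   for c_idx,prob_v in scenario['third_stage']['probabilies'].items()}
    
    m_0 = gp.Model('initial')
    
    q = m_0.addVar(vtype = GRB.CONTINUOUS, name = 'q')
    # z_1 is the non-positive first-stage reward, so it needs a free lower bound
    z_1 = m_0.addVar(lb = -GRB.INFINITY, vtype = GRB.CONTINUOUS, name = 'z_1')
    m_0.addConstr(z_1 <= -0.3*q)
    
    # each scenario of Q_2 requires a system of auxiliary variables 
    Q_2 = {}
    x = {}
    for cap,_ in cap_prob_dist:
        tem_Q_2 = m_0.addVar(lb = -GRB.INFINITY, vtype = GRB.CONTINUOUS, name = 'Q_2'+f'cap={cap}')
        tem_x = m_0.addVar(vtype = GRB.CONTINUOUS, name = 'x'+f'cap={cap}')
        m_0.addConstr(tem_x<=cap*q)
        reward_from_3 = 0
        for d,p in demand_dist[cap]:
            # an auxiliary variable for min(x,d)
            tem_x_d_min = m_0.addVar(vtype = GRB.CONTINUOUS, name = f'min(x,{d})_cap={cap}')
            m_0.addConstr(tem_x_d_min<=tem_x)
            m_0.addConstr(tem_x_d_min<=d)
            reward_from_3 += 1.5*p*tem_x_d_min
        m_0.addConstr(tem_Q_2 == -tem_x + reward_from_3)
        Q_2[cap] = tem_Q_2
        x[cap] = tem_x
    
    m_0.setObjective(z_1+sum( prob*Q_2[cap] for cap,prob in cap_prob_dist) , GRB.MAXIMIZE)
    m_0.optimize()
    return q.X, {cap: x[cap].X for cap,_ in cap_prob_dist}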

In [None]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

max_itr = 100

# we use the above non-SSD optimal solution/policy as the start point
def multicut_three_stage_newsvendor(q_0, x_0, scenario, benchmark):
    # Earlier draft: the second-stage subproblem is built inline and the cut
    # generation is still a placeholder (see the `pass` branches).
    # `benchmark` is expected in the list form, i.e.
    # benchmark['second_stage']['r'] is a list of rewards.
    cap_prob_dist = [(scenario['second_stage']['values'][v_idx],prob) \
            for v_idx,prob in scenario['second_stage']['probabilies'][0]]
    
    prob_dist = {}
    for c_idx,prob_v in scenario['third_stage']['probabilies'].items():
        prob_dist[scenario['third_stage']['conditions'][c_idx]] =\
                [(scenario['third_stage']['values'][v_idx],prob) for v_idx,prob in prob_v ]
    
    # initialize, step 0
    itr = 0
    event_cut = []
    obj_cut = []
    fs_cut = []
    
    q_ = q_0
    x_ = x_0
    
    # init the master problem
    master = gp.Model('master problem')
    q = master.addVar(vtype = GRB.CONTINUOUS, name = 'q')
    # z_1 <= -0.3*q is non-positive, so the default lower bound of 0 must be lifted
    z_1 = master.addVar(lb = -GRB.INFINITY, vtype = GRB.CONTINUOUS, name = 'z_1')
    v = {}
    cap_prob = {}
    for cap,prob in cap_prob_dist:
        v[cap] = master.addVar(vtype = GRB.CONTINUOUS, name = f'v_cap={cap}')
        cap_prob[cap] = prob
    
    master.addConstr( z_1 <= -0.3*q)
    
    master_obj = z_1 + sum( v[cap]*cap_prob[cap] for cap,_ in cap_prob_dist )
    master.setObjective(master_obj, GRB.MAXIMIZE)
    
    reward_second_stage = {}
    
    # cut inequalities are added into master during iterations
    
    while(itr<max_itr): 
        # Step 1
        for cap,_ in cap_prob_dist:
            # each second stage subproblem in scenario cap = cap with q=q_k
            tem_m_2 = gp.Model('second stage subproblem')
            tem_x = tem_m_2.addVar(vtype = GRB.CONTINUOUS, name = 'x')
            tem_sigma = tem_m_2.addVar(lb = -GRB.INFINITY, vtype = GRB.CONTINUOUS, name = 'sigma')
            tem_obj = tem_m_2.addVar(lb = -GRB.INFINITY, vtype = GRB.CONTINUOUS, name = 'obj_v')

            tem_m_2.addConstr(tem_sigma == -tem_x -benchmark['first_stage']['r'][0] )

            tem_m_2.addConstr(tem_x<=cap*q_)

            y = defaultdict(float)
            for r,prob in zip(benchmark['second_stage']['r'], benchmark['second_stage']['s']):
                y[r]+=prob
            w = {}
            for y_j in y.keys():
                w[y_j] = sum( max(y_j - y_i,0)*prob for y_i,prob in y.items())

            x_d_aug = {}
            for d,p in prob_dist[cap]:
                tem_x_d_aug = tem_m_2.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
                # an auxiliary variable for min(x,d)
                x_d_aug[d] = tem_x_d_aug

                tem_m_2.addConstr(tem_x_d_aug<=tem_x)
                tem_m_2.addConstr(tem_x_d_aug<=d)

            aug_f = defaultdict(dict)
            for y_j,w_j in w.items():
                tot = 0
                for d,p in prob_dist[cap]:
                    tem_aug_f = tem_m_2.addVar(vtype = GRB.CONTINUOUS, name = f'({y_j}-sigma - Q^{d}_2(x))_+')
                    # an auxiliary variable for (y_j - sigma - Q^{d_i}_2(x))_+
                    # Q^{d_i}_2(x) in this case is 1.5*min(x,d_i), i.e. 1.5*x_d_aug[d_i]
                    aug_f[d][y_j] = tem_aug_f
                    tem_m_2.addConstr( y_j-tem_sigma - 1.5*x_d_aug[d]<=tem_aug_f)
                    tem_m_2.addConstr(tem_aug_f>= 0)
                    tot+=p*tem_aug_f
                tem_m_2.addConstr( tot <= w_j)

            tem_m_2.addConstr(tem_obj == -tem_x+sum( 1.5*x_d_aug[d]*p for d,p in prob_dist[cap]))
            tem_m_2.setObjective(-tem_x+sum( 1.5*x_d_aug[d]*p-sum(tem for _,tem in aug_f[d].items() ) \
                                 for d,p in prob_dist[cap]) , GRB.MAXIMIZE)
            tem_m_2.optimize()

            if tem_m_2.Status == GRB.OPTIMAL:
                reward_second_stage[cap] = tem_obj.X
                # try objective cuts
                pass
            else:
                reward_second_stage[cap] = float('inf')
                # try feasibility cuts
                pass
        
        # update event cuts

        # add cuts into master problem

        # master solution
        # NOTE: until cuts are added, v is unbounded above and the master
        # has no finite optimum, so the .X lookups below will fail.
        master.optimize()
        v_ = {cap: v[cap].X for cap,_ in cap_prob_dist}
        q_ = q.X
        itr+=1
        
        if all(abs(v_[cap]- reward_second_stage[cap])<0.001 for cap,_ in cap_prob_dist):
            break

        
        
        
    


In [68]:
import gurobipy as gp
from gurobipy import GRB
from collections import defaultdict

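def reward_problem_of_first_stage_ssd(q:float, z_1:float, cap:float, prob_dist:dict, benchmark:dict)->tuple:
    """Solve the SSD-constrained second-stage problem for one cap scenario.

    Returns the tuple (x, -z_2, sigma, obj) at the optimum.
    """
    prob_dist = prob_dist[cap]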
    m = gp.Model('reward_problem_two_stage_SSD')
    # m.Params.LogToConsole = 0
    
    x = m.addVar(vtype = GRB.CONTINUOUS, name = 'x')
    z_2 = m.addVar(vtype = GRB.CONTINUOUS, name = 'z_2')
    sigma = m.addVar(vtype = GRB.CONTINUOUS, name = 'sigma')
    obj = m.addVar(vtype = GRB.CONTINUOUS, name = 'obj_value')
    
    m.addConstr( x<=q*cap )
    m.addConstr( -z_2<=-x )
    
    m.addConstr(sigma ==z_1 + z_2 - benchmark['first_stage']['r'][0] - benchmark['second_stage']['r'][cap] )

    y = defaultdict(float)
    for r,prob in zip(benchmark['third_stage']['r'][cap], benchmark['third_stage']['s'][cap]):
        y[r]+=prob
    
    w = {}
    for y_3_j in y.keys():
        w[y_3_j] = sum( max(y_3_j - y_3_i,0)*prob for y_3_i,prob in y.items())
    
    
    x_d_min = {}
    for d,p in prob_dist:
        tem_x_d_min = m.addVar(vtype = GRB.CONTINUOUS, name = f"min(x,{d})")
        # an auxiliary variable for min(x,d)
        x_d_min[d] = tem_x_d_min
        # store in the dict 

        m.addConstr(tem_x_d_min<=x)
        m.addConstr(tem_x_d_min<=d)
        
    x_d_min_0_max = defaultdict(dict)
    for y_3_j,w_j in w.items():
        tot = 0
        for d,p in prob_dist:
            tem_x_d_min_0_max = m.addVar(vtype = GRB.CONTINUOUS, name = f'({y_3_j}-sigma - Q^{d}_3(x) )_+')
            # an auxiliary variable for (y_j - sigma - Q^{d_i}_3(x))_+
            # Q^{d_i}_3(x) in this case is 1.5*min(x,d_i), i.e. 1.5*x_d_min[d_i]
            x_d_min_0_max[d][y_3_j] = tem_x_d_min_0_max
            m.addConstr( y_3_j-sigma - 1.5*x_d_min[d]<=tem_x_d_min_0_max)
            m.addConstr(tem_x_d_min_0_max>= 0)
            tot+=p*tem_x_d_min_0_max
        m.addConstr( tot <= w_j)
    
    #   problem.setObjective(z+sum( 1.5*x_d_aug[d]*p+ x_d_aug[d] -x_aug[d] for d,p in prob_dist ), GRB.MAXIMIZE)
    m.addConstr(obj == -x+sum( 1.5*x_d_min[d]*p for d,p in prob_dist))
    m.setObjective(-z_2+sum( 1.5*x_d_min[d]*p-sum(tem for _,tem in x_d_min_0_max[d].items() ) for d,p in prob_dist) ,\
                   GRB.MAXIMIZE)
    m.optimize()

    x_ = x.X
    z_2_ = z_2.X
    sigma_ = sigma.X
    obj_ = obj.X
    
#     while(itr<max_itr):
        
        
#         rewards = [ 1.5*min(x_,d) for d,_ in prob_dist]
#         new_events = []
#         for j,(y_j, w_j) in enumerate(zip(self.y_3,self.w)): # this for can be parallelism
#             if self.p*((y_j - s)*np.ones(self.n) - rewards) <= w_j:
#                 continue
#             event = [ i for i in range(L) if y_j -s > rewards[i]] 
#             new_events.append(event)
#         if not new_events:
#             break
#         else:
#             # add new event cuts as constr
            
#             problem.optimize()
#             # update x,z,sigma
#             x_ = x.X
#             z_ = z.X
#             sigma_ = sigma.X
#             itr+=1
    return x_,-z_2_,sigma_,obj_

############################################################
scenario = {}

scenario['second_stage'] = {
    'random_variables': ['cap'],
    'conditions': [],
    'values':[2, 4, 6],
    'probabilies':{
        0: [(0,0.5), (1,0.4),(2,0.1)]
    }
}

scenario['third_stage'] = {
    'random_variables': ['d'],
    'conditions': scenario['second_stage']['values'],
    'conditions_prob':[prob for _,prob in scenario['second_stage']['probabilies'][0] ],
    'values': [10, 14, 16, 18, 22],
    'probabilies':{
        0: [(0,0.5), (1,0.4), (3,0.1)],
        1: [(1,0.4), (2,0.3), (3,0.3)],
        2: [(2,0.4), (3,0.4), (4,0.2)]
    }
}

prob_dist = {}
for c_idx,prob_v in scenario['third_stage']['probabilies'].items():
    prob_dist[scenario['third_stage']['conditions'][c_idx]] =\
            [(scenario['third_stage']['values'][v_idx],prob) for v_idx,prob in prob_v ]

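# Same as the earlier ssd_benchmark_by_q_x, except that the second-stage rewards
# and the third-stage entries are keyed by cap instead of by position.
def ssd_benchmark_by_q_x( q, x, scenario):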
    benchmark = {}
    benchmark['first_stage'] = {
        'r': [-q*0.3],
        's': [1.0]
    }
    
    # tem_num_condition = len(scenario['second_stage']['conditions'])
    benchmark['second_stage'] = {
        'r': {cap:-x[q][cap] for cap in scenario['second_stage']['values']},
        's': [ prob for _,prob in scenario['second_stage']['probabilies'][0]]
    }
    
    conditions = scenario['third_stage']['conditions']
    values = scenario['third_stage']['values']
    
    benchmark['third_stage'] = {
        'conditions': scenario['third_stage']['conditions'],
        'r': { 
                conditions[c_idx]:[ 1.5*min(q*conditions[c_idx],\
                                    values[v_idx]) for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        },
        's': { 
                conditions[c_idx]:[ p for v_idx,p in prob_v] \
                  for c_idx,prob_v in scenario['third_stage']['probabilies'].items()
        }
    }
    return benchmark

q = 4
x = { q_i:{cap: min(q_i*cap,10) for cap in [2,4,6]} for q_i in range(1,10)}
# Policy: buy as much as possible if the supply is below 10,
# otherwise buy 10 units.
benchmark = ssd_benchmark_by_q_x( q, x, scenario)

print(benchmark)

reward_problem_of_first_stage_ssd(q = 4, z_1 = -1.2, cap = 4, prob_dist=prob_dist, benchmark=benchmark)

{'first_stage': {'r': [-1.2], 's': [1.0]}, 'second_stage': {'r': {2: -8, 4: -10, 6: -10}, 's': [0.5, 0.4, 0.1]}, 'third_stage': {'conditions': [2, 4, 6], 'r': {2: [12.0, 12.0, 12.0], 4: [21.0, 24.0, 24.0], 6: [24.0, 27.0, 33.0]}, 's': {2: [0.5, 0.4, 0.1], 4: [0.4, 0.3, 0.3], 6: [0.4, 0.4, 0.2]}}}
Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (linux64)
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads
Optimize a model with 24 rows, 13 columns and 49 nonzeros
Model fingerprint: 0x12177211
Coefficient statistics:
  Matrix range     [3e-01, 2e+00]
  Objective range  [4e-01, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 2e+01]
Presolve removed 15 rows and 7 columns
Presolve time: 0.00s
Presolved: 9 rows, 6 columns, 22 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    7.0000000e+00   0.000000e+00   0.000000e+00      0s
       0    7.0000000e+00   0.000000e+00   0.000000e+00      0s

Solved in 0 iterations 

(14.0, -14.0, 24.0, 7.0)
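
The returned tuple `(x, -z_2, sigma, obj)` can be checked by hand against the constraints in `reward_problem_of_first_stage_ssd`. The sketch below redoes that arithmetic in plain Python, with the inputs copied from the call and the printed benchmark above; it does not call the solver.

```python
z_1 = -1.2                      # first-stage reward passed to the function (q = 4)
x = 14.0                        # returned purchase quantity
z_2 = 14.0                      # z_2 >= x is tight; the function returns -z_2
bench_first = -1.2              # benchmark first-stage reward for q = 4
bench_second = -10              # benchmark second-stage reward for cap = 4
demand = [(14, 0.4), (16, 0.3), (18, 0.3)]   # demand distribution given cap = 4

# sigma == z_1 + z_2 - benchmark first-stage reward - benchmark second-stage reward
sigma = z_1 + z_2 - bench_first - bench_second
# obj == -x + expected third-stage revenue
obj = -x + sum(1.5 * min(x, d) * p for d, p in demand)
print(sigma, obj)
```

Both values agree with the last two entries of the tuple, 24.0 and 7.0, up to floating-point rounding.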