<CENTER>
<p><font size="7"><b>IMT Summer School 2024</b></font></p>
<p><font size="5">Dynamic resource allocation problems in communication networks</font></p>
<p><font size="5"><b>June 25-27: </b>Day 2 - <i>IMPROVING THE RESOLUTION OF MARKOV DECISION PROCESSES</i></font></p>
</CENTER>

**Useful links:**

- Summer school [website](https://sites.google.com/view/alexandre-reiffers/courses/dynamic-resource-allocation-problems-in-communication-networks).
- It was organized by the team of the [RAMONaaS project](https://sites.google.com/view/ramonaas/home) at IMT Atlantique in Brest, France.
- For a brief introduction to NumPy you can have a look [here](https://github.com/brain-bzh/introduction-to-ai/blob/main/session1/lab/TP0.ipynb), and for a more thorough overview [here](https://www.w3schools.com/python/numpy/default.asp).
- One of the best (and easiest) ways to solve convex problems is using the [CVXPY library](https://www.cvxpy.org/).


**About:**

- This lab is about solving Markov Decision Processes (MDPs).
- The objective is to solve large-scale MDPs.
- To this end, some tips are proposed for dimension reduction in a particular load balancing problem.


**The lab is divided into 3 sections:**

- In the first section, you [model a load balancing problem](#load_balancing) as an MDP.
- Then you use a resolution technique to [solve it](#load_balancing_resolution).
- Finally, you will evaluate the relevance of [reducing the dimension of the problem](#load_balancing_dimension_reduction).


<RIGHT>
<div style="text-align: right">
<img
src="https://www.imt-atlantique.fr/sites/default/files/Images/Ecole/charte-graphique/IMT_Atlantique_logo_RVB_Baseline_400x272.jpg"
width="100" height="68">
</div>
</RIGHT>


----------------------------
----------------------------

In [83]:
import math
import time
import torch
import itertools
import plotly.graph_objects as go
import numpy as np
import matplotlib.pyplot as plt
import plotly.subplots as ps
import ipywidgets as widgets
from IPython.display import display

# **<a id="load_balancing">A Load Balancing Problem</a>**

Let us consider a practical MDP use case: modeling a load balancing problem.

<div style="text-align: center">
  <figure>
    <img src="https://i.ibb.co/prmNRfT/load-balancing.jpg" alt="Load Balancing Problem">
    <figcaption>Figure 1: A Load Balancing Problem</figcaption>
  </figure>
</div>

- A load balancer receives one job at a time and transmits it to one of the $Q$ queues of sizes $B_1$, $\ldots$, $B_Q$.
- A queue cannot be overloaded: a job sent to a full queue is lost.
- After this transmission, each queue $q$ processes each of its jobs at rate $\lambda_q$ (in the implementation below, each job is processed with probability $\lambda_q$ during a period).
- Let $\text{cost}_\text{loss}$ and $\text{cost}_\text{delay}$ respectively represent the cost charged for losing a job and for retaining a job in a queue, these costs being discounted at rate $\gamma$.
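
As a sketch of what the research question below asks, and under the assumption (not stated explicitly here) that the criterion is the expected total discounted cost over an infinite horizon, the value of a policy $\pi$ from an initial state $s$ could be written as follows. The symbols $V^{\pi}$, $c$, $s_t$ and $a_t$ are notation introduced for illustration only.

$$
V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \,\middle|\, s_0 = s,\ a_t = \pi(s_t)\right]
$$

Here $c(s,a)$ stands for the one-period cost, built from $\text{cost}_\text{loss}$ and $\text{cost}_\text{delay}$. An optimal policy would then be one that minimizes $V^{\pi}(s)$ for every state $s$.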


**Research Question:** *What is the optimal load balancing policy?*

# **<a id="mdp_for_load_balancing">MDP for the Load Balancing Problem</a>**

Let's use tensor objects to define the parameters of the load balancing problem.

## **<a id="mdp_for_load_balancing_states_and_actions">The State and Action Spaces</a>**

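An action is the selection of a queue. In the implementation below, it is an integer in $\left\{0, 1, \ldots, Q\right\}$, where $a=q$ with $q \geq 1$ means the job is sent to queue $q$, and $a=0$ means the job is sent to no queue and is therefore lost.

A state is any tensor $s = \left(s_q\right)_{q=1}^{Q}$, where $s_q \in \left\{0, \ldots, B_q\right\}$ is the number of jobs in queue $q$ (stored at position $q-1$ of the tensor).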
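
To get a feeling for how quickly the state space grows, note that each queue $q$ can hold between $0$ and $B_q$ jobs, so the number of states is $\prod_q \left(B_q + 1\right)$. The following minimal sketch uses plain Python; the helper name `count_states` is hypothetical and introduced only for illustration.

```python
import math

def count_states(buffers_size):
    """Number of states when queue q can hold 0..B_q jobs (hypothetical helper)."""
    return math.prod(b + 1 for b in buffers_size)

print(count_states([2, 1, 3]))   # 3 * 2 * 4 = 24
print(count_states([3] * 10))    # 4**10 = 1048576
```

With only ten queues of size 3, the state space already exceeds one million states, which is why dimension reduction becomes relevant.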

### Function implementation of States and Actions

Objective:
- Construct the state space for the load balancing problem.

Args:
- number_queues (optional, int): number of queues. Defaults to 4.
- buffers_size (optional, tensor): buffer size (maximum value) of each queue. Defaults to 3 for each queue.

Returns:
- The list of states.
- The updated number of queues (in case it does not match the tensor of buffers size).
- The tensor of buffers size in case it was not entered.

In [84]:
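def state_space_construction(number_queues:int=4, buffers_size:torch.Tensor=None) -> tuple:
    # Default: every queue has a buffer of size 3
    if buffers_size is None:
        buffers_size = 3*torch.ones([number_queues],dtype=int)
    # One state per combination of queue occupancies (0..B_q for each queue q)
    combinations = itertools.product(*[range(B + 1) for B in buffers_size])
    state_space = [torch.tensor(combination) for combination in combinations]
    # The number of queues is inferred from the buffers, which take precedence over number_queues
    updated_number_queues = buffers_size.numel()
    return state_space, updated_number_queues, buffers_size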

### Example of state space

In [85]:
bs = torch.tensor([2,1,3])  # Replace by a one-dimension tensor of ints
print(f"State space with B = {bs}")
SP = state_space_construction(buffers_size=bs)[0]
for s_idx in range(len(SP)):
    if s_idx % 7 == 0: print()
    else: print("   ", end="")
    print(SP[s_idx], end="")

print(f"\n\n{len(SP)} states.")

State space with B = tensor([2, 1, 3])

tensor([0, 0, 0])   tensor([0, 0, 1])   tensor([0, 0, 2])   tensor([0, 0, 3])   tensor([0, 1, 0])   tensor([0, 1, 1])   tensor([0, 1, 2])
tensor([0, 1, 3])   tensor([1, 0, 0])   tensor([1, 0, 1])   tensor([1, 0, 2])   tensor([1, 0, 3])   tensor([1, 1, 0])   tensor([1, 1, 1])
tensor([1, 1, 2])   tensor([1, 1, 3])   tensor([2, 0, 0])   tensor([2, 0, 1])   tensor([2, 0, 2])   tensor([2, 0, 3])   tensor([2, 1, 0])
tensor([2, 1, 1])   tensor([2, 1, 2])   tensor([2, 1, 3])

24 states.


In [86]:
num_state = len(SP)
num_queues = SP[0].size(0)
NSP = [state/bs for state in SP]
data = np.stack([tensor.numpy() for tensor in NSP], axis=1)

# Create the heatmap
fig = go.Figure(data=go.Heatmap(
    z=data,
    x=np.arange(num_state),
    y=np.arange(num_queues),
    colorscale='blues',
    showscale=True,
    colorbar=dict(
        title='Queue occupancy',
        titleside='right',
        tickmode='array',
        tickvals=np.linspace(0, 1, 5),
        ticktext=[f'{v:.2f}' for v in np.linspace(0, 1, 5)]
    ),
    zmin=0,
    zmax=1
))

# Add cells bounds
fig.update_traces(
    hoverongaps=False,
    xgap=1,  # Set the gap between cells
    ygap=1
)

# Update the axis
fig.update_layout(
    xaxis=dict(
        tickmode='array',
        tickvals=np.arange(num_state),
        ticktext=np.arange(1,num_state+1)
    ),
    yaxis=dict(
        tickmode='array',
        tickvals=np.arange(num_queues),
        ticktext=np.arange(1,num_queues+1)
    ),
    title='The state space',
    xaxis_title='State index',
    yaxis_title='Queue index'
)

# Plot the figure
fig.show()


## **<a id="mdp_for_load_balancing_transitions">The Transitions</a>**

Let us examine the two stages of transition dynamics: 

1. The deterministic stage, in which the job is transmitted to the selected queue or lost, depending on buffer availability. The result of this transition is given by the function **next()**.
2. The probabilistic stage, in which an unpredictable number of jobs is processed. The probability of each transition in this stage is given by the function **probability()**.

We first implement these two functions (*next()* and *probability()*) and use them to define the transition function (**transition_function()**); then we implement a function, **transition_dynamics()**, that returns a tensor representing the transition dynamics.

### **<a id="mdp_for_load_balancing_transition_stage1">First Stage Transition at each period: The Deterministic Transition</a>** (the **next()** function)

Objective:
- Finds the state that results from performing a given action.
    
Args:
- state (tensor): the state in which the action is performed.
- action (int): the action performed.
- buffers_size (tensor): the maximum size of each queue.

Returns:
- The resulting state.

In [87]:
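def next(state:torch.Tensor, action:int, buffers_size:torch.Tensor) -> torch.Tensor:
    # Note: this name shadows Python's built-in next() for the rest of the notebook
    # Action 0 (no queue selected) or an out-of-range action: the state is unchanged
    if action==0 or action>buffers_size.numel(): return state
    # Selected queue already full: the job is lost and the state is unchanged
    if state[action-1]>=buffers_size[action-1]: return state
    # Otherwise the job joins queue `action` (stored at index action-1)
    next_state = state.clone()
    next_state[action-1] = state[action-1] + 1
    return next_state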

#### Example of deterministic transition

In [88]:
bs = torch.tensor([2,1,3])
a = 3  # Replace a by any suitable action
print(f"Deterministic transition with B = {bs} and a = {a}")
state_space = state_space_construction(buffers_size=bs)[0]
for s_idx in range(len(state_space)):
    if s_idx % 4 == 0: print()
    else: print("   ", end="")
    state = state_space[s_idx]
    state_prime = next(state=state,action=a, buffers_size=bs)
    print(f"s = {state.numpy()} --->  s' = {state_prime.numpy()}    | ", end="")

Deterministic transition with B = tensor([2, 1, 3]) and a = 3

s = [0 0 0] --->  s' = [0 0 1]    |    s = [0 0 1] --->  s' = [0 0 2]    |    s = [0 0 2] --->  s' = [0 0 3]    |    s = [0 0 3] --->  s' = [0 0 3]    | 
s = [0 1 0] --->  s' = [0 1 1]    |    s = [0 1 1] --->  s' = [0 1 2]    |    s = [0 1 2] --->  s' = [0 1 3]    |    s = [0 1 3] --->  s' = [0 1 3]    | 
s = [1 0 0] --->  s' = [1 0 1]    |    s = [1 0 1] --->  s' = [1 0 2]    |    s = [1 0 2] --->  s' = [1 0 3]    |    s = [1 0 3] --->  s' = [1 0 3]    | 
s = [1 1 0] --->  s' = [1 1 1]    |    s = [1 1 1] --->  s' = [1 1 2]    |    s = [1 1 2] --->  s' = [1 1 3]    |    s = [1 1 3] --->  s' = [1 1 3]    | 
s = [2 0 0] --->  s' = [2 0 1]    |    s = [2 0 1] --->  s' = [2 0 2]    |    s = [2 0 2] --->  s' = [2 0 3]    |    s = [2 0 3] --->  s' = [2 0 3]    | 
s = [2 1 0] --->  s' = [2 1 1]    |    s = [2 1 1] --->  s' = [2 1 2]    |    s = [2 1 2] --->  s' = [2 1 3]    |    s = [2 1 3] --->  s' = [2 1 3]    | 

In [89]:
# Visualizing the first stage transition
bs = torch.tensor([2, 1, 3])
SP = state_space_construction(buffers_size=bs)[0]

# Parameters
num_states = len(SP)
num_queues = bs.size(0)

# Normalize the state space
NSP = [state.float() / bs for state in SP]

# Convert in data for visualization
data_nsp = np.stack([ns.numpy() for ns in NSP], axis=1)

# Create the plot regions
fig = ps.make_subplots(
    rows=2, cols=1,
    row_heights=[0.1, 0.1],
    shared_xaxes=True,
    vertical_spacing=0.1,
    subplot_titles=("Initial state", "Difference between initial and intermediate state (light for equal values, dark for different values)")
)

# Add the heatmap for the normalized initial states
heatmap_nsp1 = go.Heatmap(
    z=data_nsp,
    x=np.arange(num_states),
    y=np.arange(num_queues),
    colorscale='blues',
    showscale=True,
    colorbar=dict(
        title='Initial queue occupancy',
        titleside='right'
    ),
    zmin=0,
    zmax=1
)
fig.add_trace(heatmap_nsp1, row=1, col=1)

# Add a heatmap for the difference between the initial and intermediate states
initial_diff = np.zeros((num_queues, num_states))
heatmap_diff = go.Heatmap(
    z=initial_diff,
    x=np.arange(num_states),
    y=np.arange(num_queues),
    colorscale='greens',
    showscale=False,  
    zmin=0,
    zmax=1
)
fig.add_trace(heatmap_diff, row=2, col=1)

# Cells borders
fig.update_traces(
    hoverongaps=False,
    xgap=1,  # Set the gap between cells
    ygap=1
)

# Axes
fig.update_layout(
    title='First stage: deterministic transition',
    xaxis=dict(
        title='State index',
        tickvals=np.arange(num_states),
        ticktext=[str(i) for i in range(num_states)]
    ),
    yaxis=dict(
        title='Queue index',
        tickvals=np.arange(num_queues),
        ticktext=[str(i+1) for i in range(num_queues)]
    ),
    height=500
)
fig.update_yaxes(
    title='Queue index',
    tickvals=np.arange(num_queues),
    ticktext=[str(i+1) for i in range(num_queues)],
    row=2, col=1
)
fig.update_xaxes(
    title='State index',
    tickmode='array',
    tickvals=np.arange(num_states),
    ticktext=[str(i+1) for i in range(num_states)],
    row=2, col=1
)

# Cursor to select the action
slider_1 = widgets.IntSlider(min=0, max=num_queues, step=1, value=0, description='Action')

# Prepare the dynamic update of the figure
fig_widget = go.FigureWidget(fig)

def update_heatmap1(a):
    list_diff = [next(SP[s], a, bs) - SP[s] for s in range(num_states)]
    data_diff = torch.stack(list_diff).numpy()
    with fig_widget.batch_update():
        fig_widget.data[1].z = data_diff.T 

widgets.interact(update_heatmap1, a=slider_1)

# Display the figure widget and the slider
display(fig_widget, slider_1)


[Interactive output: heatmaps "Initial state" and "Difference between initial and intermediate state", with an "Action" slider ranging from 0 to 3]

### **<a id="mdp_for_load_balancing_transition_stage2">Second Stage Transition at each period: The Probabilistic Transition</a>** (the **probability()** function)

Objective:
- Compute the transition probability from one state to another in the second stage of the transition dynamics.

Args:
- intermediate_state (tensor): state resulting from the first stage transition.
- final_state (tensor): any existing state.
- service_rates (tensor): service rate for each queue.

Returns:
- The probability of transition from intermediate_state to final_state in the second stage.
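
Assuming, as the binomial coefficient in the implementation below suggests, that the jobs of a queue are processed independently of each other, each with probability $\lambda_q$, the second-stage probability can be sketched as a product of binomial terms. The notation $\tilde{s}$ (intermediate state) and $s'$ (final state) is introduced here for illustration.

$$
P\left(s' \mid \tilde{s}\right) =
\begin{cases}
\displaystyle\prod_{q=1}^{Q} \binom{\tilde{s}_q}{s'_q}\, \lambda_q^{\tilde{s}_q - s'_q} \left(1-\lambda_q\right)^{s'_q} & \text{if } s'_q \leq \tilde{s}_q \text{ for all } q, \\
0 & \text{otherwise.}
\end{cases}
$$

In each factor, $\tilde{s}_q - s'_q$ is the number of jobs processed in queue $q$ and $s'_q$ is the number of jobs that remain.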

In [90]:
def probability(intermediate_state:torch.Tensor, final_state:torch.Tensor, service_rates:torch.Tensor) -> float:
    nb_queues = service_rates.numel()
    proba = 1.0
    for queue in range(nb_queues):
        max_to_process = intermediate_state[queue]
        unprocessed = final_state[queue]
        if max_to_process < unprocessed: return 0.0
        processed = max_to_process - unprocessed
        sr = service_rates[queue]
        proba *= math.comb(max_to_process,unprocessed) * sr**processed * (1-sr)**unprocessed
    return proba
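
A simple sanity check for this kind of function is that, for a fixed intermediate state, the probabilities over all final states add up to 1. The sketch below mirrors the binomial product in plain Python so that it runs without PyTorch; the helper `second_stage_probability` and the service rates are hypothetical and chosen only for illustration.

```python
import itertools
import math

def second_stage_probability(intermediate, final, service_rates):
    """Pure-Python mirror of the binomial product (hypothetical helper)."""
    proba = 1.0
    for s_mid, s_end, rate in zip(intermediate, final, service_rates):
        if s_end > s_mid:  # jobs cannot appear during the service stage
            return 0.0
        processed = s_mid - s_end
        proba *= math.comb(s_mid, s_end) * rate**processed * (1 - rate)**s_end
    return proba

buffers = [2, 1, 3]
rates = [0.5, 0.25, 0.75]  # arbitrary illustrative service rates
intermediate = (2, 1, 3)
finals = itertools.product(*[range(b + 1) for b in buffers])
total = sum(second_stage_probability(intermediate, f, rates) for f in finals)
print(total)  # expected to equal 1 up to floating-point rounding
```

The same check can be applied to each row of a transition tensor: summing over the last dimension should give 1 for every state-action pair.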

### **<a>The overall transition (first and second stages) at each period</a>** (the **transition_function()** and **transition_dynamics()** functions)


1. We can now define the transition function (**transition_function()**), which returns the probability of each transition from one state to any state given an action.
2. It may also be interesting to avoid repeating this calculation and, instead, to generate once a tensor that captures the transition dynamics and reuse it whenever necessary. This tensor is generated by the function **transition_dynamics()**.

#### The Transition Function

Objective:
- Compute the probability of transition to a given state.

Args:
- state1 (tensor): state at which the action is performed.
- state2 (tensor): state for which the transition probability is to be evaluated.
- action (int): decision taken in state1 before the transition.
- buffers_size (tensor): the maximum size of each queue.
- service_rates (tensor): service rate for each queue.

Returns:
- Probability of transition to *state2* when action *action* is taken in *state1*.

In [91]:
def transition_function(state1:torch.Tensor, action:int, state2:torch.Tensor, buffers_size:torch.Tensor, service_rates:torch.Tensor) -> float:
    intermediate_state = next(state=state1,action=action,buffers_size=buffers_size)
    return probability(intermediate_state=intermediate_state,final_state=state2,service_rates=service_rates)

#### The Transition Dynamics

Objective:
- Compute the tensor that captures transition dynamics. The states are referred to by their indices.

Args:
- state_space (list): list of states.
- buffers_size (tensor): the maximum size of each queue.
- service_rates (tensor): service rate for each queue.

Returns:
- The tensor of transition matrices.
- The time to get it.

In [92]:
def transition_dynamics(state_space:list, buffers_size:torch.Tensor, service_rates:torch.Tensor) -> tuple:
    begin = time.time()
    nb_states = len(state_space)
    nb_actions = buffers_size.numel() + 1
    nb_steps = nb_states**2 * nb_actions
    step = 0
    TD = torch.zeros([nb_states,nb_actions,nb_states],dtype=float)
    for s_idx1, state1 in enumerate(state_space):
        for a in range(nb_actions):
            for s_idx2, state2 in enumerate(state_space):
                step += 1
                print(f"Fetching the transition dynamics   .....   step {step}/{nb_steps} ({step/nb_steps:.2%})", end="\r")
                TD[s_idx1,a,s_idx2] = transition_function(state1=state1, action=a, state2=state2, buffers_size=buffers_size, service_rates=service_rates)
    end = time.time()
    print(" "*150,end="\r")
    return TD, end - begin

## **<a id="mdp_for_load_balancing_costs">The Costs</a>**

1. We can now define the cost function (**cost_function()**), which returns the cost of taking an action in a given state.
2. It may also be interesting to avoid repeating this calculation and, instead, to generate once a tensor that captures the possible costs and reuse it whenever necessary. This tensor is generated by the function **cost_matrix()**.

### **<a id="mdp_for_load_balancing_cost_function">The Cost Function</a>**

- The cost is linear in the numbers of lost and unprocessed jobs, i.e., 

    `Total cost` = `cost per lost job` × `number of lost jobs` + `cost per unprocessed job` × `number of unprocessed jobs`
    
    
    The loss happens in the deterministic stage of the transition, which results in an intermediate state; the number of unprocessed jobs is the number of jobs still in the queues in the final state of the period, after the probabilistic stage.
- The number of lost jobs is returned by the function **loss()**.
- The number of unprocessed jobs is random; the loss cost plus the expected cost of the unprocessed jobs, i.e., the mathematical expectation of the total cost of the period, is returned by the function **cost_function()**.
- In a queue with service rate $\lambda_q$ containing $s_q$ jobs in the intermediate state, in expectation, $\lambda_q s_q$ jobs will be processed and $\left(1-\lambda_q\right) s_q$ jobs will remain in the queue.
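
The expected cost described in the list above can be worked through by hand on a small case. The following sketch is a plain-Python illustration, not the notebook's implementation; the helper `expected_cost` and the service rates are hypothetical, while the default costs (2.0 per lost job, 1.0 per unprocessed job) are those used as defaults further below.

```python
def expected_cost(intermediate_state, jobs_lost, service_rates, cost_loss=2.0, cost_delay=1.0):
    """Expected one-period cost (hypothetical pure-Python helper)."""
    # Each job in queue q stays unprocessed with probability 1 - service rate of q
    expected_unprocessed = sum(
        (1 - rate) * jobs for rate, jobs in zip(service_rates, intermediate_state)
    )
    return cost_loss * jobs_lost + cost_delay * expected_unprocessed

# Intermediate state [0, 0, 1] with one lost job and illustrative service rates
cost = expected_cost([0, 0, 1], jobs_lost=1, service_rates=[0.5, 0.5, 0.75])
print(cost)  # 2.0 * 1 + 1.0 * (1 - 0.75) * 1 = 2.25
```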

#### Loss

Objective:
- Finds the loss that results from performing a given action.

Args:
- state (tensor): the state in which the action is performed.
- action (int): the action performed.
- buffers_size (tensor): the maximum size of each queue.

Returns:
- The resulting loss.

In [93]:
def loss(state:torch.Tensor, action:int, buffers_size:torch.Tensor) -> int:
    if action==0: return 1
    if state[action-1]>=buffers_size[action-1]: return 1
    return 0

#### Cost

Objective:
- Computes the cost charged for taking some action in some state.

Args:
- state (tensor): the state in which the action is performed.
- action (int): the action performed.
- buffers_size (tensor): the maximum size of each queue.
- service_rates (tensor): service rate of each queue.

Returns:
- The cost.

In [94]:
def cost_function(state:torch.Tensor, action:int, buffers_size:torch.Tensor, service_rates:torch.Tensor, cost_loss:float=2.0, cost_delay:float=1.0) -> float:
    # Number of jobs lost in the deterministic stage (0 or 1)
    jobs_loss = loss(state=state, action=action, buffers_size=buffers_size)
    intermediate_state = next(state=state, action=action, buffers_size=buffers_size)
    # Expected number of jobs left in the queues: sum over q of (1 - service rate of q) * jobs in q
    jobs_unproceeded = torch.dot(1-service_rates.to(float), intermediate_state.to(float)).item()
    return cost_loss*jobs_loss + cost_delay*jobs_unproceeded

#### Examples of loss and total cost

##### Example of loss

In [95]:
bs = torch.tensor([2,1,3])
a = 0  # Replace a by any suitable action
print(f"Number of jobs lost with B = {bs} and a = {a}")
state_space = state_space_construction(buffers_size=bs)[0]
for s_idx in range(len(state_space)):
    if s_idx % 4 == 0: print()
    else: print("   ", end="")
    state = state_space[s_idx]
    jobs_loss = loss(state=state,action=a, buffers_size=bs)
    print(f"s = {state.numpy()} --->  loss = {jobs_loss}    | ", end="")

Number of jobs lost with B = tensor([2, 1, 3]) and a = 0

s = [0 0 0] --->  loss = 1    |    s = [0 0 1] --->  loss = 1    |    s = [0 0 2] --->  loss = 1    |    s = [0 0 3] --->  loss = 1    | 
s = [0 1 0] --->  loss = 1    |    s = [0 1 1] --->  loss = 1    |    s = [0 1 2] --->  loss = 1    |    s = [0 1 3] --->  loss = 1    | 
s = [1 0 0] --->  loss = 1    |    s = [1 0 1] --->  loss = 1    |    s = [1 0 2] --->  loss = 1    |    s = [1 0 3] --->  loss = 1    | 
s = [1 1 0] --->  loss = 1    |    s = [1 1 1] --->  loss = 1    |    s = [1 1 2] --->  loss = 1    |    s = [1 1 3] --->  loss = 1    | 
s = [2 0 0] --->  loss = 1    |    s = [2 0 1] --->  loss = 1    |    s = [2 0 2] --->  loss = 1    |    s = [2 0 3] --->  loss = 1    | 
s = [2 1 0] --->  loss = 1    |    s = [2 1 1] --->  loss = 1    |    s = [2 1 2] --->  loss = 1    |    s = [2 1 3] --->  loss = 1    | 

##### Example of cost

In [96]:
bs = torch.tensor([2,1,3])
sr = torch.rand((3))
a = 0  # Replace a by any suitable action
print(f"Cost with B = {bs.numpy()}, sr = {sr.numpy()} and a = {a}")
state_space = state_space_construction(buffers_size=bs)[0]
for s_idx in range(len(state_space)):
    if s_idx % 4 == 0: print()
    else: print("   ", end="")
    state = state_space[s_idx]
    cost = cost_function(state=state, action=a, buffers_size=bs, service_rates=sr)
    print(f"s = {state.numpy()} --->  cost = {cost:.4f}    | ", end="")

Cost with B = [2 1 3], sr = [0.7733791  0.4721961  0.50632375] and a = 0

s = [0 0 0] --->  cost = 2.0000    |    s = [0 0 1] --->  cost = 2.4937    |    s = [0 0 2] --->  cost = 2.9874    |    s = [0 0 3] --->  cost = 3.4810    | 
s = [0 1 0] --->  cost = 2.5278    |    s = [0 1 1] --->  cost = 3.0215    |    s = [0 1 2] --->  cost = 3.5152    |    s = [0 1 3] --->  cost = 4.0088    | 
s = [1 0 0] --->  cost = 2.2266    |    s = [1 0 1] --->  cost = 2.7203    |    s = [1 0 2] --->  cost = 3.2140    |    s = [1 0 3] --->  cost = 3.7076    | 
s = [1 1 0] --->  cost = 2.7544    |    s = [1 1 1] --->  cost = 3.2481    |    s = [1 1 2] --->  cost = 3.7418    |    s = [1 1 3] --->  cost = 4.2355    | 
s = [2 0 0] --->  cost = 2.4532    |    s = [2 0 1] --->  cost = 2.9469    |    s = [2 0 2] --->  cost = 3.4406    |    s = [2 0 3] --->  cost = 3.9343    | 
s = [2 1 0] --->  cost = 2.9810    |    s = [2 1 1] --->  cost = 3.4747    |    s = [2 1 2] --->  cost = 3.9684    |    s

### **<a id="mdp_for_load_balancing_cost_matrix">The Cost Matrix</a>**

Objective:
- Compute the cost matrix. The states are referred to by their indices.

Args:
- state_space (list): list of states.
- buffers_size (tensor): the maximum size of each queue.
- service_rates (tensor): service rate for each queue.

Returns:
- The cost matrix.
- The time to get it.

In [97]:
def cost_matrix(state_space:list, buffers_size:torch.Tensor, service_rates:torch.Tensor, cost_loss:float=2.0, cost_delay:float=1.0) -> tuple:
    begin = time.time()
    nb_states = len(state_space)
    nb_actions = buffers_size.numel() + 1
    nb_steps = nb_states*nb_actions
    step = 0
    CM = torch.zeros([nb_states,nb_actions],dtype=float)
    for s_idx, state in enumerate(state_space):
        for a in range(nb_actions):
            step += 1
            print(f"Fetching the cost matrix   .....   step {step}/{nb_steps} ({step/nb_steps:.2%})", end="\r")
            CM[s_idx,a] = cost_function(state=state, action=a, buffers_size=buffers_size, service_rates=service_rates, cost_loss=cost_loss, cost_delay=cost_delay)
    end = time.time()
    print(" "*150,end="\r")  # Erase the progress message so it does not mix with later output
    return CM, end - begin

#### Example of cost matrix

In [98]:
bs = torch.tensor([2,1,3])
sr = torch.rand((3))
a = 1  # Replace a by any suitable action
print(f"Cost matrix with B = {bs.numpy()}, service rate = {sr.numpy()} and a = {a}\n")
CM, time_ = cost_matrix(state_space=state_space, buffers_size=bs, service_rates=sr)
print("Cost matrix")
print(CM.numpy())
print(f"Elapsed time = {time_}")

Cost matrix with B = [2 1 3], service rate = [0.60475546 0.30605632 0.39700103] and a = 1

Cost matrix
[[2.         0.39524454 0.69394368 0.60299897]
 [2.60299897 0.99824351 1.29694265 1.20599794]
 [3.20599794 1.60124248 1.89994162 1.80899692]
 [3.80899692 2.20424145 2.5029406  3.80899692]
 [2.69394368 1.08918822 2.69394368 1.29694265]
 [3.29694265 1.69218719 3.29694265 1.89994162]
 [3.89994162 2.29518616 3.89994162 2.5029406 ]
 [4.5029406  2.89818513 4.5029406  4.5029406 ]
 [2.39524454 0.79048908 1.08918822 0.99824351]
 [2.99824351 1.39348805 1.69218719 1.60124248]
 [3.60124248 1.99648702 2.29518616 2.20424145]
 [4.20424145 2.59948599 2.89818513 4.20424145]
 [3.08918822 1.48443276 3.08918822 1.69218719]
 [3.69218719 2.08743173 3.69218719 2.29518616]
 [4.29518616 2.6904307  4.29518616 2.89818513]
 [4.89818513 3.29342967 4.89818513 4.89818513]
 [2.79048908 2.79048908 1.48443276 1.39348805]
 [3.39348805 3.39348805 2.08743173 1.99648702]
 [3.996

In [99]:
# Visualizing the loss and the total cost
bs = torch.tensor([2, 1, 3])
SP = state_space_construction(buffers_size=bs)[0]

# Parameters
num_states = len(SP)
num_queues = bs.size(0)

# Normalize the state space
NSP = [state.float() / bs for state in SP]

# Convert in data for visualization
data_nsp = np.stack([ns.numpy() for ns in NSP], axis=1)

# Create the plot regions
fig = ps.make_subplots(
    rows=3, cols=1,
    row_heights=[0.3, 0.1, 0.1],
    shared_xaxes=True,
    vertical_spacing=0.1,
    subplot_titles=("States", "Loss (first stage)", "Total cost")
)

# Heatmap for the normalized initial states
heatmap_nsp = go.Heatmap(
    z=data_nsp,
    x=np.arange(num_states),
    y=np.arange(num_queues),
    colorscale='blues',
    showscale=True,
    colorbar=dict(
        title='Queue occupancy',
        titleside='right'
    ),
    zmin=0,
    zmax=1
)
fig.add_trace(heatmap_nsp, row=1, col=1)

# Heatmap for the loss
initial_loss = np.zeros((num_states))
heatmap_loss = go.Heatmap(
    z=initial_loss,
    x=np.arange(num_states),
    y=["Loss"],
    colorscale='reds',
    showscale=False,  
    zmin=0,
    zmax=1
)
fig.add_trace(heatmap_loss, row=2, col=1)

# Heatmap for the total cost
initial_total_cost = np.zeros((num_states))
heatmap_total_cost = go.Heatmap(
    z=initial_total_cost,
    x=np.arange(num_states),
    y=["Cost"],
    colorscale='reds',
    showscale=False,  
    zmin=0,
    zmax=1
)
fig.add_trace(heatmap_total_cost, row=3, col=1)

# Cells borders
fig.update_traces(
    hoverongaps=False,
    xgap=1,  # Set the gap between cells
    ygap=1
)

# Axes
fig.update_layout(
    title='Cost',
    height=500,
    xaxis=dict(
        # title='State',
        tickmode='array',
        tickvals=np.arange(num_states),
        ticktext=[str(i) for i in range(num_states)]
    ),
    xaxis2=dict(
        # title='State',
        tickmode='array',
        tickvals=np.arange(num_states),
        ticktext=[str(i) for i in range(num_states)]
    ),
    xaxis3=dict(
        title='State',
        tickmode='array',
        tickvals=np.arange(num_states),
        ticktext=[str(i+1) for i in range(num_states)]
    ),
    yaxis1=dict(
        title='Queue',
        tickmode='array',
        tickvals=np.arange(num_queues),
        ticktext=[str(i+1) for i in range(num_queues)]
    )
)

# Cursor to select the action
slider_2 = widgets.IntSlider(min=0, max=num_queues, step=1, value=0, description='Action')

# Prepare the dynamic update of the figure
fig_widget = go.FigureWidget(fig)

def update_heatmap2(a):
    # loss_ = np.array([loss(state,a,bs) for state in SP])
    loss_ = np.stack([loss(state,a,bs) for state in SP])
    cost = CM[:,a].numpy()
    with fig_widget.batch_update():
        fig_widget.data[1].z = loss_.reshape(1, -1)  
        fig_widget.data[2].z = cost.reshape(1, -1)  
        fig_widget.data[2].zmin = -np.max(np.abs(cost))
        fig_widget.data[2].zmax = np.max(np.abs(cost))

widgets.interact(update_heatmap2, a=slider_2)

# Plot the widget and the cursor
display(fig_widget, slider_2)

[Interactive output: heatmaps "States", "Loss (first stage)" and "Total cost", with an "Action" slider ranging from 0 to 3]

# **<a id="load_balancing_resolution">Resolution of the Load Balancing Problem</a>**

Now, let us implement the value iteration algorithm and apply it to the load balancing problem.

## **<a>Value of a Policy</a>**

Objective:
- Use the value iteration algorithm to determine the value of a deterministic policy.

Args:
- transition_dynamics (torch.Tensor): Transition probability tensor of size (|S|, |A|, |S|).
- cost_matrix (torch.Tensor): Cost matrix of size (|S|, |A|).
- policy (torch.Tensor): Deterministic policy of size (|S|), where policy[s] is the action taken in state s.
- discount_factor (float): Discount factor (0 <= discount_factor < 1).
- precision (float): Precision parameter from which the stopping threshold, measured in the infinity norm, is derived.

Returns:
- torch.Tensor: Value function V of size (|S|)
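
For reference, the fixed-point iteration behind this kind of policy evaluation can be sketched as follows, writing $c$ for the cost, $P$ for the transition dynamics and $\pi$ for the policy (notation assumed here for illustration). The second line is the standard contraction bound that relates the change between two successive iterates to the distance from the fixed point $V^{\pi}$.

$$
V_{n+1}(s) = c\left(s, \pi(s)\right) + \gamma \sum_{s'} P\left(s' \mid s, \pi(s)\right) V_n(s')
$$

$$
\left\|V_{n+1} - V^{\pi}\right\|_\infty \leqslant \frac{\gamma}{1-\gamma} \left\|V_{n+1} - V_n\right\|_\infty
$$

In particular, stopping as soon as $\left\|V_{n+1} - V_n\right\|_\infty \leqslant \delta$ guarantees an error of at most $\frac{\gamma}{1-\gamma}\,\delta$ on the returned value function.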

In [100]:
def valueP(transition_dynamics:torch.Tensor, cost_matrix:torch.Tensor, policy:torch.Tensor, discount_factor:float=0.9, precision:float=1e-16) -> torch.Tensor:
    num_states = transition_dynamics.size(0)
    error_bound = discount_factor*precision / (1-discount_factor)  # Convergence criterion: $\left\|V_{n+1} - V_n\right\|_\infty \leqslant \epsilon \frac{\gamma}{1-\gamma}$
    step = 0
    error_ = "#NA"
    
    policy = policy.to(int)
    reward_vector = cost_matrix[torch.arange(num_states), policy]  # Cost incurred in each state under the policy applied
    transition_matrix = transition_dynamics[torch.arange(num_states), policy]  # Transition probability distribution from each state under the policy
    current_value = reward_vector.clone()  # Initialization of the value function

    while True:
        step += 1
        print(f"Fetching the value of the policy.  Step {step}:  current bound of the error = {error_}", end="\r")
        new_value = reward_vector + discount_factor * torch.sum(transition_matrix * current_value, dim=1)  # Updated version of the value function

        error_ = torch.max(torch.abs(new_value - current_value)).item()  # Maximum difference between both versions of the value function
        print(" "*130, end="\r")

        if error_ <= error_bound: break
        current_value = new_value.clone()
    return new_value
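
To see the iteration at work on a case that can be checked by hand, here is a minimal sketch of iterative policy evaluation on plain Python lists. It is an illustration under stated assumptions rather than the notebook's implementation: the helper `evaluate_policy`, the toy two-state chain and the tolerance of `1e-10` are all hypothetical.

```python
def evaluate_policy(transitions, costs, discount_factor=0.9, tolerance=1e-10):
    """Iterative policy evaluation on plain lists (hypothetical helper).

    transitions[s][s2]: probability of moving from s to s2 under the policy.
    costs[s]: cost incurred in state s under the policy.
    """
    values = list(costs)
    while True:
        new_values = [
            costs[s] + discount_factor * sum(p * v for p, v in zip(transitions[s], values))
            for s in range(len(costs))
        ]
        # Largest change between two successive value functions
        gap = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if gap <= tolerance:
            return values

# Two states: state 0 always moves to state 1; state 1 is absorbing with cost 1
transitions = [[0.0, 1.0], [0.0, 1.0]]
costs = [0.0, 1.0]
values = evaluate_policy(transitions, costs)
print([round(v, 6) for v in values])  # close to [9.0, 10.0]
```

The result can be checked analytically: the absorbing state has value $1/(1-0.9) = 10$, and the other state has value $0 + 0.9 \times 10 = 9$.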

## **<a>Optimal Solution of an MDP</a>**

Objective:
- Finds the optimal solution of an MDP in the sense of cost minimization.
- The numbers of states and actions are implicitly given in the dynamics.
- For S states and A actions, the state and action spaces are respectively {0, ..., S-1} and {0, ..., A-1}.

Args:
- transition_dynamics (tensor of floats, size (S,A,S)): position [s1,a,s2] represents the probability of transitioning to state s2 when action a was taken in state s1.
- cost_matrix (tensor of floats, size (S,A)): position [s,a] represents the cost induced by action a in state s.
- discount_factor (float, optional). Defaults to 0.9.
- precision (float, optional). Defaults to 1e-16.

Returns:
- A policy (tensor int, size S).
- Avalue function (tensor of floats, size S).
- An upper bound of the difference between the returned and the optimal value functions.
- The duration the computation.

In [101]:
def solutionMDP(transition_dynamics:torch.Tensor, cost_matrix:torch.Tensor, discount_factor:float=0.9, precision:float=1e-16) -> tuple:
    start_time = time.time()
    number_states = cost_matrix.size(0)
    cost_matrix = cost_matrix.to(float)
    error_bound = discount_factor*precision / (1-discount_factor)  # Convergence criterion: $\left\|V_{n+1} - V_n\right\|_\infty \leqslant \epsilon \frac{\gamma}{1-\gamma}$
    
    current_value = torch.min(cost_matrix,dim=1)[0]  # Initialization of the value function as to correspond to the one-shot optimal policy
    
    # Iterations
    new_value = torch.zeros([number_states], dtype=float)
    policy = torch.zeros([number_states], dtype=int)
    error_ = "not evaluated"
    step = 0
    while True:
        step += 1
        for s in range(number_states):
            print(f"Solving the MDP (cost minimization).  Step {step}:  current bound of the error = {error_}, current state = {s}", end="\r")
            Q = cost_matrix[s] + discount_factor*torch.matmul(transition_dynamics[s],current_value)
            new_value[s] = torch.min(Q)
            policy[s] = torch.argmin(Q)
        error_ = torch.max(torch.abs(new_value - current_value)).item()  # Maximum difference between both versions of the value function
        if error_ <= error_bound: break
        current_value = new_value.clone()
    end_time = time.time()
    duration = end_time - start_time
    print(" "*170, end="\r")
    
    return policy, new_value, error_bound, duration
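
The per-state loop can also be written as a single tensor operation, because `torch.matmul` applied to a tensor of size (S,A,S) and a vector of size S returns the (S,A) table of expected next values. A hedged sketch follows, where `value_iteration_min`, `tol`, `max_steps` and the toy MDP are assumptions made for illustration:

```python
import torch

def value_iteration_min(transition_dynamics, cost_matrix, discount_factor=0.9, tol=1e-10, max_steps=10_000):
    """Hypothetical vectorized value iteration for a cost-minimizing MDP."""
    value = cost_matrix.min(dim=1).values  # One-shot optimal costs as starting point
    for _ in range(max_steps):
        # Q[s, a] = C[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s']
        q_values = cost_matrix + discount_factor * torch.matmul(transition_dynamics, value)
        new_value, policy = q_values.min(dim=1)  # Minimum over actions and its argmin
        if torch.max(torch.abs(new_value - value)).item() <= tol:
            break
        value = new_value
    return policy, new_value

# Toy MDP: 2 states, 2 actions, deterministic moves (made-up numbers)
TD_toy = torch.zeros(2, 2, 2, dtype=torch.float64)
TD_toy[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
TD_toy[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
TD_toy[1, 0, 0] = 1.0  # state 1, action 0: move to state 0
TD_toy[1, 1, 1] = 1.0  # state 1, action 1: stay in state 1
CM_toy = torch.tensor([[1.0, 0.5], [0.0, 2.0]], dtype=torch.float64)

policy_toy, value_toy = value_iteration_min(TD_toy, CM_toy)
print(policy_toy, value_toy)
```

On this toy instance the cheapest behaviour alternates between the two states, so the value of state 0 solves $V_0 = 0.5 + 0.81\,V_0$.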

## **<a>Optimal Solution of the Load Balancing Problem</a>**

Some variables defined in the following cells are reused in later cells:
- `sr`, `bs` and `SP` are respectively the tensor of service rates, the tensor of buffer sizes and the list of states
- `TD` and `TD_time` are the tensor of transition dynamics and the time taken to build it
- `CM` and `CM_time` are the cost matrix and the time taken to build it
- `GP`, `GV`, `B` and `GRT` are the optimal policy, the optimal value function of the load balancing problem, the bound on the error and the time taken to get this optimal solution.

In [102]:
# Trying on the load balancing problem

print("Resolution of a load balancing problem\n\n")

# Get the parameters
sr = torch.rand((3))  # Service rates
bs = torch.tensor([2,3,4])  # Buffers size
SP = state_space_construction(buffers_size=bs)[0]
print("Parameters:")
print(f"Service rates = {sr.numpy()}")
print(f"Buffers size =  {bs.numpy()}")
print("\n")

Resolution of a load balancing problem


Parameters:
Service rates = [0.76737386 0.21253014 0.356205  ]
Buffers size =  [2 3 4]




In [103]:
# Solve the problem
TD, TD_time = transition_dynamics(state_space=SP, buffers_size=bs, service_rates=sr)
CM, CM_time = cost_matrix(state_space=SP, buffers_size=bs, service_rates=sr)
GP,GV,B,GRT = solutionMDP(transition_dynamics=TD, cost_matrix=CM)

                                                                                                                                                                          

In [104]:
# Print the solution
print("Solution found")
for s_idx in range(len(SP)):
    s, a, v = SP[s_idx].numpy(), GP[s_idx].item(), GV[s_idx].item()
    if s_idx % 4 == 0: print()
    else: print("   ", end="")
    print(f"s = {s}  -->  a = {a}, v = {v:<4.1f}    | ", end="")
print("\n\n")
print("Elapsed time in seconds:\n")
print(f"Transition dynamics: {TD_time:>6.3f}")
print(f"Cost matrix:         {CM_time:>6.3f}")
print(f"MDP resolution:      {GRT:>6.3f}")
print("-----------------------------------")
print(f"Total:               {GRT+TD_time+CM_time:>6.3f}")

Solution found

s = [0 0 0]  -->  a = 1, v = 3.1     |    s = [0 0 1]  -->  a = 1, v = 4.6     |    s = [0 0 2]  -->  a = 1, v = 6.2     |    s = [0 0 3]  -->  a = 1, v = 7.7     | 
s = [0 0 4]  -->  a = 1, v = 9.2     |    s = [0 1 0]  -->  a = 1, v = 5.8     |    s = [0 1 1]  -->  a = 1, v = 7.3     |    s = [0 1 2]  -->  a = 1, v = 8.9     | 
s = [0 1 3]  -->  a = 1, v = 10.4    |    s = [0 1 4]  -->  a = 1, v = 11.9    |    s = [0 2 0]  -->  a = 1, v = 8.5     |    s = [0 2 1]  -->  a = 1, v = 10.0    | 
s = [0 2 2]  -->  a = 1, v = 11.6    |    s = [0 2 3]  -->  a = 1, v = 13.1    |    s = [0 2 4]  -->  a = 1, v = 14.6    |    s = [0 3 0]  -->  a = 1, v = 11.2    | 
s = [0 3 1]  -->  a = 1, v = 12.7    |    s = [0 3 2]  -->  a = 1, v = 14.3    |    s = [0 3 3]  -->  a = 1, v = 15.8    |    s = [0 3 4]  -->  a = 1, v = 17.3    | 
s = [1 0 0]  -->  a = 1, v = 3.5     |    s = [1 0 1]  -->  a = 1, v = 5.0     |    s = [1 0 2]  -->  a = 1, v = 6.5     |    s = [1 0 3]  -->  a = 1, v =

# **<a id="load_balancing_dimension_reduction">Dimension Reduction for the Load Balancing Problem</a>**

- Now, examine the resolution of the same load balancing problem through state abstraction. 
- The clustering method is based on the one-shot optimal action, i.e., 
    $$
        \phi\left(s\right) = \phi\left(s'\right)
        \qquad\Longleftrightarrow\qquad
        \operatorname*{arg\,max}_{a\in\mathcal{A}} R_s^a = \operatorname*{arg\,max}_{a\in\mathcal{A}} R_{s'}^a
    $$ 
    That is, two states are in the same class if they admit the same one-shot optimal action; with the cost matrix used here, this is the action of lowest immediate cost. 
    The clustering is performed by the `cost_based_clustering()` function.
- Weighting model: **uniform distribution** in each class.

## **<a>Clustering Based on Cost</a>**

### The `cost_based_clustering()` Function

Objective:
- Builds classes on the basis of the one-shot optimal actions read from the cost matrix. These actions may differ from the optimal policy of the process.

Args:
- cost_matrix (tensor): the cost matrix. cost_matrix[s,a] is the cost for taking action a in state s.

Returns:
- The number of classes: nb_classes.
- The list of states indices in each class: states_in_classes.
- The class index corresponding to each state: classes.

In [105]:
def cost_based_clustering(cost_matrix:torch.Tensor) -> tuple:
    nb_states = cost_matrix.size(0)
    where_optimal_cost = torch.argmin(cost_matrix,dim=1)  # Optimal action for each state
    possible_optimal_actions = where_optimal_cost.unique()  # Set of optimal actions
    nb_classes = possible_optimal_actions.size(0)  # Number of actions that are optimal for at least one state
    states_in_classes = [torch.where(where_optimal_cost==possible_optimal_actions[op_a_idx])[0] for op_a_idx in range(nb_classes)]  # States for which each action is optimal
    classes = [(possible_optimal_actions==where_optimal_cost[s_idx]).nonzero(as_tuple=False).item() for s_idx in range(nb_states)]  # Class index of each state
    return nb_classes, states_in_classes, classes
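
A possible shortcut, sketched below on a made-up cost matrix, relies on `torch.unique(..., return_inverse=True)`, which returns the sorted distinct greedy actions together with the class index of every state. The variable names are illustrative and not part of the lab code.

```python
import torch

# Made-up cost matrix: 4 states, 3 actions
cost_matrix_toy = torch.tensor([[3.0, 1.0, 2.0],
                                [0.5, 4.0, 2.0],
                                [2.0, 1.5, 9.0],
                                [1.0, 2.0, 0.1]])

greedy_action = torch.argmin(cost_matrix_toy, dim=1)  # One-shot optimal action per state
# optimal_actions: sorted distinct greedy actions; classes_toy: class index of each state
optimal_actions, classes_toy = torch.unique(greedy_action, return_inverse=True)
nb_classes_toy = optimal_actions.size(0)
states_in_classes_toy = [torch.where(classes_toy == c)[0] for c in range(nb_classes_toy)]

print(nb_classes_toy)                                  # 3
print([idx.tolist() for idx in states_in_classes_toy])  # [[1], [0, 2], [3]]
print(classes_toy.tolist())                            # [1, 0, 1, 2]
```

Here `classes_toy` is a tensor rather than a Python list, which makes it directly usable as an index.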

### Cost Based Clustering for the Load Balancing Problem

Here we define variables that will be used throughout the notebook:
- `NC` is the number of classes
- `SIC` is the list of state indexes per class (`SIC[c_idx]` is the list of indexes of the states that are in the class of index `c_idx`)
- `C` is the list that gives the class index for each state (`C[s_idx]` is the class index of the state of index `s_idx`)

In [106]:
print("Classification for the above problem\n\n")
NC, SIC, C = cost_based_clustering(CM)

print(f"Number of classes: {NC}\n")

print("Indices of the states in each class:\n")
for c_idx in range(NC): print(f"c_{c_idx} = {SIC[c_idx].numpy()}")

print("\nClass index for each state")
for s_idx in range(len(SP)):
    s = SP[s_idx].numpy()
    if s_idx % 6 == 0: print()
    else: print("   ", end="")
    print(f"{s_idx:>3}: {s} --> c_{C[s_idx]}    | ", end="")

Classification for the above problem


Number of classes: 4

Indices of the states in each class:

c_0 = [59]
c_1 = [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
c_2 = [44 49 54]
c_3 = [40 41 42 43 45 46 47 48 50 51 52 53 55 56 57 58]

Class index for each state

  0: [0 0 0] --> c_1    |      1: [0 0 1] --> c_1    |      2: [0 0 2] --> c_1    |      3: [0 0 3] --> c_1    |      4: [0 0 4] --> c_1    |      5: [0 1 0] --> c_1    | 
  6: [0 1 1] --> c_1    |      7: [0 1 2] --> c_1    |      8: [0 1 3] --> c_1    |      9: [0 1 4] --> c_1    |     10: [0 2 0] --> c_1    |     11: [0 2 1] --> c_1    | 
 12: [0 2 2] --> c_1    |     13: [0 2 3] --> c_1    |     14: [0 2 4] --> c_1    |     15: [0 3 0] --> c_1    |     16: [0 3 1] --> c_1    |     17: [0 3 2] --> c_1    | 
 18: [0 3 3] --> c_1    |     19: [0 3 4] --> c_1    |     20: [1 0 0] --> c_1    |     21: [1 0 1] --> c_1    |     22: [1 0 2] --> c_1    |  

## **<a>Abstract dynamics</a>**

Once the clustering is done, we infer the dynamics on the classes from the dynamics of the ground problem.
This is performed by the `abstract_dynamics()` function.
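
Under the uniform weighting model, and writing $C_s^a$ and $P_{s,s'}^a$ for the ground cost and the ground transition probability (a notation assumed here for illustration), the abstract cost and abstract transition probability between two classes $c$ and $c'$ can be written as
$$
    \bar{C}_c^a = \frac{1}{|c|}\sum_{s\in c} C_s^a,
    \qquad
    \bar{P}_{c,c'}^a = \frac{1}{|c|}\sum_{s\in c}\,\sum_{s'\in c'} P_{s,s'}^a ,
$$
where $|c|$ is the number of states in class $c$. Each row $\bar{P}_{c,\cdot}^a$ still sums to one, since the classes partition the state space.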

### The `abstract_dynamics()` Function

Objective:
- Builds the transition dynamics and the cost matrix of the abstract MDP for a load balancing problem.

Args:
- nb_actions (int): number of actions (= number of queues + 1)
- nb_classes (int): number of classes
- states_in_classes (list): list of state indexes per class
- state_space (list): list of states
- buffers_size (tensor)
- service_rates (tensor)
- cost_loss (float)
- cost_delay (float)

Returns:
- The tensor of abstract transition dynamics
- The abstract cost matrix
- The time taken to build them

In [107]:
def abstract_dynamics(nb_actions:int, nb_classes:int, 
                      states_in_classes:list, state_space:list, 
                      buffers_size:torch.Tensor, service_rates:torch.Tensor, cost_loss, cost_delay) -> tuple:
    time1 = time.time()
    ATD = torch.zeros([nb_classes,nb_actions,nb_classes],dtype=float)  # Abstract transition dynamics
    ACM = torch.zeros([nb_classes,nb_actions],dtype=float)  # Abstract cost matrix
    nb_steps = nb_classes*nb_actions  # One step per (class, action) pair
    step = 0
    for c_idx1, a in itertools.product(range(nb_classes),range(nb_actions)):
        step += 1
        print(f"Fetching the abstract dynamics   .....   step {step}/{nb_steps} ({step/nb_steps:.2%})", end="\r")
        cost = 0
        for s_idx1 in states_in_classes[c_idx1]:
            s = state_space[s_idx1]
            cost += cost_function(state=s, action=a, buffers_size=buffers_size, service_rates=service_rates, cost_loss=cost_loss, cost_delay=cost_delay)
        ACM[c_idx1,a] = cost / states_in_classes[c_idx1].size(0)
        for c_idx2 in range(nb_classes):
            proba = 0
            for s_idx1, s_idx2 in itertools.product(states_in_classes[c_idx1],states_in_classes[c_idx2]):
                proba += transition_function(state1=state_space[s_idx1], action=a, state2=state_space[s_idx2], buffers_size=buffers_size, service_rates=service_rates)
            ATD[c_idx1,a,c_idx2] = proba / states_in_classes[c_idx1].size(0)
    time2 = time.time()
    print(" "*170, end="\r")
    return ATD, ACM, time2-time1
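
When the ground tensors are already available, the same uniform-weight aggregation can be expressed with a one-hot membership matrix and `torch.einsum`. This is only a sketch: `aggregate_mdp` and the toy tensors are hypothetical, and it assumes that every class contains at least one state.

```python
import torch

def aggregate_mdp(ground_dynamics: torch.Tensor, ground_costs: torch.Tensor, classes: torch.Tensor):
    """Hypothetical helper: uniform-weight aggregation of an MDP over state classes.

    ground_dynamics: (S, A, S), ground_costs: (S, A), classes: (S,) int64 class indices.
    """
    nb_classes = int(classes.max().item()) + 1
    # membership[s, c] = 1 if state s belongs to class c, else 0
    membership = torch.nn.functional.one_hot(classes, nb_classes).to(ground_dynamics.dtype)
    # Uniform weights inside each class (each column sums to 1; classes must be non-empty)
    weights = membership / membership.sum(dim=0, keepdim=True)
    # Average over source states of a class, sum over target states of a class
    abstract_dyn = torch.einsum("sc,sat,td->cad", weights, ground_dynamics, membership)
    abstract_costs = torch.einsum("sc,sa->ca", weights, ground_costs)
    return abstract_dyn, abstract_costs

# Toy ground MDP: 3 states, 2 actions, random row-stochastic dynamics
torch.manual_seed(0)
TD_toy = torch.softmax(torch.rand(3, 2, 3, dtype=torch.float64), dim=2)
CM_toy = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=torch.float64)
C_toy = torch.tensor([0, 0, 1])  # States 0 and 1 share a class, state 2 is alone

ATD_toy, ACM_toy = aggregate_mdp(TD_toy, CM_toy, C_toy)
print(ATD_toy.sum(dim=2))  # Each abstract row is still a probability distribution
print(ACM_toy)
```

This variant trades the repeated calls to the transition function for tensor contractions, at the price of storing the full ground dynamics.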

### Getting the abstract dynamics (illustration)

For the current example, let us get the abstract transition dynamics, tensor `ATD`, and the abstract cost matrix, `ACM`.

In [108]:
print("Abstract dynamics for the above problem:\n")
nb_actions = bs.size(0)+1
ATD, ACM, AD_time = abstract_dynamics(
    nb_actions=nb_actions, nb_classes=NC, states_in_classes=SIC, state_space=SP, buffers_size=bs, service_rates=sr, cost_loss=1, cost_delay=1
)

print("Abstract transition dynamics:"); print(ATD); print()
print("Abstract cost matrix:"); print(ACM); print()
print(f"Elapsed time: {AD_time}")

Abstract dynamics for the above problem:

Abstract transition dynamics:                                                                                                                                             
tensor([[[4.5395e-03, 9.4589e-01, 4.7567e-03, 4.4819e-02],
         [4.5395e-03, 9.4589e-01, 4.7567e-03, 4.4819e-02],
         [4.5395e-03, 9.4589e-01, 4.7567e-03, 4.4819e-02],
         [4.5395e-03, 9.4589e-01, 4.7567e-03, 4.4819e-02]],

        [[0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00],
         [1.1349e-04, 9.7294e-01, 8.1614e-04, 2.6128e-02],
         [0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00]],

        [[0.0000e+00, 9.4588e-01, 9.2962e-03, 4.4819e-02],
         [0.0000e+00, 9.4588e-01, 9.2962e-03, 4.4819e-02],
         [1.5132e-03, 9.4588e-01, 7.7831e-03, 4.4819e-02],
         [0.0000e+00, 9.4588e-01, 9.2962e-03, 4.4819e-02]],

        [[0.0000e+00, 9.4589e-01, 0.0000e+00, 5.4115e-02],
         [0.00

## **<a>Solving through the Abstraction</a>**

### Resolution of the abstract problem

We solve the abstract problem defined by the dynamics `ATD` and `ACM`, and record the time `ART` the resolution takes.

In [109]:
AP, AV, _, ART = solutionMDP(ATD,ACM)

print("Solution of the abstract problem:\n")
for c_idx in range(NC):
    c, a = SIC[c_idx].numpy(), AP[c_idx].item()
    print(f"c_{c_idx}  -->  a = {a}\n{c}\n")
print("\n")
print(f"Elapsed time in seconds: {ART:>6.3f}")

Solution of the abstract problem:

c_0  -->  a = 0
[59]

c_1  -->  a = 1
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]

c_2  -->  a = 2
[44 49 54]

c_3  -->  a = 3
[40 41 42 43 45 46 47 48 50 51 52 53 55 56 57 58]



Elapsed time in seconds:  0.051


### Evaluation of the abstraction

Now, we evaluate the abstraction from three perspectives: the duration of the resolution, the policy and the value function. Run the following four cells.

In the next cell, `AP` and `EP` are respectively the optimal policy of the abstract MDP and its constant extrapolation.
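
Constant extrapolation simply assigns to each ground state the action chosen for its class, which is a single indexing operation. A toy sketch with made-up values (the names below are illustrative only):

```python
import torch

abstract_policy = torch.tensor([2, 0, 1])   # Hypothetical action chosen for classes 0, 1, 2
classes = torch.tensor([1, 1, 0, 2, 0])     # Hypothetical class index of each of 5 ground states

# Every state inherits the action of its class
extrapolated_policy = abstract_policy[classes]
print(extrapolated_policy.tolist())  # [0, 0, 2, 1, 2]
```

Indexing with a Python list of class indices works the same way.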

In [110]:
print("EVALUATION OF THE ABSTRACTION\n\n")
time1 = time.time()
EP = AP[C]
time2 = time.time()
Ext_time = time2 - time1

EVALUATION OF THE ABSTRACTION




In [111]:
# Elapsed time
print("1. Duration of the resolution\n")

print(f"Cost matrix:         {CM_time:>6.3f} sec\n")

print("Direct resolution (cost matrix is used for both direct and indirect resolutions)")
print(f"Transition dynamics: {TD_time:>6.3f} sec")
print(f"MDP resolution:      {GRT:>6.3f} sec")
direct_time = GRT+TD_time
print("-----------------------------------")
print(f"Total Direct:        {direct_time:>6.3f} sec\n")

print("Resolution through abstraction")
print(f"Abstraction:         {AD_time:>6.3f} sec")
print(f"Resolution:          {ART:>6.3f} sec")
print(f"Extrapolation:       {Ext_time:>6.3f} sec")
undirect_time = AD_time+ART+Ext_time
print("-----------------------------------")
print(f"Total Indirect:      {undirect_time:>6.3f} sec\n")

saved_time = direct_time - undirect_time
print(f"Time saved:          {saved_time:>6.3f} sec,  i.e.,  {saved_time/direct_time:.2%} of the direct time\n\n")

1. Duration of the resolution

Cost matrix:          0.009 sec

Direct resolution (cost matrix is used for both direct and indirect resolutions)
Transition dynamics:  1.311 sec
MDP resolution:       0.628 sec
-----------------------------------
Total Direct:         1.939 sec

Resolution through abstraction
Abstraction:          1.189 sec
Resolution:           0.051 sec
Extrapolation:        0.000 sec
-----------------------------------
Total Indirect:       1.240 sec

Time saved:           0.699 sec,  i.e.,  36.05% of the direct time




In [112]:
# Difference between the policies
print("2. Difference between the policies\n")
print("Optimal policy"); print(GP); print()
EP = AP[C]
print("Extrapolation of the abstract solution"); print(EP); print()

DeltaP = GP - EP
s_idxs_of_coincidence = torch.where(DeltaP==0)[0]; s_idxs_of_difference = torch.where(DeltaP!=0)[0]
coincidence_ratio = s_idxs_of_coincidence.size(0)/DeltaP.size(0)
states_of_difference = [SP[s_idx].tolist() for s_idx in s_idxs_of_difference]
print(f"Both policies coincide at {coincidence_ratio:.2%}")

print(f"They differ at states of indices {s_idxs_of_difference.numpy()}, which are")
print(states_of_difference); print("\n")

2. Difference between the policies

Optimal policy
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 0, 3, 3, 3,
        3, 0, 3, 3, 3, 3, 0, 3, 3, 3, 3, 0])

Extrapolation of the abstract solution
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 2, 3, 3, 3,
        3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 0])

Both policies coincide at 95.00%
They differ at states of indices [44 49 54], which are
[[2, 0, 4], [2, 1, 4], [2, 2, 4]]




In [113]:
# Difference between the value functions
print("3. Difference between the value functions\n")
EV = valueP(transition_dynamics=TD, cost_matrix=CM, policy=EP)
print("Optimal value function"); print(GV); print()
print("Extrapolated value function"); print(EV); print()

max_difference = torch.max(torch.abs(EV - GV))
print(f"Maximum difference = {max_difference}\n")

relative_differences = torch.zeros([EV.size(0)],dtype=float)
for s_idx in range(EV.size(0)):
    if GV[s_idx]==0: relative_differences[s_idx] = (0 if EV[s_idx]==0 else math.inf)
    else: relative_differences[s_idx] = (EV[s_idx] - GV[s_idx]) / GV[s_idx]
relative_differences = torch.abs(relative_differences)
print("Relative differences in absolute values"); print(relative_differences); print()

avrg_relative_difference, max_relative_difference = relative_differences.mean().item(), relative_differences.max().item()
print(f"About the relative difference:  mean = {avrg_relative_difference:.2%};  max = {max_relative_difference:.2%}\n")

critical_ceil = 0.1
print(f"Considering a critical threshold of {critical_ceil:.2%} on the relative difference:")
s_idxs_of_critical_difference = torch.where(relative_differences>critical_ceil)[0]
safe_s_idxs = torch.where(relative_differences<=critical_ceil)[0]
print(f"Number of safe states:  {safe_s_idxs.size(0)}  ({safe_s_idxs.size(0)/len(SP):.2%})")
print(f"Unsafe states indices: {s_idxs_of_critical_difference.numpy()}.   The corresponding states are")
states_of_critical_difference = [SP[s_idx].tolist() for s_idx in s_idxs_of_critical_difference]
print(states_of_critical_difference)

3. Difference between the value functions

Optimal value function                                                                                                            
tensor([ 3.0925,  4.6232,  6.1539,  7.6846,  9.2155,  5.7960,  7.3267,  8.8574,
        10.3881, 11.9190,  8.4995, 10.0302, 11.5609, 13.0916, 14.6225, 11.2030,
        12.7337, 14.2644, 15.7952, 17.3260,  3.4585,  4.9892,  6.5199,  8.0507,
         9.5856,  6.1620,  7.6927,  9.2234, 10.7542, 12.2891,  8.8655, 10.3962,
        11.9269, 13.4577, 14.9926, 11.5690, 13.0997, 14.6304, 16.1612, 17.6961,
         4.9892,  6.5199,  8.0507,  9.5856, 11.5856,  7.6927,  9.2234, 10.7542,
        12.2891, 14.2891, 10.3962, 11.9269, 13.4577, 14.9926, 16.9926, 13.0997,
        14.6304, 16.1612, 17.6961, 19.6961], dtype=torch.float64)

Extrapolated value function
tensor([ 3.0925,  4.6232,  6.1539,  7.6846,  9.2158,  5.7960,  7.3267,  8.8574,
        10.3881, 11.9193,  8.4995, 10.0302, 11.5609, 13.0917, 14.6228, 11.2030,
        12.