# Sarsa
$$Q(x_t, u_t) \leftarrow Q(x_t, u_t) + \alpha [R_{t+1} + \gamma Q(x_{t+1}, u_{t+1}) - Q(x_t, u_t)]$$
![](https://miro.medium.com/max/1300/1*mHNdrdmeMe_EUVALDTr3aw.png)
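
The update rule above can be sketched for a tabular $Q$ as follows. This is a minimal illustration under stated assumptions, not the implementation used by `network_control_rl.rl.Sarsa`: the function name `sarsa_update`, the array layout (rows are states, columns are actions) and the default values of `alpha` and `gamma` are all hypothetical.

```python
import numpy as np


def sarsa_update(Q, x, u, reward, x_next, u_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Sarsa update to Q in place and return the TD error.

    Q is assumed to be a 2-D array indexed as Q[state, action].
    """
    # Target uses the action u_next that was actually selected in x_next.
    td_target = reward + gamma * Q[x_next, u_next]
    td_error = td_target - Q[x, u]
    Q[x, u] += alpha * td_error
    return td_error


# Toy example: 3 states, 2 actions, one known non-zero value.
Q = np.zeros((3, 2))
Q[1, 0] = 1.0
td_error = sarsa_update(Q, x=0, u=1, reward=1.0, x_next=1, u_next=0,
                        alpha=0.5, gamma=0.9)
```

With these toy numbers the target is $1 + 0.9 \cdot 1 = 1.9$, so `Q[0, 1]` moves halfway from 0 towards 1.9.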

## Q-learning vs Sarsa
At first sight, Sarsa and Q-learning look almost identical. However, there is a subtle but fundamental difference: Sarsa is an on-policy algorithm, while Q-learning is off-policy. A very nice _high-level_ explanation is given by _Don Reba_ on [stackoverflow](https://stackoverflow.com/a/6852935):  

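**Q-learning:**  
$Q(x_{t+1}, a_{t+1}) = \max_a Q(x_{t+1}, a)$  
**Sarsa:**  
$Q(x_{t+1}, a_{t+1}) = \epsilon \operatorname{mean}_a Q(x_{t+1}, a) + (1 - \epsilon) \max_a Q(x_{t+1}, a)$  

Here $\epsilon$ is the exploration rate of an $\epsilon$-greedy policy, and the Sarsa expression holds in expectation over the sampled action $a_{t+1}$.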
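
To make the comparison concrete, the two next-state values can be computed for a toy table. This is only a sketch: the function names and the table `Q` are hypothetical, and an $\epsilon$-greedy policy that explores uniformly over all actions is assumed.

```python
import numpy as np


def q_learning_value(Q, x_next):
    """Greedy next-state value used by Q-learning (off-policy)."""
    return np.max(Q[x_next])


def expected_sarsa_value(Q, x_next, epsilon):
    """Expected next-state value seen by Sarsa under an epsilon-greedy policy."""
    return epsilon * np.mean(Q[x_next]) + (1 - epsilon) * np.max(Q[x_next])


def epsilon_greedy(Q, x, epsilon, rng):
    """Sample the action Sarsa actually bootstraps from (assumed policy)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore: uniform random action
    return int(np.argmax(Q[x]))               # exploit: greedy action


Q = np.array([[0.0, 0.0],
              [1.0, 3.0]])
rng = np.random.default_rng(0)

q_value = q_learning_value(Q, x_next=1)
sarsa_value = expected_sarsa_value(Q, x_next=1, epsilon=0.2)
greedy_action = epsilon_greedy(Q, x=1, epsilon=0.0, rng=rng)
```

With `epsilon=0.2` the Sarsa value is $0.2 \cdot 2 + 0.8 \cdot 3 = 2.8$, slightly below the greedy value of 3 that Q-learning uses. With `epsilon=0` the two coincide.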

In [1]:
import os
import sys
import numpy as np

module_path = os.path.abspath(os.path.join(".."))
if module_path not in sys.path:
    sys.path.append(module_path)

from network_control_rl.rl import Sarsa
from network_control_rl.algebra import BaseNumber
from network_control_rl.network import Network, calculate_next_state_base_number

In [2]:
network = Network()
network.from_edges([(0, 1), (1, 2), (2, 3)])

input_matrix = {0: 0}
q = 4
n = network.nodes

initial_state = BaseNumber(n, q)
initial_state.from_array(np.array([1, 2, 3, 1]))
end_state = BaseNumber(n, q)
end_state.from_array(np.array([1, 3, 2, 1]))

model = Sarsa(
    initial_state,
    end_state,
    network,
    input_matrix,
    num_episodes=50,
    max_iteration=10
)
model.train(seed=6)
model.get_signals(vector=True)


array([[2],
       [3],
       [1]], dtype=int8)

## n-step Sarsa
$$G_{t:t+n\_step} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n\_step - 1}R_{t+n\_step} + \gamma^{n\_step}Q_{t + n\_step - 1}(x_{t + n\_step}, u_{t + n\_step})$$
$$Q_{t + n\_step}(x_t, u_t) = Q_{t + n\_step - 1}(x_t, u_t) + \alpha [G_{t:t+n\_step} - Q_{t + n\_step - 1}(x_t, u_t)]$$
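
The n-step return and update above can be sketched as follows. The helper names `n_step_return` and `n_step_update` are hypothetical and are not taken from `network_control_rl`. `rewards` is assumed to hold $R_{t+1}, \dots, R_{t+n\_step}$ in order, and `bootstrap_value` stands for $Q_{t+n\_step-1}(x_{t+n\_step}, u_{t+n\_step})$.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the n rewards plus the discounted bootstrap value."""
    g = bootstrap_value
    # Fold from the last reward backwards: g = R + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def n_step_update(q_old, g, alpha):
    """Move the old estimate Q(x_t, u_t) towards the n-step return g."""
    return q_old + alpha * (g - q_old)


# Toy 2-step example: 1 + 0.5 * 2 + 0.5**2 * 4 = 3
g = n_step_return(rewards=[1.0, 2.0], bootstrap_value=4.0, gamma=0.5)
q_new = n_step_update(q_old=1.0, g=g, alpha=0.5)

# With a single reward, the return reduces to the one-step Sarsa target.
g_one_step = n_step_return(rewards=[1.0], bootstrap_value=1.0, gamma=0.9)
```

Folding backwards avoids computing explicit powers of $\gamma$ and works for any number of steps.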

In [3]:
model_2_step = Sarsa(
    initial_state,
    end_state,
    network,
    input_matrix,
    num_episodes=50,
    max_iteration=10,
    n_steps=2
)
model_2_step.train(seed=6)
model_2_step.get_signals(vector=True)


array([[2],
       [3],
       [1]], dtype=int8)

## References
[1] Stack Overflow, "_What is the difference between Q-learning and SARSA?_", https://stackoverflow.com/questions/6848828/what-is-the-difference-between-q-learning-and-sarsa