**The recycling robot**

In this exercise, we find the optimal policy, using the Bellman equations, for a mobile robot whose job is to collect empty cans in an office environment. The environment state is the robot's battery level, which can take one of two values: low or high. At each step, the robot has to decide whether to

1. Actively search for cans,
2. Remain stationary and wait for someone to bring it a can, or
3. Move to its home base to recharge its battery.

In order to find cans, the robot has to search for them. Searching, however, drains the robot's battery, whereas waiting does not.

The reward is zero if the battery level is high and no can has been found (likely most of the time), positive when the robot secures an empty can and the battery remains high, and large and negative if the robot runs out of battery (in this case, the robot shuts down and waits to be rescued, which is costly).

More formally, the state set is $\mathcal{S}=\left\{ {\rm high},\;{\rm low}\right\}$, the action set is $\mathcal{A}=\left\{ {\rm search},\;{\rm wait},\;{\rm recharge}\right\}$, and the state transition probabilities are parameterized by $\alpha$ and $\beta$, as described in the following table:

![State transition probabilities of the recycling robot](prob_robot.png)

Assuming that $\alpha=0.3$, $\beta=0.6$, $\gamma=0.9$ and $r_{search}=1$, $r_{wait}=0.1$, and $r_{recharge}=0$, find the numerical solution using the Bellman equations.

Let us start with the imports. We use numpy and scipy.

In [1]:
import numpy as np
from scipy.optimize import fsolve

Now, let us define the parameters of the problem, as stated in the exercise.

In [2]:
alpha = 0.3
beta = 0.6
gamma = 0.9
r_search = 1
r_wait = 0.1
r_recharge = 0

Now, obtain the optimal value function using the fsolve function from scipy.optimize. It finds a root of a vector-valued function, which is exactly what we need once each Bellman optimality equation is rearranged so that one side is zero.
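As a reminder, the root-finding formulation can be sketched as follows. The notation $p(s',r\mid s,a)$ for the dynamics is an assumption borrowed from the standard textbook form, since the exercise gives the dynamics only as a table. The Bellman optimality equation for each state is

$$
v_*(s) = \max_{a\in\mathcal{A}(s)} \sum_{s',r} p(s',r\mid s,a)\left[r+\gamma\, v_*(s')\right], \qquad s\in\mathcal{S},
$$

so a function suitable for a root finder would return, for each state, the residual

$$
F(v)_s = v(s) - \max_{a\in\mathcal{A}(s)} \sum_{s',r} p(s',r\mid s,a)\left[r+\gamma\, v(s')\right],
$$

and any $v$ with $F(v)=0$ satisfies the Bellman optimality equations. With two states, $F$ maps a vector of length 2 to a vector of length 2.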

In [3]:
def bellman_opt_equations(v):
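    # Code to be filled by the student: return the residuals of the
    # Bellman optimality equations, one per state (high, low), evaluated at v.
    raise NotImplementedError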

v = fsolve(bellman_opt_equations, np.zeros((2, 1)))
with np.printoptions(precision=2, suppress=True):
    print(f"Optimal value found v* = {v.flatten()}")

Optimal value found v* = [6.13 5.52]
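The printed values can be cross-checked with value iteration, which is a different technique from the root finding used above. The following is only a minimal sketch under assumptions: the transition table is an image, so the dynamics below are assumed to follow the classic textbook version of this problem (searching from high stays high with probability alpha; searching from low stays low with probability beta and otherwise depletes the battery, after which the robot is rescued and ends up in high). The rescue penalty `r_rescue = -3` and the helper names are hypothetical and not taken from the exercise text.

```python
# Parameters as stated in the exercise.
alpha, beta, gamma = 0.3, 0.6, 0.9
r_search, r_wait, r_recharge = 1.0, 0.1, 0.0
# ASSUMPTION: penalty for depleting the battery; the text only says "large and negative".
r_rescue = -3.0


def action_values(v_high, v_low):
    """Return the action values for both states under the ASSUMED dynamics."""
    q_high = {
        # Searching from high: stay high w.p. alpha, drop to low otherwise.
        "search": r_search + gamma * (alpha * v_high + (1 - alpha) * v_low),
        # Waiting does not consume battery.
        "wait": r_wait + gamma * v_high,
    }
    q_low = {
        # Searching from low: stay low w.p. beta, otherwise get rescued into high.
        "search": beta * (r_search + gamma * v_low)
        + (1 - beta) * (r_rescue + gamma * v_high),
        "wait": r_wait + gamma * v_low,
        # Recharging always leads to high.
        "recharge": r_recharge + gamma * v_high,
    }
    return q_high, q_low


def value_iteration(tol=1e-10, max_iter=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    v_high, v_low = 0.0, 0.0
    for _ in range(max_iter):
        q_high, q_low = action_values(v_high, v_low)
        new_high, new_low = max(q_high.values()), max(q_low.values())
        delta = max(abs(new_high - v_high), abs(new_low - v_low))
        v_high, v_low = new_high, new_low
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    q_high, q_low = action_values(v_high, v_low)
    policy = {
        "high": max(q_high, key=q_high.get),
        "low": max(q_low, key=q_low.get),
    }
    return (v_high, v_low), policy


(v_high, v_low), policy = value_iteration()
print(f"v* = [{v_high:.2f} {v_low:.2f}], greedy policy = {policy}")
```

Under these assumptions the iteration converges to roughly 6.13 for high and 5.52 for low, in line with the fsolve output, and the greedy policy is to search when the battery is high and recharge when it is low. Since searching is not the greedy action in the low state, the assumed value of `r_rescue` does not affect the result here.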
