---
## Step Response Tracking
Next, we repeat the same experiment, but change the setpoint from 23 °C to 40 °C halfway through the episode.

In [None]:
setpoints = np.full(EPISODE_LENGTH, 23.0)  # °C
setpoints[EPISODE_LENGTH // 2:] = 40.0     # °C, step halfway through the episode

plant_control = PlantControl(IS_HARDWARE, SAMPLE_RATE)
stepped_setpoint_results = plant_control.episode(setpoints, (50.0, 0.001, 0.1))

stepped_setpoint_results.sample(10).sort_values(COL_TIME)


In [None]:
plot_episode(stepped_setpoint_results)


---
## Add Stability-Preserving Supervisor
The next step is to add a supervisor to the system. It works as proposed in [Stability-preserving automatic tuning of PID control with reinforcement learning](https://comengsys.com/article/view/4601) and replaces the PID parameters with known-good fallback ones if the system appears to be unstable.

From reading the paper, it seems that the proposed algorithm uses the accumulated error as the inverse reward. The proposed supervisor, however, compares the _running_ squared error $RR(t)$ within an episode with the _total_ squared error $R_{bmk}$ of a benchmark episode. We expected both to be either total or running, but the algorithm uses one of each. The code below follows the same pattern: it compares a running error with a total-for-an-episode benchmark.
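
A minimal NumPy sketch of this comparison as we read it: $R_{bmk}$ collapses a whole benchmark episode into a single number, while $RR(t)$ is a cumulative sum evaluated at every step. The function name `first_switch_index`, the `lambda_factor` parameter and the error traces are hypothetical, chosen only for illustration.

```python
import numpy as np


def first_switch_index(benchmark_errors, episode_errors, lambda_factor=1.0):
    """Return the first step t at which the running squared error RR(t)
    exceeds lambda * R_bmk, or None if it never does."""
    # R_bmk: total squared error of a whole benchmark episode (one number)
    r_bmk = float(np.sum(np.square(benchmark_errors)))
    # RR(t): running squared error of the current episode (one value per step)
    rr = np.cumsum(np.square(episode_errors))
    above = np.nonzero(rr > lambda_factor * r_bmk)[0]
    return int(above[0]) if above.size else None


benchmark = np.array([2.0, 1.0, 0.5, 0.0])  # total squared error 5.25
episode = np.array([1.0, 1.5, 1.5, 1.0])    # running: 1.0, 3.25, 5.5, 6.5
print(first_switch_index(benchmark, episode))  # → 2
```

Because $RR(t)$ only grows, the comparison can fire at most once per episode; raising `lambda_factor` to 2.0 in this example means it never fires.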

In [Stability-preserving automatic tuning of PID control with reinforcement learning](https://comengsys.com/article/view/4601), the authors base this decision on the cumulative error for an episode, which is only meaningful for episodes where the setpoint $r(t)$ does not change. A setpoint change produces an error that no controller can avoid, because the output can only react after the step, so the supervisor would punish the tuning for something it has no control over. The article sidesteps this by assuming a constant setpoint.
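
To illustrate the point, here is a sketch with an idealized closed loop (an assumption, not the TCLab): at each step the output closes a fixed fraction `alpha` of the gap to the setpoint. With a constant setpoint that the output already sits at, the cumulative squared error is zero; a single step from 23 °C to 40 °C adds a large error even though the loop is identical in both runs. The function `cumulative_squared_error` and `alpha = 0.2` are made up for this example.

```python
def cumulative_squared_error(setpoints, y0, alpha=0.2):
    """Idealized loop (hypothetical): y moves a fraction alpha toward r each step."""
    y = y0
    total = 0.0
    for r in setpoints:
        e = r - y
        total += e * e
        y += alpha * e  # first-order lag toward the current setpoint
    return total


constant = [23.0] * 20
stepped = [23.0] * 10 + [40.0] * 10
print(cumulative_squared_error(constant, y0=23.0))  # → 0.0
print(cumulative_squared_error(stepped, y0=23.0))   # about 793.5, all caused by the step
```

The extra error comes entirely from the setpoint change, so a benchmark recorded with a constant setpoint is not comparable with an episode that contains a step.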

<center><img alt="state diagram" src="state-diagram.png" /></center>

An alternative would be to run the baseline controller alongside the operational controller and have the supervisor switch between the two. The problem is that the supervisor cannot determine $y(t)$ for the baseline (stable) controller, because its $u(t)$ is never applied to the plant.
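
Assuming the state diagram shows the two states used in the code below (normal and fallback), the switching logic can be sketched as a small online supervisor. This is our own illustration, not code from the paper: the `State` enum and the `Supervisor` class are hypothetical names, and the running error is accumulated incrementally instead of being recomputed over all recorded rows at every step.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    FALLBACK = "fallback"


class Supervisor:
    """Hypothetical two-state supervisor: NORMAL -> FALLBACK once RR(t) > lambda * R_bmk.
    FALLBACK is absorbing for the rest of the episode."""

    def __init__(self, benchmark_error, lambda_factor=1.0):
        self.threshold = lambda_factor * benchmark_error
        self.running_error = 0.0  # RR(t), updated in O(1) per step
        self.state = State.NORMAL

    def update(self, error):
        """Accumulate one step's squared error; return True only on the switching step."""
        self.running_error += error * error
        if self.state is State.NORMAL and self.running_error > self.threshold:
            self.state = State.FALLBACK
            return True
        return False


sup = Supervisor(benchmark_error=8.0)
switches = [sup.update(e) for e in [2.0, 2.0, 0.5, 3.0]]
print(switches, sup.state)  # → [False, False, True, False] State.FALLBACK
```

The comparison is strict: after the second step the running error equals the threshold exactly (8.0) and the supervisor stays in `NORMAL`.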

In [None]:
class SupervisedPlantControl:
    """Runs PID episodes on the TCLab and falls back to known-good PID tunings
    once the running squared error exceeds lambda * benchmark_error."""

    def __init__(self, is_hardware, sample_rate, benchmark_error, lambda_error, known_good_pid_tunings):
        TCLab = tclab.setup(connected=is_hardware)
        self.plant = TCLab()

        self.pid = PID()
        self.pid.sample_time = 1.0 / sample_rate
        self.sample_rate = sample_rate

        # switching threshold lambda * R_bmk from the paper
        self.lambda_benchmark_error = lambda_error * benchmark_error
        self.known_good_pid_tunings = known_good_pid_tunings

    def _step(self, t, r_t, y_t_prev, episode_state):
        # compute the control action from the previous measurement
        self.pid.setpoint = r_t
        u_t_uncapped = self.pid(y_t_prev)

        # the heater only accepts 0..100 %
        u_t = min(max(u_t_uncapped, 0.0), 100.0)

        # apply the action and read both temperature sensors
        self.plant.U1 = u_t
        y_t = self.plant.T1
        y2_t = self.plant.T2

        # one row in EPISODE_COLUMNS order, plus the new measurement
        return [t, r_t,
                self.pid.tunings[0], self.pid.tunings[1], self.pid.tunings[2],
                self.pid._proportional, self.pid._integral, self.pid._derivative,
                self.pid._last_error, u_t_uncapped, u_t, y_t,
                0.0, y2_t,
                episode_state], y_t

    def episode(self, setpoints, pid_tunings):
        results = pd.DataFrame(columns=EPISODE_COLUMNS)

        episode_state = STATE_NORMAL
        self.pid.tunings = pid_tunings
        y_t_prev = self.plant.T1
        for t in range(len(setpoints)):
            time.sleep(1.0 / self.sample_rate)

            step_data, y_t_prev = self._step(t / self.sample_rate, setpoints[t], y_t_prev, episode_state)
            results.loc[len(results)] = step_data

            # in fallback state we just sit the episode out
            if episode_state == STATE_FALLBACK:
                continue

            # running squared error RR(t) so far in this episode
            running_error = (results[COL_ERROR]**2).sum()
            if running_error > self.lambda_benchmark_error:
                print(f"at {t}: running error {running_error:.1f} above lambda*benchmark error "
                      f"{self.lambda_benchmark_error:.1f}, switched to fallback parameters")
                episode_state = STATE_FALLBACK
                # set the cell directly; chained indexing (.loc[i][col] = ...) writes to a copy
                results.loc[len(results) - 1, COL_STATE] = STATE_FALLBACK
                self.pid.tunings = self.known_good_pid_tunings
                self.pid.reset()

        return results


In [None]:
setpoints = np.full(EPISODE_LENGTH, 23.0)  # °C, constant setpoint

# benchmark error R_bmk = 1200.0, lambda = 1.0, fallback tunings (Kp, Ki, Kd) = (20.0, 0.1, 0.01)
plant_control = SupervisedPlantControl(IS_HARDWARE, SAMPLE_RATE, 1200.0, 1.0, (20.0, 0.1, 0.01))
supervised_smooth_results = plant_control.episode(setpoints, (50.0, 0.001, 0.1))

supervised_smooth_results.sample(10).sort_values(COL_TIME)
