In [2]:
import numpy as np
import matplotlib.pyplot as plt

## Windy Gridworld

**Problem**: re-solve the windy gridworld assuming eight possible actions, including the diagonal moves, rather than four. How much better can you do with the extra actions? Can you do even better by including a ninth action that causes no movement at all other than that caused by the wind?
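The three action sets named in the problem can be written down as row/column offsets. This is a minimal sketch; the names `FOUR_ACTIONS`, `KING_ACTIONS`, and `KING_ACTIONS_WITH_STAY` are hypothetical, and the convention that row 0 is the top of the grid (so "up" is a row offset of -1) is an assumption.

```python
# Actions as (d_row, d_col) offsets. Assumption: row 0 is the top row,
# so moving "up" decreases the row index.
FOUR_ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

# Eight actions: every combination of -1, 0, +1 per axis except standing still.
KING_ACTIONS = [
    (d_row, d_col)
    for d_row in (-1, 0, 1)
    for d_col in (-1, 0, 1)
    if (d_row, d_col) != (0, 0)
]

# Ninth action: no movement other than that caused by the wind.
KING_ACTIONS_WITH_STAY = KING_ACTIONS + [(0, 0)]

print(len(FOUR_ACTIONS), len(KING_ACTIONS), len(KING_ACTIONS_WITH_STAY))  # 4 8 9
```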

The first step is to implement the base case (four actions) and check that the result matches the one shown in the textbook.

### Example 6.5 Windy Gridworld

Reach the goal from the starting point. This is an **undiscounted** task with a constant reward of -1 per time step until the goal is reached.

Action space: [`up`, `down`, `left`, `right`]

State space: gridworld

In the middle region of the grid, a wind blows upward and shifts the next state upward. The strength of the wind varies from column to column.
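The wind dynamics can be sketched as a single transition function. The grid size, wind strengths, start, and goal below are assumed from Example 6.5 in the textbook and are not stated in this notebook. The function name `windy_step` is hypothetical, and the choice to use the wind of the column the agent leaves is also an assumption.

```python
# Assumed layout (textbook Example 6.5): 7 rows x 10 columns, row 0 at the top.
N_ROWS, N_COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # upward shift per column
START, GOAL = (3, 0), (3, 7)


def windy_step(state, action):
    """Apply an action (d_row, d_col) plus the wind; return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = action
    # The wind of the current column pushes the agent upward (lower row index).
    new_row = row + d_row - WIND[col]
    new_col = col + d_col
    # Clip to the grid so the agent cannot leave the world.
    new_row = min(max(new_row, 0), N_ROWS - 1)
    new_col = min(max(new_col, 0), N_COLS - 1)
    next_state = (new_row, new_col)
    # Constant reward of -1 on every step; the episode ends at the goal.
    return next_state, -1, next_state == GOAL


print(windy_step((3, 6), (0, 1)))   # ((1, 7), -1, False): pushed up two rows
print(windy_step((4, 8), (0, -1)))  # ((3, 7), -1, True): wind carries the agent onto the goal
```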

Implement $\epsilon$-greedy Sarsa with $\epsilon = 0.1$, $\alpha = 0.5$, and all action values initialized to 0.
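As a rough sketch of the two pieces this requires, the block below shows $\epsilon$-greedy action selection and one Sarsa update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma Q(s',a') - Q(s,a)]$ with $\gamma = 1$ for the undiscounted task. The helper names `epsilon_greedy` and `sarsa_update`, the random tie-breaking, and the 7 x 10 x 4 shape of `Q` are assumptions made for illustration.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    # Break ties among maximal actions at random (an assumption, not required by Sarsa).
    best = np.flatnonzero(q_values == q_values.max())
    return int(rng.choice(best))


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.5, gamma=1.0):
    """One on-policy update toward r + gamma * Q(s', a'); terminal states are worth 0."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Assumed table shape: 7 rows x 10 columns x 4 actions, all values initialized to 0.
Q = np.zeros((7, 10, 4))
rng = np.random.default_rng(0)

s, s_next = (3, 0), (3, 1)
a = epsilon_greedy(Q[s], 0.1, rng)
a_next = epsilon_greedy(Q[s_next], 0.1, rng)
sarsa_update(Q, s, a, -1, s_next, a_next, done=False)
print(Q[s][a])  # -0.5
```

With every value starting at 0, the first update moves $Q(s,a)$ halfway toward the target of -1, which is why the printed value is -0.5.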

In [None]:
class RaceTrackEnv:
    def __init__(self, width, height, lanewidth):
        self.rows = height
        self.columns = width
        self.lanewidth = lanewidth
        self.racetrack = create_racetrack(height, width, lanewidth)
        self.reset()

    def _set_initial_state(self):
        ...

    def check_next_condition(self):
        ...

    def _update_last_state(self, row_acc, col_acc):
        new_row_vel = self.last_state[2] + row_acc
        new_col_vel = self.last_state[3] + col_acc
        new_row = self.last_state[0] + new_row_vel
        new_col = self.last_state[1] + new_col_vel
        self.last_state = [new_row, new_col, new_row_vel, new_col_vel]

    def reset(self):
        self.last_state = self._set_initial_state()
        return self.last_state

    def step(self, action) -> tuple[list, int, bool]:
        """
        Apply `action` (row_acc, col_acc) and return (state, reward, done).
        """
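        # Velocity after applying the acceleration chosen by the action.
        new_row_vel = self.last_state[2] + action[0]
        new_col_vel = self.last_state[3] + action[1]
        # Assumption: the module-level check_next_condition expects the
        # updated velocity, so the outcome reflects the chosen action.
        next_condition = check_next_condition(
            self.last_state[0],
            self.last_state[1],
            new_row_vel,
            new_col_vel,
            self.racetrack
        )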
        if next_condition == PossibleOutcomes.crash:
            self.reset()
            return self.last_state, -1, False
        if next_condition == PossibleOutcomes.finish:
            return self.last_state, 0, True
        self._update_last_state(*action)
        return self.last_state, -1, False