## Environment 1: Racetrack

Consider driving a race car around a turn. You want to go as fast as possible, but not so fast as to run off the track.

The racetrack is a grid. It begins with a vertical section 10 cells wide and 30 cells high, followed by a right turn into a horizontal section 15 cells wide and 10 cells tall. The starting line is the row of cells at the bottom of the vertical section. The finish line is the column of cells at the right end of the horizontal section.
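To make the geometry concrete, here is a minimal sketch that enumerates the track cells for the dimensions above. It assumes the turn itself occupies a 10 × 10 corner block joining the two sections (the same layout the `RacetrackEnv` class in this notebook uses); `build_track` and its return values are illustrative names, not part of the environment.

```python
WIDTH, STRAIGHT, TURN = 10, 30, 15  # dimensions taken from the description


def build_track(width=WIDTH, straight=STRAIGHT, turn=TURN):
    """Return (cells, start_line, finish_line) as (x, y) grid coordinates."""
    x_max, y_max = width + turn, width + straight
    # Every cell of the bounding box except the infield, which lies to the
    # right of the vertical section and below the horizontal section.
    cells = {
        (x, y)
        for x in range(x_max)
        for y in range(y_max)
        if not (x >= width and y < straight)
    }
    # Starting line: bottom row of the vertical section.
    start_line = [(x, 0) for x in range(width)]
    # Finish line: rightmost column of the horizontal section.
    finish_line = [(x_max - 1, y) for y in range(straight, y_max)]
    return cells, start_line, finish_line


cells, start_line, finish_line = build_track()
```

Under these assumptions the track has 550 cells: a 25 × 40 bounding box minus the 15 × 30 infield.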

The car begins at any cell on the starting line, and at each time step it occupies one of the grid positions. The velocity is also discrete: a number of grid cells moved horizontally and vertically per time step. The actions are increments to the velocity components. Each component may be changed by +1, −1, or 0 in each step, for a total of nine (3 × 3) actions. The vertical velocity component is restricted to be nonnegative. Both velocity components are restricted to be less than 5, and they cannot both be zero except at the starting line.
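The velocity constraints can be read as a filter over the nine actions. The sketch below is one possible interpretation, not the environment's implementation: it assumes the horizontal component may be negative but is bounded in magnitude, and `valid_actions`, `ACTIONS`, and `MAX_SPEED` are hypothetical names introduced for illustration.

```python
from itertools import product

MAX_SPEED = 4  # components must stay below 5

# All nine (dvx, dvy) increments: each component changes by -1, 0, or +1.
ACTIONS = list(product((-1, 0, 1), repeat=2))


def valid_actions(velocity, at_start=False):
    """Return the increments that keep (vx, vy) within the stated limits."""
    vx, vy = velocity
    allowed = []
    for dvx, dvy in ACTIONS:
        nvx, nvy = vx + dvx, vy + dvy
        # Assumption: horizontal speed is bounded in magnitude;
        # vertical speed must be nonnegative and below 5.
        if abs(nvx) > MAX_SPEED or not 0 <= nvy <= MAX_SPEED:
            continue
        # Both components may be zero only on the starting line.
        if (nvx, nvy) == (0, 0) and not at_start:
            continue
        allowed.append((dvx, dvy))
    return allowed
```

For example, `valid_actions((4, 4))` leaves only the four increments that do not raise either component.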

Each episode begins in a randomly selected start state with both velocity components zero and ends when the car crosses the finish line. If the car attempts to go through the track boundary (anywhere but the finish line), it crashes and the episode ends.
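The two ways an episode can end can be sketched as a small classifier over a single move. This is a simplified illustration under stated assumptions: it only inspects the landing cell rather than every cell the car passes through, it treats reaching or overshooting the right edge within the horizontal section as crossing the finish line, and `on_track` and `classify_move` are hypothetical helper names.

```python
WIDTH, STRAIGHT, TURN = 10, 30, 15
X_MAX, Y_MAX = WIDTH + TURN, WIDTH + STRAIGHT


def on_track(x, y):
    """True if (x, y) is inside the grid and not in the infield."""
    inside_grid = 0 <= x < X_MAX and 0 <= y < Y_MAX
    in_infield = x >= WIDTH and y < STRAIGHT
    return inside_grid and not in_infield


def classify_move(position, velocity):
    """Return 'finish', 'crash', or 'on_track' for one time step."""
    x = position[0] + velocity[0]
    y = position[1] + velocity[1]
    # The finish check comes first: passing the right edge of the
    # horizontal section ends the episode successfully, not as a crash.
    if x >= X_MAX - 1 and STRAIGHT <= y < Y_MAX:
        return "finish"
    if not on_track(x, y):
        return "crash"
    return "on_track"
```

A stricter version would trace the cells between the old and new positions so that the car cannot jump across a corner of the infield.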

In [None]:
import random


class RacetrackEnv:
    def __init__(self, config):
        self.track_width = config["width"]

        self.x_max = self.track_width + config["turn"]
        self.y_max = self.track_width + config["straight"]

        self.x_inner = range(self.track_width, self.x_max)
        self.y_inner = range(0, config["straight"])

        self.action_space = [(dx, dy) for dx in [-1, 0, 1] for dy in [-1, 0, 1]]

        self.reset()

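    def reset(self, seed=None):
        # optionally seed the RNG so the start cell is reproducible
        if seed is not None:
            random.seed(seed)
        self.velocity = (0, 0)
        # start on a random cell of the starting line (bottom row)
        self.position = (random.randint(0, self.track_width - 1), 0)
        return [self.position, self.velocity]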

    def __check_bounds(self, position) -> bool:
        return (
            0 <= position[0] < self.x_max
            and 0 <= position[1] < self.y_max
            and not (position[0] in self.x_inner and position[1] in self.y_inner)
        )

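    def step(self, action):
        if not 0 <= action < len(self.action_space):
            raise ValueError("Invalid action")
        # look up the (dx, dy) velocity increment for this action index
        acceleration = self.action_space[action]
        # velocity is (vx, vy); each component must stay below 5
        self.velocity = (
            # horizontal speed is bounded in magnitude
            max(-4, min(self.velocity[0] + acceleration[0], 4)),
            # vertical speed is kept at 1 or more, so it is nonnegative
            # and the two components are never both zero after a step
            max(1, min(self.velocity[1] + acceleration[1], 4)),
        )

        # update position
        new_position = (
            self.position[0] + self.velocity[0],
            self.position[1] + self.velocity[1],
        )

        # reaching or passing the right edge within the horizontal section
        # counts as crossing the finish line
        if (
            new_position[0] >= self.x_max - 1
            and self.y_inner.stop <= new_position[1] < self.y_max
        ):
            self.position = (self.x_max - 1, new_position[1])
            return {"state": [self.position, self.velocity], "r": 5, "done": True}

        if not self.__check_bounds(new_position):
            # crash: the episode ends and the car stays at its last valid cell
            return {"state": [self.position, self.velocity], "r": -1, "done": True}

        self.position = new_position
        return {"state": [self.position, self.velocity], "r": 1, "done": False}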

    def render(self, past_positions=None):
        # generated using copilot auto-complete
        track = [["."] * self.x_max for _ in range(self.y_max)]
        for x in self.x_inner:
            for y in self.y_inner:
                track[y][x] = "#"
        if past_positions:
            for pos in past_positions:
                track[pos[1]][pos[0]] = "*"
        track[self.position[1]][self.position[0]] = "X"
        print("\n".join("".join(row) for row in reversed(track)))

In [None]:
config = {
    "width": 10,
    "straight": 30,
    "turn": 15,
}

env = RacetrackEnv(config)
table = {}
for episode in range(10):
    state = env.reset()
    done = False
    past_positions = []
    while not done:
        action = random.randint(0, len(env.action_space) - 1)
        result = env.step(action)
        state, reward, done = result["state"], result["r"], result["done"]
        past_positions.append(env.position)
    print(f"Episode {episode}")
    env.render(past_positions)
    print()