## Implement a Basic Driving Agent
To begin, your only task is to get the **smartcab** to move around in the environment. At this point, you will not be concerned with any sort of optimal driving policy. Note that the driving agent is given the following information at each intersection:

- The next waypoint location relative to its current location and heading.
- The state of the traffic light at the intersection and the presence of oncoming vehicles from other directions.
- The current time left from the allotted deadline.

To complete this task, simply have your driving agent choose a random action from the set of possible actions (`None, 'forward', 'left', 'right'`) at each intersection, disregarding the input information above. Set the simulation deadline enforcement, `enforce_deadline`, to `False` and observe how it performs.
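
A minimal sketch of this random policy in Python follows. The function name `choose_random_action` is hypothetical (in the project, this logic would sit in the agent's per-step update). The action list is taken from the task description above.

```python
import random

# The four actions listed in the task description.
VALID_ACTIONS = [None, 'forward', 'left', 'right']


def choose_random_action(inputs=None, waypoint=None, deadline=None):
    """Return a uniformly random action.

    The parameters mirror the information the agent receives at each
    intersection, but this basic agent deliberately ignores them.
    """
    return random.choice(VALID_ACTIONS)


if __name__ == "__main__":
    # Five random decisions, e.g. for five consecutive intersections.
    print([choose_random_action() for _ in range(5)])
```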

### QUESTION
Observe what you see with the agent's behavior as it takes random actions. Does the **smartcab** eventually make it to the destination? Are there any other interesting observations to note?

### Answer
As expected, the agent behaved completely erratically. Still, even with random actions, it reached its destination surprisingly often: in about 65 of 100 trials. In the remaining 35 or so, the trial was aborted because the agent hit the hard time limit (-100). See Appendix A for an example output.

Of course, I ran the experiment repeatedly, and this really seems to be a typical result. Intuitively, given $8 \times 6 = 48$ intersections and the option to do nothing, I had expected the agent to fail more often.
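
To check this intuition, one could estimate how often a purely random walk finds the destination. The sketch below is a hypothetical, heavily simplified model and not the project's simulator. It assumes an $8 \times 6$ grid that wraps around at the edges, ignores traffic lights and other cars, and uses a fixed step budget `max_steps` instead of the real deadline logic. Because no illegal move ever blocks the cab here, it likely overestimates the real success rate.

```python
import random

# Simplified, hypothetical world: an 8 x 6 grid that wraps around at the edges,
# with no traffic lights and no other cars.
WIDTH, HEIGHT = 8, 6
ACTIONS = [None, 'forward', 'left', 'right']
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def step(location, heading, action):
    """Apply one action and return the new (location, heading).

    None means waiting in place; 'left' and 'right' turn by 90 degrees
    and then move one block in the new direction.
    """
    if action is None:
        return location, heading
    dx, dy = heading
    if action == 'left':
        dx, dy = dy, -dx
    elif action == 'right':
        dx, dy = -dy, dx
    new_location = ((location[0] + dx) % WIDTH, (location[1] + dy) % HEIGHT)
    return new_location, (dx, dy)


def random_walk_success_rate(n_trials, max_steps, seed=0):
    """Fraction of trials in which random driving reaches the goal in time."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        start = (rng.randrange(WIDTH), rng.randrange(HEIGHT))
        goal = start
        while goal == start:
            goal = (rng.randrange(WIDTH), rng.randrange(HEIGHT))
        location, heading = start, rng.choice(HEADINGS)
        for _ in range(max_steps):
            location, heading = step(location, heading, rng.choice(ACTIONS))
            if location == goal:
                successes += 1
                break
    return successes / n_trials


if __name__ == "__main__":
    # The step budget is a rough guess: if the hard limit is reached when the
    # deadline counter hits -100, the deadlines in Appendix A (20 to 55)
    # correspond to roughly 120 to 155 steps.
    print(random_walk_success_rate(n_trials=1000, max_steps=130))
```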

Naturally, the results deteriorate if we set `enforce_deadline` to `True`: the cab then reaches the destination in only about 20% to 25% of the trials.

## Inform the Driving Agent
Now that your driving agent is capable of moving around in the environment, your next task is to identify a set of states that are appropriate for modeling the **smartcab** and environment. The main source of state variables are the current inputs at the intersection, but not all may require representation. You may choose to explicitly define states, or use some combination of inputs as an implicit state. At each time step, process the inputs and update the agent's current state using the `self.state` variable. Continue with the simulation deadline enforcement `enforce_deadline` being set to `False`, and observe how your driving agent now reports the change in state as the simulation progresses.

### QUESTION
What states have you identified that are appropriate for modeling the **smartcab** and environment? Why do you believe each of these states to be appropriate for this problem?

### Answer
Naively, we could build states from all of the `inputs` (the light, plus any nearby cars and their directions), the next `waypoint`, and the `deadline`. Since `deadline` can take many different values, including it would dramatically increase the number of states and thus the training time. If we wanted to increase the size of the grid later, we would be in an even worse situation.

Of course, we could also think of something more elaborate, such as $\lfloor \log_{3}(\mathit{deadline}) \rfloor$. This approach would not only reduce the number of states caused by changes in `deadline`, but also capture the growing importance of the remaining time: the state would change more often as the remaining time shrinks. Then again, this should also be achievable by adjusting $\alpha$, $\gamma$, and $\epsilon$ according to the `deadline`.
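
The bucketing idea can be sketched as follows. `deadline_bucket` is a hypothetical helper. It assumes an integer deadline and uses repeated integer division instead of `math.log` to avoid floating-point rounding at exact powers of 3. Non-positive deadlines (which occur when the deadline is not enforced) are mapped to a single bucket, -1, since the logarithm is undefined there.

```python
def deadline_bucket(deadline):
    """Return floor(log_3(deadline)) for a positive integer deadline.

    Repeated integer division sidesteps floating-point rounding at exact
    powers of 3. Non-positive deadlines are collapsed into one bucket, -1.
    """
    if deadline < 1:
        return -1
    bucket = 0
    while deadline >= 3:
        deadline //= 3
        bucket += 1
    return bucket


if __name__ == "__main__":
    for d in (1, 2, 3, 8, 9, 26, 27, 55):
        print(d, deadline_bucket(d))
```

Deadlines 1 to 2 fall into bucket 0, 3 to 8 into bucket 1, 9 to 26 into bucket 2, and 27 to 80 into bucket 3. The buckets therefore get narrower as time runs out, and every deadline from 1 to 60 is covered by just four buckets.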

The next best thing would be 5 variables with 2 or 4 possible values:

- $light \in \{red, green\}$
- $left \in \{None, forward, left, right\}$
- $oncoming \in \{None, forward, left, right\}$
- $right \in \{None, forward, left, right\}$
- $waypoint \in \{None, forward, left, right\}$

`state` could be stored as a dictionary combining these variables, which still leaves us with 512 different states.
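
A minimal sketch of building such a state follows. It assumes `inputs` is a dictionary with the keys `light`, `oncoming`, `left`, and `right`, and that the waypoint is passed in separately. One practical caveat: a plain dictionary is not hashable, so it cannot serve directly as a key in a Q-table that is itself a dictionary. The sketch therefore stores the same combination in a `namedtuple`. The names `State` and `build_state` are hypothetical.

```python
from collections import namedtuple

# Hypothetical state type holding the five variables listed above.
State = namedtuple('State', ['light', 'oncoming', 'left', 'right', 'waypoint'])


def build_state(inputs, waypoint):
    """Combine the sensed inputs and the next waypoint into a hashable state."""
    return State(
        light=inputs['light'],
        oncoming=inputs['oncoming'],
        left=inputs['left'],
        right=inputs['right'],
        waypoint=waypoint,
    )


if __name__ == "__main__":
    inputs = {'light': 'green', 'oncoming': None, 'left': 'forward', 'right': None}
    state = build_state(inputs, 'left')
    # Because State is hashable, (state, action) pairs can key a Q-table dict.
    q_table = {(state, 'left'): 0.0}
    print(state, q_table[(state, 'left')])
```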

### OPTIONAL
How many states in total exist for the **smartcab** in this environment? Does this number seem reasonable given that the goal of Q-Learning is to learn and make informed decisions about each state? Why or why not?

### Answer
This approach yields $2 \times 4 \times 4 \times 4 \times 4 = 512$ possible states. This allows the agent to make well-informed decisions, but it will also learn very slowly: in Q-Learning, each of these many states has to be visited multiple times to learn the value of every possible action.

The most extreme alternative (besides driving randomly) would probably be to ignore everything but `waypoint`. We would have only 4 states, and given the small amount of traffic this might even work, but I would probably not use such a smartcab in real life. Additionally including `light` would be my next best alternative, because this variable has to be considered at every intersection, would prevent some crashes, and influences the reward quite a bit. We would end up with merely $4 \times 2 = 8$ states, which I consider quite reasonable.
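
The state counts from this answer (512 and 8) can be double-checked by enumerating the combinations. The sketch below assumes four actions per state (the same `None`, `'forward'`, `'left'`, `'right'` set the agent chooses from). It also shows how many Q-values each representation has to learn.

```python
from itertools import product

LIGHTS = ['red', 'green']
DIRECTIONS = [None, 'forward', 'left', 'right']  # also the four actions

# Full representation: light, left, oncoming, right, waypoint.
full_states = list(product(LIGHTS, DIRECTIONS, DIRECTIONS, DIRECTIONS, DIRECTIONS))

# Reduced representation: waypoint and light only.
reduced_states = list(product(DIRECTIONS, LIGHTS))

for name, states in (('full', full_states), ('reduced', reduced_states)):
    n_states = len(states)
    n_q_values = n_states * len(DIRECTIONS)  # one Q-value per (state, action)
    print(f"{name}: {n_states} states, {n_q_values} Q-values")
```

This prints 512 states (2048 Q-values) for the full representation and 8 states (32 Q-values) for the reduced one. That gap makes the difference in learning speed concrete.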

## Implement a Q-Learning Driving Agent

With your driving agent being capable of interpreting the input information and having a mapping of environmental states, your next task is to implement the Q-Learning algorithm for your driving agent to choose the best action at each time step, based on the Q-values for the current state and action. Each action taken by the **smartcab** will produce a reward which depends on the state of the environment. The Q-Learning driving agent will need to consider these rewards when updating the Q-values. Once implemented, set the simulation deadline enforcement `enforce_deadline` to `True`. Run the simulation and observe how the **smartcab** moves about the environment in each trial.

The formulas for updating Q-values can be [found in this video](https://classroom.udacity.com/nanodegrees/nd009/parts/0091345409/modules/e64f9a65-fdb5-4e60-81a9-72813beebb7e/lessons/5446820041/concepts/6348990570923).
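
For reference, the standard tabular Q-learning update is shown below. I assume this is the formula the linked video presents:

$$Q(s, a) \leftarrow (1 - \alpha)\,Q(s, a) + \alpha \left(r + \gamma \max_{a'} Q(s', a')\right)$$

A minimal Python sketch follows. `q_update` is a hypothetical helper, and unseen state-action pairs default to a Q-value of 0.

```python
from collections import defaultdict

ACTIONS = [None, 'forward', 'left', 'right']


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new value.

    Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a'))
    """
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q_table[(state, action)] = (1 - alpha) * q_table[(state, action)] + alpha * target
    return q_table[(state, action)]


if __name__ == "__main__":
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    print(q_update(q, 's0', 'forward', reward=2.0, next_state='s1', alpha=0.5, gamma=0.5))
    q[('s1', 'left')] = 4.0
    print(q_update(q, 's0', 'forward', reward=2.0, next_state='s1', alpha=0.5, gamma=0.5))
```

The two calls print 1.0 and then 2.5. In the second call, the better estimate for `s1` is propagated back into `Q(s0, forward)`.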

### QUESTION
What changes do you notice in the agent's behavior when compared to the basic driving agent when random actions were always taken? Why is this behavior occurring?

### Answer
It was not obvious to me how $\alpha$, $\gamma$, and $\epsilon$ should be set for this first run, so I simply used `random.random()` to draw a random value for each of them. I also looked at the results before setting `enforce_deadline` to `True`. In most cases, the performance increased dramatically even with these random values, and in many runs the agent hardly failed anymore. This was to be expected, because the agent now learns over time and uses the rewards to guide its actions instead of driving randomly. I also noticed that the agent tended to fail, if at all, only in early trials, because it takes some time to fill in the Q-values. Appendix B shows one example with randomly generated values for $\alpha$, $\gamma$, and $\epsilon$.

After these runs, I set `enforce_deadline` to `True` and simply set $\alpha$ and $\gamma$ to 0.5 and $\epsilon$ to 0.25. With these values, the agent still cruises the streets randomly in about 25% of all moves, weights its previous Q-value estimates and new experience equally, and moderately takes into account the utility of the next state. Over 100 runs with 100 trials each, the average success rate was about 79%: still better than the random approach, even though the deadline is now enforced.
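
The exploration behavior described here corresponds to epsilon-greedy action selection. The minimal sketch below assumes the Q-table is a `defaultdict(float)` keyed by `(state, action)` pairs. `choose_action` is a hypothetical helper, and the two parameter dictionaries simply restate the settings from the text.

```python
import random
from collections import defaultdict

ACTIONS = [None, 'forward', 'left', 'right']


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit.

    Ties between equally valued actions are broken randomly, so an untrained
    state (all Q-values 0.0) does not always produce the same action.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = [q_table[(state, a)] for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


# First run: each parameter drawn uniformly from [0, 1) with random.random().
random_params = {name: random.random() for name in ('alpha', 'gamma', 'epsilon')}

# Deadline-enforced run: fixed values, so roughly 25% of moves are random.
fixed_params = {'alpha': 0.5, 'gamma': 0.5, 'epsilon': 0.25}

if __name__ == "__main__":
    q = defaultdict(float)
    q[('some_state', 'forward')] = 1.0
    print(choose_action(q, 'some_state', fixed_params['epsilon']))
```

Note that a random move can also happen to pick the greedy action, so slightly fewer than 25% of moves actually deviate from the learned policy.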

## Improve the Q-Learning Driving Agent
Your final task for this project is to enhance your driving agent so that, after sufficient training, the **smartcab** is able to reach the destination within the allotted time safely and efficiently. Parameters in the Q-Learning algorithm, such as the learning rate (`alpha`), the discount factor (`gamma`) and the exploration rate (`epsilon`) all contribute to the driving agent’s ability to learn the best action for each state. To improve on the success of your **smartcab**:

- Set the number of trials, `n_trials`, in the simulation to 100.
- Run the simulation with the deadline enforcement `enforce_deadline` set to `True` (you will need to reduce the update delay `update_delay` and set the `display` to `False`).
- Observe the driving agent’s learning and **smartcab’s** success rate, particularly during the later trials.
- Adjust one or several of the above parameters and iterate this process.
- This task is complete once you have arrived at what you determine is the best combination of parameters required for your driving agent to learn successfully.

### QUESTION
Report the different values for the parameters tuned in your basic implementation of Q-Learning. For which set of parameters does the agent perform best? How well does the final driving agent perform?

### Answer
To find the best combination of $\alpha$, $\gamma$, and $\epsilon$, I quickly implemented a grid search. I started it lazily with the set $\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$ for each parameter and averaged the success rate over 25 trials for each combination. After quite some waiting, the result was:

Afterwards, I explored the space around this solution a little more diligently by setting $\alpha \in, \gamma \in$ and $\epsilon \in$. This time, I averaged the success rate over 100 trials for each combination. The final result was:
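
The grid search itself can be sketched as follows. `grid_search` is a hypothetical helper. `evaluate` stands for whatever function runs the simulation with the given parameters for `n_trials` trials and returns the average success rate; its name and signature are assumptions. The coarse grid of nine values per parameter already means $9^3 = 729$ combinations, or 18,225 simulated trials at 25 trials each.

```python
from itertools import product


def grid_search(evaluate, alphas, gammas, epsilons, n_trials):
    """Score every (alpha, gamma, epsilon) combination and return the best one.

    `evaluate(alpha, gamma, epsilon, n_trials)` is a hypothetical callback that
    should run the simulation and return the average success rate.
    Returns (best_params, best_score, all_scores).
    """
    scores = {}
    for alpha, gamma, epsilon in product(alphas, gammas, epsilons):
        scores[(alpha, gamma, epsilon)] = evaluate(alpha, gamma, epsilon, n_trials)
    best_params = max(scores, key=scores.get)
    return best_params, scores[best_params], scores


# Coarse grid from the text: 0.1, 0.2, ..., 0.9 for each parameter.
# round() removes float noise such as 0.30000000000000004.
COARSE_GRID = [round(0.1 * i, 1) for i in range(1, 10)]


if __name__ == "__main__":
    # Toy stand-in for the real simulation, peaking at (0.5, 0.3, 0.1).
    # It only demonstrates the search and says nothing about the smartcab.
    def toy_evaluate(alpha, gamma, epsilon, n_trials):
        return -((alpha - 0.5) ** 2 + (gamma - 0.3) ** 2 + (epsilon - 0.1) ** 2)

    best, score, scores = grid_search(toy_evaluate, COARSE_GRID, COARSE_GRID, COARSE_GRID, n_trials=25)
    print(best, len(scores))
```

The finer second pass works the same way: pass narrower value lists around the best coarse result and set `n_trials=100`.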


### QUESTION
Does your agent get close to finding an optimal policy, i.e. reach the destination in the minimum possible time, and not incur any penalties? How would you describe an optimal policy for this problem?

## Appendix A: random driving
With random actions, `enforce_deadline` set to `False`, and the agent's own output omitted, the simulator returns something like this:

In this example, the agent reached the destination in 63 cases and failed in 37 cases.

```
Simulator.run(): Trial 0
Environment.reset(): Trial set up with start = (8, 6), destination = (3, 2), deadline = 45
RoutePlanner.route_to(): destination = (3, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 1
Environment.reset(): Trial set up with start = (2, 5), destination = (4, 2), deadline = 25
RoutePlanner.route_to(): destination = (4, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 2
Environment.reset(): Trial set up with start = (1, 2), destination = (5, 3), deadline = 25
RoutePlanner.route_to(): destination = (5, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 3
Environment.reset(): Trial set up with start = (1, 6), destination = (7, 2), deadline = 50
RoutePlanner.route_to(): destination = (7, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 4
Environment.reset(): Trial set up with start = (1, 1), destination = (6, 2), deadline = 30
RoutePlanner.route_to(): destination = (6, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 5
Environment.reset(): Trial set up with start = (3, 5), destination = (6, 3), deadline = 25
RoutePlanner.route_to(): destination = (6, 3)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 6
Environment.reset(): Trial set up with start = (7, 6), destination = (4, 3), deadline = 30
RoutePlanner.route_to(): destination = (4, 3)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 7
Environment.reset(): Trial set up with start = (4, 1), destination = (5, 4), deadline = 20
RoutePlanner.route_to(): destination = (5, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 8
Environment.reset(): Trial set up with start = (2, 2), destination = (6, 1), deadline = 25
RoutePlanner.route_to(): destination = (6, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 9
Environment.reset(): Trial set up with start = (5, 2), destination = (1, 3), deadline = 25
RoutePlanner.route_to(): destination = (1, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 10
Environment.reset(): Trial set up with start = (3, 6), destination = (7, 6), deadline = 20
RoutePlanner.route_to(): destination = (7, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 11
Environment.reset(): Trial set up with start = (8, 3), destination = (3, 4), deadline = 30
RoutePlanner.route_to(): destination = (3, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 12
Environment.reset(): Trial set up with start = (1, 5), destination = (5, 4), deadline = 25
RoutePlanner.route_to(): destination = (5, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 13
Environment.reset(): Trial set up with start = (4, 4), destination = (2, 2), deadline = 20
RoutePlanner.route_to(): destination = (2, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 14
Environment.reset(): Trial set up with start = (7, 1), destination = (1, 1), deadline = 30
RoutePlanner.route_to(): destination = (1, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 15
Environment.reset(): Trial set up with start = (6, 1), destination = (5, 4), deadline = 20
RoutePlanner.route_to(): destination = (5, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 16
Environment.reset(): Trial set up with start = (6, 5), destination = (2, 1), deadline = 40
RoutePlanner.route_to(): destination = (2, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 17
Environment.reset(): Trial set up with start = (5, 3), destination = (8, 5), deadline = 25
RoutePlanner.route_to(): destination = (8, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 18
Environment.reset(): Trial set up with start = (4, 5), destination = (8, 4), deadline = 25
RoutePlanner.route_to(): destination = (8, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 19
Environment.reset(): Trial set up with start = (3, 5), destination = (6, 2), deadline = 30
RoutePlanner.route_to(): destination = (6, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 20
Environment.reset(): Trial set up with start = (1, 4), destination = (3, 1), deadline = 25
RoutePlanner.route_to(): destination = (3, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 21
Environment.reset(): Trial set up with start = (5, 6), destination = (1, 4), deadline = 30
RoutePlanner.route_to(): destination = (1, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 22
Environment.reset(): Trial set up with start = (3, 1), destination = (7, 2), deadline = 25
RoutePlanner.route_to(): destination = (7, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 23
Environment.reset(): Trial set up with start = (8, 2), destination = (4, 5), deadline = 35
RoutePlanner.route_to(): destination = (4, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 24
Environment.reset(): Trial set up with start = (6, 5), destination = (8, 3), deadline = 20
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 25
Environment.reset(): Trial set up with start = (1, 2), destination = (3, 5), deadline = 25
RoutePlanner.route_to(): destination = (3, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 26
Environment.reset(): Trial set up with start = (4, 3), destination = (1, 5), deadline = 25
RoutePlanner.route_to(): destination = (1, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 27
Environment.reset(): Trial set up with start = (3, 5), destination = (7, 3), deadline = 30
RoutePlanner.route_to(): destination = (7, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 28
Environment.reset(): Trial set up with start = (6, 6), destination = (7, 3), deadline = 20
RoutePlanner.route_to(): destination = (7, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 29
Environment.reset(): Trial set up with start = (6, 6), destination = (3, 2), deadline = 35
RoutePlanner.route_to(): destination = (3, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 30
Environment.reset(): Trial set up with start = (3, 2), destination = (6, 3), deadline = 20
RoutePlanner.route_to(): destination = (6, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 31
Environment.reset(): Trial set up with start = (8, 5), destination = (6, 3), deadline = 20
RoutePlanner.route_to(): destination = (6, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 32
Environment.reset(): Trial set up with start = (7, 4), destination = (4, 6), deadline = 25
RoutePlanner.route_to(): destination = (4, 6)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 33
Environment.reset(): Trial set up with start = (5, 2), destination = (7, 6), deadline = 30
RoutePlanner.route_to(): destination = (7, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 34
Environment.reset(): Trial set up with start = (7, 6), destination = (5, 2), deadline = 30
RoutePlanner.route_to(): destination = (5, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 35
Environment.reset(): Trial set up with start = (8, 6), destination = (4, 1), deadline = 45
RoutePlanner.route_to(): destination = (4, 1)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 36
Environment.reset(): Trial set up with start = (8, 2), destination = (4, 1), deadline = 25
RoutePlanner.route_to(): destination = (4, 1)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 37
Environment.reset(): Trial set up with start = (6, 6), destination = (1, 5), deadline = 30
RoutePlanner.route_to(): destination = (1, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 38
Environment.reset(): Trial set up with start = (2, 4), destination = (8, 4), deadline = 30
RoutePlanner.route_to(): destination = (8, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 39
Environment.reset(): Trial set up with start = (8, 2), destination = (3, 5), deadline = 40
RoutePlanner.route_to(): destination = (3, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 40
Environment.reset(): Trial set up with start = (4, 6), destination = (5, 1), deadline = 30
RoutePlanner.route_to(): destination = (5, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 41
Environment.reset(): Trial set up with start = (5, 4), destination = (8, 2), deadline = 25
RoutePlanner.route_to(): destination = (8, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 42
Environment.reset(): Trial set up with start = (2, 1), destination = (1, 6), deadline = 30
RoutePlanner.route_to(): destination = (1, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 43
Environment.reset(): Trial set up with start = (1, 3), destination = (7, 5), deadline = 40
RoutePlanner.route_to(): destination = (7, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 44
Environment.reset(): Trial set up with start = (2, 4), destination = (4, 2), deadline = 20
RoutePlanner.route_to(): destination = (4, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 45
Environment.reset(): Trial set up with start = (3, 2), destination = (1, 5), deadline = 25
RoutePlanner.route_to(): destination = (1, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 46
Environment.reset(): Trial set up with start = (5, 1), destination = (5, 6), deadline = 25
RoutePlanner.route_to(): destination = (5, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 47
Environment.reset(): Trial set up with start = (2, 5), destination = (8, 6), deadline = 35
RoutePlanner.route_to(): destination = (8, 6)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 48
Environment.reset(): Trial set up with start = (8, 6), destination = (5, 3), deadline = 30
RoutePlanner.route_to(): destination = (5, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 49
Environment.reset(): Trial set up with start = (3, 4), destination = (1, 6), deadline = 20
RoutePlanner.route_to(): destination = (1, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 50
Environment.reset(): Trial set up with start = (5, 1), destination = (6, 5), deadline = 25
RoutePlanner.route_to(): destination = (6, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 51
Environment.reset(): Trial set up with start = (7, 6), destination = (1, 3), deadline = 45
RoutePlanner.route_to(): destination = (1, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 52
Environment.reset(): Trial set up with start = (4, 2), destination = (2, 5), deadline = 25
RoutePlanner.route_to(): destination = (2, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 53
Environment.reset(): Trial set up with start = (4, 5), destination = (1, 4), deadline = 20
RoutePlanner.route_to(): destination = (1, 4)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 54
Environment.reset(): Trial set up with start = (7, 4), destination = (1, 4), deadline = 30
RoutePlanner.route_to(): destination = (1, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 55
Environment.reset(): Trial set up with start = (2, 6), destination = (3, 3), deadline = 20
RoutePlanner.route_to(): destination = (3, 3)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 56
Environment.reset(): Trial set up with start = (3, 6), destination = (8, 3), deadline = 40
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 57
Environment.reset(): Trial set up with start = (8, 1), destination = (7, 4), deadline = 20
RoutePlanner.route_to(): destination = (7, 4)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 58
Environment.reset(): Trial set up with start = (7, 4), destination = (1, 5), deadline = 35
RoutePlanner.route_to(): destination = (1, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 59
Environment.reset(): Trial set up with start = (8, 3), destination = (1, 5), deadline = 45
RoutePlanner.route_to(): destination = (1, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 60
Environment.reset(): Trial set up with start = (6, 3), destination = (2, 1), deadline = 30
RoutePlanner.route_to(): destination = (2, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 61
Environment.reset(): Trial set up with start = (2, 3), destination = (7, 3), deadline = 25
RoutePlanner.route_to(): destination = (7, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 62
Environment.reset(): Trial set up with start = (6, 2), destination = (1, 4), deadline = 35
RoutePlanner.route_to(): destination = (1, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 63
Environment.reset(): Trial set up with start = (7, 2), destination = (8, 5), deadline = 20
RoutePlanner.route_to(): destination = (8, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 64
Environment.reset(): Trial set up with start = (8, 6), destination = (2, 6), deadline = 30
RoutePlanner.route_to(): destination = (2, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 65
Environment.reset(): Trial set up with start = (5, 5), destination = (3, 1), deadline = 30
RoutePlanner.route_to(): destination = (3, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 66
Environment.reset(): Trial set up with start = (7, 6), destination = (6, 1), deadline = 30
RoutePlanner.route_to(): destination = (6, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 67
Environment.reset(): Trial set up with start = (8, 6), destination = (2, 6), deadline = 30
RoutePlanner.route_to(): destination = (2, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 68
Environment.reset(): Trial set up with start = (8, 1), destination = (6, 4), deadline = 25
RoutePlanner.route_to(): destination = (6, 4)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 69
Environment.reset(): Trial set up with start = (6, 3), destination = (4, 5), deadline = 20
RoutePlanner.route_to(): destination = (4, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 70
Environment.reset(): Trial set up with start = (8, 4), destination = (3, 6), deadline = 35
RoutePlanner.route_to(): destination = (3, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 71
Environment.reset(): Trial set up with start = (2, 6), destination = (8, 6), deadline = 30
RoutePlanner.route_to(): destination = (8, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 72
Environment.reset(): Trial set up with start = (1, 5), destination = (5, 2), deadline = 35
RoutePlanner.route_to(): destination = (5, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 73
Environment.reset(): Trial set up with start = (4, 1), destination = (7, 6), deadline = 40
RoutePlanner.route_to(): destination = (7, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 74
Environment.reset(): Trial set up with start = (5, 2), destination = (1, 4), deadline = 30
RoutePlanner.route_to(): destination = (1, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 75
Environment.reset(): Trial set up with start = (8, 6), destination = (5, 3), deadline = 30
RoutePlanner.route_to(): destination = (5, 3)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 76
Environment.reset(): Trial set up with start = (7, 5), destination = (2, 5), deadline = 25
RoutePlanner.route_to(): destination = (2, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 77
Environment.reset(): Trial set up with start = (6, 5), destination = (1, 3), deadline = 35
RoutePlanner.route_to(): destination = (1, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 78
Environment.reset(): Trial set up with start = (1, 1), destination = (8, 2), deadline = 40
RoutePlanner.route_to(): destination = (8, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 79
Environment.reset(): Trial set up with start = (8, 3), destination = (3, 6), deadline = 40
RoutePlanner.route_to(): destination = (3, 6)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 80
Environment.reset(): Trial set up with start = (3, 4), destination = (7, 4), deadline = 20
RoutePlanner.route_to(): destination = (7, 4)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 81
Environment.reset(): Trial set up with start = (7, 5), destination = (2, 5), deadline = 25
RoutePlanner.route_to(): destination = (2, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 82
Environment.reset(): Trial set up with start = (1, 5), destination = (8, 1), deadline = 55
RoutePlanner.route_to(): destination = (8, 1)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 83
Environment.reset(): Trial set up with start = (2, 2), destination = (5, 1), deadline = 20
RoutePlanner.route_to(): destination = (5, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 84
Environment.reset(): Trial set up with start = (2, 5), destination = (8, 6), deadline = 35
RoutePlanner.route_to(): destination = (8, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 85
Environment.reset(): Trial set up with start = (4, 6), destination = (6, 4), deadline = 20
RoutePlanner.route_to(): destination = (6, 4)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 86
Environment.reset(): Trial set up with start = (5, 5), destination = (2, 1), deadline = 35
RoutePlanner.route_to(): destination = (2, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 87
Environment.reset(): Trial set up with start = (5, 2), destination = (8, 3), deadline = 20
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 88
Environment.reset(): Trial set up with start = (2, 6), destination = (7, 6), deadline = 25
RoutePlanner.route_to(): destination = (7, 6)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 89
Environment.reset(): Trial set up with start = (6, 6), destination = (2, 2), deadline = 40
RoutePlanner.route_to(): destination = (2, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 90
Environment.reset(): Trial set up with start = (3, 3), destination = (6, 6), deadline = 30
RoutePlanner.route_to(): destination = (6, 6)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 91
Environment.reset(): Trial set up with start = (5, 6), destination = (8, 3), deadline = 30
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 92
Environment.reset(): Trial set up with start = (2, 4), destination = (7, 5), deadline = 30
RoutePlanner.route_to(): destination = (7, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 93
Environment.reset(): Trial set up with start = (5, 4), destination = (8, 5), deadline = 20
RoutePlanner.route_to(): destination = (8, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 94
Environment.reset(): Trial set up with start = (1, 5), destination = (7, 4), deadline = 35
RoutePlanner.route_to(): destination = (7, 4)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 95
Environment.reset(): Trial set up with start = (2, 2), destination = (3, 5), deadline = 20
RoutePlanner.route_to(): destination = (3, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 96
Environment.reset(): Trial set up with start = (4, 2), destination = (8, 2), deadline = 20
RoutePlanner.route_to(): destination = (8, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 97
Environment.reset(): Trial set up with start = (3, 6), destination = (7, 5), deadline = 25
RoutePlanner.route_to(): destination = (7, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 98
Environment.reset(): Trial set up with start = (1, 6), destination = (2, 3), deadline = 20
RoutePlanner.route_to(): destination = (2, 3)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 99
Environment.reset(): Trial set up with start = (3, 5), destination = (6, 1), deadline = 35
RoutePlanner.route_to(): destination = (6, 1)
Environment.act(): Primary agent has reached destination!
```

# Appendix B: $\alpha$ = 0.21, $\gamma$ = 0.48, $\epsilon$ = 0.03
In this example, with the deadline not enforced, the agent reached the destination in 97 of 100 trials and failed in only 3, even though $\alpha$, $\gamma$ and $\epsilon$ were chosen at random.
```
alpha: 0.21, gamma: 0.48, epsilon: 0.03
Simulator.run(): Trial 0
Environment.reset(): Trial set up with start = (5, 5), destination = (2, 1), deadline = 35
RoutePlanner.route_to(): destination = (2, 1)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 1
Environment.reset(): Trial set up with start = (2, 3), destination = (6, 3), deadline = 20
RoutePlanner.route_to(): destination = (6, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 2
Environment.reset(): Trial set up with start = (8, 4), destination = (6, 1), deadline = 25
RoutePlanner.route_to(): destination = (6, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 3
Environment.reset(): Trial set up with start = (4, 3), destination = (8, 6), deadline = 35
RoutePlanner.route_to(): destination = (8, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 4
Environment.reset(): Trial set up with start = (8, 4), destination = (2, 4), deadline = 30
RoutePlanner.route_to(): destination = (2, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 5
Environment.reset(): Trial set up with start = (5, 2), destination = (8, 3), deadline = 20
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 6
Environment.reset(): Trial set up with start = (3, 3), destination = (7, 6), deadline = 35
RoutePlanner.route_to(): destination = (7, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 7
Environment.reset(): Trial set up with start = (4, 4), destination = (8, 5), deadline = 25
RoutePlanner.route_to(): destination = (8, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 8
Environment.reset(): Trial set up with start = (3, 2), destination = (3, 6), deadline = 20
RoutePlanner.route_to(): destination = (3, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 9
Environment.reset(): Trial set up with start = (8, 1), destination = (6, 5), deadline = 30
RoutePlanner.route_to(): destination = (6, 5)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 10
Environment.reset(): Trial set up with start = (6, 3), destination = (3, 2), deadline = 20
RoutePlanner.route_to(): destination = (3, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 11
Environment.reset(): Trial set up with start = (4, 3), destination = (1, 1), deadline = 25
RoutePlanner.route_to(): destination = (1, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 12
Environment.reset(): Trial set up with start = (1, 4), destination = (8, 3), deadline = 40
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 13
Environment.reset(): Trial set up with start = (7, 6), destination = (4, 3), deadline = 30
RoutePlanner.route_to(): destination = (4, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 14
Environment.reset(): Trial set up with start = (1, 5), destination = (6, 2), deadline = 40
RoutePlanner.route_to(): destination = (6, 2)
Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.
Simulator.run(): Trial 15
Environment.reset(): Trial set up with start = (8, 6), destination = (2, 2), deadline = 50
RoutePlanner.route_to(): destination = (2, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 16
Environment.reset(): Trial set up with start = (6, 2), destination = (3, 4), deadline = 25
RoutePlanner.route_to(): destination = (3, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 17
Environment.reset(): Trial set up with start = (4, 3), destination = (1, 5), deadline = 25
RoutePlanner.route_to(): destination = (1, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 18
Environment.reset(): Trial set up with start = (1, 3), destination = (8, 6), deadline = 50
RoutePlanner.route_to(): destination = (8, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 19
Environment.reset(): Trial set up with start = (8, 2), destination = (1, 5), deadline = 50
RoutePlanner.route_to(): destination = (1, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 20
Environment.reset(): Trial set up with start = (6, 5), destination = (2, 1), deadline = 40
RoutePlanner.route_to(): destination = (2, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 21
Environment.reset(): Trial set up with start = (4, 6), destination = (3, 1), deadline = 30
RoutePlanner.route_to(): destination = (3, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 22
Environment.reset(): Trial set up with start = (5, 4), destination = (4, 1), deadline = 20
RoutePlanner.route_to(): destination = (4, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 23
Environment.reset(): Trial set up with start = (8, 4), destination = (4, 5), deadline = 25
RoutePlanner.route_to(): destination = (4, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 24
Environment.reset(): Trial set up with start = (4, 4), destination = (2, 2), deadline = 20
RoutePlanner.route_to(): destination = (2, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 25
Environment.reset(): Trial set up with start = (7, 1), destination = (8, 6), deadline = 30
RoutePlanner.route_to(): destination = (8, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 26
Environment.reset(): Trial set up with start = (4, 6), destination = (1, 2), deadline = 35
RoutePlanner.route_to(): destination = (1, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 27
Environment.reset(): Trial set up with start = (1, 5), destination = (7, 2), deadline = 45
RoutePlanner.route_to(): destination = (7, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 28
Environment.reset(): Trial set up with start = (6, 5), destination = (8, 1), deadline = 30
RoutePlanner.route_to(): destination = (8, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 29
Environment.reset(): Trial set up with start = (8, 3), destination = (5, 6), deadline = 30
RoutePlanner.route_to(): destination = (5, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 30
Environment.reset(): Trial set up with start = (3, 3), destination = (5, 5), deadline = 20
RoutePlanner.route_to(): destination = (5, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 31
Environment.reset(): Trial set up with start = (1, 6), destination = (4, 2), deadline = 35
RoutePlanner.route_to(): destination = (4, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 32
Environment.reset(): Trial set up with start = (8, 4), destination = (5, 1), deadline = 30
RoutePlanner.route_to(): destination = (5, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 33
Environment.reset(): Trial set up with start = (8, 6), destination = (5, 1), deadline = 40
RoutePlanner.route_to(): destination = (5, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 34
Environment.reset(): Trial set up with start = (8, 5), destination = (1, 2), deadline = 50
RoutePlanner.route_to(): destination = (1, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 35
Environment.reset(): Trial set up with start = (1, 2), destination = (3, 5), deadline = 25
RoutePlanner.route_to(): destination = (3, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 36
Environment.reset(): Trial set up with start = (2, 5), destination = (7, 2), deadline = 40
RoutePlanner.route_to(): destination = (7, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 37
Environment.reset(): Trial set up with start = (2, 4), destination = (8, 5), deadline = 35
RoutePlanner.route_to(): destination = (8, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 38
Environment.reset(): Trial set up with start = (7, 5), destination = (2, 4), deadline = 30
RoutePlanner.route_to(): destination = (2, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 39
Environment.reset(): Trial set up with start = (4, 5), destination = (1, 2), deadline = 30
RoutePlanner.route_to(): destination = (1, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 40
Environment.reset(): Trial set up with start = (7, 5), destination = (4, 4), deadline = 20
RoutePlanner.route_to(): destination = (4, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 41
Environment.reset(): Trial set up with start = (1, 3), destination = (8, 6), deadline = 50
RoutePlanner.route_to(): destination = (8, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 42
Environment.reset(): Trial set up with start = (5, 2), destination = (7, 5), deadline = 25
RoutePlanner.route_to(): destination = (7, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 43
Environment.reset(): Trial set up with start = (8, 1), destination = (6, 5), deadline = 30
RoutePlanner.route_to(): destination = (6, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 44
Environment.reset(): Trial set up with start = (2, 4), destination = (8, 3), deadline = 35
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 45
Environment.reset(): Trial set up with start = (1, 2), destination = (8, 5), deadline = 50
RoutePlanner.route_to(): destination = (8, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 46
Environment.reset(): Trial set up with start = (6, 2), destination = (2, 5), deadline = 35
RoutePlanner.route_to(): destination = (2, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 47
Environment.reset(): Trial set up with start = (8, 4), destination = (3, 5), deadline = 30
RoutePlanner.route_to(): destination = (3, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 48
Environment.reset(): Trial set up with start = (2, 2), destination = (8, 3), deadline = 35
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 49
Environment.reset(): Trial set up with start = (7, 2), destination = (4, 5), deadline = 30
RoutePlanner.route_to(): destination = (4, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 50
Environment.reset(): Trial set up with start = (1, 3), destination = (4, 2), deadline = 20
RoutePlanner.route_to(): destination = (4, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 51
Environment.reset(): Trial set up with start = (8, 4), destination = (4, 1), deadline = 35
RoutePlanner.route_to(): destination = (4, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 52
Environment.reset(): Trial set up with start = (6, 4), destination = (2, 1), deadline = 35
RoutePlanner.route_to(): destination = (2, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 53
Environment.reset(): Trial set up with start = (6, 2), destination = (7, 6), deadline = 25
RoutePlanner.route_to(): destination = (7, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 54
Environment.reset(): Trial set up with start = (2, 6), destination = (7, 3), deadline = 40
RoutePlanner.route_to(): destination = (7, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 55
Environment.reset(): Trial set up with start = (6, 6), destination = (5, 3), deadline = 20
RoutePlanner.route_to(): destination = (5, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 56
Environment.reset(): Trial set up with start = (4, 6), destination = (3, 3), deadline = 20
RoutePlanner.route_to(): destination = (3, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 57
Environment.reset(): Trial set up with start = (8, 6), destination = (3, 2), deadline = 45
RoutePlanner.route_to(): destination = (3, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 58
Environment.reset(): Trial set up with start = (8, 1), destination = (6, 4), deadline = 25
RoutePlanner.route_to(): destination = (6, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 59
Environment.reset(): Trial set up with start = (8, 2), destination = (3, 2), deadline = 25
RoutePlanner.route_to(): destination = (3, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 60
Environment.reset(): Trial set up with start = (4, 5), destination = (1, 6), deadline = 20
RoutePlanner.route_to(): destination = (1, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 61
Environment.reset(): Trial set up with start = (4, 1), destination = (2, 6), deadline = 35
RoutePlanner.route_to(): destination = (2, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 62
Environment.reset(): Trial set up with start = (1, 1), destination = (2, 6), deadline = 30
RoutePlanner.route_to(): destination = (2, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 63
Environment.reset(): Trial set up with start = (7, 1), destination = (6, 5), deadline = 25
RoutePlanner.route_to(): destination = (6, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 64
Environment.reset(): Trial set up with start = (8, 5), destination = (6, 3), deadline = 20
RoutePlanner.route_to(): destination = (6, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 65
Environment.reset(): Trial set up with start = (4, 3), destination = (7, 4), deadline = 20
RoutePlanner.route_to(): destination = (7, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 66
Environment.reset(): Trial set up with start = (7, 5), destination = (2, 2), deadline = 40
RoutePlanner.route_to(): destination = (2, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 67
Environment.reset(): Trial set up with start = (7, 6), destination = (4, 4), deadline = 25
RoutePlanner.route_to(): destination = (4, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 68
Environment.reset(): Trial set up with start = (1, 6), destination = (7, 1), deadline = 55
RoutePlanner.route_to(): destination = (7, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 69
Environment.reset(): Trial set up with start = (2, 4), destination = (6, 1), deadline = 35
RoutePlanner.route_to(): destination = (6, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 70
Environment.reset(): Trial set up with start = (6, 5), destination = (3, 6), deadline = 20
RoutePlanner.route_to(): destination = (3, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 71
Environment.reset(): Trial set up with start = (7, 5), destination = (1, 5), deadline = 30
RoutePlanner.route_to(): destination = (1, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 72
Environment.reset(): Trial set up with start = (8, 5), destination = (1, 5), deadline = 35
RoutePlanner.route_to(): destination = (1, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 73
Environment.reset(): Trial set up with start = (7, 5), destination = (1, 2), deadline = 45
RoutePlanner.route_to(): destination = (1, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 74
Environment.reset(): Trial set up with start = (6, 5), destination = (3, 1), deadline = 35
RoutePlanner.route_to(): destination = (3, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 75
Environment.reset(): Trial set up with start = (6, 3), destination = (2, 4), deadline = 25
RoutePlanner.route_to(): destination = (2, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 76
Environment.reset(): Trial set up with start = (5, 3), destination = (8, 4), deadline = 20
RoutePlanner.route_to(): destination = (8, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 77
Environment.reset(): Trial set up with start = (1, 5), destination = (2, 1), deadline = 25
RoutePlanner.route_to(): destination = (2, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 78
Environment.reset(): Trial set up with start = (4, 1), destination = (4, 6), deadline = 25
RoutePlanner.route_to(): destination = (4, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 79
Environment.reset(): Trial set up with start = (5, 3), destination = (2, 5), deadline = 25
RoutePlanner.route_to(): destination = (2, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 80
Environment.reset(): Trial set up with start = (2, 4), destination = (6, 2), deadline = 30
RoutePlanner.route_to(): destination = (6, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 81
Environment.reset(): Trial set up with start = (3, 2), destination = (8, 2), deadline = 25
RoutePlanner.route_to(): destination = (8, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 82
Environment.reset(): Trial set up with start = (4, 3), destination = (8, 5), deadline = 30
RoutePlanner.route_to(): destination = (8, 5)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 83
Environment.reset(): Trial set up with start = (3, 2), destination = (1, 6), deadline = 30
RoutePlanner.route_to(): destination = (1, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 84
Environment.reset(): Trial set up with start = (1, 6), destination = (8, 4), deadline = 45
RoutePlanner.route_to(): destination = (8, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 85
Environment.reset(): Trial set up with start = (1, 3), destination = (7, 3), deadline = 30
RoutePlanner.route_to(): destination = (7, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 86
Environment.reset(): Trial set up with start = (1, 1), destination = (6, 3), deadline = 35
RoutePlanner.route_to(): destination = (6, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 87
Environment.reset(): Trial set up with start = (2, 1), destination = (7, 1), deadline = 25
RoutePlanner.route_to(): destination = (7, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 88
Environment.reset(): Trial set up with start = (8, 3), destination = (5, 6), deadline = 30
RoutePlanner.route_to(): destination = (5, 6)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 89
Environment.reset(): Trial set up with start = (2, 2), destination = (8, 3), deadline = 35
RoutePlanner.route_to(): destination = (8, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 90
Environment.reset(): Trial set up with start = (8, 4), destination = (3, 2), deadline = 35
RoutePlanner.route_to(): destination = (3, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 91
Environment.reset(): Trial set up with start = (7, 5), destination = (5, 2), deadline = 25
RoutePlanner.route_to(): destination = (5, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 92
Environment.reset(): Trial set up with start = (1, 6), destination = (3, 2), deadline = 30
RoutePlanner.route_to(): destination = (3, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 93
Environment.reset(): Trial set up with start = (2, 5), destination = (7, 1), deadline = 45
RoutePlanner.route_to(): destination = (7, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 94
Environment.reset(): Trial set up with start = (2, 6), destination = (6, 3), deadline = 35
RoutePlanner.route_to(): destination = (6, 3)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 95
Environment.reset(): Trial set up with start = (2, 6), destination = (5, 4), deadline = 25
RoutePlanner.route_to(): destination = (5, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 96
Environment.reset(): Trial set up with start = (8, 1), destination = (2, 4), deadline = 45
RoutePlanner.route_to(): destination = (2, 4)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 97
Environment.reset(): Trial set up with start = (5, 2), destination = (2, 1), deadline = 20
RoutePlanner.route_to(): destination = (2, 1)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 98
Environment.reset(): Trial set up with start = (4, 6), destination = (3, 2), deadline = 25
RoutePlanner.route_to(): destination = (3, 2)
Environment.act(): Primary agent has reached destination!
Simulator.run(): Trial 99
Environment.reset(): Trial set up with start = (7, 6), destination = (1, 2), deadline = 50
RoutePlanner.route_to(): destination = (1, 2)
Environment.act(): Primary agent has reached destination!
```
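The success and failure counts quoted for these runs can be checked by scanning a log like the one above for the two messages that end a trial. The following is a minimal Python sketch. It assumes the full simulator output has been saved as text and that every trial ends with exactly one "reached destination" or "hit hard time limit" line, as in the logs reproduced here. The helper names `tally_outcomes` and `success_rate` are hypothetical and are not part of the smartcab code.

```python
# Marker strings copied from the simulator log lines shown in the appendices.
SUCCESS = "Primary agent has reached destination!"
ABORTED = "Primary agent hit hard time limit"


def tally_outcomes(log_text):
    """Return (reached, aborted) counts for a simulator log.

    Assumes each trial ends with exactly one SUCCESS or ABORTED line.
    """
    reached = aborted = 0
    for line in log_text.splitlines():
        if SUCCESS in line:
            reached += 1
        elif ABORTED in line:
            aborted += 1
    return reached, aborted


def success_rate(log_text):
    """Fraction of finished trials that reached the destination (0.0 if none)."""
    reached, aborted = tally_outcomes(log_text)
    total = reached + aborted
    # float() keeps true division under Python 2 as well as Python 3.
    return float(reached) / total if total else 0.0


if __name__ == "__main__":
    sample = (
        "Simulator.run(): Trial 0\n"
        "Environment.step(): Primary agent hit hard time limit (-100)! Trial aborted.\n"
        "Simulator.run(): Trial 1\n"
        "Environment.act(): Primary agent has reached destination!\n"
    )
    print(tally_outcomes(sample))  # (1, 1)
    print(success_rate(sample))    # 0.5
```

Matching on plain substrings rather than full lines keeps the sketch robust to the differing prefixes (`Environment.act()` vs. `Environment.step()`) that the simulator prints for the two outcomes.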