# Week 5, Task 1
This notebook covers the assignment for week 5. It uses a reinforcement learning model that must learn to press a sequence of buttons in the correct order, while choosing a suitable speed-accuracy tradeoff for each button.

In [None]:
import who_rl

# We will use these libraries later in the notebook, but let's import everything here.
import copy
import numpy as np

The `who_rl` module contains a function `simulate_who_task`, which takes a `ui` object as its parameter. That UI must have three elements labelled "e1", "e2", and "e3". The transition and reward functions of the RL agent are set up so that the agent must learn a pointing policy: it gets a positive reward for pressing "e3", provided it has first pressed "e1" and then "e2". So it must discover both the correct order in which to press the elements and the correct speed-accuracy tradeoff for each pointing movement. The agent gets a reward of `+1` for pressing "e3" correctly, and `0` otherwise. In addition, the movement time of the current pointing movement is subtracted from this reward. The optimal policy is therefore one that minimises total movement time while still hitting the targets in the correct order.
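The reward rule described above can be sketched as a small function. This is only an illustration of the description, not the code inside `who_rl`: the name `step_reward`, its arguments, and the way the press history is tracked are all hypothetical.

```python
def step_reward(pressed, pressed_so_far, movement_time):
    """Toy version of the reward described above (hypothetical, not part of who_rl).

    pressed: name of the element hit by this movement, or None for a miss.
    pressed_so_far: element names hit earlier in the episode, in order.
    movement_time: duration of this pointing movement, in seconds.
    """
    # Assumption: +1 only when "e3" is hit right after "e1" and then "e2".
    goal_reached = pressed == "e3" and list(pressed_so_far) == ["e1", "e2"]
    bonus = 1.0 if goal_reached else 0.0
    # Every movement costs its own movement time.
    return bonus - movement_time


print(step_reward("e3", ["e1", "e2"], 0.5))  # correct final press
print(step_reward("e3", ["e2"], 0.5))        # wrong order, only the time cost remains
```

Because every movement is charged its duration, slow but safe movements and fast but inaccurate ones both reduce the return, which is what forces the agent to pick a tradeoff per button.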

In [None]:
import ui
# Here is the example UI. Note the elements named "e1", "e2", and "e3".
ui.visualise_UI(ui.big_ui)


In [None]:
# Here is the model accomplishing the task. It takes some time to train the models via trial and error.
path, mt = who_rl.simulate_who_task(ui.big_ui)
print("The path taken by the model is", repr(path))
print("Movement times and speed-accuracy tradeoffs associated with this path are", repr(mt))

When an entry in the path is an element name, that pointing action was a hit. When it is a coordinate, the pointing action was a miss. The second returned array contains the cumulative movement time, in seconds, since the start of the task, so the last value is the total movement time. Associated with each movement time is the selected speed-accuracy tradeoff (MA = Maximum accuracy; A = Accuracy; B = Balance; S = Speed; MS = Maximum speed).
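As a rough sketch of how these two return values could be unpacked, the helper below separates hits from misses and reads off the total time. It assumes that hits are strings, misses are coordinate pairs, and each `mt` entry is a `(elapsed_seconds, tradeoff_label)` pair. Only the `mt[-1][0]` access is taken from this notebook; the rest of the layout, the name `summarise_run`, and the example values are assumptions made for illustration.

```python
def summarise_run(path, mt):
    """Split a run into hits and misses and report the total time.

    Assumed layout (not confirmed by who_rl): hits are element-name strings,
    misses are coordinate pairs, mt entries are (elapsed_seconds, label).
    """
    hits = [p for p in path if isinstance(p, str)]
    misses = [p for p in path if not isinstance(p, str)]
    return {
        "hits": hits,
        "misses": misses,
        "total_time": mt[-1][0],              # last cumulative time
        "tradeoffs": [entry[1] for entry in mt],
    }


# Made-up values for illustration only, not real model output.
example_path = ["e1", (310, 120), "e2", "e3"]
example_mt = [(0.4, "S"), (0.7, "MS"), (1.1, "A"), (1.6, "B")]
summary = summarise_run(example_path, example_mt)
print(summary["hits"], summary["misses"], summary["total_time"])
```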

In [None]:
# Visualise the path taken by the model, annotated with the movement times.
path, mt = who_rl.simulate_who_task(ui.big_ui)
ui.visualise_UI(ui.big_ui, path = path, annotate = mt)
print("Task ended with mt =", mt[-1][0])

In [None]:
# Or just use the handy function for it.
who_rl.visualise_who_task(ui.big_ui)

In [None]:
# Again, we might want to have an average. This will take some time, so we'll report the progress
who_rl.visualise_who_task(ui.big_ui) # first get an example visualisation
results = []
for i in range(10):
    print("Run", i+1, "out of 10")
    path, mt = who_rl.simulate_who_task(ui.big_ui)
    results.append(mt[-1][0])
print("Average mt =", np.mean(results))

In [None]:
# Can we adapt the optimal pointing policy to the structure of the task?
new_ui = copy.deepcopy(ui.big_ui)
new_ui.modify_element("e2", "x_size", 400)
new_ui.modify_element("e2", "y_size", 400)
# Need to remove some elements to make space
del new_ui.elements["c2"]
who_rl.visualise_who_task(new_ui)

In [None]:
# Here we make e3 small. Maybe not small enough yet?
# (numpy was already imported as np at the top of the notebook.)
new_ui = copy.deepcopy(ui.big_ui)
new_ui.modify_element("e2", "x_size", 400)
new_ui.modify_element("e2", "y_size", 400)
new_ui.modify_element("e3", "x_size", 10)
new_ui.modify_element("e3", "y_size", 10)
# Need to remove some elements to make space
del new_ui.elements["c2"]
who_rl.visualise_who_task(new_ui)
results = []
for i in range(10):
    print("Run", i+1, "out of 10")
    path, mt = who_rl.simulate_who_task(new_ui)
    results.append(mt[-1][0])
print("Average mt =", np.mean(results))

We can see from the above that the model does not really mind approaching the target at maximum speed (this is called a ballistic movement) and then "homing in" with a precise pointing movement. But what if we penalise missing the target, so that the task requires a single movement from the start to the target?
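To see why a miss penalty might change the preferred strategy, here is a toy calculation. It assumes the episode return is the final reward, minus total movement time, plus the penalty for each miss. How `who_rl` actually applies `miss_penalty` is not shown in this notebook, and the movement times below are made up for illustration.

```python
def episode_return(movement_times, n_misses, miss_penalty=0.0, e3_reward=1.0):
    """Toy return: final reward minus total time plus penalties (assumed form)."""
    return e3_reward - sum(movement_times) + miss_penalty * n_misses


# Hypothetical strategies for reaching one small target.
two_step = {"movement_times": [0.3, 0.4], "n_misses": 1}  # ballistic miss, then homing in
one_step = {"movement_times": [0.9], "n_misses": 0}       # one slow, accurate movement

for penalty in (0.0, -1.0):
    r_two = episode_return(miss_penalty=penalty, **two_step)
    r_one = episode_return(miss_penalty=penalty, **one_step)
    best = "two-step" if r_two > r_one else "one-step"
    print("miss_penalty =", penalty, "-> better strategy:", best)
```

With these numbers the two-step strategy wins when misses are free, and the single accurate movement wins once each miss costs `-1`.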


In [None]:
# Note that we are increasing the training time here to make sure that the model learns.
who_rl.visualise_who_task(new_ui, miss_penalty = -1, episodes = 500000)

In [None]:
# Here is how to change the pointing ability of the simulated user. This is a very able user.
who_rl.visualise_who_task(ui.big_ui, who_alpha = 0.05, episodes = 500000)

In [None]:
# Add a new element, e4. Use c3 as a template, and then remove that element.
new_ui = copy.deepcopy(ui.big_ui)
new_element = copy.deepcopy(new_ui.elements["c3"])
new_element.name = "e4"
new_ui.elements["e4"] = new_element
del new_ui.elements["c3"]
who_rl.visualise_who_task(new_ui)

In [None]:
# Here is the same task, but now hitting e3 is more rewarding.
new_ui = copy.deepcopy(ui.big_ui)
new_element = copy.deepcopy(new_ui.elements["c3"])
new_element.name = "e4"
new_ui.elements["e4"] = new_element
del new_ui.elements["c3"]
who_rl.visualise_who_task(new_ui, e3_reward = 5)