# Tracking the gaze

Many eye trackers are based on video-oculography: they use infrared lights and video to record the user's eye movements. Other solutions for eye tracking exist, but here we will focus on the eye trackers that record video.

In a typical eye tracking experiment, a user looks at a screen where something interesting is displayed. While the user looks at the screen, we monitor where on the screen the user is looking.

![ET_1_setup-2.svg](attachment:ET_1_setup-2.svg)

If we knew the position and dimensions of the screen, the center of the eye ball and the center of the pupil, we would be able to calculate where someone is looking. Simply draw a line that starts from the eye ball center and passes through the pupil center, then check where the line intersects the plane defined by the screen:

![ET_2_step_coordinates.svg](attachment:ET_2_step_coordinates.svg)
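The line-and-plane construction described above can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the function name `gaze_point_on_screen`, the coordinate frame (screen in the plane $z = 0$, units in millimetres) and all the numbers are made up for the example.

```python
def gaze_point_on_screen(eye_center, pupil_center, plane_point, plane_normal):
    """Intersect the ray eye_center -> pupil_center with the screen plane.

    All arguments are (x, y, z) tuples in one shared, hypothetical coordinate frame.
    Returns the intersection point, or None if the ray is parallel to the
    screen or points away from it.
    """
    # Direction of the gaze ray: from the eye ball center through the pupil center
    direction = tuple(p - e for p, e in zip(pupil_center, eye_center))

    # Dot product of the plane normal and the ray direction; zero means parallel
    denom = sum(n * d for n, d in zip(plane_normal, direction))
    if abs(denom) < 1e-12:
        return None

    # Ray parameter t at which the ray meets the plane
    t = sum(n * (q - e) for n, q, e in zip(plane_normal, plane_point, eye_center)) / denom
    if t < 0:
        return None  # the screen is behind the viewer

    return tuple(e + t * d for e, d in zip(eye_center, direction))


# Made-up numbers: the screen is the plane z = 0 and the eye is 600 mm in front of it
eye = (0.0, 0.0, 600.0)
pupil = (1.0, 0.5, 588.0)
hit = gaze_point_on_screen(eye, pupil, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(hit)
```

The returned point lies on the screen plane; converting it to pixel coordinates would additionally require the screen's position and size in the same coordinate frame.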

Unfortunately, we usually do not have this information directly. What we have instead is something like the eye tracker image shown here below the screen:

![ET_3_et_view-3.svg](attachment:ET_3_et_view-3.svg)

In the Figure we have an eye tracker (the black box) placed below the display, recording the user's eye. The image below the eye tracker shows the user's eye from the eye-tracker point of view.

How can we extract the position on the screen where the user is looking from this information?

From now on, let's refer to 'the location on the screen where the user is looking' as the *gaze coordinates*.

We could use a couple of different approaches:

1. Attempt to deduce the eye ball and pupil coordinates from the video image of the eye, then calculate the gaze coordinates as in the first Figure.
2. Attempt to deduce the gaze coordinates directly from the coordinates of the pupil in the video image.

In the first approach, we are essentially creating a 3D model of the screen and the user's eye ball. This is complicated, so let's focus now on the second approach. For this we need to:

1. Detect the center of the pupil
2. Map the pupil center to the gaze coordinates

Pupil detection is handled with some type of computer vision approach. In other words, we create an algorithm that applies various computer vision techniques to isolate the pupil from the rest of the video frame and then determines its center coordinates.

![ET_4_pupil_detection-3.svg](attachment:ET_4_pupil_detection-3.svg)
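As a rough illustration of the idea, the sketch below treats the pupil as the darkest region of a grayscale frame and takes the centroid of the dark pixels as the pupil center. It assumes NumPy is available, and the function name `pupil_center`, the threshold value and the synthetic frame are all made up for the example. Real detectors have to cope with eyelashes, reflections and partially covered pupils, which this sketch ignores.

```python
import numpy as np


def pupil_center(frame, threshold=50):
    """Estimate the pupil center as the centroid of the dark pixels.

    frame: 2D array of grayscale intensities (0 = black, 255 = white).
    threshold: hypothetical cutoff; pixels darker than this count as pupil.
    Returns (p_x, p_y) in pixel coordinates, or None if no pixel is dark enough.
    """
    # Row (y) and column (x) indices of every pixel below the threshold
    ys, xs = np.nonzero(frame < threshold)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())


# Synthetic frame: bright background with a dark disc standing in for the pupil
height, width = 120, 160
yy, xx = np.mgrid[0:height, 0:width]
frame = np.full((height, width), 200, dtype=np.uint8)
frame[(xx - 90) ** 2 + (yy - 50) ** 2 <= 15 ** 2] = 10

print(pupil_center(frame))
```

Because the synthetic disc is centered at column 90 and row 50, the estimated center should land there; on real video the threshold would need tuning to the camera and the illumination.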

After we have the pupil center coordinates $p_x$, $p_y$, we need to insert them into some type of function that gives us an approximation of the gaze coordinates:

$$
(g_x, g_y) \approx f(p_x, p_y)
$$

Here is an example function for mapping pupil coordinates to the gaze coordinates:

$$
g_x = A_x p_x + B_x p_y + C_x
$$
$$
g_y = A_y p_x + B_y p_y + C_y
$$

This is a simple linear function that will most likely not give good enough results; commercial eye trackers use more sophisticated models. The parameters $A$, $B$ and $C$ (one set for each screen axis) are found through calibration.
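To make the mapping concrete, here is a minimal sketch of the linear function in Python, where both $g_x$ and $g_y$ are computed from $p_x$ and $p_y$. The function name `map_pupil_to_gaze` and the parameter values are invented for illustration; in practice the parameters come from calibration.

```python
def map_pupil_to_gaze(p_x, p_y, params_x, params_y):
    """Apply the linear mapping from pupil to gaze coordinates.

    params_x and params_y are (A, B, C) tuples for the x and y screen axes.
    """
    a_x, b_x, c_x = params_x
    a_y, b_y, c_y = params_y
    g_x = a_x * p_x + b_x * p_y + c_x
    g_y = a_y * p_x + b_y * p_y + c_y
    return g_x, g_y


# Made-up parameters: 20 screen pixels per camera pixel, plus an offset
params_x = (20.0, 0.0, -1000.0)
params_y = (0.0, 20.0, -500.0)

gaze = map_pupil_to_gaze(98.0, 52.0, params_x, params_y)
print(gaze)  # (960.0, 540.0)
```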

### Calibration

In eye tracker calibration, the user is asked to look at pre-determined points on the screen (typically 1 to 9 points, depending on the eye tracker). The next Figure shows 9 calibration points on the screen. The user is first asked, e.g., to look at the calibration point in the upper left corner. The pupil location and the location of this calibration point are stored. Then we move to the next calibration point and repeat, and so on.

The eye tracker thus records the pupil location while the user is looking at each point. Then an equation such as the linear one shown above is fit to the data: the parameters are chosen so that the distance between the gaze coordinates $(g_x, g_y)$ predicted from the recorded pupil locations and the known locations of the calibration points is minimized.

After the parameters A, B, C, ... have been found, the function maps the pupil coordinates ($p_x$, $p_y$) to screen coordinates, giving the coordinates for the gaze ($g_x$, $g_y$).

![ET_6_calibration.svg](attachment:ET_6_calibration.svg)
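The fitting step can be sketched as an ordinary least-squares problem. The example below assumes NumPy and uses `numpy.linalg.lstsq`; the helper names `fit_linear_calibration` and `predict_gaze`, the 1920x1080 screen, the 3x3 grid and the simulated pupil data are all assumptions made for the illustration, not the method of any particular eye tracker.

```python
import numpy as np


def fit_linear_calibration(pupil_points, target_points):
    """Least-squares fit of g = A*p_x + B*p_y + C for each screen axis.

    pupil_points: (N, 2) pupil centers recorded during calibration.
    target_points: (N, 2) screen coordinates of the calibration points.
    Returns a (3, 2) array: column 0 is (A_x, B_x, C_x), column 1 is (A_y, B_y, C_y).
    """
    pupil_points = np.asarray(pupil_points, dtype=float)
    target_points = np.asarray(target_points, dtype=float)
    # Design matrix with one row [p_x, p_y, 1] per calibration sample
    design = np.column_stack([pupil_points, np.ones(len(pupil_points))])
    params, *_ = np.linalg.lstsq(design, target_points, rcond=None)
    return params


def predict_gaze(pupil_points, params):
    """Map pupil centers to gaze coordinates with the fitted parameters."""
    pupil_points = np.atleast_2d(np.asarray(pupil_points, dtype=float))
    design = np.column_stack([pupil_points, np.ones(len(pupil_points))])
    return design @ params


# Nine calibration points on a hypothetical 1920x1080 screen
targets = np.array(
    [(x, y) for y in (100, 540, 980) for x in (100, 960, 1820)], dtype=float
)

# Simulated pupil centers: a made-up linear relation plus a little detection noise
rng = np.random.default_rng(0)
pupils = np.column_stack([targets[:, 0] / 20 + 50, targets[:, 1] / 20 + 25])
pupils += rng.normal(scale=0.05, size=pupils.shape)

params = fit_linear_calibration(pupils, targets)
errors = np.linalg.norm(predict_gaze(pupils, params) - targets, axis=1)
print(params.round(2))
print("mean calibration error (pixels):", errors.mean())
```

The mean distance between the predicted gaze and the calibration points gives a quick check of how well the model fits. With real data the error would also reflect how poorly a purely linear model describes the eye.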

Unfortunately, simply detecting the pupil movements and mapping them to gaze coordinates is not enough to reliably track the gaze.

### Handling head movements

Typically, we want the user to be able to move their head naturally while they are completing an experimental task, such as navigating a web page. But from the point of view of the eye tracker, head movements also cause the pupil center to move, even if the user is still looking at the same location on the screen. To address this, we need to somehow separate the head movements from the movements of the eye.

This can be accomplished if we have a reference marker that moves when the head moves, but not when the gaze moves. One such reference marker is the glint, the reflection of the external lights from the surface of the eye (shown in blue below). These reflections stay nearly fixed relative to the head: they move when the head moves, but barely move when the eye rotates. Therefore, we can compare the movements of the pupil with the movements of the glint. If the glint and the pupil center both move in the same direction by a similar amount, the movement was due to the head. But if the pupil moved more than the glint, then the gaze coordinates must have changed.

![ET_5_glint-3.svg](attachment:ET_5_glint-3.svg)
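One simple way to use the glint, assumed here for illustration, is to work with the pupil position relative to the glint instead of the raw pupil position. The sketch below uses a hypothetical helper `pupil_glint_vector` and made-up pixel coordinates, and it idealizes the situation: in practice the compensation for head movement is only approximate.

```python
def pupil_glint_vector(pupil, glint):
    """Pupil center relative to the glint, both given as (x, y) image pixels."""
    return pupil[0] - glint[0], pupil[1] - glint[1]


# Made-up detections from three video frames
baseline = pupil_glint_vector((98.0, 52.0), (101.0, 55.0))

# Head movement: pupil and glint shift together by (+6, -2) pixels
head_shift = pupil_glint_vector((104.0, 50.0), (107.0, 53.0))

# Gaze movement: the pupil moves while the glint stays where it was
gaze_shift = pupil_glint_vector((103.0, 52.0), (101.0, 55.0))

print(baseline, head_shift, gaze_shift)
```

In this idealized example the vector is unchanged by the head movement and changes only when the gaze moves, so it could be used in place of ($p_x$, $p_y$) as the input of the mapping function.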

### Other considerations

Other issues that may affect eye tracker performance:

* Eye shape and size
* Illumination, reflections
* Eye glasses or contact lenses
* Device slippage for mobile eye trackers (worn by the user)

Most of these are related to pupil detection. If the eye lid covers most of the eye, or the user has to wear eye glasses that cause reflections, it becomes harder for the computer vision algorithms to detect the pupil in the eye tracker video.