# Case Study: MTA Arrival Clock

In this case study, we are going to look at the common MTA arrival clock in subway stations across New York and see if we can redesign it based on human perception principles. Here is an example of the MTA arrival clock.

<div><img src="https://user-images.githubusercontent.com/3606672/246465387-b59fbe67-ee52-4e1c-8599-77744120cfc8.svg" width=400/></div>

This sign contains two rows, and each row provides four types of information:
1. The number on the left indicates the order of arrival.
2. The subway line symbol.
3. The train's destination.
4. The amount of time before arrival in the current station.

Due to limited space, it can only display information for two upcoming trains at the same time. Thus, when more than two trains are approaching, it periodically switches between the corresponding entries.

Another problem caused by the limited space is that the text is too small to read when a passenger stands far from the sign. Since subway platforms can be very long, this problem is quite common. We can simulate the experience by scaling the image above down:

<div><img src="https://user-images.githubusercontent.com/3606672/246465387-b59fbe67-ee52-4e1c-8599-77744120cfc8.svg" width=100/></div>

Depending on your display's resolution, the text on this sign can be nearly illegible. The digits 3 and 8 can look very similar from afar, as can the digits 5 and 6. The only thing we can still clearly recognize is the color of the subway line symbols.

In this case study, we will try to address the following questions:
* Can we increase the information density to show information for more than two upcoming trains at the same time?
* Can we visualize the time to arrival so it is more legible from afar?

## Observations and assumptions

An important observation is that the only thing we can see from afar is the color of the subway line's symbol. We should leverage this in our visualization.

In terms of space allocation, the train's destination occupies the most space in the current design. However, we argue that the destination is not the most important information for passengers. Most often, the reason we look at this sign is to learn the arrival time. Thus, we should prioritize visualizing the arrival time over the destination.

## Our design

Instead of showing the time to arrival as text, we are going to visualize it using the position channel. Recall that position as a visual channel is more accurate than other channels such as length, angle, and color. The idea is to visualize arrival time as a 1D bubble chart and add the other information as annotations. Here is the design:

<div><img src="https://user-images.githubusercontent.com/3606672/246465520-5b7cad68-4722-4192-ad69-d0916c8f9cea.svg" width=400/></div>

The design contains a simple axis that represents time. We use gray dots at 1 min intervals to represent ticks on this axis. A simple arrow on the left indicates the direction of decreasing time. Each train is represented by its corresponding symbol on the axis. The size of the symbol is the same as in the original design for readability. Its location indicates the time to arrival. The time to arrival and the destination are also added as text annotations.

The advantage of this design is that we can show information for more than two trains at the same time without switching. The arrival order is naturally implied by the visualization. Lastly, when viewing from afar as shown below, one can still estimate the time to arrival from the position of the symbols.

<div><img src="https://user-images.githubusercontent.com/3606672/246465520-5b7cad68-4722-4192-ad69-d0916c8f9cea.svg" width=100/></div>

## The data
Let's first put the data into data frames.

In [1]:
import altair as alt
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "line": ["F", "F", "E", "M"],
    "arrival_time": [3, 8, 6, 0], 
    "destination": ["Jamaica-179 St", "Kings Hwy", "Jamaica-179 St", "Forest Hills-71 Av"]})
line_color = pd.DataFrame({
    "line": ["E", "F", "M"],
    "color": ["#0039a6", "#ff6319", "#ff6319"]
})

In [2]:
df

|   | line | arrival_time | destination |
|---|------|--------------|-------------|
| 0 | F | 3 | Jamaica-179 St |
| 1 | F | 8 | Kings Hwy |
| 2 | E | 6 | Jamaica-179 St |
| 3 | M | 0 | Forest Hills-71 Av |


In [3]:
line_color

|   | line | color |
|---|------|-------|
| 0 | E | #0039a6 |
| 1 | F | #ff6319 |
| 2 | M | #ff6319 |


The `df` data frame contains the line symbol, arrival time, and destination of each train. We do not store the order of arrival because it can be inferred from the arrival time. The `line_color` data frame provides a simple lookup table of the color value for each line. Note that we hard-coded the data for simplicity. In practice, arrival times can be retrieved from the [realtime data feed](https://new.mta.info/developers).
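
As a quick check that the order of arrival really is recoverable, the sketch below sorts the same hard-coded data by `arrival_time`. The `ordered` frame and its `order` column are names introduced here for illustration only; they are not used elsewhere in this case study.

```python
import pandas as pd

df = pd.DataFrame({
    "line": ["F", "F", "E", "M"],
    "arrival_time": [3, 8, 6, 0],
    "destination": ["Jamaica-179 St", "Kings Hwy", "Jamaica-179 St", "Forest Hills-71 Av"]})

# Sort by time to arrival; a stable sort keeps the input order for ties.
ordered = df.sort_values("arrival_time", kind="stable").reset_index(drop=True)

# Hypothetical helper column: 1 is the next train to arrive.
ordered["order"] = ordered.index + 1
print(ordered[["order", "line", "arrival_time"]])
```

The M train (0 min) comes first, followed by F (3 min), E (6 min), and the second F (8 min).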

## Background layer

Next, let's create a background layer.

In [4]:
bg_df = pd.DataFrame({
    "x": np.arange(11)
})
x_scale = alt.Scale(domain=[-1, 11])
background = alt.Chart(bg_df).mark_circle(size=100, color="gray").encode(
    x=alt.X("x:Q", axis=None, scale=x_scale)
).properties(width=866, height=200)
background

For the background layer, we display 11 gray dots to represent ticks from 0 min to 10 min. The background layer also specifies the x-axis scale and the dimensions of the plot. The other layers must use the same x-axis scale, but they do not need to specify the dimensions again.
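
To make the scale concrete, here is a minimal sketch of the mapping from minutes to horizontal pixels. It assumes the scale is linear and maps the domain `[-1, 11]` exactly onto `[0, 866]`, with no rounding of the domain; `x_to_pixel` is a helper written for this illustration, not an Altair function.

```python
def x_to_pixel(minutes, domain=(-1, 11), width=866):
    """Linear scale: map a value in `domain` to a pixel offset in [0, width]."""
    lo, hi = domain
    return (minutes - lo) / (hi - lo) * width

# Pixel positions of the 11 tick dots (0 min to 10 min).
tick_pixels = [x_to_pixel(m) for m in range(11)]
pixels_per_minute = tick_pixels[1] - tick_pixels[0]

print(round(tick_pixels[0], 1), round(tick_pixels[-1], 1), round(pixels_per_minute, 1))
# 72.2 793.8 72.2
```

Under these assumptions, one minute corresponds to about 72 pixels, and the extra unit of domain on each side leaves a margin of the same width at both ends of the axis.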

## Visualizing arrival time

Next, let us visualize the arrival times.

In [5]:
arrival_time = alt.Chart(df).mark_circle(size=5000, opacity=1).encode(
    x=alt.X("arrival_time:Q", axis=None, scale=x_scale),
    color=alt.Color("color:N", scale=None)
).transform_lookup(lookup="line", from_=alt.LookupData(line_color, "line", ["color"]))
background + arrival_time

The arrival time can be simply represented as a dot plot. In this case, we explicitly set the size of the circles and look up the corresponding color using the `line_color` data frame.
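
The lookup step acts like a left join on the `line` column. The sketch below reproduces it in pandas so the result can be inspected as a table; `joined` is a name introduced here, and this is an equivalent check rather than what Altair does internally.

```python
import pandas as pd

df = pd.DataFrame({
    "line": ["F", "F", "E", "M"],
    "arrival_time": [3, 8, 6, 0],
    "destination": ["Jamaica-179 St", "Kings Hwy", "Jamaica-179 St", "Forest Hills-71 Av"]})
line_color = pd.DataFrame({
    "line": ["E", "F", "M"],
    "color": ["#0039a6", "#ff6319", "#ff6319"]
})

# A left join keeps every train row and attaches the color of its line.
joined = df.merge(line_color, on="line", how="left")
print(joined[["line", "arrival_time", "color"]])
```

Each train row gains a `color` value, which the chart then uses directly because the color encoding sets `scale=None`.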

## Visualizing line label

Next, we visualize line labels using a text layer.

In [6]:
# Bold is a font weight in Vega-Lite, so we set fontWeight rather than fontStyle.
line_label = alt.Chart(df).mark_text(color="white", fontSize=40, font="Helvetica", fontWeight="bold").encode(
    x=alt.X("arrival_time:Q", axis=None, scale=x_scale),
    text="line"
)
background + arrival_time + line_label

We explicitly set the font, font size, and bold weight to match those the MTA uses.

## Visualizing direction

The visualization so far contains all the main components. However, it is unclear which direction of the axis represents shorter time to arrival and which direction represents longer time to arrival. To disambiguate this, let us add a symbol to indicate the direction of arrival.

In [7]:
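# A single-row frame draws the arrow once instead of once per train.
# Bold is a font weight in Vega-Lite, so we set fontWeight rather than fontStyle.
dir_label = alt.Chart(df.head(1)).mark_text(color="gray", fontSize=50, font="Helvetica", fontWeight="bold")\
    .encode(x=alt.value(20), y=alt.value(100), text=alt.value("❮"))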
background + dir_label + arrival_time + line_label

To show the direction, we simply place the Unicode character "❮" at a fixed location with a matching color and size. This is again done using text marks. At this point, the core of the visualization is complete. We can see that the M train is arriving because it sits on the leftmost dot (0 min arrival time). The next F train will arrive in 3 min. This is nice. However, it assumes the viewer already knows how to read the chart. We can improve the readability of the chart by adding annotations.

## Annotations

The first annotation is the time to arrival. We add it as text below the subway line symbol so that people who see the chart for the first time can still read off the exact time to arrival. This is again done using simple text marks.

In [8]:
time_label = alt.Chart(df).mark_text(color="black", fontSize=30, font="Helvetica").encode(
    x=alt.X("arrival_time:Q", axis=None, scale=x_scale),
    y=alt.value(160),
    text=alt.condition(alt.datum.arrival_time == 0, alt.value("arriving"), "time:N"))\
.transform_calculate(time="datum.arrival_time + ' min'")
background + dir_label + arrival_time + line_label + time_label
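
The text encoding in this cell combines a calculated field with a condition. The same rule can be written in plain Python, which makes it easy to check; `time_label_text` is a hypothetical helper used only for this illustration.

```python
def time_label_text(arrival_time):
    """Mirror the chart's text rule: 'arriving' at 0 min, otherwise '<n> min'."""
    return "arriving" if arrival_time == 0 else f"{arrival_time} min"

print([time_label_text(t) for t in [3, 8, 6, 0]])
# ['3 min', '8 min', '6 min', 'arriving']
```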

Next, we add the destination as another annotation. We use a smaller font size because, under our assumption, the destination is less important than the time to arrival. As a consequence, the destination is only legible from a close distance.

In [9]:
destination_label = alt.Chart(df)\
.mark_text(color="black", fontSize=16, font="Helvetica").encode(
    x=alt.X("arrival_time:Q", axis=None, scale=x_scale),
    y=alt.value(40),
    text="destination:N"
)
background + dir_label + arrival_time + line_label + time_label + destination_label

This completes our redesign of the MTA arrival clock.

## Evaluation

Let us compare our design with the original design side by side.

<div>Original: <img src="https://user-images.githubusercontent.com/3606672/246465387-b59fbe67-ee52-4e1c-8599-77744120cfc8.svg" width=400/></div>
<div>Ours: <img src="https://user-images.githubusercontent.com/3606672/246465520-5b7cad68-4722-4192-ad69-d0916c8f9cea.svg" width=400/></div>

The first thing to note is that our design does not drop any information that is present in the original design. All four types of information can still be extracted from our design. In particular, the order of arrival is now shown visually instead of as text.

In addition, our design is able to display information for more than two trains at the same time. This is an advantage of encoding arrival time as position. We also changed the relative size of the different types of information. The destination now occupies less space, based on our assumption that it is less important.

Lastly, when viewing from afar, we are still able to see the line symbols and their relative positions even though all the text is illegible. This is enough to convey the rough arrival time from a distance.

<div><img src="https://user-images.githubusercontent.com/3606672/246465387-b59fbe67-ee52-4e1c-8599-77744120cfc8.svg" width=100/></div>
<div><img src="https://user-images.githubusercontent.com/3606672/246465520-5b7cad68-4722-4192-ad69-d0916c8f9cea.svg" width=100/></div>

Of course, our design is not perfect either. For example, we currently do not display any information for trains arriving in more than 10 min. This information may be important for passengers deciding whether to keep waiting or to find an alternative means of transportation.

Another, perhaps more severe, drawback is that our design does not handle the case where multiple trains have the same arrival time: their subway line symbols share the same position and overlap. This situation is quite common and requires additional care in our design. We will leave it as the assignment for this module.
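
To see when symbols collide, we can compare the pixel gap between two trains with the symbol diameter. The sketch below only detects collisions and does not resolve them. It assumes that the `size` of a circle mark is its area in square pixels and that the x scale is linear over the domain `[-1, 11]` across 866 pixels. `overlapping_pairs` is a hypothetical helper, and the second list of arrival times is made up for illustration.

```python
import math
from itertools import combinations

WIDTH, DOMAIN = 866, (-1, 11)   # chart width and x-scale domain used above
SYMBOL_AREA = 5000              # `size` passed to mark_circle, in square pixels

diameter = 2 * math.sqrt(SYMBOL_AREA / math.pi)
pixels_per_minute = WIDTH / (DOMAIN[1] - DOMAIN[0])

def overlapping_pairs(arrival_times):
    """Return index pairs whose symbols would overlap on the axis."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(arrival_times), 2)
        if abs(a - b) * pixels_per_minute < diameter
    ]

print(round(diameter, 1), round(pixels_per_minute, 1))  # 79.8 72.2
print(overlapping_pairs([3, 8, 6, 0]))   # data from this case study: []
print(overlapping_pairs([3, 3, 4, 9]))   # made-up data: [(0, 1), (0, 2), (1, 2)]
```

Under these assumptions the symbol is slightly wider than one minute on the axis, so trains arriving one minute apart would also overlap a little, not only trains with identical arrival times.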

## Conclusion

In this case study, we created a redesign of the MTA arrival clock. We identified several drawbacks of the original design and leveraged our knowledge of human perception to address those drawbacks in the redesign.