The greatest sailboat race of all time seen through statistical graphics
I don't know if you've been following the America's Cup. It's the oldest sailing competition and, by some accounts, the oldest international sporting event, bar none. This year, this time-honored contest has been thrust into the modern age with the adoption of foiling winged catamarans that skim the water of San Francisco Bay at 90 km/h. Not only that, the competition has also entered the Big Data age, with 30,000 data points per second generated by on-board sensors, not to mention the multiple video feeds, the enhanced reality visuals and more.
The prospect of having some of that data made available to the general public was mouth-watering. It turns out that what is shared for one race is a paltry 14,000 records, and all the columns corresponding to on-board instruments contain only zeros; America's Cup data engineers have confirmed that the omission is required by the rules. But what's there already tells the story of a race in quite some depth.
Here I focus on Race 10, a cliffhanger of a race, hailed by many as one of the greatest sailing races of all time. I wanted to build some high-density graphics that show the crucial events of the race. Some of the answers offered by the graphics would need to be confirmed by more rigorous statistical methods, but we will stop short of that in this article. The first graphic is based on the course followed by the boats, and this is what happens when we just plot their longitude and latitude, captured five times per second:
Since the race goes back and forth three times between two gates, the overlap of the yachts' trajectories makes this first visualization hard to read, so I decided to mirror the longitude every time the sailboats reach a mark (sailing speak for turnaround point). Imagine the race course as a folded piece of paper and the visualization as its unfolded version, with the creases in the north-south direction at the marks. This being a race, speed matters, so I decided to use color to represent it. Since in sailing "fast" is only relative to the wind, I use the ratio of boat speed to wind speed, a measure I learned from America's Cup commentators and accomplished sailors Gary Jobson and Ken Read. In Race 10 this ranged from 0.5 on some not-so-good tacks (upwind turns), when the boat is briefly traveling against the wind, to 2.7. Symbols are used to identify the boats. A nice side effect is that the symbols show the position of the boats at regular time intervals, suggesting which one is ahead, particularly near a cross.
In sailing, speed is not everything: equally important is the angle with respect to the wind, since sailboats can't go straight upwind and can go straight downwind only by paying a massive speed penalty, and the course is roughly aligned with the wind. The combination of speed and angle is called velocity made good, or VMG. As with raw speed, what counts as a good VMG depends on the strength of the wind, so it makes sense to take the ratio of VMG to wind speed, which I call relative VMG or RVMG. In the following graphic, this is expressed as the thickness of the line.
To summarize, the next graphic shows the trajectory of the two boats, with the twist that the race course is replicated three times and mirrored as needed to avoid overlaps; color is speed and thickness is RVMG; symbols identify the boats and their position at regular time intervals.
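The original analysis was done in R and its code is not shown in this article, so here is only a rough Python sketch of the two transformations just described. It assumes each record carries a boat speed, a course, a wind speed, a wind direction and a leg number. All function and parameter names are made up for illustration, and the sign convention (RVMG positive upwind, negative downwind) is an assumption.

```python
import math

def wind_angle(course_deg, wind_from_deg):
    """Angle between the boat's course and the direction the wind comes from,
    folded into [0, 180]: 0 is straight upwind, 180 is straight downwind."""
    diff = (course_deg - wind_from_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff

def rvmg(boat_speed, course_deg, wind_speed, wind_from_deg):
    """Relative VMG: the component of boat speed along the wind axis,
    divided by wind speed (assumed positive upwind, negative downwind)."""
    angle = math.radians(wind_angle(course_deg, wind_from_deg))
    return boat_speed * math.cos(angle) / wind_speed

def unfold_longitude(lons, legs, crease_lons):
    """Mirror longitudes at each mark so that successive legs do not overlap.

    lons[i] was recorded on leg legs[i] (0-based); crease_lons[k] is the
    longitude of the mark that ends leg k (hypothetical inputs).
    Each leg gets an affine map lon -> sign * lon + offset, obtained by
    composing one reflection per mark already rounded."""
    sign, offset = 1.0, 0.0
    transforms = [(sign, offset)]
    for crease in crease_lons:
        # Reflect about the crease as it appears in the unfolded picture.
        offset = 2.0 * sign * crease + offset
        sign = -sign
        transforms.append((sign, offset))
    return [transforms[leg][0] * lon + transforms[leg][1]
            for lon, leg in zip(lons, legs)]
```

For instance, a boat that sails from longitude 0 to a mark at 1, back to a mark at 0 and out again would be drawn moving steadily in one direction, which is the unfolded piece of paper of the analogy.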
If you watched this fantastic race, you can recognize all of its decisive moments in this graphic. The race starts with a very fast reach, with team New Zealand (NZL) pushing team USA (USA) wide at the first mark rounding. USA has a difficult jibe (a turn going downwind). The two boats go in sync down to mark 2, with USA following a closer angle for one tack, I am not sure why. At the mark, USA goes for a complicated maneuver to obtain a split, but the maneuver costs them dearly in boat speed. A split is always preferred by the chaser because these boats are so fast that they always leave a wake of disturbed air behind them, irrespective of the wind.
In the upwind leg, the boats seem happy to crisscross paths, as the leaders don't make a defensive tack over their opponents. The lead changes hands four times. Peaks in speed at the crossings show the boat on port tack taking evasive action to avoid a collision, while the one on starboard tack, with the right of way, tries to make it hard on the other boat. Right before mark 3, the two boats part ways: USA ducks deep behind NZL, whereas NZL slows down heading upwind and goes into the mark rounding with a zigzag.
USA's speed seems to suffer after the last tack, and they go into the downwind leg with a slight deficit, but with the split. At the first cross, a decision looms between ducking and losing ground, or jibing and losing the split. As USA tactician Ainslie later explained, neither looks good, but they duck. NZL is clear ahead at their next crossing and, barring major errors, the race is over. USA fails to keep the pressure up, though, with a poor last mark rounding.
Now I would like to transition to a different, more abstract visualization, but before doing that I need to show a version of the previous graphic with color representing time.
This isn't so interesting per se, but you need to keep an eye on it to read the next graphic, where time is represented by the same color scale. The next graphic focuses on speed and direction with respect to the wind, what sailors call point of sail. In polar coordinates, imagine the wind coming from above and the boat with its stern (back) at the center and its bow (front) pointing out. The labels are the traditional names for the different points of sail: in irons (against the wind), close hauled (almost against the wind) and so forth. The distance from the center is boat speed relative to wind speed. Color represents time, and by going back and forth between this and the previous graphic you can associate the different colors with the various phases of the race. Each point represents a speed and direction reading, taken five times per second, and each boat has a separate panel.
As you can see, the points are not randomly scattered. Boats tend to stay on a course that gives them the best VMG most of the time. The biggest exceptions are the blue and purple clusters, the beginning and final stretches of the race, which are sailed at almost 90 degrees to the wind, so VMG doesn't matter there, only speed. So the six main clusters are port and starboard tack, upwind and downwind, plus the starting and final reaches. Between these we see connecting lines: roughly horizontally we have tacks (upper half) and jibes (lower half); vertically we have mark roundings. There are a few lines that don't fit any of the above: the acceleration from the start in red and a few "tactical" situations, such as NZL zigzagging before mark 3 (in blue-green) and USA ducking deep behind NZL (in green), also before the same mark. Right-click on the graphic to see it at full size.
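To make the geometry of this plot concrete, here is a small Python sketch (the article's own code is in R and not shown) of how a heading and a relative speed could be turned into a position on such a polar chart, with the wind coming from the top. The function names are hypothetical, and the angle thresholds used to name the points of sail are illustrative assumptions only, since sailors do not agree on exact boundaries.

```python
import math

def signed_wind_angle(heading_deg, wind_from_deg):
    """Heading relative to the wind, in (-180, 180].
    0 means pointing straight into the wind."""
    diff = (heading_deg - wind_from_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff

def tack(angle_deg):
    """A negative angle means the bow is to the left of the wind, so the wind
    comes over the right-hand side: starboard tack. Exactly 0 or 180 is
    ambiguous and arbitrarily reported as port here."""
    return "starboard" if angle_deg < 0 else "port"

def polar_to_xy(rel_speed, angle_deg):
    """Position on the chart: upwind is up, distance from center is
    boat speed divided by wind speed."""
    a = math.radians(angle_deg)
    return rel_speed * math.sin(a), rel_speed * math.cos(a)

# Upper bounds in degrees off the wind; illustrative values, not a standard.
POINTS_OF_SAIL = [
    (30.0, "in irons"),
    (50.0, "close hauled"),
    (80.0, "close reach"),
    (100.0, "beam reach"),
    (150.0, "broad reach"),
    (180.0, "running"),
]

def point_of_sail(angle_deg):
    """Name the point of sail for a signed wind angle."""
    off_wind = abs(angle_deg)
    for upper, name in POINTS_OF_SAIL:
        if off_wind <= upper:
            return name
    return "running"
```

With these conventions, a boat heading 45 degrees to the left of the wind lands in the upper-left part of the chart and is labeled close hauled on starboard tack.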
By promoting speed from a color scale to a more perceptually precise spatial scale, we can gain new insights, like the remarkable speed difference between the two teams rounding mark 2 (in yellow), how scattered the final run is for USA compared to the tight cluster of points for NZL (in purple) and how much more consistent jibe speeds are for NZL. In favor of USA, we may see slightly faster speeds through the tacks, but a different graphic later doesn't confirm this.
The problem with this visualization lies in the tight clusters of points, which correspond to straight-line travel. It's hard to see the density of points, since they overlap. So let's now drop the individual data points. In the next graph, the intensity of the red color is proportional to the time spent sailing at a certain point of sail and speed. It's a more static view of the race, with emphasis on the normal modes of sailing and less on the episodes and outliers.
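One simple way to estimate "time spent" at each combination of point of sail and speed is to count samples in bins, since every record stands for one fifth of a second. The Python sketch below does just that. It is a binned stand-in and not necessarily the smoothing used for the graph; the bin widths and names are arbitrary assumptions.

```python
import math
from collections import Counter

def time_in_bins(angles_deg, rel_speeds,
                 angle_step=10.0, speed_step=0.25, sample_period=0.2):
    """Seconds spent in each (wind angle, relative speed) bin.

    Returns a dict keyed by the lower edges of each bin.
    sample_period is 0.2 s because readings arrive five times per second."""
    counts = Counter()
    for angle, speed in zip(angles_deg, rel_speeds):
        key = (math.floor(angle / angle_step), math.floor(speed / speed_step))
        counts[key] += 1
    return {
        (i * angle_step, j * speed_step): n * sample_period
        for (i, j), n in counts.items()
    }
```

Mapping each bin's value to color intensity gives a picture where long straight-line stretches show up as dark spots and brief maneuvers almost vanish.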
We can see that, upwind, USA seems to travel a bit more close hauled whereas NZL is a little faster. It is impossible to tell by eye who achieved the best compromise. On the downwind side, particularly on starboard, NZL seems to travel as if on tracks, with a very tight cluster of points, probably due to longer starboard tacks. Now, for a final set of graphics, we focus on speed and point of sail separately. But since speed without direction does not a race win, let's switch to RVMG. And to avoid mixing apples and oranges, let's try to analyze straight-line speed separately from speed in the turns. I created a "turn test" based on moving averages of the boat heading over 10 seconds. The next graphic confirms that the test and intuition mostly agree on what a turn is.
With that available, we can now look at the frequency of different points of sail. A density plot in polar coordinates may look unfamiliar, but the meaning is pretty simple: the farther a line is from the center, the longer a boat has spent traveling at that point of sail. The first plot is for boats going in a straight line.
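The exact rule behind the turn test is not spelled out here, so the following Python sketch is only one plausible reading: flag a sample as part of a turn when the average heading over the previous 10 seconds differs from the average over the next 10 seconds by more than a threshold. The 20 degree threshold and all names are assumptions. Headings are averaged as unit vectors so that 350 and 10 degrees average to north, not south.

```python
import math

def circular_mean(headings_deg):
    """Mean direction of a list of headings, in [0, 360]."""
    s = sum(math.sin(math.radians(h)) for h in headings_deg)
    c = sum(math.cos(math.radians(h)) for h in headings_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def angle_between(a_deg, b_deg):
    """Smallest angle between two directions, in [0, 180]."""
    d = abs(a_deg - b_deg) % 360.0
    return 360.0 - d if d > 180.0 else d

def is_turning(headings_deg, window=50, threshold_deg=20.0):
    """One boolean per sample; window=50 samples is 10 s at 5 readings/s.

    A sample is flagged when the trailing and leading moving averages
    of the heading disagree by more than threshold_deg (assumed value)."""
    flags = []
    for i in range(len(headings_deg)):
        before = headings_deg[max(0, i - window):i + 1]
        after = headings_deg[i:i + window]
        change = angle_between(circular_mean(before), circular_mean(after))
        flags.append(change > threshold_deg)
    return flags
```

On a synthetic track that holds one heading, tacks, and holds the new heading, only the samples within about 10 seconds of the tack come out flagged, which is the behavior one would want before splitting the data into straight lines and turns.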
What we see is that indeed USA sails a bit more close hauled both upwind and downwind. But does it pay off in terms of RVMG? In the next plot, negative RVMG is just RVMG downwind.
It seems that USA enjoys a slight advantage upwind, but the opposite is true downwind. This is remarkable, as the beginning of the contest was characterized by NZL's upwind superiority. If only USA's engineers had made their mid-contest changes earlier, USA would have had a much better shot at the trophy. Now let's take a look at the turns.
Well, it looks like the tacks are pretty even, contrary to the impression created by a previous graphic. Maybe just the worst NZL tack was worse, but overall they seem even with their opponent. But NZL seems to have a slight edge on the jibes, which makes sense because they were the first team to foil (fly over the water) and therefore to practice the foiling jibe, a new type of turn performed keeping both hulls out of the water.
I will readily admit this graphical analysis largely confirms what the experienced America's Cup TV commentators could infer just by watching the race, but they have a lifetime of sailing on their resumes. The advantage of this analysis is that we can repeat it race by race, objectively, and look at the differences. We can use the underlying models, such as density estimates, to quantify the differences between the boats and suggest where improvements are possible or could have the most impact. Of course the teams, which reportedly employ twice as many engineers as sailors, know all of this already and a lot more. But for the sailing-loving data geek, this is fun!
Materials and Methods
This article was composed in RStudio, using R Markdown, a mix of the R programming language and the Markdown markup language, integrated with the knitr package. The pretty graphics are made possible by the ggplot2 package. The source code is available on GitHub. The data is made available by the America's Cup organization.