challenge1entry/tonyHirst-f1tata-dataChallenge1-entry.txt

The current F1 timing screens provide a wealth of information to race engineers and fans about the current state of a session, and about historical elements of it, such as the number of pit stops completed by each driver, or the best overall laptime and sector times. For new fans, however, the timing screen may be just a sea of coloured numbers, and a historical view over a session may help them make sense of it.

The timing feed itself is based on a message stream in which individual messages separately address distinct cells in the timing screen, changing the value and colour of the cell contents. The semantics of the columns are fixed, so users know what the contents of each column refer to. (The “meaning” of a column may change, for example, when a car is in the pit as opposed to on track.) As well as the message contents, additional information can be gleaned from the time stamp of the message.
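As a minimal sketch of this cell-addressed model, the following Python keeps the screen as a mapping from (row, column) to (value, colour) and applies messages one at a time. The `CellMessage` fields, the column index used for the sector time, and the colour names are assumptions for illustration, not the actual feed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMessage:
    """One update addressed to a single timing-screen cell (hypothetical layout)."""
    timestamp: float  # seconds since session start (assumed unit)
    row: int
    col: int
    value: str
    colour: str


def apply_message(screen, msg):
    """Overwrite one cell; every other cell keeps its previous contents."""
    screen[(msg.row, msg.col)] = (msg.value, msg.colour)


screen = {}
messages = [
    CellMessage(12.001, 1, 2, "44", "white"),       # column 2: racing number
    CellMessage(12.003, 1, 6, "28.412", "purple"),  # column 6: assumed sector 1 column
    CellMessage(95.120, 1, 6, "28.630", "yellow"),  # a later update to the same cell
]
for msg in messages:
    apply_message(screen, msg)
```

After the three messages, the racing number cell still holds its first value, while the sector cell holds only the most recent value and colour.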

Timing data elements update individual cells, with cells retaining whatever value was previously set in them until they are next updated. If you think of the timing screen as a spreadsheet, you can create new screens from the same data. Using cell references into the timing screen sheet, we can populate the values of custom formatted cells in a new sheet.

Using the idea of progressive enhancement, we can add graphical elements on a per-cell basis, re-presenting the contents of a single cell in a graphical form. For example, we might display a sector time as a bar (though we would need to fix the vertical scale in advance), with a fill colour corresponding to the colour of the unenhanced cell. Left in place, a column of such bars representing times in a particular sector would not allow for a meaningful comparison, but we could transform the placement of cells so that a comparison is possible. For example, take the sector 1 times column and turn it into a row, placing a row immediately beneath it containing the values of column 2 (racing number) or column 3 (driver name).

The contents of other cells with fixed semantics might also be used to modify the visual presentation of a cell. For example, with sector times in practice and qualifying sessions, we could toggle the bar display with a display that shows the difference between the sector time and the best sector time, which is available in cell 21 of row.
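The column-to-row transformation and the difference-to-best display can be sketched as follows. This is illustrative only: `SECTOR1_COL` is a hypothetical column index, and the best time is computed here as the minimum of the column rather than read from a dedicated cell.

```python
NUMBER_COL = 2   # racing number, as described above
SECTOR1_COL = 6  # hypothetical index for the sector 1 time column


def column_as_row(screen, col, rows):
    """Read one column of the screen, top to bottom, as a list (a 'row')."""
    return [screen.get((r, col)) for r in rows]


def deltas_to_best(times):
    """Difference between each time and the best (smallest) time."""
    best = min(times)
    return [round(t - best, 3) for t in times]


# A toy screen: (row, col) -> cell value
screen = {
    (1, NUMBER_COL): "44", (1, SECTOR1_COL): "28.412",
    (2, NUMBER_COL): "5",  (2, SECTOR1_COL): "28.630",
    (3, NUMBER_COL): "3",  (3, SECTOR1_COL): "28.901",
}
rows = [1, 2, 3]
numbers = column_as_row(screen, NUMBER_COL, rows)
times = [float(v) for v in column_as_row(screen, SECTOR1_COL, rows)]
deltas = deltas_to_best(times)
```

The `numbers` list gives the label row to place beneath the bars, and `deltas` gives the values for the toggled display.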

Taking the interval column in the race screen, we can transform that to a row with cell widths proportional to the interval. If we align driver names or racing numbers with the left-hand edge of those cells, we can generate a visual representation of how bunched up or spaced out the cars are in time relative to race position. Bunched labels suggest that cars are competing. (A similar sort of display could be applied to the best laptimes in practice or qualifying sessions, for example, by calculating laptime differences between consecutive rows.)
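A rough sketch of the interval display: accumulating the intervals gives each car's left-hand edge, and a threshold flags bunched cars. The `scale` factor and the one-second `threshold` are arbitrary assumptions, and the interval values are made up.

```python
from itertools import accumulate


def gap_positions(intervals, scale=10.0):
    """Left-edge offsets (arbitrary display units) for each car, in race order.

    intervals[i] is the gap in seconds from car i to the car ahead;
    the leader's interval is 0.
    """
    return [round(total * scale, 1) for total in accumulate(intervals)]


def bunched_pairs(labels, intervals, threshold=1.0):
    """Pairs of consecutive cars separated by less than `threshold` seconds."""
    return [
        (ahead, behind)
        for ahead, behind, gap in zip(labels, labels[1:], intervals[1:])
        if gap < threshold
    ]


labels = ["44", "5", "3", "7"]      # racing numbers in race order
intervals = [0.0, 1.2, 0.4, 6.5]    # seconds to the car ahead
positions = gap_positions(intervals)
close = bunched_pairs(labels, intervals)
```

Drawing each label at its offset in `positions` gives the spaced-out row; `close` lists the pairs whose labels would appear bunched.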

If we add some additional state to the display, we can also make comparisons between consecutive cell updates. For example, if we know that the driver number hasn’t changed since a laptime was last updated in the same row, we can calculate the difference in consecutive laptimes for that driver, again, displaying them graphically.
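One way to hold that additional state is a small tracker keyed by screen row, which only reports a difference when the racing number in the row is unchanged. The class and method names here are hypothetical.

```python
class LapDeltaTracker:
    """Remember the last (driver number, laptime) seen in each screen row."""

    def __init__(self):
        self._last = {}  # row -> (driver_number, laptime)

    def update(self, row, driver_number, laptime):
        """Return the change from the previous laptime in this row.

        Returns None if there is no previous lap for the row, or if a
        different driver now occupies the row.
        """
        previous = self._last.get(row)
        self._last[row] = (driver_number, laptime)
        if previous is None or previous[0] != driver_number:
            return None
        return round(laptime - previous[1], 3)


tracker = LapDeltaTracker()
first = tracker.update(1, "44", 92.500)    # no earlier lap in this row
second = tracker.update(1, "44", 92.100)   # same driver: a delta is available
swapped = tracker.update(1, "5", 93.000)   # different driver now in row 1
```

A negative value means the driver improved on the previous lap.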

Things become more problematic as drivers change race position. It’s easy enough for us to look at a row on the timing screen, starting in column 2 with the driver’s racing number, and read off their latest laptime and sector time; our perception does the synthesis. However, if a computer program tries to read the row, it could make a mistake if it reads the timing screen halfway through a row update, because the row updates one cell at a time, and not necessarily in the same cell order each time. The timestamps on cell updates corresponding to a single row may not be identical either, and in theory could clash with updates corresponding to other rows.
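One possible way round the half-updated row problem, offered as an assumption rather than as how the feed is meant to be read, is to group cell updates per row into bursts: updates to the same row that arrive within a short time window of each other are treated as one row-level update. The message tuple layout and the 50 ms `window` are illustrative.

```python
def group_row_updates(messages, window=0.05):
    """Group (timestamp, row, col, value) tuples into per-row bursts.

    An update joins the open burst for its row if it arrives within
    `window` seconds of the previous update to that row; otherwise the
    open burst is closed and a new one is started.
    Returns a list of (row, start_timestamp, {col: value}) tuples.
    """
    open_bursts = {}  # row -> {"start": ts, "last": ts, "cells": {col: value}}
    finished = []
    for ts, row, col, value in sorted(messages):  # process in timestamp order
        burst = open_bursts.get(row)
        if burst is not None and ts - burst["last"] > window:
            finished.append((row, burst["start"], burst["cells"]))
            burst = None
        if burst is None:
            burst = {"start": ts, "last": ts, "cells": {}}
            open_bursts[row] = burst
        burst["last"] = ts
        burst["cells"][col] = value
    # Close whatever is still open at the end of the stream
    for row, burst in open_bursts.items():
        finished.append((row, burst["start"], burst["cells"]))
    return sorted(finished, key=lambda b: (b[1], b[0]))


messages = [
    (12.001, 1, 6, "28.412"),
    (12.002, 2, 6, "28.630"),  # another row's update interleaved
    (12.004, 1, 2, "44"),      # same burst as the first message
    (95.120, 1, 6, "28.300"),  # much later: a new burst for row 1
]
bursts = group_row_updates(messages)
```

Grouping by row first means interleaved updates to other rows do not split a burst. A suitable window would need to be chosen by inspecting real timing data.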

Some live timing applications publish data in a more structured format, such as at the level of a row, using semantically meaningful XML or JSON data structures, and that would make it much easier to generate logs of events at the row level. Even without that, it is possible to parse the timing data in order to produce datasets that correspond to many of the datasets published via the FIA F1 Media Centre timing sheets, as well as new data products. (I have started to produce proof of concept code to generate these data structures [https://github.com/psychemedia/tata-f1] but have not had time to complete it or use the data to produce example visualisations from the sample timing data.) So, for example, we can generate a list of laptimes by driver, or a list of purple laptimes and when they were recorded, or a list of pit events. We can also keep track of the weather.
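To illustrate the kind of derived dataset meant here (this is not the code in the repository linked above), suppose laptime updates have already been extracted as (timestamp, driver, laptime, colour) tuples. That tuple layout, the colour strings and the sample values are all assumptions.

```python
from collections import defaultdict


def laptimes_by_driver(lap_events):
    """Map each driver to their laptimes, in the order they were recorded."""
    laps = defaultdict(list)
    for _timestamp, driver, laptime, _colour in lap_events:
        laps[driver].append(laptime)
    return dict(laps)


def purple_laps(lap_events):
    """Laptimes shown in purple, with the time at which they were recorded."""
    return [
        (timestamp, driver, laptime)
        for timestamp, driver, laptime, colour in lap_events
        if colour == "purple"
    ]


lap_events = [
    (95.1, "44", 92.500, "purple"),
    (96.3, "5", 93.100, "white"),
    (187.2, "44", 92.100, "purple"),
    (189.9, "5", 92.900, "green"),
]
by_driver = laptimes_by_driver(lap_events)
purples = purple_laps(lap_events)
```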

Once we start to get information in a structured form, with row based events recorded in an appropriate database or data structure, we can generate a range of data views over the whole course of the race. Note that, using “screen memory”, we do not necessarily need to maintain a log of events: we can use the row based events to update and extend a currently existing graphic by placing additional marks on it. Events we might capture are:

-	new sector time set by a driver, with approximate timestamp (cell updates for a row may be timestamped across a period of milliseconds);
-	new laptime set by a driver, with approximate timestamp;
-	record of a pit event.
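The three event types, and the “screen memory” idea of extending an existing graphic rather than replaying a log, might be sketched like this. All field names are hypothetical, and the lap number in particular is assumed to be available to the parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorTimeEvent:
    timestamp: float  # approximate: cell updates for a row span some milliseconds
    driver: str
    sector: int
    time: float


@dataclass(frozen=True)
class LapTimeEvent:
    timestamp: float
    driver: str
    lap: int          # assumed to be known to the parser
    time: float


@dataclass(frozen=True)
class PitEvent:
    timestamp: float
    driver: str


def add_mark(chart, event):
    """Extend a laptime chart (driver -> list of (lap, time) marks) in place.

    Only laptime events add a mark; other event types leave the chart as it is.
    """
    if isinstance(event, LapTimeEvent):
        chart.setdefault(event.driver, []).append((event.lap, event.time))
    return chart


chart = {}
for event in [
    LapTimeEvent(95.1, "44", 1, 92.5),
    PitEvent(120.0, "5"),
    LapTimeEvent(187.2, "44", 2, 92.1),
]:
    add_mark(chart, event)
```

Each incoming event touches the chart once and can then be discarded, so no event log is required for this particular view.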

Supporting documents 1-3 provide indicative data views that can be generated from data derived solely from the timing data screens with explanations of their use to fans:

-	tonyHirst-f1tata-dataChallenge1-document1: views over practice sessions;
-	tonyHirst-f1tata-dataChallenge1-document2: views over qualifying sessions;
-	tonyHirst-f1tata-dataChallenge1-document3: views over the race session.


tony.hirst@gmail.com