# SailGP Data Analyst Challenge

The aim is to test your Python abilities. The challenge is to analyze the data provided and answer the questions below. You can use any library you want to help with the analysis. The data is from the SailGP event in Auckland 2025 and is in the 'DATA' folder.

There are various sources available.

The Boat Logs are in the 'Boat_Logs' folder. The data is in csv format and the columns are described in the 'Boat_Logs/Boat_Logs_Columns.csv' file.
The 'Course_Marks_2025-01-19.csv' file contains the mark positions and wind reading on the course for the whole day.

The Race_XML folder contains the XML files for each race, which describe the course boundaries, the theoretical position of the marks and the target racecourse axis.

The 2025-01-19_man_summary.csv file contains the metrics from the manoeuvre summary for the day.
The 2025-01-19_straight_lines.csv file contains the metrics from the straight line summary for the day.

Both are derived from the boat logs.

The 2502 m8_APW_HSB2_HSRW.kph.csv file contains the polar data for the boats in that config.

## Requirements
- Choose at least 3 questions from the list below to answer.
- Python 3.8 or higher
- Notebook should be able to run without any errors from start to finish.
- Specify the libraries (imports) used in the notebook.
- Any comments to make the notebook self-explanatory and easy to follow would be appreciated.
- If you can't get to the end of a question, we would appreciate the code you have written so far and an explanation of what you were trying to do.

## Further information:
- We usually use bokeh for visualizations, so any showcase of bokeh would be appreciated.

## Submitting the results.
It would be great if you could provide a Jupyter notebook with the code and the results of the analysis. You can submit the results by sharing a link to a git repository.



Free section to initialize the notebook with the necessary imports and functions that will be used in the notebook.

In [None]:
### Imports and re-used functions

import pandas as pd
import numpy as np
from scipy.stats import circmean
import matplotlib.pyplot as plt

# Read a boat's data
csv_path = r'C:\Users\matsa\OneDrive\Υπολογιστής\SailGP\SGP Data challenge VS\SGP_Data_Challenge\Data\Boat_logs\data_AUS.csv'

data = pd.read_csv(csv_path)
data["DATETIME"] = pd.to_datetime(data["DATETIME"])
data.set_index("DATETIME", inplace=True)


# Rolling circular mean for compass angles in degrees, so the 0/360 wrap-around is handled
def rolling_circmean(series, window):
    return series.rolling(window=window).apply(lambda x: circmean(x, high=360, low=0), raw=True)

## Question 1: Write a Python function that can take a compass direction (i.e. TWD or Heading) and calculate an accurate mean value across a downsampled frequency. E.g. if TWD is at 1Hz, give me a 10s average.

Compass data is circular (0 and 360 degrees are the same direction), so a plain arithmetic mean gives wrong results near north and a circular mean has to be used instead. Code in **2025 SGP (1) downsampling and 10s averaging TWD.ipynb**
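
A minimal sketch of the idea, not the code from the notebook referenced above: each direction is converted to a unit vector, the sine and cosine components are averaged per time bin, and the mean angle is recovered with `arctan2`. The function name `downsample_compass`, the `rule` parameter and the synthetic series are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def downsample_compass(series: pd.Series, rule: str = "10s") -> pd.Series:
    """Circular mean of a compass series (degrees) per time bin.

    Assumes `series` has a DatetimeIndex and values in degrees.
    """
    rad = np.deg2rad(series.astype(float))
    # Average the unit-vector components separately in each bin.
    sin_mean = np.sin(rad).resample(rule).mean()
    cos_mean = np.cos(rad).resample(rule).mean()
    # Recover the angle and wrap it into the 0-360 range.
    return np.rad2deg(np.arctan2(sin_mean, cos_mean)) % 360.0


# Synthetic 1 Hz example straddling north: values alternate 350 and 10 degrees.
idx = pd.date_range("2025-01-19 00:00:00", periods=20, freq="s")
twd = pd.Series([350.0, 10.0] * 10, index=idx)
twd_10s = downsample_compass(twd, "10s")
print(twd_10s)
```

For this synthetic series an arithmetic mean would give 180 degrees, whereas the circular mean stays at north (printed as a value at or very close to 0 or 360, depending on floating-point rounding).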

## Question 2: Given a course XML and a timeseries of boat Lat/Lon values, calculate a VMC column for the same timeseries.


In [None]:
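# VMC: component of boat speed along the bearing to the next mark
vmc = bsp * np.cos(np.radians(angle))  # angle = boat course minus bearing to the mark, in degrees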
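
The formula above can be sketched end to end as follows. This is an illustration under assumptions, not a finished answer: the mark position is passed in directly (parsing it from the course XML is left out because the XML schema is not shown here), and the function names `initial_bearing_deg` and `vmc`, as well as the example coordinates, are hypothetical.

```python
import numpy as np


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees (0-360)."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(np.asarray(lon2) - np.asarray(lon1))
    x = np.sin(dlon) * np.cos(phi2)
    y = np.cos(phi1) * np.sin(phi2) - np.sin(phi1) * np.cos(phi2) * np.cos(dlon)
    return np.degrees(np.arctan2(x, y)) % 360.0


def vmc(speed, course_deg, lat, lon, mark_lat, mark_lon):
    """Speed component along the bearing from the boat to the mark."""
    bearing = initial_bearing_deg(lat, lon, mark_lat, mark_lon)
    return np.asarray(speed) * np.cos(np.radians(np.asarray(course_deg) - bearing))


# Example: mark due north of the boat, three different courses at 50 km/h.
speed = np.array([50.0, 50.0, 50.0])
course = np.array([0.0, 90.0, 180.0])
result = vmc(speed, course, lat=-36.84, lon=174.77, mark_lat=-36.83, mark_lon=174.77)
print(result)
```

Sailing straight at the mark gives the full speed as VMC, sailing at right angles gives roughly zero, and sailing away gives a negative value. Speed and course can come from the logged values or be derived from successive Lat/Lon fixes.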

## Question 3: Verify and comment on the boats' calibration. If possible, propose a post-calibrated set of wind numbers and a potential calibration table.


**AUS calibration**
*Coding for this question in **2025 SGP (3) calibration - tackgybes.ipynb**, **2025 SGP (3) calibration-straightlines cleaned.ipynb**, **2025 SGP (3) calibration - tackgybes from man_summary.ipynb***

To calibrate TWA, we need to detect differences in TWA before and after valid manoeuvres. For TWS and TWD, it is better to look at the upwind-downwind mark roundings to detect upwash.
The detection of valid tacks, gybes and manoeuvres is coded in **2025 SGP (3) calibration - tackgybes.ipynb**.

Using the boat log file and the SGP variables, TWS_SGP and TWD_SGP seem to be very well calibrated. The valid mark roundings from upwind to downwind and vice versa showed delta_TWS=-0.1 and delta_TWD=-0.6, which are non-significant differences.

After validating and selecting the tacks and gybes, the TWA seems well calibrated as well:
- Upwind: deltaTWA=1.6
- Downwind: deltaTWA=0.8

Ideally, we would split by TWS bins and look at the numbers again, but there is not enough data: the wind ranges from around 32 to 41 km/h and only 11 manoeuvres were found valid for calibration.
If a calibration table were to be suggested based on the log file:

| Leg | TWS (km/h) | TWA offset (deg, to add to S) |
|---|---|---|
| Upwind | (34, 38] | 0.8 |
| Upwind | (38, 42] | 0.8 |
| Downwind | (34, 38] | 0.3 |
| Downwind | (38, 42] | 0.6 |
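
One way such a table could be assembled is sketched below with made-up numbers. It takes one row per valid manoeuvre, halves the before/after TWA difference on the assumption that the error is shared equally between the two tacks, and averages per leg type and TWS bin. The column names (`leg`, `tws`, `delta_twa`), the halving rule and the sample values are all hypothetical and are not taken from the notebooks.

```python
import pandas as pd

# Hypothetical input: one row per valid manoeuvre (values are invented).
manoeuvres = pd.DataFrame({
    "leg": ["Upwind", "Upwind", "Upwind", "Downwind", "Downwind"],
    "tws": [35.0, 37.0, 40.0, 36.0, 41.0],       # km/h
    "delta_twa": [1.0, 2.0, 1.6, 0.6, 1.2],      # deg, before/after difference
})

bins = [34, 38, 42]
manoeuvres["tws_bin"] = pd.cut(manoeuvres["tws"], bins=bins)
# Assumption: the asymmetry is shared equally by both tacks, so the offset is half the delta.
manoeuvres["offset"] = manoeuvres["delta_twa"] / 2.0

table = (
    manoeuvres.groupby(["leg", "tws_bin"], observed=True)["offset"]
    .agg(["mean", "count"])
    .round(2)
)
print(table)
```

Keeping the `count` column next to the mean makes it visible how few manoeuvres support each bin, which matters when only a handful of manoeuvres are valid.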


The straightlines.csv already contains offsets, which I assume come from another dataset. To find out where they come from, the man_summary.csv was checked for valid manoeuvres. Only flying manoeuvres were selected, keeping those whose before-after differences in TWA and TWS were less than 5 degrees and 5 km/h respectively (thresholds chosen by observation and subject to change depending on TWS and other factors).

The offset results depend heavily on data cleaning and data quality. I could not identify how the entry_ and exit_ values are obtained or which datasets are used to obtain the offsets. With what is available for now, based on the manoeuvres provided in man_summary, this is a suggested offset table for TWA:

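| Leg | TWS (km/h) | TWA offset (deg, to add to S) |
|---|---|---|
| Upwind | (31, 35] | 3 |
| Upwind | (35, 39] | 1.1 |
| Upwind | (39, 43] | 0.75 |
| Downwind | (31, 35] | 2.25 |
| Downwind | (35, 39] | 1.3 |
| Downwind | (39, 43] | 1.35 |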


- 34-38 km/h wind speed: the mean difference between HDG and TWA was found to be 2 degrees upwind and -3.2 degrees downwind.
- 38-42 km/h wind speed: smaller differences of 0.1 and 1 degree upwind and downwind respectively.

*When plotting HDG and TWA, quite a difference between them was detected, meaning that there might be a delay in the calculation of TWA.*

Information about the current is missing (reportedly it was significant in Auckland), so not much can be said about the COG-HDG difference. The current and leeway effects seem to be larger downwind.

Looking at the upwind and downwind straight lines for AUS, the BSP/SOG% is >100%; the speedo might overread downwind by ~1 km/h.
Upwind, BSP/SOG% seems to differ between P and S: the speedo underreads by around 3 km/h on port tack while it is OK on S tack, possibly due to current.


## Question 4: Given a timeseries of Lat/Lon positions and a course XML, in a Python notebook, calculate a Distance to Leader metric for each boat.

## Question 5: Given a course XML, along with a wind speed and direction and a polar, calculate the minimum number of tacks or gybes for each leg of the course and each gate mark on the leg.

## Question 6: Calculate a “tacked” set of variables depending on the tack of the boat, so that sailors don’t need to think about what tack they’re on when looking at measurements. And show the results in a visualisation.


Code in **2025 SGP (6) tacked.ipynb**

**A dataframe named tacked is created. It includes the same variables as the main data_{BOAT}.csv, but the variables ANGLES (CA1-6), TWIST, ROT, TWA, AWA, LEEWAY and HEEL are transformed so that they do not depend on the tack.**

Sign conventions:
- ANGLES and ROT (-): towards the windward side
- TWIST (+): the top is more open than the bottom
- HEEL (-): heeling to windward
- LEEWAY (+): drifting to leeward

**10s downsampling is used**
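
A minimal sketch of the transformation, assuming that TWA is negative on one tack and positive on the other and that every listed variable flips sign with the tack. The raw sign conventions of the logs are not stated here, so the function name `add_tacked_columns`, the `_tacked` suffix and the example values are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def add_tacked_columns(df: pd.DataFrame, columns, twa_col: str = "TWA") -> pd.DataFrame:
    """Return a copy of `df` with tack-independent versions of `columns`.

    Assumption: `twa_col` changes sign with the tack, and so does every
    column in `columns`.
    """
    out = df.copy()
    tack_sign = np.sign(out[twa_col])
    for col in columns:
        # Multiplying by the tack sign mirrors one tack onto the other.
        out[f"{col}_tacked"] = out[col] * tack_sign
    return out


# Synthetic example: the same windward heel seen on both tacks.
raw = pd.DataFrame({"TWA": [-45.0, 45.0], "HEEL": [2.0, -2.0]})
tacked = add_tacked_columns(raw, ["TWA", "HEEL"])
print(tacked)
```

In this example both rows end up with a tacked TWA of 45 and a tacked heel of -2, so the two tacks read the same. If the raw convention is the opposite one, the sign of `tack_sign` has to be flipped.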

## Question 7: Given a set of tacks (in CSV), train a model to explain the key features of these tacks when optimizing for VMG. Show appropriate visualisations to explain your conclusions.

## Question 8: Give insights on the racing on what made a team win or underperform in the race.

**Code in: 2025 SGP (8) performance.ipynb**

*USA Boat: Initially found with an inverted LENGTH_RH_S_mm sign, which was corrected to match the other boats.*

**Starts**
- AUS: Consistently the best at starts, with the highest speeds and average accelerations and the shortest distances to M1.
- NZL: Struggled significantly, with a mean speed deficit of over 30 km/h compared to the top performers.
- DEN and USA: In races 6 and 7, achieved higher speeds and accelerations than AUS but started further behind the line.

**Manoeuvres**
- Race 5: Most teams made 7 turns; AUS and NZL made only 5.
- Race 6: ESP completed 5 more manoeuvres than DEN but still secured 3rd place.
- Totals: AUS executed the fewest manoeuvres (20), while ESP made the most (31).

**Races**

R5
- ESP: much faster while reaching, flying higher than the rest.
- GBR: best downwind speed, with a positive heel.
- AUS: dominated upwind with more pitch (~0.5°) than competitors, which might be a reason for being faster.

R6
- DEN: faster than most, more pitch angle --> winning points.
- GER: sailed faster while reaching, with 2 degrees more pitch angle than the others.
- BRA: flying higher, positive heel --> not quick enough.
- ITA: struggled to maintain negative heel while reaching --> lower BSP.

R7
- DEN: same as R6, with more pitch than all others while reaching and upwind.
- AUS: keeps flying lower.
- Big differences in RH between boats while reaching: 600 mm between ITA and AUS.

**FINAL**
- AUS was the fastest on all legs (by ~2 km/h), pitching more upwind and downwind while maintaining neutral heel and relatively low RH.
- GBR was faster than ESP while reaching and upwind.

**DAY averages**
- Best BSP: downwind AUS and GBR; upwind AUS, DEN and ESP; reaching GER.
- PITCH: BRA had much less, especially downwind, and was also slower. AUS had more --> faster.
- HEEL: ITA was unstable while reaching; winning boats stayed around neutral heel.
- RH: ESP and GBR exhibited stable flight heights (900-1050 mm and 850-1050 mm respectively).

**FLIGHT TIME %**
- R5: GBR scored best with 98.5%.
- R6: BRA had the best flight time with 95.8%.
- R7: GBR and AUS managed 100% flight time.
- USA shows 100% flight time in all races, which is doubtful.
- Overall, GBR had the most flight time of the day (97.8%) and DEN the least (91.4%).
- FINAL: flight times were very close (GBR 98.53%, ESP 98.51%, AUS 98.46%).
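
A flight-time percentage of this kind could be computed as sketched below. The definition of "flying" used in the notebook is not stated here, so the sketch assumes a boat is flying whenever its ride height is above a chosen threshold; the function name `flight_time_pct`, the threshold and the sample values are hypothetical.

```python
import pandas as pd


def flight_time_pct(ride_height_mm: pd.Series, threshold_mm: float = 0.0) -> float:
    """Percentage of valid samples with ride height above `threshold_mm`."""
    valid = ride_height_mm.dropna()
    if valid.empty:
        return float("nan")
    # The mean of a boolean series is the fraction of True values.
    return 100.0 * (valid > threshold_mm).mean()


# Synthetic example: 3 of 4 samples are above the threshold.
rh = pd.Series([900.0, 950.0, -50.0, 1000.0])
pct = flight_time_pct(rh, threshold_mm=0.0)
print(pct)
```

Because the metric depends directly on the ride-height channel, a sign or offset error in that channel changes the result, which is worth checking when a boat shows 100% in every race.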

**WING**

R5
- AUS: less twist than the rest while reaching and downwind; GBR same as ITA.
- The top teams used the wing in the same way (parallel lines), with more open or more closed angles.
- BRA: less ROT on all legs, matching NZL downwind and upwind.

R6
- BRA: did not change the twist much between legs, with much less twist than the rest while reaching and upwind.

R7
- BRA: seems to have CA1 more open with similar twist --> underperforming.
- NZL, which came 2nd, used less twist and less ROT.

FINAL
- AUS and GBR had the same upwind strategy in terms of ROT (15 degrees) and TWIST (~25 deg). Downwind, AUS used ~2.5 deg less rotation.
- ESP: more wing twist and less rotation --> underperforming.

Flatter wings might perform better.

AUS stood out as the day's best performer, with consistently higher speed and pitch, efficient manoeuvres and flatter wing settings, while maintaining stable RH and heel. GBR excelled in maintaining flight time and competitive speeds across legs.
Conversely, teams like NZL and ITA struggled due to a lack of speed, poor pitch/heel management and instability in RH.
Pitch seems important, together with the team's ability to manage the challenging conditions.
