# Spark Anaysis of Total Performance Data: GPS and Sectional Data

Analyzing the TPD GPS data alongside Equibase (EQB) data can provide significant predictive insights into horse performance. By focusing on the granular details of a horse’s movement during a race, you can derive valuable metrics that complement traditional EQB ratings and help identify under- or over-rated horses. 

## Roadmap for maximizing the predictive capabilities of the TPD GPS data:

### Key Ideas and Strategies

1. Derive Advanced Pace Metrics

Understanding how a horse’s speed changes over the course of a race can reveal its racing style and potential strengths or weaknesses:

	•	Early Pace: Average speed and acceleration during the first segment (e.g., first 20% of the race).
	•	Mid-Race Pace: Average speed and deceleration during the middle segments.
	•	Late Pace: Average speed and deceleration in the final segment.
	•	Sustained Speed: Identify segments where the horse maintains a steady speed.
	•	Peak Speed Timing: The point in the race where the horse reaches its peak speed.

2. Fatigue Factor

Calculate how much the horse slows down as the race progresses:

	•	Use metrics like:
	•	Percentage drop from peak speed to finish speed.
	•	Maximum acceleration vs. deceleration ratios.
	•	Change in stride frequency as the race progresses.

3. Sectional Efficiency

Quantify how efficiently the horse runs its sections:

    •	Compare actual times vs. expected times for each section based on the route characteristics.
	•	Efficiency Ratio:
    •	A high ratio might indicate a horse ran extra distance due to poor cornering or positioning.

4. Overlay TPD with EQB Ratings

Use EQB’s traditional metrics (e.g., speed ratings, form) to cross-reference with TPD data:

	•	Identify horses that consistently outperform EQB predictions.
	•	Investigate horses with high EQB ratings but poor TPD-based performance (e.g., poor fatigue factors or inefficient sectional running).

5. Route Characteristics

If the routes table contains track-specific details (e.g., turn sharpness, surface type, gradient):

	•	Incorporate these into the analysis.
	•	Evaluate how specific horses handle different track conditions (e.g., wide turns, long stretches).
	•	Identify patterns like “performs better on flatter tracks” or “struggles on uphill finishes.”

6. Horse vs. Peer Comparisons

Evaluate how each horse performs relative to its competition in the same race:

	•	Compare sectional times and speeds with other horses in the race.
	•	Rank horses based on performance within each race segment.

7. Acceleration Profiles

	•	Plot acceleration over time to identify patterns (e.g., burst speed vs. steady acceleration).
    •	Highlight horses with exceptional closing speed (valuable in longer races) or fast starts (important in short sprints).

9. Cluster Analysis of Racing Styles

Use clustering techniques to group horses by similar racing profiles:

	•	Inputs: Early pace, mid-pace, late pace, fatigue factor, sectional efficiency.
	•	Output: Clusters representing different racing styles (e.g., “early speed burners,” “closers,” “steady sustainers”).

9. Historical Analysis

Identify trends over a horse’s career:

	•	Does the horse improve or decline over time?
	•	Are there patterns in performance tied to specific jockeys, trainers, or race conditions?



# Using Spark for Efficient Processing

Steps to Implement

	1.	Load the Data
	•	Load gpspoint, gps_aggregated_results, and routes into Spark DataFrames.
	2.	Segment the Race
	•	Divide each race into sections (e.g., by distance markers or time intervals).
	•	Use PARTITION BY in Spark to process horses within each race separately.
	3.	Derive Metrics
	•	Speed Metrics: Use window functions to calculate average, min, max speeds.
	•	Acceleration/Deceleration: Compute using differences in speed and timestamps.
	•	Efficiency: Calculate distance ran vs. track distance.
	4.	Integrate with EQB Data
	•	Join with EQB tables on course_cd, race_date, and race_number for comparison.
	5.	Save Aggregated Data
	•	Save results into gps_aggregated_results and tpd_features.

Example Spark Code

Here’s a high-level implementation for deriving pace metrics:


In [1]:
from pyspark.sql import SparkSession
import configparser
import logging
import os


def setup_logging(script_dir, log_dir=None):
    """Sets up logging configuration to write logs to a file and the console."""
    try:
        # Default log directory
        if not log_dir:
            log_dir = '/home/exx/myCode/horse-racing/FoxRiverAIRacing/logs'
        
        # Ensure the log directory exists
        os.makedirs(log_dir, exist_ok=True)
        log_file = os.path.join(log_dir, 'SparkPy_load.log')

        # Create a logger and clear existing handlers
        logger = logging.getLogger()
        if logger.hasHandlers():
            logger.handlers.clear()

        logger.setLevel(logging.INFO)

        # Create file handler
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(logging.INFO)

        # Create console handler
        console_handler = logging.StreamHandler()
        console_handler.setLevel(logging.INFO)

        # Define a common format
        formatter = logging.Formatter(
            '%(asctime)s - %(levelname)s - %(message)s',
            datefmt='%Y-%m-%d %H:%M:%S'
        )
        file_handler.setFormatter(formatter)
        console_handler.setFormatter(formatter)

        # Add handlers to the logger
        logger.addHandler(file_handler)
        logger.addHandler(console_handler)

        logger.info("Logging has been set up successfully.")
    except Exception as e:
        print(f"Failed to set up logging: {e}", file=sys.stderr)
        sys.exit(1)

def read_config(config_file_path="config.ini"):
    """Read database configuration from config.ini."""
    config = configparser.ConfigParser()
    config.read(config_file_path)
    if 'database' not in config:
        raise KeyError("Database configuration missing in config.ini")
    return config['database']




In [2]:
import os

os.environ["DB_PASSWORD"] = "SparkPy24!"
print(os.getenv("DB_PASSWORD"))

SparkPy24!


In [3]:
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, lag, avg, max, min, stddev, expr, unix_timestamp
import logging
import os
from sqlalchemy import create_engine, text
import configparser

# Reduce Py4J logging
logger = logging.getLogger("py4j")
logger.setLevel(logging.ERROR)


# Configure logging
log_file = "/home/exx/myCode/horse-racing/FoxRiverAIRacing/logs/SparkPy_load.log"
logging.basicConfig(level=logging.INFO, handlers=[
    logging.FileHandler(log_file),
    logging.StreamHandler()
])
logging.info("Starting Spark Processing for GPS Data")

db_config = {
    "host": "192.168.4.25",
    "port": "5433",
    "dbname": "foxriverai",
    "user": "rshane",
    "password": os.getenv("DB_PASSWORD")  # Fetch from environment variable
}

if not db_config["password"]:
    raise Exception("Database password is missing. Set it in the DB_PASSWORD environment variable.")

jdbc_url = f"jdbc:postgresql://{db_config['host']}:{db_config['port']}/{db_config['dbname']}"
jdbc_properties = {
    "user": db_config["user"],
    "password": db_config["password"],
    "driver": "org.postgresql.Driver"
}
jdbc_driver_path = "/home/exx/myCode/horse-racing/FoxRiverAIRacing/jdbc/postgresql-42.7.4.jar"

logging.info("Variables defined successfully.")


INFO:root:Starting Spark Processing for GPS Data
INFO:root:Variables defined successfully.


In [4]:
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import (
    col, avg, max as F_max, min as F_min, count, stddev, row_number, lag, lead,
    when, lit, first, last, abs, sum as F_sum, udf
)
from pyspark.sql.types import DoubleType
import math
import logging

# Initialize Spark session
def initialize_spark():
    jdbc_driver_path = "/home/exx/myCode/horse-racing/FoxRiverAIRacing/jdbc/postgresql-42.7.4.jar"
    extra_class_path = jdbc_driver_path  # Ensure this is the correct path to your JDBC JAR
    spark = SparkSession.builder \
        .appName("GPS Sectionals Analysis - Enhanced Aggregation") \
        .config("spark.driver.extraClassPath", extra_class_path) \
        .config("spark.executor.extraClassPath", extra_class_path) \
        .config("spark.driver.memory", "64g") \
        .config("spark.executor.memory", "32g") \
        .config("spark.executor.memoryOverhead", "8g") \
        .config("spark.sql.debug.maxToStringFields", "1000") \
        .config("spark.sql.adaptive.enabled", "true") \
        .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    logging.info("Spark session created successfully.")
    return spark

# Define the haversine function and UDF
def define_haversine_udf():
    def haversine(lat1, lon1, lat2, lon2):
        # Check for None values
        if None in (lat1, lon1, lat2, lon2):
            return 0.0
        # Convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(
            lambda x: math.radians(float(x)), [lon1, lat1, lon2, lat2]
        )
        # Haversine formula
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = math.sin(dlat / 2) ** 2 + \
            math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
        c = 2 * math.asin(math.sqrt(a))
        # Radius of earth in meters
        r = 6371000
        return c * r
    return udf(haversine, DoubleType())

# Load data from the database
def load_data(spark, course):
    # Load sectionals data
    sectionals_df = spark.read.jdbc(
        url=jdbc_url,
        table="sectionals",
        properties=jdbc_properties
    ).filter(col("course_cd") == course).select(
        col("course_cd"),
        col("race_date"),
        col("race_number"),
        col("saddle_cloth_number"),
        col("gate_name"),
        col("length_to_finish"),
        col("sectional_time"),
        col("running_time"),
        col("distance_back"),
        col("distance_ran"),
        col("number_of_strides")
    )

    # Load gpspoint data
    gps_df = spark.read.jdbc(
        url=jdbc_url,
        table="gpspoint",
        properties=jdbc_properties
    ).filter(col("course_cd") == course).select(
        col("course_cd"),
        col("race_date"),
        col("race_number"),
        col("saddle_cloth_number"),
        "time_stamp",
        "longitude",
        "latitude",
        "progress",
        "speed",
        "stride_frequency"
    )

    # Load races data to get nominal race distance
    races_df = spark.read.jdbc(
        url=jdbc_url,
        table="races",
        properties=jdbc_properties
    ).filter(col("course_cd") == course).select(
        "course_cd",
        "race_date",
        "race_number",
        col("distance").alias("nominal_distance"),
        col("dist_unit").alias("nominal_dist_unit")
    )
    return sectionals_df, gps_df, races_df

# Rename columns for clarity
def rename_columns(sectionals_df, gps_df, races_df):
    sectionals_df = sectionals_df.select(
        col("course_cd").alias("s_course_cd"),
        col("race_date").alias("s_race_date"),
        col("race_number").alias("s_race_number"),
        col("saddle_cloth_number").alias("s_saddle_cloth_number"),
        "gate_name",
        "length_to_finish",
        "sectional_time",
        "running_time",
        "distance_back",
        "distance_ran",
        "number_of_strides"
    )

    gps_df = gps_df.select(
        col("course_cd").alias("g_course_cd"),
        col("race_date").alias("g_race_date"),
        col("race_number").alias("g_race_number"),
        col("saddle_cloth_number").alias("g_saddle_cloth_number"),
        "time_stamp",
        "longitude",
        "latitude",
        "progress",
        "speed",
        "stride_frequency"
    )

    races_df = races_df.select(
        col("course_cd").alias("r_course_cd"),
        col("race_date").alias("r_race_date"),
        col("race_number").alias("r_race_number"),
        col("nominal_distance"),
        col("nominal_dist_unit")
    )
    return sectionals_df, gps_df, races_df

# Join sectionals and GPS data
def join_sectionals_gps(sectionals_df, gps_df):
    gps_with_gates = sectionals_df.alias("s").join(
        gps_df.alias("g"),
        (col("s.s_course_cd") == col("g.g_course_cd")) &
        (col("s.s_race_date") == col("g.g_race_date")) &
        (col("s.s_race_number") == col("g.g_race_number")) &
        (col("s.s_saddle_cloth_number") == col("g.g_saddle_cloth_number")),
        how="inner"
    )
    return gps_with_gates

# Find the closest GPS point to each gate
def find_closest_gps_points(gps_with_gates):
    gps_with_gates = gps_with_gates.withColumn(
        "progress_diff",
        abs(col("g.progress") - col("s.length_to_finish"))
    )

    window_spec_gate = Window.partitionBy(
        "s.s_course_cd",
        "s.s_race_date",
        "s.s_race_number",
        "s.s_saddle_cloth_number",
        "gate_name"
    ).orderBy("progress_diff")

    gps_at_gates = gps_with_gates.withColumn(
        "row_number",
        row_number().over(window_spec_gate)
    ).filter(col("row_number") == 1).drop("row_number", "progress_diff")
    return gps_at_gates

# Calculate speed changes and acceleration
def calculate_speed_acceleration(gps_at_gates):
    gate_order_window = Window.partitionBy(
        "s.s_course_cd",
        "s.s_race_date",
        "s.s_race_number",
        "s.s_saddle_cloth_number"
    ).orderBy("gate_name")  # Modify if gate_name is not sortable

    gps_at_gates = gps_at_gates.withColumn(
        "previous_speed",
        lag("g.speed").over(gate_order_window)
    ).withColumn(
        "speed_change",
        col("g.speed") - col("previous_speed")
    ).withColumn(
        "acceleration",
        when(col("previous_speed").isNotNull(),
             col("speed_change") / col("previous_speed")
             ).otherwise(lit(0))
    )
    return gps_at_gates

# Identify fastest and slowest gates
def identify_fastest_slowest_gates(gps_at_gates):
    speed_window = Window.partitionBy(
        "s.s_course_cd",
        "s.s_race_date",
        "s.s_race_number",
        "s.s_saddle_cloth_number"
    )

    gps_at_gates = gps_at_gates.withColumn(
        "max_speed",
        F_max("g.speed").over(speed_window)
    ).withColumn(
        "min_speed",
        F_min("g.speed").over(speed_window)
    ).withColumn(
        "is_fastest_gate",
        when(col("g.speed") == col("max_speed"), lit(1)).otherwise(lit(0))
    ).withColumn(
        "is_slowest_gate",
        when(col("g.speed") == col("min_speed"), lit(1)).otherwise(lit(0))
    )
    return gps_at_gates

# Calculate fatigue factor
def calculate_fatigue_factor(gps_at_gates):
    # Extract finish speed
    finish_speed = gps_at_gates.filter(col("gate_name") == "Finish").select(
        col("s.s_course_cd").alias("course_cd"),
        col("s.s_race_date").alias("race_date"),
        col("s.s_race_number").alias("race_number"),
        col("s.s_saddle_cloth_number").alias("saddle_cloth_number"),
        col("g.speed").alias("finish_speed")
    )

    # Join finish speed back to gps_at_gates
    gps_at_gates = gps_at_gates.join(
        finish_speed,
        on=[
            gps_at_gates["s.s_course_cd"] == finish_speed["course_cd"],
            gps_at_gates["s.s_race_date"] == finish_speed["race_date"],
            gps_at_gates["s.s_race_number"] == finish_speed["race_number"],
            gps_at_gates["s.s_saddle_cloth_number"] == finish_speed["saddle_cloth_number"]
        ],
        how="left"
    )

    # Calculate fatigue factor
    gps_at_gates = gps_at_gates.withColumn(
        "fatigue_factor",
        (col("max_speed") - col("finish_speed")) / col("max_speed")
    )
    return gps_at_gates

# Prepare aggregated metrics per gate
def prepare_aggregated_metrics(gps_at_gates):
    aggregated_metrics = gps_at_gates.select(
        col("s.s_course_cd").alias("course_cd"),
        col("s.s_race_date").alias("race_date"),
        col("s.s_race_number").alias("race_number"),
        col("s.s_saddle_cloth_number").alias("saddle_cloth_number"),
        "gate_name",
        col("g.speed").alias("speed"),
        "acceleration",
        "fatigue_factor",
        "is_fastest_gate",
        "is_slowest_gate"
    )

    per_gate_metrics = aggregated_metrics.groupBy(
        "course_cd",
        "race_date",
        "race_number",
        "saddle_cloth_number",
        "gate_name"
    ).agg(
        avg("speed").alias("avg_speed"),
        avg("acceleration").alias("avg_acceleration"),
        F_max("speed").alias("max_speed"),
        F_min("speed").alias("min_speed"),
        F_max("fatigue_factor").alias("fatigue_factor"),
        F_max("is_fastest_gate").alias("is_fastest_gate"),
        F_max("is_slowest_gate").alias("is_slowest_gate")
    )
    return per_gate_metrics

# Calculate actual distance run and ground loss
def calculate_ground_loss(gps_df, races_df, haversine_udf):
    # Define window specification
    window_spec_time = Window.partitionBy(
        "g_course_cd", "g_race_date", "g_race_number", "g_saddle_cloth_number"
    ).orderBy("time_stamp")

    # Get previous latitude and longitude
    gps_df = gps_df.withColumn("prev_latitude", lag("latitude").over(window_spec_time))
    gps_df = gps_df.withColumn("prev_longitude", lag("longitude").over(window_spec_time))

    # Calculate segment distance using haversine formula
    gps_df = gps_df.withColumn(
        "segment_distance",
        haversine_udf(
            col("prev_latitude"),
            col("prev_longitude"),
            col("latitude"),
            col("longitude")
        )
    )

    # Fill null values
    gps_df = gps_df.fillna({"segment_distance": 0})

    # Calculate cumulative distance
    gps_df = gps_df.withColumn(
        "cumulative_distance",
        F_sum("segment_distance").over(window_spec_time)
    )

    # Get total distance per horse
    total_distance_df = gps_df.groupBy(
        "g_course_cd", "g_race_date", "g_race_number", "g_saddle_cloth_number"
    ).agg(
        F_max("cumulative_distance").alias("total_distance_run")
    )

    # Convert nominal distance to meters with proper scaling
    races_df = races_df.withColumn(
        "nominal_distance_meters",
        when(col("nominal_dist_unit") == 'F', (col("nominal_distance") / 100) * 201.168)
        .when(col("nominal_dist_unit") == 'Y', col("nominal_distance") * 0.9144)
        .otherwise(lit(None))
    )


    # Join total_distance_df with races_df
    distance_comparison_df = total_distance_df.join(
        races_df,
        (total_distance_df["g_course_cd"] == races_df["r_course_cd"]) &
        (total_distance_df["g_race_date"] == races_df["r_race_date"]) &
        (total_distance_df["g_race_number"] == races_df["r_race_number"]),
        how="left"
    )

    # Calculate ground loss
    distance_comparison_df = distance_comparison_df.withColumn(
        "ground_loss",
        col("total_distance_run") - col("nominal_distance_meters")
    )

    # Join and select required columns
    distance_comparison_df = total_distance_df.join(
        races_df,
        (total_distance_df["g_course_cd"] == races_df["r_course_cd"]) &
        (total_distance_df["g_race_date"] == races_df["r_race_date"]) &
        (total_distance_df["g_race_number"] == races_df["r_race_number"]),
        how="left"
    ).withColumn(
        "ground_loss",
        col("total_distance_run") - col("nominal_distance_meters")
    ).select(
        col("g_course_cd").alias("course_cd"),
        col("g_race_date").alias("race_date"),
        col("g_race_number").alias("race_number"),
        col("g_saddle_cloth_number").alias("saddle_cloth_number"),
        "total_distance_run",
        "ground_loss"
    )

    return distance_comparison_df

# Integrate ground loss into final metrics
def integrate_ground_loss(per_gate_metrics, distance_comparison_df):
    final_metrics = per_gate_metrics.join(
        distance_comparison_df,
        on=["course_cd", "race_date", "race_number", "saddle_cloth_number"],
        how="left"
    )
    return final_metrics
    
# Write final metrics to the database
def write_to_database(final_metrics):
    # Ensure all necessary columns are included
    required_columns = [
        'course_cd', 'race_date', 'race_number', 'saddle_cloth_number',
        'gate_name', 'avg_speed', 'avg_acceleration', 'max_speed', 'min_speed',
        'fatigue_factor', 'is_fastest_gate', 'is_slowest_gate',
        'total_distance_run', 'ground_loss'
    ]

    # Check if all required columns are present
    missing_columns = [col for col in required_columns if col not in final_metrics.columns]
    if missing_columns:
        logging.error(f"Missing columns in final_metrics: {missing_columns}")
        return

    # Write final metrics to the database
    final_metrics.write.jdbc(
        url=jdbc_url,
        table="gps_aggregated_results_with_gates",
        mode="append",
        properties=jdbc_properties
    )
    
# Main processing function for each course
def process_course(course, spark, haversine_udf):
    print(f"Processing course: {course}")

    # Load data
    sectionals_df, gps_df, races_df = load_data(spark, course)
    logging.info(f"Loading data from database for course: {course}")
    # Rename columns
    sectionals_df, gps_df, races_df = rename_columns(sectionals_df, gps_df, races_df)

    # Join sectionals and GPS data
    gps_with_gates = join_sectionals_gps(sectionals_df, gps_df)

    # Find closest GPS points to gates
    gps_at_gates = find_closest_gps_points(gps_with_gates)

    # Calculate speed changes and acceleration
    gps_at_gates = calculate_speed_acceleration(gps_at_gates)

    # Identify fastest and slowest gates
    gps_at_gates = identify_fastest_slowest_gates(gps_at_gates)

    # Calculate fatigue factor
    gps_at_gates = calculate_fatigue_factor(gps_at_gates)

    # Prepare aggregated metrics
    per_gate_metrics = prepare_aggregated_metrics(gps_at_gates)

    # Calculate ground loss
    distance_comparison_df = calculate_ground_loss(gps_df, races_df, haversine_udf)

    per_gate_metrics.printSchema()
    logging.info(per_gate_metrics.printSchema())
    distance_comparison_df.printSchema()
    logging.info(distance_comparison_df.printSchema())
    # Integrate ground loss into final metrics
    final_metrics = integrate_ground_loss(per_gate_metrics, distance_comparison_df)
    
    final_metrics = integrate_ground_loss(per_gate_metrics, distance_comparison_df)
    final_metrics.printSchema()
    logging.info(final_metrics.printSchema())

    # Write final metrics to database
    write_to_database(final_metrics)

    print(f"Completed processing for course: {course}")

def main():
    # Initialize Spark session
    spark = initialize_spark()
    spark.sparkContext.setLogLevel("ERROR")
    # Define haversine UDF
    haversine_udf = define_haversine_udf()

    # List of courses
    courses = ['CNL', 'SAR', 'PIM', 'TSA', 'BEL', 'MVR', 'TWO', 'CLS', 'KEE', 'TAM',
               'TTP', 'TKD', 'ELP', 'PEN', 'HOU', 'DMR', 'TLS', 'AQU', 'MTH', 'TGP',
               'TGG', 'CBY', 'LRL', 'TED', 'IND', 'CTD', 'ASD', 'TCD', 'LAD', 'MED',
               'TOP', 'HOO']

    # Process each course
    for course in courses:
        try:
            process_course(course, spark, haversine_udf)
        except Exception as e:
            logging.error(f"Error processing course {course}: {e}")

    # Stop Spark session
    spark.stop()

# if __name__ == "__main__":
main()

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/12/05 22:20:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/12/05 22:20:24 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
INFO:root:Spark session created successfully.


Processing course: CNL


INFO:root:Loading data from database for course: CNL
INFO:root:None
INFO:root:None
INFO:root:None


root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integ

24/12/05 22:20:31 ERROR Executor: Exception in task 23.0 in stage 14.0 (TID 45)]
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('CNL'),('2022-07-11 -05'::date),('9'::int4),('5  '),('0.5f'),('16.4'::double precision),('0.0'::double precision),('16.4'::double precision),('16.4'::double precision),('0.18172213417855254'::double precision),('0'::int4),('0'::int4),('1261.3965676748849'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(CNL, 2022-07-11, 9, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.

INFO:root:Loading data from database for course: SAR


Processing course: SAR


INFO:root:None
INFO:root:None
INFO:root:None


root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integ

24/12/05 22:20:33 ERROR Executor: Exception in task 14.0 in stage 29.0 (TID 86)]
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('SAR'),('2023-07-13 -05'::date),('10'::int4),('5  '),('0.5f'),('0.12'::double precision),('0.0'::double precision),('0.12'::double precision),('0.12'::double precision),('0.0'::double precision),('1'::int4),('1'::int4),('1945.9310951890839'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(SAR, 2023-07-13, 10, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHan

INFO:root:Loading data from database for course: PIM
INFO:root:None
INFO:root:None


Processing course: PIM
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

INFO:root:None
                                                                                

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)
 |-- total_distance_run: double (nullable = true)
 |-- ground_loss: double (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: doubl

24/12/05 22:20:34 ERROR Executor: Exception in task 9.0 in stage 44.0 (TID 125)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('PIM'),('2022-05-12 -05'::date),('1'::int4),('5  '),('0.5f'),('17.31'::double precision),('0.0'::double precision),('17.31'::double precision),('17.31'::double precision),('0.11228460255697599'::double precision),('0'::int4),('0'::int4),('1124.3971816924275'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(PIM, 2022-05-12, 1, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdb

INFO:root:Loading data from database for course: TSA
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TSA
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:37 ERROR Executor: Exception in task 19.0 in stage 59.0 (TID 187)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TSA'),('2022-10-21 -05'::date),('7'::int4),('5  '),('0.5f'),('14.48'::double precision),('0.0'::double precision),('14.48'::double precision),('14.48'::double precision),('0.2818371607515658'::double precision),('0'::int4),('0'::int4),('1343.0202193956932'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TSA, 2022-10-21, 7, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdb

INFO:root:Loading data from database for course: BEL
INFO:root:None
INFO:root:None


Processing course: BEL
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

INFO:root:None
                                                                                

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)
 |-- total_distance_run: double (nullable = true)
 |-- ground_loss: double (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: doubl

24/12/05 22:20:38 ERROR Executor: Exception in task 4.0 in stage 74.0 (TID 235)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('BEL'),('2023-05-04 -05'::date),('2'::int4),('3  '),('0.5f'),('14.24'::double precision),('0.0'::double precision),('14.24'::double precision),('14.24'::double precision),('0.0'::double precision),('1'::int4),('1'::int4),('1359.9760341082974'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(BEL, 2023-05-04, 2, 3  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHan

INFO:root:Loading data from database for course: MVR
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: MVR
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:40 ERROR Executor: Exception in task 6.0 in stage 97.0 (TID 282)]
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('MVR'),('2022-01-01 -06'::date),('3'::int4),('1  '),('0.5f'),('14.33'::double precision),('0.0'::double precision),('14.33'::double precision),('14.33'::double precision),('0.2981333333333333'::double precision),('0'::int4),('0'::int4),('1342.7537205718252'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(MVR, 2022-01-01, 3, 1  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdb

INFO:root:Loading data from database for course: TWO
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TWO
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:45 ERROR Executor: Exception in task 53.0 in stage 112.0 (TID 413)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TWO'),('2022-04-16 -05'::date),('2'::int4),('5  '),('0.5f'),('16.94'::double precision),('0.0'::double precision),('16.94'::double precision),('16.94'::double precision),('0.2152514872904272'::double precision),('0'::int4),('0'::int4),('1177.4065558045581'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TWO, 2022-04-16, 2, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jd

INFO:root:Loading data from database for course: CLS
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: CLS
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:46 ERROR Executor: Exception in task 0.0 in stage 127.0 (TID 467)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('CLS'),('2024-08-16 -05'::date),('1'::int4),('1  '),('0.5f'),('0.07'::double precision),('0.0'::double precision),('0.07'::double precision),('0.07'::double precision),('0.0'::double precision),('1'::int4),('1'::int4),('1366.3190584960344'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(CLS, 2024-08-16, 1, 1  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHandl

INFO:root:Loading data from database for course: KEE
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: KEE
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:47 ERROR Executor: Exception in task 17.0 in stage 142.0 (TID 505)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('KEE'),('2023-04-07 -05'::date),('3'::int4),('3  '),('0.5f'),('16.59'::double precision),('0.0'::double precision),('16.59'::double precision),('16.59'::double precision),('0.14416475972540044'::double precision),('0'::int4),('0'::int4),('1919.1166139411962'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(KEE, 2023-04-07, 3, 3  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.j

INFO:root:Loading data from database for course: TAM
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TAM
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:50 ERROR Executor: Exception in task 51.0 in stage 157.0 (TID 611)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TAM'),('2022-01-01 -06'::date),('10'::int4),('1  '),('0.5f'),('16.52'::double precision),('0.0'::double precision),('16.52'::double precision),('16.52'::double precision),('0.13401476433844403'::double precision),('0'::int4),('0'::int4),('1844.2053161973836'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TAM, 2022-01-01, 10, 1  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql

INFO:root:Loading data from database for course: TTP
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TTP
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:53 ERROR Executor: Exception in task 5.0 in stage 172.0 (TID 668)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TTP'),('2022-12-02 -06'::date),('1'::int4),('6  '),('0.5f'),('14.46'::double precision),('0.0'::double precision),('14.46'::double precision),('14.46'::double precision),('0.3400725764644893'::double precision),('0'::int4),('0'::int4),('1451.388434019811'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TTP, 2022-12-02, 1, 6  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc

INFO:root:Loading data from database for course: TKD
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TKD
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:54 ERROR Executor: Exception in task 5.0 in stage 187.0 (TID 728)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TKD'),('2022-09-01 -05'::date),('2'::int4),('9  '),('0.5f'),('12.2'::double precision),('0.0'::double precision),('12.2'::double precision),('12.2'::double precision),('0.0'::double precision),('1'::int4),('1'::int4),('1868.777204015089'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TKD, 2022-09-01, 2, 9  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHandle

INFO:root:Loading data from database for course: ELP
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: ELP
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:55 ERROR Executor: Exception in task 8.0 in stage 202.0 (TID 759)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('ELP'),('2023-06-10 -05'::date),('1'::int4),('2  '),('0.5f'),('16.43'::double precision),('0.0'::double precision),('16.43'::double precision),('16.43'::double precision),('0.2781837160751566'::double precision),('0'::int4),('0'::int4),('1204.137828220455'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(ELP, 2023-06-10, 1, 2  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc

INFO:root:Loading data from database for course: PEN
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: PEN
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:57 ERROR Executor: Exception in task 4.0 in stage 225.0 (TID 816)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('PEN'),('2022-01-07 -06'::date),('4'::int4),('9  '),('0.5f'),('14.16'::double precision),('0.0'::double precision),('14.16'::double precision),('14.16'::double precision),('0.3009764503159104'::double precision),('0'::int4),('0'::int4),('1859.8972532036257'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(PEN, 2022-01-07, 4, 9  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdb

INFO:root:Loading data from database for course: HOU
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: HOU
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:20:59 ERROR Executor: Exception in task 6.0 in stage 240.0 (TID 867)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('HOU'),('2022-01-07 -06'::date),('1'::int4),('7  '),('0.5f'),('11.93'::double precision),('0.0'::double precision),('11.93'::double precision),('11.93'::double precision),('0.2780847145488029'::double precision),('0'::int4),('0'::int4),('1356.672313385443'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(HOU, 2022-01-07, 1, 7  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc

INFO:root:Loading data from database for course: DMR
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: DMR
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:01 ERROR Executor: Exception in task 37.0 in stage 255.0 (TID 967)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('DMR'),('2022-07-22 -05'::date),('6'::int4),('4  '),('0.5f'),('16.71'::double precision),('0.0'::double precision),('16.71'::double precision),('16.71'::double precision),('0.0909604519774011'::double precision),('0'::int4),('0'::int4),('1990.7761398016607'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(DMR, 2022-07-22, 6, 4  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jd

INFO:root:Loading data from database for course: TLS
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TLS
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:03 ERROR Executor: Exception in task 34.0 in stage 270.0 (TID 1034)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TLS'),('2022-04-28 -05'::date),('1'::int4),('10 '),('0.5f'),('15.21'::double precision),('0.0'::double precision),('15.21'::double precision),('15.21'::double precision),('0.26077700904736556'::double precision),('0'::int4),('0'::int4),('1454.8408062219423'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TLS, 2022-04-28, 1, 10 , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.

INFO:root:Loading data from database for course: AQU
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: AQU
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:06 ERROR Executor: Exception in task 39.0 in stage 285.0 (TID 1123)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('AQU'),('2022-12-29 -06'::date),('5'::int4),('9  '),('0.5f'),('14.38'::double precision),('0.0'::double precision),('14.38'::double precision),('14.38'::double precision),('0.22623679822123396'::double precision),('0'::int4),('0'::int4),('1371.243112492651'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(AQU, 2022-12-29, 5, 9  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.j

INFO:root:Loading data from database for course: MTH
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: MTH
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:08 ERROR Executor: Exception in task 47.0 in stage 300.0 (TID 1230)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('MTH'),('2022-05-08 -05'::date),('2'::int4),('5  '),('0.5f'),('14.94'::double precision),('0.0'::double precision),('14.94'::double precision),('14.94'::double precision),('0.29896907216494845'::double precision),('0'::int4),('0'::int4),('1383.1085624488444'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(MTH, 2022-05-08, 2, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.

INFO:root:Loading data from database for course: TGP
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TGP
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:14 ERROR Executor: Exception in task 44.0 in stage 315.0 (TID 1350)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TGP'),('2022-04-01 -05'::date),('1'::int4),('6  '),('0.32f'),('15.94'::double precision),('0.0'::double precision),('15.94'::double precision),('15.94'::double precision),('0.05599036724864538'::double precision),('0'::int4),('0'::int4),('1840.528624716823'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TGP, 2022-04-01, 1, 6  , 0.32f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql

INFO:root:Loading data from database for course: TGG
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TGG
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:17 ERROR Executor: Exception in task 41.0 in stage 330.0 (TID 1493)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TGG'),('2022-01-08 -06'::date),('6'::int4),('2  '),('0.5f'),('15.06'::double precision),('0.0'::double precision),('15.06'::double precision),('15.06'::double precision),('0.2829190904283448'::double precision),('0'::int4),('0'::int4),('1343.1517549390358'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TGG, 2022-01-08, 6, 2  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.j

INFO:root:Loading data from database for course: CBY
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: CBY
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:19 ERROR Executor: Exception in task 0.0 in stage 345.0 (TID 1546)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('CBY'),('2022-05-18 -05'::date),('6'::int4),('1  '),('0.5f'),('0.11'::double precision),('0.0'::double precision),('0.11'::double precision),('0.11'::double precision),('0.0'::double precision),('1'::int4),('1'::int4),('1162.252082678121'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(CBY, 2022-05-18, 6, 1  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHandl

INFO:root:Loading data from database for course: LRL
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: LRL
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:24 ERROR Executor: Exception in task 37.0 in stage 360.0 (TID 1689)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('LRL'),('2022-01-01 -06'::date),('4'::int4),('6  '),('0.5F'),('15.25'::double precision),('0.0'::double precision),('15.25'::double precision),('15.25'::double precision),('0.33438155136268344'::double precision),('0'::int4),('0'::int4),('1343.5662009368332'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(LRL, 2022-01-01, 4, 6  , 0.5F) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.

INFO:root:Loading data from database for course: TED
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TED
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:25 ERROR Executor: Exception in task 4.0 in stage 375.0 (TID 1770)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TED'),('2024-05-04 -05'::date),('2'::int4),('2  '),('0.5f'),('16.34'::double precision),('0.0'::double precision),('16.34'::double precision),('16.34'::double precision),('0.18373983739837393'::double precision),('0'::int4),('0'::int4),('1158.3294908074358'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TED, 2024-05-04, 2, 2  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.j

INFO:root:Loading data from database for course: IND
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: IND
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:29 ERROR Executor: Exception in task 11.0 in stage 390.0 (TID 1848)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('IND'),('2022-04-19 -05'::date),('4'::int4),('4  '),('0.5f'),('14.77'::double precision),('0.0'::double precision),('14.77'::double precision),('14.77'::double precision),('0.26843501326259955'::double precision),('0'::int4),('0'::int4),('1246.9690081284307'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(IND, 2022-04-19, 4, 4  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.

INFO:root:Loading data from database for course: CTD
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: CTD
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:30 ERROR Executor: Exception in task 11.0 in stage 405.0 (TID 1965)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('CTD'),('2024-03-16 -05'::date),('4'::int4),('5  '),('0.5f'),('14.01'::double precision),('0.0'::double precision),('14.01'::double precision),('14.01'::double precision),('0.11225895316804402'::double precision),('0'::int4),('0'::int4),('1879.9927529610707'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(CTD, 2024-03-16, 4, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.

INFO:root:Loading data from database for course: ASD
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: ASD
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:31 ERROR Executor: Exception in task 8.0 in stage 420.0 (TID 1999)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('ASD'),('2023-05-22 -05'::date),('1'::int4),('4  '),('0.5f'),('14.92'::double precision),('0.0'::double precision),('14.92'::double precision),('14.92'::double precision),('0.29287190082644626'::double precision),('0'::int4),('0'::int4),('1138.978170822401'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(ASD, 2023-05-22, 1, 4  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jd

INFO:root:Loading data from database for course: TCD
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: TCD
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:34 ERROR Executor: Exception in task 7.0 in stage 435.0 (TID 2050)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TCD'),('2023-04-29 -05'::date),('1'::int4),('5  '),('0.5f'),('15.09'::double precision),('0.0'::double precision),('15.09'::double precision),('15.09'::double precision),('0.22524483133841125'::double precision),('0'::int4),('0'::int4),('1565.2591502130028'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TCD, 2023-04-29, 1, 5  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.j

INFO:root:Loading data from database for course: LAD
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: LAD
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:35 ERROR Executor: Exception in task 11.0 in stage 450.0 (TID 2125)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('LAD'),('2023-05-06 -05'::date),('2'::int4),('3  '),('0.5f'),('15.52'::double precision),('0.0'::double precision),('15.52'::double precision),('15.52'::double precision),('0.302713987473904'::double precision),('0'::int4),('0'::int4),('1302.2251403110654'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(LAD, 2023-05-06, 2, 3  , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.jd

INFO:root:Loading data from database for course: MED
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: MED
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

INFO:root:Loading data from database for course: TOP                            
INFO:root:None
INFO:root:None
INFO:root:None


Completed processing for course: MED
Processing course: TOP
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_

24/12/05 22:21:39 ERROR Executor: Exception in task 25.0 in stage 470.0 (TID 2216)
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO gps_aggregated_results_with_gates ("course_cd","race_date","race_number","saddle_cloth_number","gate_name","avg_speed","avg_acceleration","max_speed","min_speed","fatigue_factor","is_fastest_gate","is_slowest_gate","total_distance_run","ground_loss") VALUES (('TOP'),('2022-01-02 -06'::date),('2'::int4),('10 '),('0.5f'),('15.27'::double precision),('0.0'::double precision),('15.27'::double precision),('15.27'::double precision),('0.28031496062992123'::double precision),('0'::int4),('0'::int4),('1249.1233091901004'::double precision),(NULL)) was aborted: ERROR: duplicate key value violates unique constraint "gps_aggregated_results_with_gates_pkey"
  Detail: Key (course_cd, race_date, race_number, saddle_cloth_number, gate_name)=(TOP, 2022-01-02, 2, 10 , 0.5f) already exists.  Call getNextException to see other errors in the batch.
	at org.postgresql.

INFO:root:Loading data from database for course: HOO
INFO:root:None
INFO:root:None
INFO:root:None


Processing course: HOO
root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |-- is_fastest_gate: integer (nullable = true)
 |-- is_slowest_gate: integer (nullable = true)

root
 |-- course_cd: string (nullable = true)
 |-- race_date: date (nullable = true)
 |-- race_number: integer (nullable = true)
 |-- saddle_cloth_number: string (nullable = true)
 |-- gate_name: string (nullable = true)
 |-- avg_speed: double (nullable = true)
 |-- avg_acceleration: double (nullable = true)
 |-- max_speed: double (nullable = true)
 |-- min_speed: double (nullable = true)
 |-- fatigue_factor: double (nullable = true)
 |--

24/12/05 22:21:39 ERROR TaskSchedulerImpl: Exception in statusUpdate            
java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$3@20e51ce5 rejected from java.util.concurrent.ThreadPoolExecutor@58336d97[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2260]
	at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
	at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
	at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
	at org.apache.spark.scheduler.TaskResultGetter.enqueueSuccessfulTask(TaskResultGetter.scala:61)
	at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:833)
	at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:808)
	at org.apache.spark.scheduler.local.LocalEndpoint$$anonfun$receive$1.applyO

Completed processing for course: HOO


1. Modular Functions

>    •	initialize_spark(): Sets up the Spark session with the necessary configurations.
> 
>    •	define_haversine_udf(): Defines the Haversine function to calculate distances between GPS points and registers it as a UDF.
> 
>	•	**load_data()**: Loads the sectionals, gpspoint, and races data from the database for a given course.
> 
> 	•	rename_columns(): Aliases columns in the DataFrames for clarity and to avoid naming conflicts.
> 
> 	•	join_sectionals_gps(): Joins the sectionals and gpspoint DataFrames on the race and horse identifiers.
> 
>	•	find_closest_gps_points(): Finds the closest GPS point to each gate for each horse.
> 
> 	•	calculate_speed_acceleration(): Calculates speed changes and acceleration between gates.
> 
>	•	identify_fastest_slowest_gates(): Identifies the fastest and slowest gates for each horse.
> 
>	•	calculate_fatigue_factor(): Computes the fatigue factor for each horse based on their maximum speed and finish speed.
> 
>	•	prepare_aggregated_metrics(): Aggregates the metrics per gate and per horse.
> 
>	•	calculate_ground_loss(): Calculates the actual distance run by each horse and computes the ground loss compared to the nominal race distance.
> 
>	•	integrate_ground_loss(): Integrates the ground loss metric into the final aggregated metrics.
>	•	write_to_database(): Writes the final metrics to the database.
> 
>	•	process_course(): Orchestrates the processing steps for a single course.
> 
>	•	main(): The main function that initializes the Spark session, processes each course, and stops the Spark session.


Next Steps

	1.	Visualization:
	•	Plot speed and stride frequency trends across gates for specific horses to validate the analysis.
	2.	Testing:
	•	Apply this to a few more courses to confirm that the calculations are meaningful and robust across different races.
	3.	Analysis:
	•	Compare fatigue factors or speed trends between winners and non-winners to derive insights about race dynamics.

These enhancements should provide valuable insights and improve the predictive capabilities of your analysis!


# Additional steps to derive fatigue factors, sectional efficiency, etc.

Innovative Applications

	1.	Predictive Fatigue Model: Train a model using TPD data to predict fatigue thresholds for horses.
	2.	Race Simulation: Use historical TPD data to simulate how horses might perform in upcoming races.
	3.	Dynamic Betting Insights: Provide real-time insights into how race conditions or competitor performance might influence outcomes.

Spark’s distributed computing will allow you to process the large dataset efficiently and scale as needed. By creatively combining TPD and EQB data, you can uncover insights that traditional analysis might overlook.
