# Flight Delay Simulation

Here is my new flight delay simulator. The entire logic is laid out in this notebook: filter the raw schedule down to one aircraft, propagate delays leg by leg through its rotation, then save and plot the results. It is a complete, self-contained solution to a problem others have been overthinking for years, and the code speaks for itself.

In [None]:
# imports
import pandas as pd
import polars as pl
import matplotlib.pyplot as plt

### Data Prep

The network planning team sent the raw file `schedule 2.csv`. First I filter it down to the legs flown by a single aircraft (`LHF32Q_0158`), dropping any rows where the origin and destination are the same. This creates the `edited_schedule.csv` that the main logic uses.

In [None]:
# Prep the data
(
    pl.read_csv("schedule 2.csv")
    .filter((pl.col("AircraftId") == "LHF32Q_0158") & (pl.col("Origin") != pl.col("Destination")))
    .select(
        "LegId",
        "Origin",
        "Destination",
        "AirlineDesignator",
        "FlightNbr",
        "STD", # scheduled time of departure
        "STA", # scheduled time of arrival
        "Blocktime",
        "Distance",
        "SubfleetType",
    )
    .write_csv("edited_schedule.csv")
)

print("edited_schedule.csv created.")
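To make the two filter conditions concrete, here is a minimal plain-Python sketch applied to a few made-up rows. The rows, the airport codes, and the `OTHER_0001` aircraft ID are hypothetical; only the column names and `LHF32Q_0158` come from the cell above.

```python
# Hypothetical rows for illustration only; the real data lives in "schedule 2.csv".
rows = [
    {"LegId": 1, "AircraftId": "LHF32Q_0158", "Origin": "AAA", "Destination": "BBB"},
    {"LegId": 2, "AircraftId": "LHF32Q_0158", "Origin": "BBB", "Destination": "BBB"},  # same airport
    {"LegId": 3, "AircraftId": "OTHER_0001", "Origin": "BBB", "Destination": "CCC"},   # other aircraft
    {"LegId": 4, "AircraftId": "LHF32Q_0158", "Origin": "BBB", "Destination": "AAA"},
]


def keep_leg(row, aircraft_id="LHF32Q_0158"):
    """Mirror the polars filter: matching aircraft, and origin differs from destination."""
    return row["AircraftId"] == aircraft_id and row["Origin"] != row["Destination"]


kept_leg_ids = [row["LegId"] for row in rows if keep_leg(row)]
print(kept_leg_ids)  # legs 2 and 3 are dropped
```

Legs with identical origin and destination are dropped, presumably because they are not real flights, but that reading is my assumption about the raw file.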

### The Simulation Logic

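We read the filtered file and propagate delays leg by leg: each departure waits for the previous arrival plus a minimum ground time, and then picks up its own delay.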

In [None]:
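# Sort by STD because the loop below assumes the legs are in chronological order.
df = pd.read_csv("edited_schedule.csv").sort_values("STD")

# Placeholder guesses for the delays; these could become distributions later (Monte Carlo).
# All times are assumed to be minutes on one common clock.
departure_delay = 10  # minutes added to every departure
inflight_delay = 5    # extra minutes in the air
min_ground_time = 45  # minimum minutes between an arrival and the next departure

previous_ata = None  # actual arrival time of the previous leg; None before the first leg
results = []

for row in df.to_dict(orient="records"):
    # The aircraft is ready at the later of the scheduled departure and
    # the previous arrival plus the minimum ground time.
    if previous_ata is None:
        ready_time = row["STD"]  # the first leg has no previous flight to wait for
    else:
        ready_time = max(row["STD"], previous_ata + min_ground_time)
    row["ATD"] = ready_time + departure_delay
    # Actual arrival = actual departure + block time + extra time in the air.
    row["ATA"] = row["ATD"] + row["Blocktime"] + inflight_delay

    previous_ata = row["ATA"]
    row["DepartureDelay"] = row["ATD"] - row["STD"]
    row["ArrivalDelay"] = row["ATA"] - row["STA"]

    results.append(row)

df_result = pd.DataFrame(results)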

df_result.head()
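A small worked example helps show how a late arrival knocks on to the next leg. The sketch below applies the same recurrence to a made-up three-leg rotation in plain Python. The leg values and the `propagate_delays` helper are hypothetical, times are assumed to be minutes on one common clock, and the first leg is treated as having no previous flight to wait for, which is an assumption of this sketch.

```python
def propagate_delays(legs, departure_delay=10, inflight_delay=5, min_ground_time=45):
    """Propagate delays through legs that are already in chronological order."""
    previous_ata = None  # no previous arrival before the first leg
    results = []
    for leg in legs:
        if previous_ata is None:
            ready_time = leg["STD"]
        else:
            # Wait for the previous arrival plus the minimum ground time.
            ready_time = max(leg["STD"], previous_ata + min_ground_time)
        atd = ready_time + departure_delay
        ata = atd + leg["Blocktime"] + inflight_delay
        results.append({
            **leg,
            "ATD": atd,
            "ATA": ata,
            "DepartureDelay": atd - leg["STD"],
            "ArrivalDelay": ata - leg["STA"],
        })
        previous_ata = ata
    return results


# Hypothetical rotation: times in minutes after midnight.
legs = [
    {"LegId": 1, "STD": 480, "STA": 570, "Blocktime": 90},
    {"LegId": 2, "STD": 620, "STA": 700, "Blocktime": 80},  # only 50 min after leg 1's STA
    {"LegId": 3, "STD": 800, "STA": 890, "Blocktime": 90},  # long gap absorbs the delay
]

for leg in propagate_delays(legs):
    print(leg["LegId"], leg["DepartureDelay"], leg["ArrivalDelay"])
# 1 10 15
# 2 20 25
# 3 10 15
```

Leg 1 arrives at 585, so the aircraft is not ready until 630, ten minutes after leg 2's scheduled departure. That pushes leg 2's departure delay to 20 minutes. The longer gap before leg 3 absorbs the knock-on, and its delays fall back to the baseline 10 and 15 minutes.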

### Save and Check Results

Save the output and do a quick plot to see what the arrival delays look like.

In [None]:
# save the results to a file
df_result.to_csv("result.csv", index=False)

print("Result saved to result.csv")

# Quick visualization to showcase the delay distribution
df_result['ArrivalDelay'].hist(bins=15)
plt.title('Distribution of Arrival Delays')
plt.xlabel('Delay (minutes)')
plt.ylabel('Number of Flights')
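The delay values in the simulation are fixed guesses, so the histogram only shows knock-on effects. The comment in the simulation cell floats Monte Carlo as a later option; a rough sketch of that idea follows. Everything here is an assumption rather than part of the notebook's method: the exponential distributions, their means (10 and 5 minutes), the hypothetical legs, and the `simulate_once` and `monte_carlo` helpers. Times are again assumed to be minutes on one common clock.

```python
import random
import statistics


def simulate_once(legs, rng, mean_departure_delay=10.0, mean_inflight_delay=5.0,
                  min_ground_time=45):
    """One pass over the rotation with random delays; returns arrival delay per leg."""
    previous_ata = None
    arrival_delays = []
    for leg in legs:
        if previous_ata is None:
            ready_time = leg["STD"]
        else:
            ready_time = max(leg["STD"], previous_ata + min_ground_time)
        # Assumed distributions: exponential with the given means.
        atd = ready_time + rng.expovariate(1.0 / mean_departure_delay)
        ata = atd + leg["Blocktime"] + rng.expovariate(1.0 / mean_inflight_delay)
        arrival_delays.append(ata - leg["STA"])
        previous_ata = ata
    return arrival_delays


def monte_carlo(legs, runs=1000, seed=0):
    """Repeat the simulation with a seeded generator so results are reproducible."""
    rng = random.Random(seed)
    return [simulate_once(legs, rng) for _ in range(runs)]


# Hypothetical rotation: times in minutes after midnight.
legs = [
    {"LegId": 1, "STD": 480, "STA": 570, "Blocktime": 90},
    {"LegId": 2, "STD": 620, "STA": 700, "Blocktime": 80},
    {"LegId": 3, "STD": 800, "STA": 890, "Blocktime": 90},
]

samples = monte_carlo(legs, runs=1000, seed=0)
last_leg_delays = [run[-1] for run in samples]
print(f"mean arrival delay on last leg: {statistics.mean(last_leg_delays):.1f} min")
```

Each run produces one arrival delay per leg, so plotting `last_leg_delays` (or any other leg) gives a genuine distribution instead of a few fixed values.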