## **Flight Data Processing Code Explanation**

### **Endpoints and Data Sources**
1. **OpenSky API (`https://opensky-network.org/api/states/all`)**
   - Provides real-time flight data including `icao24` (aircraft ID), `callsign` (flight ID), position, altitude, and velocity.
   - Used to fetch live flight data.

2. **OpenFlights Dataset (`https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat`)**
   - Contains airline details such as `Name`, `ICAO`, and `Active` status.
   - Used for dropdown selection of active airlines.

3. **OurAirports Dataset (`https://ourairports.com/data/airports.csv`)**
   - Contains airport metadata such as `ident`, `name`, `type`, and coordinates (`latitude_deg`, `longitude_deg`).
   - Filtered to `large_airport` entries and used to find the airport nearest to each flight.
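
Downloading every state vector worldwide and filtering it locally is the heaviest network step. The OpenSky `states/all` endpoint also accepts a bounding box through the `lamin`, `lomin`, `lamax`, and `lomax` query parameters (decimal degrees). The minimal standard-library sketch below only builds such a URL. The helper name `buildStatesUrl` and the example box around Ireland are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://opensky-network.org/api/states/all"


def buildStatesUrl(lamin, lomin, lamax, lomax):
    """Hypothetical helper: build a states/all URL restricted to a lat/lon box."""
    # Reject boxes that are inverted or outside valid coordinate ranges
    if not (-90 <= lamin < lamax <= 90):
        raise ValueError("latitude bounds must satisfy -90 <= lamin < lamax <= 90")
    if not (-180 <= lomin < lomax <= 180):
        raise ValueError("longitude bounds must satisfy -180 <= lomin < lomax <= 180")
    params = {"lamin": lamin, "lomin": lomin, "lamax": lamax, "lomax": lomax}
    return f"{BASE_URL}?{urlencode(params)}"


# Example: a rough box around Ireland (illustrative bounds)
url = buildStatesUrl(51.0, -11.0, 55.5, -5.0)
print(url)
```

The resulting URL could replace `BASE_URL` in the `requests.get()` call inside `fetchFlightData()`. The response keeps the same `{"time": ..., "states": [...]}` shape, only with fewer rows.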

---

### **Key Functions**
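1. **`loadAirports()`**: Loads the OurAirports dataset and keeps only `large_airport` entries.
2. **`getAirlineDropdownData()`**: Prepares active airlines that have an ICAO code for dropdown selection, labeled `Name - ICAO`.
3. **`fetchFlightData()`**: Fetches all current state vectors from the OpenSky API (returns `None` on failure).
4. **`filterByAirline()`**: Keeps flights whose callsign starts with the airline's ICAO code.
5. **`findNearestAirport()`**: Returns the name, identifier, and geodesic distance (km) of the large airport closest to a position.
6. **`convertTimestampToHour()`**: Converts Unix timestamps to `HH:MM:SS` in UTC.
7. **`processFlights()`**:
   - Combines the steps above: filters flights, formats timestamps, converts speed to km/h, and looks up the nearest airport.
   - Returns a processed DataFrame.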
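
`findNearestAirport()` computes a geopy geodesic distance to every large airport for every flight, which gets slow with hundreds of flights. A common, slightly less precise substitute is the haversine great-circle distance on a spherical Earth. The sketch below swaps in haversine; the 6371 km mean radius, the `haversineKm`/`nearestAirport` names, and the approximate airport coordinates are illustrative assumptions. The test position is RYR52DK from the notebook output, which the notebook matched to Dublin Airport.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversineKm(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearestAirport(lat, lon, airports):
    """Return (name, distance_km) of the closest airport in a list of (name, lat, lon)."""
    best = min(airports, key=lambda ap: haversineKm(lat, lon, ap[1], ap[2]))
    return best[0], haversineKm(lat, lon, best[1], best[2])


# Approximate airport coordinates, for illustration only
airports = [
    ("Dublin Airport", 53.4213, -6.2701),
    ("Manchester Airport", 53.3537, -2.2750),
    ("Heathrow Airport", 51.4700, -0.4543),
]

# Position of RYR52DK taken from the notebook output
name, km = nearestAirport(53.2096, -7.0473, airports)
print(f"{name}: {km:.0f} km")
```

Haversine can differ from geopy's ellipsoidal geodesic by a fraction of a percent, which rarely changes which airport comes out nearest.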

---

### **Output DataFrame**
| **Column**            | **Description**                                      |
|-----------------------|-----------------------------------------------------|
| `icao24`              | Unique aircraft transponder code.                   |
| `callsign`            | Flight identifier (e.g., RYR997F).                  |
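| `departingFrom`       | Country inferred from the ICAO 24-bit address (OpenSky `origin_country`), i.e. the registration country, not the departure airport. |
| `timePosition`        | Time of the last known position, `HH:MM:SS` UTC.    |
| `longitude`           | Aircraft longitude (decimal degrees).               |
| `latitude`            | Aircraft latitude (decimal degrees).                |
| `altitude`            | Barometric altitude in meters (empty if not reported). |
| `speedKmh`            | Ground speed in km/h (OpenSky `velocity` in m/s × 3.6). |
| `estimatedArrivalAt`  | Large airport nearest to the current position; not the flight's actual destination. |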
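
To show where each output column comes from, the pure-Python sketch below maps one raw OpenSky state vector to an output row, applying the same conversions as the notebook (strip the callsign, format the timestamp in UTC, multiply m/s by 3.6). The helper `toOutputRow` and the sample vector are hypothetical. Position, altitude, and speed echo the RYR52DK row of the notebook output (176.53 m/s = 635.508 km/h); the timestamps are made up.

```python
from datetime import datetime, timezone

# Positions of the fields used by the output table within an OpenSky state vector
IDX = {"icao24": 0, "callsign": 1, "origin_country": 2, "time_position": 3,
       "longitude": 5, "latitude": 6, "baro_altitude": 7, "velocity": 9}


def toOutputRow(state, nearestAirportName=None):
    """Hypothetical helper: build one output-table row from a raw state vector."""
    ts = state[IDX["time_position"]]
    velocity = state[IDX["velocity"]]
    callsign = state[IDX["callsign"]]
    return {
        "icao24": state[IDX["icao24"]],
        "callsign": callsign.strip() if callsign else None,  # callsigns may carry trailing spaces
        "departingFrom": state[IDX["origin_country"]],
        "timePosition": (datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")
                         if ts is not None else None),
        "longitude": state[IDX["longitude"]],
        "latitude": state[IDX["latitude"]],
        "altitude": state[IDX["baro_altitude"]],
        "speedKmh": velocity * 3.6 if velocity is not None else None,  # m/s -> km/h
        "estimatedArrivalAt": nearestAirportName,
    }


# Illustrative 17-field state vector (timestamps made up)
state = ["4ca8d6", "RYR52DK ", "Ireland", 1700000000, 1700000000,
         -7.0473, 53.2096, 7391.4, False, 176.53,
         None, None, None, None, None, False, 0]
print(toOutputRow(state, "Dublin Airport"))
```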

---

### **Purpose**
This code processes real-time flight data for a user-selected airline and provides:
- Aircraft positions (latitude and longitude) that can be plotted on a map.
- A DataFrame summarizing key flight details.
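
The notebook itself stops at the DataFrame. One hedged way to put the positions on a map is to convert each row into a GeoJSON `Point` feature, a format web-mapping libraries such as Leaflet can load. Per RFC 7946, GeoJSON coordinates are ordered longitude, then latitude. The function name `flightsToGeoJson` is hypothetical, and the sample rows are taken from the notebook output.

```python
import json


def flightsToGeoJson(rows):
    """Build a GeoJSON FeatureCollection of Point features from flight dicts."""
    features = []
    for row in rows:
        if row.get("longitude") is None or row.get("latitude") is None:
            continue  # skip flights without a reported position
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) coordinate order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [row["longitude"], row["latitude"]]},
            "properties": {k: v for k, v in row.items() if k not in ("longitude", "latitude")},
        })
    return {"type": "FeatureCollection", "features": features}


rows = [
    {"callsign": "RYR9ND", "longitude": -6.1738, "latitude": 46.1286, "altitude": 11475.72},
    {"callsign": "RYR946", "longitude": 16.5596, "latitude": 48.1173, "altitude": None},
]
print(json.dumps(flightsToGeoJson(rows), indent=2))
```

With pandas, `flights.to_dict("records")` yields dicts of this shape. Missing altitudes arrive as `NaN`, which `json.dumps` writes as the non-standard `NaN` token, so replace them with `None` first.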

```python
from datetime import datetime, timezone

import pandas as pd
import requests
from geopy.distance import geodesic

# OpenSky API Base URL
BASE_URL = "https://opensky-network.org/api/states/all"

# OpenFlights URL (no header row; "\N" marks missing values)
OPENFLIGHTS_URL = "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat"

# OurAirports URL
AIRPORTS_URL = "https://ourairports.com/data/airports.csv"

# Columns returned by processFlights()
DISPLAY_COLUMNS = [
    "icao24", "callsign", "departingFrom", "timePosition",
    "longitude", "latitude", "altitude", "speedKmh", "estimatedArrivalAt"
]

# Load large airports only (keeps the nearest-airport search small)
def loadAirports():
    airports = pd.read_csv(
        AIRPORTS_URL,
        usecols=['ident', 'name', 'latitude_deg', 'longitude_deg', 'type']
    )
    return airports[airports['type'] == 'large_airport']

# Fetch airline data for dropdown
def getAirlineDropdownData():
    try:
        # Define column names for OpenFlights dataset
        columnNames = ["AirlineID", "Name", "Alias", "IATA", "ICAO", "Callsign", "Country", "Active"]

        # Load airline data
        airlinesDf = pd.read_csv(
            OPENFLIGHTS_URL,
            header=None,
            names=columnNames,
            na_values=["\\N"],
            usecols=["Name", "ICAO", "Active"]
        )

        # Keep only active airlines that have an ICAO code
        airlinesDf = airlinesDf[(airlinesDf["Active"] == "Y") & (airlinesDf["ICAO"].notna())]

        # Normalize data and prepare for dropdown
        airlinesDf = airlinesDf.rename(columns={"ICAO": "ShortName", "Name": "LongName"})
        airlinesDf["LongName"] = airlinesDf["LongName"].str.strip()

        # Flag names starting with a digit (na=False keeps the mask boolean if a name is missing)
        startsWithNumber = airlinesDf["LongName"].str.match(r"\d", na=False)

        # Sort alphabetically and place numeric starters at the end
        sortedAirlines = pd.concat([
            airlinesDf[~startsWithNumber].sort_values(by="LongName"),
            airlinesDf[startsWithNumber].sort_values(by="LongName")
        ]).reset_index(drop=True)

        # Create concatenated column for dropdown
        sortedAirlines["Airline"] = sortedAirlines["LongName"] + " - " + sortedAirlines["ShortName"]

        return sortedAirlines[["ShortName", "Airline"]]
    except Exception as e:
        print(f"Error loading OpenFlights data: {e}")
        return pd.DataFrame(columns=["ShortName", "Airline"])

# Fetch flight data from OpenSky API
def fetchFlightData(timeout=30):
    try:
        # A timeout prevents the notebook from hanging on a stalled connection
        response = requests.get(BASE_URL, timeout=timeout)
        if response.status_code == 200:
            return response.json()
        print(f"Error: {response.status_code} - {response.text}")
        return None
    except (requests.RequestException, ValueError) as e:
        # Network errors and invalid JSON bodies
        print(f"An error occurred: {e}")
        return None

# Filter flights by airline code (callsign prefix)
def filterByAirline(flightsDf, airlineCode):
    flightsDf = flightsDf.copy()
    flightsDf.loc[:, 'callsign'] = flightsDf['callsign'].str.strip()
    return flightsDf[flightsDf['callsign'].str.startswith(airlineCode, na=False)]

# Find the nearest airport: returns (name, ident, distance in km)
def findNearestAirport(lat, lon, airports):
    flightPosition = (lat, lon)
    distances = airports.apply(
        lambda row: geodesic(flightPosition, (row['latitude_deg'], row['longitude_deg'])).kilometers,
        axis=1
    )
    nearestIndex = distances.idxmin()
    nearestAirport = airports.loc[nearestIndex]
    return nearestAirport['name'], nearestAirport['ident'], distances[nearestIndex]

# Convert a Unix timestamp to a UTC time string (HH:MM:SS)
def convertTimestampToHour(unixTime):
    if pd.notna(unixTime):
        return datetime.fromtimestamp(int(unixTime), tz=timezone.utc).strftime('%H:%M:%S')
    return None

# Process flight data for a specific airline
def processFlights(selectedAirlineCode):
    # Load airport data
    airports = loadAirports()

    # Fetch OpenSky flight data ("states" can be null when nothing is reported)
    openskyData = fetchFlightData()
    if not openskyData or not openskyData.get("states"):
        print("No flight data available.")
        return pd.DataFrame(columns=DISPLAY_COLUMNS)

    # Define columns (OpenSky state vector order) and create DataFrame
    columns = [
        "icao24", "callsign", "origin_country", "time_position", "last_contact",
        "longitude", "latitude", "baro_altitude", "on_ground", "velocity",
        "true_track", "vertical_rate", "sensors", "geo_altitude", "squawk",
        "spi", "position_source"
    ]
    flightsDf = pd.DataFrame(openskyData["states"], columns=columns)

    # Keep the selected airline's flights that report a position
    filteredFlights = (
        filterByAirline(flightsDf, selectedAirlineCode)
        .dropna(subset=["latitude", "longitude"])
        .copy()
    )
    if filteredFlights.empty:
        print(f"No positioned flights found for {selectedAirlineCode}.")
        return pd.DataFrame(columns=DISPLAY_COLUMNS)

    # Find the nearest large airport for each flight (dropna above guarantees a position)
    filteredFlights["nearestAirport"] = filteredFlights.apply(
        lambda row: findNearestAirport(row["latitude"], row["longitude"], airports)[0],
        axis=1
    )

    # Convert timestamps to readable UTC time
    filteredFlights["timePosition"] = filteredFlights["time_position"].apply(convertTimestampToHour)

    # Convert speed from m/s to km/h
    filteredFlights["speedKmh"] = filteredFlights["velocity"] * 3.6

    # Rename columns to the output names
    columnRenameMap = {
        "origin_country": "departingFrom",
        "baro_altitude": "altitude",
        "nearestAirport": "estimatedArrivalAt"
    }
    filteredFlights = filteredFlights.rename(columns=columnRenameMap)

    # Select specific columns for output
    return filteredFlights[DISPLAY_COLUMNS].reset_index(drop=True)
```

```python
# Fetch airlines and select Ryanair
airlines = getAirlineDropdownData()
selectedAirline = airlines[airlines["ShortName"] == "RYR"].reset_index(drop=True)
selectedAirline
```

|   | ShortName | Airline       |
|---|-----------|---------------|
| 0 | RYR       | Ryanair - RYR |

```python
# Process flights for the selected airline
flights = processFlights(selectedAirline.iloc[0]["ShortName"])

# Display the processed DataFrame
flights
```

|     | icao24 | callsign | departingFrom | timePosition | longitude | latitude | altitude | speedKmh | estimatedArrivalAt |
|-----|--------|----------|---------------|--------------|-----------|----------|----------|----------|--------------------|
| 0   | 4ca9c1 | RYR9ND   | Ireland | 13:45:47 | -6.1738 | 46.1286 | 11475.72 | 910.728 | Santiago-Rosalía de Castro Airport |
| 1   | 4ca9a9 | RYR32JR  | Ireland | 13:45:47 | -3.1138 | 46.1809 | 11277.60 | 781.596 | Bordeaux-Mérignac Airport |
| 2   | 4ca9ed | RYR571Z  | Ireland | 13:45:46 | 17.8079 | 49.3029 | 8001.00 | 744.516 | M. R. Štefánik Airport |
| 3   | 4ca9ea | RYR29VV  | Ireland | 13:45:48 | -2.4941 | 54.4559 | 10629.90 | 795.852 | Manchester Airport |
| 4   | 4ca9cf | RYR2RM   | Ireland | 13:45:38 | -2.8107 | 40.9169 | 3200.40 | 409.608 | Adolfo Suárez Madrid–Barajas Airport |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 308 | 48c224 | RYR9922  | Poland | 13:45:48 | 17.3966 | 50.2789 | 10972.80 | 722.736 | Kraków John Paul II International Airport |
| 309 | 48c221 | RYR652B  | Poland | 13:45:48 | 23.8422 | 54.7927 | 3101.34 | 481.320 | Vilnius International Airport |
| 310 | 48c222 | RYR946   | Poland | 13:45:25 | 16.5596 | 48.1173 | NaN | 48.168 | Vienna International Airport |
| 311 | 4caa57 | RYR58LG  | Ireland | 13:45:44 | 33.8950 | 29.2583 | 10363.20 | 641.304 | Ramon International Airport |
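
The altitudes look oddly precise in meters most likely because aircraft report barometric altitude in feet and OpenSky converts it: 11277.60 m is exactly 37,000 ft, and 10972.80 m is exactly 36,000 ft. A small hypothetical helper converts back to feet for readability, treating `None` and `NaN` (as in the RYR946 row) as missing.

```python
def metresToFeet(metres):
    """Convert meters to whole feet (the international foot is exactly 0.3048 m)."""
    if metres is None or metres != metres:  # None or NaN (NaN != NaN)
        return None
    return round(metres / 0.3048)


# Altitudes from rows of the output above
for callsign, alt in [("RYR9ND", 11475.72), ("RYR32JR", 11277.60),
                      ("RYR2RM", 3200.40), ("RYR946", float("nan"))]:
    print(callsign, metresToFeet(alt))
```

Column-wise in pandas, `(flights["altitude"] / 0.3048).round()` does the same and leaves missing values as `NaN`.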


```python
flights.head(50)
```

|   | icao24 | callsign | departingFrom | timePosition | longitude | latitude | altitude | speedKmh | estimatedArrivalAt |
|---|--------|----------|---------------|--------------|-----------|----------|----------|----------|--------------------|
| 0 | 4ca9c1 | RYR9ND  | Ireland | 13:45:47 | -6.1738 | 46.1286 | 11475.72 | 910.728 | Santiago-Rosalía de Castro Airport |
| 1 | 4ca9a9 | RYR32JR | Ireland | 13:45:47 | -3.1138 | 46.1809 | 11277.6 | 781.596 | Bordeaux-Mérignac Airport |
| 2 | 4ca9ed | RYR571Z | Ireland | 13:45:46 | 17.8079 | 49.3029 | 8001.0 | 744.516 | M. R. Štefánik Airport |
| 3 | 4ca9ea | RYR29VV | Ireland | 13:45:48 | -2.4941 | 54.4559 | 10629.9 | 795.852 | Manchester Airport |
| 4 | 4ca9cf | RYR2RM  | Ireland | 13:45:38 | -2.8107 | 40.9169 | 3200.4 | 409.608 | Adolfo Suárez Madrid–Barajas Airport |
| 5 | 4ca9cc | RYR802R | Ireland | 13:45:47 | -6.6968 | 45.96 | 11277.6 | 772.38 | Santiago-Rosalía de Castro Airport |
| 6 | 4ca97b | RYR44QX | Ireland | 13:45:48 | -2.0235 | 53.2331 | 11277.6 | 868.572 | Manchester Airport |
| 7 | 4ca8e8 | RYR4MT  | Ireland | 13:45:48 | 17.2416 | 49.5469 | 11765.28 | 871.344 | M. R. Štefánik Airport |
| 8 | 4ca8af | RYR1SV  | Ireland | 13:45:48 | 8.2663 | 41.54 | 10226.04 | 878.652 | Nice-Côte d'Azur Airport |
| 9 | 4ca8d6 | RYR52DK | Ireland | 13:45:48 | -7.0473 | 53.2096 | 7391.4 | 635.508 | Dublin Airport |


### **Column Descriptions for OpenSky Network API Data**

1. **icao24**:
   - The aircraft's unique 24-bit transponder address, allocated under the ICAO scheme and given as a six-character hex string (e.g. `4ca9c1`).
   - This acts as a unique identifier for each aircraft.

2. **callsign**:
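   - The callsign used in radio communication (up to 8 characters, possibly with trailing spaces; can be `null`).
   - Example: "RYR123" for a Ryanair flight. The first three letters are the airline's ICAO designator, which is what `filterByAirline()` matches on.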

3. **origin_country**:
   - The country inferred from the aircraft's ICAO 24-bit address, i.e. its country of registration, not the country the flight departed from.
   - Example: "Ireland" for Ryanair flights.

4. **time_position**:
   - The timestamp (in seconds since Unix epoch) of the last known position.
   - `null` if OpenSky received no position report within the past 15 seconds.

5. **last_contact**:
   - The timestamp (in seconds since Unix epoch) of the last message received by the server from the transponder.
   - Useful for determining the recency of the data.

6. **longitude**:
   - The geographic longitude of the aircraft's position in degrees.
   - Example: `-6.2675` for a position near Dublin, Ireland.

7. **latitude**:
   - The geographic latitude of the aircraft's position in degrees.
   - Example: `53.4273` for a position near Dublin, Ireland.

8. **baro_altitude**:
   - The altitude of the aircraft as measured by barometric pressure, in meters.
   - May be `null` if the altitude data is unavailable.

9. **on_ground**:
   - A boolean indicating whether the aircraft is currently on the ground.
   - `true` if the aircraft is on the ground; `false` otherwise.

10. **velocity**:
    - The ground speed of the aircraft in meters per second (m/s).
    - Example: `200.5` for an aircraft moving at 200.5 m/s (721.8 km/h after multiplying by 3.6).

11. **true_track**:
    - The aircraft's true track (direction of movement over the ground) in degrees clockwise from true north.
    - Example: `90.0` for eastward movement.

12. **vertical_rate**:
    - The rate of climb or descent in meters per second (m/s).
    - Positive values indicate climbing; negative values indicate descending.

13. **sensors**:
    - A list of IDs of the receivers that contributed to the state vector.
    - `null` unless the request filtered by specific sensor serials.

14. **geo_altitude**:
    - The geometric altitude of the aircraft above mean sea level, in meters.
    - Example: `10500.0` for an aircraft flying at 10,500 meters.

15. **squawk**:
    - The transponder squawk code assigned to the aircraft by air traffic control.
    - Example: `7500` for a hijack alert.

16. **spi**:
    - A boolean indicating whether the aircraft has a Special Purpose Indicator (SPI) set.
    - `true` if SPI is set; `false` otherwise.

17. **position_source**:
    - The source of the aircraft's position data:
      - `0`: ADS-B.
      - `1`: ASTERIX.
      - `2`: MLAT (Multilateration).
      - `3`: FLARM.
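
The field order listed above is the order of values in each `states` entry of the API response. A minimal sketch, assuming the standard 17-field vector, that turns one entry into a named record and decodes two coded fields: `position_source` (OpenSky's REST API documentation also lists `3` = FLARM) and the internationally reserved emergency squawks 7500, 7600, and 7700. The names `decodeState`, `positionSourceName`, and `emergency` are hypothetical. The sample vector echoes the RYR52DK row above, while its timestamps and the `7700` squawk are made up.

```python
# Field order of a standard OpenSky state vector, as listed above
STATE_FIELDS = [
    "icao24", "callsign", "origin_country", "time_position", "last_contact",
    "longitude", "latitude", "baro_altitude", "on_ground", "velocity",
    "true_track", "vertical_rate", "sensors", "geo_altitude", "squawk",
    "spi", "position_source",
]

# Decoded position_source values
POSITION_SOURCES = {0: "ADS-B", 1: "ASTERIX", 2: "MLAT", 3: "FLARM"}

# Internationally reserved emergency squawk codes
EMERGENCY_SQUAWKS = {"7500": "hijack", "7600": "radio failure", "7700": "general emergency"}


def decodeState(state):
    """Turn one state vector (a list) into a dict keyed by field name.

    zip() stops at the shorter sequence, so any extra trailing fields are ignored.
    """
    record = dict(zip(STATE_FIELDS, state))
    record["positionSourceName"] = POSITION_SOURCES.get(record.get("position_source"), "unknown")
    squawk = record.get("squawk")
    # str() tolerates squawks delivered as integers as well as strings
    record["emergency"] = EMERGENCY_SQUAWKS.get(str(squawk)) if squawk is not None else None
    return record


# Illustrative state vector (timestamps and squawk are made up)
state = ["4ca8d6", "RYR52DK ", "Ireland", 1700000000, 1700000000,
         -7.0473, 53.2096, 7391.4, False, 176.53,
         None, None, None, None, "7700", False, 0]

record = decodeState(state)
print(record["callsign"].strip(), record["positionSourceName"], record["emergency"])
```

Because `on_ground` is preserved in the record, the same dicts can be filtered to airborne traffic with `record["on_ground"] is False` before any nearest-airport lookup.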
