# Experiment: Effect of ASK on CO₂ Emissions

## Definition
ASK (Available Seat Kilometers) is a key aviation metric that quantifies the total passenger capacity offered by an airline or across the global air transport network. It combines two quantities:

Available Seats: The number of seats on an aircraft that are available for passengers (not accounting for whether they are actually occupied).

Kilometers: The distance flown for a specific route.

ASK = Number of Available Seats × Distance Flown (in kilometers), summed over all flights when a total is reported.

For example:

A flight with 200 seats on a 1,000 km route offers 200 × 1,000 = 200,000 ASK.
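
As a minimal sketch of the formula above, ASK for one flight and for a small set of flights could be computed as follows. The function name `available_seat_km` and the sample flights are illustrative and not taken from the study.

```python
def available_seat_km(seats: int, distance_km: float) -> float:
    """ASK for one flight: seats offered times distance flown in km."""
    return seats * distance_km


# The worked example from the text: 200 seats on a 1,000 km route.
single_flight_ask = available_seat_km(200, 1_000)
print(single_flight_ask)  # 200000

# A total ASK is the sum over flights; these (seats, km) pairs are made up.
flights = [(200, 1_000), (150, 400), (300, 6_000)]
total_ask = sum(available_seat_km(s, d) for s, d in flights)
print(total_ask)  # 2060000
```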

## Capacity Measurement
ASK reflects the total "supply" of air travel capacity. It helps airlines and researchers understand how much seating capacity is offered across routes, regions, or globally.

## Environmental Impact
In the study, ASK is used to estimate CO₂ emissions. Higher ASK, whether from more seats, longer distances, or both, generally means more fuel burned, so ASK links traffic volume to environmental impact.
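
One way to make that link concrete is to convert fuel burn into CO₂ and divide by ASK. The sketch below is a fuel-based approximation, not the study's method: it assumes fuel burn is given in kilograms and uses the commonly cited factor of 3.16 kg CO₂ per kg of jet fuel. The function name and the input values are illustrative.

```python
# Assumption: 3.16 kg of CO2 is emitted per kg of jet fuel burned.
CO2_PER_KG_FUEL = 3.16


def co2_per_ask(fuel_burn_kg: float, ask: float) -> float:
    """Return kg of CO2 emitted per available seat kilometer."""
    co2_kg = fuel_burn_kg * CO2_PER_KG_FUEL
    return co2_kg / ask


# Illustrative values: 12,000 kg of fuel burned while offering 200,000 ASK.
intensity = co2_per_ask(fuel_burn_kg=12_000, ask=200_000)
print(f"{intensity * 1000:.1f} g CO2 per ASK")  # 189.6 g CO2 per ASK
```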

## Validation of Data
The study compares ASK values from its open-source dataset with commercial references (e.g., OAG, IATA) to assess accuracy (Section 3 of the paper). For instance:

The study's ASK total (10,664 billion) is 101.6% of OAG’s reference value, a close match.

ASK is more reliable than raw seat counts because it accounts for flight distance, which is critical for emissions calculations.
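
To illustrate why distance matters, the sketch below compares each flight's share of seats with its share of ASK. The two flights are hypothetical: they offer the same number of seats but cover very different distances.

```python
# Two made-up flights with identical seat counts but different distances.
flights = [
    {"route": "short-haul", "seats": 180, "distance_km": 500},
    {"route": "long-haul", "seats": 180, "distance_km": 6_000},
]

for flight in flights:
    flight["ask"] = flight["seats"] * flight["distance_km"]

total_seats = sum(f["seats"] for f in flights)
total_ask = sum(f["ask"] for f in flights)

for flight in flights:
    seat_share = flight["seats"] / total_seats
    ask_share = flight["ask"] / total_ask
    print(f'{flight["route"]}: {seat_share:.1%} of seats, {ask_share:.1%} of ASK')
# short-haul: 50.0% of seats, 7.7% of ASK
# long-haul: 50.0% of seats, 92.3% of ASK
```

Seat counts treat both flights as equal, while ASK assigns most of the capacity to the long-haul flight, which is also where most of the fuel would be burned.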

In [None]:
import pandas as pd
import numpy as np
import joblib
import matplotlib.pyplot as plt

In [2]:
# Load trained model
model = joblib.load("../2.Models/random_forest_model.pkl")

FileNotFoundError: [Errno 2] No such file or directory: '../2.Models/random_forest_model.pkl'
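
The error above means the relative path did not resolve from the notebook's working directory. A minimal sketch of a guarded load that reports the resolved path is shown below. The helper name `load_model` is hypothetical, and the correct location of the model file is not known from this notebook.

```python
from pathlib import Path


def load_model(path):
    """Load a joblib-pickled model, failing early with the resolved path."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {path.resolve()} "
            f"(current working directory: {Path.cwd()})"
        )
    import joblib  # imported lazily so the path check runs even without joblib

    return joblib.load(path)
```

Calling `load_model("../2.Models/random_forest_model.pkl")` from the wrong directory would then show the absolute path that was tried, which makes the mismatch easier to spot.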

In [3]:
# Training columns (from X_train)
X_train = pd.read_parquet(
    "https://github.com/monatagelsir7/enivornmental_impact_of_aviation/raw/refs/heads/main/Test-Train-Validation%20Data/X_train.parquet"
)
expected_columns = X_train.columns.tolist()


In [4]:
print(X_train.columns)

Index(['seats', 'n_flights', 'domestic', 'ask', 'rpk', 'fuel_burn',
       'acft_class_NB', 'acft_class_OTHER', 'acft_class_PJ', 'acft_class_PP',
       ...
       'arrival_country_YE', 'arrival_country_YT', 'arrival_country_ZA',
       'arrival_country_ZM', 'arrival_country_ZW', 'arrival_continent_AS',
       'arrival_continent_EU', 'arrival_continent_NA', 'arrival_continent_OC',
       'arrival_continent_SA'],
      dtype='object', length=490)


In [None]:


# Input preparation function
def prepare_input(raw_dict, expected_columns):
    """One-hot encode a single raw record and align it to the training columns."""
    df = pd.DataFrame([raw_dict])
    df_encoded = pd.get_dummies(df)

    # Align to the training layout in one step: columns missing from the input
    # are added as 0, columns unknown to the model are dropped, and the
    # training column order is restored.
    return df_encoded.reindex(columns=expected_columns, fill_value=0)


# Aircraft types to test
aircraft_list = [
    "PA32",
    "B712",
    "A320",
    "B737",
    "B752",
    "C402",
    "B739",
    "DH8D",
    "BN2P",
    "B738",
    "E75L",
    "A321",
    "E190",
    "C208",
    "MD87",
    "AT76",
    "DH8A",
    "LJ75",
    "A20N",
    "DHC6",
    "A319",
    "E170",
    "RJ1H",
    "A388",
    "E145",
    "CRJ2",
    "CRJ9",
    "C172",
    "E135",
    "CRJ7",
    "AT75",
    "CRJX",
    "MD90",
    "SF34",
    "B190",
    "B753",
    "PC12",
    "RJ85",
    "DHC3",
    "AT43",
    "PA31",
    "A318",
    "AC90",
    "E195",
    "CRJ1",
    "AT72",
    "B763",
    "B77W",
    "B772",
    "E75S",
    "DH8C",
    "B788",
    "A332",
    "B744",
    "AT45",
    "BCS3",
    "A333",
    "A346",
    "A359",
    "B736",
    "B764",
    "BCS1",
    "GA8",
    "B735",
    "A21N",
    "B789",
    "D328",
    "SB20",
    "DHC2",
    "B77L",
    "B748",
    "B78X",
    "B733",
    "B38M",
    "J328",
    "CL60",
    "DH8B",
    "A343",
    "B734",
    "B74R",
    "JS31",
    "SU95",
    "A342",
    "B762",
    "A35K",
    "A339",
    "B39M",
    "BE20",
    "BE99",
    "A306",
    "R44",
    "MD11",
    "C180",
    "F100",
    "A310",
    "B350",
    "C30J",
    "A400",
    "T37",
    "A139",
    "C56X",
    "BE35",
    "C212",
    "E45X",
    "H500",
    "AS50",
    "GLEX",
    "A345",
    "GL5T",
    "BE9L",
    "C441",
    "FA7X",
    "PA30",
    "CL30",
    "A119",
    "BE55",
    "F2TH",
    "P180",
    "F900",
    "LJ45",
    "GLF6",
    "DC10",
    "SW4",
    "H25B",
    "B732",
    "B722",
    "FA8X",
    "TBM9",
    "GA5C",
    "C500",
    "G150",
    "BE10",
    "LJ60",
    "MU2",
    "GALX",
    "GLF4",
    "C130",
    "KODI",
    "E120",
    "BE40",
    "LJ35",
    "TBM7",
    "DC93",
    "D228",
    "E35L",
    "FA5X",
    "DC91",
    "P46T",
    "MD88",
    "TBM8",
    "SW3",
    "BE30",
    "DH2T",
    "C425",
]
predictions = []

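# Base sample input
# Country and continent values use two-letter codes, matching the one-hot
# column names printed above (e.g. arrival_country_ZA, arrival_continent_EU).
# Full names such as "France" or "Europe" would produce dummy columns that
# match no training column and would be dropped by prepare_input. The specific
# codes FR and DE are assumed to exist among the training columns.
# seats, n_flights and rpk are not set here, so prepare_input fills them with 0.
base_input = {
    "airline_iata": "AF",
    "acft_class": "NB",
    "departure_country": "FR",
    "departure_continent": "EU",
    "arrival_country": "DE",
    "arrival_continent": "EU",
    "domestic": 0,
    "ask": 200000,
    "fuel_burn": 12000,
    "iata_departure": "CDG",
    "iata_arrival": "FRA",
    "acft_icao": "A320",  # will vary this one
}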

# Run predictions
for acft in aircraft_list:
    row = base_input.copy()
    row["acft_icao"] = acft
    df_row = prepare_input(row, expected_columns)
    pred = model.predict(df_row)[0]
    predictions.append(pred)

# Plotting
# A wide figure with small, vertical tick labels keeps the many bars legible.
plt.figure(figsize=(28, 6))
plt.bar(aircraft_list, predictions)
plt.xlabel("Aircraft Type (ICAO Code)")
plt.ylabel("Predicted CO₂ per km")
plt.title("CO₂/km by Aircraft Type")
plt.xticks(rotation=90, fontsize=7)
plt.grid(True, axis="y")
plt.tight_layout()
plt.show()
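
Because the alignment step silently drops any dummy column the model was not trained on, a raw value spelled differently from the training categories contributes nothing to the prediction. The sketch below shows this on a toy column layout. `toy_columns` and `encode_and_report` are hypothetical names, and the four columns are a shortened stand-in for the real 490-column layout.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the training column layout.
toy_columns = ["ask", "fuel_burn", "arrival_continent_AS", "arrival_continent_EU"]


def encode_and_report(raw_dict, expected_columns):
    """One-hot encode one record; also return dummy columns unknown to training."""
    encoded = pd.get_dummies(pd.DataFrame([raw_dict]))
    unknown = sorted(set(encoded.columns) - set(expected_columns))
    aligned = encoded.reindex(columns=expected_columns, fill_value=0)
    return aligned, unknown


# Full name: produces "arrival_continent_Europe", which training never saw.
aligned_name, unknown_name = encode_and_report(
    {"ask": 200000, "fuel_burn": 12000, "arrival_continent": "Europe"}, toy_columns
)
print(unknown_name)  # ['arrival_continent_Europe']
print(int(aligned_name["arrival_continent_EU"].iloc[0]))  # 0

# Code: matches the training column, so the indicator is set.
aligned_code, unknown_code = encode_and_report(
    {"ask": 200000, "fuel_burn": 12000, "arrival_continent": "EU"}, toy_columns
)
print(unknown_code)  # []
print(int(aligned_code["arrival_continent_EU"].iloc[0]))  # 1
```

Reporting the unknown columns before predicting is one way to catch inputs that the model would otherwise ignore, including aircraft codes that never appeared in the training data.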