<a href="https://colab.research.google.com/github/mnpoliakov/MGMT467_Team7/blob/main/Unit2_Flights/Individual/Louis/Unit2_Louis_Baseline_BQML_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:

# --- Minimal setup (edit 3 vars) ---
from google.colab import auth
auth.authenticate_user()

import os
from google.cloud import bigquery

PROJECT_ID = "mgmt467-71800"      # e.g., mgmt-467-47888
REGION     = "us-central1"
TABLE_PATH = "flights_data.flights_raw"   # or your `bigquery-public-data.flights` table/view

os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["REGION"]     = REGION
bq = bigquery.Client(project=PROJECT_ID)

print("BQ Project:", PROJECT_ID)
print("Source table:", TABLE_PATH)


BQ Project: mgmt467-71800
Source table: flights_data.flights_raw


In [6]:

preview_sql = f"SELECT * FROM `{TABLE_PATH}` LIMIT 5"
bq.query(preview_sql).result().to_dataframe()


Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,Tail_Number,...,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum,string_field_109,string_field_110
0,2024,1,3,15,5,2024-03-15,9E,20363,9E,N906XJ,...,,,,,,,,,,
1,2024,1,1,7,7,2024-01-07,9E,20363,9E,N326PQ,...,,,,,,,,,,
2,2024,1,2,23,5,2024-02-23,9E,20363,9E,N310PQ,...,,,,,,,,,,
3,2024,1,3,27,3,2024-03-27,9E,20363,9E,N354CA,...,,,,,,,,,,
4,2024,1,2,12,1,2024-02-12,9E,20363,9E,N272PQ,...,,,,,,,,,,


# Task
Create a pre-departure baseline model using BigQuery ML from the `flights_data.flights_cleaned` table. The model should incorporate 'route' and 'month' as new features, train a logistic regression model with 'carrier', 'route', 'distance', 'day_of_week', and 'month' as features and 'is_arrival_delayed' as the target, then evaluate its performance using AUC and log_loss, generate a confusion matrix at a 0.5 threshold, analyze its calibration, and finally summarize all performance metrics.

## Prepare Features for Pre-departure Model

### Subtask:
Create the required features for the pre-departure model, including a 'route' feature (Origin-Dest) and 'month' from FL_DATE, which will then be used to create the baseline model. These operations will be performed directly in BigQuery SQL.


**Reasoning**:
Construct a BigQuery SQL query to create the `route` and `month` features, and select other required columns for the pre-departure model. This query will be stored in a variable for later use.
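BigQuery ML one-hot encodes STRING columns but standardizes INT64/FLOAT64 columns as numeric inputs. If `Month` and `DayOfWeek` stay integers, each one enters the model as a single linear term. The sketch below assumes you want them treated as categories and casts them to STRING in the feature query. The helper name `build_feature_sql` and its `categorical_calendar` flag are hypothetical, and the column names are the ones used later in this notebook.

```python
def build_feature_sql(table_path: str, categorical_calendar: bool = True) -> str:
    """Return a pre-departure feature query (hypothetical helper).

    When categorical_calendar is True, Month and DayOfWeek are cast to STRING so
    BigQuery ML one-hot encodes them instead of standardizing them as numbers.
    """
    if categorical_calendar:
        month = "CAST(Month AS STRING) AS Month"
        dow = "CAST(DayOfWeek AS STRING) AS DayOfWeek"
    else:
        month, dow = "Month", "DayOfWeek"
    return f"""SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    {month},
    {dow},
    Diverted
FROM
    `{table_path}`
WHERE
    Diverted IS NOT NULL
"""


print(build_feature_sql("flights_data.flights_raw"))
```

Whether the categorical version actually scores better is an empirical question. Compare both versions on held-out data before you adopt either one.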





In [13]:
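# First draft, superseded by the revised query later in the notebook:
# - `Diverted IS 1.0` is not valid GoogleSQL (IS only tests NULL, TRUE, or FALSE),
#   and `Diverted = 1.0` would keep only diverted flights, leaving a constant label.
# - DepDelay and ArrDelay are only known after departure/arrival, so they do not
#   belong in a pre-departure feature set.
create_features_sql = f"""SELECT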
    ReportingAirline,
    Distance,
    DepDelay,
    ArrDelay,
    Origin,
    CONCAT(OriginCityName, '-', DestCityName) AS Route,
    Diverted,
    Month,
    DayOfWeek
FROM
    `{TABLE_PATH}`
WHERE
    Diverted IS 1.0
"""

print(create_features_sql)

SELECT
    ReportingAirline,
    Distance,
    DepDelay,
    ArrDelay,
    Origin,
    CONCAT(OriginCityName, '-', DestCityName) AS Route,
    Diverted,
    Month,
    DayOfWeek
FROM
    `flights_data.flights_cleaned`
WHERE
    Diverted IS 1.0



### Train Pre-departure Baseline Model

#### Subtask:
Train a BigQuery ML logistic regression model using the prepared features (`carrier`, `route`, `distance`, `day_of_week`, and `month`) and `is_arrival_delayed` as the target variable.

#### Instructions:
1. Construct a BigQuery SQL query to create a logistic regression model named `pre_departure_baseline_model` within a specified dataset. This model should use the `create_features_sql` as its training data.
2. Specify the model options, including `MODEL_TYPE='LOGISTIC_REG'` and `INPUT_LABEL_COLS=['is_arrival_delayed']`.
3. Store this SQL query in a Python variable, for example, `model_training_sql`.


**Reasoning**:
To train the BigQuery ML model, I will construct a SQL query using the `CREATE MODEL` statement. This query will specify the model type, input label column, and use the previously defined `create_features_sql` as the training data, storing the resulting query in the `model_training_sql` variable.



In [17]:
model_training_sql = f"""CREATE OR REPLACE MODEL `{PROJECT_ID}.flights_data.pre_departure_baseline_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted']
)
AS
{create_features_sql}
"""

print(model_training_sql)

CREATE OR REPLACE MODEL `mgmt467-71800.flights_data.pre_departure_baseline_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted']
)
AS
SELECT
    ReportingAirline,
    Distance,
    DepDelay,
    ArrDelay,
    Origin,
    CONCAT(OriginCityName, '-', DestCityName) AS Route,
    Diverted,
    Month,
    DayOfWeek
FROM
    `flights_data.flights_cleaned`
WHERE
    Diverted IS 1.0




# Task
Prepare features for the pre-departure model by modifying `create_features_sql` to select `Reporting_Airline` as `carrier`, `Distance`, `CONCAT(OriginCityName, '-', DestCityName)` as `route`, `Month`, `DayOfWeek`, and `Diverted` from the `flights_data.flights_raw` table, including all rows where `Diverted` is not null.

## Prepare Features for Pre-departure Model (Revised)

### Subtask:
Refine the `create_features_sql` to select only schedule-level features and the target variable from the raw data, including all rows where `Diverted` is not null.


**Reasoning**:
I need to update the `create_features_sql` variable with the refined SQL query as per the instructions, changing the FROM clause, SELECT columns with aliases, and the WHERE clause.



In [19]:
create_features_sql = f"""SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL
"""

print(create_features_sql)

SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL



## Create BigQuery ML Model A (Pre-departure Baseline)

### Subtask:
Train a BigQuery ML logistic regression model with a `CREATE MODEL` statement. Use 'carrier', 'route', 'distance', 'day_of_week', and 'month' as features and 'Diverted' as the target variable.


**Reasoning**:
Construct the SQL query for training the BigQuery ML logistic regression model using the `create_features_sql` as the training data, specifying the model type and input label. This query will be stored in the `model_training_sql` variable.



In [20]:
model_training_sql = f"""CREATE OR REPLACE MODEL `{PROJECT_ID}.flights_data.pre_departure_baseline_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted'],
    auto_select_all_features=TRUE,
    L1_REG=0.1,
    LS_INIT_WEIGHTS='AUTO_ZERO'
)
AS
{create_features_sql}
"""

print(model_training_sql)

CREATE OR REPLACE MODEL `mgmt467-71800.flights_data.pre_departure_baseline_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted'],
    auto_select_all_features=TRUE,
    L1_REG=0.1,
    LS_INIT_WEIGHTS='AUTO_ZERO'
)
AS
SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL




**Reasoning**:
Execute the `model_training_sql` using the BigQuery client to train the model as per the instructions.



In [21]:
print('Training BigQuery ML model (this may take a few minutes)...')
bq.query(model_training_sql).result()
print('Model training complete!')

Training BigQuery ML model (this may take a few minutes)...


BadRequest: 400 unsupported option auto_select_all_features; reason: invalidQuery, location: query, message: unsupported option auto_select_all_features

Location: US
Job ID: 6e373565-c9cf-46fe-b12a-c41e813ec279


**Reasoning**:
The previous code failed because `auto_select_all_features` is not an option that `CREATE MODEL` accepts, which is what the error reports. `LS_INIT_WEIGHTS` is not a documented BigQuery ML option either. Both are removed from `model_training_sql`, and only the valid `L1_REG` setting is kept.
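One way to catch this kind of mistake before submitting a job is to check option names against a list of documented ones. The sketch below is a hypothetical helper. `KNOWN_LOGISTIC_REG_OPTIONS` is a partial, hand-copied subset of the options documented for `CREATE MODEL` with `MODEL_TYPE='LOGISTIC_REG'`, so verify it against the current BigQuery ML reference before you rely on it.

```python
# Partial, hand-maintained subset of documented LOGISTIC_REG options (verify against the docs).
KNOWN_LOGISTIC_REG_OPTIONS = {
    "MODEL_TYPE", "INPUT_LABEL_COLS", "OPTIMIZE_STRATEGY", "L1_REG", "L2_REG",
    "MAX_ITERATIONS", "LEARN_RATE_STRATEGY", "LEARN_RATE", "LS_INIT_LEARN_RATE",
    "EARLY_STOP", "MIN_REL_PROGRESS", "DATA_SPLIT_METHOD", "DATA_SPLIT_EVAL_FRACTION",
    "DATA_SPLIT_COL", "WARM_START", "AUTO_CLASS_WEIGHTS", "CLASS_WEIGHTS",
    "ENABLE_GLOBAL_EXPLAIN", "CALCULATE_P_VALUES", "CATEGORY_ENCODING_METHOD",
}


def unknown_options(options: dict) -> list:
    """Return option names (compared case-insensitively) that are not in the allowlist."""
    return sorted(k for k in options if k.upper() not in KNOWN_LOGISTIC_REG_OPTIONS)


# The options from the failed attempt above.
first_attempt = {
    "model_type": "LOGISTIC_REG",
    "input_label_cols": ["Diverted"],
    "auto_select_all_features": True,
    "L1_REG": 0.1,
    "LS_INIT_WEIGHTS": "AUTO_ZERO",
}
print(unknown_options(first_attempt))
```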



In [22]:
model_training_sql = f"""CREATE OR REPLACE MODEL `{PROJECT_ID}.flights_data.pre_departure_baseline_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted'],
    L1_REG=0.1
)
AS
{create_features_sql}
"""

print(model_training_sql)

CREATE OR REPLACE MODEL `mgmt467-71800.flights_data.pre_departure_baseline_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted'],
    L1_REG=0.1
)
AS
SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL




**Reasoning**:
Now that the `model_training_sql` has been corrected by removing unsupported options, the next step is to execute this SQL query to train the BigQuery ML model as per the instructions.



In [23]:
print('Training BigQuery ML model (this may take a few minutes)...')
bq.query(model_training_sql).result()
print('Model training complete!')

Training BigQuery ML model (this may take a few minutes)...
Model training complete!


## Evaluate Model A Performance (AUC, Log_Loss)

### Subtask:
Use `ML.EVALUATE` to assess the model's performance, focusing on Area Under the Curve (AUC) and log_loss, two key metrics for classification models. This gives an initial read on how well the model predicts diversion risk.


**Reasoning**:
Construct a BigQuery SQL query to evaluate the `pre_departure_baseline_model` using `ML.EVALUATE`, applying the same filtering criteria as used for training data, and store it in the `evaluate_model_sql` variable.
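Passing the same rows to `ML.EVALUATE` that were used for training gives in-sample metrics, which are likely to look better than the model would do on new flights. The sketch below builds a deterministic 80/20 split with GoogleSQL's `FARM_FINGERPRINT`. It assumes that `FlightDate`, `Reporting_Airline`, `Origin`, `Dest`, and `Tail_Number` together make a reasonable hash key. That key is an assumption: rows with identical keys always land in the same split. The helper `split_queries` is hypothetical.

```python
# Hypothetical hash key: assumes these columns are non-NULL except Tail_Number.
SPLIT_KEY = (
    "CONCAT(CAST(FlightDate AS STRING), Reporting_Airline, Origin, Dest, "
    "IFNULL(Tail_Number, ''))"
)


def split_queries(feature_sql: str, eval_buckets: int = 2) -> tuple[str, str]:
    """Return (train_sql, eval_sql) by appending a hash-bucket filter.

    Assumes feature_sql ends with a WHERE clause, as the queries in this notebook do.
    Each row hashes to one of 10 buckets. Buckets below 10 - eval_buckets are used
    for training and the rest for evaluation.
    """
    # ABS(MOD(...)) rather than MOD(ABS(...)) avoids overflow on the minimum INT64.
    bucket = f"ABS(MOD(FARM_FINGERPRINT({SPLIT_KEY}), 10))"
    cutoff = 10 - eval_buckets
    base = feature_sql.rstrip()
    return (
        f"{base}\n    AND {bucket} < {cutoff}\n",
        f"{base}\n    AND {bucket} >= {cutoff}\n",
    )


feature_sql = """SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL
"""
train_sql, eval_sql = split_queries(feature_sql)
print(eval_sql)
```

To use the split, train the model on `train_sql` and pass `eval_sql` to `ML.EVALUATE`. The reported AUC and log_loss are then out-of-sample.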



In [24]:
evaluate_model_sql = f"""SELECT
  * FROM ML.EVALUATE(
    MODEL `{PROJECT_ID}.flights_data.pre_departure_baseline_model`,
    (SELECT
        Reporting_Airline AS carrier,
        Distance,
        CONCAT(Origin, '-', Dest) AS route,
        Month,
        DayOfWeek,
        Diverted
    FROM
        `flights_data.flights_raw`
    WHERE
        Diverted IS NOT NULL
    )
)"""

print(evaluate_model_sql)

SELECT
  * FROM ML.EVALUATE(
    MODEL `mgmt467-71800.flights_data.pre_departure_baseline_model`,
    (SELECT
        Reporting_Airline AS carrier,
        Distance,
        CONCAT(Origin, '-', Dest) AS route,
        Month,
        DayOfWeek,
        Diverted
    FROM
        `flights_data.flights_raw`
    WHERE
        Diverted IS NOT NULL
    )
)


**Reasoning**:
Now that the `evaluate_model_sql` has been constructed, I need to execute this SQL query using the BigQuery client to get the evaluation metrics as a Pandas DataFrame and display them.



In [25]:
print('Evaluating BigQuery ML model...')
evaluation_results_df = bq.query(evaluate_model_sql).result().to_dataframe()
print('Model evaluation complete!')

print(evaluation_results_df)

Evaluating BigQuery ML model...
Model evaluation complete!
   precision    recall  accuracy  f1_score  log_loss   roc_auc
0   0.227273  0.001052  0.982763  0.002095  0.076178  0.785921


## Generate Confusion Matrix for Model A

### Subtask:
Calculate and display the confusion matrix for Model A at a classification threshold of 0.5 using `ML.CONFUSION_MATRIX`.


**Reasoning**:
Construct a BigQuery SQL query to generate the confusion matrix for the `pre_departure_baseline_model` using `ML.CONFUSION_MATRIX` with a classification threshold of 0.5 and the same input data used for training, storing it in the `confusion_matrix_sql` variable.



In [26]:
confusion_matrix_sql = f"""SELECT
  * FROM ML.CONFUSION_MATRIX(
    MODEL `{PROJECT_ID}.flights_data.pre_departure_baseline_model`,
    (SELECT
        Reporting_Airline AS carrier,
        Distance,
        CONCAT(Origin, '-', Dest) AS route,
        Month,
        DayOfWeek,
        Diverted
    FROM
        `flights_data.flights_raw`
    WHERE
        Diverted IS NOT NULL
    ),
    STRUCT(0.5 AS threshold)
)"""

print(confusion_matrix_sql)

SELECT
  * FROM ML.CONFUSION_MATRIX(
    MODEL `mgmt467-71800.flights_data.pre_departure_baseline_model`,
    (SELECT
        Reporting_Airline AS carrier,
        Distance,
        CONCAT(Origin, '-', Dest) AS route,
        Month,
        DayOfWeek,
        Diverted
    FROM
        `flights_data.flights_raw`
    WHERE
        Diverted IS NOT NULL
    ),
    STRUCT(0.5 AS threshold)
)


**Reasoning**:
Now that the `confusion_matrix_sql` has been constructed, I need to execute this SQL query using the BigQuery client to get the confusion matrix as a Pandas DataFrame and display it.



In [27]:
print('Generating confusion matrix...')
confusion_matrix_df = bq.query(confusion_matrix_sql).result().to_dataframe()
print('Confusion matrix generated!')

print(confusion_matrix_df)

Generating confusion matrix...
Confusion matrix generated!
   expected_label       _0   _1
0             0.0  1629646  102
1             1.0    28481   30


## Interpretation of Confusion Matrix

### At a 0.5 threshold:

- True Negatives (TN): 1,629,646. The model correctly classified almost all non-diverted flights.
- False Positives (FP): 102. The model flagged 102 non-diverted flights as diverted. This count is low only because the model flags just 132 flights in total. Precision is 30 / 132 ≈ 0.23, so most of its positive predictions are still wrong.
- False Negatives (FN): 28,481. The model predicted 28,481 flights as not diverted when they actually were. That is nearly every diversion, which is why recall for the positive class is so low.
- True Positives (TP): 30. The model correctly identified only 30 actual diversions.

The confusion matrix highlights a critical issue: recall for diversions is extremely low (30 / 28,511 ≈ 0.001052). Precision (0.227273) is well above the overall diversion rate of about 1.7%, so the few flights the model does flag are genuinely higher-risk. However, at a 0.5 threshold the model almost never predicts a diversion, and it misses nearly all of them.
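The headline metrics from `ML.EVALUATE` can be reproduced from the four confusion-matrix counts. This is a quick check that both queries used the same rows and threshold. The sketch below uses only the counts reported above and should reproduce the precision, recall, accuracy, and f1_score in the evaluation table.

```python
tn, fp, fn, tp = 1_629_646, 102, 28_481, 30
n = tn + fp + fn + tp

precision = tp / (tp + fp)                      # share of flagged flights that diverted
recall = tp / (tp + fn)                         # share of diversions that were flagged
accuracy = (tp + tn) / n
f1 = 2 * precision * recall / (precision + recall)
base_rate = (tp + fn) / n                       # overall diversion rate

print(f"precision={precision:.6f} recall={recall:.6f} "
      f"accuracy={accuracy:.6f} f1={f1:.6f} base_rate={base_rate:.4f}")
```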

## Analyze Model A Calibration

### Subtask:
Perform an analysis of the model's calibration by comparing predicted probabilities against actual outcomes. This step aims to understand how well the predicted probabilities align with the true likelihood of diversion, providing 2-3 lines of analysis.


**Reasoning**:
To analyze the model's calibration, I will construct a BigQuery SQL query using `ML.PREDICT` to get predicted probabilities, then group these predictions into 10 bins using `NTILE` and calculate the average predicted probability and actual diversion rate for each bin. This query will be stored in `calibration_sql`.
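Indexing the probability array by position (`OFFSET(1)`) assumes a fixed label order. The sketch below takes a more defensive approach. It assumes the `ML.PREDICT` output has been pulled into pandas (for example, with `.to_dataframe()`), so that each `predicted_Diverted_probs` cell is a sequence of `{'label': ..., 'prob': ...}` dicts. It then picks the entry whose label is 1 and bins the rows with `pd.qcut`. The function names `prob_of_label` and `calibration_table` are hypothetical, and the demo uses synthetic data, not the flights table.

For the full table, it is cheaper to do the same lookup in SQL, for example: `(SELECT p.prob FROM UNNEST(predicted_Diverted_probs) AS p WHERE p.label = 1)`.

```python
import numpy as np
import pandas as pd


def prob_of_label(probs, positive_label=1.0) -> float:
    """Return the probability attached to positive_label in one probs array."""
    for entry in probs:
        if float(entry["label"]) == positive_label:
            return float(entry["prob"])
    raise ValueError(f"label {positive_label!r} not found in {probs!r}")


def calibration_table(pred_df: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Per-bin mean predicted P(diverted) vs observed diversion rate."""
    p = pred_df["predicted_Diverted_probs"].map(prob_of_label)
    y = pred_df["Diverted"].astype(float)
    # Rank first so tied probabilities do not create duplicate qcut edges.
    bins = pd.qcut(p.rank(method="first"), n_bins, labels=False) + 1
    return (
        pd.DataFrame({"probability_bin": bins, "p": p, "y": y})
        .groupby("probability_bin")
        .agg(avg_predicted_probability=("p", "mean"),
             actual_diversion_rate=("y", "mean"))
        .reset_index()
    )


# Synthetic demo: labels drawn from known probabilities, label 0 listed first on purpose.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.0, 0.1, size=5_000)
y = (rng.uniform(size=p_true.size) < p_true).astype(float)
demo = pd.DataFrame({
    "predicted_Diverted_probs": [
        [{"label": 0.0, "prob": 1 - q}, {"label": 1.0, "prob": q}] for q in p_true
    ],
    "Diverted": y,
})
print(calibration_table(demo))
```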



In [28]:
calibration_sql = f"""WITH Predictions AS (
    SELECT
        predicted_Diverted_probs[OFFSET(1)].prob AS predicted_probability,
        Diverted
    FROM
        ML.PREDICT(
            MODEL `{PROJECT_ID}.flights_data.pre_departure_baseline_model`,
            (SELECT
                Reporting_Airline AS carrier,
                Distance,
                CONCAT(Origin, '-', Dest) AS route,
                Month,
                DayOfWeek,
                Diverted
            FROM
                `flights_data.flights_raw`
            WHERE
                Diverted IS NOT NULL
            )
        )
)
SELECT
    NTILE(10) OVER (ORDER BY predicted_probability) AS probability_bin,
    AVG(predicted_probability) AS avg_predicted_probability,
    AVG(Diverted) AS actual_diversion_rate
FROM
    Predictions
GROUP BY
    probability_bin
ORDER BY
    probability_bin
"""

print(calibration_sql)

WITH Predictions AS (
    SELECT
        predicted_Diverted_probs[OFFSET(1)].prob AS predicted_probability,
        Diverted
    FROM
        ML.PREDICT(
            MODEL `mgmt467-71800.flights_data.pre_departure_baseline_model`,
            (SELECT
                Reporting_Airline AS carrier,
                Distance,
                CONCAT(Origin, '-', Dest) AS route,
                Month,
                DayOfWeek,
                Diverted
            FROM
                `flights_data.flights_raw`
            WHERE
                Diverted IS NOT NULL
            )
        )
)
SELECT
    NTILE(10) OVER (ORDER BY predicted_probability) AS probability_bin,
    AVG(predicted_probability) AS avg_predicted_probability,
    AVG(Diverted) AS actual_diversion_rate
FROM
    Predictions
GROUP BY
    probability_bin
ORDER BY
    probability_bin



**Reasoning**:
The first draft above groups by `probability_bin`, which is computed with the analytic function `NTILE`. BigQuery evaluates analytic functions after `GROUP BY`, so the query cannot group by that column. The fix moves the binning into its own CTE (`BinnedPredictions`) and aggregates over that.



In [30]:
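# NOTE: the output below shows that OFFSET(1) holds P(Diverted = 0) for this model
# (averages near 0.98 while the observed diversion rate is about 1.7%). Selecting
# the entry whose label is 1 is safer than relying on array position.
calibration_sql = f"""WITH Predictions AS (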
    SELECT
        predicted_Diverted_probs[OFFSET(1)].prob AS predicted_probability,
        Diverted
    FROM
        ML.PREDICT(
            MODEL `{PROJECT_ID}.flights_data.pre_departure_baseline_model`,
            (SELECT
                Reporting_Airline AS carrier,
                Distance,
                CONCAT(Origin, '-', Dest) AS route,
                Month,
                DayOfWeek,
                Diverted
            FROM
                `flights_data.flights_raw`
            WHERE
                Diverted IS NOT NULL
            )
        )
),
BinnedPredictions AS (
    SELECT
        NTILE(10) OVER (ORDER BY predicted_probability) AS probability_bin,
        predicted_probability,
        Diverted
    FROM
        Predictions
)
SELECT
    probability_bin,
    AVG(predicted_probability) AS avg_predicted_probability,
    AVG(Diverted) AS actual_diversion_rate
FROM
    BinnedPredictions
GROUP BY
    probability_bin
ORDER BY
    probability_bin
"""

print(calibration_sql)

WITH Predictions AS (
    SELECT
        predicted_Diverted_probs[OFFSET(1)].prob AS predicted_probability,
        Diverted
    FROM
        ML.PREDICT(
            MODEL `mgmt467-71800.flights_data.pre_departure_baseline_model`,
            (SELECT
                Reporting_Airline AS carrier,
                Distance,
                CONCAT(Origin, '-', Dest) AS route,
                Month,
                DayOfWeek,
                Diverted
            FROM
                `flights_data.flights_raw`
            WHERE
                Diverted IS NOT NULL
            )
        )
),
BinnedPredictions AS (
    SELECT
        NTILE(10) OVER (ORDER BY predicted_probability) AS probability_bin,
        predicted_probability,
        Diverted
    FROM
        Predictions
)
SELECT
    probability_bin,
    AVG(predicted_probability) AS avg_predicted_probability,
    AVG(Diverted) AS actual_diversion_rate
FROM
    BinnedPredictions
GROUP BY
    probability_bin
ORDER BY
    probability_bin



**Reasoning**:
Now that the `calibration_sql` has been corrected to avoid using an analytic function in the `GROUP BY` clause, I need to execute this SQL query using the BigQuery client to get the calibration metrics as a Pandas DataFrame and display them, as per the instructions.



In [31]:
print('Analyzing model calibration...')
calibration_df = bq.query(calibration_sql).result().to_dataframe()
print('Model calibration analysis complete!')

print(calibration_df)

Analyzing model calibration...
Model calibration analysis complete!
   probability_bin  avg_predicted_probability  actual_diversion_rate
0                1                   0.939825               0.083425
1                2                   0.968772               0.026425
2                3                   0.978592               0.014931
3                4                   0.983946               0.011241
4                5                   0.987640               0.007900
5                6                   0.990490               0.006175
6                7                   0.992681               0.007417
7                8                   0.994541               0.006724
8                9                   0.996212               0.005644
9               10                   0.997795               0.002050


In this query, `avg_predicted_probability` comes from `predicted_Diverted_probs[OFFSET(1)]`. For this model, that entry is the probability of **not** being diverted: it averages about 0.98, and the observed diversion rate falls as it rises. Read as a diversion probability (1 - `avg_predicted_probability`), the model is roughly calibrated:

- Bin 1, the riskiest decile, predicts about 0.060 against an observed 0.083425.
- Bins 2 to 6 over-predict by roughly 1.2x to 1.6x.
- Bins 7 to 10 are within about 0.2 percentage points of the observed rate.

The main weakness is under-prediction in the highest-risk decile, not overconfidence.
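Converting the recorded bins to P(diverted) makes the comparison direct. The sketch below re-uses only the numbers printed above. It assumes, as the falling `actual_diversion_rate` suggests, that `avg_predicted_probability` is the probability of not being diverted. In the notebook you could apply the same three column operations to `calibration_df` directly; the name `recorded` is only used here to avoid overwriting it.

```python
import pandas as pd

# Values copied from the printed calibration output above.
recorded = pd.DataFrame({
    "probability_bin": range(1, 11),
    "avg_predicted_probability": [0.939825, 0.968772, 0.978592, 0.983946, 0.987640,
                                  0.990490, 0.992681, 0.994541, 0.996212, 0.997795],
    "actual_diversion_rate": [0.083425, 0.026425, 0.014931, 0.011241, 0.007900,
                              0.006175, 0.007417, 0.006724, 0.005644, 0.002050],
})

# OFFSET(1) held P(not diverted) for this model, so flip it to P(diverted).
recorded["pred_p_diverted"] = 1 - recorded["avg_predicted_probability"]
# Absolute gap in percentage points, and predicted / observed ratio.
recorded["gap_pp"] = 100 * (recorded["pred_p_diverted"] - recorded["actual_diversion_rate"])
recorded["ratio"] = recorded["pred_p_diverted"] / recorded["actual_diversion_rate"]
print(recorded.round(4).to_string(index=False))

# NTILE bins are (nearly) equal-sized, so averaging bin means approximates the overall mean.
print("mean predicted P(diverted):", round(recorded["pred_p_diverted"].mean(), 4))
print("observed diversion rate:   ", round(recorded["actual_diversion_rate"].mean(), 4))
```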

## Summarize Baseline Model Performance

### Subtask:
Provide a comprehensive summary of Model A's performance, including the AUC, log_loss, insights from the confusion matrix, and observations on calibration. This will establish the initial no-real-time baseline for diversion risk.


### Model A Performance Summary (Pre-departure Baseline)

This summary establishes the initial no-real-time baseline for diversion risk using the `pre_departure_baseline_model`.

**1. AUC and Log_Loss:**

*   **AUC:** The model achieved an Area Under the Curve (AUC) of `0.785921`. An AUC of about 0.79 means the model ranks a randomly chosen diverted flight above a randomly chosen non-diverted flight roughly 79% of the time. That is a decent, but not excellent, ability to separate the two classes. Because `ML.EVALUATE` was run on the same rows used for training, this figure is in-sample and likely somewhat optimistic.
*   **Log_Loss:** The log_loss is `0.076178` (lower is better). On its own, the value is hard to interpret. Only about 1.7% of flights are diverted, so even a constant prediction of that base rate achieves a low log_loss. The model should be judged against that reference rather than against zero.
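The log_loss reference point can be computed from the confusion-matrix counts. A model that predicts the base rate p for every flight has log loss -(p ln p + (1 - p) ln(1 - p)). The short check below uses only the counts and the log_loss reported in this notebook. The constant-prediction log_loss comes out near 0.087, so the model's 0.0762 is roughly a 12% improvement.

```python
import math

tn, fp, fn, tp = 1_629_646, 102, 28_481, 30
n = tn + fp + fn + tp
p = (tp + fn) / n  # overall diversion rate

# Log loss of always predicting p (the entropy of the label distribution).
baseline_log_loss = -(p * math.log(p) + (1 - p) * math.log(1 - p))
model_log_loss = 0.076178  # from ML.EVALUATE above

improvement = 1 - model_log_loss / baseline_log_loss
print(f"base rate={p:.4f}  constant-prediction log_loss={baseline_log_loss:.4f}  "
      f"model={model_log_loss:.4f}  relative improvement={improvement:.1%}")
```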

**2. Confusion Matrix (at a 0.5 threshold):**

*   **True Negatives (TN):** `1629646`. The model correctly classified almost all non-diverted flights.
*   **False Positives (FP):** `102`. The model flagged 102 non-diverted flights as diverted. This count is low only because the model flags just 132 flights in total. Precision is 30 / 132 ≈ 0.23, so most of its positive predictions are still wrong.
*   **False Negatives (FN):** `28481`. The model predicted 28,481 flights as not diverted when they actually were. That is nearly every diversion, which is why recall for the positive class is so low.
*   **True Positives (TP):** `30`. The model correctly identified only 30 actual diversions.

The confusion matrix highlights a critical issue: the model's **recall** for diversions is extremely low (`0.001052`, i.e., 30 of 28,511). Its precision (`0.227273`) is well above the roughly 1.7% diversion rate, so the flights it does flag are higher-risk. However, at a 0.5 threshold it flags only 132 flights and misses nearly all actual diversions.

**3. Calibration Analysis:**

The calibration analysis compares `avg_predicted_probability` with `actual_diversion_rate` across ten equal-sized probability bins. The query read `predicted_Diverted_probs[OFFSET(1)]`, which for this model is the probability of **not** being diverted. To compare it with the diversion rate, the predicted column must therefore be read as 1 - `avg_predicted_probability`:

*   Bin 1, the riskiest decile, predicts a diversion probability of about `0.060` (1 - `0.939825`) against an observed `0.083425`. The model under-predicts risk for its highest-risk flights.
*   Bins 2 to 6 over-predict by roughly 1.2x to 1.6x in relative terms (e.g., bin 5: about `0.012` predicted vs `0.007900` observed). Bins 7 to 10 stay within about 0.2 percentage points (e.g., bin 10: about `0.0022` predicted vs `0.002050` observed).

Overall, the model ranks flights in nearly the right order: the observed diversion rate falls almost monotonically as the predicted probability of non-diversion rises. It is also roughly calibrated in aggregate, since the mean predicted diversion probability (about 0.017) matches the observed rate (about 0.0172). The largest calibration error is the under-prediction in the riskiest decile.

**Conclusion:**

This initial pre-departure baseline model shows moderate discriminative ability (AUC ~0.79). At the default 0.5 threshold, however, it catches almost none of the diversions (30 of 28,511). Diversions make up only about 1.7% of flights, so very few predicted probabilities exceed 0.5, and the low recall reflects the threshold as much as the model. Read as diversion probabilities, its outputs are roughly calibrated, with the riskiest decile under-predicted. As an early warning system it is limited at this threshold. Future iterations should tune the threshold, or try cost-sensitive learning or resampling, to trade precision for recall. Keep in mind that class weighting and resampling shift predicted probabilities away from the true base rate.
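Threshold adjustment can be explored without retraining. The sketch below assumes you have per-flight labels `y` and predicted diversion probabilities `p` (for example, from `ML.PREDICT` with the label-1 probability selected). It sweeps candidate thresholds and reports precision, recall, and the number of flagged flights at each one. The function name `sweep_thresholds` is hypothetical, and the demo uses synthetic data, not the flights table. BigQuery ML's `ML.ROC_CURVE` returns similar per-threshold counts server-side.

```python
import numpy as np


def sweep_thresholds(y, p, thresholds) -> list:
    """Return (threshold, precision, recall, n_flagged) for each threshold."""
    y = np.asarray(y).astype(bool)
    p = np.asarray(p, dtype=float)
    n_pos = int(np.sum(y))
    rows = []
    for t in thresholds:
        flagged = p >= t
        tp = int(np.sum(flagged & y))
        n_flagged = int(np.sum(flagged))
        precision = tp / n_flagged if n_flagged else float("nan")
        recall = tp / n_pos if n_pos else float("nan")
        rows.append((t, precision, recall, n_flagged))
    return rows


# Synthetic demo: rare positives, probabilities skewed toward small values.
rng = np.random.default_rng(1)
p_demo = rng.beta(1, 60, size=20_000)
y_demo = rng.uniform(size=p_demo.size) < p_demo
for t, prec, rec, k in sweep_thresholds(y_demo, p_demo, [0.5, 0.1, 0.05, 0.02]):
    print(f"threshold={t:<5} flagged={k:>6} precision={prec:.3f} recall={rec:.3f}")
```

Lowering the threshold flags more flights and raises recall, usually at the cost of precision. The right operating point depends on what a missed diversion costs relative to a false alarm.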

## Summary:

### Data Analysis Key Findings

*   The `create_features_sql` was successfully modified to select `Reporting_Airline` as `carrier`, `Distance`, `CONCAT(Origin, '-', Dest)` as `route`, `Month`, `DayOfWeek`, and `Diverted` from `flights_data.flights_raw`, including all rows where `Diverted` is not null.
*   A BigQuery ML logistic regression model, `flights_data.pre_departure_baseline_model`, was trained successfully. The first training attempt failed because it used options BigQuery ML did not accept (`auto_select_all_features`, `LS_INIT_WEIGHTS`); removing them fixed the error.
*   The model achieved an Area Under the Curve (AUC) of 0.785921 and a log\_loss of 0.076178.
*   At a classification threshold of 0.5, the confusion matrix revealed: 1,629,646 True Negatives, 102 False Positives, 28,481 False Negatives, and only 30 True Positives. This indicates an extremely low recall for actual diversions (0.001052).
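*   The calibration query read `predicted_Diverted_probs[OFFSET(1)]`, which for this model is the probability of non-diversion. Converted to a diversion probability, the model is roughly calibrated: the mean predicted diversion probability (about 0.017) matches the observed rate. The riskiest decile is under-predicted (about 0.060 predicted vs 0.083425 observed).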

### Insights or Next Steps

*   The model shows moderate discriminative ability (AUC $\sim$0.79), but at the 0.5 threshold its practical utility as an early warning system is limited by extremely low recall (30 of 28,511 diversions caught). Its probabilities are roughly calibrated, so lowering the classification threshold is a reasonable first lever.
*   Future model development should prioritize improving recall and calibration, potentially through techniques such as resampling, cost-sensitive learning, or adjusting the classification threshold, to make it a more effective tool for diversion risk assessment.
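For the cost-sensitive option, BigQuery ML's `CREATE MODEL` accepts `AUTO_CLASS_WEIGHTS=TRUE`, which weights each class in inverse proportion to its frequency. The sketch below only builds the SQL string. The model name `pre_departure_weighted_model` is hypothetical, and `L1_REG=0.1` is kept to match the baseline. Class weighting inflates predicted probabilities for the rare class, so this model's outputs should not be read as calibrated diversion rates.

```python
PROJECT_ID = "mgmt467-71800"  # same project as the setup cell

# Baseline feature query, copied from the revised create_features_sql.
feature_sql = """SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL
"""

# Hypothetical model name; AUTO_CLASS_WEIGHTS re-weights classes by inverse frequency.
weighted_model_sql = f"""CREATE OR REPLACE MODEL `{PROJECT_ID}.flights_data.pre_departure_weighted_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted'],
    auto_class_weights=TRUE,
    l1_reg=0.1
)
AS
{feature_sql}"""

print(weighted_model_sql)
# To train in the notebook: bq.query(weighted_model_sql).result()
```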


# Task
Prepare features for the engineered model by creating a BigQuery SQL query that selects `Reporting_Airline` as 'carrier', `CONCAT(Origin, '-', Dest)` as 'route', `Distance` as 'distance', `DayOfWeek` as 'day_of_week', and `Month` as 'month'. The query also adds a new bucketized `dep_delay` feature, categorized as 'no_delay', 'short_delay', 'medium_delay', or 'long_delay' based on the `DepDelay` column. It reads from the `flights_data.flights_raw` table, uses 'Diverted' as the target variable, and includes all rows where `Diverted` is not null.

## Prepare Features for Engineered Model

### Subtask:
Construct a BigQuery SQL query to prepare features for the engineered model. This will include 'carrier', 'route' (Origin-Dest), 'distance', 'day_of_week', and 'month' (as used in the baseline), plus a new feature: dep_delay bucketized into categories (e.g., 'no_delay', 'short_delay', 'medium_delay', 'long_delay'). The target variable will remain 'Diverted'.


**Reasoning**:
Construct a BigQuery SQL query to prepare features for the engineered model as instructed, including the new 'dep_delay_bucket' feature using a CASE statement, and store it in the `create_engineered_features_sql` variable.
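In SQL, a comparison with NULL is neither true nor false. Rows where `DepDelay` is NULL (for example, flights that never departed, if the table contains any) therefore skip every `WHEN` branch and fall into `ELSE 'long_delay'`.

The minimal Python mirror of the bucketing below makes these edge cases explicit. It assumes you want such rows in their own hypothetical `'unknown'` bucket. The matching SQL adds `WHEN DepDelay IS NULL THEN 'unknown'` as the first branch. Because `CASE` stops at the first true branch, the `DepDelay > 0 AND` guards in the original are redundant and are dropped here.

```python
from typing import Optional


def dep_delay_bucket(dep_delay: Optional[float]) -> str:
    """Python mirror of the CASE expression, with an explicit branch for missing delays."""
    if dep_delay is None:
        return "unknown"  # hypothetical bucket; the original CASE returns 'long_delay'
    if dep_delay <= 0:
        return "no_delay"
    if dep_delay <= 15:
        return "short_delay"
    if dep_delay <= 60:
        return "medium_delay"
    return "long_delay"


# NULL-safe SQL version of the same bucketing (branches checked in order).
NULL_SAFE_BUCKET_SQL = """CASE
        WHEN DepDelay IS NULL THEN 'unknown'
        WHEN DepDelay <= 0 THEN 'no_delay'
        WHEN DepDelay <= 15 THEN 'short_delay'
        WHEN DepDelay <= 60 THEN 'medium_delay'
        ELSE 'long_delay'
    END AS dep_delay_bucket"""

for d in [None, -3, 0, 15, 15.5, 60, 61]:
    print(d, dep_delay_bucket(d))
```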



In [32]:
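# NOTE: when DepDelay is NULL, every WHEN comparison is NULL (not TRUE), so those
# rows fall through to ELSE and are labeled 'long_delay'.
create_engineered_features_sql = f"""SELECT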
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    CASE
        WHEN DepDelay <= 0 THEN 'no_delay'
        WHEN DepDelay > 0 AND DepDelay <= 15 THEN 'short_delay'
        WHEN DepDelay > 15 AND DepDelay <= 60 THEN 'medium_delay'
        ELSE 'long_delay'
    END AS dep_delay_bucket,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL
"""

print(create_engineered_features_sql)

SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    CASE
        WHEN DepDelay <= 0 THEN 'no_delay'
        WHEN DepDelay > 0 AND DepDelay <= 15 THEN 'short_delay'
        WHEN DepDelay > 15 AND DepDelay <= 60 THEN 'medium_delay'
        ELSE 'long_delay'
    END AS dep_delay_bucket,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL



## Create BigQuery ML Engineered Model

### Subtask:
Train a new BigQuery ML logistic regression model, named `pre_departure_engineered_model`, using the newly prepared features. The model will use the bucketized `dep_delay` feature alongside the existing schedule-level features, with 'Diverted' as the target variable.


**Reasoning**:
To train the engineered model, construct a `CREATE MODEL` statement that specifies the model type and input label column and uses `create_engineered_features_sql` as the training data. Store it in `engineered_model_training_sql` and print it for verification.



In [33]:
engineered_model_training_sql = f"""CREATE OR REPLACE MODEL `{PROJECT_ID}.flights_data.pre_departure_engineered_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted']
)
AS
{create_engineered_features_sql}
"""

print(engineered_model_training_sql)

CREATE OR REPLACE MODEL `mgmt467-71800.flights_data.pre_departure_engineered_model`
OPTIONS(
    model_type='LOGISTIC_REG',
    input_label_cols=['Diverted']
)
AS
SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    CASE
        WHEN DepDelay <= 0 THEN 'no_delay'
        WHEN DepDelay > 0 AND DepDelay <= 15 THEN 'short_delay'
        WHEN DepDelay > 15 AND DepDelay <= 60 THEN 'medium_delay'
        ELSE 'long_delay'
    END AS dep_delay_bucket,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL




**Reasoning**:
Now that the `engineered_model_training_sql` has been constructed, I need to execute this SQL query using the BigQuery client to train the model.



In [34]:
print('Training BigQuery ML engineered model (this may take a few minutes)...')
bq.query(engineered_model_training_sql).result()
print('Engineered model training complete!')

Training BigQuery ML engineered model (this may take a few minutes)...
Engineered model training complete!


## Evaluate Engineered Model Performance (AUC, Log_Loss)

### Subtask:
Use `ML.EVALUATE` to assess the engineered model's performance, focusing on Area Under the Curve (AUC) and log_loss. This will allow for comparison with the baseline model and provide insights into the impact of the new `dep_delay_bucket` feature.

**Reasoning**:
Construct a BigQuery SQL query that evaluates `pre_departure_engineered_model` with `ML.EVALUATE`. Reuse `create_engineered_features_sql` so the evaluation data goes through the same feature engineering as the training data. Store the query in `evaluate_engineered_model_sql` and print it for verification.



In [35]:
evaluate_engineered_model_sql = f"""SELECT
  * FROM ML.EVALUATE(
    MODEL `{PROJECT_ID}.flights_data.pre_departure_engineered_model`,
    ({create_engineered_features_sql})
)"""

print(evaluate_engineered_model_sql)

SELECT
  * FROM ML.EVALUATE(
    MODEL `mgmt467-71800.flights_data.pre_departure_engineered_model`,
    (SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    CASE
        WHEN DepDelay <= 0 THEN 'no_delay'
        WHEN DepDelay > 0 AND DepDelay <= 15 THEN 'short_delay'
        WHEN DepDelay > 15 AND DepDelay <= 60 THEN 'medium_delay'
        ELSE 'long_delay'
    END AS dep_delay_bucket,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL
)
)





# Task
Build and evaluate an engineered BigQuery ML logistic regression model for predicting flight diversions, incorporating a bucketized departure delay feature, and then compare its performance against the pre-departure baseline model.



**Reasoning**:
Execute the `evaluate_engineered_model_sql` using the BigQuery client to get the evaluation metrics as a Pandas DataFrame and display them.



In [36]:
print('Evaluating BigQuery ML engineered model...')
engineered_evaluation_results_df = bq.query(evaluate_engineered_model_sql).result().to_dataframe()
print('Engineered model evaluation complete!')

print(engineered_evaluation_results_df)

Evaluating BigQuery ML engineered model...
Engineered model evaluation complete!
   precision    recall  accuracy  f1_score  log_loss   roc_auc
0        1.0  0.000035  0.982807   0.00007  0.076117  0.784524


## Generate Confusion Matrix for Engineered Model

### Subtask:
Calculate and display the confusion matrix for the engineered model at a classification threshold of 0.5 using `ML.CONFUSION_MATRIX`. This will provide insights into its true positives, true negatives, false positives, and false negatives, similar to the baseline analysis.


**Reasoning**:
Construct a BigQuery SQL query to generate the confusion matrix for the `pre_departure_engineered_model` using `ML.CONFUSION_MATRIX` with a classification threshold of 0.5, and store it in the `engineered_confusion_matrix_sql` variable.



In [37]:
engineered_confusion_matrix_sql = f"""SELECT
  * FROM ML.CONFUSION_MATRIX(
    MODEL `{PROJECT_ID}.flights_data.pre_departure_engineered_model`,
    ({create_engineered_features_sql}),
    STRUCT(0.5 AS threshold)
)"""

print(engineered_confusion_matrix_sql)

SELECT
  * FROM ML.CONFUSION_MATRIX(
    MODEL `mgmt467-71800.flights_data.pre_departure_engineered_model`,
    (SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    CASE
        WHEN DepDelay <= 0 THEN 'no_delay'
        WHEN DepDelay > 0 AND DepDelay <= 15 THEN 'short_delay'
        WHEN DepDelay > 15 AND DepDelay <= 60 THEN 'medium_delay'
        ELSE 'long_delay'
    END AS dep_delay_bucket,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL
),
    STRUCT(0.5 AS threshold)
)


**Reasoning**:
Now that the `engineered_confusion_matrix_sql` has been constructed, I need to execute this SQL query using the BigQuery client to get the confusion matrix as a Pandas DataFrame and display it.



In [38]:
print('Generating engineered model confusion matrix...')
engineered_confusion_matrix_df = bq.query(engineered_confusion_matrix_sql).result().to_dataframe()
print('Engineered model confusion matrix generated!')

print(engineered_confusion_matrix_df)

Generating engineered model confusion matrix...
Engineered model confusion matrix generated!
   expected_label       _0  _1
0             0.0  1629748   0
1             1.0    28510   1
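The `ML.CONFUSION_MATRIX` output above has one row per actual label (`expected_label`) and one column per predicted label (`_0`, `_1`). Below is a minimal, standard-library-only sketch that turns rows of that shape into TN/FP/FN/TP counts and the usual metrics. The function names `confusion_counts` and `metrics` are hypothetical. In the notebook, the rows could come from `engineered_confusion_matrix_df.to_dict("records")`. The baseline counts are copied from the comparison table later in this notebook. With the engineered counts, the sketch should reproduce the precision, recall, accuracy and F1 that `ML.EVALUATE` reported.

```python
from typing import Dict, List


def confusion_counts(rows: List[Dict[str, float]]) -> Dict[str, int]:
    """Map ML.CONFUSION_MATRIX-style rows to TN/FP/FN/TP for a 0/1 label.

    Each row has 'expected_label' (actual class) plus '_0' and '_1'
    (counts of flights predicted as class 0 and class 1).
    """
    by_label = {float(r["expected_label"]): r for r in rows}
    neg, pos = by_label[0.0], by_label[1.0]
    return {
        "TN": int(neg["_0"]), "FP": int(neg["_1"]),
        "FN": int(pos["_0"]), "TP": int(pos["_1"]),
    }


def metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Precision, recall, accuracy and F1 for the positive (diverted) class."""
    predicted_pos = c["TP"] + c["FP"]
    actual_pos = c["TP"] + c["FN"]
    total = sum(c.values())
    precision = c["TP"] / predicted_pos if predicted_pos else 0.0
    recall = c["TP"] / actual_pos if actual_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (c["TP"] + c["TN"]) / total,
        "f1": f1,
    }


# Counts copied from the confusion matrices in this notebook (threshold 0.5).
engineered_rows = [
    {"expected_label": 0.0, "_0": 1629748, "_1": 0},
    {"expected_label": 1.0, "_0": 28510, "_1": 1},
]
baseline_rows = [
    {"expected_label": 0.0, "_0": 1629646, "_1": 102},
    {"expected_label": 1.0, "_0": 28481, "_1": 30},
]

engineered_counts = confusion_counts(engineered_rows)
for name, rows in (("baseline", baseline_rows), ("engineered", engineered_rows)):
    m = metrics(confusion_counts(rows))
    print(name, {k: round(v, 6) for k, v in m.items()})
```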


## Interpretation of Confusion Matrix for Engineered Model

### At a 0.5 threshold

- True Negatives (TN): `1629748` - The model correctly classified every non-diverted flight.
- False Positives (FP): `0` - The model flagged no non-diverted flight as diverted, down from the baseline model's 102 FPs. Its only positive prediction is correct, so precision at this threshold is 1.0 (1/1).
- False Negatives (FN): `28510` - The model labeled 28,510 actually diverted flights as not diverted, which is nearly all of the 28,511 diversions. This is similar to the baseline's 28,481 and drives recall for the positive class close to zero.
- True Positives (TP): `1` - The model correctly identified only 1 actual diversion. This is a significant drop compared to the baseline's 30 TPs.

At the 0.5 threshold, the engineered model's perfect precision rests on a single positive prediction, while its ability to identify actual diversions has collapsed to 1 TP. Recall falls to 0.000035 (1/28,511), below the baseline's 0.001052 (30/28,511). Diversions make up only about 1.7% of flights (28,511 of 1,658,259), and the model almost never assigns a diversion probability above 0.5. This threshold therefore says little about how well the model ranks flights by risk. Within that limit, adding `dep_delay_bucket` did not improve the model's ability to capture positive cases at 0.5. Instead, it made the model even more cautious, leading to even lower recall.

**Reasoning**:
Now that the engineered model's confusion matrix has been interpreted, the next step is to analyze its calibration by constructing a BigQuery SQL query to predict probabilities, bin them, and calculate average predicted probability and actual diversion rates.



In [39]:
engineered_calibration_sql = f"""WITH Predictions AS (
    SELECT
        predicted_Diverted_probs[OFFSET(1)].prob AS predicted_probability,
        Diverted
    FROM
        ML.PREDICT(
            MODEL `{PROJECT_ID}.flights_data.pre_departure_engineered_model`,
            ({create_engineered_features_sql})
        )
),
BinnedPredictions AS (
    SELECT
        NTILE(10) OVER (ORDER BY predicted_probability) AS probability_bin,
        predicted_probability,
        Diverted
    FROM
        Predictions
)
SELECT
    probability_bin,
    AVG(predicted_probability) AS avg_predicted_probability,
    AVG(Diverted) AS actual_diversion_rate
FROM
    BinnedPredictions
GROUP BY
    probability_bin
ORDER BY
    probability_bin
"""

print(engineered_calibration_sql)

WITH Predictions AS (
    SELECT
        predicted_Diverted_probs[OFFSET(1)].prob AS predicted_probability,
        Diverted
    FROM
        ML.PREDICT(
            MODEL `mgmt467-71800.flights_data.pre_departure_engineered_model`,
            (SELECT
    Reporting_Airline AS carrier,
    Distance,
    CONCAT(Origin, '-', Dest) AS route,
    Month,
    DayOfWeek,
    CASE
        WHEN DepDelay <= 0 THEN 'no_delay'
        WHEN DepDelay > 0 AND DepDelay <= 15 THEN 'short_delay'
        WHEN DepDelay > 15 AND DepDelay <= 60 THEN 'medium_delay'
        ELSE 'long_delay'
    END AS dep_delay_bucket,
    Diverted
FROM
    `flights_data.flights_raw`
WHERE
    Diverted IS NOT NULL
)
        )
),
BinnedPredictions AS (
    SELECT
        NTILE(10) OVER (ORDER BY predicted_probability) AS probability_bin,
        predicted_probability,
        Diverted
    FROM
        Predictions
)
SELECT
    probability_bin,
    AVG(predicted_probability) AS avg_predicted_probability,
    AVG(Diverted) AS actual_diversion_rate
FROM
    BinnedPredictions
GROUP BY
    probability_bin
ORDER BY
    probability_bin

**Reasoning**:
Now that the `engineered_calibration_sql` has been constructed, I need to execute this SQL query using the BigQuery client to get the calibration metrics as a Pandas DataFrame and display them.



In [40]:
print('Analyzing engineered model calibration...')
engineered_calibration_df = bq.query(engineered_calibration_sql).result().to_dataframe()
print('Engineered model calibration analysis complete!')

print(engineered_calibration_df)

Analyzing engineered model calibration...
Engineered model calibration analysis complete!
   probability_bin  avg_predicted_probability  actual_diversion_rate
0                1                   0.934499               0.082780
1                2                   0.967856               0.026733
2                3                   0.977454               0.016131
3                4                   0.983026               0.010409
4                5                   0.986746               0.007767
5                6                   0.989649               0.006284
6                7                   0.991890               0.007309
7                8                   0.993795               0.006712
8                9                   0.995587               0.005813
9               10                   0.997328               0.001996
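`predicted_Diverted_probs` is an array of `STRUCT<label, prob>` entries. Picking an element by position (`OFFSET(1)`) silently assumes which class sits at that index. The magnitudes above (0.93 to 0.997, when only about 1.7% of flights are diverted) suggest that `OFFSET(1)` returned the probability of `Diverted = 0` here. The sketch below is a more robust version of the calibration query that selects the entry by label instead of by position. It assumes the label is numeric 0/1, as the `0.0`/`1.0` values in the confusion matrix suggest. The helper name `calibration_sql` is hypothetical, and this query has not been re-run against the table.

```python
def calibration_sql(model: str, features_sql: str, bins: int = 10) -> str:
    """Build a calibration query that reads P(Diverted = 1) by label, not by array position.

    `model` is a fully qualified model name; `features_sql` is the SELECT that
    produces the model's features plus the `Diverted` label.
    """
    return f"""WITH Predictions AS (
    SELECT
        -- Pick the probability whose label is 1 instead of relying on array order.
        (SELECT p.prob FROM UNNEST(predicted_Diverted_probs) AS p WHERE p.label = 1)
            AS predicted_probability,
        Diverted
    FROM ML.PREDICT(MODEL `{model}`, ({features_sql}))
),
BinnedPredictions AS (
    SELECT
        NTILE({bins}) OVER (ORDER BY predicted_probability) AS probability_bin,
        predicted_probability,
        Diverted
    FROM Predictions
)
SELECT
    probability_bin,
    AVG(predicted_probability) AS avg_predicted_probability,
    AVG(Diverted) AS actual_diversion_rate
FROM BinnedPredictions
GROUP BY probability_bin
ORDER BY probability_bin
"""


# In the notebook this could be called as (not executed here):
#   calibration_sql(f"{PROJECT_ID}.flights_data.pre_departure_engineered_model",
#                   create_engineered_features_sql)
print(calibration_sql("mgmt467-71800.flights_data.pre_departure_engineered_model",
                      "SELECT 1 AS Diverted"))
```

With this version, the bin order flips relative to the table above: bin 10 becomes the riskiest decile, and `avg_predicted_probability` can be compared directly with `actual_diversion_rate`.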


## Analyze Engineered Model Calibration

The `avg_predicted_probability` column needs care. The query takes `predicted_Diverted_probs[OFFSET(1)].prob`, but its values (0.93 to 0.997, averaging about 0.98) cannot plausibly be diversion probabilities when only about 1.7% of flights are diverted. A log_loss of 0.076 could not occur if nearly every flight were assigned a diversion probability above 0.9. The column behaves like P(Diverted = 0), the probability of *no* diversion. Each bin should therefore be compared on the same scale: 1 - `avg_predicted_probability` against `actual_diversion_rate`.

*   **Ranking is correct**: Bin 1 holds the flights with the lowest predicted probability of no diversion, i.e. the highest predicted diversion risk. It has by far the highest actual diversion rate (`0.082780`). Actual rates fall toward bin 10 (`0.001996`), apart from a small bump around bins 7 and 8.
*   **Roughly calibrated on the diversion scale**: In bin 1 the implied diversion probability is 1 - 0.934499 ≈ 0.0655 against an actual rate of 0.0828, so the riskiest decile is somewhat under-predicted. In bins 2 to 6 the model over-predicts modestly (for example, bin 5: about 0.0133 implied vs 0.0078 actual). In bin 10 it is close (about 0.0027 vs 0.0020). Averaged over the ten equal-sized bins, the implied diversion probability is about 0.018 against an actual rate of about 0.017.
*   **Close to the baseline**: The baseline's bin 1 (0.939825 vs 0.083425) and bin 10 (0.997795 vs 0.002050) read the same way. They give implied diversion probabilities of about 0.060 and 0.0022. The addition of `dep_delay_bucket` shifted the probabilities only slightly.

In summary, once the column is read correctly, the engineered model is reasonably calibrated. It ranks risky flights well, and its implied diversion probabilities track the observed rates to within about two percentage points in the riskiest decile and under one point elsewhere. The apparent "overconfidence" comes from comparing the probability of non-diversion with the rate of diversion. The new feature did not materially change calibration relative to the baseline.
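The bin-by-bin comparison can be checked with a few lines of Python on the numbers printed by the calibration cell. The sketch assumes that `avg_predicted_probability` is P(Diverted = 0), since its values of 0.93 to 0.997 sit far above the roughly 1.7% diversion rate. It converts each value to an implied diversion probability and compares it with the observed rate. Because `NTILE(10)` produces (nearly) equal-sized bins, the plain mean over bins approximates the overall rate.

```python
# (bin, avg_predicted_probability, actual_diversion_rate) copied from the calibration output.
engineered_bins = [
    (1, 0.934499, 0.082780),
    (2, 0.967856, 0.026733),
    (3, 0.977454, 0.016131),
    (4, 0.983026, 0.010409),
    (5, 0.986746, 0.007767),
    (6, 0.989649, 0.006284),
    (7, 0.991890, 0.007309),
    (8, 0.993795, 0.006712),
    (9, 0.995587, 0.005813),
    (10, 0.997328, 0.001996),
]

print(f"{'bin':>3} {'implied P(div)':>15} {'actual rate':>12} {'gap':>9}")
implied, actual = [], []
for b, p_no_div, rate in engineered_bins:
    p_div = 1.0 - p_no_div  # assumption: the column holds P(Diverted = 0)
    implied.append(p_div)
    actual.append(rate)
    print(f"{b:>3} {p_div:>15.4f} {rate:>12.4f} {p_div - rate:>+9.4f}")

# NTILE bins are (nearly) equal-sized, so the mean over bins approximates the overall rate.
mean_implied = sum(implied) / len(implied)
mean_actual = sum(actual) / len(actual)
print(f"mean implied P(div) = {mean_implied:.4f}, mean actual rate = {mean_actual:.4f}")
```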


## Compare Engineered Model to Baseline

### Subtask:
Compare the performance metrics (AUC, log_loss, confusion matrix insights, and calibration) of the `pre_departure_engineered_model` against the `pre_departure_baseline_model`. Highlight any improvements or degradations in performance due to the new engineered features, especially the bucketized `dep_delay`.



### Comparison and Insights

**1. AUC (Area Under the Curve):**

*   **Baseline Model AUC:** 0.785921
*   **Engineered Model AUC:** 0.784524

The engineered model's AUC is slightly lower than the baseline's (0.784524 vs 0.785921, a difference of about 0.0014). Adding `dep_delay_bucket` therefore did not improve the model's ability to rank diverted flights above non-diverted ones. Its discriminative power is essentially unchanged, if marginally lower.
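AUC does not depend on a threshold. It is the probability that a randomly chosen diverted flight receives a higher score than a randomly chosen non-diverted one. That is why it can stay almost flat while the confusion matrix at 0.5 changes sharply. Below is a minimal pairwise sketch on hypothetical toy scores. It is quadratic in the number of flights, so at this table's scale one would rely on `ML.EVALUATE` (or a rank-based formula) instead.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores: none reaches 0.5, so a 0.5 threshold flags nothing
# (recall 0), yet the ranking still puts most diverted flights above the rest.
toy_scores = [0.01, 0.02, 0.03, 0.05, 0.04, 0.20]
toy_labels = [0, 0, 0, 0, 1, 1]
print(pairwise_auc(toy_scores, toy_labels))
```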

**2. Log_Loss:**

*   **Baseline Model Log_Loss:** 0.076178
*   **Engineered Model Log_Loss:** 0.076117

The engineered model shows a slightly lower log_loss (0.076117 vs 0.076178) compared to the baseline. A lower log_loss indicates that the engineered model's predicted probabilities are marginally closer to the actual outcomes. While a small improvement, it's not substantial enough to suggest a significant practical advantage on its own.
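To put the two log_loss values in context, they can be compared with a no-skill model that predicts the overall diversion rate for every flight. The sketch derives that rate from the confusion-matrix counts in this notebook (28,511 diverted out of 1,658,259 flights, about 1.72%). It then computes the constant predictor's log loss, which is the binary entropy of the base rate. It assumes `ML.EVALUATE` reports log loss in natural-log units, the usual convention. Both models land below this reference, so both carry real signal, and the gap between them is tiny next to their gap to the reference.

```python
import math

# Counts from the engineered model's confusion matrix: 1,629,748 + 0 non-diverted, 28,510 + 1 diverted.
n_diverted = 28_510 + 1
n_total = 1_629_748 + 0 + n_diverted
base_rate = n_diverted / n_total

# Log loss of predicting base_rate for every flight (binary entropy of the base rate, in nats).
no_skill_log_loss = -(base_rate * math.log(base_rate)
                      + (1 - base_rate) * math.log(1 - base_rate))

reported = {"baseline": 0.076178, "engineered": 0.076117}  # from ML.EVALUATE
print(f"base rate = {base_rate:.4f}, no-skill log_loss = {no_skill_log_loss:.4f}")
for name, ll in reported.items():
    print(f"{name}: log_loss {ll:.6f}, {1 - ll / no_skill_log_loss:.1%} below no-skill")
```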

**3. Confusion Matrix (at a 0.5 threshold):**

| Metric              | Baseline Model       | Engineered Model     | Change/Implication                                                                                                      |
| :------------------ | :------------------- | :------------------- | :---------------------------------------------------------------------------------------------------------------------- |
| **True Negatives (TN)** | 1,629,646            | 1,629,748            | Slight increase (+102). Both models are excellent at identifying non-diverted flights.                                  |
| **False Positives (FP)**| 102                  | 0                    | Significant decrease (-102). Engineered model has perfect precision (no false alarms) at this threshold.                |
| **False Negatives (FN)**| 28,481               | 28,510               | Slight increase (+29). Both models struggle significantly with recall, missing most actual diversions.                  |
| **True Positives (TP)** | 30                   | 1                    | Drastic decrease (-29). The engineered model identified almost no actual diversions, leading to extremely low recall.   |
| **Recall (Diverted)**   | 0.001052 (30/28511)  | 0.000035 (1/28511)   | Severely degraded. The engineered model is even more conservative, effectively missing almost all actual diversion events. |
| **Precision (Diverted)**| 0.227273 (30/132)    | 1.0 (1/1)            | Nominally perfect at the 0.5 threshold (0 FPs), but based on a single positive prediction.                              |

**Impact of `dep_delay_bucket`:**

The addition of the `dep_delay_bucket` feature hurt the model's ability to identify actual diversions (True Positives) at the 0.5 threshold. It eliminated False Positives, giving nominally perfect precision, but it cut True Positives from 30 to just 1. Recall fell even lower, so at this threshold the engineered model is practically useless for flagging diversions. It has become even more conservative in predicting the positive class.

**4. Calibration Analysis:**

Both models have very similar calibration profiles:

*   **Baseline Calibration:**
    *   For the lowest probability bin (bin 1), `avg_predicted_probability` is 0.939825, while `actual_diversion_rate` is 0.083425.
    *   For the highest probability bin (bin 10), `avg_predicted_probability` is 0.997795, while `actual_diversion_rate` is 0.002050.

*   **Engineered Calibration:**
    *   For the lowest probability bin (bin 1), `avg_predicted_probability` is 0.934499, while `actual_diversion_rate` is 0.082780.
    *   For the highest probability bin (bin 10), `avg_predicted_probability` is 0.997328, while `actual_diversion_rate` is 0.001996.

In both models, `avg_predicted_probability` (taken from `predicted_Diverted_probs[OFFSET(1)]`) behaves like the probability of *non*-diversion. Its values sit far above the roughly 1.7% diversion rate, and a log_loss of about 0.076 could not occur if flights were really being assigned diversion probabilities above 0.9. Read on the diversion scale (1 - `avg_predicted_probability`), both models rank flights correctly and are roughly calibrated. Bin 1 implies about 0.060 (baseline) and 0.066 (engineered) against actual rates of about 0.083. Bin 10 implies about 0.0022 and 0.0027 against about 0.002. The engineered model's calibration profile remains very close to the baseline's, so the `dep_delay_bucket` feature did not materially improve or worsen calibration.

**Conclusion:**

While the engineered model achieved a slightly better log_loss and perfect precision at the 0.5 threshold, the introduction of the `dep_delay_bucket` feature ultimately led to a **less effective model** for predicting flight diversions at that threshold. The drop in True Positives (from 30 to 1) and the resulting near-zero recall show that the engineered model misses almost every diversion event. This makes it unsuitable as an early warning system in that configuration. The essentially unchanged AUC and calibration profile further confirm that this form of feature engineering did not yield a more practical or reliable model. Future efforts should explore other feature engineering techniques or modeling approaches that better capture rare diversion events and improve recall without sacrificing too much precision.

## Final Task

### Subtask:
Conclude the task by summarizing the findings for the engineered model, comparing it comprehensively to the baseline model, and discussing whether the feature engineering led to a more effective model for diversion risk prediction.


## Summary:

### Q&A
The feature engineering, specifically the addition of the `dep_delay_bucket` feature, did **not** lead to a more effective model for diversion risk prediction. While it achieved perfect precision (0 false positives) at a 0.5 threshold and a marginally lower log_loss, it drastically reduced the model's ability to identify actual diversions (True Positives dropped from 30 to 1), resulting in severely degraded recall (from 0.001052 to 0.000035). This made the model excessively conservative and practically ineffective as an early warning system.

### Data Analysis Key Findings
*   The engineered model showed a slightly lower Area Under the Curve (AUC) of 0.784524 compared to the baseline model's 0.785921, indicating a marginal decrease in its overall discriminative power.
*   The engineered model achieved a marginally better log_loss of 0.076117, slightly lower than the baseline's 0.076178, suggesting its predicted probabilities were slightly closer to actual outcomes.
*   At a 0.5 classification threshold, the engineered model exhibited perfect precision (1.0) by reducing False Positives from 102 in the baseline to 0.
*   However, the engineered model drastically reduced True Positives from 30 (baseline) to only 1, leading to a severe degradation in recall for the 'Diverted' class from 0.001052 to 0.000035.
*   Both the baseline and engineered models have very similar calibration profiles. `predicted_Diverted_probs[OFFSET(1)]` appears to hold the probability of non-diversion: its bin averages of 0.93 to 0.997 sit far above the roughly 1.7% diversion rate. The bins should therefore be read as 1 - `avg_predicted_probability` against `actual_diversion_rate`. On that scale, both models rank flights correctly and are roughly calibrated (e.g., engineered bin 1: about 0.066 implied vs 0.083 actual). The `dep_delay_bucket` feature did not materially change calibration.

### Insights or Next Steps
*   The current feature engineering approach, using `dep_delay_bucket`, made the model excessively conservative, prioritizing the avoidance of false positives over the detection of actual diversions. This is problematic for an early warning system where identifying as many true diversions as possible is crucial.
*   Future work should explore alternative feature engineering strategies or model architectures that can improve recall for the positive class (diversions) without completely sacrificing precision. This might involve different ways of encoding `dep_delay`, investigating other relevant features, or adjusting model thresholds and loss functions to better handle the imbalanced nature of diversion events.
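One way to act on the threshold point is to re-run `ML.CONFUSION_MATRIX` with lower thresholds. This uses the same `STRUCT(... AS threshold)` argument that this notebook already passes at 0.5. The sketch below builds one query per candidate threshold. The helper names are hypothetical, and the threshold values (chosen around the roughly 1.7% diversion rate) are illustrative assumptions rather than tuned choices.

```python
from typing import Dict, Iterable


def confusion_matrix_sql(model: str, features_sql: str, threshold: float) -> str:
    """Build an ML.CONFUSION_MATRIX query at a custom classification threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    return f"""SELECT *
FROM ML.CONFUSION_MATRIX(
    MODEL `{model}`,
    ({features_sql}),
    STRUCT({threshold} AS threshold)
)"""


def threshold_sweep(model: str, features_sql: str,
                    thresholds: Iterable[float]) -> Dict[float, str]:
    """One confusion-matrix query per candidate threshold, keyed by threshold."""
    return {t: confusion_matrix_sql(model, features_sql, t) for t in thresholds}


# Illustrative thresholds; in the notebook, features_sql would be
# create_engineered_features_sql and each query could be run with
# bq.query(sql).result().to_dataframe().
queries = threshold_sweep("mgmt467-71800.flights_data.pre_departure_engineered_model",
                          "SELECT 1 AS Diverted", [0.02, 0.05, 0.1])
for t, sql in queries.items():
    print(t, "->", sql.splitlines()[-2].strip())
```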
