<a href="https://colab.research.google.com/github/garci843/Unit1_TheLook_Team1/blob/main/Unit2_BQML_Titanic_Classification_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Ethan Garcia Unit 2 — Team 1 (Titanic, BQML)

**Goal (team):** Build an *ops-ready* classifier in **BigQuery ML** to predict **`survived`** on the Titanic dataset. Requirements mirror the Flights notebook for comparability.
    
**Dataset:** `bigquery-public-data.ml_datasets.titanic`

**Deliver (inside this notebook):**
- One **LOGISTIC_REG** baseline + one **engineered** model (`TRANSFORM`)
- **Evaluation** via `ML.EVALUATE` and **confusion matrices** (0.5 + custom threshold)
- **Threshold choice** + 3–5 sentence ops justification (e.g., lifeboat allocation policy in a hypothetical ops setting)
- Embedded **rubric** below


In [1]:

# --- Minimal setup (edit 2 vars) ---
from google.colab import auth
auth.authenticate_user()

import os
from google.cloud import bigquery

PROJECT_ID = "mgmt467-4889"   # e.g., mgmt-467-47888
REGION     = "us-central1"
TABLE_PATH = 'mgmt467-4889.Titanic.Titanic-dataset'

os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["REGION"]     = REGION
bq = bigquery.Client(project=PROJECT_ID)

print("BQ Project:", PROJECT_ID)
print("Source table:", TABLE_PATH)


BQ Project: mgmt467-4889
Source table: mgmt467-4889.Titanic.Titanic-dataset


### Quick sanity check

In [2]:

bq.query(f"SELECT * FROM `{TABLE_PATH}` LIMIT 5").result().to_dataframe()


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,180,0,3,"Leonard, Mr. Lionel",male,36.0,0,0,LINE,0.0,,S
1,264,0,1,"Harrison, Mr. William",male,40.0,0,0,112059,0.0,B94,S
2,278,0,2,"Parkes, Mr. Francis ""Frank""",male,,0,0,239853,0.0,,S
3,303,0,3,"Johnson, Mr. William Cahoone Jr",male,19.0,0,0,LINE,0.0,,S
4,414,0,2,"Cunningham, Mr. Alfred Fleming",male,,0,0,239853,0.0,,S



## 1) Canonical mapping (minimal)
We map to:
- `survived` (BOOL), `pclass` (INT), `sex` (STRING), `age` (NUM), `sibsp` (INT), `parch` (INT), `fare` (NUM), `embarked` (STRING)


In [3]:

CANONICAL_BASE_SQL = f'''
WITH titanic_c AS (
  SELECT
    CAST(Survived AS BOOL) AS survived,
    CAST(Pclass AS INT64)  AS pclass,
    CAST(Sex AS STRING)    AS sex,
    CAST(Age AS FLOAT64)   AS age,
    CAST(SibSp AS INT64)   AS sibsp,
    CAST(Parch AS INT64)   AS parch,
    CAST(Fare AS FLOAT64)  AS fare,
    CAST(Embarked AS STRING) AS embarked
  FROM `{TABLE_PATH}`
  WHERE age IS NOT NULL AND fare IS NOT NULL
)
'''
print(CANONICAL_BASE_SQL[:500] + "\n...")



WITH titanic_c AS (
  SELECT
    CAST(Survived AS BOOL) AS survived,
    CAST(Pclass AS INT64)  AS pclass,
    CAST(Sex AS STRING)    AS sex,
    CAST(Age AS FLOAT64)   AS age,
    CAST(SibSp AS INT64)   AS sibsp,
    CAST(Parch AS INT64)   AS parch,
    CAST(Fare AS FLOAT64)  AS fare,
    CAST(Embarked AS STRING) AS embarked
  FROM `mgmt467-4889.Titanic.Titanic-dataset`
  WHERE age IS NOT NULL AND fare IS NOT NULL
)

...


### 2) Split (80/20)

In [16]:
SPLIT_CLAUSE = r'''
, split AS (
  SELECT t.*,
         CASE WHEN RAND() < 0.8 THEN 'TRAIN' ELSE 'EVAL' END AS split_col
  FROM titanic_c t
)
'''
print(SPLIT_CLAUSE)


, split AS (
  SELECT t.*,
         CASE WHEN RAND() < 0.8 THEN 'TRAIN' ELSE 'EVAL' END AS split_col
  FROM titanic_c t
)




## 3) Baseline model — LOGISTIC_REG (`survived`)
Use a small set of signals (keep parity with Flights complexity).


In [29]:
# I split this into 3 different cells to make this easier to understand
# SCHEMA = f"{PROJECT_ID}.unit2_titanic"
# MODEL_BASE = f"{SCHEMA}.clf_survived_base"

# sql_baseline = f'''
# {CANONICAL_BASE_SQL}
# {SPLIT_CLAUSE}

# CREATE SCHEMA IF NOT EXISTS `{SCHEMA}`;

# CREATE OR REPLACE MODEL `{MODEL_BASE}`
# OPTIONS (MODEL_TYPE='LOGISTIC_REG', INPUT_LABEL_COLS=['survived']) AS
# SELECT
#   survived, pclass, sex, age, sibsp, parch, fare, embarked
# FROM split
# WHERE split='TRAIN'
# ;

# SELECT * FROM ML.EVALUATE(
#   MODEL `{MODEL_BASE}`,
#   (SELECT survived, pclass, sex, age, sibsp, parch, fare, embarked
#    FROM split WHERE split='EVAL')
# );
# '''
# job = bq.query(sql_baseline); _ = job.result()
# print("Baseline model trained:", MODEL_BASE)


In [9]:
SCHEMA = f"{PROJECT_ID}.unit2_titanic"
MODEL_BASE = f"{SCHEMA}.clf_survived_base"

# Create schema
sql_create_schema = f'''
CREATE SCHEMA IF NOT EXISTS `{SCHEMA}`;
'''
job = bq.query(sql_create_schema); _ = job.result()
print(f"Schema created or exists: {SCHEMA}")

Schema created or exists: mgmt467-4889.unit2_titanic


In [13]:
# Create or replace model
sql_create_model = f'''
CREATE OR REPLACE MODEL `{MODEL_BASE}`
OPTIONS (MODEL_TYPE='LOGISTIC_REG', INPUT_LABEL_COLS=['survived']) AS
{CANONICAL_BASE_SQL.replace('WITH titanic_c AS (', 'WITH titanic_c AS (')[:-1]} -- Remove trailing newline and closing parenthesis
, split AS (
  SELECT t.*,
         CASE WHEN RAND() < 0.8 THEN 'TRAIN' ELSE 'EVAL' END AS split_col
  FROM titanic_c t
)
SELECT
  survived, pclass, sex, age, sibsp, parch, fare, embarked
FROM split
WHERE split_col='TRAIN'
;
'''
job = bq.query(sql_create_model); _ = job.result()
print("Baseline model trained:", MODEL_BASE)

Baseline model trained: mgmt467-4889.unit2_titanic.clf_survived_base


In [19]:
# Evaluate model
sql_evaluate_model = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}

SELECT * FROM ML.EVALUATE(
  MODEL `{MODEL_BASE}`,
  (SELECT survived, pclass, sex, age, sibsp, parch, fare, embarked
   FROM split WHERE split_col='EVAL')
);
'''
job = bq.query(sql_evaluate_model)
evaluation_results = job.result().to_dataframe()
print("\nBaseline model evaluation:")
display(evaluation_results)


Baseline model evaluation:


Unnamed: 0,precision,recall,accuracy,f1_score,log_loss,roc_auc
0,0.866667,0.753623,0.821429,0.806202,0.409095,0.8911


Based on the evaluation results, the baseline model achieved an accuracy of 82% and an AUC of 89%. The precision is 87% and recall is 75%, indicating a good balance between correctly identifying positive cases and minimizing false positives.

### Confusion matrix — default 0.5 threshold

In [21]:
cm_default_sql = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}

SELECT
  SUM(CASE WHEN scored.label=TRUE  AND scored.pred_label=TRUE  THEN 1 ELSE 0 END) AS TP,
  SUM(CASE WHEN scored.label=FALSE AND scored.pred_label=TRUE  THEN 1 ELSE 0 END) AS FP,
  SUM(CASE WHEN scored.label=TRUE  AND scored.pred_label=FALSE THEN 1 ELSE 0 END) AS FN,
  SUM(CASE WHEN scored.label=FALSE AND scored.pred_label=FALSE THEN 1 ELSE 0 END) AS TN
FROM (
  SELECT
    t.survived AS label,
    p.predicted_survived AS pred_label,
    p.predicted_survived_probs[OFFSET(0)].prob AS score
  FROM split t
  JOIN ML.PREDICT(MODEL `{MODEL_BASE}`,
      (SELECT pclass, sex, age, sibsp, parch, fare, embarked FROM split WHERE split_col='EVAL')) AS p
  ON TRUE
  WHERE split_col='EVAL'
) AS scored;
'''
bq.query(cm_default_sql).result().to_dataframe()

Unnamed: 0,TP,FP,FN,TN
0,3180,3710,5160,6020


This baseline model gives me an accuracy of 51%, so the model is almost as well off if it just guessed randomly.

### Confusion matrix — your custom threshold

In [23]:
CUSTOM_THRESHOLD = 0.6   # TODO: justify in ops (e.g., conservative rescue policy)

cm_thresh_sql = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}

SELECT
  SUM(CASE WHEN scored.label=TRUE  AND scored.pred_label=TRUE  THEN 1 ELSE 0 END) AS TP,
  SUM(CASE WHEN scored.label=FALSE AND scored.pred_label=TRUE  THEN 1 ELSE 0 END) AS FP,
  SUM(CASE WHEN scored.label=TRUE  AND scored.pred_label=FALSE THEN 1 ELSE 0 END) AS FN,
  SUM(CASE WHEN scored.label=FALSE AND scored.pred_label=FALSE THEN 1 ELSE 0 END) AS TN
FROM (
  SELECT
    t.survived AS label,
    CAST(p.predicted_survived_probs[OFFSET(0)].prob >= {CUSTOM_THRESHOLD} AS BOOL) AS pred_label,
    p.predicted_survived_probs[OFFSET(0)].prob AS score
  FROM split t
  JOIN ML.PREDICT(MODEL `{MODEL_BASE}`,
      (SELECT pclass, sex, age, sibsp, parch, fare, embarked FROM split WHERE split_col='EVAL')) AS p
  ON TRUE
  WHERE split_col='EVAL'
) AS scored;
'''
bq.query(cm_thresh_sql).result().to_dataframe()

Unnamed: 0,TP,FP,FN,TN
0,3180,4664,5220,7656


This model gives me 60% accuracy, which is a significant boost. I do have some more false negatives, which is to be expected when increasing the threshold.


## 4) Engineered model — `TRANSFORM`
Create **family_size**, **fare_bucket**, and a **sex_pclass** interaction (categorical). Compare with baseline.


In [30]:
# I split this into multiple cells to make it easier to understand
# MODEL_XFORM = f"{SCHEMA}.clf_survived_xform"

# sql_xform = f'''
# {CANONICAL_BASE_SQL}
# {SPLIT_CLAUSE}

# CREATE OR REPLACE MODEL `{MODEL_XFORM}`
# TRANSFORM (
#   -- engineered
#   (sibsp + parch + 1) AS family_size,
#   CASE
#     WHEN fare < 10 THEN 'low'
#     WHEN fare < 50 THEN 'mid'
#     ELSE 'high'
#   END AS fare_bucket,
#   CONCAT(sex, '_', CAST(pclass AS STRING)) AS sex_pclass,
#   -- include base features too
#   pclass, sex, age, sibsp, parch, fare, embarked
# )
# OPTIONS (MODEL_TYPE='LOGISTIC_REG', INPUT_LABEL_COLS=['survived']) AS
# SELECT * FROM split WHERE split='TRAIN'
# ;

# SELECT 'baseline' AS model_version, * FROM ML.EVALUATE(
#   MODEL `{MODEL_BASE}`,
#   (SELECT survived, pclass, sex, age, sibsp, parch, fare, embarked FROM split WHERE split='EVAL')
# )
# UNION ALL
# SELECT 'engineered' AS model_version, * FROM ML.EVALUATE(
#   MODEL `{MODEL_XFORM}`,
#   (SELECT * FROM split WHERE split='EVAL')
# );
# '''
# job = bq.query(sql_xform); _ = job.result()
# print("Engineered model trained:", MODEL_XFORM)


In [27]:
MODEL_XFORM = f"{SCHEMA}.clf_survived_xform"

sql_create_xform_model = f'''
CREATE OR REPLACE MODEL `{MODEL_XFORM}`
TRANSFORM (
  -- engineered
  (sibsp + parch + 1) AS family_size,
  CASE
    WHEN fare < 10 THEN 'low'
    WHEN fare < 50 THEN 'mid'
    ELSE 'high'
  END AS fare_bucket,
  CONCAT(sex, '_', CAST(pclass AS STRING)) AS sex_pclass,
  -- include base features too
  pclass, sex, age, sibsp, parch, fare, embarked, survived
)
OPTIONS (MODEL_TYPE='LOGISTIC_REG', INPUT_LABEL_COLS=['survived']) AS
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}
SELECT * FROM split WHERE split_col='TRAIN'
;
'''
job = bq.query(sql_create_xform_model); _ = job.result()
print("Engineered model trained:", MODEL_XFORM)

Engineered model trained: mgmt467-4889.unit2_titanic.clf_survived_xform


In [28]:
sql_compare_eval = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}

SELECT 'baseline' AS model_version, * FROM ML.EVALUATE(
  MODEL `{MODEL_BASE}`,
  (SELECT survived, pclass, sex, age, sibsp, parch, fare, embarked FROM split WHERE split_col='EVAL')
)
UNION ALL
SELECT 'engineered' AS model_version, * FROM ML.EVALUATE(
  MODEL `{MODEL_XFORM}`,
  (SELECT survived, pclass, sex, age, sibsp, parch, fare, embarked FROM split WHERE split_col='EVAL')
);
'''
job = bq.query(sql_compare_eval)
comparison_results = job.result().to_dataframe()
print("\nBaseline vs Engineered Model Evaluation:")
display(comparison_results)


Baseline vs Engineered Model Evaluation:


Unnamed: 0,model_version,precision,recall,accuracy,f1_score,log_loss,roc_auc
0,baseline,0.745763,0.721311,0.782313,0.733333,0.453096,0.862351
1,engineered,0.804348,0.596774,0.725806,0.685185,0.527149,0.821465


Based on the comparison results, the engineered model achieved higher precision of 80% compared to the baseline model of 75%. However, the engineered model showed lower recall of 60% and a lower ROC AUC 82% than the baseline model of 72% recall and 86 AUC%,. Overall, the engineered model appears more precise in its positive predictions but less effective at identifying all actual positive cases across different thresholds compared to the baseline.


### Write-up (concise)
- **Threshold chosen & ops rationale:** …  
- **Baseline vs engineered — changes in AUC/precision/recall:** …  
- **Risk framing:** FP vs FN trade in a rescue/triage-like context: what error hurts more and why? …


### Threshold Chosen & Ops Rationale
Custom Threshold Chosen: 0.6

The confusion matrices show that using the default 0.5 threshold yields a low accuracy of 51%, meaning the model is nearly as good as a random guess. By increasing the threshold to 0.6, the model's accuracy improves to 60% on the evaluation set.

Operational Rationale: In a hypothetical lifeboat allocation setting, a higher threshold means the model is more conservative in predicting survival, prioritizing Precision (fewer false alarms) over Recall. This translates to fewer False Positives (FP). A False Positive would mean allocating a limited lifeboat spot to someone the model predicted would survive, but who actually would not have survived. By minimizing FP, we ensure that a higher percentage of the people the model flags for rescue (predicted survivors) are actually those with a high probability of survival, making the limited rescue resources more effective.

### Model Comparison
The Engineered Model (which includes family_size, fare_bucket, and sex_pclass) significantly increased Precision (from 0.746 to 0.804). This means that when the engineered model predicts someone will survive, it's more often correct.

However, this gain came at a cost to Recall (dropping from 0.721 to 0.597) and ROC AUC (dropping from 0.862 to 0.821). The lower Recall indicates the engineered model missed identifying more actual survivors (False Negatives) compared to the baseline. The lower ROC AUC suggests the engineered model is less effective at ranking survival probability across all possible thresholds.

In short, the engineered model is more confident but less comprehensive in identifying survivors.

### Risk Framing: FP vs FN Trade in a Rescue/Triage-like Context
The risk trade-off depends on the operational goal of the model.

False Positives (FP): Predicting a passenger will survive, but they actually do not. This is a Type I Error.

Consequence: Misallocation of a limited rescue resource (like a spot on a lifeboat). An FP means a spot went to someone who wouldn't have survived anyway, denying it to a potential True Negative (someone who wouldn't have survived and wasn't predicted to).

False Negatives (FN): Predicting a passenger will not survive, but they actually do. This is a Type II Error.

Consequence: A missed rescue opportunity. An FN means the model failed to identify a person who would have survived with intervention, leading to an avoidable death.

Justification: In a severe, limited-resource scenario like the Titanic, a False Negative (FN) hurts more. The ethical and operational mandate is to save the maximum number of people who can be saved. An FN represents a preventable loss of life. Therefore, a rescue policy should generally aim to maximize Recall (minimizing FN) even if it means accepting a slightly lower Precision (more FP/false alarms), provided the false alarms don't completely exhaust resources on those with zero chance of survival.


---

## Rubric (Titanic, 100 pts)
**Team-only deliverable in this notebook**

- Baseline LOGISTIC_REG + evaluation (AUC + confusion @0.5) — **20**  
- Custom threshold confusion matrix + ops justification — **20**  
- Engineered model with `TRANSFORM` (family_size, fare_bucket, sex_pclass) — **20**  
- Comparison table (baseline vs engineered) + 3–5 sentence interpretation — **20**  
- Reproducibility: parameters clear, no hidden magic; mapping documented — **10**  
- Governance notes: assumptions/limitations + slices you would monitor — **10**

> **Strictness:** No screenshots; use actual results cells. Keep explanations concise (bullet points OK).
