
# 🤖 MGMT 467 - Unit 2 Lab 2: Prompt Studio for AI-Assisted SQL + ML

**Date:** 2025-10-16  
**Objective:** Build and refine a complete ML pipeline for churn prediction in BigQuery, using **Gemini-style prompts** to guide SQL generation.

You'll learn to:
- Frame SQL goals as clear prompts
- Generate, test, and debug queries with an AI assistant
- Reflect on each modeling step and your prompt design



## Task 0: Connect to BigQuery

**🎯 Goal:** Verify BigQuery access from Colab.  
**📌 Requirements:** Use `%%bigquery` to return the current date and the session user.

---

### 🧠 Prompt Template  
> Write a SQL query that returns CURRENT_DATE() and SESSION_USER(). I will run it with %%bigquery in Colab.

---

### 👩‍🏫 Example Prompt  
> Write a SQL query using BigQuery syntax that returns today’s date and the current session user.

---

### ✅ Expected SQL Output
```sql
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;
```

---

### 🔍 Checkpoint  
The query should return a single row with today's date and your session user (the account you authenticated with).


In [None]:
%%bigquery --project mgmt467-lab
SELECT CURRENT_DATE() AS today, SESSION_USER() AS user;

Unnamed: 0,today,user
0,2025-10-25,susanchen848@gmail.com



## Task 1: Prepare ML Table

**🎯 Goal:** Create a clean features table for modeling churn.  
**📌 Requirements:** Use cleaned_features as source, select relevant columns, filter rows with churn_label IS NOT NULL.

---

### 🧠 Prompt Template  
> Write a query that creates a new table with columns: [region, plan_tier, age_band, ...] and churn_label from [source_table]. Filter to rows where churn_label IS NOT NULL.

---

### 👩‍🏫 Example Prompt  
> Create a BigQuery table named churn_features from cleaned_features with selected features and where churn_label IS NOT NULL.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE TABLE `your_dataset.churn_features` AS
SELECT region, plan_tier, age_band, avg_rating, total_minutes, churn_label
FROM `your_dataset.cleaned_features`
WHERE churn_label IS NOT NULL;
```

---

### 🔍 Checkpoint  
The table should appear in BigQuery and contain only non-null labels.
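
The prompt template above leaves the feature list and the table names as placeholders. A minimal sketch of filling those placeholders programmatically is shown below; the helper name `build_feature_table_sql` and its parameters are hypothetical, introduced only for illustration, and the table names are the same `your_dataset` placeholders used in the expected SQL.

```python
def build_feature_table_sql(source_table, target_table, features, label="churn_label"):
    """Return a CREATE OR REPLACE TABLE statement that keeps only labeled rows.

    Hypothetical helper: it only assembles a SQL string and never contacts BigQuery.
    """
    if not features:
        raise ValueError("features must contain at least one column name")
    # The label column is always selected last, after the feature columns.
    columns = ", ".join(list(features) + [label])
    return (
        f"CREATE OR REPLACE TABLE `{target_table}` AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`\n"
        f"WHERE {label} IS NOT NULL;"
    )


sql = build_feature_table_sql(
    source_table="your_dataset.cleaned_features",
    target_table="your_dataset.churn_features",
    features=["region", "plan_tier", "age_band", "avg_rating", "total_minutes"],
)
print(sql)
```

Comparing the generated string with the assistant's answer is a quick way to spot a missing column or a dropped `WHERE` filter before running anything.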


In [None]:
# prompt: Create a clean features table for modeling churn.
# Requirements: Use cleaned_features as source, select relevant columns, filter rows with churn_label IS NOT NULL.  Write a query that creates a new table with columns: [region, plan_tier, age_band, ...] and churn_label from [source_table]. Filter to rows where churn_label IS NOT NULL.

import pandas_gbq

sql = """
CREATE OR REPLACE TABLE `mgmt467-lab.netflix.churn_features` AS
SELECT
    t1.user_id,
    t1.month,
    t1.r3_sess,
    t1.r3_min,
    t1.unique_days_watched,
    t1.avg_watch_duration,
    t1.days_since_last_month_start,
    t1.subscription_plan,
    t1.country,
    t1.age,
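    -- NOTE: assumes active_next_month is already coded so that 1 = churned;
    -- if 1 means 'still active', use (1 - t2.active_next_month) instead.
    t2.active_next_month AS churn_label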
FROM
    `mgmt467-lab.netflix.feat_churn_lite` AS t1
JOIN
    `mgmt467-lab.netflix.labels_next_month` AS t2
ON
    t1.user_id = t2.user_id AND t1.month = t2.month
WHERE
    t2.active_next_month IS NOT NULL
"""

pandas_gbq.read_gbq(sql, project_id="mgmt467-lab", dialect="standard")


Unnamed: 0,user_id,month,r3_sess,r3_min,unique_days_watched,avg_watch_duration,days_since_last_month_start,subscription_plan,country,age,churn_label


In [None]:
# prompt: Checkpoint: the table should appear in BigQuery and contain only non-null labels.

sql = """
SELECT
    COUNT(*) AS total_rows,
    COUNTIF(churn_label IS NULL) AS null_churn_labels
FROM
    `mgmt467-lab.netflix.churn_features`
"""
df = pandas_gbq.read_gbq(sql, project_id="mgmt467-lab", dialect="standard")
df


Unnamed: 0,total_rows,null_churn_labels
0,4974900,0



## Task 2: Train Logistic Regression Model

**🎯 Goal:** Train a basic BQML logistic regression model.  
**📌 Requirements:** Use churn_features table, predict churn_label from features.

---

### 🧠 Prompt Template  
> Write a CREATE MODEL SQL for logistic regression using churn_label as label and [features] as inputs.

---

### 👩‍🏫 Example Prompt  
> Train a logistic regression model to predict churn_label using region, plan_tier, total_minutes, avg_rating.

---

### ✅ Expected SQL Output
```sql
CREATE OR REPLACE MODEL `your_dataset.churn_model`
OPTIONS(model_type='logistic_reg') AS
SELECT region, plan_tier, total_minutes, avg_rating, churn_label
FROM `your_dataset.churn_features`;
```

---

### 🔍 Checkpoint  
Model appears in BigQuery under Models. Training completes.
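
The `CREATE MODEL` template can be assembled the same way as any other prompt-driven query. The sketch below is one possible approach, not the lab's prescribed method: `render_option` and `build_create_model_sql` are hypothetical helpers, and `auto_class_weights` is included only as an example of an extra BigQuery ML option that can be passed for logistic regression.

```python
def render_option(value):
    """Render a Python value as a SQL literal for an OPTIONS clause."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(render_option(v) for v in value) + "]"
    if isinstance(value, str):
        return "'" + value + "'"
    return str(value)


def build_create_model_sql(model_name, source_table, features,
                           label="churn_label", extra_options=None):
    """Assemble a CREATE OR REPLACE MODEL statement (string only, no API call)."""
    options = {"model_type": "logistic_reg", "input_label_cols": [label]}
    options.update(extra_options or {})
    rendered = ",\n        ".join(
        f"{name}={render_option(value)}" for name, value in options.items()
    )
    columns = ", ".join(list(features) + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS({rendered}) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`;"
    )


model_sql = build_create_model_sql(
    "your_dataset.churn_model",
    "your_dataset.churn_features",
    ["region", "plan_tier", "total_minutes", "avg_rating"],
    extra_options={"auto_class_weights": True},
)
print(model_sql)
```

Naming the label explicitly through `input_label_cols` avoids relying on BigQuery ML's default label column name.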


In [None]:
# prompt: Write a CREATE MODEL SQL for logistic regression using churn_label as label and [features] as inputs. checkpoint: Model appears in BigQuery under Models. Training completes. example: CREATE OR REPLACE MODEL `your_dataset.churn_model`
# OPTIONS(model_type='logistic_reg') AS
# SELECT region, plan_tier, total_minutes, avg_rating, churn_label
# FROM `your_dataset.churn_features`;

sql = """
CREATE OR REPLACE MODEL `mgmt467-lab.netflix.churn_logreg_lite`
OPTIONS(model_type='logistic_reg',
        input_label_cols=['churn_label']) AS
SELECT
    r3_sess,
    r3_min,
    unique_days_watched,
    avg_watch_duration,
    days_since_last_month_start,
    subscription_plan,
    country,
    age,
    churn_label
FROM
    `mgmt467-lab.netflix.churn_features`
"""
pandas_gbq.read_gbq(sql, project_id="mgmt467-lab", dialect="standard")


## Task 3: Evaluate Model

**🎯 Goal:** Evaluate the logistic regression model.  
**📌 Requirements:** Use ML.EVALUATE.

---

### 🧠 Prompt Template  
> Write a query to evaluate my logistic regression model using ML.EVALUATE.

---

### 👩‍🏫 Example Prompt  
> Evaluate the churn_model using ML.EVALUATE to get accuracy, precision, recall.

---

### ✅ Expected SQL Output
```sql
SELECT * FROM ML.EVALUATE(MODEL `your_dataset.churn_model`);
```

---

### 🔍 Checkpoint  
View performance metrics: accuracy, log_loss, precision, recall.


In [None]:
# prompt: Goal: Evaluate the logistic regression model.
# Requirements: Use ML.EVALUATE. Evaluate the churn_model using ML.EVALUATE to get accuracy, precision, recall.

sql = """
SELECT
  *
FROM
  ML.EVALUATE(MODEL `mgmt467-lab.netflix.churn_logreg_lite`)
"""
df = pandas_gbq.read_gbq(sql, project_id="mgmt467-lab", dialect="standard")
df


Unnamed: 0,precision,recall,accuracy,f1_score,log_loss,roc_auc
0,0.0,0.0,0.661051,0.0,0.640383,0.498698
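
A precision and recall of 0.0 alongside an accuracy near 0.66 is the pattern produced when a classifier never predicts the positive class. The sketch below works through that arithmetic with hypothetical confusion-matrix counts (1,000 rows, 339 of them positive), chosen only to resemble the accuracy above; they are not taken from the actual evaluation split. Returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Convention (an assumption of this sketch): undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: the model predicts 0 for every row,
# so there are no true positives and no false positives.
metrics = classification_metrics(tp=0, fp=0, tn=661, fn=339)
print(metrics)
```

In this situation accuracy simply equals the share of negative rows, so it says little about whether the model can find churners; the `roc_auc` value near 0.5 in the output above points the same way.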



## Task 4: Predict Churn

**🎯 Goal:** Use ML.PREDICT to generate churn predictions.  
**📌 Requirements:** Apply model to same input table.

---

### 🧠 Prompt Template  
> Generate SQL to use ML.PREDICT on churn_model and return predictions by user_id.

---

### 👩‍🏫 Example Prompt  
> Predict churn using churn_model. Include user_id, predicted_churn_label, and prediction probability.

---

### ✅ Expected SQL Output
```sql
SELECT user_id, predicted_churn_label, predicted_churn_label_probs
FROM ML.PREDICT(MODEL `your_dataset.churn_model`,
      (SELECT * FROM `your_dataset.churn_features`));
```

---

### 🔍 Checkpoint  
Inspect top churn risk users. Validate probabilities.
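
"Validate probabilities" can be made concrete with a few checks. `ML.PREDICT` returns `predicted_churn_label_probs` as an array of `(label, prob)` structs; the sketch below assumes that array arrives in Python as a list of dictionaries, which may differ depending on how the results are loaded. The sample row is illustrative and the helper names are hypothetical.

```python
import math


def prob_for_label(probs, label):
    """Return the probability for a given class label, looked up by label, not position."""
    for entry in probs:
        if entry["label"] == label:
            return entry["prob"]
    raise KeyError(f"label {label!r} not found in probabilities")


def probs_are_valid(probs, tol=1e-6):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    in_range = all(0.0 <= entry["prob"] <= 1.0 for entry in probs)
    sums_to_one = math.isclose(sum(e["prob"] for e in probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# Illustrative row shaped like one ML.PREDICT result for a binary label.
row = {
    "predicted_churn_label": 0,
    "predicted_churn_label_probs": [
        {"label": 1, "prob": 0.341964},
        {"label": 0, "prob": 0.658036},
    ],
}

probs = row["predicted_churn_label_probs"]
prob_churn = prob_for_label(probs, 1)
most_likely = max(probs, key=lambda e: e["prob"])["label"]
print(prob_churn, probs_are_valid(probs), most_likely == row["predicted_churn_label"])
```

Looking the class up by label avoids depending on the order of the array elements.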


In [None]:
# prompt: Goal: Use ML.PREDICT to generate churn predictions.
# Requirements: Apply model to same input table. Predict churn using churn_model. Include user_id, predicted_churn_label, and prediction probability. Checkpoint
# Inspect top churn risk users. Validate probabilities.

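sql = """
SELECT
    p.user_id,
    p.month AS score_month,
    p.predicted_churn_label AS yhat,
    -- Look up the positive class by label rather than by array position.
    -- Assumes churn_label is an INT64 where 1 is the positive class.
    (SELECT pr.prob
     FROM UNNEST(p.predicted_churn_label_probs) AS pr
     WHERE pr.label = 1) AS prob_churn
FROM
    ML.PREDICT(MODEL `mgmt467-lab.netflix.churn_logreg_lite`,
        (SELECT * FROM `mgmt467-lab.netflix.churn_features`)) AS p
ORDER BY prob_churn DESC
LIMIT 10
"""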
df = pandas_gbq.read_gbq(sql, project_id="mgmt467-lab", dialect="standard")
df


Unnamed: 0,user_id,score_month,yhat,prob_churn
0,user_07283,2024-03-01,0,0.341964
1,user_07283,2024-03-01,0,0.341964
2,user_07283,2024-03-01,0,0.341964
3,user_07283,2024-03-01,0,0.341964
4,user_07283,2024-03-01,0,0.341964
5,user_07283,2024-03-01,0,0.341964
6,user_07283,2024-03-01,0,0.341964
7,user_07283,2024-03-01,0,0.341964
8,user_07283,2024-03-01,0,0.341964
9,user_07283,2024-03-01,0,0.341964
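
All ten rows above share the same `user_id`, month, and probability, which may indicate that `churn_features` holds repeated `(user_id, month)` rows, for example from a join that fans out. A minimal sketch of a duplicate-key check follows; the helper names are hypothetical, `user_00001` is a made-up identifier for the demo, and the generated query would still need to be run against BigQuery to confirm anything.

```python
from collections import Counter


def build_duplicate_key_sql(table, keys, limit=20):
    """Build a query listing key combinations that occur more than once."""
    key_list = ", ".join(keys)
    return (
        f"SELECT {key_list}, COUNT(*) AS n_rows\n"
        f"FROM `{table}`\n"
        f"GROUP BY {key_list}\n"
        f"HAVING COUNT(*) > 1\n"
        f"ORDER BY n_rows DESC\n"
        f"LIMIT {limit}"
    )


def duplicate_keys(rows, keys):
    """Local equivalent: count key tuples in a list of dict rows, keep those seen > 1 time."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Small in-memory demo; the second user is hypothetical.
sample_rows = [
    {"user_id": "user_07283", "month": "2024-03-01"},
    {"user_id": "user_07283", "month": "2024-03-01"},
    {"user_id": "user_00001", "month": "2024-03-01"},
]
dupes = duplicate_keys(sample_rows, ["user_id", "month"])
dup_sql = build_duplicate_key_sql(
    "mgmt467-lab.netflix.churn_features", ["user_id", "month"]
)
print(dupes)
print(dup_sql)
```

If the query returns rows, deduplicating the source tables before the join in Task 1 would be a reasonable next step.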
