# Lab: Vertex AI–Assisted BigQuery Analytics — Example Prompts
**Goal:** Practice moving from simple SQL to complex analytics in BigQuery using *only* carefully engineered prompts with Vertex AI (Gemini).  
**Important:** This notebook contains **prompts only** (no starter code). Paste the prompts into **Vertex AI Studio**, **Vertex AI in Colab Enterprise**, or your chosen chat interface, and then run the generated SQL directly in **BigQuery**. If you decide to automate later, you can ask Vertex AI to convert the winning SQL into a Colab pipeline.

## How to use this prompts-only notebook
1. Open **Vertex AI Studio** (or Gemini in Colab Enterprise chat panel).  
2. Copy a prompt from this notebook and paste it into the model. Do **not** paste any code from here; let the model generate it.  
3. Run the generated SQL in **BigQuery** (Console → BigQuery Studio).  
4. Iterate: refine the prompt when results aren’t what you expect.  
5. Document: capture your final SQL, plus a one-sentence takeaway, in your notes/README.

## Dataset assumptions
Use one of these sources (adjust table paths accordingly):
- **Global Superstore (Kaggle)** loaded into BigQuery (e.g., `[YOUR_PROJECT].superstore_data.sales`)  
- **TheLook eCommerce** public dataset: `bigquery-public-data.thelook_ecommerce`  
If you are using *Global Superstore*, make sure column names match your schema (e.g., `Order_Date`, `Region`, `Category`, `Sub_Category`, `Sales`, `Profit`, `Discount`, `State`, `Customer_ID`, `Ship_Mode`).

---
## Prompting guardrails (quick checklist)
- **Be explicit**: table path, column names, filters, output columns, sort order, and limits.  
- **Ask for runnable SQL**: “Return a BigQuery SQL block only.”  
- **Control cost**: ask for `LIMIT` during exploration and remove it for the final run.  
- **Validate**: request a brief explanation of why each clause is present and how you can sanity-check results.
---

## Install Dependencies

In [1]:
# Install the Google Cloud BigQuery client library
!pip install google-cloud-bigquery==3.17.0 pandas==2.1.4

# Authenticate your Colab environment




In [2]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


## Copy Schema to a dataframe

In [69]:
from google.cloud import bigquery
import pandas as pd

# Replace with your Google Cloud Project ID
project_id = 'my-project-mgmt-467' # This is derived from your provided table name
dataset_id = 'LAB1_Fundation'
table_id = 'superstore'

# Construct a BigQuery client object.
client = bigquery.Client(project=project_id)

# Get the table object
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref)

# Extract schema information
schema_list = []
for field in table.schema:
    schema_list.append({
        'name': field.name,
        'field_type': field.field_type,
        'mode': field.mode,
        'description': field.description
    })

# Convert to Pandas DataFrame
schema_df = pd.DataFrame(schema_list)

# Display the schema DataFrame (optional, for verification)
print("Schema DataFrame created:")
# To see the output, run the code.


Schema DataFrame created:


## CLean Column Names

In [68]:
# --- 1. Clean the Column Names ---
# Create a 'clean_name' column with standard naming conventions:
# lowercase, with spaces and hyphens replaced by underscores.
schema_df['clean_name'] = schema_df['name'].str.lower().str.replace(' ', '_').str.replace('-', '_')


# --- 2. Generate the Aliases for the SELECT Clause ---
column_expressions = []
for index, row in schema_df.iterrows():
    original_name = row['name']
    clean_name = row['clean_name']

    # If the original name contains a space or special character, it needs to be
    # enclosed in backticks (`) in the SQL statement.
    if ' ' in original_name or '-' in original_name:
        expression = f'`{original_name}` AS {clean_name}'
    else:
        # If the name is already clean, we still alias it for consistency.
        expression = f'{original_name} AS {clean_name}'
    column_expressions.append(expression)

# Join all the individual column expressions into a single, formatted string.
select_clause = ",\n  ".join(column_expressions)


# --- 3. Construct the Final CREATE VIEW Statement ---
new_view_id = 'superstore_clean' # You can change this if you like

create_view_sql = f"""
CREATE OR REPLACE VIEW `{project_id}.{dataset_id}.{new_view_id}` AS
SELECT
  {select_clause}
FROM
  `{project_id}.{dataset_id}.{table_id}`;
"""

# --- 4. Print the Final SQL ---
print("--- Copy the SQL below and run it in your BigQuery Console ---")
print(create_view_sql)

--- Copy the SQL below and run it in your BigQuery Console ---

CREATE OR REPLACE VIEW `my-project-mgmt-467.LAB1_Fundation.superstore_clean` AS
SELECT
  `Row ID` AS row_id,
  `Order ID` AS order_id,
  `Order Date` AS order_date,
  `Ship Date` AS ship_date,
  `Ship Mode` AS ship_mode,
  `Customer ID` AS customer_id,
  `Customer Name` AS customer_name,
  Segment AS segment,
  Country AS country,
  City AS city,
  State AS state,
  `Postal Code` AS postal_code,
  Region AS region,
  `Product ID` AS product_id,
  Category AS category,
  `Sub-Category` AS sub_category,
  `Product Name` AS product_name,
  Sales AS sales,
  Quantity AS quantity,
  Discount AS discount,
  Profit AS profit
FROM
  `my-project-mgmt-467.LAB1_Fundation.superstore`;



## Generate View with standard column naming convention

In [67]:
from google.cloud import bigquery
import pandas as pd

# Replace with your Google Cloud Project ID
project_id = 'my-project-mgmt-467' # This is derived from your provided table name
dataset_id = 'LAB1_Fundation'
table_id = 'superstore'

# Construct a BigQuery client object.
client = bigquery.Client(project=project_id)

# Get the table object
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref)

# Extract schema information
schema_list = []
for field in table.schema:
    schema_list.append({
        'name': field.name,
        'field_type': field.field_type,
        'mode': field.mode,
        'description': field.description
    })

# Convert to Pandas DataFrame
schema_df = pd.DataFrame(schema_list)

# Display the schema DataFrame (optional, for verification)
print("Schema DataFrame created:")
# To see the output, run the code.


# --- 1. Clean the Column Names ---
# Create a 'clean_name' column with standard naming conventions:
# lowercase, with spaces and hyphens replaced by underscores.
schema_df['clean_name'] = schema_df['name'].str.lower().str.replace(' ', '_').str.replace('-', '_')


# --- 2. Generate the Aliases for the SELECT Clause ---
column_expressions = []
for index, row in schema_df.iterrows():
    original_name = row['name']
    clean_name = row['clean_name']

    # If the original name contains a space or special character, it needs to be
    # enclosed in backticks (`) in the SQL statement.
    if ' ' in original_name or '-' in original_name:
        expression = f'`{original_name}` AS {clean_name}'
    else:
        # If the name is already clean, we still alias it for consistency.
        expression = f'{original_name} AS {clean_name}'
    column_expressions.append(expression)

# Join all the individual column expressions into a single, formatted string.
select_clause = ",\n  ".join(column_expressions)


# --- 3. Construct the Final CREATE VIEW Statement ---
new_view_id = 'superstore_clean' # You can change this if you like

create_view_sql = f"""
CREATE OR REPLACE VIEW `{project_id}.{dataset_id}.{new_view_id}` AS
SELECT
  {select_clause}
FROM
  `{project_id}.{dataset_id}.{table_id}`;
"""

# --- 4. Print the Final SQL ---
print("\n--- Copy the SQL below and run it in your BigQuery Console ---")
print(create_view_sql)


# Execute the CREATE VIEW SQL query
try:
    query_job = client.query(create_view_sql)  # API request
    query_job.result()  # Waits for the query to finish
    print(f"\nView '{new_view_id}' created/replaced successfully in dataset '{dataset_id}'.")
except Exception as e:
    print(f"\nAn error occurred while creating the view: {e}")

# Now, let's run a SELECT query to verify the view content
print(f"\n--- First 10 rows from the new view '{new_view_id}' ---")
try:
    verify_query = f"""
    SELECT
        *
    FROM
        `{project_id}.{dataset_id}.{new_view_id}`
    LIMIT 10;
    """
    verify_job = client.query(verify_query)
    verify_results_df = verify_job.to_dataframe()
    display(verify_results_df)

except Exception as e:
    print(f"An error occurred while querying the view: {e}")

Schema DataFrame created:

--- Copy the SQL below and run it in your BigQuery Console ---

CREATE OR REPLACE VIEW `my-project-mgmt-467.LAB1_Fundation.superstore_clean` AS
SELECT
  `Row ID` AS row_id,
  `Order ID` AS order_id,
  `Order Date` AS order_date,
  `Ship Date` AS ship_date,
  `Ship Mode` AS ship_mode,
  `Customer ID` AS customer_id,
  `Customer Name` AS customer_name,
  Segment AS segment,
  Country AS country,
  City AS city,
  State AS state,
  `Postal Code` AS postal_code,
  Region AS region,
  `Product ID` AS product_id,
  Category AS category,
  `Sub-Category` AS sub_category,
  `Product Name` AS product_name,
  Sales AS sales,
  Quantity AS quantity,
  Discount AS discount,
  Profit AS profit
FROM
  `my-project-mgmt-467.LAB1_Fundation.superstore`;


View 'superstore_clean' created/replaced successfully in dataset 'LAB1_Fundation'.

--- First 10 rows from the new view 'superstore_clean' ---
row_id | order_id | order_date | ship_date | ship_mode | customer_id | customer_na

In [65]:
# This assumes your 'client' object from the previous cell is still active
# and correctly authenticated.

print("✅ Step 1: Defining the query string...")

query_string = """
SELECT
  order_id,
  customer_name,
  product_name,
  sales,
  profit
FROM
  `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
LIMIT 10;
"""

print("✅ Step 2: Sending the query to BigQuery. This may take a moment...")

# Use a try-except block to catch potential errors
try:
    query_job = client.query(query_string)

    print("✅ Step 3: Waiting for query to complete and fetching results...")
    results_df = query_job.to_dataframe()

    print(f"✅ Step 4: Query finished. Found {len(results_df)} rows.")

    if results_df.empty:
        print("\n⚠️ The query ran successfully but returned an empty result. Please double-check that your 'superstore_clean' view exists and the original table has data.")
    else:
        print("\n--- Displaying Results ---")
        display(results_df)

except Exception as e:
    print(f"\n❌ An error occurred: {e}")

✅ Step 1: Defining the query string...
✅ Step 2: Sending the query to BigQuery. This may take a moment...
✅ Step 3: Waiting for query to complete and fetching results...
✅ Step 4: Query finished. Found 10 rows.

--- Displaying Results ---


Unnamed: 0,order_id,customer_name,product_name,sales,profit
0,CA-2015-154900,Sung Shariari,Avery 518,3.15,1.512
1,CA-2015-154900,Sung Shariari,Adams Telephone Message Book W/Dividers/Space ...,22.72,10.224
2,US-2016-152415,Patrick O'Donnell,"C-Line Magnetic Cubicle Keepers, Clear Polypro...",14.82,6.2244
3,US-2016-152415,Patrick O'Donnell,"Howard Miller 14-1/2"" Diameter Chrome Round Wa...",191.82,61.3824
4,CA-2016-153269,Pamela Stobb,"Personal Folder Holder, Ebony",11.21,3.363
5,CA-2016-153269,Pamela Stobb,"Situations Contoured Folding Chairs, 4/Set",354.9,88.725
6,CA-2016-153269,Pamela Stobb,Xerox 193,17.94,8.7906
7,CA-2016-153269,Pamela Stobb,GBC Binding covers,51.8,23.31
8,CA-2015-158792,Brian Dahlen,Staples,22.2,10.434
9,CA-2016-141082,Fred McMath,Avery 517,3.69,1.7343


## Part A — SQL Warm‑Up (SELECT, WHERE, ORDER BY, LIMIT, DISTINCT)
**Aim:** Build confidence with precise, unambiguous prompts that yield clean, runnable SQL.

### A1. Unique values (DISTINCT)
**Prompt (paste in Vertex AI):**
```
Act as a senior BigQuery analyst. Produce a **single runnable BigQuery SQL** (no commentary) for:
- Task: List all unique `Sub_Category` values sold in the 'West' region.
- Table: `mgmt-467-47888.lab1_foundation.superstore`
- Filter: `Region = 'West'`
- Output: a single column named `Sub_Category`
- Sort: alphabetically A→Z
- Add: `LIMIT 100` to control cost during exploration.
```
**Reflection:** Did the result match your expectations? If not, what ambiguity in your prompt might have caused the mismatch?

In [None]:
query_string = """
SELECT
    DISTINCT `Sub-Category` AS Sub_Category
FROM
    `mgmt-467-47888.lab1_foundation.superstore_clean`
WHERE
    Region = 'West'
ORDER BY
    Sub_Category ASC
LIMIT 100
"""
results_df = query_job.to_dataframe()
display(results_df)

Unnamed: 0,order_id,customer_name,product_name,sales,profit
0,CA-2015-154900,Sung Shariari,Avery 518,3.15,1.512
1,CA-2015-154900,Sung Shariari,Adams Telephone Message Book W/Dividers/Space ...,22.72,10.224
2,US-2016-152415,Patrick O'Donnell,"C-Line Magnetic Cubicle Keepers, Clear Polypro...",14.82,6.2244
3,US-2016-152415,Patrick O'Donnell,"Howard Miller 14-1/2"" Diameter Chrome Round Wa...",191.82,61.3824
4,CA-2016-153269,Pamela Stobb,"Personal Folder Holder, Ebony",11.21,3.363
5,CA-2016-153269,Pamela Stobb,"Situations Contoured Folding Chairs, 4/Set",354.9,88.725
6,CA-2016-153269,Pamela Stobb,Xerox 193,17.94,8.7906
7,CA-2016-153269,Pamela Stobb,GBC Binding covers,51.8,23.31
8,CA-2015-158792,Brian Dahlen,Staples,22.2,10.434
9,CA-2016-141082,Fred McMath,Avery 517,3.69,1.7343


### A2. Top‑N by metric (ORDER BY … DESC)
**Prompt:**
```
BigQuery SQL only.
Task: Return the top 10 customers by total profit.
Table: `mgmt-467-47888.lab_foundation.superstore`
Columns used: `Customer_ID`, `Profit`
Output columns: `Customer_ID`, `total_profit`
Logic: SUM Profit per customer, order by `total_profit` DESC
Add `LIMIT 10`.
```
**Tip:** If your schema uses different identifiers (e.g., `Customer Name`), restate column names explicitly.

In [64]:
query_string_a2 = """
SELECT
    customer_id,
    SUM(profit) AS total_profit
FROM
    `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
GROUP BY
    customer_id
ORDER BY
    total_profit DESC
LIMIT 10;
"""

print("--- Query for A2: Top 10 Customers by Profit ---")


# Execute the query and display results
print("\n--- Results for A2 ---")
try:
    query_job_a2 = client.query(query_string_a2)  # API request
    results_df_a2 = query_job_a2.to_dataframe()
    display(results_df_a2)
except Exception as e:
    print(f"An error occurred while executing the query for A2: {e}")

--- Query for A2: Top 10 Customers by Profit ---

--- Results for A2 ---


Unnamed: 0,customer_id,total_profit
0,TC-20980,8981.3239
1,RB-19360,6976.0959
2,SC-20095,5757.4119
3,HL-15040,5622.4292
4,AB-10105,5444.8055
5,TA-21385,4703.7883
6,CM-12385,3899.8904
7,KD-16495,3038.6254
8,AR-10540,2884.6208
9,DR-12940,2869.076


### A3. Basic filtering (WHERE) + sanity checks
**Prompt:**
```
BigQuery SQL only.
Task: Count orders shipped with each `Ship_Mode`, but only for orders in the 'Technology' category.
Table: `[YOUR_PROJECT].superstore_data.sales`
Output: `Ship_Mode`, `order_count`
Logic: COUNT(*) grouped by `Ship_Mode`
Sort by `order_count` DESC
```
**Validation ask:** “Also list two quick sanity checks to verify the numbers.”

In [63]:
query_string_a3 = """
SELECT
    ship_mode,
    COUNT(*) AS order_count
FROM
    `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
WHERE
    category = 'Technology'
GROUP BY
    ship_mode
ORDER BY
    order_count DESC;
"""

print("--- Query for A3: Order Count by Ship Mode for Technology Category ---")


# Execute the query and display results
print("\n--- Results for your query ---")
try:
    # Construct a new query job from the provided SQL string
    query_job = client.query(query_string)  # Your SQL query string goes here

    # Wait for the query to complete and get the results as a Pandas DataFrame
    results_df = query_job.to_dataframe()

    # Print the DataFrame to display the results
    print(results_df)

except Exception as e:
    # Catch and print any errors that occur during the query execution
    print(f"An error occurred while executing the query: {e}")

--- Query for A3: Order Count by Ship Mode for Technology Category ---

--- Results for your query ---
   Sub_Category  sales_2016  sales_2017
0   Furnishings         0.0         0.0
1       Storage         0.0         0.0
2        Chairs         0.0         0.0
3         Paper         0.0         0.0
4       Binders         0.0         0.0
5        Labels         0.0         0.0
6           Art         0.0         0.0
7    Appliances         0.0         0.0
8        Phones         0.0         0.0
9      Supplies         0.0         0.0
10       Tables         0.0         0.0
11    Bookcases         0.0         0.0
12    Fasteners         0.0         0.0
13  Accessories         0.0         0.0
14    Envelopes         0.0         0.0
15     Machines         0.0         0.0
16      Copiers         0.0         0.0


## Part B — Grouped Analytics (GROUP BY, HAVING)
**Aim:** Turn raw facts into grouped metrics and filtered aggregations.

### B1. KPI aggregation with WHERE + GROUP BY
**Prompt:**
```
BigQuery SQL only.
Task: Compute monthly revenue for the last 12 full months.
Table: `[YOUR_PROJECT].superstore_data.sales`
Assume: `Order_Date` is a DATE or TIMESTAMP column named exactly `Order_Date`.
Output: `year_month` (YYYY-MM format), `monthly_revenue`
Logic: Truncate date to month, SUM `Sales`, filter to last 12 full months.
Sort by `year_month` ascending.
Include a `LIMIT` safeguard for exploration.
```

In [None]:
# Assuming your client is initialized
# client = bigquery.Client()

query_string = """
WITH MaxDate AS (
    SELECT MAX(Order_Date) AS max_date FROM `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
),
FilteredSales AS (
    SELECT
        Order_Date,
        Sales
    FROM
        `my-project-mgmt-467.LAB1_Fundation.superstore_clean`, MaxDate
    WHERE
        Order_Date >= DATE_TRUNC(DATE_SUB(MaxDate.max_date, INTERVAL 13 MONTH), MONTH)
        AND Order_Date < DATE_TRUNC(MaxDate.max_date, MONTH)
)
SELECT
    FORMAT_DATE('%Y-%m', DATE_TRUNC(Order_Date, MONTH)) AS year_month,
    SUM(Sales) AS monthly_revenue
FROM
    FilteredSales
GROUP BY
    year_month
ORDER BY
    year_month ASC
LIMIT 100
"""

query_job = client.query(query_string)
results_df = query_job.to_dataframe()
display(results_df)

Unnamed: 0,year_month,monthly_revenue
0,2016-11,79411.9658
1,2016-12,96999.043
2,2017-01,43971.374
3,2017-02,20301.1334
4,2017-03,58872.3528
5,2017-04,36521.5361
6,2017-05,44261.1102
7,2017-06,52981.7257
8,2017-07,45264.416
9,2017-08,63120.888


### B2. Post‑aggregation filter (HAVING)
**Prompt:**
```
BigQuery SQL only.
Task: Find sub-categories whose total profit over the entire dataset is negative.
Table: `[YOUR_PROJECT].superstore_data.sales`
Output: `Sub_Category`, `total_profit`
Logic: SUM `Profit` GROUP BY `Sub_Category`, HAVING SUM(Profit) < 0
Sort by `total_profit` ASC (most negative first).
```
**Why HAVING?** Ask the model to include a 1-sentence explanation of why HAVING is used instead of WHERE here.

In [62]:
query_string_b2 = """
SELECT
    sub_category,
    SUM(profit) AS total_profit
FROM
    `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
GROUP BY
    sub_category
HAVING
    SUM(profit) < 0
ORDER BY
    total_profit ASC;
"""

print("--- Query for B2: Sub-Categories with Negative Total Profit ---")


# Execute the query and display results
print("\n--- Results for B2 ---")
try:
    query_job_b2 = client.query(query_string_b2)  # API request
    results_df_b2 = query_job_b2.to_dataframe()
    display(results_df_b2)
except Exception as e:
    print(f"An error occurred while executing the query for B2: {e}")

--- Query for B2: Sub-Categories with Negative Total Profit ---

--- Results for B2 ---


Unnamed: 0,sub_category,total_profit
0,Tables,-17725.4811
1,Bookcases,-3472.556
2,Supplies,-1189.0995


## Part C — Joins (dimension enrichment)
**Aim:** Use joins to enhance facts with attributes.

### C1. Join facts to a small dimension
*(If you have a customer or product dimension in your schema, use it. Otherwise, request a synthetic example.)*  
**Prompt:**
```
BigQuery SQL only.
Task: Join the sales table to a product dimension to report `Product_ID`, `Product_Name`, and total sales.
Tables: `[YOUR_PROJECT].superstore_data.sales` as s, `[YOUR_PROJECT].superstore_data.products` as p
Join key: `s.Product_ID = p.Product_ID`
Output: `Product_ID`, `Product_Name`, `total_sales`
Sort by `total_sales` DESC
```
**If you lack a dimension table:** Ask the model how to simulate one temporarily via a CTE.

In [61]:
query_string_c1 = """
WITH
  product_dimension AS (
    SELECT DISTINCT
      product_id,
      product_name
    FROM
      `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
  )
SELECT
  s.product_id,
  p.product_name,
  SUM(s.sales) AS total_sales
FROM
  `my-project-mgmt-467.LAB1_Fundation.superstore_clean` AS s
JOIN
  product_dimension AS p
ON
  s.product_id = p.product_id
GROUP BY
  s.product_id,
  p.product_name
ORDER BY
  total_sales DESC
LIMIT 100; -- Added LIMIT for exploration as suggested in the notebook
"""

print("--- Query for C1: Total Sales by Product (Simulated Join) ---")


# Execute the query and display results
print("\n--- Results for C1 ---")
try:
    query_job_c1 = client.query(query_string_c1)  # API request
    results_df_c1 = query_job_c1.to_dataframe()
    display(results_df_c1)
except Exception as e:
    print(f"An error occurred while executing the query for C1: {e}")

--- Query for C1: Total Sales by Product (Simulated Join) ---

--- Results for C1 ---


Unnamed: 0,product_id,product_name,total_sales
0,TEC-CO-10004722,Canon imageCLASS 2200 Advanced Copier,61599.824
1,OFF-BI-10003527,Fellowes PB500 Electric Punch Plastic Comb Bin...,27453.384
2,TEC-MA-10002412,Cisco TelePresence System EX90 Videoconferenci...,22638.480
3,FUR-CH-10002024,HON 5400 Series Task Chairs for Big and Tall,21870.576
4,OFF-BI-10001359,GBC DocuBind TL300 Electric Binding System,19823.479
...,...,...,...
95,OFF-ST-10001526,Iceberg Mobile Mega Data/Printer Cart,5751.774
96,FUR-CH-10000595,Safco Contoured Stacking Chairs,5697.760
97,TEC-CO-10001766,Canon PC940 Copier,5669.874
98,FUR-TA-10004256,Bretford “Just In Time” Height-Adjustable Mult...,5634.900


## Part D — Common Table Expressions (CTEs)
**Aim:** Make complex logic readable and testable in steps.

### D1. Multi‑step ranking with CTEs
**Prompt:**
```
BigQuery SQL only.
Goal: Within each `Region`, rank states by total sales and return top 3 per region.
Table: `[YOUR_PROJECT].superstore_data.sales`
CTE 1 (`state_sales`): SUM(Sales) by `Region`, `State`
CTE 2 (`ranked_state_sales`): Add `RANK() OVER (PARTITION BY Region ORDER BY total_sales DESC)` as `sales_rank`
Final SELECT: rows where `sales_rank <= 3`
Output columns: `Region`, `State`, `total_sales`, `sales_rank`
Sort: by `Region`, then `sales_rank`
```
**Ask for**: a one-paragraph explanation of each step, then **provide only the final runnable SQL**.

In [60]:
query_string_d1 = """
WITH
  state_sales AS (
    SELECT
      region,
      state,
      SUM(sales) AS total_sales
    FROM
      `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
    GROUP BY
      region,
      state
  ),
  ranked_state_sales AS (
    SELECT
      region,
      state,
      total_sales,
      RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS sales_rank
    FROM
      state_sales
  )
SELECT
  region,
  state,
  total_sales,
  sales_rank
FROM
  ranked_state_sales
WHERE
  sales_rank <= 3
ORDER BY
  region,
  sales_rank;
"""

print("--- Query for D1: Top 3 States by Sales per Region ---")


# Execute the query and display results
print("\n--- Results for D1 ---")
try:
    query_job_d1 = client.query(query_string_d1)  # API request
    results_df_d1 = query_job_d1.to_dataframe()
    display(results_df_d1)
except Exception as e:
    print(f"An error occurred while executing the query for D1: {e}")

--- Query for D1: Top 3 States by Sales per Region ---

--- Results for D1 ---


Unnamed: 0,region,state,total_sales,sales_rank
0,Central,Texas,170188.0458,1
1,Central,Illinois,80166.101,2
2,Central,Michigan,76269.614,3
3,East,New York,310876.271,1
4,East,Pennsylvania,116511.914,2
5,East,Ohio,78258.136,3
6,South,Florida,89473.708,1
7,South,Virginia,70636.72,2
8,South,North Carolina,55603.164,3
9,West,California,457687.6315,1


### D2. Time‑boxed “most improved” analysis
**Prompt:**
```
BigQuery SQL only.
Goal: Identify the top 5 sub-categories with the largest YoY revenue increase from 2023 to 2024.
Table: `[YOUR_PROJECT].superstore_data.sales`
CTE `yr_sales`: SUM(Sales) by `Sub_Category` and `year` extracted from `Order_Date`
Final: pivot or self-join to compute delta (2024 minus 2023) as `yoy_delta`
Output: `Sub_Category`, `sales_2023`, `sales_2024`, `yoy_delta`
Order by `yoy_delta` DESC
Limit 5
```
**Validation:** Ask the model for two quick failure modes (e.g., missing years) and how to handle them.

In [None]:
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467")  # set project!
query_string_d2 = """
SELECT DISTINCT EXTRACT(YEAR FROM order_date) AS yr
FROM `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
ORDER BY yr;
"""
client.query(query_string_d2).to_dataframe()

Unnamed: 0,yr
0,2014
1,2015
2,2016
3,2017


In [None]:
#only 2014-2017 contains data

In [None]:

query_string_d2 = """
-- Top 5 sub-categories with largest YoY revenue increase (2024 vs 2023)
WITH yr_sales AS (
  SELECT
    sub_category,
    EXTRACT(YEAR FROM order_date) AS yr,
    SUM(sales) AS sales
  FROM `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
  WHERE EXTRACT(YEAR FROM order_date) IN (2023, 2024)
  GROUP BY sub_category, yr
),
wide AS (
  SELECT
    COALESCE(s23.sub_category, s24.sub_category) AS sub_category,
    IFNULL(s23.sales, 0) AS sales_2023,
    IFNULL(s24.sales, 0) AS sales_2024
  FROM (SELECT sub_category, sales FROM yr_sales WHERE yr = 2023) AS s23
  FULL OUTER JOIN (SELECT sub_category, sales FROM yr_sales WHERE yr = 2024) AS s24
  ON s23.sub_category = s24.sub_category
)
SELECT
  sub_category,
  sales_2023,
  sales_2024,
  sales_2024 - sales_2023 AS yoy_delta
FROM wide
ORDER BY yoy_delta DESC
LIMIT 5;
"""
client.query(query_string_d2).to_dataframe()

Unnamed: 0,sub_category,sales_2023,sales_2024,yoy_delta


In [None]:
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467")  # set project!
query_string_d2 = """
WITH
  yr_sales AS (
    SELECT
      sub_category,
      EXTRACT(YEAR FROM order_date) AS year,
      SUM(sales) AS yearly_revenue
    FROM
      `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
    WHERE
      EXTRACT(YEAR FROM order_date) IN (2016, 2017) -- Filter for relevant years (using available years from data)
    GROUP BY
      1, 2
  )
SELECT
  ys2017.sub_category,
  ys2016.yearly_revenue AS sales_2016,
  ys2017.yearly_revenue AS sales_2017,
  ys2017.yearly_revenue - ys2016.yearly_revenue AS yoy_delta
FROM
  yr_sales AS ys2017
JOIN
  yr_sales AS ys2016
ON
  ys2017.sub_category = ys2016.sub_category
WHERE
  ys2017.year = 2017
  AND ys2016.year = 2016
ORDER BY
  yoy_delta DESC
LIMIT 5;
"""
client.query(query_string_d2).to_dataframe()

Unnamed: 0,sub_category,sales_2016,sales_2017,yoy_delta
0,Phones,78962.03,105340.516,26378.486
1,Binders,49683.325,72788.045,23104.72
2,Accessories,41895.854,59946.232,18050.378
3,Appliances,26050.315,42926.932,16876.617
4,Copiers,49599.41,62899.388,13299.978


## Part E — Window Functions (ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD, moving averages)
**Aim:** Compare rows across partitions and time; compute trends and ranks without collapsing rows.

### E1. Top product per region (ROW_NUMBER)
**Prompt:**
```
BigQuery SQL only.
Task: For each `Region`, return only the single highest-revenue `Sub_Category`.
Table: `[YOUR_PROJECT].superstore_data.sales`
CTE `subcat_sales`: SUM(Sales) by `Region`, `Sub_Category`
Add `ROW_NUMBER() OVER (PARTITION BY Region ORDER BY total_sales DESC)` as rn
Final: filter `rn = 1`
Output: `Region`, `Sub_Category`, `total_sales`
Sort by `Region`
```
**Why `ROW_NUMBER` instead of `RANK`?** Ask the model to add a 2-sentence contrast.

In [59]:
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467") # Replace with your project ID

query_string_e1 = """
WITH
  subcat_sales AS (
    SELECT
      region,
      sub_category,
      SUM(sales) AS total_sales
    FROM
      `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
    GROUP BY
      region,
      sub_category
  ),
  ranked_subcat_sales AS (
    SELECT
      region,
      sub_category,
      total_sales,
      ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_sales DESC) AS rn
    FROM
      subcat_sales
  )
SELECT
  region,
  sub_category,
  total_sales
FROM
  ranked_subcat_sales
WHERE
  rn = 1
ORDER BY
  region;
"""

print("--- Query for E1: Top Product per Region ---")


# Execute the query and display results
print("\n--- Results for E1 ---")
try:
    query_job_e1 = client.query(query_string_e1)  # API request
    results_df_e1 = query_job_e1.to_dataframe()
    display(results_df_e1)
except Exception as e:
    print(f"An error occurred while executing the query for E1: {e}")

--- Query for E1: Top Product per Region ---

--- Results for E1 ---


Unnamed: 0,region,sub_category,total_sales
0,Central,Chairs,85230.646
1,East,Phones,100614.982
2,South,Phones,58304.438
3,West,Chairs,101781.328


### E2. YoY growth with LAG
**Prompt:**
```
BigQuery SQL only.
Task: Compute year-over-year revenue growth for 'Phones' sub-category.
Table: `[YOUR_PROJECT].superstore_data.sales`
Steps:
- Filter to `Sub_Category = 'Phones'`
- Aggregate yearly revenue using EXTRACT(YEAR FROM Order_Date)
- Add `LAG(yearly_revenue) OVER (ORDER BY year)` as `prev_revenue`
- Compute `yoy_pct = 100.0 * (yearly_revenue - prev_revenue) / prev_revenue`
Output: `year`, `yearly_revenue`, `prev_revenue`, `yoy_pct`
Sort by `year` ASC
```
**Ask for**: a guard against divide-by-zero or NULL previous year.

In [58]:
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467") # Replace with your project ID

query_string_e2 = """
WITH
  yearly_sales AS (
    SELECT
      EXTRACT(YEAR FROM order_date) AS year,
      SUM(sales) AS yearly_revenue
    FROM
      `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
    WHERE
      sub_category = 'Phones'
    GROUP BY
      year
  ),
  lagged_sales AS (
    SELECT
      year,
      yearly_revenue,
      LAG(yearly_revenue) OVER (ORDER BY year) AS prev_revenue
    FROM
      yearly_sales
  )
SELECT
  year,
  yearly_revenue,
  prev_revenue,
  SAFE_DIVIDE(100.0 * (yearly_revenue - prev_revenue), prev_revenue) AS yoy_pct
FROM
  lagged_sales
ORDER BY
  year ASC;
"""

print("--- Query for E2: YoY Revenue Growth for 'Phones' ---")


# Execute the query and display results
print("\n--- Results for E2 ---")
try:
    query_job_e2 = client.query(query_string_e2)  # API request
    results_df_e2 = query_job_e2.to_dataframe()
    display(results_df_e2)
except Exception as e:
    print(f"An error occurred while executing the query for E2: {e}")

--- Query for E2: YoY Revenue Growth for 'Phones' ---

--- Results for E2 ---


Unnamed: 0,year,yearly_revenue,prev_revenue,yoy_pct
0,2014,77390.806,,
1,2015,68313.702,77390.806,-11.728918
2,2016,78962.03,68313.702,15.587397
3,2017,105340.516,78962.03,33.406545


### E3. 3‑month moving average (MA)
**Prompt:**
```
BigQuery SQL only.
Task: For the 'Corporate' segment, compute a 3-month moving average of monthly revenue.
Table: `[YOUR_PROJECT].superstore_data.sales`
Steps:
- Derive `month` via DATE_TRUNC(Order_Date, MONTH)
- SUM(Sales) per `month`
- Add `AVG(monthly_revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` as `ma_3`
Output: `month`, `monthly_revenue`, `ma_3`
Sort by `month` ASC
```
**Tip:** Ask the model to include a 1‑line cost control note (e.g., restrict date range while iterating).

In [57]:
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467") # Replace with your project ID

query_string_e3 = """
WITH
  monthly_sales AS (
    SELECT
      DATE_TRUNC(order_date, MONTH) AS month,
      SUM(sales) AS monthly_revenue
    FROM
      `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
    WHERE
      segment = 'Corporate'
    GROUP BY
      month
  )
SELECT
  month,
  monthly_revenue,
  AVG(monthly_revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma_3
FROM
  monthly_sales
ORDER BY
  month ASC;
"""

print("--- Query for E3: 3-Month Moving Average for Corporate Segment ---")


# Execute the query and display results
print("\n--- Results for E3 ---")
try:
    query_job_e3 = client.query(query_string_e3)  # API request
    results_df_e3 = query_job_e3.to_dataframe()
    display(results_df_e3)
except Exception as e:
    print(f"An error occurred while executing the query for E3: {e}")

--- Query for E3: 3-Month Moving Average for Corporate Segment ---

--- Results for E3 ---


Unnamed: 0,month,monthly_revenue,ma_3
0,2014-01-01,1701.528,1701.528
1,2014-02-01,1183.668,1442.598
2,2014-03-01,11106.799,4663.998333
3,2014-04-01,14131.729,8807.398667
4,2014-05-01,9142.0,11460.176
5,2014-06-01,3970.914,9081.547667
6,2014-07-01,10032.988,7715.300667
7,2014-08-01,7451.774,7151.892
8,2014-09-01,15507.745,10997.502333
9,2014-10-01,12637.678,11865.732333


## Part F — Debugging & Optimization Prompts
**Aim:** Use the model as a rubber duck for error handling and performance.

### F1. Explain the error, propose a fix
**Prompt:**
```
I ran this BigQuery SQL and got an error:
[PASTE ERROR MESSAGE and the exact SQL here]
Act as a BigQuery trouble‑shooter.
1) Identify the root cause.
2) Propose the smallest possible fix.
3) Suggest a quick sanity check query to verify the fix.
Return only the corrected SQL and a 2‑sentence rationale.
```

In [56]:
#IndentationError: unexpected indent with product_name,
#Fix by using one consistent variable name everywhere (e.g., query_string) and keeping the entire SQL inside the triple quotes before calling client.query(query_string).
#
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467") # Replace with your project ID

query_string_e2 = """

SELECT
    product_name,
    SUM(sales) AS total_sales
FROM
    `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
GROUP BY
    product_name
ORDER BY
    total_sales DESC
LIMIT 10
"""

print("--- Query for E2: Top 10 Products by Sales ---")


# Execute the query and display results
print("\n--- Results for E2 ---")
try:
    query_job_e2 = client.query(query_string_e2)  # API request
    results_df_e2 = query_job_e2.to_dataframe()
    display(results_df_e2)
except Exception as e:
    print(f"An error occurred while executing the query for E2: {e}")

--- Query for E2: Top 10 Products by Sales ---

--- Results for E2 ---


Unnamed: 0,product_name,total_sales
0,Canon imageCLASS 2200 Advanced Copier,61599.824
1,Fellowes PB500 Electric Punch Plastic Comb Bin...,27453.384
2,Cisco TelePresence System EX90 Videoconferenci...,22638.48
3,HON 5400 Series Task Chairs for Big and Tall,21870.576
4,GBC DocuBind TL300 Electric Binding System,19823.479
5,GBC Ibimaster 500 Manual ProClick Binding System,19024.5
6,Hewlett Packard LaserJet 3310 Copier,18839.686
7,HP Designjet T520 Inkjet Large Format Printer ...,18374.895
8,GBC DocuBind P400 Electric Binding System,17965.068
9,High Speed Automatic Electric Letter Opener,17030.312


### F2. Reduce cost / improve speed
**Prompt:**
```
Act as a BigQuery cost optimizer.
Given this query (below), list 3 ways to reduce scanned bytes and improve performance without changing the business logic.
[PASTE YOUR SQL HERE]
Prioritize: partition filters, column pruning, pre-aggregations, and temporary results via CTEs.
```

In [48]:


from google.colab import auth
auth.authenticate_user()

from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467")


query_string = """
WITH filtered AS (
  SELECT order_date, Sub_Category, sales
  FROM `my-project-mgmt-467.LAB1_Fundation.superstore_clean` -- Replace with your actual table/view path
  WHERE order_date >= DATE '2016-01-01'
    AND order_date <  DATE '2018-01-01'   -- better partition pruning than BETWEEN
)
SELECT Sub_Category,
       SUM(CASE WHEN EXTRACT(YEAR FROM order_date)=2023 THEN sales ELSE 0 END) AS sales_2016,
       SUM(CASE WHEN EXTRACT(YEAR FROM order_date)=2024 THEN sales ELSE 0 END) AS sales_2017
FROM filtered
GROUP BY Sub_Category;
"""

print("--- Query for Optimized YoY Analysis ---")


# Execute the query and display results
print("\n--- Results ---")
try:
    query_job = client.query(query_string)  # API request
    results_df = query_job.to_dataframe()
    display(results_df)
except Exception as e:
    print(f"An error occurred while executing the query: {e}")

--- Query for Optimized YoY Analysis ---

--- Results ---


Unnamed: 0,Sub_Category,sales_2016,sales_2017
0,Furnishings,0.0,0.0
1,Storage,0.0,0.0
2,Chairs,0.0,0.0
3,Paper,0.0,0.0
4,Binders,0.0,0.0
5,Labels,0.0,0.0
6,Art,0.0,0.0
7,Appliances,0.0,0.0
8,Phones,0.0,0.0
9,Supplies,0.0,0.0


In [47]:
# --- Auth & client ---
from google.colab import auth
auth.authenticate_user()

from google.cloud import bigquery
import pandas as pd

PROJECT = "my-project-mgmt-467"
DATASET = "LAB1_Fundation"
TABLE   = "superstore_clean"


YEAR_A = 2016
YEAR_B = 2017

client = bigquery.Client(project=PROJECT)

query = f"""
WITH filtered AS (
  SELECT
    DATE(Order_Date) AS order_date,     -- 如果你的列名是 `Order Date`（有空格），改为 DATE(`Order Date`)
    Sub_Category,
    Sales AS sales
  FROM `{PROJECT}.{DATASET}.{TABLE}`
  WHERE DATE(Order_Date) >= DATE '{YEAR_A}-01-01'
    AND DATE(Order_Date) <  DATE '{YEAR_B + 1}-01-01'
)
SELECT
  Sub_Category,
  SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = {YEAR_A} THEN sales ELSE 0 END)
  """

print("--- Query for Optimized YoY Analysis ---")


# Execute the query and display results
print("\n--- Results ---")
try:
    query_job = client.query(query_string)  # API request
    results_df = query_job.to_dataframe()
    display(results_df)
except Exception as e:
    print(f"An error occurred while executing the query: {e}")


--- Query for Optimized YoY Analysis ---

--- Results ---


Unnamed: 0,Sub_Category,sales_2016,sales_2017
0,Furnishings,0.0,0.0
1,Storage,0.0,0.0
2,Chairs,0.0,0.0
3,Paper,0.0,0.0
4,Binders,0.0,0.0
5,Labels,0.0,0.0
6,Art,0.0,0.0
7,Appliances,0.0,0.0
8,Phones,0.0,0.0
9,Supplies,0.0,0.0


In [45]:

query = f"""

WITH filtered AS (
  SELECT order_date, Sub_Category, sales
  FROM `project.ds.mv_sales_by_day` -- NOTE: This table/view name might be incorrect.
  WHERE order_date >= DATE '2023-01-01'
    AND order_date <  DATE '2025-01-01'
),
by_year AS (
  SELECT Sub_Category,
         EXTRACT(YEAR FROM order_date) AS year,
         SUM(sales) AS sales
  FROM filtered
  GROUP BY Sub_Category, year
)
SELECT Sub_Category,
       SUM(CASE WHEN year=2023 THEN sales ELSE 0 END) AS sales_2023,
       SUM(CASE WHEN year=2024 THEN sales ELSE 0 END) AS sales_2024
FROM by_year
GROUP BY Sub_Category;
"""


# Execute the query and display results
print("\n--- Results ---")
try:
    query_job = client.query(query_string)  # API request
    results_df = query_job.to_dataframe()
    display(results_df)
except Exception as e:
    print(f"An error occurred while executing the query: {e}")


--- Results ---


Unnamed: 0,Sub_Category,sales_2016,sales_2017
0,Furnishings,0.0,0.0
1,Storage,0.0,0.0
2,Chairs,0.0,0.0
3,Paper,0.0,0.0
4,Binders,0.0,0.0
5,Labels,0.0,0.0
6,Art,0.0,0.0
7,Appliances,0.0,0.0
8,Phones,0.0,0.0
9,Supplies,0.0,0.0


## Part G — Validation & Counter‑examples (DIVE: Validate)
**Aim:** Avoid “first‑answer fallacy” by testing alternatives.

### G1. Ask for counter‑queries
**Prompt:**
```
I concluded that 'Tables' is a high‑sales but negative‑profit sub-category due to high discounts.
Create two alternative BigQuery SQL queries that could falsify or nuance this finding:
- One that slices by region and time
- One that controls for order priority or ship mode
Return BigQuery SQL only, then a one-paragraph note on how to compare outcomes.
```

In [43]:
# Alternative Query 1: Slice by Region and Time
from google.colab import auth
auth.authenticate_user()

from google.cloud import bigquery
import pandas as pd
query_alt1 = """
SELECT
    EXTRACT(YEAR FROM order_date) AS order_year,
    region,
    SUM(sales) AS total_sales,
    SUM(profit) AS total_profit,
    AVG(discount) AS average_discount
FROM
    `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
WHERE
    sub_category = 'Tables'
GROUP BY
    order_year,
    region
ORDER BY
    order_year,
    region;
"""

print("--- Alternative Query 1: Tables Sales/Profit/Discount by Year and Region ---")

# Execute the query and display results
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467") # Replace with your project ID

print("\n--- Results for Alternative Query 1 ---")
try:
    query_job_alt1 = client.query(query_alt1)  # API request
    results_df_alt1 = query_job_alt1.to_dataframe()
    display(results_df_alt1)
except Exception as e:
    print(f"An error occurred while executing Alternative Query 1: {e}")

--- Alternative Query 1: Tables Sales/Profit/Discount by Year and Region ---

--- Results for Alternative Query 1 ---


Unnamed: 0,order_year,region,total_sales,total_profit,average_discount
0,2014,Central,7785.478,-1424.331,0.326667
1,2014,East,10603.704,-3537.8375,0.38
2,2014,South,9940.9445,1107.9902,0.113636
3,2014,West,17758.239,730.1356,0.208
4,2015,Central,6857.26,-265.0939,0.207143
5,2015,East,8884.806,-2275.8641,0.373333
6,2015,South,7370.6745,-2171.3765,0.21875
7,2015,West,16037.683,1202.5326,0.166667
8,2016,Central,13922.926,292.6211,0.205882
9,2016,East,7825.328,-2306.7783,0.368182


In [44]:
# Alternative Query 2: Slice by Ship Mode
from google.colab import auth
auth.authenticate_user()

from google.cloud import bigquery
import pandas as pd
query_alt2 = """
SELECT
    ship_mode,
    SUM(sales) AS total_sales,
    SUM(profit) AS total_profit,
    AVG(discount) AS average_discount,
    COUNT(*) as order_count
FROM
    `my-project-mgmt-467.LAB1_Fundation.superstore_clean`
WHERE
    sub_category = 'Tables'
GROUP BY
    ship_mode
ORDER BY
    total_sales DESC;
"""

print("--- Alternative Query 2: Tables Sales/Profit/Discount by Ship Mode ---")


# Execute the query and display results
from google.cloud import bigquery
client = bigquery.Client(project="my-project-mgmt-467") # Replace with your project ID

print("\n--- Results for Alternative Query 2 ---")
try:
    query_job_alt2 = client.query(query_alt2)  # API request
    results_df_alt2 = query_job_alt2.to_dataframe()
    display(results_df_alt2)
except Exception as e:
    print(f"An error occurred while executing Alternative Query 2: {e}")

--- Alternative Query 2: Tables Sales/Profit/Discount by Ship Mode ---

--- Results for Alternative Query 2 ---


Unnamed: 0,ship_mode,total_sales,total_profit,average_discount,order_count
0,Standard Class,124826.6615,-11910.0122,0.270526,190
1,Second Class,43693.7475,-3320.6799,0.248361,61
2,First Class,28800.776,-1365.3665,0.240426,47
3,Same Day,9644.347,-1129.4225,0.261905,21


**How to compare outcomes:**

Compare the results of these queries to your initial finding. Alternative Query 1 will show if the negative profit is consistent across different regions and years, or if it's concentrated in specific areas or time periods. Alternative Query 2 will reveal if certain shipping modes, which might correlate with urgency or handling costs, have a different impact on the profitability of 'Tables'. Look for variations in total profit and average discount across these different slices of the data to see if they support or contradict your initial conclusion.

## Part H — Synthesis (DIVE: Extend)
**Aim:** Turn analysis into business‑ready insights.

### H1. Executive‑style summary
**Prompt:**
```
Act as a business strategist.
Based on the following metrics/figures (briefly summarize your results here), write a 4-sentence executive summary:
- 1 sentence: what changed and by how much
- 1 sentence: why it likely changed (drivers)
- 1 sentence: recommended action (who/what/when)
- 1 sentence: metric to monitor next
```

What changed and by how much: Tables swung to a 17.7K loss on $207K sales (–8.6% margin), with losses concentrated in Standard Class and certain East year-region slices.

Why it likely changed (drivers): A ship-mode mix skewed to Standard Class for bulky items and higher East-region discounts (~0.37) likely eroded margins more than pricing alone.

Recommended action (who/what/when): Ops + Pricing (within 30 days)—cap/reprice costly Standard Class lanes for Tables, run a ≤20% discount pilot in East, and re-route bulky shipments to lower-cost modes.

Metric to monitor next: Weekly Tables contribution margin by ship mode at ≤20% discount, plus profit per order in East vs. other regions. **bold text**

### H2. Convert final SQL into an automated job (optional)
**Prompt (use only after your SQL is final):**
```
Convert my final BigQuery SQL into a Python script that can run as a scheduled job from Colab or Cloud Functions.
Requirements:
- Use python‑bigquery client
- Parameterize date range
- Write results to a destination table `[YOUR_PROJECT].analytics.outputs_kpi`
- Add basic error handling & logging
Return one complete runnable script.
```

---
## Submission checklist
- [ ] Kept prompts precise and reproducible  
- [ ] Captured at least **one** CTE query and **one** window function query  
- [ ] Documented **two** validation attempts (counter‑queries or alternate slice)  
- [ ] Wrote a 4‑sentence executive summary based on results  
- [ ] (Optional) Converted final query into a scheduled job
---