# Unit 1 - The Look e-commerce analytics (DIVE + dashboards)


---

In [42]:
### Import necessary libraries ###
import pandas as pd


In [43]:
# Step 1: Authenticate
from google.colab import auth
auth.authenticate_user()

# Step 2: Import and initialize BigQuery client with your project
from google.cloud import bigquery
client = bigquery.Client(project="mgmt467-472519")   # 👈 use your Project ID here

print("✅ Connected to project:", client.project)

✅ Connected to project: mgmt467-472519


## Discover: top growth KPIs and SQL patterns

### Recommended top 3 growth KPIs
- **Revenue trend (90-day rolling):** Detect directional growth and seasonality.  
- **Repeat purchase rate (RPR):** Percent of distinct customers with 2+ orders in a lookback window.  
- **Average order value (AOV):** Revenue per order (helps margin/growth strategy).

### BigQuery SQL: CTE + window functions for trends, MoM, YoY






## Prompt Cells for Revenue Query

#### Prompt 1:

You’re the analytics team for “The Look,” an online retailer. Leadership wants a fast, cost-effective analytics pack that surfaces growth opportunities and risks and proposes specific, near-term actions. You need to query the BigQuery The Look eCommerce dataset for revenue trends with a rolling 90-day average, a month-over-month (MoM) average, and a year-over-year (YoY) average. Please make sure to use BigQuery SQL and utilize a common table expression (CTE) and window functions like LAG().

#### Response

```sql
-- Revenue trend with 90-day rolling, MoM, YoY
WITH daily_revenue AS (
  SELECT
    DATE(date_created_at) AS order_date,
    SUM(sale_price) AS revenue
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  GROUP BY order_date
),
monthly_revenue AS (
  SELECT
    FORMAT_DATE('%Y-%m', order_date) AS ym,
    DATE_TRUNC(order_date, MONTH) AS month_start,
    SUM(revenue) AS monthly_rev
  FROM daily_revenue
  GROUP BY ym, month_start
),
metrics AS (
  SELECT
    ym,
    month_start,
    monthly_rev,
    LAG(monthly_rev, 1) OVER (ORDER BY month_start) AS prev_month_rev,
    LAG(monthly_rev, 12) OVER (ORDER BY month_start) AS prev_year_rev,
    SUM(monthly_rev) OVER (ORDER BY month_start ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_3m_rev
  FROM monthly_revenue
)
SELECT
  ym,
  monthly_rev,
  rolling_3m_rev,
  SAFE_DIVIDE(monthly_rev - prev_month_rev, prev_month_rev) AS mom_growth,
  SAFE_DIVIDE(monthly_rev - prev_year_rev, prev_year_rev) AS yoy_growth
FROM metrics
ORDER BY month_start;
```

#### My Edits

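The query it gave me used `date_created_at`, which is not a valid column in the table. So, I went to BigQuery and checked the tables to find the equivalent column, which was `created_at`. Note that the query approximates the requested 90-day rolling average with a rolling 3-month sum (`rolling_3m_rev`).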


In [44]:
revenue_query = """
-- Monthly revenue trend with rolling 3-month sum (approx. 90 days), MoM, YoY
WITH daily_revenue AS (
  SELECT
    DATE(created_at) AS order_date,
    SUM(sale_price) AS revenue
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  GROUP BY order_date
),
monthly_revenue AS (
  SELECT
    FORMAT_DATE('%Y-%m', order_date) AS ym,
    DATE_TRUNC(order_date, MONTH) AS month_start,
    SUM(revenue) AS monthly_rev
  FROM daily_revenue
  GROUP BY ym, month_start
),
metrics AS (
  SELECT
    ym,
    month_start,
    monthly_rev,
    LAG(monthly_rev, 1) OVER (ORDER BY month_start) AS prev_month_rev,
    LAG(monthly_rev, 12) OVER (ORDER BY month_start) AS prev_year_rev,
    SUM(monthly_rev) OVER (ORDER BY month_start ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_3m_rev
  FROM monthly_revenue
)
SELECT
  ym,
  monthly_rev,
  rolling_3m_rev,
  SAFE_DIVIDE(monthly_rev - prev_month_rev, prev_month_rev) AS mom_growth,
  SAFE_DIVIDE(monthly_rev - prev_year_rev, prev_year_rev) AS yoy_growth
FROM metrics
ORDER BY month_start;

"""

revenue_df = client.query(revenue_query).to_dataframe()
revenue_df.head(20)

| | ym | monthly_rev | rolling_3m_rev | mom_growth | yoy_growth |
|---|---|---|---|---|---|
| 0 | 2019-01 | 857.119996 | 857.119996 | | |
| 1 | 2019-02 | 1348.980002 | 2206.099998 | 0.573852 | |
| 2 | 2019-03 | 3321.39 | 5527.489999 | 1.462149 | |
| 3 | 2019-04 | 3667.850005 | 8338.220008 | 0.104312 | |
| 4 | 2019-05 | 6174.210005 | 13163.45001 | 0.683332 | |
| 5 | 2019-06 | 10591.240015 | 20433.300025 | 0.7154 | |
| 6 | 2019-07 | 10449.090024 | 27214.540043 | -0.013421 | |
| 7 | 2019-08 | 13608.110003 | 34648.440041 | 0.302325 | |
| 8 | 2019-09 | 12953.890007 | 37011.090034 | -0.048076 | |
| 9 | 2019-10 | 14142.710057 | 40704.710067 | 0.091773 | |
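
As a sanity check on the window logic, the `mom_growth` and `rolling_3m_rev` columns can be recomputed by hand from the `monthly_rev` values in the first five output rows. The sketch below is plain Python, not the BigQuery implementation; the helper names `lag`, `safe_divide`, and `rolling_sum` are illustrative stand-ins for `LAG()`, `SAFE_DIVIDE()`, and the `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` frame.

```python
# monthly_rev values copied from the first five rows of the output above.
monthly_rev = [857.119996, 1348.980002, 3321.39, 3667.850005, 6174.210005]


def lag(values, offset=1):
    """Mimic SQL LAG(): shift values down by `offset`, padding with None."""
    pad = min(offset, len(values))
    return [None] * pad + values[: len(values) - pad]


def safe_divide(numerator, denominator):
    """Mimic SAFE_DIVIDE(): return None instead of failing on NULL or zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def rolling_sum(values, window=3):
    """Mimic SUM() OVER (ROWS BETWEEN window-1 PRECEDING AND CURRENT ROW)."""
    return [sum(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


prev_month = lag(monthly_rev, 1)
mom_growth = [
    safe_divide(cur - prev, prev) if prev is not None else None
    for cur, prev in zip(monthly_rev, prev_month)
]
rolling_3m = rolling_sum(monthly_rev, 3)

for rev, mom, roll in zip(monthly_rev, mom_growth, rolling_3m):
    print(rev, mom, roll)
```

The same `lag` helper with `offset=12` returns only `None` for a five-month series, which is why `yoy_growth` stays empty until a full year of history exists.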


## Prompt Cells for Repeat Purchase Query

#### Prompt 1:

You’re the analytics team for “The Look,” an online retailer. Leadership wants a fast, cost-effective analytics pack that surfaces growth opportunities and risks and proposes specific, near-term actions. You need to query the BigQuery The Look eCommerce dataset for the repeat purchase rate (RPR) over the last 90 days. Please use CTEs and BigQuery SQL.

#### Response

```sql
-- Repeat purchase rate (RPR) over last 90 days
WITH orders_90 AS (
  SELECT
    o.user_id,
    DATE(o.created_at) AS order_date,
    o.order_id
  FROM `bigquery-public-data.thelook_ecommerce.orders` o
  WHERE o.created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
),
by_customer AS (
  SELECT
    user_id,
    COUNT(DISTINCT order_id) AS orders_90
  FROM orders_90
  GROUP BY user_id
)
SELECT
  SAFE_DIVIDE(SUM(CASE WHEN orders_90 >= 2 THEN 1 ELSE 0 END), COUNT(*)) AS repeat_purchase_rate_90d
FROM by_customer;
```


In [45]:

repeat_query = """
-- Repeat purchase rate (RPR) over last 90 days
WITH orders_90 AS (
  SELECT
    o.user_id,
    DATE(o.created_at) AS order_date,
    o.order_id
  FROM `bigquery-public-data.thelook_ecommerce.orders` o
  WHERE o.created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
),
by_customer AS (
  SELECT
    user_id,
    COUNT(DISTINCT order_id) AS orders_90
  FROM orders_90
  GROUP BY user_id
)
SELECT
  SAFE_DIVIDE(SUM(CASE WHEN orders_90 >= 2 THEN 1 ELSE 0 END), COUNT(*)) AS repeat_purchase_rate_90d
FROM by_customer;

"""

repeat_df = client.query(repeat_query).to_dataframe()
repeat_df.head()


| | repeat_purchase_rate_90d |
|---|---|
| 0 | 0.185221 |
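
To make the definition concrete, here is a minimal Python sketch of the same calculation on toy data. The order list is hypothetical and the function name `repeat_purchase_rate` is illustrative. As in the SQL, the denominator is customers with at least one order inside the window, not all registered users.

```python
from collections import Counter


def repeat_purchase_rate(orders):
    """Share of customers with 2+ distinct orders.

    `orders` is an iterable of (user_id, order_id) pairs that already
    fall inside the lookback window.
    """
    distinct_orders = set(orders)  # mirrors COUNT(DISTINCT order_id)
    orders_per_user = Counter(user_id for user_id, _ in distinct_orders)
    if not orders_per_user:
        return None  # mirrors SAFE_DIVIDE returning NULL on a zero denominator
    repeaters = sum(1 for n in orders_per_user.values() if n >= 2)
    return repeaters / len(orders_per_user)


# Hypothetical orders: u1 buys twice, u3 has one order listed twice.
toy_orders = [
    ("u1", "o1"), ("u1", "o2"),
    ("u2", "o3"),
    ("u3", "o4"), ("u3", "o4"),
    ("u4", "o5"),
]
rate = repeat_purchase_rate(toy_orders)
print(rate)
```

Only `u1` has two distinct orders, so one of four customers counts as a repeat purchaser.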


## Prompt Cells for Average Order Value Query

#### Prompt 1:

You’re the analytics team for “The Look,” an online retailer. Leadership wants a fast, cost-effective analytics pack that surfaces growth opportunities and risks and proposes specific, near-term actions. You need to query the BigQuery The Look eCommerce dataset for the monthly average order value (AOV) with month-over-month (MoM) calculations using BigQuery SQL.

#### Response

```sql
-- Average order value (AOV) monthly with MoM
WITH order_level AS (
  SELECT
    oi.order_id,
    DATE_TRUNC(DATE(oi.created_at), MONTH) AS month_start,
    SUM(oi.sale_price) AS order_revenue
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  GROUP BY order_id, month_start
),
monthly_aov AS (
  SELECT
    month_start,
    SUM(order_revenue) AS revenue,
    COUNT(DISTINCT order_id) AS orders
  FROM order_level
  GROUP BY month_start
)
SELECT
  FORMAT_DATE('%Y-%m', month_start) AS ym,
  SAFE_DIVIDE(revenue, orders) AS aov,
  LAG(SAFE_DIVIDE(revenue, orders)) OVER (ORDER BY month_start) AS prev_aov,
  SAFE_DIVIDE(SAFE_DIVIDE(revenue, orders) - LAG(SAFE_DIVIDE(revenue, orders)) OVER (ORDER BY month_start),
              LAG(SAFE_DIVIDE(revenue, orders)) OVER (ORDER BY month_start)) AS mom_aov_growth
FROM monthly_aov
ORDER BY month_start;
```


In [46]:
average_query = """
-- Average order value (AOV) monthly with MoM
WITH order_level AS (
  SELECT
    oi.order_id,
    DATE_TRUNC(DATE(oi.created_at), MONTH) AS month_start,
    SUM(oi.sale_price) AS order_revenue
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  GROUP BY order_id, month_start
),
monthly_aov AS (
  SELECT
    month_start,
    SUM(order_revenue) AS revenue,
    COUNT(DISTINCT order_id) AS orders
  FROM order_level
  GROUP BY month_start
)
SELECT
  FORMAT_DATE('%Y-%m', month_start) AS ym,
  SAFE_DIVIDE(revenue, orders) AS aov,
  LAG(SAFE_DIVIDE(revenue, orders)) OVER (ORDER BY month_start) AS prev_aov,
  SAFE_DIVIDE(SAFE_DIVIDE(revenue, orders) - LAG(SAFE_DIVIDE(revenue, orders)) OVER (ORDER BY month_start),
              LAG(SAFE_DIVIDE(revenue, orders)) OVER (ORDER BY month_start)) AS mom_aov_growth
FROM monthly_aov
ORDER BY month_start;

"""
average_df = client.query(average_query).to_dataframe()
average_df.head(20)

| | ym | aov | prev_aov | mom_aov_growth |
|---|---|---|---|---|
| 0 | 2019-01 | 95.235555 | | |
| 1 | 2019-02 | 67.449 | 95.235555 | -0.291767 |
| 2 | 2019-03 | 73.808667 | 67.449 | 0.094289 |
| 3 | 2019-04 | 69.204717 | 73.808667 | -0.062377 |
| 4 | 2019-05 | 88.203 | 69.204717 | 0.274523 |
| 5 | 2019-06 | 102.827573 | 88.203 | 0.165806 |
| 6 | 2019-07 | 77.978284 | 102.827573 | -0.24166 |
| 7 | 2019-08 | 88.941895 | 77.978284 | 0.140598 |
| 8 | 2019-09 | 78.035482 | 88.941895 | -0.122624 |
| 9 | 2019-10 | 81.279943 | 78.035482 | 0.041577 |
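
Because AOV is revenue divided by distinct orders, the AOV output can be cross-checked against the revenue query: dividing `monthly_rev` by `aov` should recover a whole number of orders. The sketch below does this for the first three months, using values copied from the two outputs in this notebook. The variable names are illustrative.

```python
# monthly_rev from the revenue query output, aov from the AOV query output.
monthly_rev = {"2019-01": 857.119996, "2019-02": 1348.980002, "2019-03": 3321.39}
aov = {"2019-01": 95.235555, "2019-02": 67.449, "2019-03": 73.808667}

# AOV = revenue / orders, so orders = revenue / AOV.
implied_orders = {ym: round(monthly_rev[ym] / aov[ym]) for ym in monthly_rev}

for ym, orders in implied_orders.items():
    print(ym, orders)
```

The implied counts are small in these early months, which may explain part of the large MoM swings in AOV at the start of the series: with only a handful of orders, a single large basket moves the average noticeably.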


---

## Investigate: one product category and one customer segment

 **Category deep dive (e.g., “Women’s Apparel” or “Accessories”):**  
  - **Drivers:** discounts, region, device, potential marketing channel if present.  
  - **Output:** revenue, orders, AOV, discount penetration, conversion proxy.


# Product Query Prompts

### Prompt 1
For the Investigate section, please tailor the product category SQL query to the "Active" category. Please look for driver metrics like discounts, device, and other potential marketing channels. Additionally, ensure that our output has metrics like revenue, orders, average order value (AOV), and conversion proxy

### Response
```sql
-- Investigate drivers for the "Active" category
WITH items AS (
  SELECT
    oi.order_id,
    oi.sale_price,
    oi.retail_price,
    SAFE_DIVIDE(oi.sale_price, oi.retail_price) AS discount_ratio
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  WHERE oi.category = 'Active'
),
orders AS (
  SELECT
    o.order_id,
    o.user_id,
    o.country AS region,
    o.device,
    -- Some versions of the dataset include marketing_channel; if not, this will error out
    -- You can comment/remove this field if unavailable
    o.traffic_source AS marketing_channel
  FROM `bigquery-public-data.thelook_ecommerce.orders` o
)
SELECT
  o.region,
  o.device,
  o.marketing_channel,
  COUNT(DISTINCT i.order_id) AS orders,
  COUNT(DISTINCT o.user_id) AS unique_customers,
  SUM(i.sale_price) AS revenue,
  SAFE_DIVIDE(SUM(i.sale_price), COUNT(DISTINCT i.order_id)) AS aov,
  AVG(i.discount_ratio) AS avg_discount_ratio,
  -- Conversion proxy: orders per unique customer
  SAFE_DIVIDE(COUNT(DISTINCT i.order_id), COUNT(DISTINCT o.user_id)) AS orders_per_customer
FROM items i
JOIN orders o
  ON i.order_id = o.order_id
GROUP BY o.region, o.device, o.marketing_channel
ORDER BY revenue DESC
LIMIT 50;
```


### My Edits
I added a join in the first CTE because the query filtered on `category` in the `order_items` table, but `category` is in `products`. So, I joined `products` to `order_items` on `product_id`.


Additionally, the query used `retail_price` from `order_items`, where it doesn't exist, so I took it from `products` instead.

Then I had to change `o.country` to `u.country` and join `users` to `orders` to be able to pull the order region. For the same reason, `traffic_source` now comes from `users`.

Also, there is no `device` column in any table, so I commented it out.
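
The column fixes above follow a repeatable pattern: compare each `table.column` reference in the generated SQL against the real schema, then look for the table that actually holds the column. A minimal sketch of that check follows. `KNOWN_COLUMNS` is a partial, hand-typed schema containing only the columns this notebook confirmed by running queries, and both helper functions are hypothetical, not part of any BigQuery library.

```python
# Partial schema: only columns confirmed in this notebook, not the full tables.
# In Colab, a dict like this could instead be filled from the `schema`
# attribute of the table object returned by client.get_table(...).
KNOWN_COLUMNS = {
    "order_items": {"order_id", "product_id", "created_at", "sale_price"},
    "orders": {"order_id", "user_id", "created_at"},
    "products": {"id", "category", "retail_price"},
    "users": {"id", "country", "traffic_source"},
}


def tables_with_column(column, schema=KNOWN_COLUMNS):
    """Return the sorted table names whose known columns include `column`."""
    return sorted(table for table, cols in schema.items() if column in cols)


def missing_columns(references, schema=KNOWN_COLUMNS):
    """Return the (table, column) references that the schema does not contain."""
    return [(t, c) for t, c in references if c not in schema.get(t, set())]


# Column references used by the generated query before my edits.
generated_refs = [
    ("order_items", "category"),
    ("order_items", "retail_price"),
    ("orders", "country"),
    ("orders", "device"),
    ("order_items", "sale_price"),
]

for table, column in missing_columns(generated_refs):
    found = tables_with_column(column) or "no known table"
    print(f"{table}.{column} not found; available in: {found}")
```

Running a check like this before executing a generated query surfaces all schema mismatches at once instead of one BigQuery error at a time.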




In [47]:
product_query = """
-- Investigate drivers for the "Active" category
WITH items AS (
  SELECT
    oi.order_id,
    oi.sale_price,
    p.retail_price,
    SAFE_DIVIDE(oi.sale_price, p.retail_price) AS discount_ratio
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  join `bigquery-public-data.thelook_ecommerce.products` p
  on oi.product_id = p.id
  WHERE p.category = 'Active'
),
orders AS (
  SELECT
    o.order_id,
    o.user_id,
    u.country AS region,
    -- o.device,
    -- traffic_source lives in users; used here as a marketing-channel proxy
    u.traffic_source AS marketing_channel
  FROM `bigquery-public-data.thelook_ecommerce.orders` o
  join `bigquery-public-data.thelook_ecommerce.users` u
  on o.user_id = u.id
)
SELECT
  o.region,
  -- o.device,
  o.marketing_channel,
  COUNT(DISTINCT i.order_id) AS orders,
  COUNT(DISTINCT o.user_id) AS unique_customers,
  SUM(i.sale_price) AS revenue,
  SAFE_DIVIDE(SUM(i.sale_price), COUNT(DISTINCT i.order_id)) AS aov,
  AVG(i.discount_ratio) AS avg_discount_ratio,
  -- Conversion proxy: orders per unique customer
  SAFE_DIVIDE(COUNT(DISTINCT i.order_id), COUNT(DISTINCT o.user_id)) AS orders_per_customer
FROM items i
JOIN orders o
  ON i.order_id = o.order_id
GROUP BY o.region, o.marketing_channel
ORDER BY revenue DESC
LIMIT 50;

"""
product_df = client.query(product_query).to_dataframe()
product_df.head(20)

| | region | marketing_channel | orders | unique_customers | revenue | aov | avg_discount_ratio | orders_per_customer |
|---|---|---|---|---|---|---|---|---|
| 0 | China | Search | 2162 | 2085 | 110314.909979 | 51.024473 | 1.0 | 1.03693 |
| 1 | United States | Search | 1377 | 1323 | 73612.940008 | 53.458925 | 1.0 | 1.040816 |
| 2 | Brasil | Search | 908 | 876 | 45177.250044 | 49.754681 | 1.0 | 1.03653 |
| 3 | China | Organic | 458 | 437 | 23579.020013 | 51.482576 | 1.0 | 1.048055 |
| 4 | South Korea | Search | 337 | 323 | 16565.150024 | 49.154748 | 1.0 | 1.043344 |
| 5 | United Kingdom | Search | 276 | 266 | 15344.929993 | 55.597572 | 1.0 | 1.037594 |
| 6 | France | Search | 301 | 286 | 15068.340022 | 50.06093 | 1.0 | 1.052448 |
| 7 | Germany | Search | 231 | 224 | 13584.74001 | 58.808398 | 1.0 | 1.03125 |
| 8 | United States | Organic | 275 | 263 | 12443.669991 | 45.249709 | 1.0 | 1.045627 |
| 9 | Spain | Search | 262 | 253 | 11843.650011 | 45.204771 | 1.0 | 1.035573 |


**Segment deep dive (e.g., “repeat customers,” “new customers,” or “high-LTV cohort”):**  
  - **Drivers:** device mix, region, discount sensitivity.  
  - **Output:** revenue, AOV, repeat rate, discount effects.

  ### Prompt
For the Investigate section, please tailor the segment deep dive SQL query to the repeat vs new level. Please look for driver metrics like discounts, device, and other potential marketing channels. Additionally, ensure that our output has metrics like revenue, orders, average order value (AOV), and conversion proxy

  ### Response
```sql
-- Segment definition: new vs. repeat customers
WITH first_order AS (
  SELECT
    user_id,
    MIN(DATE(created_at)) AS first_order_date
  FROM `bigquery-public-data.thelook_ecommerce.orders`
  GROUP BY user_id
),
orders_enriched AS (
  SELECT
    o.order_id,
    o.user_id,
    DATE(o.created_at) AS order_date,
    o.country AS region,
    o.device,
    CASE WHEN DATE(o.created_at) = fo.first_order_date THEN 'New' ELSE 'Repeat' END AS segment
  FROM `bigquery-public-data.thelook_ecommerce.orders` o
  JOIN first_order fo USING(user_id)
),
order_rev AS (
  SELECT
    oi.order_id,
    SUM(oi.sale_price) AS revenue,
    AVG(SAFE_DIVIDE(oi.sale_price, oi.retail_price)) AS avg_discount_ratio
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  GROUP BY order_id
)
SELECT
  oe.segment,
  oe.region,
  oe.device,
  COUNT(DISTINCT oe.order_id) AS orders,
  SUM(orv.revenue) AS revenue,
  SAFE_DIVIDE(SUM(orv.revenue), COUNT(DISTINCT oe.order_id)) AS aov,
  AVG(orv.avg_discount_ratio) AS avg_discount_ratio
FROM orders_enriched oe
JOIN order_rev orv USING(order_id)
GROUP BY oe.segment, oe.region, oe.device
ORDER BY revenue DESC;
```


  ### My Edits
  The `orders_enriched` CTE was attempting to find the `country` column in `orders`, so I joined `users` and used `country` from `users` instead.

  Additionally, there is no `device` column, so I commented it out.

  Then, the query used `retail_price` from `order_items`, where it doesn't exist, so I took it from `products` instead.

In [48]:
segment_query = """
-- Segment definition: new vs. repeat customers
WITH first_order AS (
  SELECT
    user_id,
    MIN(DATE(created_at)) AS first_order_date
  FROM `bigquery-public-data.thelook_ecommerce.orders`
  GROUP BY user_id
),
orders_enriched AS (
  SELECT
    o.order_id,
    o.user_id,
    DATE(o.created_at) AS order_date,
    u.country AS region,
    --o.device,
    CASE WHEN DATE(o.created_at) = fo.first_order_date THEN 'New' ELSE 'Repeat' END AS segment
  FROM `bigquery-public-data.thelook_ecommerce.orders` o
  JOIN first_order fo USING(user_id)
  JOIN `bigquery-public-data.thelook_ecommerce.users` u
  ON o.user_id = u.id
),
order_rev AS (
  SELECT
    oi.order_id,
    SUM(oi.sale_price) AS revenue,
    AVG(SAFE_DIVIDE(oi.sale_price, p.retail_price)) AS avg_discount_ratio
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  JOIN `bigquery-public-data.thelook_ecommerce.products` p
  ON oi.product_id = p.id
  GROUP BY order_id
)
SELECT
  oe.segment,
  oe.region,
  --oe.device,
  COUNT(DISTINCT oe.order_id) AS orders,
  SUM(orv.revenue) AS revenue,
  SAFE_DIVIDE(SUM(orv.revenue), COUNT(DISTINCT oe.order_id)) AS aov,
  AVG(orv.avg_discount_ratio) AS avg_discount_ratio
FROM orders_enriched oe
JOIN order_rev orv USING(order_id)
GROUP BY oe.segment, oe.region
ORDER BY revenue DESC;

"""
segment_df = client.query(segment_query).to_dataframe()
segment_df.head(50)

| | segment | region | orders | revenue | aov | avg_discount_ratio |
|---|---|---|---|---|---|---|
| 0 | New | China | 27792 | 2390206.0 | 86.003375 | 1.0 |
| 1 | New | United States | 18059 | 1589531.0 | 88.018776 | 1.0 |
| 2 | Repeat | China | 15244 | 1327594.0 | 87.089595 | 1.0 |
| 3 | New | Brasil | 11804 | 1025606.0 | 86.886311 | 1.0 |
| 4 | Repeat | United States | 9820 | 854014.8 | 86.96688 | 1.0 |
| 5 | Repeat | Brasil | 6467 | 553547.9 | 85.595783 | 1.0 |
| 6 | New | South Korea | 4268 | 361427.6 | 84.683119 | 1.0 |
| 7 | New | France | 3800 | 333356.4 | 87.725363 | 1.0 |
| 8 | New | United Kingdom | 3698 | 321435.0 | 86.921312 | 1.0 |
| 9 | New | Spain | 3309 | 283744.6 | 85.749344 | 1.0 |
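
One way to read this output is to turn the order counts into a repeat share per region. The sketch below uses the three regions for which both segments appear in the rows displayed above; the dictionary layout and variable names are illustrative. In this query, "New" means any order placed on the customer's first order date, so the share is relative to that definition.

```python
# Order counts copied from the output above (regions with both segments shown).
orders_by_region = {
    "China": {"New": 27792, "Repeat": 15244},
    "United States": {"New": 18059, "Repeat": 9820},
    "Brasil": {"New": 11804, "Repeat": 6467},
}

# Repeat share = repeat orders / all orders in the region.
repeat_share = {
    region: counts["Repeat"] / (counts["New"] + counts["Repeat"])
    for region, counts in orders_by_region.items()
}

for region, share in repeat_share.items():
    print(f"{region}: {share:.1%} of orders are repeat orders")
```

The share is close to 35% in all three regions, and AOV is also nearly identical across segments, so these rows do not show a region-specific loyalty effect.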


---

## Validate: counter-checks, alternative queries, and a misleading insight to fix

- **Counter-checks:**  
  - Discount-driven growth  
    - Re-run with median discount ratio per group to reduce outlier skew; compare to mean.  






In [52]:
median_validation_query = """
-- Median-based discount check to reduce skew
WITH item_discounts AS (
  SELECT
    p.category,
    SAFE_DIVIDE(oi.sale_price, p.retail_price) AS discount_ratio
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  JOIN `bigquery-public-data.thelook_ecommerce.products` p
  on oi.product_id = p.id
  WHERE p.retail_price IS NOT NULL AND p.retail_price > 0
)
SELECT
  category,
  APPROX_QUANTILES(discount_ratio, 100)[OFFSET(50)] AS median_discount_ratio,
  AVG(discount_ratio) AS mean_discount_ratio
FROM item_discounts
GROUP BY category
ORDER BY median_discount_ratio ASC;
"""
median_validation_df = client.query(median_validation_query).to_dataframe()
median_validation_df.head(20)

| | category | median_discount_ratio | mean_discount_ratio |
|---|---|---|---|
| 0 | Accessories | 1.0 | 1.0 |
| 1 | Intimates | 1.0 | 1.0 |
| 2 | Plus | 1.0 | 1.0 |
| 3 | Active | 1.0 | 1.0 |
| 4 | Socks & Hosiery | 1.0 | 1.0 |
| 5 | Maternity | 1.0 | 1.0 |
| 6 | Sleep & Lounge | 1.0 | 1.0 |
| 7 | Socks | 1.0 | 1.0 |
| 8 | Tops & Tees | 1.0 | 1.0 |
| 9 | Leggings | 1.0 | 1.0 |
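
The reason for comparing median and mean is that a few deep discounts pull the mean down while leaving the median untouched. The sketch below illustrates this with hypothetical ratios and contrasts it with the flat pattern in the output above, where a ratio of 1.0 means `sale_price` equals `retail_price`. The function name `summarize` is illustrative.

```python
from statistics import mean, median


def summarize(ratios):
    """Return (mean, median) of sale_price / retail_price ratios."""
    return mean(ratios), median(ratios)


# Hypothetical ratios: four full-price items and one deep discount.
skewed = [1.0, 1.0, 1.0, 1.0, 0.2]
# The pattern the query output shows: every ratio is exactly 1.0.
flat = [1.0] * 5

skewed_mean, skewed_median = summarize(skewed)
flat_mean, flat_median = summarize(flat)

print(f"skewed: mean={skewed_mean:.2f}, median={skewed_median:.2f}")
print(f"flat:   mean={flat_mean:.2f}, median={flat_median:.2f}")
```

When mean and median both equal 1.0 in every group, the comparison cannot detect skew because there is no variation to measure, which is a signal to inspect the underlying data.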


- **Counter-checks:**  
  - Repeat order rate timeframe
    - Re-run with 180 day period to see if there are large differences in the rate.  

In [53]:
repeat180_query = """
-- Repeat purchase rate (RPR) over last 180 days
WITH orders_180 AS (
  SELECT
    o.user_id,
    DATE(o.created_at) AS order_date,
    o.order_id
  FROM `bigquery-public-data.thelook_ecommerce.orders` o
  WHERE o.created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
),
by_customer AS (
  SELECT
    user_id,
    COUNT(DISTINCT order_id) AS orders_180
  FROM orders_180
  GROUP BY user_id
)
SELECT
  SAFE_DIVIDE(SUM(CASE WHEN orders_180 >= 2 THEN 1 ELSE 0 END), COUNT(*)) AS repeat_purchase_rate_180d
FROM by_customer;

"""

repeat180_df = client.query(repeat180_query).to_dataframe()
repeat180_df.head()

| | repeat_purchase_rate_180d |
|---|---|
| 0 | 0.206587 |


# Validation Insights
After running the median vs. mean discount ratio query, I found that both values were 1.0 for every category shown. That made me suspect a problem with either the query or the data. The query itself was valid, so I inspected the data directly in BigQuery with the following query to see whether any item had ever sold at a discount:

```sql
SELECT *
from `bigquery-public-data.thelook_ecommerce.products` p
JOIN `bigquery-public-data.thelook_ecommerce.order_items` oi
ON p.id = oi.product_id
where p.retail_price != oi.sale_price
```

This query returned zero rows, which means no item has been sold at a price different from its retail price. There are no discounts in the data, so there is no need to measure the discount ratio.


Additionally, extending the repeat purchase rate window from 90 to 180 days raises the rate from about 18.5% to about 20.7%, a relatively small increase for a doubled time frame. This could suggest that repeat purchases come from a small group of customers who buy The Look products frequently, while most other customers buy once or at longer intervals, such as a year or two. Perhaps The Look's products last long enough that typical consumers do not need to purchase again for a long time.
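
The size of that change can be quantified directly from the two query results. The sketch below uses the 90-day and 180-day rates reported in the outputs above; the variable names are illustrative.

```python
# Rates copied from the 90-day and 180-day query outputs above.
rpr_90d = 0.185221
rpr_180d = 0.206587

# Absolute change in percentage points and relative change versus the 90-day rate.
absolute_change_pp = (rpr_180d - rpr_90d) * 100
relative_change = rpr_180d / rpr_90d - 1

print(f"Absolute change: {absolute_change_pp:.2f} percentage points")
print(f"Relative change: {relative_change:.1%}")
```

Doubling the window adds roughly two percentage points, which is consistent with the reading that most customers do not reorder within six months.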

---

## Extend: interactive Plotly (Colab) and executive Looker Studio dashboard

### Plotly in Colab (interactive example)







In [49]:
import plotly.express as px

query = """
WITH monthly AS (
  SELECT
    DATE_TRUNC(DATE(created_at), MONTH) AS month_start,
    SUM(sale_price) AS revenue
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  GROUP BY month_start
)
SELECT
  month_start,
  revenue,
  LAG(revenue) OVER (ORDER BY month_start) AS prev_rev,
  SAFE_DIVIDE(revenue - LAG(revenue) OVER (ORDER BY month_start),
              LAG(revenue) OVER (ORDER BY month_start)) AS mom_growth
FROM monthly
ORDER BY month_start
"""

df = client.query(query).to_dataframe()

fig = px.line(
    df,
    x="month_start",
    y="revenue",
    title="Monthly revenue with MoM hover",
    markers=True
)
fig.update_traces(hovertemplate="Month: %{x}<br>Revenue: %{y:$,.0f}<br>MoM: %{customdata:.1%}")
fig.update_traces(customdata=df["mom_growth"])
fig.update_layout(hovermode="x unified")
fig.show()

---

## Recommendations

- **Recommendation 1: Introduce Targeted Discounts to Drive Conversions and Loyalty**
  - **Move:** Launch a structured discount program for the "Active" category, starting with limited-time offers (e.g., 10–15% off) for first-time buyers and loyalty rewards for repeat customers.
  - **Why:** Currently, there are no discounts in place and adding them can lower the barrier for new customers to try the brand while simultaneously rewarding existing customers, encouraging repeat purchases.
  - **Expected effect:** Increase in new customer acquisition (conversion lift) and improved repeat purchase rate, especially in high-potential segments.
  - **Next step:** Pilot discounts in one or two regions or channels, track uplift in orders per customer and AOV, and compare against a control group to validate ROI. Use these findings to guide future discount programs.


- **Recommendation 2: Device-specific UX tweaks for mobile-heavy regions**  
  - **Move:** Optimize mobile checkout for regions with high order share but lower AOV; reduce friction (fewer steps, clearer price/discount displays).  
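  - **Why:** The dataset has no `device` column, so device-level performance could not be measured in this analysis. If mobile turns out to lead on order volume while conversion per user and AOV lag, that would point to UX constraints rather than weak demand.  
  - **Expected effect:** Lift AOV and conversion without increasing discounts.  
  - **Next step:** Capture device at the order or session level, then track device-level KPIs in the dashboard and compare MoM after rollout.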



# DIVE Reflection

After the Validate stage, I realized there is no way to measure discount impact because there were no discounts. This reframed my approach: instead of asking whether existing discounts were working and how to change them, I focused on experiments with discounts (preferably A/B tests) to see how they affect revenue and conversions. With this experimental focus, The Look can send discounts to small, random, but representative samples of customers and measure the effect, rather than pushing mass discounts and analyzing afterward. Experimenting this way lets The Look determine the value of discounts at much lower risk.
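
A minimal sketch of what such an experiment could look like in Python follows. It randomly assigns a small share of customers to receive a discount, then compares conversion rates with a two-proportion z-test. Everything here is an assumption for illustration: the customer IDs, the 10% treatment share, the outcome counts, and the function names `assign_groups` and `two_proportion_z_test`. None of the numbers come from The Look data.

```python
import math
import random


def assign_groups(user_ids, treatment_share=0.1, seed=42):
    """Randomly split users into a small treatment group and a control group."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    k = round(len(user_ids) * treatment_share)
    treatment = set(rng.sample(sorted(user_ids), k))
    control = set(user_ids) - treatment
    return treatment, control


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical customer IDs; in practice these would come from the users table.
users = list(range(1, 1001))
treatment, control = assign_groups(users)

# Hypothetical pilot outcome: 100 of 1,000 control users convert
# versus 130 of 1,000 discounted users. These counts are made up.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)

print(f"treatment={len(treatment)}, control={len(control)}")
print(f"z={z:.2f}, p={p_value:.3f}")
```

Random assignment is what makes the comparison fair: without it, customers who receive discounts could differ systematically from those who do not, and any uplift could not be attributed to the discount itself.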