<a href="https://colab.research.google.com/github/root-git/stratascratch-sql-challenges/blob/main/3_Marketing_Campaign_Success.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Marketing Campaign Success

You have the `marketing_campaign` table, which records in-app purchases by users. Users who make their first in-app purchase are placed in a marketing campaign, where they see calls to action for more purchases. Find the number of users who made additional purchases as a result of the campaign.

The campaign starts one day after the first purchase, so users whose purchases (one or several) all fall on their first day do not count. Users who later buy only products they already purchased on their first day do not count either.


**Original Question Link:**  
[StrataScratch ID 514 – Marketing Campaign Success (Advanced)](https://platform.stratascratch.com/coding/514-marketing-campaign-success-advanced?code_type=1)

---

# Table Schema

#### `marketing_campaign`

| Column        | Type    | Description                               |
|---------------|---------|-------------------------------------------|
| `created_at`  | `date`  | Date of in-app purchase                   |
| `price`       | `bigint`| Price of the purchased product            |
| `product_id`  | `bigint`| ID of the product                         |
| `quantity`    | `bigint`| Number of products purchased              |
| `user_id`     | `bigint`| ID of the user who made the purchase      |

---


# Thought Process

1. **Find the first purchase date for each user** using `MIN(created_at)`.
2. **Identify the products each user purchased on their first purchase day.**
3. **Filter out any purchases made on or before the first purchase day.**
4. For each user, check if **they purchased any new product(s)** (i.e., product_id not in their first day purchases).
5. **Count users** who meet the above criteria. These users are considered influenced by the marketing campaign.

---
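The five steps above can be sketched in plain Python before writing any SQL, which makes the logic easy to check by hand. This is only an illustration: the function name `count_campaign_users` and the tuple layout `(user_id, created_at, product_id)` are assumptions made for this sketch, not part of the original exercise.

```python
from collections import defaultdict
from datetime import date


def count_campaign_users(rows):
    """rows: list of (user_id, created_at, product_id) tuples (hypothetical layout)."""
    # Step 1: first purchase date per user.
    first_date = {}
    for user_id, created_at, _ in rows:
        if user_id not in first_date or created_at < first_date[user_id]:
            first_date[user_id] = created_at

    # Steps 2-3: split each user's products into first-day and later sets.
    first_day = defaultdict(set)
    later = defaultdict(set)
    for user_id, created_at, product_id in rows:
        if created_at == first_date[user_id]:
            first_day[user_id].add(product_id)
        else:
            later[user_id].add(product_id)

    # Steps 4-5: count users with at least one later product not bought on day one.
    return sum(1 for user_id, products in later.items() if products - first_day[user_id])


rows = [
    (1, date(2024, 1, 1), 103),
    (1, date(2024, 1, 2), 103),  # same product again: not counted
    (2, date(2024, 1, 1), 104),
    (2, date(2024, 1, 2), 105),  # new product after day one: counted
    (3, date(2024, 1, 1), 108),  # single purchase: not counted
]
print(count_campaign_users(rows))  # 1
```

The set difference `products - first_day[user_id]` plays the same role as the anti-join in the SQL solution: it is non-empty only when a later purchase involves a product the user did not buy on day one.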

In [11]:
import pandas as pd

# Create mock data with edge cases
data = {
    "created_at": [
        "2024-01-01", "2024-01-01",
        "2024-01-01", "2024-01-02",
        "2024-01-01", "2024-01-02",
        "2024-01-02", "2024-01-03",
        "2024-01-01",
        "2024-01-01", "2024-01-01", "2024-01-03",
        "2024-01-02", "2024-01-03",
    ],
    "price": [
        100, 200,
        150, 150,
        120, 180,
        200, 210,
        300,
        130, 150, 130,
        170, 190
    ],
    "product_id": [
        101, 102,
        103, 103,
        104, 105,
        106, 107,
        108,
        109, 110, 109,
        111, 112
    ],
    "quantity": [
        1, 2,
        1, 1,
        1, 2,
        2, 3,
        1,
        1, 1, 2,
        1, 1
    ],
    "user_id": [
        1, 1,
        2, 2,
        3, 3,
        4, 4,
        5,
        6, 6, 6,
        7, 7
    ]
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert to datetime
df["created_at"] = pd.to_datetime(df["created_at"])

In [12]:
import sqlite3

# Load into SQLite (in-memory)
conn = sqlite3.connect(":memory:")
df.to_sql("marketing_campaign", conn, index=False, if_exists="replace")

# Show preview
print(pd.read_sql("SELECT * FROM marketing_campaign", conn))

             created_at  price  product_id  quantity  user_id
0   2024-01-01 00:00:00    100         101         1        1
1   2024-01-01 00:00:00    200         102         2        1
2   2024-01-01 00:00:00    150         103         1        2
3   2024-01-02 00:00:00    150         103         1        2
4   2024-01-01 00:00:00    120         104         1        3
5   2024-01-02 00:00:00    180         105         2        3
6   2024-01-02 00:00:00    200         106         2        4
7   2024-01-03 00:00:00    210         107         3        4
8   2024-01-01 00:00:00    300         108         1        5
9   2024-01-01 00:00:00    130         109         1        6
10  2024-01-01 00:00:00    150         110         1        6
11  2024-01-03 00:00:00    130         109         2        6
12  2024-01-02 00:00:00    170         111         1        7
13  2024-01-03 00:00:00    190         112         1        7


In [None]:
# Replace with your SQL query below
query = """ SELECT * FROM marketing_campaign"""

result_df = pd.read_sql(query, conn)

In [18]:
query = """
WITH first_purchase AS
(
  SELECT
    user_id,
    MIN(created_at) AS first_date
  FROM marketing_campaign
  GROUP BY user_id
),
first_day_products AS
(
  SELECT
    mc.user_id,
    mc.product_id
  FROM marketing_campaign mc
  JOIN first_purchase fp
    ON mc.user_id = fp.user_id
   AND mc.created_at = fp.first_date
),
later_purchases AS
(
  SELECT
    mc.user_id,
    mc.product_id
  FROM marketing_campaign mc
  JOIN first_purchase fp
    ON mc.user_id = fp.user_id
   AND mc.created_at > fp.first_date
),
new_products AS
(
  SELECT
    lp.user_id
  FROM later_purchases lp
  LEFT JOIN first_day_products fdp
    ON lp.user_id = fdp.user_id
   AND lp.product_id = fdp.product_id
  WHERE fdp.product_id IS NULL
  GROUP BY lp.user_id
)
SELECT
  COUNT(user_id) AS num_users
FROM new_products
"""

solution = pd.read_sql(query, conn)

In [19]:
# Compare the two results
are_equal = result_df.equals(solution)

# Print result based on the comparison
if are_equal:
    print("Correct!")
else:
    print("Try again!")

Try again!


# Problem Explanation

### Step 1: Find the first purchase date for each user
```sql
first_purchase AS
(
  SELECT
    user_id,
    MIN(created_at) AS first_date
  FROM marketing_campaign
  GROUP BY user_id
)
```
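To see what this step produces on its own, here is a minimal sketch that runs the same aggregation against a throwaway in-memory SQLite table. The connection name `demo_conn` and the three-row table are assumptions made for illustration; dates are stored as ISO-formatted text, which SQLite compares correctly in chronological order.

```python
import sqlite3

# Throwaway in-memory table (hypothetical demo data, only the columns needed here).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE marketing_campaign (created_at TEXT, product_id INTEGER, user_id INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO marketing_campaign VALUES (?, ?, ?)",
    [
        ("2024-01-02", 106, 4),
        ("2024-01-03", 107, 4),
        ("2024-01-01", 108, 5),
    ],
)

# One row per user, carrying that user's earliest purchase date.
first_purchase = demo_conn.execute(
    """
    SELECT user_id, MIN(created_at) AS first_date
    FROM marketing_campaign
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(first_purchase)  # [(4, '2024-01-02'), (5, '2024-01-01')]
```

Note that the first date is per user: user 4 starts on 2024-01-02, so the later steps compare each purchase against that user's own first date rather than a global one.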
### Step 2: Get products purchased on the first purchase day
```sql
first_day_products AS
(
  SELECT
    mc.user_id,
    mc.product_id
  FROM marketing_campaign mc
  JOIN first_purchase fp
    ON mc.user_id = fp.user_id
   AND mc.created_at = fp.first_date
)
```
### Step 3: Get products purchased after the first purchase day
```sql
later_purchases AS
(
  SELECT
    mc.user_id,
    mc.product_id
  FROM marketing_campaign mc
  JOIN first_purchase fp
    ON mc.user_id = fp.user_id
   AND mc.created_at > fp.first_date
)
```
### Step 4: Find new products purchased after first day
```sql
new_products AS
(
  SELECT
    lp.user_id
  FROM later_purchases lp
  LEFT JOIN first_day_products fdp
    ON lp.user_id = fdp.user_id
   AND lp.product_id = fdp.product_id
  WHERE fdp.product_id IS NULL
  GROUP BY lp.user_id
)
```
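The `LEFT JOIN ... IS NULL` pattern in this step is an anti-join. As a cross-check, the same filter can be written with `NOT EXISTS`; the sketch below is one possible equivalent, not the notebook's solution. The names `demo_conn`, `not_exists_query`, and the alias `d1` are assumptions introduced for this example.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE marketing_campaign (created_at TEXT, product_id INTEGER, user_id INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO marketing_campaign VALUES (?, ?, ?)",
    [
        ("2024-01-01", 104, 3),
        ("2024-01-02", 105, 3),  # new product: should qualify
        ("2024-01-01", 109, 6),
        ("2024-01-01", 110, 6),
        ("2024-01-03", 109, 6),  # repeat of a first-day product: should not qualify
    ],
)

not_exists_query = """
WITH first_purchase AS (
  SELECT user_id, MIN(created_at) AS first_date
  FROM marketing_campaign
  GROUP BY user_id
)
SELECT DISTINCT mc.user_id
FROM marketing_campaign mc
JOIN first_purchase fp
  ON mc.user_id = fp.user_id
 AND mc.created_at > fp.first_date
WHERE NOT EXISTS (
  -- d1 scans the same user's first-day rows for the same product
  SELECT 1
  FROM marketing_campaign d1
  WHERE d1.user_id = mc.user_id
    AND d1.created_at = fp.first_date
    AND d1.product_id = mc.product_id
)
ORDER BY mc.user_id
"""
qualifying_users = [row[0] for row in demo_conn.execute(not_exists_query)]
print(qualifying_users)  # [3]
```

Both forms keep a later purchase only when no first-day row exists for the same user and product. Which one reads better is largely a matter of taste.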
### Step 5: Count number of users
```sql
SELECT
  COUNT(user_id) AS num_users
FROM new_products
```
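As a sanity check, the assembled query can be run against the notebook's mock data using only the standard library. In the sketch below, `check_conn`, `rows`, and `full_query` are names chosen for illustration, and the `price` and `quantity` columns are omitted because the query never reads them.

```python
import sqlite3

# (created_at, product_id, user_id), copied from the notebook's mock data.
rows = [
    ("2024-01-01", 101, 1), ("2024-01-01", 102, 1),
    ("2024-01-01", 103, 2), ("2024-01-02", 103, 2),
    ("2024-01-01", 104, 3), ("2024-01-02", 105, 3),
    ("2024-01-02", 106, 4), ("2024-01-03", 107, 4),
    ("2024-01-01", 108, 5),
    ("2024-01-01", 109, 6), ("2024-01-01", 110, 6), ("2024-01-03", 109, 6),
    ("2024-01-02", 111, 7), ("2024-01-03", 112, 7),
]
check_conn = sqlite3.connect(":memory:")
check_conn.execute(
    "CREATE TABLE marketing_campaign (created_at TEXT, product_id INTEGER, user_id INTEGER)"
)
check_conn.executemany("INSERT INTO marketing_campaign VALUES (?, ?, ?)", rows)

full_query = """
WITH first_purchase AS (
  SELECT user_id, MIN(created_at) AS first_date
  FROM marketing_campaign
  GROUP BY user_id
),
first_day_products AS (
  SELECT mc.user_id, mc.product_id
  FROM marketing_campaign mc
  JOIN first_purchase fp
    ON mc.user_id = fp.user_id AND mc.created_at = fp.first_date
),
later_purchases AS (
  SELECT mc.user_id, mc.product_id
  FROM marketing_campaign mc
  JOIN first_purchase fp
    ON mc.user_id = fp.user_id AND mc.created_at > fp.first_date
),
new_products AS (
  SELECT lp.user_id
  FROM later_purchases lp
  LEFT JOIN first_day_products fdp
    ON lp.user_id = fdp.user_id AND lp.product_id = fdp.product_id
  WHERE fdp.product_id IS NULL
  GROUP BY lp.user_id
)
SELECT COUNT(user_id) AS num_users
FROM new_products
"""
num_users = check_conn.execute(full_query).fetchone()[0]
print(num_users)  # 3 (users 3, 4 and 7)
```

Users 1 and 5 never buy after their first day, and users 2 and 6 only repeat first-day products, so only users 3, 4, and 7 are counted.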
---


