## 🛠️ Mod5 Data Challenge 4: Funnels & Step Conversions 


### **Goal:** Build a simple funnel using Olist’s public marketing data:
1) **MQL**: all marketing qualified leads (MQLs)
2) **Contacted**: leads with a non-null `first_contact_date`
3) **Won**: leads that appear in `closed_deals` (i.e., `won_date` not null)

We’ll compute counts and conversion rates overall and by **origin** (channel).

### Data
A marketing dataset from the Olist Store covering a one-year time frame. It comes as two files: one with leads and one with closed deals. Read more about the data [here](https://www.kaggle.com/datasets/olistbr/marketing-funnel-olist/data?select=olist_marketing_qualified_leads_dataset.csv).

### 👩‍🏫 Instructor-Led Demo (25 minutes)

#### Step 0:  Load both data files, format date columns, AND JOIN them together

In [16]:
import pandas as pd
import numpy as np

#mql is the marketing qualified leads and closed is the closed deals 
path1 = '/Users/Marcy_Student/Desktop/Marcy-Modules/marcy-git/DA2025_Lectures/Mod5/DataChallenges/data/marketingleads.csv'
path2 = '/Users/Marcy_Student/Desktop/Marcy-Modules/marcy-git/DA2025_Lectures/Mod5/DataChallenges/data/closeddeals.csv'

mql = pd.read_csv(path1)
closed = pd.read_csv(path2)

# Standardize date columns to datetime objects
mql['first_contact_date'] = pd.to_datetime(mql.get('first_contact_date'))
closed['won_date'] = pd.to_datetime(closed.get('won_date'))

# Join on mql_id to mark closed/won
funnel = mql.merge(closed, how='left', on='mql_id')

# Robust channel column (named 'origin' in the dataset)
channel_col = 'origin' if 'origin' in funnel.columns else None
if channel_col is None:
    # if channel missing, create a single bucket
    funnel['origin'] = 'All'
    channel_col = 'origin'

funnel.head()

,mql_id,first_contact_date,landing_page_id,origin,seller_id,sdr_id,sr_id,won_date,business_segment,lead_type,lead_behaviour_profile,has_company,has_gtin,average_stock,business_type,declared_product_catalog_size,declared_monthly_revenue
0,dac32acd4db4c29c230538b72f8dd87d,2018-02-01,88740e65d5d6b056e0cda098e1ea6313,social,,,,NaT,,,,,,,,,
1,8c18d1de7f67e60dbd64e3c07d7e9d5d,2017-10-20,007f9098284a86ee80ddeb25d53e0af8,paid_search,,,,NaT,,,,,,,,,
2,b4bc852d233dfefc5131f593b538befa,2018-03-22,a7982125ff7aa3b2054c6e44f9d28522,organic_search,,,,NaT,,,,,,,,,
3,6be030b81c75970747525b843c1ef4f8,2018-01-22,d45d558f0daeecf3cccdffe3c59684aa,email,,,,NaT,,,,,,,,,
4,5420aad7fec3549a85876ba1c529bd84,2018-02-21,b48ec5f3b04e9068441002a19df93c6c,organic_search,2c43fb513632d29b3b58df74816f1b06,a8387c01a09e99ce014107505b92388c,4ef15afb4b2723d8f3d81e51ec7afefe,2018-02-26 19:58:54,pet,online_medium,cat,,,,reseller,,0.0
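
The left join in Step 0 keeps one row per lead only if `mql_id` is unique in both files. Below is a minimal sketch, on made-up toy frames (the IDs and dates are hypothetical, not Olist data), of how the `validate` argument of `merge` can surface duplicate keys before they inflate funnel counts. Keeping the earliest deal per lead is just one possible cleanup rule, assumed here for illustration.

```python
import pandas as pd

# Toy stand-ins for the two files (hypothetical values, not Olist data)
mql_toy = pd.DataFrame({
    "mql_id": ["a", "b", "c"],
    "origin": ["email", "social", "email"],
})
closed_toy = pd.DataFrame({
    "mql_id": ["b", "b"],  # duplicate deal rows for the same lead
    "won_date": pd.to_datetime(["2018-02-01", "2018-02-03"]),
})

# validate="one_to_one" raises MergeError if mql_id repeats on either side
try:
    mql_toy.merge(closed_toy, how="left", on="mql_id", validate="one_to_one")
    merge_ok = True
except pd.errors.MergeError:
    merge_ok = False

# One possible fix: keep the earliest deal per lead, then join again
closed_dedup = closed_toy.sort_values("won_date").drop_duplicates("mql_id")
funnel_toy = mql_toy.merge(closed_dedup, how="left", on="mql_id", validate="one_to_one")

print(merge_ok)         # whether the first merge passed the uniqueness check
print(len(funnel_toy))  # number of rows after the cleaned join
```

If the check passes on the real files, the row count of `funnel` should equal the row count of `mql`.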


#### Step 1:  Define the Funnel


In [18]:
# Binary flags for each step
funnel['is_mql'] = 1
funnel['is_contacted'] = funnel['first_contact_date'].notna().astype(int)
funnel['is_won'] = funnel['won_date'].notna().astype(int)

steps = ['is_mql', 'is_contacted', 'is_won']


#### Step 2:  Create the Step-to-step conversion columns for an Overall Funnel (across ALL origins)

Students:  What does the `overall_conv` of `is_won` tell you?

In [20]:
overall = funnel[steps].sum().to_frame('count')
overall['step'] = overall.index.map({'is_mql':'MQL','is_contacted':'Contacted','is_won':'Won'})
overall = overall[['step','count']]

# Step-to-step conversion (each step divided by the previous step)
overall['step_conv'] = overall['count'] / overall['count'].shift(1)

# Overall conversion (each step divided by the first step, MQL)
mql_count = overall.loc[overall['step'] == 'MQL', 'count'].iloc[0]
overall['overall_conv'] = overall['count'] / mql_count

# Express both rates as percentages
overall['step_conv'] = (overall['step_conv'] * 100).round(2)
overall['overall_conv'] = (overall['overall_conv'] * 100).round(2)
overall


,step,count,step_conv,overall_conv
is_mql,MQL,8000,,100.0
is_contacted,Contacted,8000,100.0,100.0
is_won,Won,842,10.52,10.52
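
The two rates in Step 2 answer different questions: `step_conv` divides each step by the one before it, while `overall_conv` divides each step by the first step. Here is a small sketch that wraps both calculations in a helper, using the counts from the table above (the helper name `funnel_rates` is hypothetical, not part of the lesson code).

```python
import pandas as pd

def funnel_rates(counts: pd.Series) -> pd.DataFrame:
    """Return step-to-step and overall conversion (in %) for ordered funnel counts."""
    out = counts.to_frame("count")
    # Each step divided by the previous step; the first step has no predecessor (NaN)
    out["step_conv"] = (out["count"] / out["count"].shift(1) * 100).round(2)
    # Each step divided by the first step in the funnel
    out["overall_conv"] = (out["count"] / out["count"].iloc[0] * 100).round(2)
    return out

# Counts taken from the overall funnel: MQL, Contacted, Won
counts = pd.Series([8000, 8000, 842], index=["MQL", "Contacted", "Won"])
rates = funnel_rates(counts)
print(rates)
```

Because every MQL in this dataset has a first contact date, the Contacted row is 100% on both measures, and the two rates coincide for Won.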


#### Step 3:  Create a Funnel by Channel (counts + rates)

Students: What are some important insights from the results? Would you recommend that a marketing team prioritize one origin over another?

In [21]:
by_ch = (funnel
         .groupby(channel_col)[steps]
         .sum()
         .sort_values('is_mql', ascending=False))

# Compute rates per channel row (rounded to 3 decimals)
rates = by_ch.copy()
rates['contacted_rate'] = (rates['is_contacted'] / rates['is_mql']).round(3)
rates['won_rate']       = (rates['is_won'] / rates['is_contacted']).round(3)
rates['overall_rate']   = (rates['is_won'] / rates['is_mql']).round(3)

# Tidy view
out = by_ch.join(rates[['contacted_rate','won_rate','overall_rate']])
out


origin,is_mql,is_contacted,is_won,contacted_rate,won_rate,overall_rate
organic_search,2296,2296,271,1.0,0.118,0.118
paid_search,1586,1586,195,1.0,0.123,0.123
social,1350,1350,75,1.0,0.056,0.056
unknown,1099,1099,179,1.0,0.163,0.163
direct_traffic,499,499,56,1.0,0.112,0.112
email,493,493,15,1.0,0.03,0.03
referral,284,284,24,1.0,0.085,0.085
other,150,150,4,1.0,0.027,0.027
display,118,118,6,1.0,0.051,0.051
other_publicities,65,65,3,1.0,0.046,0.046


#### Instructor Section Notes: Interpreting the Output

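- **Largest drop** is often MQL → Contacted (sales team reach rate) in real funnels. In this dataset every MQL has a `first_contact_date`, so `contacted_rate` is 1.0 for all origins and the entire drop happens at Contacted → Won.
- **`won_rate`** captures how many contacted leads ultimately close; it varies by origin, from about 0.03 (`other`, `email`) to about 0.16 (`unknown`).
- **`overall_rate`** (Won/MQL) is the north-star metric for channel efficiency.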
- Action: pick the **weakest step per channel** and state a concrete improvement (copy, targeting, SDR playbook, follow-up cadence, etc.).
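
One way to carry out the action item above is to let pandas pick the lowest step rate in each row. This is a minimal sketch using three rows copied from the channel table; the `weakest_step` and `weakest_rate` column names are illustrative choices.

```python
import pandas as pd

# Three rows taken from the by-channel counts above
by_channel = pd.DataFrame(
    {
        "is_mql": [2296, 1586, 493],
        "is_contacted": [2296, 1586, 493],
        "is_won": [271, 195, 15],
    },
    index=pd.Index(["organic_search", "paid_search", "email"], name="origin"),
)

step_rates = pd.DataFrame({
    "contacted_rate": by_channel["is_contacted"] / by_channel["is_mql"],
    "won_rate": by_channel["is_won"] / by_channel["is_contacted"],
})

# axis=1 compares across columns, giving one answer per channel (row)
summary = step_rates.assign(
    weakest_step=step_rates.idxmin(axis=1),
    weakest_rate=step_rates.min(axis=1).round(3),
)
print(summary.sort_values("weakest_rate"))
```

Sorting by `weakest_rate` puts the channel with the biggest leak at the top, which is a natural place to start the improvement discussion.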


### 👩‍💻 Student-Led Section (20 minutes) -- ANSWER KEY

### Student: Build & Interpret **Device** Funnels (Desktop vs Mobile)

**Your tasks:**
1) Make per-device funnel counts (MQL, Contacted, Won).
2) Compute step conversion rates and overall conversion per device.
3) Identify which device has the weakest "won_rate" and make a recommendation.  




In [23]:
# RUN THIS CELL WITHOUT CHANGES!  We are making a hypothetical "device" column to add to the data 

# Normalize/mend origin
if 'origin' not in funnel.columns:
    funnel['origin'] = 'Unknown'
funnel['origin'] = funnel['origin'].fillna('Unknown').astype(str)

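# Channel → device priors (rough, realistic)
# NOTE: the dataset's origin values are lowercase (e.g. 'email', 'paid_search'),
# so none of the capitalized keys below match and every lead falls back to the
# 50/50 'Unknown' prior. The outputs in this notebook reflect that behavior.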
device_priors = {
    'Email':       {'Desktop': 0.60, 'Mobile': 0.40},
    'Paid Search': {'Desktop': 0.45, 'Mobile': 0.55},
    'Paid_Search': {'Desktop': 0.45, 'Mobile': 0.55},  # handle underscore variant
    'Social':      {'Desktop': 0.30, 'Mobile': 0.70},
    'Direct':      {'Desktop': 0.55, 'Mobile': 0.45},
    'Referral':    {'Desktop': 0.50, 'Mobile': 0.50},
    'Unknown':     {'Desktop': 0.50, 'Mobile': 0.50},
}

# Reproducible assignment
rng = np.random.default_rng(7)

def pick_device(origin):
    priors = device_priors.get(origin, device_priors['Unknown'])
    return rng.choice(['Desktop','Mobile'], p=[priors['Desktop'], priors['Mobile']])

funnel['device'] = funnel['origin'].apply(pick_device)

# Quick sanity check
funnel['device'].value_counts(normalize=True).round(3)

device
Mobile     0.501
Desktop    0.499
Name: proportion, dtype: float64

#### Task 1:  Get per-device counts


In [25]:
steps = ['is_mql','is_contacted','is_won']

# Make sure a device column exists; if not, default everything to "Unknown"
if 'device' not in funnel.columns:
    funnel['device'] = 'Unknown'

# Group and sum counts per device
funnel_counts_student_device = funnel.groupby('device')[steps].sum()  #Hint:  Group by device and steps
funnel_counts_student_device

device,is_mql,is_contacted,is_won
Desktop,3990,3990,425
Mobile,4010,4010,417


#### Task 2:  Get Per-Device Rates 


In [46]:
conv_student_device = funnel_counts_student_device.copy()

# Step rates
# contacted_rate = contacted / mql
# won_rate       = won / contacted
# overall_rate   = won / mql

conv_student_device['contacted_rate'] = funnel_counts_student_device['is_contacted']/funnel_counts_student_device['is_mql']
conv_student_device['won_rate'] = funnel_counts_student_device['is_won']/funnel_counts_student_device['is_contacted']
conv_student_device['overall_rate'] = funnel_counts_student_device['is_won']/funnel_counts_student_device['is_mql']

# Show rates only (cleaner view)
conv_student_device[['contacted_rate','won_rate','overall_rate']]

device,contacted_rate,won_rate,overall_rate
Desktop,1.0,0.106516,0.106516
Mobile,1.0,0.10399,0.10399


#### Task 3: Weakest Step Per Device

In [None]:
# For each device, find the smaller of contacted_rate and won_rate
rate_cols = ['contacted_rate','won_rate']

# axis=1 compares across the rate columns, returning one value per device
weakest_step_device = conv_student_device[rate_cols].idxmin(axis=1)  # Hint: use .idxmin(axis=1) on the rate_cols
weakest_rate_device = conv_student_device[rate_cols].min(axis=1)     # the value of that weakest rate


# Combine into a tidy summary
summary_device = conv_student_device[['contacted_rate','won_rate','overall_rate']].copy()
summary_device['weakest_step'] = weakest_step_device
summary_device['weakest_rate'] = weakest_rate_device.round(3)
summary_device

device,contacted_rate,won_rate,overall_rate,weakest_step,weakest_rate
Desktop,1.0,0.106516,0.106516,won_rate,0.107
Mobile,1.0,0.10399,0.10399,won_rate,0.104


In [63]:
# Inspect just the two step-rate columns
conv_student_device[rate_cols]

device,contacted_rate,won_rate
Desktop,1.0,0.106516
Mobile,1.0,0.10399


In [66]:
# Mobile won_rate, selected by position: row 1 (Mobile), column 1 (won_rate)
conv_student_device[rate_cols].iloc[1, 1]

np.float64(0.10399002493765586)

### Interpretation of Your Results (3-5 Sentences)

- Between **Desktop** and **Mobile**, name the device with the weaker `won_rate`.
- Suggest **one device‑specific action**:
  - For example: "If Desktop contacted_rate is low → test desktop hero/CTA (call to action) placement"
- Name **one metric** you’ll monitor next week to confirm improvements (e.g., won_rate for Mobile).


### 📣 Class Share-Out & Instructor Wrap-Up (15 minutes)

Be ready to share out the following points with the class:

**Explain:**
Your one device-specific action and the one metric you would monitor.

#### Instructor Wrap-Up (Notes)


- **Different segmentation, different story**: Channel vs Device can surface different weak steps in marketing. Keep both lenses in your toolkit.
- **Targeted fixes**:
  - Low *contacted_rate* → acquisition UX: ad/landing alignment, page speed (esp. mobile), CTA placement.
  - Low *won_rate* → checkout/closing UX: form length, trust badges, mobile wallets, pricing clarity.
- **Explainability first**: Step rates are intuitive and align to concrete product or marketing experiments.
- **Guardrails**:
  - Ensure steps belong to the same journey window (avoid mixing old impressions with recent sales).
  - Dedupe MQLs; validate contact timestamps (no “phantom” contacts).
  - Consider seasonality and campaigns when comparing devices.
- **Next**:
  - Cross‑segmentation (Device × Channel) to find where the *biggest leaks* are.
  - Time trends: weekly device funnels to catch regressions after UI changes.
  - Tie experiments to a single target step (e.g., Mobile *won_rate*) and commit to a monitoring window.
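
As a starting point for the cross-segmentation idea, here is a sketch on a small synthetic frame. The rows are made up for illustration; with the real data, the same `groupby` would presumably run on `funnel` with its `device`, `origin`, and flag columns.

```python
import pandas as pd

# Synthetic leads: one row per lead, with the same flag columns used above
toy = pd.DataFrame({
    "device": ["Desktop", "Desktop", "Mobile", "Mobile", "Mobile", "Desktop"],
    "origin": ["email", "email", "email", "social", "social", "social"],
    "is_mql": [1, 1, 1, 1, 1, 1],
    "is_contacted": [1, 1, 1, 1, 1, 1],
    "is_won": [1, 0, 0, 1, 0, 0],
})

steps = ["is_mql", "is_contacted", "is_won"]

# Counts for every Device x Channel cell
cross = toy.groupby(["device", "origin"])[steps].sum()
cross["won_rate"] = cross["is_won"] / cross["is_contacted"]

# Reshape to a device-by-origin grid so low cells stand out
grid = cross["won_rate"].unstack("origin")
print(grid)
```

Rates in small cells are noisy, so it helps to read the grid alongside the counts in `cross` before calling any cell a leak.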


