## 🛠️ Mod5 Data Challenge 4: Funnels & Step Conversions 


### **Goal:** Build a simple funnel using Olist’s public marketing data:
1) **MQL**: all marketing qualified leads (MQLs)
2) **Contacted**: leads with a non-null `first_contact_date`
3) **Won**: leads that appear in `closed_deals` (i.e., `won_date` not null)

We’ll compute counts and conversion rates overall and by **origin** (channel).

### Data
A marketing dataset from the Olist Store covering a one-year time frame. It comes as two files: one with leads and one with closed deals. Read more about the data [HERE](https://www.kaggle.com/datasets/olistbr/marketing-funnel-olist/data?select=olist_marketing_qualified_leads_dataset.csv).

### 👩‍🏫 Instructor-Led Demo (25 minutes)

#### Step 0:  Load both data files, format date columns, AND JOIN them together

In [None]:
import pandas as pd
import numpy as np

# mql holds the marketing qualified leads; closed holds the closed deals

mql = None
closed = None

# Standardize date columns to datetime objects
mql['first_contact_date'] = pd.to_datetime(mql.get('first_contact_date'))
closed['won_date'] = None

# Join on mql_id to mark closed/won
funnel = mql.merge(None)

# Robust channel column (named 'origin' in the dataset)
channel_col = 'origin' if 'origin' in funnel.columns else None
if channel_col is None:
    # if channel missing, create a single bucket
    funnel['origin'] = 'All'
    channel_col = 'origin'

funnel.head(3)
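
One way the blanks in Step 0 could be filled in is sketched below on a tiny made-up dataset. The `toy_` frames and all of their values are invented for illustration; only the column names `mql_id`, `first_contact_date`, `origin`, and `won_date` come from the cell above. With the real data, the toy frames would presumably be replaced by `pd.read_csv(...)` calls on the two downloaded files.

```python
import pandas as pd

# Toy stand-ins for the two files (values are made up for illustration)
toy_mql = pd.DataFrame({
    "mql_id": ["a1", "b2", "c3", "d4"],
    "first_contact_date": ["2018-01-05", "2018-02-10", None, "2018-03-01"],
    "origin": ["social", "paid_search", "social", "email"],
})
toy_closed = pd.DataFrame({
    "mql_id": ["a1", "d4"],
    "won_date": ["2018-02-01 10:00:00", "2018-03-20 15:30:00"],
})

# Parse date strings into datetime columns (missing values become NaT)
toy_mql["first_contact_date"] = pd.to_datetime(toy_mql["first_contact_date"])
toy_closed["won_date"] = pd.to_datetime(toy_closed["won_date"])

# Left join keeps every MQL; leads with no closed deal get NaT in won_date
toy_funnel = toy_mql.merge(toy_closed, on="mql_id", how="left")
print(toy_funnel)
```

A left join is used here so that leads that never closed stay in the table. An inner join would drop them, and the Won step could no longer be compared against the full MQL count.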

#### Step 1:  Define the Funnel


In [None]:
# Binary flags for each step
funnel['is_mql'] = 1
funnel['is_contacted'] = funnel['first_contact_date'].notna().astype(int)
funnel['is_won'] = None

steps = ['is_mql', 'is_contacted', 'is_won']
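
As a hedged illustration of the flag logic, the sketch below builds all three flags on a made-up joined table. It assumes that a lead counts as won when `won_date` is not null, which matches the funnel definition in the Goal section; the `toy_funnel` frame and its values are hypothetical.

```python
import pandas as pd

# Toy joined table: one row per lead (values are made up)
toy_funnel = pd.DataFrame({
    "mql_id": ["a1", "b2", "c3", "d4"],
    "first_contact_date": pd.to_datetime(["2018-01-05", "2018-02-10", None, "2018-03-01"]),
    "won_date": pd.to_datetime(["2018-02-01", None, None, "2018-03-20"]),
})

# Every row is an MQL by definition
toy_funnel["is_mql"] = 1
# A lead is contacted / won when the matching date is present
toy_funnel["is_contacted"] = toy_funnel["first_contact_date"].notna().astype(int)
toy_funnel["is_won"] = toy_funnel["won_date"].notna().astype(int)

# Summing 0/1 flags gives the count of leads at each step
print(toy_funnel[["is_mql", "is_contacted", "is_won"]].sum())
```

Storing each step as a 0/1 flag means that a plain `.sum()` (overall or inside a `groupby`) yields the funnel counts directly.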


#### Step 2:  Create the Step-to-step conversion columns for an Overall Funnel (across ALL origins)

Students:  What does the `overall_conv` of `is_won` tell you?

In [None]:
overall = funnel[steps].sum().to_frame('count')
overall['step'] = overall.index.map({'is_mql':'MQL','is_contacted':'Contacted','is_won':'Won'})
overall = overall[['step','count']]

# Step-to-step conversion
overall['step_conv'] = overall['count'] / overall['count'].shift(1)

# Overall conversion (from first step)
overall['overall_conv'] = None

# Express both rates as percentages, rounded to 2 decimals
overall['step_conv'] = (overall['step_conv']*100).round(2)
overall['overall_conv'] = None
overall
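
The difference between the two conversion columns can be seen on hypothetical counts. The sketch below is one possible way to compute them, not necessarily the intended solution; the numbers 1000, 800, and 100 are invented.

```python
import pandas as pd

# Hypothetical step counts, ordered from the top of the funnel to the bottom
toy_overall = pd.DataFrame(
    {"step": ["MQL", "Contacted", "Won"], "count": [1000, 800, 100]}
)

# Step-to-step: each count divided by the count of the step before it
toy_overall["step_conv"] = toy_overall["count"] / toy_overall["count"].shift(1)
# Overall: each count divided by the first step's count
toy_overall["overall_conv"] = toy_overall["count"] / toy_overall["count"].iloc[0]

# Express both as percentages, rounded to 2 decimals
toy_overall["step_conv"] = (toy_overall["step_conv"] * 100).round(2)
toy_overall["overall_conv"] = (toy_overall["overall_conv"] * 100).round(2)
print(toy_overall)
```

In this toy funnel the Won row has a `step_conv` of 12.5 (share of contacted leads that closed) and an `overall_conv` of 10.0 (share of all MQLs that closed). The first row has no previous step, so its `step_conv` is `NaN`.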


#### Step 3:  Create a Funnel by Channel (counts + rates)

Students:  What are some important insights from the results?  Would you recommend that a marketing team prioritize one origin over another?

In [None]:
by_ch = (funnel
         .groupby(channel_col)[steps]
         .sum()
         .sort_values('is_mql', ascending=False))

# Compute rates per channel row
rates = by_ch.copy()
rates['contacted_rate'] = (rates['is_contacted'] / rates['is_mql']).round(3)
rates['won_rate']       = None
rates['overall_rate']   = None

# Tidy view
out = by_ch.join(rates[['contacted_rate','won_rate','overall_rate']])
out
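
For reference, here is a small sketch of the three per-channel rates on invented counts. The origin labels and numbers are hypothetical; the rate definitions (`won_rate` = won / contacted, `overall_rate` = won / MQL) follow the ones used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical per-channel counts (not real Olist numbers)
toy_by_ch = pd.DataFrame(
    {
        "is_mql": [500, 300, 200],
        "is_contacted": [400, 270, 100],
        "is_won": [40, 54, 5],
    },
    index=pd.Index(["organic_search", "paid_search", "social"], name="origin"),
)

toy_rates = toy_by_ch.copy()
toy_rates["contacted_rate"] = (toy_rates["is_contacted"] / toy_rates["is_mql"]).round(3)
toy_rates["won_rate"] = (toy_rates["is_won"] / toy_rates["is_contacted"]).round(3)
toy_rates["overall_rate"] = (toy_rates["is_won"] / toy_rates["is_mql"]).round(3)
print(toy_rates)
```

Note that a channel with zero contacted leads would produce `NaN` or `inf` for `won_rate`, so small channels may need to be filtered out or flagged before comparing rates.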


#### Instructor Section Notes: Interpreting the Output

- **Largest drop** is usually MQL → Contacted (sales team reach rate).
- **Won_rate** is the share of contacted leads that ultimately close; it varies by origin.
- **Overall_rate** (Won/MQL) is the north-star for channel efficiency.
- Action: pick the **weakest step per channel** and state a concrete improvement (copy, targeting, SDR playbook, follow-up cadence, etc.).


### 👩‍💻 Student-Led Section (20 minutes) -- ANSWER KEY

### Student: Build & Interpret **Device** Funnels (Desktop vs Mobile)

**Your tasks:**
1) Make per-device funnel counts (MQL, Contacted, Won).
2) Compute step conversion rates and overall conversion per device.
3) Identify which device has the weakest "won_rate" and make a recommendation.  




In [None]:
# RUN THIS CELL WITHOUT CHANGES!  We are making a hypothetical "device" column to add to the data 

# Normalize/mend origin
if 'origin' not in funnel.columns:
    funnel['origin'] = 'Unknown'
funnel['origin'] = funnel['origin'].fillna('Unknown').astype(str)

# Channel → device priors (rough, realistic)
device_priors = {
    'Email':       {'Desktop': 0.60, 'Mobile': 0.40},
    'Paid Search': {'Desktop': 0.45, 'Mobile': 0.55},
    'Paid_Search': {'Desktop': 0.45, 'Mobile': 0.55},  # handle underscore variant
    'Social':      {'Desktop': 0.30, 'Mobile': 0.70},
    'Direct':      {'Desktop': 0.55, 'Mobile': 0.45},
    'Referral':    {'Desktop': 0.50, 'Mobile': 0.50},
    'Unknown':     {'Desktop': 0.50, 'Mobile': 0.50},
}

# Reproducible assignment
rng = np.random.default_rng(7)

def pick_device(origin):
    priors = device_priors.get(origin, device_priors['Unknown'])
    return rng.choice(['Desktop','Mobile'], p=[priors['Desktop'], priors['Mobile']])

funnel['device'] = funnel['origin'].apply(pick_device)

# Quick sanity check
funnel['device'].value_counts(normalize=True).round(3)

#### Task 1:  Get per-device counts


In [None]:
steps = ['is_mql','is_contacted','is_won']

# Make sure a device column exists; if not, default everything to "Unknown"
if 'device' not in funnel.columns:
    funnel['device'] = 'Unknown'

# Group and sum counts per device
funnel_counts_student_device = None  # Hint: group by device, select the step columns, and sum
funnel_counts_student_device
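
The hint can be tried out on a toy table first. The sketch below is a minimal example under the assumption that each row is one lead with 0/1 step flags; the `toy_` names and values are made up.

```python
import pandas as pd

# Toy lead-level table with a device column and 0/1 step flags
toy_funnel = pd.DataFrame({
    "device": ["Desktop", "Mobile", "Mobile", "Desktop", "Mobile"],
    "is_mql": [1, 1, 1, 1, 1],
    "is_contacted": [1, 1, 0, 1, 1],
    "is_won": [1, 0, 0, 0, 1],
})
toy_steps = ["is_mql", "is_contacted", "is_won"]

# Group rows by device, keep only the step columns, and add up the flags
toy_counts = toy_funnel.groupby("device")[toy_steps].sum()
print(toy_counts)
```

The result has one row per device and one column per funnel step, which is the shape the rate calculations in Task 2 expect.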

#### Task 2:  Get Per-Device Rates 


In [None]:
conv_student_device = funnel_counts_student_device.copy()

# Step rates
# contacted_rate = contacted / mql
# won_rate       = won / contacted
# overall_rate   = won / mql

conv_student_device['contacted_rate'] = None  
conv_student_device['won_rate'] = None  
conv_student_device['overall_rate'] = None  

# Show rates only (cleaner view)
conv_student_device[['contacted_rate','won_rate','overall_rate']]

#### Task 3: Weakest Step Per Device

In [None]:
# For each device, find the smallest of contacted_rate and won_rate
rate_cols = ['contacted_rate','won_rate']

weakest_step_device = None  # Hint: use .idxmin(axis=1) on the rate_cols to get the column name
weakest_rate_device = None  # Hint: use .min(axis=1) on the rate_cols to get the value


# Combine into a tidy summary
summary_device = conv_student_device[['contacted_rate','won_rate','overall_rate']].copy()
summary_device['weakest_step'] = weakest_step_device
summary_device['weakest_rate'] = weakest_rate_device.round(3)
summary_device
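
To see how row-wise `idxmin` and `min` behave, here is a short sketch on invented rates. The numbers are hypothetical and the `toy_` names are not part of the exercise.

```python
import pandas as pd

# Hypothetical per-device rates (overall_rate = contacted_rate * won_rate)
toy_conv = pd.DataFrame(
    {
        "contacted_rate": [0.80, 0.60],
        "won_rate": [0.10, 0.75],
        "overall_rate": [0.08, 0.45],
    },
    index=pd.Index(["Desktop", "Mobile"], name="device"),
)
toy_rate_cols = ["contacted_rate", "won_rate"]

toy_summary = toy_conv.copy()
# idxmin(axis=1) returns, per row, the NAME of the column holding the smallest value
toy_summary["weakest_step"] = toy_conv[toy_rate_cols].idxmin(axis=1)
# min(axis=1) returns, per row, the smallest VALUE itself
toy_summary["weakest_rate"] = toy_conv[toy_rate_cols].min(axis=1).round(3)
print(toy_summary)
```

In this toy example the weakest step differs by device: `won_rate` for Desktop and `contacted_rate` for Mobile. `overall_rate` is left out of `toy_rate_cols` because it is the product of the two step rates and would otherwise always be the smallest.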

### Interpretation of Your Results (3-5 Sentences)

- Comparing **Desktop** and **Mobile**, name the device with the weakest "won_rate".
- Suggest **one device‑specific action**:
  - For example:  "If Desktop contacted_rate is low → test desktop hero/CTA (call to action) placement"
- Name **one metric** you’ll monitor next week to confirm improvements (e.g., won_rate for Mobile).


### 📣 Class Share-Out & Instructor Wrap-Up (15 minutes)

Be ready to share the following points with the class:

**Explain:**
Your one device-specific action and the one metric you would monitor.

#### Instructor Wrap-Up (Notes)


- **Different segmentation, different story**: Channel vs Device can surface different weak steps in marketing. Keep both lenses in your toolkit.
- **Targeted fixes**:
  - Low *contacted_rate* → acquisition UX: ad/landing alignment, page speed (esp. mobile), CTA placement.
  - Low *won_rate* → checkout/closing UX: form length, trust badges, mobile wallets, pricing clarity.
- **Explainability first**: Step rates are intuitive and align to concrete product or marketing experiments.
- **Guardrails**:
  - Ensure steps belong to the same journey window (avoid mixing old impressions with recent sales).
  - Dedupe MQLs; validate contact timestamps (no “phantom” contacts).
  - Consider seasonality and campaigns when comparing devices.
- **Next**:
  - Cross‑segmentation (Device × Channel) to find where the *biggest leaks* are.
  - Time trends: weekly device funnels to catch regressions after UI changes.
  - Tie experiments to a single target step (e.g., Mobile *won_rate*) and commit to a monitoring window.
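
Two of the notes above, deduping MQLs and cross-segmenting by Device × Channel, can be combined in a short sketch. This is one possible approach on made-up rows; it assumes that `mql_id` uniquely identifies a lead and that keeping the first occurrence of a duplicate is acceptable.

```python
import pandas as pd

# Toy lead-level table; note that mql_id "b2" appears twice
toy_funnel = pd.DataFrame({
    "mql_id": ["a1", "b2", "b2", "c3", "d4", "e5"],
    "origin": ["social", "social", "social", "email", "email", "social"],
    "device": ["Mobile", "Mobile", "Mobile", "Desktop", "Mobile", "Desktop"],
    "is_mql": [1, 1, 1, 1, 1, 1],
    "is_contacted": [1, 1, 1, 1, 0, 1],
    "is_won": [0, 1, 1, 1, 0, 0],
})

# Guardrail: count each lead once (keeps the first row per mql_id)
toy_dedup = toy_funnel.drop_duplicates(subset="mql_id")

# Cross-segmentation: one funnel row per (origin, device) pair
toy_cross = (
    toy_dedup
    .groupby(["origin", "device"])[["is_mql", "is_contacted", "is_won"]]
    .sum()
)
toy_cross["overall_rate"] = (toy_cross["is_won"] / toy_cross["is_mql"]).round(3)
print(toy_cross)
```

Cross-segmented cells get small quickly, so rates in cells with only a handful of leads should be read with caution.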


