# Market_Analytics - Regression Analysis
Do higher-efficiency campaigns actually convert better?

What we can do:

PART A - Quasi A/B test (Top vs Bottom efficiency groups)

PART B - Regression analysis to quantify relationships


## PART A: Quasi A/B Test (Efficiency Segmentation)

Hypotheses

H₀: No difference in conversion rate between high- and low-efficiency campaigns

H₁: High-efficiency campaigns have higher conversion rates
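Stated formally, the hypotheses compare the mean conversion rates of the two groups. The notation below (μ for population means, x̄ and s for sample means and standard deviations, n for group sizes) is introduced here for illustration and does not appear in the original analysis. The second line is the standard Welch statistic, which does not assume equal variances in the two groups.

```latex
H_0:\ \mu_{\text{high}} = \mu_{\text{low}}
\qquad
H_1:\ \mu_{\text{high}} > \mu_{\text{low}}

t = \frac{\bar{x}_{\text{high}} - \bar{x}_{\text{low}}}
         {\sqrt{\dfrac{s_{\text{high}}^{2}}{n_{\text{high}}} + \dfrac{s_{\text{low}}^{2}}{n_{\text{low}}}}}
```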

In [1]:
# Step 1: Load Data
import pandas as pd
import numpy as np
from scipy import stats

campaign_perf = pd.read_csv(
    "../reports/budget_optimization_results.csv"
)

In [2]:
# Step 2: Define high- vs. low-efficiency groups
# Top and bottom 20 campaigns ranked by conversions per dollar
top_group = campaign_perf.nlargest(20, "conversions_per_dollar")
bottom_group = campaign_perf.nsmallest(20, "conversions_per_dollar")

# Compare the conversion-rate distributions of the two groups
top_group["conversion_rate"].describe(), bottom_group["conversion_rate"].describe()

(count    20.000000
 mean      0.242669
 std       0.008229
 min       0.229630
 25%       0.238183
 50%       0.241838
 75%       0.246289
 max       0.266215
 Name: conversion_rate, dtype: float64,
 count    20.000000
 mean      0.172459
 std       0.004278
 min       0.163405
 25%       0.170678
 50%       0.173395
 75%       0.174767
 max       0.179004
 Name: conversion_rate, dtype: float64)

In [3]:
# Step 3: Two-sample Welch t-test (unequal variances)
# Note: ttest_ind is two-sided by default, whereas H₁ above is one-sided
t_stat, p_value = stats.ttest_ind(
    top_group["conversion_rate"],
    bottom_group["conversion_rate"],
    equal_var=False
)

print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.4f}")

T-statistic: 33.854
P-value: 0.0000
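As a sanity check, the reported t-statistic can be reproduced by hand from the `describe()` summary statistics above. The sketch below uses only those printed values. The Welch-Satterthwaite degrees of freedom and Cohen's d are extra quantities that the original analysis does not report; they are included only to illustrate the size of the gap.

```python
import math

# Summary statistics copied from the describe() output above
n_top, mean_top, sd_top = 20, 0.242669, 0.008229
n_bot, mean_bot, sd_bot = 20, 0.172459, 0.004278

# Welch standard error: the two group variances are not pooled
var_top = sd_top**2 / n_top
var_bot = sd_bot**2 / n_bot
se_diff = math.sqrt(var_top + var_bot)
t_welch = (mean_top - mean_bot) / se_diff

# Welch-Satterthwaite degrees of freedom
df_welch = (var_top + var_bot) ** 2 / (
    var_top**2 / (n_top - 1) + var_bot**2 / (n_bot - 1)
)

# Cohen's d using the average of the two variances (group sizes are equal)
pooled_sd = math.sqrt((sd_top**2 + sd_bot**2) / 2)
cohens_d = (mean_top - mean_bot) / pooled_sd

print(f"t = {t_welch:.2f}, df = {df_welch:.1f}, d = {cohens_d:.1f}")
```

The hand-computed t should agree with the `ttest_ind` output to within the rounding of the printed summary statistics.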


### Step 4: Interpretation
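High-efficiency campaigns show substantially higher conversion rates (mean 24.3% vs. 17.2%; Welch t = 33.85, p < 0.0001), so efficiency is a strong performance discriminator. Two caveats apply. First, campaigns were not randomly assigned, so the result is correlational rather than causal. Second, the groups are the 20 most and 20 least efficient campaigns, so part of the gap may follow from how the groups were selected rather than reflect a separate effect.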

## PART B: Regression Analysis (Operational Drivers of Conversion Rate)

This model evaluates whether traditional campaign levers (cost, CPC, exposure)
explain variation in conversion rates **without using derived efficiency metrics**.
Efficiency metrics such as `conversions_per_dollar` are presumably computed from
conversions themselves, so including them as predictors would build a mechanical
dependence into the model and distort statistical inference.
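The risk of including a derived metric can be illustrated on synthetic data. The sketch below is not the project's data. It assumes that conversions per dollar equals `conversion_rate / cpc` (which holds if conversion rate is conversions per click and cost is clicks times CPC), and it draws conversion rate independently of CPC. The helper `r_squared` and all variable ranges are hypothetical.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a one-predictor OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy**2 / (sxx * syy)


random.seed(0)
n = 2000

# Synthetic campaigns: conversion rate is drawn independently of CPC
conversion_rate = [random.uniform(0.16, 0.27) for _ in range(n)]
cpc = [random.uniform(0.8, 1.2) for _ in range(n)]

# ASSUMPTION: conversions per dollar = conversions / cost = conversion_rate / cpc
conv_per_dollar = [cr / c for cr, c in zip(conversion_rate, cpc)]

r2_raw = r_squared(cpc, conversion_rate)
r2_derived = r_squared(conv_per_dollar, conversion_rate)

print(f"R^2 using CPC as predictor:                    {r2_raw:.3f}")
print(f"R^2 using conversions per dollar as predictor: {r2_derived:.3f}")
```

In this setup CPC carries no information about conversion rate, yet the derived metric appears to explain much of it, simply because the outcome sits in its numerator.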

In [6]:
# Step 1: Prepare regression data
import statsmodels.api as sm

reg_df = campaign_perf[[
    "conversion_rate",
    "cpc",
    "cost",
    "impressions"
]].dropna()

X = reg_df[["cpc", "cost", "impressions"]]
y = reg_df["conversion_rate"]

# Add an intercept column (sm.OLS does not include one by default)
X = sm.add_constant(X)
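Written out, the design matrix built above corresponds to the following linear model, where the index i runs over campaigns. The β and ε symbols are notation introduced here for illustration; β₀ is the intercept supplied by `sm.add_constant`.

```latex
\text{conversion\_rate}_i
  = \beta_0
  + \beta_1\,\text{cpc}_i
  + \beta_2\,\text{cost}_i
  + \beta_3\,\text{impressions}_i
  + \varepsilon_i
```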


In [7]:
# Step 2: Fit OLS model
model = sm.OLS(y, X).fit()
model.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | conversion_rate | R-squared | 0.003 |
| Model | OLS | Adj. R-squared | -0.006 |
| Method | Least Squares | F-statistic | 0.357 |
| Date | Wed, 14 Jan 2026 | Prob (F-statistic) | 0.7 |
| Time | 21:22:26 | Log-Likelihood | 546.33 |
| No. Observations | 216 | AIC | -1087.0 |
| Df Residuals | 213 | BIC | -1077.0 |
| Df Model | 2 | | |
| Covariance Type | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| cpc | 0.3958 | 0.017 | 22.981 | 0.000 | 0.362 | 0.430 |
| cost | 3.784e-05 | 7.36e-05 | 0.514 | 0.608 | -0.000 | 0.000 |
| impressions | 8.37e-08 | 1.38e-07 | 0.607 | 0.545 | -1.88e-07 | 3.56e-07 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus | 10.4 | Durbin-Watson | 1.99 |
| Prob(Omnibus) | 0.006 | Jarque-Bera (JB) | 10.732 |
| Skew | 0.542 | Prob(JB) | 0.00467 |
| Kurtosis | 3.125 | Cond. No. | 158000.0 |


### Interpretation

The regression results indicate that traditional campaign levers such as CPC,
total spend, and impression volume explain virtually none of the variation in
conversion rates (R² = 0.003, adjusted R² = -0.006). The overall model is not
statistically significant (F = 0.357, p = 0.70), which suggests that conversion
performance is driven largely by structural factors outside this model rather
than by incremental budget adjustments.

This finding is consistent with the earlier budget optimization results, where
reallocating spend across campaigns produced negligible aggregate lift. Together,
these analyses suggest that improving conversion outcomes likely requires changes
in targeting, creative strategy, or product-market fit rather than budget
reallocation alone.
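The fit statistics in the OLS summary are linked by simple arithmetic, which also explains why the adjusted R² is negative. The sketch below uses only the values printed in the summary (F-statistic, Df Model, Df Residuals, No. Observations) and inverts the standard F-statistic formula for a model with an intercept. The variable names are chosen here for illustration.

```python
# Values copied from the OLS summary above
f_stat = 0.357
df_model = 2
df_resid = 213
n_obs = 216

# Invert F = (R^2 / df_model) / ((1 - R^2) / df_resid) to recover R^2
r2 = (df_model * f_stat) / (df_model * f_stat + df_resid)

# Adjusted R^2 penalizes model degrees of freedom; it drops below zero
# when the predictors explain less variance than chance would suggest
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
# prints: R^2 = 0.003, adjusted R^2 = -0.006
```

Both values match the summary, so the near-zero fit is not a rounding artifact: the predictors add less explanatory power than their degrees of freedom cost.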