In [0]:
# Configure OAuth access to ADLS Gen2 via a service principal.
# Rerun this cell in the dashboard notebook before reading any data.
storage_account = dbutils.secrets.get(scope="local-scope", key="storage-account-name")
client_id = dbutils.secrets.get(scope="local-scope", key="sp-client-id")
tenant_id = dbutils.secrets.get(scope="local-scope", key="sp-tenant-id")
client_secret = dbutils.secrets.get(scope="local-scope", key="sp-client-secret")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", 
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", 
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")


In [0]:
df = spark.read.parquet(f"abfss://curated@{storage_account}.dfs.core.windows.net/uplift-output/")
df.createOrReplaceTempView("uplift_results")


## 🎯 Uplift Modeling Dashboard – Criteo Campaign

This dashboard summarizes uplift-model results on the Criteo 1M-user dataset.

**Goal**: Identify the high-value customers most likely to respond positively to a targeted campaign _only because they are treated_ (i.e., because they receive marketing).

- Model: XGBoost-based uplift regressor (CausalML)
- Dataset: Criteo 1M (curated & feature engineered)
- Outcome: Conversion uplift score per user
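
To make the goal concrete: at the population level, uplift is the conversion rate of treated users minus that of control users. The sketch below is a minimal pure-Python illustration, not the model's scoring code. It assumes binary `treatment` and `conversion` values, as in `uplift_results`. The function name `observed_uplift` and the toy rows are hypothetical and are not Criteo data.

```python
from statistics import mean


def observed_uplift(rows):
    """Return the treated conversion rate minus the control conversion rate.

    Each row is a dict with binary `treatment` and `conversion` values.
    """
    treated = [r["conversion"] for r in rows if r["treatment"] == 1]
    control = [r["conversion"] for r in rows if r["treatment"] == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control row")
    return mean(treated) - mean(control)


# Illustrative toy data only: 2 of 4 treated users convert, 1 of 4 control users.
toy_rows = (
    [{"treatment": 1, "conversion": c} for c in (1, 1, 0, 0)]
    + [{"treatment": 0, "conversion": c} for c in (1, 0, 0, 0)]
)
print(observed_uplift(toy_rows))  # 0.25
```

The per-user `uplift_score` is the model's estimate of this same difference for an individual user, which cannot be observed directly because each user is either treated or not.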


In [0]:
%sql
SELECT treatment, COUNT(*) AS users, ROUND(AVG(conversion), 4) AS avg_conversion_rate
FROM uplift_results
GROUP BY treatment


In [0]:
%sql
SELECT 
  COUNT(*) AS total_users,
  ROUND(MIN(uplift_score), 4) AS min_uplift,
  ROUND(MAX(uplift_score), 4) AS max_uplift,
  ROUND(AVG(uplift_score), 4) AS avg_uplift
FROM uplift_results


In [0]:
%sql
SELECT * 
FROM uplift_results
ORDER BY uplift_score DESC
LIMIT 10


In [0]:
from pyspark.sql.functions import col, when


In [0]:
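from pyspark.sql.functions import col, when

# Bucket users by uplift score. Comparisons are strict, so a score of
# exactly 0.05 is "Medium Uplift" and exactly 0.01 is "Low/Negative Uplift".
df = df.withColumn(
    "uplift_segment",
    when(col("uplift_score") > 0.05, "High Uplift")
    .when(col("uplift_score") > 0.01, "Medium Uplift")
    .otherwise("Low/Negative Uplift"),
)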
df.createOrReplaceTempView("uplift_results_segmented")


In [0]:
%sql
SELECT uplift_segment, COUNT(*) AS users
FROM uplift_results_segmented
GROUP BY uplift_segment
ORDER BY users DESC
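

The segmentation rule above can be mirrored in plain Python to check boundary behavior without a Spark session. This is a sketch only: the thresholds 0.05 and 0.01 come from the segmentation cell, while the function name `uplift_segment` and its parameters are hypothetical. A missing score is mapped to the fallback label, matching how a `when` chain falls through to `otherwise` when its conditions evaluate to null.

```python
def uplift_segment(score, high=0.05, medium=0.01):
    """Return the segment label for one uplift score.

    Comparisons are strict (>), as in the PySpark when/otherwise chain.
    """
    if score is None:
        # A null comparison is not true, so the chain reaches the fallback.
        return "Low/Negative Uplift"
    if score > high:
        return "High Uplift"
    if score > medium:
        return "Medium Uplift"
    return "Low/Negative Uplift"


# Boundary values land in the lower segment because the comparisons are strict.
for s in (0.08, 0.05, 0.02, 0.01, -0.03, None):
    print(s, "->", uplift_segment(s))
```

If a score of exactly 0.05 should count as high uplift, the comparison would need to be `>=` in both this sketch and the PySpark cell.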


### 🔐 Data Governance & Reproducibility

- Data Source: Curated Criteo dataset stored securely in ADLS Gen2
- Access Control: Managed via Databricks secret scopes and Azure service principals (SPNs)
- Versioning: Model training and artifacts tracked with MLflow
- Lineage: Feature engineering and output pipelines documented in notebooks
