# VOD Causal Analysis: From Uplift to Dynamic Pricing

This notebook demonstrates a complete causal inference workflow for a Video-on-Demand (VOD) platform. The workflow progresses through two stages of increasing maturity:

1.  **Binary Uplift Modeling**: Determining *who* should receive a promotion (Treatment vs. Control).
2.  **Continuous Price Optimization**: Determining *what price* maximizes revenue using Double Machine Learning (DML).

### Contents
1.  **Data Generation**: Create synthetic data with hidden confounding and price elasticity.
2.  **Part 1: Binary Uplift**: Use X-Learners to target users for campaigns.
3.  **Part 2: Dynamic Pricing**: Use EconML and DML to estimate price elasticity and optimize revenue.

In [None]:
import sys
import os
import warnings
warnings.filterwarnings('ignore')

# Add src to path
sys.path.append(os.path.abspath("../src"))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from vod_causal.data.generator import VODSyntheticData
from vod_causal.models.xlearner import XLearner
from vod_causal.models.dml import DMLWithEconML
from vod_causal.evaluation import plot_qini_curve
from revenue_optimizer import bulk_optimize

%matplotlib inline
sns.set_style("whitegrid")

## 1. Data Generation

We generate synthetic logs for 5,000 users. The data generator simulates a scenario where:
- **Confounding**: Longer-tenured, more loyal users have historically been offered higher prices (smaller discounts).
- **Elasticity**: Different users respond differently to price changes (Heterogeneous Treatment Effects).
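
To make these two properties concrete, here is a minimal, self-contained sketch of how data with confounded prices and heterogeneous price response could be simulated. It is not the implementation of `VODSyntheticData`; the function name `simulate_confounded_prices`, the coefficients, and the choice of tenure as the only driver are illustrative assumptions.

```python
import numpy as np

def simulate_confounded_prices(n_users=5000, base_price=5.0, seed=0):
    """Toy generator: loyal users see higher prices and react less to price."""
    rng = np.random.default_rng(seed)
    tenure = rng.uniform(0, 60, size=n_users)  # months subscribed (hypothetical feature)

    # Confounding: the discount shrinks as tenure grows, so price depends on tenure.
    discount = np.clip(0.5 - tenure / 120 + rng.normal(0, 0.1, n_users), 0.0, 0.8)
    price = base_price * (1 - discount)

    # Heterogeneous effect: the per-dollar change in rental probability varies by user.
    theta = -0.10 + 0.001 * tenure  # between -0.10 and -0.04 per dollar

    # Tenure also raises baseline demand, which is what makes it a confounder.
    prob = np.clip(0.2 + 0.005 * tenure + theta * (price - base_price), 0.0, 1.0)
    did_rent = rng.binomial(1, prob)
    return tenure, price, theta, did_rent
```

Because tenure drives both the offered price and the rental probability, a naive comparison of rentals across price levels in data like this mixes the price effect with the loyalty effect.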

In [None]:
generator = VODSyntheticData(n_users=5000, n_titles=100, n_interactions=20000, seed=42)
data_dict = generator.generate_all()
df = generator.create_modeling_dataset(data_dict)

print(f"Dataset shape: {df.shape}")
df.head()

### Visualize Confounding
The historical data is biased: high-usage users tend to be offered higher prices, so price is correlated with user behavior rather than assigned at random.

In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df.sample(2000, random_state=42), x="avg_daily_watch_time", y="offered_price", alpha=0.3)
plt.title("Confounding: Offered Price vs. Watch Time")
plt.xlabel("Avg Daily Watch Time (min)")
plt.ylabel("Offered Price ($)")
plt.show()

## 2. Part 1: Binary Uplift Modeling

**Goal**: Identify which users are "persuadable" by *any* promotion.

- **Treatment**: `is_treated` (1 if price < base, 0 otherwise)
- **Outcome**: `did_rent`

We use the **X-Learner**, a meta-learner that works well when the treatment and control groups differ greatly in size.
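
For readers unfamiliar with the method, the three X-Learner stages can be sketched with scikit-learn as follows. This is a minimal illustration, not the code behind `vod_causal.models.xlearner.XLearner`; the function name `x_learner_cate` and the choice of gradient boosting and logistic regression as base models are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

def x_learner_cate(X, t, y, X_new):
    """Estimate the conditional average treatment effect on X_new."""
    X, X_new = np.asarray(X, float), np.asarray(X_new, float)
    t, y = np.asarray(t), np.asarray(y, float)
    treated, control = t == 1, t == 0

    # Stage 1: outcome models fitted separately on each arm.
    mu0 = GradientBoostingRegressor(random_state=0).fit(X[control], y[control])
    mu1 = GradientBoostingRegressor(random_state=0).fit(X[treated], y[treated])

    # Stage 2: impute individual effects, then regress them on the features.
    d1 = y[treated] - mu0.predict(X[treated])  # observed minus imputed control outcome
    d0 = mu1.predict(X[control]) - y[control]  # imputed treated outcome minus observed
    tau1 = GradientBoostingRegressor(random_state=0).fit(X[treated], d1)
    tau0 = GradientBoostingRegressor(random_state=0).fit(X[control], d0)

    # Stage 3: blend both effect models with the propensity score g(x) = P(T=1 | x).
    g = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X_new)[:, 1]
    return g * tau0.predict(X_new) + (1 - g) * tau1.predict(X_new)
```

The propensity weighting in stage 3 leans on the effect model trained on the larger arm, which is why the method copes with unbalanced groups.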

In [None]:
# Prepare Binary Treatment Data
feature_cols = ["price_sensitivity", "subscription_tenure_months", "avg_daily_watch_time", 
                "is_cold_start", "base_popularity", "release_year"]

# One-Hot Encoding for Region
X = pd.get_dummies(df[feature_cols + ["geo_region"]], columns=["geo_region"], drop_first=True)
T_binary = df["is_treated"].astype(int)
Y = df["did_rent"]

print("Training X-Learner...")
xl = XLearner(propensity_model=None) # Use default
xl.fit(X, T_binary, Y)

# Predict Uplift (CATE)
uplift_score = xl.predict(X)
df["uplift_score"] = uplift_score

print("Training Complete.")

### Evaluation: Qini Curve
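The Qini curve measures the cumulative incremental rentals gained by targeting users in descending order of estimated uplift. A better model ranks truly persuadable users first, so its curve rises faster than random targeting would.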
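
As a reference for how such a curve can be computed, here is a minimal NumPy sketch of one common Qini definition: treated responders minus control responders rescaled to the treated group size, accumulated over the ranked users. The function name `qini_curve` is hypothetical, and `plot_qini_curve` may use a different normalization.

```python
import numpy as np

def qini_curve(y_true, uplift, treatment):
    """Return (fraction targeted, Qini value) after ranking users by predicted uplift."""
    y = np.asarray(y_true, dtype=float)
    t = np.asarray(treatment, dtype=int)
    order = np.argsort(-np.asarray(uplift, dtype=float), kind="stable")
    y, t = y[order], t[order]

    n_t = np.cumsum(t)               # treated users among the top k
    n_c = np.cumsum(1 - t)           # control users among the top k
    resp_t = np.cumsum(y * t)        # treated responders among the top k
    resp_c = np.cumsum(y * (1 - t))  # control responders among the top k

    # Rescale control responders to the treated group size; the ratio is 0 while n_c == 0.
    ratio = np.divide(n_t, n_c, out=np.zeros(len(y)), where=n_c > 0)
    qini = resp_t - resp_c * ratio
    fraction = np.arange(1, len(y) + 1) / len(y)
    return fraction, qini
```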

In [None]:
plot_qini_curve(y_true=Y, uplift=uplift_score, treatment=T_binary)
plt.title("Qini Curve: X-Learner Performance")
plt.show()

## 3. Part 2: Continuous Price Optimization (DML)

**Goal**: Estimate the price $P$ that maximizes expected revenue for each user.

- **Treatment**: `offered_price` (Continuous)
- **Method**: Double Machine Learning (DML) with `econml`.

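We estimate the **Price Elasticity** $\theta(X)$, here the change in rental probability per dollar, such that:
$$ \text{Demand}(P) \approx D_0(X) + \theta(X) \cdot (P - P_{\text{base}}) $$
where $D_0(X)$ is the baseline rental probability at the base price $P_{\text{base}}$.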
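
The core idea behind DML is "partialling out": predict both the price and the outcome from the controls with cross-fitting, then regress the outcome residuals on the price residuals. The sketch below illustrates this with scikit-learn for a single constant effect rather than the heterogeneous $\theta(X)$ estimated in this notebook. It is not the internals of `DMLWithEconML`; the function name `dml_constant_effect` and the random forest nuisance models are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def _nuisance_model(seed):
    # Flexible regressor used for both nuisance functions (an illustrative choice).
    return RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=seed)

def dml_constant_effect(controls, price, outcome, n_folds=3, seed=0):
    """Residual-on-residual estimate of one average price effect."""
    controls = np.asarray(controls, float)
    price = np.asarray(price, float)
    outcome = np.asarray(outcome, float)

    # Cross-fitting: every row is predicted by a model that never saw that row.
    price_hat = cross_val_predict(_nuisance_model(seed), controls, price, cv=n_folds)
    outcome_hat = cross_val_predict(_nuisance_model(seed), controls, outcome, cv=n_folds)

    # Remove what the controls explain, then regress residual on residual.
    price_res = price - price_hat
    outcome_res = outcome - outcome_hat
    final = LinearRegression(fit_intercept=False).fit(price_res.reshape(-1, 1), outcome_res)
    return float(final.coef_[0])
```

The residual regression uses only the price variation that the controls cannot explain, which is what removes the confounding bias a naive regression of demand on price would carry.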

In [None]:
print("Training DML Model (LinearDML)...")

T_continuous = df["offered_price"]
W_controls = df[["base_popularity", "release_year"]] # Confounders to control for
X_features = X.drop(columns=["base_popularity", "release_year"]) # Effect Modifiers

# Instantiate and Fit
dml = DMLWithEconML(model_type="linear", n_folds=3, random_state=42)
dml.fit(X_features, T_continuous, Y, W=W_controls)

print("DML Model Fitted!")

### Elasticity Analysis
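The histogram below shows the estimated elasticities. We expect them to be negative, since a price increase should reduce the probability of a rental. Any estimates to the right of the dashed red line at zero may indicate noise or residual confounding.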

In [None]:
elasticities = dml.effect(X_features)
df["predicted_elasticity"] = elasticities

plt.figure(figsize=(10, 5))
sns.histplot(df["predicted_elasticity"], kde=True, bins=30)
plt.title("Distribution of Price Elasticity (DML)")
plt.xlabel("Elasticity (Change in Prob per $)")
plt.axvline(0, color='r', linestyle='--')
plt.show()

### Revenue Optimization
Using the `revenue_optimizer` module, we calculate the optimal price point for the first 1,000 rows of the feature matrix.
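
Under the linear demand model from Part 2, expected revenue $P \cdot \text{Demand}(P)$ is a quadratic in $P$, so the optimum has a closed form when the elasticity is negative. The sketch below shows that calculation. It is an assumption about what a revenue optimizer could do, not the actual logic of `bulk_optimize`; the function name `optimal_price_linear`, the `base_demand` argument (the intercept of the demand model at the base price), and the price bounds are hypothetical.

```python
import numpy as np

def optimal_price_linear(base_demand, theta, base_price, min_price, max_price):
    """Revenue-maximizing price for Demand(P) = base_demand + theta * (P - base_price)."""
    d0 = np.asarray(base_demand, dtype=float)
    theta = np.asarray(theta, dtype=float)

    # Setting d/dP [P * Demand(P)] = 0 gives P* = base_price / 2 - d0 / (2 * theta).
    with np.errstate(divide="ignore", invalid="ignore"):
        interior = base_price / 2 - d0 / (2 * theta)

    # If theta >= 0 and demand stays non-negative, revenue rises with price,
    # so the best allowed price is the upper bound.
    price = np.where(theta < 0, interior, max_price)
    return np.clip(price, min_price, max_price)
```

For example, with a baseline demand of 0.3, an elasticity of -0.05 per dollar, and a base price of $5, the unconstrained optimum is 5 / 2 + 0.3 / 0.1 = $5.50. The bounds matter in practice because the linear approximation is only trustworthy near the prices observed in the historical data.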

In [None]:
# Optimize for a subset of users
sample_users = X_features.iloc[:1000].copy()
results = bulk_optimize(sample_users, dml)

print("Optimization Results:")
print(results.head())

plt.figure(figsize=(10, 5))
sns.histplot(results["optimal_price"], bins=15)
plt.title("Recommended Prices Distribution")
plt.xlabel("Price ($)")
plt.show()

## Conclusion

We have successfully:
1.  Identified high-uplift users using **X-Learners**.
2.  Estimated price elasticity using **DML**.
3.  Generated optimized price recommendations to maximize revenue.

Together, these steps form a pipeline for causally informed promotion targeting and price personalization on a VOD platform.