# A/B Testing Analysis: New vs Old Landing Page

## 1. Introduction
**Objective:** Determine whether a new landing page converts users at a higher rate than the current one.

We perform an A/B test to compare the conversion rates of a treatment group (new page) and a control group (old page).

In [None]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

## 2. Load Dataset

In [None]:
df = pd.read_csv('/mnt/data/ab_data.csv')
df.head()

## 3. Data Cleaning
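- Remove rows where the group and landing page do not match (treatment users should see `new_page`, control users `old_page`)
- Remove duplicate user IDs, keeping the first record for each user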

In [None]:
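# Keep only rows where the assigned group and the page shown agree
ab_clean = df.query(
    '(group == "treatment" and landing_page == "new_page") or (group == "control" and landing_page == "old_page")'
)

# Remove duplicate users (drop_duplicates keeps the first occurrence by default)
ab_clean = ab_clean.drop_duplicates(subset='user_id')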
ab_clean.shape
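
As a quick sanity check of the two cleaning rules, the sketch below applies them to a small hand-made table. The `toy` DataFrame and its values are made up for illustration and are not taken from `ab_data.csv`; only the column names follow the dataset used above.

```python
import pandas as pd

# Hypothetical toy data: user 4 is mismatched (control shown new_page),
# and user 3 appears twice.
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 4],
    "group": ["control", "treatment", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "new_page", "new_page", "new_page"],
    "converted": [0, 1, 0, 0, 1],
})

# Rule 1: the group and the page shown must agree
matched = (
    ((toy["group"] == "treatment") & (toy["landing_page"] == "new_page"))
    | ((toy["group"] == "control") & (toy["landing_page"] == "old_page"))
)

# Count what each rule removes before applying it
n_mismatched = int((~matched).sum())
n_duplicates = int(toy[matched].duplicated(subset="user_id").sum())

# Rule 2: keep one row per user
toy_clean = toy[matched].drop_duplicates(subset="user_id")

print(f"mismatched rows: {n_mismatched}, duplicate rows: {n_duplicates}")
print(toy_clean)
```

Counting the removed rows before dropping them makes it easier to judge whether the cleaning step discards a meaningful share of the data.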

## 4. Conversion Rate Analysis

In [None]:
# Per-group conversion rate, number of users, and number of conversions
summary = ab_clean.groupby('group')['converted'].agg(['mean', 'count', 'sum']).reset_index()
summary.columns = ['Group', 'Conversion Rate', 'Total Users', 'Total Conversions']

# Bar chart of the conversion rate for each group
plt.figure(figsize=(6,4))
sns.barplot(x='Group', y='Conversion Rate', data=summary)
plt.title("Conversion Rate by Group")
plt.ylim(0, 0.15)
plt.grid(axis='y')
plt.show()
summary

## 5. Hypothesis Testing (Z-Test for Proportions)
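
The test computed in the next cell can be written out as follows. This is a sketch of the standard pooled two-proportion z-test, read off from the code; the symbols are notation introduced here, with $x_c, n_c$ standing for `conv_c`, `n_c` and $x_t, n_t$ for `conv_t`, `n_t`. The two-sided alternative is assumed because the code doubles the tail probability.

$$
H_0: p_t = p_c \qquad \text{vs.} \qquad H_1: p_t \neq p_c
$$

$$
\hat{p} = \frac{x_c + x_t}{n_c + n_t}, \qquad
z = \frac{\dfrac{x_t}{n_t} - \dfrac{x_c}{n_c}}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_c} + \dfrac{1}{n_t}\right)}}, \qquad
p\text{-value} = 2\,P\left(Z \geq |z|\right)
$$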

In [None]:
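# Control group: conversions and users (hard-coded counts)
conv_c = 17489
n_c = 145274

# Treatment group: conversions and users (hard-coded counts)
conv_t = 17264
n_t = 145310

# Pooled conversion rate under the null hypothesis of equal rates
p_pool = (conv_c + conv_t) / (n_c + n_t)
# Standard error of the difference in proportions under the null
se = np.sqrt(p_pool * (1 - p_pool) * (1/n_c + 1/n_t))
# Test statistic: treatment rate minus control rate, in standard errors
z = (conv_t/n_t - conv_c/n_c) / se
# Two-sided p-value from the standard normal distribution
p = stats.norm.sf(abs(z)) * 2
z, p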
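
The same calculation can be wrapped in a reusable function. The sketch below is one possible way to do it, using only the standard library in place of NumPy and SciPy; the function name `two_proportion_ztest` and its return layout are hypothetical, not part of the original notebook. It also reports a one-sided p-value for the alternative "treatment rate is higher", since the stated objective is directional.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Pooled z-test for H0: rate_b == rate_a (hypothetical helper).

    Returns (z, two-sided p-value, one-sided p-value for rate_b > rate_a).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate and standard error under the null hypothesis
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    norm = NormalDist()  # standard normal distribution
    p_two_sided = 2 * (1 - norm.cdf(abs(z)))
    p_greater = 1 - norm.cdf(z)
    return z, p_two_sided, p_greater


# Control is group a, treatment is group b (counts from the cell above)
z, p_two_sided, p_greater = two_proportion_ztest(17489, 145274, 17264, 145310)
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2f}")
print(f"one-sided p (treatment > control) = {p_greater:.3f}")
```

Because the treatment rate is below the control rate here, the one-sided p-value for "treatment is better" is larger than the two-sided one, so the directional test points to the same conclusion.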

## 6. Conclusion
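- **Z-score:** approximately -1.31
- **P-value:** approximately 0.19 (two-sided)

**Result:** The new page converted about 11.88% of users (17,264 of 145,310) and the old page about 12.04% (17,489 of 145,274). With a p-value of about 0.19, this difference is not statistically significant at the conventional 0.05 level, so we cannot conclude that the new page performs better.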
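
A confidence interval for the difference in conversion rates gives a complementary view of the same result. The sketch below is an illustration rather than part of the original analysis: it assumes a 95% level and the usual normal-approximation (Wald) interval with an unpooled standard error, and reuses the counts from section 5.

```python
from math import sqrt
from statistics import NormalDist

# Counts from section 5
conv_c, n_c = 17489, 145274   # control
conv_t, n_t = 17264, 145310   # treatment

p_c, p_t = conv_c / n_c, conv_t / n_t
diff = p_t - p_c  # treatment minus control

# Unpooled standard error: each group keeps its own variance estimate
se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)

# Critical value for an assumed 95% two-sided interval
z_crit = NormalDist().inv_cdf(0.975)
ci_low = diff - z_crit * se_unpooled
ci_high = diff + z_crit * se_unpooled

print(f"difference = {diff:.5f}")
print(f"95% CI = ({ci_low:.5f}, {ci_high:.5f})")
```

The interval contains zero, which is consistent with the non-significant z-test above.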