# 4.5 A/B Testing in Marketing Case Study

## 4.5.3 Data Collection and Preparation

Load and prepare the A/B test data:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

# Load the data
df = pd.read_csv('marketing_AB.csv')

# Data cleaning: drop the leftover index column from the CSV export, if present
df = df.drop(columns=["Unnamed: 0"], errors="ignore")

# Clean column names by removing leading/trailing spaces and replacing spaces with underscores
df = df.rename(columns=lambda x: x.strip().replace(" ", "_"))

# Create a new column 'converted_int' to represent conversions as 1 (True) or 0 (False)
df["converted_int"] = df["converted"].astype(int)

# Print information about the DataFrame
print(f'Rows: {df.shape[0]}')
print(f'Columns: {df.shape[1]}')
print(f'Missing Values: {df.isnull().values.sum()}')
print(f'Unique Values: \n{df.nunique()}')
```

## 4.5.4 Statistical Analysis of Results
### 4.5.4.1 Descriptive Statistics

Calculate basic statistics and conversion rates:

```python
print(df.describe())

# Overall conversion rate
print(f"Overall conversion rate: {df['converted'].mean():.2%}")

# Conversion rate by group
conversion_rates = df.groupby('test_group')['converted'].mean()
print("\nConversion rates by group:")
print(conversion_rates)
```
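
Conversion rates are easier to interpret next to the group sizes they are based on. The sketch below shows one way to tabulate users, conversions, and rate per group with a single named aggregation. The `toy` DataFrame is a made-up stand-in used only so the example runs on its own; on the real data the same `groupby` call would be applied to `df`.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from marketing_AB.csv
toy = pd.DataFrame({
    "test_group": ["ad", "ad", "ad", "ad", "psa", "psa"],
    "converted_int": [1, 0, 0, 1, 0, 1],
})

# One row per group: number of users, number of conversions, conversion rate
summary = toy.groupby("test_group")["converted_int"].agg(
    users="count",
    conversions="sum",
    rate="mean",
)
print(summary)
```

A large imbalance in the `users` column is worth noting before moving on to hypothesis tests, since the smaller group dominates the uncertainty of the comparison.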

### 4.5.4.2 Hypothesis Testing

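Test whether the conversion rates of the `ad` and `psa` groups differ, using Student's t-test on the 0/1 conversion indicator and a chi-square test of independence: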

```python
# Student's t-test on the 0/1 conversion indicator (equal variances assumed)
ad_group = df.loc[df['test_group'] == 'ad', 'converted_int']
psa_group = df.loc[df['test_group'] == 'psa', 'converted_int']

t_statistic, t_p_value = stats.ttest_ind(ad_group, psa_group)
print("Student's t-test:")
print(f"t-statistic: {t_statistic:.4f}")
print(f"p-value: {t_p_value:.4g}")  # .4g keeps very small p-values from printing as 0.0000

# Chi-square test of independence between group and conversion
# (SciPy applies Yates' continuity correction to 2x2 tables by default)
contingency_table = pd.crosstab(df['test_group'], df['converted'])
chi2, chi2_p_value, dof, expected = stats.chi2_contingency(contingency_table)
print("\nChi-squared test:")
print(f"Chi-squared statistic: {chi2:.4f}")
print(f"p-value: {chi2_p_value:.4g}")
```
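
For a binary outcome such as conversion, a two-proportion z-test is a common alternative to the two tests above. The following is a minimal sketch, not part of the original analysis: the helper `two_proportion_ztest` and the counts passed to it are hypothetical and chosen only for illustration. It uses the pooled standard error, so its squared statistic matches the chi-square statistic computed without continuity correction.

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Hypothetical counts, NOT taken from marketing_AB.csv
z, p = two_proportion_ztest(conv_a=260, n_a=10_000, conv_b=200, n_b=10_000)
print(f"z-statistic: {z:.4f}")
print(f"p-value: {p:.4g}")
```

On the case-study data, the inputs would be the number of conversions and the number of users in each `test_group`.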

### 4.5.4.3 Effect Size Calculation

Calculate absolute and relative differences in conversion rates:

```python
# Absolute difference is measured in percentage points; relative difference is the lift over 'psa'
diff_abs = conversion_rates['ad'] - conversion_rates['psa']
diff_rel = diff_abs / conversion_rates['psa']

print(f"Absolute difference in conversion rates: {diff_abs * 100:.2f} percentage points")
print(f"Relative difference in conversion rates: {diff_rel:.2%}")
```
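
A point estimate of the difference says nothing about its precision, so it is often reported with a confidence interval. The sketch below computes a normal-approximation (Wald) interval for the difference between two proportions. The helper `diff_proportion_ci` and the example counts are hypothetical; the Wald interval is one simple choice among several and assumes reasonably large samples.

```python
import numpy as np
from scipy import stats

def diff_proportion_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for p_a - p_b (unpooled standard error)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_a - p_b
    se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)
    return diff, diff - z_crit * se, diff + z_crit * se

# Hypothetical counts, NOT taken from marketing_AB.csv
diff, ci_low, ci_high = diff_proportion_ci(260, 10_000, 200, 10_000)
print(f"Difference: {diff * 100:.2f} percentage points")
print(f"95% CI: [{ci_low * 100:.2f}, {ci_high * 100:.2f}] percentage points")
```

An interval that excludes zero agrees with a significant two-sided test at the matching level, and its width shows how precisely the effect is estimated.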

## 4.5.5 Results Visualization

Create visualizations to illustrate the results:

```python
plt.figure(figsize=(10, 6))
# Bar height is the mean of the 0/1 indicator, i.e. the conversion rate per group
sns.barplot(x='test_group', y='converted_int', data=df)
plt.title('Conversion Rate by Group')
plt.ylabel('Conversion Rate')
plt.show()

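# Distribution of conversions by day and group
# Keep only converted users so the bars count conversions rather than all users
plt.figure(figsize=(12, 6))
sns.countplot(data=df[df['converted_int'] == 1], x='most_ads_day', hue='test_group')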
plt.title('Distribution of Conversions by Day and Group')
plt.xlabel('Day with Most Ads')
plt.ylabel('Number of Conversions')
plt.xticks(rotation=45)
plt.legend(title='Test Group')
plt.show()
```