## **<div style = "text-align:center"><span style = "color:#8502d1"><b>Marketing Campaign Impact</b></span></div>**

<a id="import"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Marketing A/B Testing</p>


<br>





<div class="list-group" id="list-tab" role="tablist">
<p style="background-color:#4B0082;font-family:newtimeroman;color:#FFF9ED;font-size:150%;text-align:center;border-radius:10px 10px;">

About Dataset

Marketing A/B testing dataset. Marketing companies want to run successful campaigns, but the market is complex and several options can work. So they normally run A/B tests: a randomized experimentation process in which two or more versions of a variable (web page, page element, banner, etc.) are shown to different segments of people at the same time to determine which version has the greatest impact and drives business metrics.

The companies are interested in answering two questions:

1. Would the campaign be successful?
2. If the campaign was successful, how much of that success could be attributed to the ads?

With the second question in mind, we normally do an A/B test. The majority of the people will be exposed to ads (the experimental group). And a small portion of people (the control group) would instead see a Public Service Announcement (PSA) (or nothing) in the exact size and place the ad would normally be.

The idea of the dataset is to analyze the groups, find if the ads were successful, how much the company can make from the ads, and if the difference between the groups is statistically significant.





<div style="align:center">


Data dictionary:

Index: Row index

user id: User ID (unique)

test group: If "ad" the person saw the advertisement, if "psa" they only saw the public service announcement

converted: True if the person bought the product, otherwise False

total ads: Number of ads seen by the person

most ads day: Day on which the person saw the largest number of ads

most ads hour: Hour of the day in which the person saw the largest number of ads

</div>

</p>   
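The second question above (how much of the success can be attributed to the ads) is usually answered by comparing conversion rates between the two groups. The following is a minimal sketch of that comparison, assuming a tiny hypothetical table (`toy`) that only mimics the `test group` and `converted` columns; the numbers are made up and are not results from the real dataset.

```python
import pandas as pd

# Hypothetical toy data: 10 users saw the ad (3 converted),
# 5 users saw the PSA (1 converted). Not taken from the real dataset.
toy = pd.DataFrame({
    "test group": ["ad"] * 10 + ["psa"] * 5,
    "converted": [True] * 3 + [False] * 7 + [True] * 1 + [False] * 4,
})

# Conversion rate per group (mean of a boolean column = share of True values)
rates = toy.groupby("test group")["converted"].mean()

# Absolute lift: difference in conversion rates between ad and PSA groups
absolute_lift = rates["ad"] - rates["psa"]

# Relative lift: the same difference expressed relative to the control rate
relative_lift = absolute_lift / rates["psa"]

# Rough estimate of conversions attributable to the ads in the ad group
n_ad = int((toy["test group"] == "ad").sum())
incremental_conversions = absolute_lift * n_ad

print(rates)
print("Absolute lift:", absolute_lift)
print("Relative lift:", relative_lift)
print("Incremental conversions:", incremental_conversions)
```

The lift only describes the observed difference; whether it is statistically significant still has to be tested separately.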

    

<a id="import"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Step 1 | Import Libraries</p>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency
from scipy import stats

<a id="read"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Step 2 | Read Dataset</p>

In [None]:
df = pd.read_csv('/kaggle/input/marketing-ab/marketing_AB.csv')

<a id="overview"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Step 3 | Dataset Overview</p>

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.shape

<a id="overview"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Step 4 | Data Cleaning</p>

In [None]:
df.drop('Unnamed: 0' , axis=1 , inplace=True)

In [None]:
df.head()




<div class="alert alert-block alert-info ;"> 📌  The 'Unnamed: 0' column has been dropped, but the remaining column names contain spaces, which can make them awkward to work with later. It is better to rename them to the underscore format (for example, 'user id' becomes 'user_id'). Since most column names need adjusting, I'm going to use a lambda function. </div>



In [None]:
df.rename(columns= lambda x : x.replace(' ' , '_') , inplace=True)
df.head()
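As a small illustration of the renaming step, the sketch below applies the same lambda to an empty hypothetical frame (`toy`) with three of the dataset's column names, and compares it with the vectorized `Index.str` accessor, which is an equivalent alternative. The extra `strip()` call is an assumption added here to guard against stray leading or trailing spaces.

```python
import pandas as pd

# Hypothetical empty frame that only carries a few of the column names
toy = pd.DataFrame(columns=["user id", "test group", "most ads day"])

# Option 1: a lambda applied to every column name (returns a new DataFrame)
renamed = toy.rename(columns=lambda name: name.strip().replace(" ", "_"))

# Option 2: vectorized string methods on the column Index
vectorized = toy.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(renamed.columns))
print(list(vectorized))
```

Both options produce the same names; the lambda version is the one used in this notebook.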

In [None]:
df.isnull().any()

In [None]:
df.isnull().sum()

<a id="eda"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Step 5 | EDA</p>

In [None]:
df['test_group'].value_counts()

In [None]:
df.groupby('test_group')['converted'].value_counts()

In [None]:
df.groupby(['test_group' , 'converted']).count()['user_id']

In [None]:
df.groupby(['test_group' , 'converted']).count()['user_id'].unstack('converted')

In [None]:
# Select 'converted' before summing so that non-numeric columns are not aggregated
convertedgroup = df.groupby('test_group')['converted'].sum().plot.pie(figsize=(7,7) , autopct='%1.0f%%')
plt.show()

In [None]:
ax = df.groupby(['test_group' , 'converted']).count()['user_id'].unstack('converted').plot(
    kind='bar',
    figsize=(10,8),
    grid=True
)

ax.tick_params(axis='x' , labelsize=15)
plt.xticks(rotation='horizontal')
ax.tick_params(axis='y' , labelsize=15)
ax.set_ylabel('Count')

plt.show()

In [None]:
ax = df.groupby(by=['most_ads_day' , 'test_group']).sum()['converted'].unstack('test_group').plot(
    kind='bar',
    figsize=(12,10),
    grid = True , 
    stacked=True
)

ax.set_ylabel('converted')
ax.set_title('Most ad days and converted')
plt.show()

In [None]:
ax = df.groupby(by=['most_ads_hour' , 'test_group']).sum()['converted'].unstack('test_group').plot(
    kind='bar',
    figsize=(12,10),
    grid = True , 
    stacked=True
)

ax.set_ylabel('converted')
ax.set_title('Most ads hour and converted')
plt.show()

<a id="eda"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Normality Check</p>

In [None]:
fig = plt.figure(figsize=(13,7))
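# Cast the boolean 'converted' column to int so the KDE receives numeric data
ax = sns.kdeplot(df.loc[(df['test_group'] == 'ad'), 'converted'].astype(int), color='b', fill=True, label='Seen ad')
ax = sns.kdeplot(df.loc[(df['test_group'] == 'psa'), 'converted'].astype(int), color='r', fill=True, label='Not seen')
# The x-axis is the conversion flag and the y-axis is the estimated density
ax.set(xlabel='Converted (0 = no, 1 = yes)', ylabel='Density')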
plt.title('Test group VS. Converted')
plt.legend();

# Bootstrap analysis

In [None]:
#subset the original dataframe
ad_experimental=df[df['test_group']=='ad']
psa_control=df[df['test_group']=='psa']

#find the average converted rate of each group
ad_converted=ad_experimental['converted'].mean()
psa_converted=psa_control['converted'].mean()

In [None]:
#create an empty list to store bootstrapped means
boot_ad=[]

#Set up a loop that will iterate 1000 times. In each iteration, a new bootstrapped sample will be generated
for i in range (1000):
    boot_mean=ad_experimental.sample(frac=1,replace=True)['converted'].mean()
    boot_ad.append(boot_mean)
    

boot_ad=pd.DataFrame(boot_ad)

#Create a density plot of the bootstrapped means
boot_ad.plot(kind='density');

In [None]:
boot_psa=[]

for i in range(1000):
    boot_mean=psa_control.sample(frac=1,replace=True)['converted'].mean()
    boot_psa.append(boot_mean)
    
boot_psa=pd.DataFrame(boot_psa)
boot_psa.plot(kind='density');

From the density plots, the sampling distribution of the mean conversion rate (obtained through bootstrap resampling) looks approximately normal for both groups. This supports the use of hypothesis tests and confidence intervals that rely on a normality assumption, such as the t-test or z-test.
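As a minimal sketch of what the normality assumption allows, the function below runs a two-proportion z-test and builds a normal-approximation confidence interval for the difference in conversion rates. The function name `two_proportion_ztest` and the counts passed to it are hypothetical and chosen only for illustration; they are not results from this dataset.

```python
import math
from scipy.stats import norm

def two_proportion_ztest(x1, n1, x2, n2, alpha=0.05):
    """Two-sided z-test and (1 - alpha) CI for p1 - p2 (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Under the null hypothesis both groups share one pooled conversion rate
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * norm.sf(abs(z))

    # The confidence interval uses the unpooled standard error
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = norm.ppf(1 - alpha / 2) * se_unpooled
    return z, p_value, (diff - margin, diff + margin)

# Hypothetical counts: 300 of 10,000 converted with ads, 200 of 10,000 with PSA
z, p_value, ci = two_proportion_ztest(x1=300, n1=10_000, x2=200, n2=10_000)
print("z:", z)
print("p-value:", p_value)
print("95% CI for the difference:", ci)
```

With the real data, `x1` and `x2` would be the numbers of converted users in each group and `n1` and `n2` the group sizes.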

<a id="eda"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Simulation-Based Test</p>

In [None]:
df['converted'].mean()

In [None]:
ads = df.query('test_group == "ad"')
psa = df.query('test_group == "psa"') 

In [None]:
ads['converted'].mean() * 100

In [None]:
psa['converted'].mean() * 100


<div class="alert alert-block alert-info ;"> 📌 The conversion rate is higher in the ad group than in the PSA group, which suggests that the ads increased sales. Now we have to check whether this increase is statistically significant. </div>

In [None]:
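ads = df.query('test_group == "ad"') 
psa = df.query('test_group == "psa"')

# Simulate both groups under the null hypothesis: each group converts at the
# overall (pooled) rate, so any difference between them is due to chance alone
ads_ab = np.random.binomial(len(ads), df.converted.mean(), 10000) / len(ads)
psa_ab = np.random.binomial(len(psa), df.converted.mean(), 10000) / len(psa)

# Null distribution of the difference in conversion rates
p_diffs = ads_ab - psa_ab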

In [None]:
ab_data_diff = ads['converted'].mean() - psa['converted'].mean()
ab_data_diff

In [None]:
(p_diffs > ab_data_diff).mean() * 100  

In [None]:
(p_diffs < ab_data_diff).mean() * 100

In [None]:
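plt.hist(p_diffs, bins=100)
observed = ab_data_diff     # observed difference in conversion rates (green line)
null_mean = p_diffs.mean()  # center of the simulated null distribution (red line)
plt.axvline(x=observed, color='g')
plt.axvline(x=null_mean, color='r');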

<a id="eda"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">T-Test</p>

In [None]:
import pandas as pd
import numpy as np
import scipy.stats as stats

def ttest(data1, data2):
    """Performs an independent two-sample t-test on two sets of data."""
    t_statistic, p_value = stats.ttest_ind(data1, data2)
    return t_statistic, p_value

def main():

    # Create two groups of users
    ad_group = df[df["test_group"] == "ad"]
    psa_group = df[df["test_group"] == "psa"]

    # Calculate the conversion rate for each group
    ad_mean = ad_group["converted"].mean()
    psa_mean = psa_group["converted"].mean()
    print("Ad conversion rate:", ad_mean)
    print("PSA conversion rate:", psa_mean)

    # Perform a t-test to compare the conversion rates of the two groups
    # (cast the boolean column to int so the test receives numeric data)
    t_statistic, p_value = ttest(ad_group["converted"].astype(int), psa_group["converted"].astype(int))

    # Print the results
    print("T-statistic:", t_statistic)
    print("P-value:", p_value)
    if p_value < 0.05:
        print("There is a statistically significant difference in the conversion rates of the two groups.")
    else:
        print("There is no statistically significant difference in the conversion rates of the two groups.")

if __name__ == "__main__":
    main()


<a id="eda"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">Chi Test</p>

In [None]:

# Create two groups of users
ad_group = df[df["test_group"] == "ad"]
psa_group = df[df["test_group"] == "psa"]

# Calculate the conversion rate for each group
# Conversion rate = (number of customers who took the action / total number of customers or visitors) × 100
# (kept below as a fraction, without the × 100)

ad_conversion_rate = len(ad_group[ad_group["converted"] == 1]) / len(ad_group)
psa_conversion_rate = len(psa_group[psa_group["converted"] == 1]) / len(psa_group)

# Calculate the chi-squared statistic
contingency_table = pd.crosstab(df["test_group"], df["converted"])
chi2, p_value, degrees_of_freedom, expected_values = chi2_contingency(contingency_table)

# Print the results
print("Ad conversion rate:", ad_conversion_rate)
print("PSA conversion rate:", psa_conversion_rate)
print("Chi-squared statistic:", chi2)
print("P-value:", p_value)

# If the p-value is less than 0.05, then we can conclude that there is a statistically significant difference in the conversion rates of the two groups.
if p_value < 0.05:
    print("There is a statistically significant difference in the conversion rates of the two groups.")
else:
    print("There is no statistically significant difference in the conversion rates of the two groups.")

The chi-squared statistic of 54.005823883685245 is much larger than 3.841, the critical value at the 0.05 significance level for one degree of freedom (a 2×2 table).

The p-value of 1.9989623063390075e-13 is also very small: if the two groups truly had the same conversion rate, a difference at least this large would almost never arise by chance.

In other words, the chi-squared statistic and the p-value both give strong evidence that the ad conversion rate and the PSA conversion rate differ, and the difference is significant at the 0.05 level.
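The critical value quoted above can be checked directly with `scipy.stats.chi2`. This is a small sketch, assuming one degree of freedom (a 2×2 contingency table) and reusing the statistic reported in this notebook.

```python
from scipy.stats import chi2

dof = 1       # (rows - 1) * (columns - 1) for a 2x2 table
alpha = 0.05  # significance level

# Critical value: the point that leaves alpha in the upper tail
critical_value = chi2.ppf(1 - alpha, df=dof)

# Statistic reported by chi2_contingency in the cell above
statistic = 54.005823883685245

# Upper-tail probability of the statistic under the null hypothesis
p_value = chi2.sf(statistic, df=dof)

print("Critical value:", round(critical_value, 3))
print("P-value:", p_value)
print("Reject the null hypothesis:", statistic > critical_value)
```

Note that `chi2_contingency` applies Yates' continuity correction by default for 2×2 tables, so the reported statistic is slightly smaller than the uncorrected one.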

<a id="eda"></a>
# <p style="background-color:#8502d1; font-family:calibri; color:white; font-size:100%; text-align:center; border-radius:15px 50px;">End</p>