## Analysis of the Received Data

#### Questions I want to answer
* What was the click-through rate for each version?
* Which version was the winner?
* Do the results seem conclusive?

### Loading libraries and settings

In [28]:
import pandas as pd
from scipy import stats
# Load the click data exported for each homepage version
eniac_a = pd.read_csv('eniac_a.csv')
eniac_b = pd.read_csv('eniac_b.csv')
eniac_c = pd.read_csv('eniac_c.csv')
eniac_d = pd.read_csv('eniac_d.csv')
# Widen the pandas display limits so long snapshot strings are not truncated
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

### Eniac A

##### Let's look at the structure of the DataFrame

In [8]:
eniac_a

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,269,True,Homepage Version A - white SHOP NOW • https://eniac.com/index-a.php
1,25,div,mySidebar,309,True,"created 2021-09-14 • 14 days 0 hours 34 mins • 25326 visits, 23174 clicks"
2,4,a,Mac,279,True,
3,69,a,iPhone,246,True,
4,105,a,Accessories,1235,True,
5,36,a,Chargers & Cables,1261,False,
6,99,a,iPhone Accessories,1226,False,
7,68,a,Watch Accessories,1261,False,
8,13,a,Mac Accessories,1308,False,
9,15,a,AirTag,206,False,


##### The number of visits is 25326, as shown in the 'Snapshot information' column, but I'll extract that number with code

In [9]:
number_of_visits_a = int(eniac_a.loc[1, 'Snapshot information'].split('•')[2].split('visits')[0].strip())
number_of_visits_a

25326
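The chained `split` calls above depend on the exact position of each `•` separator. As a hedged alternative, the visit count could be pulled out with a regular expression that only looks for digits followed by the word `visits`. The helper name `parse_visits` is hypothetical and not part of the original notebook.

```python
import re

def parse_visits(snapshot_info: str) -> int:
    """Return the visit count found in a snapshot string (hypothetical helper)."""
    # Look for one or more digits directly followed by the word "visits"
    match = re.search(r"(\d+)\s+visits", snapshot_info)
    if match is None:
        raise ValueError(f"no visit count found in: {snapshot_info!r}")
    return int(match.group(1))

# Snapshot string copied from row 1 of eniac_a
sample = "created 2021-09-14 • 14 days 0 hours 34 mins • 25326 visits, 23174 clicks"
print(parse_visits(sample))
```

This version keeps working if another field is inserted before the visit count, and it fails loudly when the count is missing.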

##### Now I want to see how many clicks the "SHOP NOW" button received

In [10]:
shop_button_clics_a = eniac_a.loc[eniac_a['Name'] == 'SHOP NOW', 'No. clicks'].values[0]

##### Let's see the conversion rate (button clicks divided by visits)

In [11]:
eniac_a_conversion = shop_button_clics_a / number_of_visits_a
eniac_a_conversion

##### Chi-square click values: we consider only clicks on the button under test. 'Click' is the number of clicks on that button, and 'No-click' is calculated as visits minus clicks.

In [23]:
no_clic_a = number_of_visits_a - shop_button_clics_a
no_clic_a

np.int64(24814)

### Eniac B (SHOP NOW RED BUTTON)

In [14]:
eniac_b

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,236,True,Homepage Version B - red SHOP NOW • https://eniac.com/index-b.php
1,25,div,mySidebar,304,True,"created 2021-10-27 • 14 days 0 hours 34 mins • 24747 visits, 22592 clicks"
2,4,a,Mac,268,True,
3,69,a,iPhone,260,True,
4,105,a,Accessories,1214,True,
5,36,a,Chargers & Cables,1259,False,
6,99,a,iPhone Accessories,1237,False,
7,68,a,Watch Accessories,1221,False,
8,13,a,Mac Accessories,1210,False,
9,15,a,AirTag,195,False,


In [15]:
number_of_visits_b = int(eniac_b.loc[1, 'Snapshot information'].split('•')[2].split('visits')[0].strip())
shop_button_clics_b = eniac_b.loc[eniac_b['Name'] == 'SHOP NOW', 'No. clicks'].values[0]
eniac_b_conversion = shop_button_clics_b / number_of_visits_b
eniac_b_conversion

np.float64(0.01135491170646947)

##### Chi-square clicks value

In [24]:
no_clic_b = number_of_visits_b - shop_button_clics_b
no_clic_b

np.int64(24466)

### Eniac C (SEE DEALS BUTTON)

In [16]:
eniac_c

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,288,True,Homepage Version C - white SEE DEALS • https://eniac.com/index-c.php
1,25,div,mySidebar,283,True,"created 2021-10-27 • 14 days 0 hours 34 mins • 24876 visits, 23031 clicks"
2,4,a,Mac,262,True,
3,69,a,iPhone,234,True,
4,105,a,Accessories,1288,True,
5,36,a,Chargers & Cables,1224,False,
6,99,a,iPhone Accessories,1175,False,
7,68,a,Watch Accessories,1264,False,
8,13,a,Mac Accessories,1203,False,
9,15,a,AirTag,202,False,


In [17]:
number_of_visits_c = int(eniac_c.loc[1, 'Snapshot information'].split('•')[2].split('visits')[0].strip())
shop_button_clics_c = eniac_c.loc[eniac_c['Name'] == 'SEE DEALS', 'No. clicks'].values[0]
eniac_c_conversion = shop_button_clics_c / number_of_visits_c
eniac_c_conversion

np.float64(0.0211850779868146)

##### Chi-square clicks value 

In [25]:
no_clic_c = number_of_visits_c - shop_button_clics_c
no_clic_c

np.int64(24349)

### Eniac D (SEE DEALS RED BUTTON)

In [18]:
eniac_d

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,48,h1,ENIAC,285,True,Homepage Version D - red SEE DEALS • https://eniac.com/index-d.php
1,25,div,mySidebar,305,True,"created 2021-10-27 • 14 days 0 hours 34 mins • 25233 visits, 23062 clicks"
2,4,a,Mac,274,True,
3,69,a,iPhone,243,True,
4,105,a,Accessories,1267,True,
5,36,a,Chargers & Cables,1260,False,
6,99,a,iPhone Accessories,1296,False,
7,68,a,Watch Accessories,1252,False,
8,13,a,Mac Accessories,1273,False,
9,15,a,AirTag,201,False,


In [19]:
number_of_visits_d = int(eniac_d.loc[1, 'Snapshot information'].split('•')[2].split('visits')[0].strip())
shop_button_clics_d = eniac_d.loc[eniac_d['Name'] == 'SEE DEALS', 'No. clicks'].values[0]

eniac_d_conversion = shop_button_clics_d / number_of_visits_d
eniac_d_conversion

np.float64(0.007648713985653708)

##### Chi-square clicks value

In [26]:
no_clic_d = number_of_visits_d - shop_button_clics_d
no_clic_d

np.int64(25040)
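The four version sections repeat the same three steps: read the visit count, read the button clicks, and derive the rate and the no-click count. A minimal sketch of a shared helper is given here, assuming the column names seen in the exports. The name `summarize_version` and the trimmed frame `toy_a` are hypothetical; the 512 clicks for "SHOP NOW" are taken from the Click/No-click table used later for the chi-square test.

```python
import re
import pandas as pd

def summarize_version(df: pd.DataFrame, button_name: str) -> dict:
    """Collect visits, button clicks, non-clicks and click rate for one version."""
    # Search all snapshot strings for the visit count
    snapshot_text = " ".join(df["Snapshot information"].dropna().astype(str))
    match = re.search(r"(\d+)\s+visits", snapshot_text)
    if match is None:
        raise ValueError("no visit count found in 'Snapshot information'")
    visits = int(match.group(1))
    # Clicks on the button under test (first matching row)
    clicks = int(df.loc[df["Name"] == button_name, "No. clicks"].iloc[0])
    return {"visits": visits, "clicks": clicks,
            "no_clicks": visits - clicks, "ctr": clicks / visits}

# Trimmed stand-in for eniac_a: only the columns the helper needs
toy_a = pd.DataFrame({
    "Name": ["ENIAC", "mySidebar", "SHOP NOW"],
    "No. clicks": [269, 309, 512],
    "Snapshot information": [
        "Homepage Version A - white SHOP NOW • https://eniac.com/index-a.php",
        "created 2021-09-14 • 14 days 0 hours 34 mins • 25326 visits, 23174 clicks",
        None,
    ],
})
summary_a = summarize_version(toy_a, "SHOP NOW")
print(summary_a)
```

With the real data, the same call could be made once per version, passing `'SEE DEALS'` as the button name for versions C and D.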

## Comparing rates

In [21]:
conversion_rate_compare = {
    'rates': [eniac_a_conversion, eniac_b_conversion, eniac_c_conversion, eniac_d_conversion],
    'names': ['Version_A', 'Version_B', 'Version_C', 'Version_D']
}
conversions_compare = pd.DataFrame(conversion_rate_compare)

### Sorting values to see which versions have the best conversion rates

In [22]:
conversions_compare.sort_values(by='rates', ascending=False)

Unnamed: 0,rates,names
2,0.021185,Version_C
0,0.020216,Version_A
1,0.011355,Version_B
3,0.007649,Version_D
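To make the size of the gaps easier to read, the rates can also be expressed relative to one version. The sketch below rebuilds the rates from the click and visit counts reported in this notebook and computes the relative lift against Version A. Treating Version A as the baseline is an assumption made for illustration only.

```python
import pandas as pd

# Button clicks and visits per version, as reported in this notebook
counts = pd.DataFrame(
    {"clicks": [512, 281, 527, 193],
     "visits": [25326, 24747, 24876, 25233]},
    index=["Version_A", "Version_B", "Version_C", "Version_D"],
)
counts["ctr"] = counts["clicks"] / counts["visits"]

# Relative lift against Version A (assumed baseline)
baseline = counts.loc["Version_A", "ctr"]
counts["lift_vs_A"] = counts["ctr"] / baseline - 1

ranked = counts.sort_values("ctr", ascending=False)
print(ranked.round(4))
```

A positive lift means the version was clicked more often per visit than Version A, and a negative lift means less often.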


#### Conclusion

##### The red variants are the worst performers (B: 1.14%, D: 0.76%), while the white variants perform much better (C: 2.12%, A: 2.02%).

![image.png](attachment:image.png)

### Statistical Analysis

#### The hypotheses we established in Step1_introduction
* Null Hypothesis (H0): There is no difference in the click-through rate (CTR) of the "SHOP NOW" button between the current version and any of the new variants (A, B, C, and D)
* Alternative Hypothesis (H1): There is a difference in the click-through rate (CTR) of the "SHOP NOW" button between the current version and at least one of the new variants (A, B, C, and D).

#### I want to perform a chi-square test, so I'll prepare a DataFrame from the counts computed in the previous steps

In [27]:
chi_value_data = {
    'Version_A': [shop_button_clics_a, no_clic_a],
    'Version_B': [shop_button_clics_b, no_clic_b],
    'Version_C': [shop_button_clics_c, no_clic_c],
    'Version_D': [shop_button_clics_d, no_clic_d]
}
indexes = ['Click', 'No-click']
chi_value_df = pd.DataFrame(chi_value_data, index=indexes)
chi_value_df

Unnamed: 0,Version_A,Version_B,Version_C,Version_D
Click,512,281,527,193
No-click,24814,24466,24349,25040


##### Now I'll perform the test

In [48]:
chi2, pvalue, dof, expected = stats.chi2_contingency(chi_value_df)

In [66]:
chi2 = round(chi2, 4)
expected = pd.DataFrame(expected, index=['Click', 'No-click'], columns=chi_value_df.columns).round(2)

# Display the results
print(f"Chi2 statistic: {chi2}")
print(f"p-value: {pvalue}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected)

Chi2 statistic: 224.0188
p-value: 2.716121660786871e-48
Degrees of freedom: 3
Expected frequencies:
          Version_A  Version_B  Version_C  Version_D
Click        382.49     373.74     375.69     381.08
No-click   24943.51   24373.26   24500.31   24851.92
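The expected frequencies come from the margins of the table: each expected cell is its row total times its column total, divided by the grand total. A minimal sketch, using the Click/No-click counts from the table above, recomputes the expected frequencies and the Pearson statistic by hand and compares them with the result from `stats.chi2_contingency`. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Rows: Click, No-click. Columns: Version A, B, C, D.
observed = np.array([
    [512, 281, 527, 193],
    [24814, 24466, 24349, 25040],
])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 4)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# Cross-check against scipy
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)
print(np.allclose(expected_manual, expected_scipy), np.isclose(chi2_manual, chi2_scipy))
```

For tables larger than 2x2, scipy applies no continuity correction, so the manual statistic and the scipy statistic should agree.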


##### In the introduction we set the confidence level to 95% (significance level α = 0.05), so we check whether the p-value is smaller than 0.05

In [50]:
if pvalue < 0.05:
    print("We reject the null hypothesis.")
else:
    print("We fail to reject the null hypothesis.")

We reject the null hypothesis.


#### Conclusion:
* The test only tells us that at least one version performed differently from the others. We can be fairly confident that the best version (Version C) outperformed the worst one (Version D), but we cannot yet say whether the differences between the two white-button versions, or between the two red-button versions, are significant.
#### *A post-hoc test needs to be performed*

### Post-Hoc Test (I expect that only Version A vs Version C will show no significant difference, but I want to check it)

###### For post hoc tests following a Chi-Square, we use what is referred to as the Bonferroni Adjustment. […] this adjustment is used to counteract the problem of Type I Error that occurs when multiple comparisons are made. Following a Chi-Square test that includes an explanatory variable with 3 or more groups, we need to subset to each possible paired comparison. When interpreting these paired comparisons, rather than setting the α-level (p-value) at 0.05, we divide 0.05 by the number of paired comparisons that we will be making. The result is our new α-level (p-value).

#### Preparing data for the test

In [54]:
observed = chi_value_df.copy()
observed

Unnamed: 0,Version_A,Version_B,Version_C,Version_D
Click,512,281,527,193
No-click,24814,24466,24349,25040


##### We'll need to perform 6 comparisons
* A vs B
* A vs C
* A vs D
* B vs C
* B vs D
* C vs D

##### New alpha = 0.05/6 = 0.0083

In [76]:
alpha = 0.05/6
alpha

0.008333333333333333
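The divisor 6 is the number of unordered pairs that can be formed from 4 versions. A small sketch of that calculation is shown here so the threshold can be recomputed if the number of versions changes. The function name `bonferroni_alpha` is hypothetical.

```python
from math import comb

def bonferroni_alpha(family_alpha: float, n_groups: int) -> float:
    """Per-comparison alpha when every pair of groups is compared once."""
    n_comparisons = comb(n_groups, 2)  # number of unordered pairs
    return family_alpha / n_comparisons

n_pairs = comb(4, 2)
adjusted_alpha = bonferroni_alpha(0.05, 4)
print(n_pairs, adjusted_alpha)
```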

##### I decided to run the comparisons manually.

##### Comparing A vs B

In [78]:
ab_df = observed.drop(columns=['Version_C', 'Version_D'])
ab_chi2, ab_pvalue, ab_dof, ab_expected = stats.chi2_contingency(ab_df)
if ab_pvalue < alpha:
    print("The difference between Version A and Version B is statistically significant.")
else:
    print("The difference between Version A and Version B is not statistically significant.")


The difference between Version A and Version B is statistically significant.


##### Comparing A vs C

In [80]:
ac_df = observed.drop(columns=['Version_B', 'Version_D'])
ac_chi2, ac_pvalue, ac_dof, ac_expected = stats.chi2_contingency(ac_df)
if ac_pvalue < alpha:
    print("The difference between Version A and Version C is statistically significant.")
else:
    print("The difference between Version A and Version C is not statistically significant. IMPORTANT!")

The difference between Version A and Version C is not statistically significant. IMPORTANT!


##### Comparing A vs D

In [82]:
ad_df = observed.drop(columns=['Version_B', 'Version_C'])
ad_chi2, ad_pvalue, ad_dof, ad_expected = stats.chi2_contingency(ad_df)
if ad_pvalue < alpha:
    print("The difference between Version A and Version D is statistically significant.")
else:
    print("The difference between Version A and Version D is not statistically significant. IMPORTANT!")

The difference between Version A and Version D is statistically significant.


##### Comparing B vs C

In [83]:
bc_df = observed.drop(columns=['Version_A', 'Version_D'])
bc_chi2, bc_pvalue, bc_dof, bc_expected = stats.chi2_contingency(bc_df)
if bc_pvalue < alpha:
    print("The difference between Version B and Version C is statistically significant.")
else:
    print("The difference between Version B and Version C is not statistically significant. IMPORTANT!")

The difference between Version B and Version C is statistically significant.


##### Comparing B vs D

In [84]:
bd_df = observed.drop(columns=['Version_A', 'Version_C'])
bd_chi2, bd_pvalue, bd_dof, bd_expected = stats.chi2_contingency(bd_df)
if bd_pvalue < alpha:
    print("The difference between Version B and Version D is statistically significant.")
else:
    print("The difference between Version B and Version D is not statistically significant. IMPORTANT!")

The difference between Version B and Version D is statistically significant.


##### Comparing C vs D

In [85]:
cd_df = observed.drop(columns=['Version_A', 'Version_B'])
cd_chi2, cd_pvalue, cd_dof, cd_expected = stats.chi2_contingency(cd_df)
if cd_pvalue < alpha:
    print("The difference between Version C and Version D is statistically significant.")
else:
    print("The difference between Version C and Version D is not statistically significant. IMPORTANT!")

The difference between Version C and Version D is statistically significant.
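The six manual comparisons above could also be written as a single loop over all version pairs. The sketch below is one possible way to do that, assuming the same Click/No-click counts and the same Bonferroni-adjusted threshold. The `posthoc` table and its column names are illustrative, not part of the original notebook.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

# Click / No-click counts per version, as in the contingency table
observed = pd.DataFrame(
    {"Version_A": [512, 24814], "Version_B": [281, 24466],
     "Version_C": [527, 24349], "Version_D": [193, 25040]},
    index=["Click", "No-click"],
)

pairs = list(combinations(observed.columns, 2))
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted threshold

rows = []
for first, second in pairs:
    # 2x2 sub-table for this pair; scipy applies Yates' correction by default here
    chi2, pvalue, dof, _ = stats.chi2_contingency(observed[[first, second]])
    rows.append({"pair": f"{first} vs {second}", "chi2": chi2,
                 "p_value": pvalue, "significant": pvalue < alpha})

posthoc = pd.DataFrame(rows)
print(posthoc)
```

Keeping the results in one table makes it easier to compare the p-values side by side instead of reading six separate printed messages.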


#### As I expected, there is no significant difference between Version A and Version C after applying the Bonferroni correction.

### Additional data received from the Eniac team
![image.png](attachment:image.png)

### Summary

#### Version A ("SHOP NOW"): this version has a high drop-off rate (~10%) and the highest homepage-return rate (~5%). This suggests that users click "SHOP NOW" but do not find what they expect, so they frequently return to the homepage.

#### Version C ("SEE DEALS"): this version has an even higher drop-off rate (~12%). This indicates that users engage with the CTA but may leave the site soon after, possibly because they do not find relevant content or offers after clicking.

#### Further investigation is needed to establish with high confidence which version is better. One option is to track additional data, for example:
* Time on Page: The average time users spend on the page after clicking the CTA
* User Navigation Path: To see how users move through the site
* Conversion Rate: The percentage of users who complete a desired action (e.g., purchase) after clicking the CTA.
* Bounce Rate: The percentage of single-page sessions (i.e., sessions in which the person left the site from the landing page without interacting with the page).
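
As a rough illustration of how two of the metrics listed above could be computed once such data is tracked, here is a minimal sketch on a session log. The `sessions` frame, its column names, and every value in it are invented for illustration and do not come from the Eniac data.

```python
import pandas as pd

# Hypothetical session log: one row per session, all values invented
sessions = pd.DataFrame({
    "version":      ["A", "A", "A", "C", "C", "C", "C"],
    "pages_viewed": [1, 3, 5, 1, 1, 4, 2],
    "clicked_cta":  [False, True, True, False, False, True, True],
    "purchased":    [False, False, True, False, False, True, False],
})

# Bounce rate: share of sessions that viewed only the landing page
bounce_rate = (
    sessions.assign(bounced=sessions["pages_viewed"].eq(1))
    .groupby("version")["bounced"].mean()
)

# Conversion rate after a CTA click: share of clicking sessions that purchased
clicked = sessions[sessions["clicked_cta"]]
post_click_conversion = clicked.groupby("version")["purchased"].mean()

metrics = pd.DataFrame({
    "bounce_rate": bounce_rate,
    "post_click_conversion": post_click_conversion,
})
print(metrics)
```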