## Problem Statement

An e-commerce company is evaluating two different website designs to see which one results in higher customer engagement. Design A is the current design, while Design B incorporates new features aimed at improving user experience. The company hypothesizes that Design B will lead to a higher average time spent on the website by users.

**Datasets:**
- current_design.csv: Contains data for user interactions with the current website design (Design A), with columns user_id and time_spent_minutes.
- new_design.csv: Contains data for user interactions with the new website design (Design B), with columns user_id and time_spent_minutes.

**Objective:**
- To determine whether Design B results in a higher average time spent on the website compared to Design A.

**Steps to perform:**
- Set the null and alternative hypotheses for this analysis.
- Load the datasets current_design.csv and new_design.csv.
- Calculate the mean and standard deviation of the time spent for both designs.
- Determine the sizes of both groups.
- Calculate the z-score to compare the means of both groups.
- Set the significance level (alpha) at 5% for a right-tailed test.
- Calculate the critical z-value for the right-tailed test at the 5% significance level.
- Compare the calculated z-score with the critical z-value to decide whether to reject the null hypothesis.
- Write down your observations at the end.
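The steps above can be collected into a single function. This is a minimal sketch, assuming each group arrives as a plain array of times in minutes and that both samples are large enough for the normal approximation (100 users per group here). The function name `right_tailed_two_sample_z_test` and its returned dictionary are illustrative, not part of the original notebook.

```python
import numpy as np
from scipy import stats


def right_tailed_two_sample_z_test(a, b, alpha=0.05):
    """Test H0: mean(b) <= mean(a) against H1: mean(b) > mean(a).

    Uses the unpooled standard error sqrt(s_a^2/n_a + s_b^2/n_b).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Group sizes, means and sample standard deviations (ddof=1, like pandas .std())
    n_a, n_b = a.size, b.size
    mean_a, mean_b = a.mean(), b.mean()
    s_a, s_b = a.std(ddof=1), b.std(ddof=1)

    # z-statistic for the difference in means
    se = np.sqrt(s_a**2 / n_a + s_b**2 / n_b)
    z = (mean_b - mean_a) / se

    # Right-tailed critical value and p-value (sf = 1 - cdf, computed directly)
    z_crit = stats.norm.ppf(1 - alpha)
    p_value = stats.norm.sf(z)

    return {
        "z": float(z),
        "z_critical": float(z_crit),
        "p_value": float(p_value),
        "reject_h0": bool(z > z_crit),
    }


# Usage with small made-up arrays (not the notebook's data)
result = right_tailed_two_sample_z_test([5.0, 6.0, 7.0], [7.0, 8.0, 9.0])
print(result)
```

With the notebook's DataFrames, the call would be `right_tailed_two_sample_z_test(current.time_spent_minutes, new.time_spent_minutes)`.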

**Import Necessary Libraries**

In [4]:
import pandas as pd 
import numpy as np
from scipy import stats

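**Define the hypotheses**

Let $\mu_A$ and $\mu_B$ be the mean time spent on the website (in minutes) under Design A and Design B.

- Null hypothesis ($H_0$): Design B does not result in a higher average time spent on the website than Design A, i.e. $\mu_B \le \mu_A$.
- Alternative hypothesis ($H_1$): Design B results in a higher average time spent on the website than Design A, i.e. $\mu_B > \mu_A$.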

**1: Load the datasets**

In [7]:
current = pd.read_csv("current_design.csv")
current.tail()

| Unnamed: 0 | user_id | time_spent_minutes |
|---|---|---|
| 95 | C096 | 6.41 |
| 96 | C097 | 6.93 |
| 97 | C098 | 6.77 |
| 98 | C099 | 5.1 |
| 99 | C100 | 5.02 |


In [9]:
new = pd.read_csv("new_design.csv")
new.head()

| Unnamed: 0 | user_id | time_spent_minutes |
|---|---|---|
| 0 | T001 | 7.49 |
| 1 | T002 | 7.37 |
| 2 | T003 | 7.32 |
| 3 | T004 | 6.85 |
| 4 | T005 | 7.1 |

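The `Unnamed: 0` column most likely means the CSV files were saved together with their row index (for example with `to_csv` without `index=False`). It does not affect the analysis, which only uses `time_spent_minutes`, but passing `index_col=0` to `pd.read_csv` turns that column into the index instead. A minimal sketch on a tiny made-up CSV string, assuming the real files have the same layout:

```python
import io

import pandas as pd

# A tiny CSV written *with* its index: the first header field is empty,
# which pandas reports as "Unnamed: 0" when reading it back.
csv_text = ",user_id,time_spent_minutes\n0,T001,7.49\n1,T002,7.37\n"

# Default: the saved index is read back as an ordinary data column.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())  # ['Unnamed: 0', 'user_id', 'time_spent_minutes']

# index_col=0 uses that first column as the DataFrame index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())  # ['user_id', 'time_spent_minutes']
```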

**2: Calculate the mean and standard deviation of the time spent for both designs.**

In [22]:
# Control group (Design A): mean, sample standard deviation (pandas .std() uses ddof=1) and size
current_sample_mean = current.time_spent_minutes.mean()
current_std = current.time_spent_minutes.std()
current_shape = current.shape[0]  # number of users (rows)

current_sample_mean, current_std, current_shape


(np.float64(6.015199999999998), np.float64(0.6182550877553322), 100)

In [23]:
# Treatment group (Design B): mean, sample standard deviation (ddof=1) and size
new_sample_mean = new.time_spent_minutes.mean()
new_std = new.time_spent_minutes.std()
new_shape = new.shape[0]  # number of users (rows)

new_sample_mean, new_std, new_shape


(np.float64(8.062599999999998), np.float64(0.9025257711981236), 100)
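The standard deviations above come from pandas, whose `Series.std()` defaults to the sample standard deviation (`ddof=1`), whereas `np.std` defaults to the population version (`ddof=0`). The two differ by a factor of $\sqrt{(n-1)/n}$, which is about 0.995 for $n = 100$, so the choice barely moves the z-score here. A small sketch on made-up numbers showing the defaults:

```python
import numpy as np
import pandas as pd

values = [5.0, 6.0, 7.0]  # made-up numbers, not from the datasets

pandas_std = pd.Series(values).std()        # ddof=1 by default (sample std)
numpy_std = np.std(values)                  # ddof=0 by default (population std)
numpy_sample_std = np.std(values, ddof=1)   # ddof=1 matches pandas

print(pandas_std, numpy_std, numpy_sample_std)
```

Here `pandas_std` and `numpy_sample_std` are both 1.0, while `numpy_std` is about 0.8165.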

In [32]:
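# Variance of each sample mean: s^2 / n (unpooled, each group keeps its own variance)
var_term_current = current_std**2 / current_shape
var_term_new = new_std**2 / new_shape
# Two-sample z-statistic: difference in means divided by its standard error
z_score = (new_sample_mean - current_sample_mean) / np.sqrt(var_term_current + var_term_new)
z_score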

np.float64(18.715151117476786)
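Written out, the z-score cell computes the two-sample z-statistic with an unpooled standard error, where $\bar{x}$, $s$ and $n$ are each group's sample mean, sample standard deviation and size (A = current design, B = new design). Substituting the rounded statistics printed above reproduces the result:

$$
z = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  = \frac{8.0626 - 6.0152}{\sqrt{\dfrac{0.6183^2}{100} + \dfrac{0.9025^2}{100}}}
  \approx \frac{2.0474}{\sqrt{0.003822 + 0.008146}}
  \approx \frac{2.0474}{0.1094}
  \approx 18.72
$$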

**3: Test using the rejection region (critical z-value)**

In [33]:
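# Significance level for the right-tailed test
Alpha = 0.05
# Critical value: 5% of the standard normal distribution lies to its right
z_critical = stats.norm.ppf(1 - Alpha)
z_critical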

np.float64(1.6448536269514722)
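For a right-tailed test, the critical value is the point with area $\alpha$ to its right. `stats.norm.ppf(1 - alpha)` finds it through the inverse CDF, and `stats.norm.isf(alpha)` (the inverse survival function) gives the same number directly. A short sketch checking that equivalence:

```python
from scipy import stats

alpha = 0.05

# Two equivalent ways to get the right-tailed critical value
z_crit_ppf = stats.norm.ppf(1 - alpha)  # inverse CDF at 0.95
z_crit_isf = stats.norm.isf(alpha)      # inverse survival function at 0.05

# By construction, the area to the right of the critical value is alpha
tail_area = stats.norm.sf(z_crit_isf)

print(z_crit_ppf, z_crit_isf, tail_area)
```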

In [34]:
z_score > z_critical

np.True_

### Observations and Conclusion



The calculated z-score (about 18.72) is far greater than the critical value (about 1.645), so it falls in the rejection region and we reject the null hypothesis at the 5% significance level.

**4: Test using the p-value**

In [35]:
# Right-tailed p-value: P(Z >= z_score) under the null hypothesis
p_value = 1 - stats.norm.cdf(z_score)
p_value

np.float64(0.0)
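The printed p-value of `0.0` is not exactly zero. For a z-score near 18.7, `stats.norm.cdf` returns a number so close to 1 that it rounds to exactly `1.0` in double precision, and the subtraction then gives `0.0`. The survival function `stats.norm.sf` computes the upper tail directly and keeps the tiny positive value. A sketch using the z-score reported above:

```python
from scipy import stats

z = 18.715151117476786  # z-score reported above

p_naive = 1 - stats.norm.cdf(z)  # cdf(z) rounds to 1.0, so this is 0.0
p_sf = stats.norm.sf(z)          # upper-tail probability computed directly

print(p_naive, p_sf)
```

`p_sf` comes out on the order of 1e-78; either way it is far below 0.05, so the conclusion does not change.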

In [36]:
p_value < Alpha 

np.True_

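The p-value is less than Alpha, so we reject the null hypothesis. Both the rejection-region approach and the p-value approach lead to the same conclusion: the data support the claim that Design B results in a higher average time spent on the website than Design A.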
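As a cross-check, Welch's two-sample t-test uses exactly the same statistic as the unpooled z-score; only its p-value comes from a t distribution instead of the standard normal, which makes little difference with 100 users per group. The sketch below runs it on synthetic stand-in data (not the notebook's datasets) and assumes SciPy 1.6 or later, where `ttest_ind` accepts `alternative`. With the notebook's DataFrames the call would be `stats.ttest_ind(new.time_spent_minutes, current.time_spent_minutes, equal_var=False, alternative="greater")`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two groups (NOT the notebook's data); the
# parameters loosely mimic the reported means and standard deviations.
design_a = rng.normal(loc=6.0, scale=0.6, size=100)
design_b = rng.normal(loc=8.0, scale=0.9, size=100)

# Hand-computed z-statistic with the unpooled standard error
se = np.sqrt(design_a.var(ddof=1) / design_a.size + design_b.var(ddof=1) / design_b.size)
z_manual = (design_b.mean() - design_a.mean()) / se

# Welch's t-test: same statistic; alternative="greater" tests mean(design_b) > mean(design_a)
welch = stats.ttest_ind(design_b, design_a, equal_var=False, alternative="greater")

print(z_manual, welch.statistic, welch.pvalue)
```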