A/B testing, also known as split testing, is a method used to compare two versions of a web page, app feature, email, or other marketing asset to determine which one performs better. It involves dividing your audience into two groups:

- <b>Group A (Control Group)</b>: This group sees the original version.
- <b>Group B (Treatment Group)</b>: This group sees the modified version.
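How the audience is divided matters: assignment should be random, so that the two groups differ only in the version they see. The following is a minimal sketch of such a split using only the standard library. The function name `assign_groups`, the user IDs, and the fixed seed are hypothetical and chosen purely for illustration.

```python
import random

def assign_groups(user_ids, seed=42):
    """Randomly split user IDs into a control group (A) and a treatment group (B)."""
    rng = random.Random(seed)      # local generator so the split is reproducible
    shuffled = list(user_ids)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical user IDs, for illustration only
users = [f"user_{i}" for i in range(10)]
group_a, group_b = assign_groups(users)
print("Group A (control):  ", group_a)
print("Group B (treatment):", group_b)
```

Every user lands in exactly one group, and rerunning with the same seed reproduces the same split.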

# Two sample mean testing (unpaired) -1

A custom exhaust manufacturer wants to compare fuel economy after motorcycles are upgraded from stock to custom exhausts. The tests were run on two batches, BS4 and BS6, because the custom exhaust is compatible with both. Each batch contains a range of bike models, since the change has to be compared across all model variants of the older BS4 and the newer BS6 motorcycles. To check whether the custom exhaust affects the fuel economy of BS4 and BS6 motorcycles differently, you decide to conduct a hypothesis test.

- Dataset to use: `unpaired-exhaust.csv`

### Importing the necessary libraries

In [1]:
import pandas as pd
import scipy.stats as ss

In [3]:
## Load the dataset
twosampup_df = pd.read_csv("unpaired-exhaust.csv")

In [4]:
## Check the data for BS4
twosampup_df.head()

Unnamed: 0,Bike ID,Fuel Economy,Batch
0,A1,43,BS4
1,A2,52,BS4
2,A3,58,BS4
3,A4,44,BS4
4,A5,48,BS4


In [5]:
twosampup_df.shape

(48, 3)

In [6]:
## Check the data for BS6
twosampup_df.tail()

Unnamed: 0,Bike ID,Fuel Economy,Batch
43,B20,45,BS6
44,B21,54,BS6
45,B22,48,BS6
46,B23,62,BS6
47,B24,40,BS6


### Subset the fuel economy of each batch into separate series

In [7]:
bs4 = twosampup_df[twosampup_df['Batch']=="BS4"]['Fuel Economy']
bs4.mean()

52.916666666666664

In [8]:
bs6 = twosampup_df[twosampup_df['Batch']=="BS6"]['Fuel Economy']
bs6.mean()

51.375

- H0: mean(bs4) = mean(bs6)
- H1: mean(bs4) != mean(bs6)

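There's a slight difference in fuel economy: with the custom exhaust, BS6 motorcycles average about 51.4, slightly lower than the 52.9 average of BS4 motorcycles. Let's evaluate whether this difference is statistically significant.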

In [9]:
ss.ttest_ind(bs4,bs6, alternative = "two-sided")

Ttest_indResult(statistic=0.7087914510636509, pvalue=0.482030754068267)

**Inferences**
- As you can see, the p-value (~0.48) is much higher than 0.05, and therefore the results are ***not*** statistically significant at the 5% level.
- Hence, you cannot reject the null hypothesis: there is NOT ENOUGH evidence to suggest a difference in mean fuel economy between BS4 and BS6 motorcycles. This is not the same as proving that the two means are equal.
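To make the test statistic less of a black box, here is a minimal sketch of the pooled (equal-variance) two-sample t statistic, which is what `ss.ttest_ind` computes with its default `equal_var=True`. It uses only the five BS4 rows and five BS6 rows visible in the `head()` and `tail()` previews above, so its result will not match the full 48-row output. The function name `pooled_t_statistic` is hypothetical.

```python
import math
import statistics

def pooled_t_statistic(x, y):
    """Student's two-sample t statistic, assuming equal population variances."""
    n1, n2 = len(x), len(y)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / std_err

# Only the rows shown by head() and tail() above, not the full 48-row dataset
bs4_preview = [43, 52, 58, 44, 48]
bs6_preview = [45, 54, 48, 62, 40]

t_stat = pooled_t_statistic(bs4_preview, bs6_preview)
dof = len(bs4_preview) + len(bs6_preview) - 2
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The p-value is then the two-tailed probability of a value at least this extreme under a t distribution with `n1 + n2 - 2` degrees of freedom, which is the part SciPy handles for you.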

# Two sample mean testing (unpaired) -2

A bolt and nut manufacturer wants to find out whether two production lines produce M5 nuts of the same weight. To check whether there is any difference in the weights of the M5 nuts manufactured on the two lines, you decide to conduct a hypothesis test.

- Dataset to use: `unpaired-nuts.csv`

In [11]:
## Load the dataset
twosampup_df = pd.read_csv("unpaired-nuts.csv")

In [12]:
twosampup_df.head()

Unnamed: 0,Serial No,weight in grams,Production line
0,A1,2.18,A
1,A2,2.17,A
2,A3,2.14,A
3,A4,2.18,A
4,A5,2.15,A


In [13]:
twosampup_df.tail()

Unnamed: 0,Serial No,weight in grams,Production line
43,B20,2.11,B
44,B21,2.32,B
45,B22,2.22,B
46,B23,2.35,B
47,B24,2.29,B


### Subset the weights of each production line into separate series

In [14]:
line_A = twosampup_df[twosampup_df['Production line']=="A"]['weight in grams']
line_A.mean()

2.1808333333333336

In [15]:
line_B = twosampup_df[twosampup_df['Production line']=="B"]['weight in grams']
line_B.mean()

2.276666666666667

There's a slight difference in the mean weight of the nuts: Line A nuts weigh less on average (about 2.18 g) than Line B nuts (about 2.28 g).

- H0: mean(A) = mean(B)
- H1: mean(A) != mean(B)

### Pass these series to the unpaired t-test function

In [16]:
ss.ttest_ind(line_A,line_B, alternative = "two-sided")

Ttest_indResult(statistic=-3.2050530416037435, pvalue=0.0024556820555403586)

**Inferences**
- As you can see, the p-value (~0.0025) is well below 0.05, and therefore the results are ***statistically significant*** at the 5% level.
- Hence, you can say that there's ENOUGH evidence to suggest a difference in mean weight between the nuts manufactured on Line A and Line B.
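The default `ss.ttest_ind` call assumes both lines have the same variance. In the preview rows above, Line B looks more spread out than Line A, so Welch's variant (available in SciPy as `ss.ttest_ind(line_A, line_B, equal_var=False)`) is a reasonable cross-check. The sketch below computes Welch's statistic by hand on the ten preview rows only. Whether the variances really differ across the full dataset is not checked here, and the function name `welch_t` is hypothetical.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    n1, n2 = len(x), len(y)
    # Squared standard error contributed by each sample
    se1 = statistics.variance(x) / n1
    se2 = statistics.variance(y) / n2
    t_stat = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, dof

# Only the rows shown by head() and tail() above, not the full dataset
line_a_preview = [2.18, 2.17, 2.14, 2.18, 2.15]
line_b_preview = [2.11, 2.32, 2.22, 2.35, 2.29]

print("Sample variance, Line A:", statistics.variance(line_a_preview))
print("Sample variance, Line B:", statistics.variance(line_b_preview))
t_stat, dof = welch_t(line_a_preview, line_b_preview)
print(f"Welch t = {t_stat:.3f}, approximate dof = {dof:.2f}")
```

Welch's test gives up a little power when the variances really are equal, but it does not rely on that assumption, which is why it is often used as the safer default.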

# Two sample mean testing (paired) -1

A basketball coach wants to know whether a certain training program increases the max vertical jump (in inches) of college basketball players. To test this, a sample of 25 college basketball players is recruited and each player's max vertical jump is measured. Each player then follows the training program for one month, and their max vertical jump is measured again at the end of the month. To check whether there is any difference in the players' jump heights after the training program, you decide to conduct a hypothesis test.

- Dataset to use: `paired_jump.csv`

In [17]:
##Let's load the dataset
twosampp_df = pd.read_csv("paired_jump.csv")

In [18]:
twosampp_df.head()

Unnamed: 0,Player,Jump before training,Jump after training
0,Player 1,22,24
1,Player 2,20,22
2,Player 3,19,19
3,Player 4,24,22
4,Player 5,25,28


In [19]:
### Average Jump height before training
twosampp_df['Jump before training'].mean()

22.72

In [20]:
### Average Jump height after training
twosampp_df['Jump after training'].mean()

23.8

The null hypothesis is that, on average, jump height is the same before and after the training program. Our task is to determine whether there is enough statistical evidence to suggest otherwise.

- H0: Mean(Before Training)=Mean(After Training)
- H1: Mean(Before Training)!=Mean(After Training)

Here again, we see a slight difference in jump height: the average after training is ~1 inch higher than the average before training. Let's evaluate whether this difference is statistically significant.

In [21]:
ss.ttest_rel(twosampp_df['Jump before training'],twosampp_df['Jump after training'], alternative = "two-sided")

TtestResult(statistic=-3.2602767700386956, pvalue=0.0033180017066275855, df=24)

**Inferences**
- As you can see, the p-value (~0.0033) is well below 0.05, and therefore the results are ***statistically significant*** at the 5% level.
- Hence, you can say that there's ENOUGH evidence to suggest a difference in mean jump height before and after the training program.
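A paired t-test is a one-sample t-test on the per-player differences, which is why the result above reports `df=24` for 25 players. The sketch below shows that calculation on the five players visible in the `head()` preview only, so the number will differ from the full-dataset result. The function name `paired_t_statistic` is hypothetical.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired t statistic: a one-sample t test on the per-subject differences."""
    diffs = [b - a for b, a in zip(before, after)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err

# Only the five players shown by head() above, not the full 25-player dataset
before_preview = [22, 20, 19, 24, 25]
after_preview = [24, 22, 19, 22, 28]

t_stat = paired_t_statistic(before_preview, after_preview)
print(f"t = {t_stat:.3f} with {len(before_preview) - 1} degrees of freedom")
```

The statistic is negative because the differences are taken as before minus after, matching the argument order passed to `ss.ttest_rel`. If the question is strictly whether the program increases jump height, a one-sided test is a possible alternative; with the arguments in that order it would be `alternative="less"`.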

# Two sample mean testing (paired) -2

Suppose chemical engineers wish to compare the fuel economy obtained by two different formulations of gasoline. Fuel economy varies widely from car to car, so if the mean fuel economy of two independent samples of vehicles run on the two types of fuel were compared, the large vehicle-to-vehicle variability might make any difference due to the fuel difficult to detect, even if one formulation were better than the other. A paired design, in which the two formulations are compared within each car model, reduces this problem. To check whether the gasoline formulation makes any difference to a car's fuel economy, you decide to conduct a hypothesis test.

- Dataset to use: `paired_car.csv`

In [23]:
##Let's load the dataset
twosampp_df = pd.read_csv("paired_car.csv")

In [24]:
twosampp_df.head()

Unnamed: 0,Car model,Car 1,Car 2
0,Volvo XC 60,17.0,17.0
1,Dodge Viper,13.2,12.9
2,Honda CR-Z,35.3,35.4
3,Hummer H3,13.6,13.2
4,Lexus RX,32.7,32.5


In [25]:
twosampp_df['Car 1'].mean()

21.62222222222222

In [26]:
twosampp_df['Car 2'].mean()

21.477777777777778

In [27]:
ss.ttest_rel(twosampp_df['Car 1'],twosampp_df['Car 2'], alternative = "two-sided")

TtestResult(statistic=2.5999999999999988, pvalue=0.03161781156452984, df=8)

**Inferences**
- As you can see, the p-value (~0.032) is less than 0.05, and therefore the results are ***statistically significant*** at the 5% level.
- Hence, you can say that there's ENOUGH evidence to suggest a difference in fuel economy between the two gasoline formulations.
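The motivation for pairing given at the start of this section can be checked numerically. The sketch below runs both a paired and an unpaired (pooled) t statistic on the five car models visible in the `head()` preview only. It is an illustration under that restriction, not a re-analysis of the full dataset, and the function names `paired_t` and `unpaired_t` are hypothetical.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic computed from the per-car differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def unpaired_t(x, y):
    """Pooled two-sample t statistic that ignores the pairing."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / std_err

# Only the five car models shown by head() above, not the full dataset
car_1_preview = [17.0, 13.2, 35.3, 13.6, 32.7]
car_2_preview = [17.0, 12.9, 35.4, 13.2, 32.5]

print(f"paired t   = {paired_t(car_1_preview, car_2_preview):.3f}")
print(f"unpaired t = {unpaired_t(car_1_preview, car_2_preview):.3f}")
```

On these rows the unpaired statistic is close to zero, because the spread between car models swamps the small difference between the two columns. The paired statistic is much larger in magnitude, since differencing within each car model removes that spread.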