In [2]:
import numpy as np
import pandas as pd

import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy.stats import ttest_ind, kstest

# Q1. Dosage and Gender

A researcher is interested in determining the effects of different dosages of a dietary supplement on the performance of both males and females on a physical endurance test.

The three dosages of the supplement are low, medium, and high, and the genders are male and female.

Data: [Dataset](./dosages.csv)

Sample data:

![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/059/314/original/Screenshot_2023-12-13_at_5.54.59_PM.png?1702470320)

Conduct an appropriate hypothesis test to determine the interaction effects of the test at a 1% significance level.

In [4]:
df = pd.read_csv('./dosages.csv')
df.head()

Unnamed: 0,Dietary,Supplement_Dosage,Test_values
0,Female,Low,35.6
1,Female,Medium,49.4
2,Female,High,55.2
3,Male,Low,92.2
4,Male,Medium,45.4


In [6]:
# Two Way Anova

# fit an ols model on the data frame
# use 'fit()' to fit the linear model
# ols('dependent variable ~ C(independent variable1) * C(independent variable2)', data=df).fit()
# Note: in this dataset the 'Dietary' column holds the gender labels (Female/Male)
test = ols('Test_values ~ C(Dietary) * C(Supplement_Dosage)', data=df).fit()

# create a table for a 2-way ANOVA test
# Pass the linear model 'test'
# 'typ=2' requests Type II sums of squares; the two-way design comes from the formula above
anova_table = sm.stats.anova_lm(test, typ=2)

# Display the results
# Significance level is 0.01
print(anova_table)

                                      sum_sq   df         F    PR(>F)
C(Dietary)                        532.000833  1.0  1.075214  0.339742
C(Supplement_Dosage)              130.811667  2.0  0.132190  0.878657
C(Dietary):C(Supplement_Dosage)  2869.201667  2.0  2.899438  0.131502
Residual                         2968.715000  6.0       NaN       NaN
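
One way to turn the printed table into a decision is to read the interaction row's p-value and compare it with the 1% level. The snippet below is a minimal sketch: it rebuilds only the `PR(>F)` column from the output above so that it runs on its own. In the notebook, the `anova_table` returned by `anova_lm` could be indexed the same way.

```python
import pandas as pd

# p-values copied from the ANOVA table printed above
anova_table = pd.DataFrame(
    {"PR(>F)": [0.339742, 0.878657, 0.131502, float("nan")]},
    index=[
        "C(Dietary)",
        "C(Supplement_Dosage)",
        "C(Dietary):C(Supplement_Dosage)",
        "Residual",
    ],
)

alpha = 0.01  # significance level stated in the question

# The interaction term is the row whose label joins both factors with ':'
p_interaction = anova_table.loc["C(Dietary):C(Supplement_Dosage)", "PR(>F)"]
interaction_significant = p_interaction < alpha

if interaction_significant:
    print(f"p = {p_interaction:.6f} < {alpha}: reject H0, the interaction is significant")
else:
    print(f"p = {p_interaction:.6f} >= {alpha}: fail to reject H0, no significant interaction")
```

With the values shown, the interaction p-value is above 0.01, so the null hypothesis of no interaction is not rejected.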


# Q2. Fertilizer and Watering Frequency
A researcher wants to investigate the effects of two different fertilizers (‘A’ & ‘B’) and three watering frequencies (‘Low’, ‘Medium’, ‘High’) on the growth of tomato plants.

Is there a significant interaction between the fertilizer type and watering frequency on plant growth?

Dataset: [Data](./Fertilizer.csv)

Sample data:

![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/059/317/original/Screenshot_2023-12-13_at_6.05.51_PM.png?1702470963)

Conduct an appropriate hypothesis test to determine only the interaction effects of this research at a 5% significance level.


In [8]:
path = './Fertilizer.csv'
df = pd.read_csv(path)
df.head()

Unnamed: 0,Fertilizer,Watering_Frequency,Plant_Height
0,A,Low,15.2
1,A,Medium,20.7
2,A,High,24.3
3,B,Low,18.4
4,B,Medium,23.1


In [9]:
# Two Way Anova

# fit an ols model on the data frame
# use 'fit()' to fit the linear model
# ols('dependent variable ~ C(independent variable1) * C(independent variable2)', data=df).fit()
test = ols('Plant_Height ~ C(Fertilizer) * C(Watering_Frequency)', data=df).fit()

# create a table for a 2-way ANOVA test
# Pass the linear model 'test'
# 'typ=2' requests Type II sums of squares; the two-way design comes from the formula above
anova_table = sm.stats.anova_lm(test, typ=2)

# Display the results
# Significance level is 0.05
print(anova_table)

                                         sum_sq    df         F    PR(>F)
C(Fertilizer)                         84.067222   1.0  1.509060  0.242831
C(Watering_Frequency)                100.634444   2.0  0.903226  0.431117
C(Fertilizer):C(Watering_Frequency)   50.034444   2.0  0.449075  0.648519
Residual                             668.500000  12.0       NaN       NaN
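
The interaction F value in this table can be checked by hand from the printed sums of squares and degrees of freedom. Each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the interaction mean square to the residual mean square:

$$
F_{\text{interaction}}
= \frac{SS_{\text{interaction}} / df_{\text{interaction}}}{SS_{\text{residual}} / df_{\text{residual}}}
= \frac{50.034444 / 2}{668.5 / 12}
\approx \frac{25.0172}{55.7083}
\approx 0.449
$$

This agrees with the F value of 0.449075 in the table. Its p-value of 0.648519 is above 0.05, so the table gives no evidence of an interaction between fertilizer type and watering frequency.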


# Q3. Education vs Test

The Committee head of a national entrance exam wants to analyze if there are any differences in learning outcomes between students with different educational backgrounds (high school or college) and teaching methods (traditional or interactive) on test scores.

Data: [Dataset](./Teaching_Method.csv)

Sample data:

![](https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/059/323/original/Screenshot_2023-12-13_at_6.21.09_PM.png?1702471882)

Conduct an appropriate hypothesis test to determine the main effects & interaction effects of the test at a 5% significance level.

In [11]:
df = pd.read_csv('./Teaching_Method.csv')
df.head()

Unnamed: 0,Education,Teaching_Method,Test_Score
0,High School,Traditional,72
1,High School,Interactive,85
2,College,Traditional,70
3,College,Interactive,92
4,High School,Traditional,74


In [12]:
# Two Way Anova

# fit an ols model on the data frame
# use 'fit()' to fit the linear model
# ols('dependent variable ~ C(independent variable1) * C(independent variable2)', data=df).fit()
test = ols('Test_Score ~ C(Education) * C(Teaching_Method)', data=df).fit()

# create a table for a 2-way ANOVA test
# Pass the linear model 'test'
# 'typ=2' requests Type II sums of squares; the two-way design comes from the formula above
anova_table = sm.stats.anova_lm(test, typ=2)

# Display the results
# Significance level is 0.05
print(anova_table)

                                 sum_sq    df         F    PR(>F)
C(Education)                       6.25   1.0  0.081477  0.780172
C(Teaching_Method)               552.25   1.0  7.199348  0.019920
C(Education):C(Teaching_Method)    4.00   1.0  0.052146  0.823216
Residual                         920.50  12.0       NaN       NaN
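
Because this question asks about both main effects and the interaction, each non-residual row needs its own decision. The sketch below copies the three p-values from the output above into a pandas Series (the name `p_values` is introduced here only for illustration) and compares each one with the 5% level.

```python
import pandas as pd

alpha = 0.05  # significance level stated in the question

# p-values copied from the ANOVA table printed above (Residual row omitted)
p_values = pd.Series(
    {
        "C(Education)": 0.780172,
        "C(Teaching_Method)": 0.019920,
        "C(Education):C(Teaching_Method)": 0.823216,
    }
)

# True means the effect is significant at the chosen level
decisions = {effect: bool(p < alpha) for effect, p in p_values.items()}

for effect, significant in decisions.items():
    verdict = "significant" if significant else "not significant"
    print(f"{effect}: p = {p_values[effect]:.6f} -> {verdict}")
```

Only the teaching method falls below 0.05. Education level and the interaction between the two factors do not.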


# Q4. Sales Strategy

A data analyst is comparing the sales amounts (in dollars) for two different marketing strategies (A and B). The sales data for 20 days under each strategy is collected.

```python
sales_strategy_A = [156, 153, 157, 154, 156, 159, 152, 156, 157, 154, 153, 157, 157,152, 155, 154, 151, 157, 155, 151]

sales_strategy_B = [135, 147, 126, 136, 158, 139, 163, 141, 156, 142, 130, 129, 161, 158, 117, 151, 121, 135, 123, 153]
```

Perform an appropriate test to assess if there is a significant difference in the sales distributions between Strategy A and Strategy B. Use a significance level of 0.05.

In [17]:
sales_strategy_A = [156, 153, 157, 154, 156, 159, 152, 156, 157, 154, 153, 157, 157,152, 155, 154, 151, 157, 155, 151]
sales_strategy_B = [135, 147, 126, 136, 158, 139, 163, 141, 156, 142, 130, 129, 161, 158, 117, 151, 121, 135, 123, 153]

# Perform a two-sample KS test
# (passing a second sample to kstest runs the two-sample version of the test)
ks_stat, p_value = kstest(sales_strategy_A, sales_strategy_B)
print(f'KS statistic: {ks_stat}, P-Value: {p_value}')

# Alpha value is 0.05
if p_value < 0.05:
    print('Reject the null hypothesis: Data is not from the same distribution')
else:
    print('Fail to reject the null hypothesis: No evidence that the distributions differ')


KS statistic: 0.65, P-Value: 0.0002704973445409677
Reject the null hypothesis: Data is not from the same distribution
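
To see where the statistic of 0.65 comes from, the two empirical CDFs can be compared directly: the two-sample KS statistic is the largest vertical gap between them. The following is a small NumPy sketch of that calculation, not how SciPy implements it internally. The names `ecdf_a`, `ecdf_b`, and `ks_stat_manual` are introduced here for illustration.

```python
import numpy as np

sales_strategy_A = [156, 153, 157, 154, 156, 159, 152, 156, 157, 154,
                    153, 157, 157, 152, 155, 154, 151, 157, 155, 151]
sales_strategy_B = [135, 147, 126, 136, 158, 139, 163, 141, 156, 142,
                    130, 129, 161, 158, 117, 151, 121, 135, 123, 153]

a = np.sort(sales_strategy_A)
b = np.sort(sales_strategy_B)

# Evaluate both empirical CDFs at every observed value
grid = np.concatenate([a, b])
ecdf_a = np.searchsorted(a, grid, side="right") / len(a)  # share of A values <= x
ecdf_b = np.searchsorted(b, grid, side="right") / len(b)  # share of B values <= x

# The KS statistic is the largest absolute gap between the two curves
gaps = np.abs(ecdf_a - ecdf_b)
ks_stat_manual = gaps.max()
x_at_max = grid[gaps.argmax()]

print(f"Largest ECDF gap: {ks_stat_manual:.2f} at x = {x_at_max}")
```

The gap peaks just below the smallest Strategy A sale: 13 of the 20 Strategy B values are already at or below 147, while no Strategy A value is, giving 13/20 = 0.65.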


# Q5. Mobile App Response

A researcher is investigating the distribution of response times for two different versions of a mobile app, i.e., the time (in seconds) the app takes to respond to a user action.


The goal is to determine if the response time distributions significantly differ between the two versions.

Data for 20 users for each app version is collected.
```python
response_times_version_A = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.0, 1.5, 1.2, 1.3, 1.2, 1.4, 1.1, 1.3, 1.2, 1.5, 1.3, 1.4, 1.2, 1.3]

response_times_version_B = [1.6, 1.2, 1.3, 1.4, 1.1, 1.3, 1.2, 1.5, 1.3, 1.4, 1.2, 1.3, 1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.3, 1.4]
```
Choose the appropriate test for the given scenario.

### Answer: ✅KS Test 
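
The question only asks for the choice of test, but running it takes a few lines. The sketch below applies `scipy.stats.kstest` to the two samples given above. The 0.05 significance level is an assumption made here for illustration, since the question does not state one.

```python
from scipy.stats import kstest

response_times_version_A = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.0, 1.5, 1.2, 1.3,
                            1.2, 1.4, 1.1, 1.3, 1.2, 1.5, 1.3, 1.4, 1.2, 1.3]
response_times_version_B = [1.6, 1.2, 1.3, 1.4, 1.1, 1.3, 1.2, 1.5, 1.3, 1.4,
                            1.2, 1.3, 1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.3, 1.4]

alpha = 0.05  # assumed level; not given in the question

# Passing a second sample makes kstest run the two-sample test
ks_stat, p_value = kstest(response_times_version_A, response_times_version_B)
print(f"KS statistic: {ks_stat}, P-Value: {p_value}")

if p_value < alpha:
    decision = "Reject the null hypothesis: the distributions differ"
else:
    decision = "Fail to reject the null hypothesis: no evidence that the distributions differ"
print(decision)
```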


# Q6. Delivery Method

An online shopping platform is testing two different delivery methods to improve the delivery times for their customers.

The data below represents the delivery times (in hours) for a sample of orders using Method A and Method B.
```python
delivery_method_A = [2.5, 3.2, 2.8, 3.5, 3.0, 2.7, 2.9, 3.1, 2.6, 3.3]

delivery_method_B = [3.8, 3.2, 3.5, 3.1, 3.9, 3.0, 3.3, 3.6, 3.4, 3.7]
```
Using an appropriate test, determine if there is a significant difference in the delivery time distributions between Method A and Method B. Use a significance level of 0.05.

In [19]:
delivery_method_A = [2.5, 3.2, 2.8, 3.5, 3.0, 2.7, 2.9, 3.1, 2.6, 3.3]
delivery_method_B = [3.8, 3.2, 3.5, 3.1, 3.9, 3.0, 3.3, 3.6, 3.4, 3.7]

# Perform a two-sample KS test
# (passing a second sample to kstest runs the two-sample version of the test)
ks_stat, p_value = kstest(delivery_method_A, delivery_method_B)
print(f'KS statistic: {ks_stat}, P-Value: {p_value}')

# Alpha value is 0.05
if p_value < 0.05:
    print('Reject the null hypothesis: Data is not from the same distribution')
else:
    print('Fail to reject the null hypothesis: No evidence that the distributions differ')


KS statistic: 0.5, P-Value: 0.16782134274394334
Fail to reject the null hypothesis: No evidence that the distributions differ


# Q7. Banking app

A bank is launching two different approaches (A and B) to encourage customers to adopt its new mobile banking app. The bank randomly assigns a group of customers to each approach and monitors their adoption rates over a month.

Data:
```text
Group A (Approach A): [38, 40, 42, 37, 39, 41, 36, 35, 43, 38]  
Group B (Approach B): [48, 45, 46, 43, 50, 44, 49, 47, 42, 46] 
```
Objective:

Assess whether the new incentive program in Approach B leads to a statistically significant improvement in the adoption rates compared to Approach A.

Choose the suitable statistical test.

In [22]:
groupA = [38, 40, 42, 37, 39, 41, 36, 35, 43, 38]  
groupB = [48, 45, 46, 43, 50, 44, 49, 47, 42, 46] 

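# Perform an independent two-sample t-test (A/B test)
# alternative='less' tests H1: mean(groupA) < mean(groupB),
# i.e. Approach B has a higher adoption rate than Approach A
t_stats, p_value = ttest_ind(groupA, groupB, alternative='less')
print(f'T-Stats: {t_stats}, P-Value: {p_value}')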


T-Stats: -6.125851335983492, P-Value: 4.359142475666563e-06
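
The t statistic above can be reproduced by hand, which makes the test's assumptions explicit. By default `ttest_ind` pools the two sample variances (`equal_var=True`), so the sketch below does the same and then takes the left-tail probability of a t distribution with `n_a + n_b - 2` degrees of freedom. The variable names are introduced here for illustration.

```python
import numpy as np
from scipy.stats import t

groupA = np.array([38, 40, 42, 37, 39, 41, 36, 35, 43, 38])
groupB = np.array([48, 45, 46, 43, 50, 44, 49, 47, 42, 46])

n_a, n_b = len(groupA), len(groupB)
dof = n_a + n_b - 2

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_a - 1) * groupA.var(ddof=1) + (n_b - 1) * groupB.var(ddof=1)) / dof

# Standard error of the difference in means
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (groupA.mean() - groupB.mean()) / std_err

# One-sided p-value for H1: mean(groupA) < mean(groupB)
p_manual = t.cdf(t_manual, df=dof)

print(f"t = {t_manual:.6f}, one-sided p = {p_manual:.3e}")
```

The p-value is far below common significance levels such as 0.05 or 0.01, so the data support the claim that Approach B yields higher adoption rates than Approach A.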


# Q8. Suitable scenario for A/B testing

Select the scenarios for which A/B testing can be effectively used.

A)

A coffee shop is considering two different promotional strategies to boost sales during the morning hours. 
Strategy A involves offering discounts on coffee, while Strategy B involves introducing a new breakfast menu. 
The marketing team wants to determine which strategy leads to a higher increase in sales.

B)

A university is conducting a survey to assess the effectiveness of two teaching methods (Method A and Method B) in improving student performance in mathematics. 
The goal is to determine if one teaching method is more successful than the other in helping students achieve higher scores.

C)

A nutritionist is conducting a study to compare the average weight loss among three different diet plans. 
She randomly assigns participants into three groups, each following a different diet plan (Low-Carb, Mediterranean, and Vegan).
After eight weeks, she records the weight loss for each participant.

D)

A manufacturing company is interested in studying the effects of two factors (Temperature and Humidity) on the strength of a certain type of material.
They conduct an experiment where they expose samples of the material to different combinations of temperature (High and Low) and humidity (High and Low) levels.

# Correct Answer: A, B

Explanation

- A) A coffee shop considering two different promotional strategies (Strategy A and Strategy B) to boost sales during the morning hours is a scenario suitable for A/B testing. In A/B testing, the coffee shop can randomly assign customers to either Strategy A or Strategy B and measure the impact on sales. This helps in determining which strategy leads to a higher increase in sales.

- B) A university conducting a survey to assess the effectiveness of two teaching methods (Method A and Method B) in improving student performance in mathematics is also suitable for A/B testing. The university can randomly assign students to either Method A or Method B, assess their performance, and determine if one teaching method is more successful than the other.

- C) The scenario with the nutritionist comparing average weight loss among three different diet plans involves more than two groups (Low-Carb, Mediterranean, and Vegan). While it is a suitable scenario for ANOVA (Analysis of Variance), A/B testing is typically designed for comparing two variations.

- D) The manufacturing company studying the effects of two factors (Temperature and Humidity) on material strength tests four combinations of temperature and humidity rather than two variations. Like option C, this scenario is better suited to a two-way ANOVA or factorial experiment.
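
The random assignment described for options A and B can be sketched in a few lines. The customer IDs, group sizes, and seed below are hypothetical and chosen only for illustration. The point is that a shuffle followed by a split puts each customer in exactly one of the two groups.

```python
import numpy as np

# Hypothetical pool of 20 customer IDs (not from any dataset in this notebook)
customer_ids = np.arange(1, 21)

# Fixed seed so the illustration is reproducible
rng = np.random.default_rng(seed=42)

# Shuffle the IDs, then split them in half: first half -> A, second half -> B
shuffled = rng.permutation(customer_ids)
group_a, group_b = shuffled[:10], shuffled[10:]

print("Strategy A:", sorted(group_a.tolist()))
print("Strategy B:", sorted(group_b.tolist()))
```

Once an outcome such as sales is recorded for each group, a two-sample test like the one in Q7 can compare them.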

### **In A/B testing, the focus is on comparing two variations to determine which one performs better in achieving a specific goal or outcome.**