In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.stats import ttest_rel, f_oneway, norm

# Q1. IQ test

Sun Pharmaceutical Industries claims that a person's IQ improves after they use the Donepezil drug.

To test this claim, a trial was conducted with 20 patients. Each patient took an IQ test before being given the drug and again after taking it. The recorded results are shown below.
```python
IQ_before=[101,124,89,57,135,98,69,105,114,106,97,121,93,116,102,71,88,108,144,99]

IQ_after=[113,127,89,70,127,104,69,127,115,99,104,120,95,129,106,71,94,112,154,96]
```
Perform an appropriate test to test the claim at 90% confidence.

In [2]:
iq_before=[101,124,89,57,135,98,69,105,114,106,97,121,93,116,102,71,88,108,144,99]

iq_after=[113,127,89,70,127,104,69,127,115,99,104,120,95,129,106,71,94,112,154,96]
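# Paired (dependent) t-test: the same 20 patients are measured twice.
# alternative='less' tests H1: mean(iq_before - iq_after) < 0, i.e. IQ improved.
statistic, pvalue = ttest_rel(iq_before, iq_after, alternative='less')
print('statistic:',statistic)
print('pvalue:',pvalue)
# 90% confidence corresponds to a significance level of 0.10
if pvalue < 0.1:
    print('reject null hypothesis')
else:
    print('fail to reject null hypothesis')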

statistic: -2.5849213105659876
pvalue: 0.009079862169696327
reject null hypothesis
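
As a cross-check, the paired t statistic can be reproduced from the per-patient differences. The sketch below uses only the standard library. The critical value of 1.328 (one-tailed, 19 degrees of freedom, α = 0.10) is taken from a standard t table rather than computed, so treat it as an assumption of this example.

```python
import math
import statistics

iq_before = [101,124,89,57,135,98,69,105,114,106,97,121,93,116,102,71,88,108,144,99]
iq_after = [113,127,89,70,127,104,69,127,115,99,104,120,95,129,106,71,94,112,154,96]

# Per-patient differences, in the same order ttest_rel uses (before - after).
diffs = [b - a for b, a in zip(iq_before, iq_after)]
n = len(diffs)

mean_diff = statistics.mean(diffs)   # -4.2
sd_diff = statistics.stdev(diffs)    # sample standard deviation (n - 1 denominator)
t_stat = mean_diff / (sd_diff / math.sqrt(n))

# One-tailed critical value for df = 19 at alpha = 0.10, read from a t table.
t_crit = -1.328

print("t statistic:", t_stat)
print("reject H0:", t_stat < t_crit)
```

The statistic agrees with the `ttest_rel` output above, and it falls below the critical value, which is the same decision the p-value gives.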


# Q2. Advertising Campaigns

In a randomized controlled trial, you are comparing the effectiveness of two different advertising campaigns (A and B) in increasing sales. You collect data on sales from two groups: one exposed to campaign A and the other to campaign B. Which of the following statistical tests is most appropriate for comparing the mean sales between the two groups?


- Chi-squared test
- Pearson correlation
- Paired samples t-test
- Independent samples t-test

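### ✅ **Independent samples t-test**

The two groups consist of different subjects, each exposed to only one campaign, so their mean sales are compared with an independent samples t-test. A paired test would require the same subjects to be measured under both campaigns, a chi-squared test applies to categorical counts, and Pearson correlation measures association rather than a difference in means.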

# Q3. New Flagship Product
A software company is planning to release a new version of its flagship product. The intent is that the new version will be better than the previous version. Formulate the null and alternative hypotheses for this scenario.

Based on the options provided, the most suitable pair of null and alternative hypotheses for the scenario would be:

Null Hypothesis (H0): The new version will have the same number of reported bugs as the previous version. 

Alternative Hypothesis (H1): The new version will reduce the number of reported bugs.
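
In symbols, this pair can be written as a one-sided (left-tailed) test. The notation is an assumption introduced here: $\mu_{\text{new}}$ and $\mu_{\text{old}}$ denote the mean number of reported bugs for the new and previous versions.

$$
H_0:\ \mu_{\text{new}} = \mu_{\text{old}}
\qquad\qquad
H_1:\ \mu_{\text{new}} < \mu_{\text{old}}
$$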

# Q4. Four Machines
Suppose there are four machines, m1, m2, m3, and m4, in a factory that are used to produce a certain kind of cotton fabric.

Samples of size 4, with each unit being 100 sq. meters, are selected randomly from the output of each machine, and the number of flaws in every 100 sq. meters is counted and listed below.
```python
m1 = [8, 9, 11, 12]
m2 = [6, 8, 10, 4]
m3 = [14, 12, 18, 9]
m4 = [20, 22, 25, 23]
```
Do you think there is a significant difference in the performance of the four machines?

Check whether there is a significant difference (consider a 5% significance level)

In [12]:
m1 = [8, 9, 11, 12]
m2 = [6, 8, 10, 4]
m3 = [14, 12, 18, 9]
m4 = [20, 22, 25, 23]
stats, p_value = f_oneway(m1, m2, m3, m4)
print('stats:',stats)
print('p_value:',p_value)
# Checking the p-value
if p_value < 0.05:
    print("There is a significant difference in the performance of the four machines.")
else:
    print("There is no significant difference in the performance of the four machines.")

stats: 25.221574344023324
p_value: 1.8124793267561276e-05
There is a significant difference in the performance of the four machines.
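
To see where the F statistic comes from, the one-way ANOVA can be worked out by hand from the between-group and within-group sums of squares. This is a minimal standard-library sketch of that arithmetic, not a replacement for `f_oneway`; the variable names are chosen here for illustration.

```python
import statistics

groups = {
    "m1": [8, 9, 11, 12],
    "m2": [6, 8, 10, 4],
    "m3": [14, 12, 18, 9],
    "m4": [20, 22, 25, 23],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Variation of the group means around the grand mean (540.6875 here).
ss_between = sum(
    len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
)
# Variation of the observations around their own group mean (85.75 here).
ss_within = sum(
    (x - statistics.mean(g)) ** 2 for g in groups.values() for x in g
)

ms_between = ss_between / (k - 1)        # df = 3
ms_within = ss_within / (n_total - k)    # df = 12
f_stat = ms_between / ms_within

print("F statistic:", f_stat)
```

The result matches the `stats` value printed by `f_oneway` above.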


# Q5. Highway mileage

A popular car manufacturing brand claims that its car model Rex500 has an average highway mileage of 21.50 Km/L. You want to test whether the data support this claim.

You managed to get data from 45 cars of this model and found that the average highway mileage is 20.42 Km/L, with a standard deviation of 2.7 Km/L.

With 99% confidence, will you be able to conclude that the average highway mileage is statistically lower than the claimed fuel economy?

Use the appropriate test and select the correct option below:

In [36]:
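claimed_mean = 21.50   # mileage claimed by the manufacturer (Km/L)
n = 45                 # number of cars sampled
sample_mean = 20.42
sample_std = 2.7

# Left-tailed z-test: H0: mean = 21.50, H1: mean < 21.50
z_score = (sample_mean - claimed_mean) / (sample_std / np.sqrt(n))
print('z_score:',z_score)

# P(Z <= z_score) under the standard normal distribution
p_value = norm.cdf(z_score)
print('p_value:',p_value)

if p_value < 0.01:
    print("Reject null hypothesis: the average mileage is lower than claimed")
else:
    print("Fail to reject null hypothesis: no evidence that the average mileage is lower than claimed")

z_score: -2.6832815729997432
p_value: 0.003645179045767866
Reject null hypothesis: the average mileage is lower than claimed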
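
The same decision can be reached by comparing the z statistic with the critical value for a left-tailed test at α = 0.01. The sketch below uses the standard library's `statistics.NormalDist`. Because the standard deviation is estimated from the sample, a one-sample t-test with 44 degrees of freedom would be the more exact choice; with 45 observations the normal approximation used here is close.

```python
import math
from statistics import NormalDist

claimed_mean = 21.50
n = 45
sample_mean = 20.42
sample_std = 2.7
alpha = 0.01

# Standardized distance of the sample mean from the claimed mean.
z = (sample_mean - claimed_mean) / (sample_std / math.sqrt(n))

# Left-tailed critical value: reject H0 when z falls below it (about -2.326).
z_crit = NormalDist().inv_cdf(alpha)

# Left-tail probability, equivalent to norm.cdf(z) in SciPy.
p_value = NormalDist().cdf(z)

print("z:", z)
print("critical value:", z_crit)
print("reject H0:", z < z_crit)
```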


# Q6. p-value

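Setting the p-level (significance level α) at 0.01, rather than at a larger value such as 0.05, increases the chances of making a:

- All of the above
- Type I error
- Type II error
- Type III error

### ✅ **Type II error**

Lowering α makes the null hypothesis harder to reject. That reduces Type I errors but, for a fixed sample size, raises the chance of missing a real effect.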
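
The trade-off between α and the Type II error rate β can be illustrated numerically. The sketch below computes β for a one-sided (upper-tailed) z-test with known σ. The effect size of 0.5σ and the sample size of 20 are hypothetical values chosen only for illustration, and `type_ii_error` is a helper defined here, not a library function.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def type_ii_error(alpha, effect_size, n):
    """Beta for a one-sided (upper-tailed) z-test with known sigma.

    effect_size is the true mean shift expressed in units of sigma.
    Under H1 the test statistic is N(effect_size * sqrt(n), 1), and
    H0 is retained whenever the statistic stays below the critical value.
    """
    z_crit = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(z_crit - effect_size * math.sqrt(n))

# Hypothetical scenario: true shift of 0.5 sigma, 20 observations.
effect_size, n = 0.5, 20
for alpha in (0.05, 0.01):
    beta = type_ii_error(alpha, effect_size, n)
    print(f"alpha={alpha}: beta={beta:.3f}")
```

Under these assumptions, β is larger at α = 0.01 than at α = 0.05.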

# Q7. New Feature Quarter
You have a dataset with a 'date' column in the format 'YYYY-MM-DD'. You want to create a new feature 'quarter' which indicates the quarter of the year (Q1, Q2, Q3, or Q4) corresponding to each date. Which of the following code snippets achieves this task correctly?
- `df['quarter'] = pd.to_datetime(df['date']).dt.month // 3 + 2`
- `df['quarter'] = pd.to_datetime(df['date']).dt.quarter`
- `df['quarter'] = pd.to_datetime(df['date']).apply(lambda x: 'Q'+str((x.month-1)//3+1).zfill(2))`
- `df['quarter'] = pd.to_datetime(df['date']).apply(lambda x: 'Q'+str((x.month-1)//3+2))`

In [19]:
df = pd.DataFrame({'date': ['2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01']})
df

| | date |
|---|---|
| 0 | 2022-01-01 |
| 1 | 2022-04-01 |
| 2 | 2022-07-01 |
| 3 | 2022-10-01 |


In [31]:
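# df['quarter'] = pd.to_datetime(df['date']).dt.month // 3 + 2 -> ❌ (January gives 2, December gives 6)
# (month - 1) // 3 + 1 maps months 1-12 to quarters 1-4; zfill(2) pads the number, so the labels read 'Q01'..'Q04'
df['quarter'] = pd.to_datetime(df['date']).apply(lambda x: 'Q'+str((x.month-1)//3+1).zfill(2))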
df

| | date | quarter |
|---|---|---|
| 0 | 2022-01-01 | Q01 |
| 1 | 2022-04-01 | Q02 |
| 2 | 2022-07-01 | Q03 |
| 3 | 2022-10-01 | Q04 |
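
The month-to-quarter arithmetic in the options can be checked for all twelve months without pandas. The sketch below compares `(month - 1) // 3 + 1` with `month // 3 + 2`; the function names are made up for this example. In pandas, `.dt.quarter` returns the same integers 1 to 4 as the first formula. Note also that the question asks for labels Q1 to Q4, whereas `zfill(2)` produces zero-padded labels such as `Q02`.

```python
def quarter_from_month(month):
    """Map a month number 1-12 to its quarter 1-4."""
    return (month - 1) // 3 + 1

def quarter_option_1(month):
    """Formula from the first option; shown only to expose its error."""
    return month // 3 + 2

for month in range(1, 13):
    print(month, quarter_from_month(month), quarter_option_1(month))

# Building a label for April with and without zero padding.
label = "Q" + str(quarter_from_month(4))
padded_label = "Q" + str(quarter_from_month(4)).zfill(2)
print(label, padded_label)
```

The second column runs 1, 1, 1, 2, ... 4, 4, 4, while the third starts at 2 for January and reaches 6 for December, which is why the `// 3 + 2` variants are rejected.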


# Q8. Error Rate
In hypothesis testing, what is the type I error rate if we set the significance level (α) at 0.01?


- 0.05
- 0.01
- 0.001
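- 0.10

### ✅ **0.01**

The significance level α is, by definition, the probability of rejecting a null hypothesis that is actually true, so the Type I error rate equals α.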
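
A small simulation can make the link between α and the Type I error rate concrete. The sketch below repeatedly draws samples for which the null hypothesis is true and counts how often a left-tailed z-test rejects at α = 0.01. The seed, sample size, and number of trials are arbitrary choices for this illustration, and σ is assumed known and equal to 1.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)  # arbitrary seed, only for repeatability

alpha = 0.01
n = 20            # observations per simulated study
trials = 20_000   # number of simulated studies

# Left-tailed critical value of the standard normal distribution.
z_crit = NormalDist().inv_cdf(alpha)

rejections = 0
for _ in range(trials):
    # H0 is true by construction: the data come from N(0, 1).
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * math.sqrt(n)  # (mean - 0) / (1 / sqrt(n))
    if z < z_crit:
        rejections += 1

type_i_rate = rejections / trials
print("observed Type I error rate:", type_i_rate)
```

The observed rejection rate should land close to 0.01, up to simulation noise.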

# Q9. Marathon Race time
In a sample of marathon race times, the mean time is 4 hours, and the standard deviation is 30 minutes. If you set the Z-score threshold to 2, which race times would be considered outliers?

In [16]:
mean = 4
std_dev = 0.5
z_score = 2
x = z_score*std_dev + mean
x 

5.0

In [18]:
x = mean - z_score*std_dev
x

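3.0

Race times shorter than 3 hours or longer than 5 hours lie more than 2 standard deviations from the mean and would be considered outliers.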
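
The same rule can be applied to individual race times by computing each z-score directly. In the sketch below the example times are hypothetical, `is_outlier` is a helper defined here, and a time exactly on the threshold is treated as not an outlier, which is an assumption about how the boundary is handled.

```python
mean_hours = 4.0
std_hours = 0.5
threshold = 2.0

def is_outlier(time_hours):
    """Flag a race time whose absolute z-score exceeds the threshold."""
    z = (time_hours - mean_hours) / std_hours
    return abs(z) > threshold

# Hypothetical race times in hours, chosen to sit on both sides of the cut-offs.
example_times = [2.8, 3.0, 4.2, 5.0, 5.3]
for t in example_times:
    print(t, is_outlier(t))
```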