##### Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio test. The function should return the F-value and the corresponding p-value for the test.

In [104]:
import numpy as np
from scipy import stats

## Function for variance ratio test
def test_variance(sample_1, sample_2):

    # Calculate variance
    var1 = np.var(sample_1, ddof=1)
    var2 = np.var(sample_2, ddof=1)

    # Calculate f value
    f_value = var1 / var2

    # Degree of Freedom
    df1 = len(sample_1) - 1
    df2 = len(sample_1) - 1

    # Calculate p value (Probability)
    p_value = stats.f.cdf(f_value, df1, df2)

    return(f_value, p_value)

In [105]:
## Sample usgae:

class_a_height = np.random.randint(110, 150, 20)
class_b_height = np.random.randint(110, 150, 20)

f_value, p_value = test_variance(class_a_height, class_b_height)
print(f"Ration of variance = {f_value}")
print(f"Probability of observing this ratio if population variance are similar is {p_value}")

Ration of variance = 1.2719361856417695
Probability of observing this ratio if population variance are similar is 0.6973460013711861


##### Q2. Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an F-distribution, write a Python function that returns the critical F-value for a two-tailed test.

In [106]:
def two_tailed_f_value(alpha, dfn, dfd):
    f_value = stats.f.ppf(q=(1 - alpha/2), dfn=dfn, dfd=dfd)
    return f_value

In [107]:
# Example Usage:
numerator_dof = 18
denom_dof = 27
significance = 0.05

critical_value = two_tailed_f_value(significance, numerator_dof, denom_dof)
print(f"Critical f-value for two tailed test at {significance} significance is {critical_value}")

Critical f-value for two tailed test at 0.05 significance is 2.291176179061679


##### Q3. Write a Python program that generates random samples from two normal distributions with known variances and uses an F-test to determine if the variances are equal. The program should output the F- value, degrees of freedom, and p-value for the test.

In [108]:
# Parameters
sample_size = 30
variance1 = 9  # Variance for the first distribution
variance2 = 16  # Variance for the second distribution
mean1 = 30  # Mean for the first distribution
mean2 = 34  # Mean for the second distribution

# Generate sample from the distribution
sample1 = np.random.normal(mean1, np.sqrt(variance1), sample_size)
sample2 = np.random.normal(mean2, np.sqrt(variance2), sample_size)

# Calculate f_value
f_value = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)

# Degree of Freedom
dfn = dfd = sample_size - 1
# Calculate P value
p_value = stats.f.cdf(f_value, dfn, dfd)

# Print results
print(f"f-value: {f_value}")
print(f"Degree of freedom: {dfn}, {dfd}")
print(f"p_value: {p_value}")

f-value: 0.2220926844219334
Degree of freedom: 29, 29
p_value: 5.916543421415453e-05


##### Q4.The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from each population. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

In [109]:
## Given
var1 = 10
var2 = 15
sample_size_1 = 12
sample_size_2 = 12
dfn = sample_size_1 - 1
dfd = sample_size_2 - 1
alpha = 0.05

# Calculate f_value
f_statistic = var1 / var2

# Calculate P value
p_value = stats.f.cdf(f_value, dfn, dfd)

# Calculate critical value
critical_f_value = stats.f.ppf(q=1-alpha, dfn = dfn, dfd = dfd)

# Print key results
print(f"F Statistic: {f_statistic}")
print(f"Critical F Value: {critical_f_value}")
print(f"P value: {p_value}")

if f_statistic > critical_f_value:
    print("Reject the null hypothesis. Variances are significantly different.")
else:
    print("Fail to reject the null hypothesis. Variances are not significantly different.")


F Statistic: 0.6666666666666666
Critical F Value: 2.8179304699530863
P value: 0.00966174106632722
Fail to reject the null hypothesis. Variances are not significantly different.


##### Q5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance level to determine if the claim is justified.

In [110]:
## Given
var1 = 0.005
var2 = 0.006
sample_size = 25
df2 = sample_size - 1
significance = 0.01

# Assume degree of freedom for population based on sample
df1 = sample_size - 1

# Calculating f statistic
f_statistic = var1 / var2

# Calculate critical f value for 0.01 significance
critical_f = stats.f.ppf(q = 1 - significance, dfn = df1, dfd = df2)

# Print key results
print(f"F Statistic: {f_statistic}")
print(f"Critical F Value: {critical_f}")

if f_statistic > critical_f:
    print("Reject claim. Population variance is not 0.005")
else:
    print("Fail to reject the claim. Population variance is 0.05")

F Statistic: 0.8333333333333334
Critical F Value: 2.659072104348157
Fail to reject the claim. Population variance is 0.05


##### Q6. Write a Python function that takes in the degrees of freedom for the numerator and denominator of an F-distribution and calculates the mean and variance of the distribution. The function should return the mean and variance as a tuple.

* **The F distribution has the following properties**:

    1. The mean of the distribution is equal to:
    $$ mean = \frac{df_2}{df_2 - 2}\ [for\ df_2 > 2] $$
    2. The variance is equal to:
    $$ variance = \frac{2*df_2^2*(df_1 + df_2 - 2)}{df_1*(df_2 - 2)^2*(df_2 - 4)}\ [for\ df_2 > 4]
    

In [111]:
# Function to approximate mean and variance from degree of freedom
def get_mean_and_var(dfn, dfd):
    #Check if degree of freedom of denominator is above 4 to calculate Variance

    if dfd <= 4:
        raise ValueError("Degrees of freedom for denominator (dfd) must be greater than 4.")
    
    mean = dfd / (dfd - 2)
    var = (2 * dfd**2 * (dfn + dfd - 2)) / (dfn * (dfd - 2)**2 * (dfd - 4))

    return (mean, var)

In [112]:
# Sample usage:
df_numerator = 19
df_denominator = 5

try:
    mean, var = get_mean_and_var(df_numerator, df_denominator)
except ValueError as e:
    print(e)
else:
    print(f"distribution mean : {mean}")
    print(f"distribution variance : {var}")

distribution mean : 1.6666666666666667
distribution variance : 6.432748538011696


##### Q7. A random sample of 10 measurements is taken from a normal population with unknown variance. The sample variance is found to be 25. Another random sample of 15 measurements is taken from another normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test at the 10% significance level to determine if the variances are significantly different.

In [113]:
## Hypothesis:
# H0 - Varience are not significantly different
# H1 - Variance are significantly different

## Given 
sample_size_1 = 10
sample_size_2 = 15
var_1 = 25
var_2 = 20
significance = 0.1

# Calculate f statistic
f_statistic = var1 / var2

# Calculate degree of freedom
df1 = sample_size_1 - 1
df2 = sample_size_2 - 1

# Calculate critical value
critical_f = stats.f.ppf(q= 1 - significance, dfn = df1, dfd = df2)

# print results
print(f"F Statistic: {f_statistic}")
print(f"Critical F Value: {critical_f}")

if f_statistic > critical_f:
    print("Reject the hypothesis - variance are significantly different")
else:
    print("Fail to reject the hypothesis - variance are not significantly different")


F Statistic: 0.8333333333333334
Critical F Value: 2.121954566976902
Fail to reject the hypothesis - variance are not significantly different


#### Q8. The following data represent the waiting times in minutes at two different restaurants on a Saturday night: Restaurant A: 24, 25, 28, 23, 22, 20, 27; Restaurant B: 31, 33, 35, 30, 32, 36. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

In [114]:
## Hypothesis:
# H0 - Varience are not significantly different
# H1 - Variance are significantly different

## Given 
wait_time_A = np.array([24, 25, 28, 23, 22, 20, 27])
wait_time_B = np.array([31, 33, 35, 30, 32, 36])
significance = 0.05

# Calculate variance
var_A = np.var(wait_time_A)
var_B = np.var(wait_time_B)

# Calculate f statistic
f_statistic = var_A / var_B

# Calculate degree of freedom
df_A = len(wait_time_A) - 1
df_B = len(wait_time_B) - 1

# calculate critical f value
critical_f = stats.f.ppf(q = 1 - significance, dfn = df_A, dfd = df_B)

# print results
print(f"F Statistic: {f_statistic}")
print(f"Critical F Value: {critical_f}")

if f_statistic > critical_f:
    print("Reject the hypothesis - variance are significantly different")
else:
    print("Fail to reject the hypothesis - variance are not significantly different")

F Statistic: 1.496767651159843
Critical F Value: 4.950288068694318
Fail to reject the hypothesis - variance are not significantly different


##### Q9. The following data represent the test scores of two groups of students: Group A: 80, 85, 90, 92, 87, 83; Group B: 75, 78, 82, 79, 81, 84. Conduct an F-test at the 1% significance level to determine if the variances are significantly different.

In [115]:
## Hypothesis
# H0 - Variance are not different
# H1 - Variance are significantly different

## Given
score_A = np.array([80, 85, 90, 92, 87, 83])
score_B = np.array([75, 78, 82, 79, 81, 84])
significance = 0.01

# Calculate variance
var_A = np.var(score_A)
var_B = np.var(score_B)

# Calculate f statistic
f_statistic = var_A / var_B

# Calculate degree of freedom
df_A = len(score_A) - 1
df_B = len(score_B) - 1

# calculate critical f value
critical_f = stats.f.ppf(q = 1 - significance, dfn = df_A, dfd = df_B)

# print results
print(f"F Statistic: {f_statistic}")
print(f"Critical F Value: {critical_f}")

if f_statistic > critical_f:
    print("Reject the hypothesis - variance are significantly different")
else:
    print("Fail to reject the hypothesis - variance are not significantly different")

F Statistic: 1.9442622950819677
Critical F Value: 10.967020650907992
Fail to reject the hypothesis - variance are not significantly different
