
# Q1. Write a Python function that takes in two arrays of data and calculates the F-value for a variance ratio test. The function should return the F-value and the corresponding p-value for the test.

To calculate the F-value for a variance ratio test (also known as an F-test), use the formula:

$$
F = \frac{s_1^2}{s_2^2}
$$

Where:

* $s_1^2$ is the sample variance of the first group.
* $s_2^2$ is the sample variance of the second group.

Under the null hypothesis of equal variances, $F$ follows an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. The p-value can be computed from that distribution, which SciPy provides as scipy.stats.f.


In [2]:
import numpy as np
from scipy.stats import f

def f_variance_test(data1, data2):
    # Calculate sample variances
    var1 = np.var(data1, ddof=1)  # Sample variance for dataset 1
    var2 = np.var(data2, ddof=1)  # Sample variance for dataset 2

    # Calculate the F-value
    F_value = var1 / var2

    # Get the sample sizes
    n1 = len(data1)  # Sample size for dataset 1
    n2 = len(data2)  # Sample size for dataset 2

    # Degrees of freedom
    df1 = n1 - 1
    df2 = n2 - 1

    # Calculate the p-value
    p_value = 1 - f.cdf(F_value, df1, df2)  # Right-tailed test

    return F_value, p_value

# Example usage:
data1 = [5, 6, 7, 8, 9]
data2 = [2, 3, 4, 5, 6]
F_value, p_value = f_variance_test(data1, data2)
print("F-value:", F_value)
print("p-value:", p_value)

F-value: 1.0
p-value: 0.5
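The function above reports a right-tailed p-value, which addresses whether the first variance is larger than the second. A minimal sketch of a two-sided variant follows, assuming the common convention of doubling the smaller tail probability. The name `f_variance_test_two_sided` is illustrative and not part of the original assignment.

```python
import numpy as np
from scipy.stats import f

def f_variance_test_two_sided(data1, data2):
    """Return (F, p) for H0: equal variances vs. H1: unequal variances."""
    var1 = np.var(data1, ddof=1)  # Sample variance for dataset 1
    var2 = np.var(data2, ddof=1)  # Sample variance for dataset 2
    F_value = var1 / var2
    df1, df2 = len(data1) - 1, len(data2) - 1

    # Probability mass in each tail relative to the observed F
    lower_tail = f.cdf(F_value, df1, df2)
    upper_tail = f.sf(F_value, df1, df2)

    # Double the smaller tail and cap the result at 1
    p_value = min(1.0, 2 * min(lower_tail, upper_tail))
    return F_value, p_value

F_value, p_value = f_variance_test_two_sided([5, 6, 7, 8, 9], [2, 3, 4, 5, 6])
print("F-value:", F_value)
print("Two-sided p-value:", p_value)
```

Because both example arrays have the same spread, the two-sided p-value here should be close to 1, the least evidence against equal variances that the test can give.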


# Q2. Given a significance level of 0.05 and the degrees of freedom for the numerator and denominator of an F-distribution, write a Python function that returns the critical F-value for a two-tailed test.

To calculate the critical F-value for a two-tailed test at a given significance level, use the ppf (percent point function, the inverse of the CDF) method of scipy.stats.f. The F-distribution is right-skewed and takes only non-negative values, so a two-tailed test has a separate critical value in each tail rather than a symmetric pair.

In a two-tailed test, the significance level is split evenly between the two tails, so each critical value is found at alpha / 2. The function below returns the upper critical value, the 1 - alpha / 2 quantile; the lower critical value is the alpha / 2 quantile of the same distribution.

Here's how to create the Python function:

In [3]:
from scipy.stats import f

def critical_f_value(alpha, df1, df2):
    """
    Calculate the critical F-value for a two-tailed test.

    Parameters:
    alpha: float
        The significance level for the test (e.g., 0.05).
    df1: int
        Degrees of freedom for the numerator.
    df2: int
        Degrees of freedom for the denominator.

    Returns:
    critical_value: float
        The critical F-value for the two-tailed test.
    """
    # Adjust alpha for two-tailed test
    alpha_two_tailed = alpha / 2

    # Calculate the critical F-value
    critical_value = f.ppf(1 - alpha_two_tailed, df1, df2)

    return critical_value

# Example usage:
alpha = 0.05
df1 = 5  # Degrees of freedom for the numerator
df2 = 10  # Degrees of freedom for the denominator
critical_value = critical_f_value(alpha, df1, df2)
print("Critical F-value for two-tailed test:", critical_value)

Critical F-value for two-tailed test: 4.236085668188633
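Since a two-tailed test rejects in either tail, it can help to return both bounds. The sketch below is one way to do that; `critical_f_bounds` is a hypothetical helper name. It also checks the reciprocal identity, where the lower critical value for (df1, df2) equals one over the upper critical value for (df2, df1), which is how tables that print only upper-tail values are typically used.

```python
from scipy.stats import f

def critical_f_bounds(alpha, df1, df2):
    """Return (lower, upper) critical F-values for a two-tailed test."""
    lower = f.ppf(alpha / 2, df1, df2)      # alpha/2 quantile
    upper = f.ppf(1 - alpha / 2, df1, df2)  # 1 - alpha/2 quantile
    return lower, upper

alpha, df1, df2 = 0.05, 5, 10
lower, upper = critical_f_bounds(alpha, df1, df2)

# Reciprocal identity: swap the degrees of freedom and invert the upper value
lower_via_reciprocal = 1 / f.ppf(1 - alpha / 2, df2, df1)

print(f"Lower = {lower:.4f}, Upper = {upper:.4f}")
print(f"Lower via reciprocal identity = {lower_via_reciprocal:.4f}")
```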


# Q3. Write a Python program that generates random samples from two normal distributions with known variances and uses an F-test to determine if the variances are equal. The program should output the F-value, degrees of freedom, and p-value for the test.

To implement a Python program that generates random samples from two normal distributions with known variances and performs an F-test to determine if the variances are equal, we'll follow these steps:

1. Generate two random samples from normal distributions using NumPy.
2. Calculate the sample variances.
3. Use the F-test to calculate the F-value, degrees of freedom, and p-value.
4. Output the results.


In [4]:
import numpy as np
from scipy import stats

def generate_samples(mu1, sigma1, n1, mu2, sigma2, n2):
    """
    Generate random samples from two normal distributions.

    Parameters:
    mu1: mean of the first distribution
    sigma1: standard deviation of the first distribution
    n1: sample size of the first distribution
    mu2: mean of the second distribution
    sigma2: standard deviation of the second distribution
    n2: sample size of the second distribution

    Returns:
    Sampled data from both distributions.
    """
    sample1 = np.random.normal(mu1, sigma1, n1)
    sample2 = np.random.normal(mu2, sigma2, n2)
    return sample1, sample2

def f_variance_test(data1, data2):
    """
    Perform F-test for equality of variances.

    Parameters:
    data1 : array-like
        First sample of data
    data2 : array-like
        Second sample of data

    Returns:
    F-value, degrees of freedom, p-value.
    """
    var1 = np.var(data1, ddof=1)
    var2 = np.var(data2, ddof=1)
    F_value = var1 / var2

    n1 = len(data1)
    n2 = len(data2)
    df1 = n1 - 1
    df2 = n2 - 1

    p_value = 1 - stats.f.cdf(F_value, df1, df2)  # Right-tailed p-value, P(F >= F_value)

    return F_value, df1, df2, p_value

# Main program
def main():
    # Parameters for the two normal distributions
    mu1, sigma1, n1 = 0, 1, 30  # Mean, standard deviation, sample size for first distribution
    mu2, sigma2, n2 = 0, 1.5, 30  # Mean, standard deviation, sample size for second distribution

    # Generate random samples
    sample1, sample2 = generate_samples(mu1, sigma1, n1, mu2, sigma2, n2)

    # Perform F-test
    F_value, df1, df2, p_value = f_variance_test(sample1, sample2)

    # Output results
    print(f"Sample 1 Variance: {np.var(sample1, ddof=1):.4f}")
    print(f"Sample 2 Variance: {np.var(sample2, ddof=1):.4f}")
    print(f"F-value: {F_value:.4f}")
    print(f"Degrees of Freedom for Sample 1: {df1}")
    print(f"Degrees of Freedom for Sample 2: {df2}")
    print(f"P-value: {p_value:.4f}")

if __name__ == "__main__":
    main()

Sample 1 Variance: 1.0372
Sample 2 Variance: 1.9941
F-value: 0.5202
Degrees of Freedom for Sample 1: 29
Degrees of Freedom for Sample 2: 29
P-value: 0.9582
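A single random draw says little about how the test behaves in general. The sketch below repeats the experiment many times with a fixed seed and reports how often a two-sided F-test rejects at the 5% level, once with equal standard deviations and once with the 1 versus 1.5 setting used above. The function name `rejection_rate`, the seed, and the repetition count are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

def rejection_rate(sigma1, sigma2, n=30, alpha=0.05, reps=2000, seed=0):
    """Fraction of simulated two-sided F-tests that reject H0."""
    rng = np.random.default_rng(seed)  # Seeded generator for reproducibility
    x = rng.normal(0.0, sigma1, size=(reps, n))
    y = rng.normal(0.0, sigma2, size=(reps, n))

    # One F-statistic per simulated pair of samples
    F = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)
    df = n - 1

    # Two-sided p-value: double the smaller tail, capped at 1
    tail = np.minimum(stats.f.cdf(F, df, df), stats.f.sf(F, df, df))
    p = np.minimum(1.0, 2 * tail)
    return float(np.mean(p < alpha))

rate_equal = rejection_rate(1.0, 1.0)
rate_unequal = rejection_rate(1.0, 1.5)
print(f"Rejection rate with equal variances:   {rate_equal:.3f}")
print(f"Rejection rate with unequal variances: {rate_unequal:.3f}")
```

With equal variances the rejection rate should sit near the chosen significance level, and it should be noticeably higher when the variances differ.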


# Q4. The variances of two populations are known to be 10 and 15. A sample of 12 observations is taken from each population. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

To conduct an F-test to determine if the variances of two populations are significantly different, given the variances $\sigma_1^2$ and $\sigma_2^2$, we can follow these steps:

1. State the null and alternative hypotheses:

* Null Hypothesis $H_0$: The variances are equal, i.e., $\sigma_1^2 = \sigma_2^2$.
* Alternative Hypothesis $H_a$: The variances are not equal, i.e., $\sigma_1^2 \neq \sigma_2^2$.

2. Calculate the F-statistic:
The F-statistic is calculated as:
$$
F = \frac{s_1^2}{s_2^2}
$$
where $s_1^2$ is the variance of the first sample and $s_2^2$ is the variance of the second sample. Here the given values 10 and 15 are used as $s_1^2$ and $s_2^2$.

3. Degrees of freedom:

* Degrees of freedom for the first sample: $df_1 = n_1 - 1 = 12 - 1 = 11$
* Degrees of freedom for the second sample: $df_2 = n_2 - 1 = 12 - 1 = 11$

4. Determine the critical value:
For a two-tailed test at the 5% significance level, we need to find the lower and upper critical F-values using an F-distribution table or a statistical library.

5. Calculate the p-value.

6. Make a decision: If the calculated F-statistic falls outside the critical values, or equivalently if the p-value is less than the significance level, we reject the null hypothesis.

In [5]:
from scipy import stats

# Given data
var1 = 10  # Variance of population 1
var2 = 15  # Variance of population 2
n1 = 12    # Sample size for population 1
n2 = 12    # Sample size for population 2
alpha = 0.05  # Significance level

# Calculate the F-statistic
F_value = var1 / var2

# Degrees of freedom
df1 = n1 - 1
df2 = n2 - 1

# Critical F-value for two-tailed test
F_critical_low = stats.f.ppf(alpha / 2, df1, df2)  # Lower critical value
F_critical_high = stats.f.ppf(1 - alpha / 2, df1, df2)  # Upper critical value

# p-value (two-tailed): double the smaller tail probability so it stays in [0, 1]
cdf_value = stats.f.cdf(F_value, df1, df2)
p_value = 2 * min(cdf_value, 1 - cdf_value)

# Output results
print(f"F-value: {F_value:.4f}")
print(f"Degrees of Freedom: df1 = {df1}, df2 = {df2}")
print(f"Critical F-values: Low = {F_critical_low:.4f}, High = {F_critical_high:.4f}")
print(f"P-value: {p_value:.4f}")

# Decision
if F_value < F_critical_low or F_value > F_critical_high:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")

F-value: 0.6667
Degrees of Freedom: df1 = 11, df2 = 11
Critical F-values: Low = 0.2879, High = 3.4737
P-value: 0.5124
Fail to reject the null hypothesis: The variances are not significantly different.
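The decision step mentions two rules: comparing F with the critical values, and comparing the p-value with the significance level. The sketch below checks, for a handful of illustrative F-values with the same 11 and 11 degrees of freedom, that the two rules agree when the two-sided p-value is defined as twice the smaller tail probability. The helper `two_sided_p` and the chosen test values are assumptions for illustration.

```python
from scipy import stats

df1, df2, alpha = 11, 11, 0.05
F_low = stats.f.ppf(alpha / 2, df1, df2)       # Lower critical value
F_high = stats.f.ppf(1 - alpha / 2, df1, df2)  # Upper critical value

def two_sided_p(F_value):
    """Two-sided p-value: twice the smaller tail probability, capped at 1."""
    tail = min(stats.f.cdf(F_value, df1, df2), stats.f.sf(F_value, df1, df2))
    return min(1.0, 2 * tail)

results = []
for F_value in [0.2, 10 / 15, 1.0, 3.0, 4.0]:
    by_region = bool(F_value < F_low or F_value > F_high)  # Critical-value rule
    by_p = bool(two_sided_p(F_value) < alpha)              # p-value rule
    results.append((F_value, by_region, by_p))
    print(f"F = {F_value:.4f}: reject by region = {by_region}, reject by p-value = {by_p}")
```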


# Q5. A manufacturer claims that the variance of the diameter of a certain product is 0.005. A sample of 25 products is taken, and the sample variance is found to be 0.006. Conduct an F-test at the 1% significance level to determine if the claim is justified.

To conduct an F-test for the manufacturer's claim regarding the variance of the product's diameter, we will perform the following steps:

1. State the null and alternative hypotheses:

* Null hypothesis $H_0$: The variance of the product diameter is equal to the claimed variance, i.e., $\sigma^2 = 0.005$.
* Alternative hypothesis $H_a$: The variance of the product diameter is not equal to the claimed variance, i.e., $\sigma^2 \neq 0.005$.

2. Given information:

* Claimed variance $\sigma_0^2 = 0.005$
* Sample variance $s^2 = 0.006$
* Sample size $n = 25$

3. Calculate the F-statistic:
The F-statistic is calculated as:
$$
F = \frac{s^2}{\sigma_0^2}
$$

4. Degrees of freedom:
The degrees of freedom are calculated as $df = n - 1 = 25 - 1 = 24$.

5. Determine the critical F-value:
For a two-tailed test at the 1% significance level, we need to find the lower and upper critical F-values using the degrees of freedom and the significance level.

6. Calculate the p-value.

7. Make a decision: If the calculated F-statistic falls outside the critical values, or equivalently if the p-value is less than the significance level, we reject the null hypothesis.


In [6]:
from scipy import stats

# Given data
claimed_variance = 0.005  # Claimed variance (sigma_0^2)
sample_variance = 0.006  # Sample variance (s^2)
n = 25  # Sample size
alpha = 0.01  # Significance level (1%)

# Calculate the F-statistic
F_value = sample_variance / claimed_variance

# Degrees of freedom
df = n - 1

# Critical F-values for two-tailed test
F_critical_low = stats.f.ppf(alpha / 2, df, df)  # Lower critical value
F_critical_high = stats.f.ppf(1 - alpha / 2, df, df)  # Upper critical value

# p-value (two-tailed): double the smaller tail probability so it stays in [0, 1]
cdf_value = stats.f.cdf(F_value, df, df)
p_value = 2 * min(cdf_value, 1 - cdf_value)

# Output results
print(f"F-value: {F_value:.4f}")
print(f"Degrees of Freedom: df = {df}")
print(f"Critical F-values: Low = {F_critical_low:.4f}, High = {F_critical_high:.4f}")
print(f"P-value: {p_value:.4f}")

# Decision
if F_value < F_critical_low or F_value > F_critical_high:
    print("Reject the null hypothesis: The sample variance is significantly different from the claimed variance.")
else:
    print("Fail to reject the null hypothesis: The sample variance is not significantly different from the claimed variance.")

F-value: 1.2000
Degrees of Freedom: df = 24
Critical F-values: Low = 0.3371, High = 2.9667
P-value: 0.6587
Fail to reject the null hypothesis: The sample variance is not significantly different from the claimed variance.
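When a sample variance is compared with a fixed claimed value rather than with a second sample, textbooks usually present the chi-square test for a single variance, with statistic $(n - 1) s^2 / \sigma_0^2$ and $n - 1$ degrees of freedom. The sketch below applies it to the same numbers as a cross-check. It is a swapped-in technique, not the F-based calculation above, and the variable names are illustrative.

```python
from scipy import stats

claimed_variance = 0.005  # Claimed variance (sigma_0^2)
sample_variance = 0.006   # Sample variance (s^2)
n = 25                    # Sample size
alpha = 0.01              # Significance level (1%)

df = n - 1
chi2_stat = df * sample_variance / claimed_variance  # (n - 1) * s^2 / sigma_0^2

# Two-tailed critical values from the chi-square distribution
chi2_low = stats.chi2.ppf(alpha / 2, df)
chi2_high = stats.chi2.ppf(1 - alpha / 2, df)

# Two-sided p-value: double the smaller tail, capped at 1
tail = min(stats.chi2.cdf(chi2_stat, df), stats.chi2.sf(chi2_stat, df))
p_value = min(1.0, 2 * tail)

reject = bool(chi2_stat < chi2_low or chi2_stat > chi2_high)
print(f"Chi-square statistic: {chi2_stat:.4f}")
print(f"Critical values: Low = {chi2_low:.4f}, High = {chi2_high:.4f}")
print(f"P-value: {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

For these numbers the chi-square version reaches the same decision as the calculation above: the statistic falls between the two critical values, so the claim is not rejected.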


# Q6. Write a Python function that takes in the degrees of freedom for the numerator and denominator of an F-distribution and calculates the mean and variance of the distribution. The function should return the mean and variance as a tuple.

You can calculate the mean and variance of an F-distribution using the formulas below. Both depend on the denominator degrees of freedom for their existence.

* The mean of an F-distribution is given by:
$$
\text{Mean} = \frac{d_2}{d_2 - 2} \quad \text{for } d_2 > 2
$$

* The variance of an F-distribution is given by:
$$
\text{Variance} = \frac{2 \cdot d_2^2 \cdot (d_1 + d_2 - 2)}{d_1 \cdot (d_2 - 2)^2 \cdot (d_2 - 4)} \quad \text{for } d_2 > 4
$$

where $d_1$ is the degrees of freedom for the numerator, and $d_2$ is the degrees of freedom for the denominator.

Here is a Python function that takes in d1 and d2 as inputs and returns the mean and variance as a tuple:

In [7]:
def f_distribution_properties(d1, d2):
    """
    Calculate the mean and variance of the F-distribution given
    the degrees of freedom for the numerator (d1) and the denominator (d2).

    Parameters:
    d1 (int): Degrees of freedom for the numerator.
    d2 (int): Degrees of freedom for the denominator.

    Returns:
    tuple: Mean and variance of the F-distribution.
    """

    if d2 <= 2:
        mean = None  # Mean is undefined for d2 <= 2
    else:
        mean = d2 / (d2 - 2)

    if d2 <= 4:
        variance = None  # Variance is undefined for d2 <= 4
    else:
        variance = (2 * d2**2 * (d1 + d2 - 2)) / (d1 * (d2 - 2)**2 * (d2 - 4))

    return (mean, variance)

# Example usage:
d1 = 5
d2 = 10
mean, variance = f_distribution_properties(d1, d2)
print(f"Mean: {mean}, Variance: {variance}")

Mean: 1.25, Variance: 1.3541666666666667
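A closed-form implementation like this can be sanity-checked against SciPy's built-in moments. The sketch below calls `scipy.stats.f.mean` and `scipy.stats.f.var` with the same degrees of freedom as the example; the variable names are illustrative.

```python
from scipy.stats import f

d1 = 5   # Degrees of freedom for the numerator
d2 = 10  # Degrees of freedom for the denominator

# SciPy evaluates the moments of the F-distribution directly
scipy_mean = f.mean(d1, d2)
scipy_var = f.var(d1, d2)

print(f"SciPy mean: {scipy_mean:.4f}, SciPy variance: {scipy_var:.4f}")
```

If a hand-written formula disagrees with these values, the formula is the first place to look.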


# Q7. A random sample of 10 measurements is taken from a normal population with unknown variance. The sample variance is found to be 25. Another random sample of 15 measurements is taken from another normal population with unknown variance, and the sample variance is found to be 20. Conduct an F-test at the 10% significance level to determine if the variances are significantly different.

To conduct an F-test to determine if the variances from two normal populations are significantly different, we'll follow these steps:

# Steps:
1. State the Hypotheses:

* Null hypothesis $H_0$: The variances are equal, i.e., $\sigma_1^2 = \sigma_2^2$.
* Alternative hypothesis $H_a$: The variances are not equal, i.e., $\sigma_1^2 \neq \sigma_2^2$.

2. Given Information:

* Sample 1 (first normal population):
  * Sample size $n_1 = 10$
  * Sample variance $s_1^2 = 25$
* Sample 2 (second normal population):
  * Sample size $n_2 = 15$
  * Sample variance $s_2^2 = 20$

3. Calculate the F-statistic:
The F-statistic is calculated as follows:
$$
F = \frac{s_1^2}{s_2^2}
$$
Here the first sample has the larger variance (25 > 20), so placing it in the numerator gives an $F$ statistic greater than or equal to 1.

4. Degrees of Freedom:

* Degrees of freedom for the numerator: $df_1 = n_1 - 1 = 9$
* Degrees of freedom for the denominator: $df_2 = n_2 - 1 = 14$

5. Determine the Critical F-value:
We'll find the lower and upper critical F-values using the degrees of freedom and the significance level, with half of the significance level in each tail.

6. Make a Decision: If the calculated F-statistic falls outside the critical values, we reject the null hypothesis.

In [8]:
from scipy import stats

def f_test_variances(n1, s1_squared, n2, s2_squared, alpha):
    # Calculate the F-statistic
    F_statistic = s1_squared / s2_squared

    # Degrees of freedom
    df1 = n1 - 1
    df2 = n2 - 1

    # Critical F-values for two-tailed test
    F_critical_lower = stats.f.ppf(alpha / 2, df1, df2)
    F_critical_upper = stats.f.ppf(1 - alpha / 2, df1, df2)

    return F_statistic, (F_critical_lower, F_critical_upper)

# Given data
n1 = 10  # Sample size for the first group
s1_squared = 25  # Sample variance for the first group
n2 = 15  # Sample size for the second group
s2_squared = 20  # Sample variance for the second group
alpha = 0.10  # Significance level

# Perform F-test
F_stat, critical_values = f_test_variances(n1, s1_squared, n2, s2_squared, alpha)

# Output results
print(f"F-statistic: {F_stat:.4f}")
print(f"Critical F-values: Lower = {critical_values[0]:.4f}, Upper = {critical_values[1]:.4f}")

# Decision
if F_stat < critical_values[0] or F_stat > critical_values[1]:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")

F-statistic: 1.2500
Critical F-values: Lower = 0.3305, Upper = 2.6458
Fail to reject the null hypothesis: The variances are not significantly different.
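The same critical values also give a confidence interval for the ratio of the population variances: dividing the observed ratio by the upper and lower critical values yields the interval's lower and upper ends. The sketch below computes a 90% interval from the summary statistics above; `variance_ratio_ci` is a hypothetical helper name. The interval contains 1 exactly when the two-tailed test fails to reject.

```python
from scipy import stats

def variance_ratio_ci(n1, s1_squared, n2, s2_squared, alpha):
    """Confidence interval for sigma_1^2 / sigma_2^2 at level 1 - alpha."""
    df1, df2 = n1 - 1, n2 - 1
    ratio = s1_squared / s2_squared

    # Dividing by the upper quantile gives the lower end, and vice versa
    ci_lower = ratio / stats.f.ppf(1 - alpha / 2, df1, df2)
    ci_upper = ratio / stats.f.ppf(alpha / 2, df1, df2)
    return ci_lower, ci_upper

ci_lower, ci_upper = variance_ratio_ci(10, 25, 15, 20, 0.10)
print(f"90% CI for the variance ratio: ({ci_lower:.4f}, {ci_upper:.4f})")
print("Interval contains 1:", ci_lower <= 1 <= ci_upper)
```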


# Q8. The following data represent the waiting times in minutes at two different restaurants on a Saturday night: Restaurant A: 24, 25, 28, 23, 22, 20, 27; Restaurant B: 31, 33, 35, 30, 32, 36. Conduct an F-test at the 5% significance level to determine if the variances are significantly different.

To conduct an F-test to determine if the variances of the waiting times at two different restaurants are significantly different, we'll follow these steps:

# Steps:
1. State the Hypotheses:

* Null hypothesis $H_0$: The variances are equal, i.e., $\sigma_A^2 = \sigma_B^2$.
* Alternative hypothesis $H_a$: The variances are not equal, i.e., $\sigma_A^2 \neq \sigma_B^2$.

2. Given Data:

* Restaurant A: 24, 25, 28, 23, 22, 20, 27
* Restaurant B: 31, 33, 35, 30, 32, 36

3. Calculate the Sample Variances and Sizes:
We need to calculate the sample variances $s_A^2$ and $s_B^2$, as well as their sample sizes $n_A$ and $n_B$.

4. Calculate the F-statistic:
The F-statistic is calculated as follows:
$$
F = \frac{s_A^2}{s_B^2}
$$

5. Degrees of Freedom:

* Degrees of freedom for Restaurant A: $df_A = n_A - 1$
* Degrees of freedom for Restaurant B: $df_B = n_B - 1$

6. Determine the Critical F-value:
Find the lower and upper critical F-values using the degrees of freedom and the significance level for a two-tailed test.

7. Make a Decision: If the calculated F-statistic falls outside the critical values, we reject the null hypothesis.

Calculation in Python:

In [9]:
import numpy as np
from scipy import stats

def f_test(waiting_times_a, waiting_times_b, alpha):
    # Calculate sample sizes
    n_a = len(waiting_times_a)
    n_b = len(waiting_times_b)

    # Calculate sample variances
    s_a_squared = np.var(waiting_times_a, ddof=1)  # Sample variance for A
    s_b_squared = np.var(waiting_times_b, ddof=1)  # Sample variance for B

    # Calculate the F-statistic
    F_statistic = s_a_squared / s_b_squared

    # Degrees of freedom
    df_a = n_a - 1
    df_b = n_b - 1

    # Critical F-values for two-tailed test
    F_critical_lower = stats.f.ppf(alpha / 2, df_a, df_b)
    F_critical_upper = stats.f.ppf(1 - alpha / 2, df_a, df_b)

    return F_statistic, (F_critical_lower, F_critical_upper), (df_a, df_b)

# Given data
waiting_times_a = [24, 25, 28, 23, 22, 20, 27]  # Restaurant A
waiting_times_b = [31, 33, 35, 30, 32, 36]      # Restaurant B
alpha = 0.05  # Significance level

# Perform F-test
F_stat, critical_values, degrees_freedom = f_test(waiting_times_a, waiting_times_b, alpha)

# Output results
print(f"F-statistic: {F_stat:.4f}")
print(f"Critical F-values: Lower = {critical_values[0]:.4f}, Upper = {critical_values[1]:.4f}")
print(f"Degrees of freedom: df_A = {degrees_freedom[0]}, df_B = {degrees_freedom[1]}")

# Decision
if F_stat < critical_values[0] or F_stat > critical_values[1]:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")

F-statistic: 1.4552
Critical F-values: Lower = 0.1670, Upper = 6.9777
Degrees of freedom: df_A = 6, df_B = 5
Fail to reject the null hypothesis: The variances are not significantly different.
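The sample variances in the calculation come from `np.var(..., ddof=1)`. The sketch below computes the sample variance for Restaurant A by hand, dividing the sum of squared deviations by n - 1, and compares it with NumPy's result. It also shows that NumPy's default, `ddof=0`, divides by n and so gives a smaller value. The variable names are illustrative.

```python
import numpy as np

waiting_times_a = [24, 25, 28, 23, 22, 20, 27]  # Restaurant A

n = len(waiting_times_a)
mean_a = sum(waiting_times_a) / n

# Sum of squared deviations from the sample mean
sum_sq_dev = sum((x - mean_a) ** 2 for x in waiting_times_a)

manual_sample_var = sum_sq_dev / (n - 1)            # Divide by n - 1 (sample variance)
numpy_sample_var = np.var(waiting_times_a, ddof=1)  # Same quantity via NumPy
numpy_population_var = np.var(waiting_times_a)      # Default ddof=0 divides by n

print(f"Manual sample variance:      {manual_sample_var:.4f}")
print(f"NumPy sample variance:       {numpy_sample_var:.4f}")
print(f"NumPy population variance:   {numpy_population_var:.4f}")
```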


# Q9. The following data represent the test scores of two groups of students: Group A: 80, 85, 90, 92, 87, 83; Group B: 75, 78, 82, 79, 81, 84. Conduct an F-test at the 1% significance level to determine if the variances are significantly different.

To determine if the variances of test scores from two groups of students are significantly different using an F-test at the 1% significance level, we will follow the steps below:

# Steps:
1. State the Hypotheses:

* Null hypothesis $H_0$: The variances are equal, i.e., $\sigma_A^2 = \sigma_B^2$.
* Alternative hypothesis $H_a$: The variances are not equal, i.e., $\sigma_A^2 \neq \sigma_B^2$.

2. Given Data:

* Group A: 80, 85, 90, 92, 87, 83
* Group B: 75, 78, 82, 79, 81, 84

3. Calculate Sample Variances:
We need to calculate the sample variances $s_A^2$ for Group A and $s_B^2$ for Group B.

4. Calculate the F-statistic:
The F-statistic is calculated as:
$$
F = \frac{s_A^2}{s_B^2}
$$
where $s_A^2$ is the variance for Group A and $s_B^2$ is the variance for Group B. For this data Group A has the larger sample variance, so the F-statistic is greater than 1.

5. Degrees of Freedom:

* Degrees of freedom for Group A: $df_A = n_A - 1$
* Degrees of freedom for Group B: $df_B = n_B - 1$

6. Determine Critical F-values:
Find the lower and upper critical F-values associated with the degrees of freedom and the significance level (for a two-tailed test).

7. Decision: If the calculated F-statistic falls outside the critical values, we reject the null hypothesis.

In [10]:
import numpy as np
from scipy import stats

def f_test(test_scores_a, test_scores_b, alpha):
    # Calculate sample sizes
    n_a = len(test_scores_a)
    n_b = len(test_scores_b)

    # Calculate sample variances
    s_a_squared = np.var(test_scores_a, ddof=1)  # Sample variance for Group A
    s_b_squared = np.var(test_scores_b, ddof=1)  # Sample variance for Group B

    # Calculate the F-statistic
    F_statistic = s_a_squared / s_b_squared

    # Degrees of freedom
    df_a = n_a - 1
    df_b = n_b - 1

    # Critical F-values for two-tailed test
    F_critical_lower = stats.f.ppf(alpha / 2, df_a, df_b)
    F_critical_upper = stats.f.ppf(1 - alpha / 2, df_a, df_b)

    return F_statistic, (F_critical_lower, F_critical_upper), (df_a, df_b)

# Given data
test_scores_a = [80, 85, 90, 92, 87, 83]  # Group A
test_scores_b = [75, 78, 82, 79, 81, 84]  # Group B
alpha = 0.01  # Significance level

# Perform F-test
F_stat, critical_values, degrees_freedom = f_test(test_scores_a, test_scores_b, alpha)

# Output results
print(f"F-statistic: {F_stat:.4f}")
print(f"Critical F-values: Lower = {critical_values[0]:.4f}, Upper = {critical_values[1]:.4f}")
print(f"Degrees of freedom: df_A = {degrees_freedom[0]}, df_B = {degrees_freedom[1]}")

# Decision
if F_stat < critical_values[0] or F_stat > critical_values[1]:
    print("Reject the null hypothesis: The variances are significantly different.")
else:
    print("Fail to reject the null hypothesis: The variances are not significantly different.")

F-statistic: 1.9443
Critical F-values: Lower = 0.0669, Upper = 14.9396
Degrees of freedom: df_A = 5, df_B = 5
Fail to reject the null hypothesis: The variances are not significantly different.
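A common convention is to always place the larger sample variance in the numerator, so that F is at least 1 and only the upper critical value at alpha / 2 is needed. The sketch below enforces that ordering explicitly instead of relying on the order in which the groups are passed. The name `f_test_larger_on_top` is an assumption for illustration.

```python
import numpy as np
from scipy import stats

def f_test_larger_on_top(sample_x, sample_y, alpha):
    """Two-tailed F-test with the larger sample variance in the numerator."""
    var_x = np.var(sample_x, ddof=1)
    var_y = np.var(sample_y, ddof=1)

    # Order the samples so that the F-statistic is at least 1
    if var_x >= var_y:
        F_statistic = var_x / var_y
        df_num, df_den = len(sample_x) - 1, len(sample_y) - 1
    else:
        F_statistic = var_y / var_x
        df_num, df_den = len(sample_y) - 1, len(sample_x) - 1

    # With F >= 1, only the upper critical value at alpha / 2 matters
    F_critical = stats.f.ppf(1 - alpha / 2, df_num, df_den)
    return F_statistic, F_critical, bool(F_statistic > F_critical)

test_scores_a = [80, 85, 90, 92, 87, 83]  # Group A
test_scores_b = [75, 78, 82, 79, 81, 84]  # Group B

F_ab, crit_ab, reject_ab = f_test_larger_on_top(test_scores_a, test_scores_b, 0.01)
F_ba, crit_ba, reject_ba = f_test_larger_on_top(test_scores_b, test_scores_a, 0.01)
print(f"F-statistic: {F_ab:.4f}, Upper critical value: {crit_ab:.4f}, Reject: {reject_ab}")
print(f"Same result with the groups swapped: {F_ba:.4f}, {crit_ba:.4f}, {reject_ba}")
```

Swapping the argument order leaves the statistic and the decision unchanged, which is the point of the convention.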
