# 1. Explain the properties of the F-distribution.

Asymmetry: The F-distribution is skewed to the right and is not symmetrical

Non-negative values: The F-distribution takes only non-negative values; its support is \([0, \infty)\)

Parameters: The F-distribution is defined by two parameters: the degrees of freedom of the numerator (\(m\)) and the degrees of freedom of the denominator (\(n\))

Approximates normal distribution: As the degrees of freedom for both the numerator and the denominator increase, the F-distribution becomes less skewed, concentrates around 1, and is increasingly well approximated by a normal distribution

Uses: The F-distribution is used to compare two variances and in Analysis of Variance (one-way, two-way, and more general designs)

F-statistic: The F-statistic is a ratio of two variance estimates, so it is always greater than or equal to zero

Shape: The exact shape of the F-distribution depends on the degrees of freedom associated with the numerator and the denominator
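The properties above can be checked numerically. The following is a minimal sketch using `scipy.stats.f`; the three degrees-of-freedom pairs are arbitrary values chosen for illustration and do not come from the questions in this document.

```python
from scipy import stats

# (numerator df, denominator df) pairs, chosen only for illustration
df_pairs = [(5, 10), (50, 100), (500, 1000)]

summary = {}
for dfn, dfd in df_pairs:
    # Mean, variance and skewness of the F(dfn, dfd) distribution
    mean, var, skew = stats.f.stats(dfn, dfd, moments="mvs")
    summary[(dfn, dfd)] = (float(mean), float(var), float(skew))
    print(f"F({dfn}, {dfd}): mean={float(mean):.3f}, "
          f"variance={float(var):.4f}, skewness={float(skew):.3f}")

# The support starts at zero, so there is no probability below 0
prob_below_zero = float(stats.f.cdf(0, 5, 10))
print("P(F < 0) =", prob_below_zero)
```

The skewness stays positive (right skew) for every pair, but it shrinks and the mean moves toward 1 as both degrees of freedom grow.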

# 2. In which types of statistical tests is the F-distribution used, and why is it appropriate for these tests?

The F-distribution is used in hypothesis tests whose test statistic is a ratio of two variance estimates, most notably the analysis of variance (ANOVA) and the F-test for comparing the variances of two populations.

## Hypothesis testing
Scientists use hypothesis testing to statistically compare data from two or more populations. The F-distribution is used to judge whether the F-value observed in a study is larger than would be expected by chance if the null hypothesis were true, which would indicate a statistically significant difference between the populations.

## ANOVA
In ANOVA, the F-distribution is used to determine if the observed differences between sample means are statistically significant. This is done by using the ratio of two mean squares (variances).

## F-test
The F-test is a statistical test that uses the F-distribution to compare two variances by taking the ratio of the two sample variances.

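The F-distribution is appropriate for these tests because, when the null hypothesis is true and the data are normally distributed, the ratio of two independent variance estimates follows an F-distribution with the corresponding numerator and denominator degrees of freedom. This gives the test statistic a known null distribution from which critical values and p-values can be computed.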
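A small simulation can illustrate why the F-distribution serves as the null distribution for a ratio of sample variances. This is only a sketch: the sample sizes, the seed, and the number of simulations are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n1, n2 = 10, 10                 # hypothetical sample sizes
n_sims = 20_000                 # number of simulated experiments

# Both samples come from the same normal population, so the null
# hypothesis of equal variances is true in every simulated experiment
x = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n1))
y = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n2))
ratios = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)

# Value that an F(n1 - 1, n2 - 1) variable exceeds with probability 0.05
critical = stats.f.ppf(0.95, n1 - 1, n2 - 1)
exceed_rate = float(np.mean(ratios > critical))

print(f"critical value: {critical:.3f}")
print(f"share of simulated ratios above it: {exceed_rate:.4f}")
```

If the variance ratio follows the F-distribution under the null hypothesis, the share of simulated ratios above the critical value should be close to 0.05.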



# 3. What are the key assumptions required for conducting an F-test to compare the variances of two populations?


When conducting an F-test to compare the variances of two populations, several key assumptions must be met:

1) Independence: The two samples must be independent of each other, and the observations within each sample must be independent. This means that the selection or value of one observation does not affect another.

2) Normality: The data in both populations should be approximately normally distributed. The F-test for equality of variances is sensitive to departures from normality, so even moderate non-normality can distort its p-value.

3) Random Sampling: The samples should be randomly drawn from their respective populations to ensure that the results are generalizable.

Note that equality of the two population variances (homoscedasticity) is not an assumption of this test. It is the null hypothesis that the F-test evaluates.

If these assumptions are met, the F-test can be appropriately applied to compare the variances of the two populations. If the assumptions are violated, alternative tests (such as Levene's test) or transformations may be necessary to obtain valid results.
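As a rough sketch of how the normality assumption might be examined, the snippet below applies the Shapiro-Wilk test to the two income samples from question 8 of this document and runs Levene's test as a more robust alternative to the F-test. The choice of these two checks is an assumption of this example, and with only five observations per group both tests have very little power.

```python
from scipy import stats

# Income data from question 8 of this document
profession_A = [48, 52, 55, 60, 62]
profession_B = [45, 50, 55, 52, 47]

# Shapiro-Wilk test: the null hypothesis is that the sample is normal
w_A, p_normal_A = stats.shapiro(profession_A)
w_B, p_normal_B = stats.shapiro(profession_B)
print(f"Shapiro-Wilk A: W={w_A:.3f}, p={p_normal_A:.3f}")
print(f"Shapiro-Wilk B: W={w_B:.3f}, p={p_normal_B:.3f}")

# Levene's test centered on the median compares the spread of the
# groups and is less sensitive to non-normality than the F-test
levene_stat, levene_p = stats.levene(profession_A, profession_B, center="median")
print(f"Levene: statistic={levene_stat:.3f}, p={levene_p:.3f}")
```

A large Shapiro-Wilk p-value does not prove normality; it only means the test found no evidence against it.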

# 4. What is the purpose of ANOVA, and how does it differ from a t-test?

The purpose of ANOVA (Analysis of Variance) is to determine whether there are statistically significant differences between the means of three or more groups. It compares the variability between the group means with the variability within the groups. If the between-group variability is large relative to the within-group variability, this suggests that at least one group mean differs from the others.

Key Differences Between ANOVA and t-tests:

## Number of Groups:

t-test: Used to compare the means of two groups.

ANOVA: Used to compare the means of three or more groups.


## Type of Comparison:

t-test: Directly assesses whether the means of the two groups are different.

ANOVA: Tests the hypothesis that all group means are equal without specifying which means are different. If ANOVA indicates significant differences, post-hoc tests can then identify which specific means differ.

## Error Rate Control:

t-test: Each test has its own Type I error rate, and conducting multiple t-tests increases the overall error rate.

ANOVA: Uses a single omnibus test, so the Type I error rate for the overall comparison of all groups stays at the chosen alpha level.

## Assumptions:

Both tests assume independent observations, normality, and homogeneity of variances, but ANOVA can handle more complex experimental designs and is reasonably robust to moderate violations of normality, especially when group sizes are similar.
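The close relationship between the two tests can be seen when there are exactly two groups: the one-way ANOVA F-statistic equals the square of the equal-variance t-statistic, and the p-values agree. The sketch below uses the Region A and Region B heights from question 9 of this document purely as example data.

```python
import math
from scipy import stats

# Heights for two of the regions from question 9 of this document
region_A = [160, 162, 165, 158, 164]
region_B = [172, 175, 170, 168, 174]

# Two-sample t-test with pooled (equal) variances
t_stat, t_p = stats.ttest_ind(region_A, region_B, equal_var=True)

# One-way ANOVA on the same two groups
f_stat, f_p = stats.f_oneway(region_A, region_B)

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
print(f"t-test p = {t_p:.6g}, ANOVA p = {f_p:.6g}")
print("F equals t squared:", math.isclose(t_stat ** 2, f_stat, rel_tol=1e-9))
```

With three or more groups there is no single t-statistic to square, which is where ANOVA becomes the appropriate tool.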

# 5. Explain when and why you would use a one-way ANOVA instead of multiple t-tests when comparing more than two groups

Using a one-way ANOVA instead of multiple t-tests when comparing more than two groups is important for several reasons:

1) Control of Type I Error Rate: When you perform multiple t-tests, each test carries a risk of committing a Type I error (incorrectly rejecting a true null hypothesis). If you conduct several t-tests, the probability of at least one Type I error across the whole set of tests increases. One-way ANOVA keeps the Type I error rate of the overall test at the desired level (e.g., 0.05), making it a more reliable method.

2) Efficiency: One-way ANOVA compares all group means simultaneously in a single test, rather than requiring a separate test for every pair of groups.

3) Assumption Checking: ANOVA provides a single framework in which the underlying assumptions (such as homogeneity of variances) can be checked once for the whole design, rather than separately for each pair of groups.

4) Post-hoc Testing: If the ANOVA indicates significant differences, post-hoc tests can be conducted to determine which specific groups differ. This stepwise approach is clearer and more organized than multiple t-tests, where one would have to manually compare each pair.

In summary, one-way ANOVA is preferred over multiple t-tests for comparing three or more groups because it controls the error rate, is more efficient, and allows for a structured analysis of differences among group means.
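The growth of the error rate under multiple t-tests can be made concrete with a short calculation. This sketch assumes the pairwise tests are independent, which is not exactly true when they share groups, so the figures are approximations; the helper name `familywise_error_rate` is hypothetical.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of at least one Type I error when every pair
    of groups is compared with its own t-test (independence assumed)."""
    n_tests = comb(n_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests

for n_groups in (3, 4, 5, 6):
    n_tests = comb(n_groups, 2)
    rate = familywise_error_rate(n_groups)
    print(f"{n_groups} groups -> {n_tests} t-tests -> "
          f"familywise error rate of about {rate:.3f}")
```

Even with three groups the approximate familywise rate is well above the nominal 0.05, and it rises quickly as groups are added.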

# 7. Compare the classical (frequentist) approach to ANOVA with the Bayesian approach. What are the key differences in terms of how they handle uncertainty, parameter estimation, and hypothesis testing?

The classical (frequentist) approach to ANOVA and the Bayesian approach differ significantly in their handling of uncertainty, parameter estimation, and hypothesis testing. Here are the key differences:

1. Handling Uncertainty:

Frequentist Approach:

Uncertainty is addressed through the concept of sampling distributions and p-values. The focus is on long-run frequencies of events under repeated sampling.

Confidence intervals are used to provide a range of plausible values for the true parameter. A 95% confidence interval comes from a procedure that would cover the true value in 95% of repeated samples.

Bayesian Approach:

Uncertainty is quantified through probability distributions. Bayesian methods treat parameters as random variables with their own distributions (prior, posterior).

The focus is on updating beliefs about the parameters based on observed data, leading to posterior distributions that reflect both prior information and data evidence.

2. Parameter Estimation:

Frequentist Approach:

Parameters are fixed but unknown values. Estimation is done using point estimates (e.g., means) and intervals (confidence intervals) without incorporating prior beliefs.

ANOVA focuses on estimating group means and variances based on sample data.

Bayesian Approach:

Parameters are treated as random variables, and estimation is done using the posterior distribution. The Bayesian approach provides a full distribution for each parameter rather than a single point estimate.

It incorporates prior distributions, which can reflect previous knowledge or beliefs about the parameters.

3. Hypothesis Testing:

Frequentist Approach:

Hypothesis testing is done using null and alternative hypotheses, often assessed through p-values. A small p-value means that data at least as extreme as those observed would be unlikely if the null hypothesis were true.

The decision to reject or fail to reject the null hypothesis is made based on a predetermined significance level (e.g., α = 0.05).

Bayesian Approach:

Bayesian hypothesis testing evaluates hypotheses using posterior probabilities or Bayes factors. A Bayes factor is the ratio of the marginal likelihoods of the data under two hypotheses, and it measures how strongly the data favor one hypothesis over the other.

Instead of a binary decision (reject/fail to reject), Bayesian analysis provides probabilities that reflect how much more likely one hypothesis is compared to another, allowing for a more nuanced interpretation.
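To make the contrast concrete, here is a minimal sketch of a Bayesian-style summary for two of the groups from question 9 of this document. It is not a full Bayesian ANOVA: each group is modelled separately as normal with its own variance, and the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2 is assumed, under which the posterior of each group mean is a shifted and scaled t-distribution. The helper name `mean_posterior`, the seed, and the number of draws are choices made for this example.

```python
import numpy as np
from scipy import stats

# Heights for two of the regions from question 9 of this document
region_B = [172, 175, 170, 168, 174]
region_C = [180, 182, 179, 185, 183]

def mean_posterior(sample):
    """Posterior of the group mean under the assumed prior
    p(mu, sigma^2) proportional to 1 / sigma^2."""
    data = np.asarray(sample, dtype=float)
    n = data.size
    return stats.t(df=n - 1, loc=data.mean(),
                   scale=data.std(ddof=1) / np.sqrt(n))

post_B = mean_posterior(region_B)
post_C = mean_posterior(region_C)

# 95% credible interval for the mean of Region B; under this prior it
# has the same endpoints as the classical 95% t confidence interval
interval_B = post_B.interval(0.95)
print(f"95% interval for mean of B: ({interval_B[0]:.2f}, {interval_B[1]:.2f})")

# Monte Carlo estimate of the posterior probability that mean(C) > mean(B)
rng = np.random.default_rng(0)
draws_B = post_B.rvs(size=50_000, random_state=rng)
draws_C = post_C.rvs(size=50_000, random_state=rng)
prob_C_greater = float(np.mean(draws_C > draws_B))
print(f"P(mean C > mean B | data) is about {prob_C_greater:.4f}")
```

The interval has the same numbers under both approaches here, but the interpretation differs: the Bayesian statement is a direct probability about the parameter, while the frequentist statement is about the long-run behaviour of the procedure.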

# 8. Question: You have two sets of data representing the incomes of two different professions:
# Profession A: [48, 52, 55, 60, 62]
# Profession B: [45, 50, 55, 52, 47]
# Perform an F-test to determine if the variances of the two professions' incomes are equal. What are your conclusions based on the F-test?

# Task: Use Python to calculate the F-statistic and p-value for the given data.

# Objective: Gain experience in performing F-tests and interpreting the results in terms of variance comparison.

In [1]:
import numpy as np
import scipy.stats as stats

# Data for the two professions
profession_A = [48, 52, 55, 60, 62]
profession_B = [45, 50, 55, 52, 47]

# Calculate the variances
var_A = np.var(profession_A, ddof=1)
var_B = np.var(profession_B, ddof=1)

# Calculate the F-statistic
F_statistic = var_A / var_B

# Degrees of freedom
df_A = len(profession_A) - 1
df_B = len(profession_B) - 1

# Calculate the p-value
p_value = stats.f.sf(F_statistic, df_A, df_B)  # Right-tail test

# Output the results
print(f"F-statistic: {F_statistic}")
print(f"p-value: {p_value}")



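F-statistic: 2.089171974522293
p-value: 0.24652429950266966

Conclusion: F ≈ 2.09 with (4, 4) degrees of freedom gives a right-tail p-value of about 0.247, which is well above 0.05. We fail to reject the null hypothesis of equal variances: the data provide no significant evidence that the income variances of the two professions differ. Because the question asks about equality, a two-sided p-value (twice the smaller tail, about 0.49) is the more natural choice, and it leads to the same conclusion.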
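Since the question asks whether the variances are equal rather than whether one is larger, a two-sided version of the test may be more appropriate. The following is a sketch of one common convention, doubling the smaller tail probability; the function name `f_test_two_sided` is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import stats

def f_test_two_sided(sample_1, sample_2):
    """F-test for equal variances with a two-sided p-value
    (twice the smaller tail probability, capped at 1)."""
    var_1 = np.var(sample_1, ddof=1)  # sample variance of the first group
    var_2 = np.var(sample_2, ddof=1)  # sample variance of the second group
    f_stat = var_1 / var_2
    df_1, df_2 = len(sample_1) - 1, len(sample_2) - 1
    right_tail = stats.f.sf(f_stat, df_1, df_2)
    left_tail = stats.f.cdf(f_stat, df_1, df_2)
    p_two_sided = min(1.0, 2 * min(left_tail, right_tail))
    return float(f_stat), float(p_two_sided)

profession_A = [48, 52, 55, 60, 62]
profession_B = [45, 50, 55, 52, 47]

f_stat, p_two_sided = f_test_two_sided(profession_A, profession_B)
print(f"F-statistic: {f_stat:.4f}")
print(f"two-sided p-value: {p_two_sided:.4f}")
```

Taking the smaller tail also makes the result independent of which sample is placed in the numerator.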


# 9. Question: Conduct a one-way ANOVA to test whether there are any statistically significant differences in average heights between three different regions with the following data:

# Region A: [160, 162, 165, 158, 164]

# Region B: [172, 175, 170, 168, 174]

# Region C: [180, 182, 179, 185, 183]


# Task: Write Python code to perform the one-way ANOVA and interpret the results.
# Objective: Learn how to perform one-way ANOVA using Python and interpret the F-statistic and p-value.

In [2]:
import numpy as np
import scipy.stats as stats

# Data for the three regions
region_A = [160, 162, 165, 158, 164]
region_B = [172, 175, 170, 168, 174]
region_C = [180, 182, 179, 185, 183]

# Perform one-way ANOVA
F_statistic, p_value = stats.f_oneway(region_A, region_B, region_C)

# Output the results
print(f"F-statistic: {F_statistic}")
print(f"p-value: {p_value}")


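F-statistic: 67.87330316742101
p-value: 2.870664187937026e-07

Interpretation: With F ≈ 67.87 and a p-value of about 2.87e-07, far below 0.05, we reject the null hypothesis that the three regions have the same mean height. At least one region's mean differs from the others; a post-hoc test (for example, Tukey's HSD) would be needed to identify which pairs of regions differ.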
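To show where the F-statistic comes from, the same result can be rebuilt from the between-group and within-group sums of squares. This is a sketch of the standard one-way ANOVA decomposition; the variable names are chosen for this example.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([160, 162, 165, 158, 164], dtype=float),  # Region A
    np.array([172, 175, 170, 168, 174], dtype=float),  # Region B
    np.array([180, 182, 179, 185, 183], dtype=float),  # Region C
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)       # mean square between groups
ms_within = ss_within / (n_total - k)   # mean square within groups

f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}")
print(f"F = {f_manual:.4f}, p = {p_manual:.3e}")
```

The large F-value reflects that the group means are 10 units apart while the observations within each group vary by only a few units.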
