# Statistics Advance - 1 - Assignment Questions

## Q1. Explain the properties of the F distribution.

### Answer:-

The F-distribution is a continuous probability distribution that arises when comparing variances, most notably in ANOVA and regression analysis. It is the distribution of the ratio of two independent chi-square variables, each divided by its degrees of freedom. Its properties include:

Asymmetry: The F-distribution is positively skewed, with a long right tail. The skew lessens as both degrees of freedom increase.

Non-negativity: F-values cannot be negative, because the statistic is a ratio of variances.

Dependence on degrees of freedom: The shape is determined by two parameters, the numerator and the denominator degrees of freedom, which come from the two variance estimates being compared.

Right-tailed test: It is used primarily in right-tailed tests, where we assess whether one variance is significantly greater than another.
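The properties listed above can be checked numerically. The following is a minimal sketch using `scipy.stats.f`; the three degrees-of-freedom pairs are arbitrary values chosen only for illustration, not part of the assignment.

```python
from scipy import stats

# (numerator df, denominator df) pairs, chosen only for illustration
df_pairs = [(2, 10), (5, 20), (50, 100)]

summary = []
for dfn, dfd in df_pairs:
    # Theoretical mean, variance and skewness of F(dfn, dfd)
    mean, var, skew = stats.f.stats(dfn, dfd, moments="mvs")
    # Right-tail critical value at the 5% significance level
    crit = stats.f.ppf(0.95, dfn, dfd)
    summary.append((dfn, dfd, float(mean), float(skew), float(crit)))
    print(f"F({dfn}, {dfd}): mean={float(mean):.3f}, "
          f"skew={float(skew):.3f}, 5% critical value={float(crit):.3f}")

# Non-negativity: no probability mass lies at or below zero
mass_at_or_below_zero = float(stats.f.cdf(0, 5, 20))
print("P(F <= 0) =", mass_at_or_below_zero)
```

The skewness stays positive for every pair but shrinks as the degrees of freedom grow, which matches the asymmetry and degrees-of-freedom properties described above.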

## Q2. In which types of statistical tests is the F-distribution used, and why is it appropriate for these tests?

### Answer:-

The F-distribution is used in ANOVA (Analysis of Variance), in F-tests that compare two variances, and in the overall significance test of a regression model. It is appropriate because each of these test statistics is a ratio of two variance estimates. When the null hypothesis holds and the samples are independent and drawn from normal populations, that ratio follows an F-distribution, so the distribution tells us whether the observed ratio is larger than chance alone would explain.

## Q3. What are the key assumptions required for conducting an F-test to compare the variances of two populations?

### Answer:-

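Normality: The populations from which the samples are drawn should follow a normal distribution. The F-test for variances is quite sensitive to departures from this assumption.

Independence: The two samples must be independent of each other, and the observations within each sample must be independent as well.

Sample sizes: Equal sample sizes are not required, because the degrees of freedom account for the size of each sample. Similar sample sizes do help make the test more reliable.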
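One way to examine these assumptions in practice is sketched below. It applies the Shapiro-Wilk normality test to each sample and, as an assumption-light alternative to the F-test, Levene's test for equal variances. The data are the profession incomes from Q8 of this assignment; the choice of these two checks is an illustration, not something the assignment prescribes, and with only five observations per group a normality test has very little power.

```python
from scipy import stats

# Income samples from Q8 of this assignment
profession_a = [48, 52, 55, 60, 62]
profession_b = [45, 50, 55, 52, 47]

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal population
normality_p = {}
for name, sample in [("A", profession_a), ("B", profession_b)]:
    stat, p = stats.shapiro(sample)
    normality_p[name] = float(p)
    print(f"Profession {name}: Shapiro-Wilk W={stat:.3f}, p={p:.3f}")

# Levene's test compares variances and is less sensitive to non-normality
levene_stat, levene_p = stats.levene(profession_a, profession_b)
print(f"Levene: W={levene_stat:.3f}, p={levene_p:.3f}")
```

A large Shapiro-Wilk p-value means only that normality was not rejected; it does not prove the population is normal.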

## Q4. What is the purpose of ANOVA, and how does it differ from a t-test?

### Answer:-

ANOVA (Analysis of Variance) compares the means of three or more groups and tests whether at least one of them differs from the others. A t-test, by contrast, compares the means of only two groups. Running a separate t-test on every pair of groups inflates the overall risk of a Type I error, whereas ANOVA tests all group means with a single test at the chosen significance level. With exactly two groups, one-way ANOVA and the equal-variance two-sample t-test are equivalent: the F-statistic equals the square of the t-statistic.
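The relationship between the two tests can be seen directly when there are only two groups. The sketch below uses two of the height samples from Q9 of this assignment and compares `scipy.stats.ttest_ind` (with its default equal-variance setting) against `scipy.stats.f_oneway`.

```python
from scipy import stats

# Two of the height samples from Q9 of this assignment
group_1 = [160, 162, 165, 158, 164]
group_2 = [172, 175, 170, 168, 174]

# Two-sample t-test assuming equal variances (the default)
t_stat, t_p = stats.ttest_ind(group_1, group_2)

# One-way ANOVA on the same two groups
f_stat, f_p = stats.f_oneway(group_1, group_2)

print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
print(f"t-test p = {t_p:.6g}, ANOVA p = {f_p:.6g}")
```

The squared t-statistic matches the F-statistic and the two p-values agree, so ANOVA can be read as the generalization of this t-test to more than two groups.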

## Q5. Explain when and why you would use a one-way ANOVA instead of multiple t-tests when comparing more than two groups.

### Answer:-

A one-way ANOVA is preferred whenever a single factor defines more than two groups. Each pairwise t-test carries its own chance of a false positive, and with k groups there are k(k-1)/2 pairs to test, so the chance of at least one Type I error grows quickly. One-way ANOVA evaluates all group means simultaneously with a single test, which keeps the overall Type I error rate at the chosen significance level and makes it more efficient than conducting multiple pairwise t-tests.
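The inflation of the Type I error rate can be quantified with a short calculation. This sketch assumes the pairwise tests are independent, which is a simplification (tests that share a group are correlated), so the numbers are an approximation. The function name `familywise_error_rate` is a hypothetical helper introduced here for illustration.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one false positive).

    Assumes the tests are independent, which is only an approximation.
    """
    n_tests = comb(n_groups, 2)  # k(k-1)/2 pairwise comparisons
    return n_tests, 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    n_tests, fwer = familywise_error_rate(k)
    print(f"{k} groups: {n_tests} t-tests, familywise error rate ~ {fwer:.3f}")
```

With three groups the approximate chance of at least one false positive is already about 14%, nearly three times the nominal 5% level.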

## Q6. Explain how variance is partitioned in ANOVA into between-group variance and within-group variance. How does this partitioning contribute to the calculation of the F-statistic?

### Answer:-

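ANOVA splits the total sum of squares into two parts: the between-group sum of squares (variation of the group means around the grand mean) and the within-group sum of squares (variation of observations around their own group mean). Dividing each by its degrees of freedom, k - 1 between groups and N - k within groups for k groups and N observations, gives the two mean squares. The F-statistic is the ratio of the between-group mean square to the within-group mean square. A larger F-statistic means the group means differ more than the within-group variation would explain, which is stronger evidence against the null hypothesis of equal means.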
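The partitioning can be carried out by hand and compared against `scipy.stats.f_oneway`. The sketch below uses the region heights from Q9 of this assignment; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Height samples from Q9 of this assignment
groups = [
    np.array([160, 162, 165, 158, 164], dtype=float),
    np.array([172, 175, 170, 168, 174], dtype=float),
    np.array([180, 182, 179, 185, 183], dtype=float),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group: how far each group mean lies from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total: spread of all observations around the grand mean
ss_total = ((all_values - grand_mean) ** 2).sum()

# Mean squares: sums of squares divided by their degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)  # right-tail probability

f_scipy, p_scipy = f_oneway(*groups)
print(f"SS_between={ss_between:.1f}, SS_within={ss_within:.1f}, SS_total={ss_total:.1f}")
print(f"manual F={f_manual:.4f}, scipy F={f_scipy:.4f}")
```

The between-group and within-group sums of squares add up to the total, and the manually computed F-statistic agrees with the library result.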

## Q7. Compare the classical (frequentist) approach to ANOVA with the Bayesian approach. What are the key differences in terms of how they handle uncertainty, parameter estimation, and hypothesis testing?

### Answer:-

In the frequentist approach, ANOVA treats the parameters (the group means and the variance) as fixed but unknown and the data as a random sample. Uncertainty is expressed through p-values and confidence intervals, which describe how the procedure behaves over repeated sampling. Hypothesis testing determines whether to reject the null hypothesis at a chosen significance level.

The Bayesian approach treats the parameters themselves as random. It combines prior beliefs, expressed as prior distributions, with the observed data to produce a posterior distribution. This yields direct probability statements about parameters and hypotheses, which the frequentist approach does not offer. Parameter estimates are summarized by the posterior distribution and credible intervals rather than by point estimates and confidence intervals, and hypotheses are compared using posterior probabilities or Bayes factors instead of p-values.
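To make the contrast concrete, the sketch below produces a Bayesian-style probability statement for two of the region samples from Q9 of this assignment. It is not a full Bayesian ANOVA. It assumes each group is normally distributed with its own unknown mean and variance and uses the standard noninformative prior (proportional to 1/variance), under which the posterior of each group mean is a shifted and scaled Student-t distribution. The helper `posterior_mean_draws` is a hypothetical name introduced for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two of the height samples from Q9 of this assignment
region_b = np.array([172, 175, 170, 168, 174], dtype=float)
region_c = np.array([180, 182, 179, 185, 183], dtype=float)


def posterior_mean_draws(sample, n_draws, rng):
    """Draw from the posterior of a normal mean under the noninformative prior.

    Assumption: prior proportional to 1/variance, so the posterior of the mean
    is sample_mean + (s / sqrt(n)) * t with n - 1 degrees of freedom.
    """
    n = sample.size
    scale = sample.std(ddof=1) / np.sqrt(n)
    return sample.mean() + scale * rng.standard_t(df=n - 1, size=n_draws)


draws_b = posterior_mean_draws(region_b, 100_000, rng)
draws_c = posterior_mean_draws(region_c, 100_000, rng)
diff = draws_c - draws_b

# Direct probability statement about the parameters
prob_c_taller = float(np.mean(diff > 0))
# 95% credible interval for the difference in mean heights
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])

print(f"P(mean_C > mean_B | data) ~ {prob_c_taller:.4f}")
print(f"95% credible interval for the difference: ({ci_low:.2f}, {ci_high:.2f})")
```

The output is a probability that one mean exceeds the other given the data, which is the kind of statement a p-value does not provide. A different prior would give a different posterior, which is the main modeling choice the Bayesian approach adds.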

## Q8. F-test for Profession Incomes

Given Data:

Profession A: [48, 52, 55, 60, 62]

Profession B: [45, 50, 55, 52, 47]

To conduct the F-test:


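```python
import numpy as np
import scipy.stats as stats

# Data
profession_a = [48, 52, 55, 60, 62]
profession_b = [45, 50, 55, 52, 47]

# Sample variances (ddof=1 gives the unbiased estimate)
var_a = np.var(profession_a, ddof=1)
var_b = np.var(profession_b, ddof=1)

# F-statistic: ratio of the two sample variances
f_stat = var_a / var_b
dfn = len(profession_a) - 1  # Degrees of freedom, numerator
dfd = len(profession_b) - 1  # Degrees of freedom, denominator

# Right-tailed p-value P(F >= f_stat). The survival function gives the
# upper tail; f.cdf would return the lower-tail probability instead.
p_value = stats.f.sf(f_stat, dfn, dfd)

print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
```

Output:

```
F = 2.0892, p = 0.2465
```

Interpretation: We reject the null hypothesis of equal variances only if the p-value is below the chosen significance level (e.g., 0.05). Here the right-tailed p-value is about 0.25, well above 0.05, so we fail to reject the null hypothesis: the data do not provide evidence that income variance differs between the two professions. A two-sided test doubles the p-value to about 0.49 and leads to the same conclusion.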

## Q9. One-way ANOVA for Heights Across Regions

Given Data:


Region A: [160, 162, 165, 158, 164]

Region B: [172, 175, 170, 168, 174]

Region C: [180, 182, 179, 185, 183]

To conduct the ANOVA:

```python
from scipy.stats import f_oneway

# Data
region_a = [160, 162, 165, 158, 164]
region_b = [172, 175, 170, 168, 174]
region_c = [180, 182, 179, 185, 183]

# Perform one-way ANOVA
f_stat, p_value = f_oneway(region_a, region_b, region_c)

f_stat, p_value
```

Output:

```
(np.float64(67.87330316742101), np.float64(2.8706641879370266e-07))
```

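Interpretation: The p-value (about 2.9e-07) is far below the 0.05 significance level, so we reject the null hypothesis of equal mean heights and conclude that at least one region's mean height differs from the others. ANOVA alone does not identify which regions differ; that requires a post-hoc pairwise comparison.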
