<h1><font face="Times" size="15" color="red">Statistical Tests</font></h1>

<font face="Calibri">
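<i>Parametric and non-parametric tests are two broad categories of statistical tests. Parametric tests assume the data come from a population distribution of a known form (usually normal) described by a few parameters such as the mean and variance; non-parametric tests make fewer or weaker assumptions about that distribution. Which type is appropriate depends on the characteristics of the data and on the assumptions that can reasonably be made about the population from which it was sampled. Here's an overview of both types:</i><br>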
    
1. Parametric Tests:

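- T-Test: Used to compare the means of two groups to determine whether they differ significantly. It assumes independent observations and approximately normally distributed data; the standard (Student's) independent-samples t-test also assumes the two groups have approximately equal variances, while Welch's variant drops that assumption. Variants include the independent-samples t-test and the paired-samples t-test, which works on the differences between paired measurements.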

- Analysis of Variance (ANOVA): Used to compare the means of three or more groups to determine whether at least one group mean differs significantly from the others. One-way ANOVA assumes independent observations, approximately normal data within each group, and approximately equal variances across groups.

- Linear Regression: Used to model the relationship between a dependent variable and one or more independent variables, and to test whether each coefficient differs significantly from zero. It assumes that the relationship is linear and that the residuals (the differences between observed and predicted values) are independent, have constant variance, and are approximately normally distributed.

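- Chi-Square Test: Used for analyzing categorical data in the form of frequency counts, for example to determine whether there is an association between two categorical variables. Because it makes no assumption about the shape of a population distribution, it is more often classified as a non-parametric test; it does rely on a large-sample approximation, so expected cell counts should generally be at least 5.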

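- F-Test: The test statistic behind ANOVA, where it compares the variance between group means to the variance within groups in order to test whether the group means are equal. A separate F-test for the equality of two variances also exists, but it is very sensitive to non-normality, so the homogeneity-of-variances assumption is more often checked with Levene's test.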

**Parametric tests are powerful when their assumptions are met, but they may produce inaccurate results if these assumptions are violated.**
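Because parametric results depend on their assumptions, it can help to check them before running a test. The sketch below is an illustration rather than part of the original notebook: it reuses the class scores from the t-test example and applies SciPy's Shapiro-Wilk test (`scipy.stats.shapiro`, null hypothesis: the sample is normal) and Levene's test (`scipy.stats.levene`, null hypothesis: the groups have equal variances).

```python
from scipy import stats

# Example data: the two class score lists used in the t-test example
class_a_scores = [85, 92, 88, 78, 95]
class_b_scores = [88, 90, 84, 92, 89]

# Shapiro-Wilk test per group; H0: the sample comes from a normal distribution
normality_p = {}
for name, scores in [("class A", class_a_scores), ("class B", class_b_scores)]:
    w_stat, p_norm = stats.shapiro(scores)
    normality_p[name] = p_norm
    print(f"Shapiro-Wilk ({name}): W={w_stat:.3f}, p={p_norm:.3f}")

# Levene's test; H0: the groups have equal variances.
# SciPy's default center='median' makes it fairly robust to non-normality.
lev_stat, p_levene = stats.levene(class_a_scores, class_b_scores)
print(f"Levene: statistic={lev_stat:.3f}, p={p_levene:.3f}")
```

With only five observations per group these checks have very little power, so a large p-value is weak evidence that the assumptions actually hold; plotting the data is a useful complement.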

2. Non-Parametric Tests:

- Mann-Whitney U Test: A non-parametric alternative to the independent samples t-test, used to compare two independent groups when the assumption of normality is not met.

- Wilcoxon Signed-Rank Test: A non-parametric alternative to the paired samples t-test, used to compare two related groups when the assumption of normality is not met.

- Kruskal-Wallis Test: A non-parametric alternative to one-way ANOVA, used to compare three or more independent groups when the assumption of normality is not met.

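- Chi-Square Test of Independence: The same chi-square test described above, applied to a contingency table to test for an association between two categorical variables. It needs no distributional assumptions beyond independent observations and sufficiently large expected counts.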

- Spearman's Rank Correlation: A non-parametric alternative to Pearson's correlation coefficient that measures the strength and direction of a monotonic relationship between two variables using their ranks. It is useful when the relationship is not linear, the data are ordinal, or outliers are present.

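**Non-parametric tests make fewer assumptions about the data's distribution and are often used when data do not meet the assumptions of parametric tests or when dealing with ordinal or nominal data. The trade-off is that, when the parametric assumptions do hold, non-parametric tests usually have somewhat less statistical power.**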
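Spearman's correlation is the one test in the list above without a worked example in this notebook. A minimal sketch contrasting it with Pearson's coefficient, assuming made-up data in which `y` rises monotonically with `x` but along a cubic rather than a straight line; it uses `scipy.stats.pearsonr` and `scipy.stats.spearmanr`:

```python
import numpy as np
from scipy import stats

# Made-up data: y increases monotonically with x, but along a cubic curve
x = np.arange(1, 11)
y = x ** 3

pearson_r, _ = stats.pearsonr(x, y)      # measures linear association
spearman_rho, _ = stats.spearmanr(x, y)  # Pearson correlation computed on the ranks

print(f"Pearson r:    {pearson_r:.4f}")     # below 1: the points do not lie on a line
print(f"Spearman rho: {spearman_rho:.4f}")  # 1 up to rounding: the ranks agree perfectly
```

Spearman's rho reaches 1 because any strictly increasing relationship preserves rank order, while Pearson's r is penalized for the curvature.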

# T-test

```python
import numpy as np
from scipy import stats

class_a_scores = [85, 92, 88, 78, 95]
class_b_scores = [88, 90, 84, 92, 89]

# Independent-samples (Student's) t-test; equal variances are assumed by default
t_stat, p_value = stats.ttest_ind(class_a_scores, class_b_scores)
print(f"T-Statistic: {t_stat}")
print(f"P-Value: {p_value}")
```

```
T-Statistic: -0.3097891054043949
P-Value: 0.7646371797244251
```


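**Interpretation: If the p-value is less than the significance level (e.g., 0.05), we reject the null hypothesis of equal means and conclude that there is a significant difference in test scores between the two classes. Here p is about 0.76, so the data provide no evidence of a difference.**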
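The overview mentions two t-test variants that the cell above does not run. A minimal sketch of both, using SciPy's `ttest_ind(..., equal_var=False)` for Welch's test and `ttest_rel` for the paired test; the before/after scores in the paired part are hypothetical numbers made up for illustration:

```python
from scipy import stats

class_a_scores = [85, 92, 88, 78, 95]
class_b_scores = [88, 90, 84, 92, 89]

# Welch's t-test: equal_var=False drops the equal-variance assumption
welch_t, welch_p = stats.ttest_ind(class_a_scores, class_b_scores, equal_var=False)
print(f"Welch T-Statistic: {welch_t:.4f}, P-Value: {welch_p:.4f}")

# Paired t-test on hypothetical before/after scores for the same five students
before = [60, 70, 75, 80, 85]
after = [66, 72, 81, 84, 88]
# ttest_rel tests whether the mean of (before - after) is zero
paired_t, paired_p = stats.ttest_rel(before, after)
print(f"Paired T-Statistic: {paired_t:.4f}, P-Value: {paired_p:.4f}")
```

Because the two classes have the same size, Welch's statistic equals the Student statistic printed above (about -0.31); only the degrees of freedom shrink, which makes the p-value slightly larger. In the paired example every student improves, so the paired test finds a significant difference.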

# ANOVA

```python
import numpy as np
from scipy import stats

school_a_scores = [85, 92, 88, 78, 95]
school_b_scores = [88, 90, 84, 92, 89]
school_c_scores = [78, 85, 80, 88, 86]

# One-way ANOVA: H0 = all three school means are equal
f_stat, p_value = stats.f_oneway(school_a_scores, school_b_scores, school_c_scores)
print(f"F-Statistic: {f_stat}")
print(f"P-Value: {p_value}")
```

```
F-Statistic: 1.6337625178826896
P-Value: 0.23576299034013118
```


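**Interpretation: If the p-value is less than the significance level (e.g., 0.05), we can conclude that there is a significant difference in mean test scores among the three schools. Here p is about 0.24, so we fail to reject the null hypothesis that the three school means are equal.**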
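To make the F-statistic less of a black box, the sketch below recomputes it by hand from the same three score lists: F is the between-group mean square divided by the within-group mean square, and the p-value is the upper tail of the F distribution (`scipy.stats.f.sf`) with k - 1 and N - k degrees of freedom.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([85, 92, 88, 78, 95]),  # school A
    np.array([88, 90, 84, 92, 89]),  # school B
    np.array([78, 85, 80, 88, 86]),  # school C
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k = len(groups)          # number of groups
n_total = all_scores.size  # total number of observations

# Between-group sum of squares: distance of each group mean from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)  # upper-tail probability

print(f"F (manual): {f_manual:.4f}, P-Value: {p_manual:.4f}")
```

The result matches `f_oneway` above (F about 1.634, p about 0.236): the spread between school means is small relative to the spread of scores within each school.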

# Linear Regression

```python
import numpy as np
import statsmodels.api as sm

# Sample data: sizes and prices are drawn independently at random (no seed is set),
# so there is no real relationship between them and the numbers change on every run
house_sizes = np.random.randint(1000, 2500, 50)
house_prices = np.random.randint(150, 450, 50) * 1000

# Add a constant for the intercept
X = sm.add_constant(house_sizes)

# Create the linear regression model
model = sm.OLS(house_prices, X).fit()

# Print the regression summary
print(model.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.021
Method:                 Least Squares   F-statistic:                 0.0001869
Date:                Sun, 27 Aug 2023   Prob (F-statistic):              0.989
Time:                        17:18:59   Log-Likelihood:                -642.99
No. Observations:                  50   AIC:                             1290.
Df Residuals:                      48   BIC:                             1294.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       3.042e+05   5.61e+04      5.423      0.0
```
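Because the prices above are generated independently of the sizes, the fit finds nothing (R-squared 0.000). A minimal sketch with a relationship built in, assuming made-up coefficients (price = 50,000 + 150 × size + normal noise with standard deviation 20,000) and a fixed seed. It swaps in `scipy.stats.linregress`, a simpler single-predictor alternative to statsmodels `OLS`, and checks the normal-residuals assumption with a Shapiro-Wilk test:

```python
import numpy as np
from scipy import stats

# Fixed seed so the example is reproducible
rng = np.random.default_rng(seed=42)

# Hypothetical data with a built-in linear relationship (coefficients are made up):
# price = 50,000 + 150 * size + normal noise (sd 20,000)
house_sizes = rng.integers(1000, 2500, size=50)
noise = rng.normal(loc=0.0, scale=20_000.0, size=50)
house_prices = 50_000 + 150 * house_sizes + noise

# Ordinary least-squares fit with a single predictor
result = stats.linregress(house_sizes, house_prices)
print(f"Slope: {result.slope:.1f}  (value used to simulate: 150)")
print(f"Intercept: {result.intercept:.0f}")
print(f"R-squared: {result.rvalue ** 2:.3f}")
print(f"P-value for slope: {result.pvalue:.2e}")

# Residual check: Shapiro-Wilk, H0 = residuals are normally distributed
residuals = house_prices - (result.intercept + result.slope * house_sizes)
_, resid_p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value for residuals: {resid_p:.3f}")
```

Here the slope estimate should land near 150 with a very small p-value. For several predictors, statsmodels `OLS` as used above remains the more general tool.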

# Chi-square test

```python
import numpy as np
from scipy.stats import chi2_contingency

# 4x3 contingency table of observed counts
# (assumed here: rows = regions, columns = product types)
data = np.array([[30, 25, 15], [40, 35, 30], [20, 15, 10], [10, 5, 8]])

chi2_stat, p_value, dof, expected_data = chi2_contingency(data)
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-Value: {p_value}")
```

```
Chi-Square Statistic: 3.247343611357586
P-Value: 0.7772233511167025
```


**Interpretation: If the p-value is less than the significance level (e.g., 0.05), we can conclude that there is a significant association between "Region" and "Product Type." Here p is about 0.78, so the data show no evidence of an association.**
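The statistic returned by `chi2_contingency` can be reproduced directly. A minimal sketch on the same table: each expected count is (row total × column total) / grand total, the statistic sums (observed − expected)² / expected over all cells, and the p-value comes from the chi-square distribution (`scipy.stats.chi2`) with (rows − 1)(columns − 1) degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[30, 25, 15], [40, 35, 30], [20, 15, 10], [10, 5, 8]])

# Expected count for each cell under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof)  # upper-tail probability

print(f"Manual Chi-Square Statistic: {chi2_manual:.4f}, dof: {dof}, P-Value: {p_manual:.4f}")

# Cross-check against SciPy (no Yates correction is applied because dof > 1)
scipy_stat, scipy_p, scipy_dof, scipy_expected = chi2_contingency(observed)
print("Matches chi2_contingency:",
      np.isclose(chi2_manual, scipy_stat) and np.allclose(expected, scipy_expected))

# Rule of thumb: the chi-square approximation is reasonable when expected counts are >= 5
print(f"Smallest expected count: {expected.min():.2f}")
```

In this table the smallest expected count is about 5.96 (23 × 63 / 243), so the rule of thumb is satisfied.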

# Mann-Whitney U Test

```python
import numpy as np
from scipy.stats import mannwhitneyu

group_a_scores = [60, 70, 75, 80, 85]
group_b_scores = [55, 65, 70, 72, 78]

# Two independent groups compared by ranks rather than means
stat, p_value = mannwhitneyu(group_a_scores, group_b_scores)
print(f"Mann-Whitney U Statistic: {stat}")
print(f"P-Value: {p_value}")
```

```
Mann-Whitney U Statistic: 17.5
P-Value: 0.345741825860727
```


**Interpretation: If the p-value is less than the significance level (e.g., 0.05), we can conclude that there is a significant difference in scores between the two groups. Here p is about 0.35, so there is no evidence of a difference.**
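The U statistic has a simple counting interpretation. A minimal sketch in plain Python that reproduces the 17.5 above by comparing every score in group A with every score in group B and counting a tie as one half (recent SciPy versions report U for the first sample):

```python
group_a_scores = [60, 70, 75, 80, 85]
group_b_scores = [55, 65, 70, 72, 78]

# U for group A: number of (a, b) pairs with a > b, with ties counted as 0.5
u_a = sum(
    1.0 if a > b else 0.5 if a == b else 0.0
    for a in group_a_scores
    for b in group_b_scores
)
# The two U statistics always sum to n_A * n_B
u_b = len(group_a_scores) * len(group_b_scores) - u_a

print(f"U for group A: {u_a}, U for group B: {u_b}")
```

Under the null hypothesis U is expected to be n_A × n_B / 2 = 12.5; the observed 17.5 is not far enough from that, given only 25 pairs, to be significant.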

# Wilcoxon Signed-Rank Test

```python
import numpy as np
from scipy.stats import wilcoxon

scores_before = [60, 70, 75, 80, 85]
scores_after = [65, 75, 80, 85, 90]

# Paired test on the per-student differences (before - after)
stat, p_value = wilcoxon(scores_before, scores_after)
print(f"Wilcoxon Signed-Rank Statistic: {stat}")
print(f"P-Value: {p_value}")
```

```
Wilcoxon Signed-Rank Statistic: 0.0
P-Value: 0.0625
```


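**Interpretation: If the p-value is less than the significance level (e.g., 0.05), we can conclude that there is a significant difference in scores before and after the tutoring program. Here p = 0.0625, so the result is not significant at the 0.05 level even though every student improved by 5 points: with only five pairs, the smallest two-sided p-value the exact test can produce is 2/2^5 = 0.0625.**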
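Kruskal-Wallis appears in the overview as the non-parametric counterpart to one-way ANOVA but is not run in this notebook. A minimal sketch, applying `scipy.stats.kruskal` to the three school score lists from the ANOVA example and then reproducing the H statistic by hand from the pooled ranks, with the tie correction from `scipy.stats.tiecorrect`:

```python
import numpy as np
from scipy import stats

school_a_scores = [85, 92, 88, 78, 95]
school_b_scores = [88, 90, 84, 92, 89]
school_c_scores = [78, 85, 80, 88, 86]

# Kruskal-Wallis H test: H0 = all groups come from the same distribution
h_stat, p_value = stats.kruskal(school_a_scores, school_b_scores, school_c_scores)
print(f"Kruskal-Wallis H: {h_stat:.4f}, P-Value: {p_value:.4f}")

# The same H by hand: rank all scores together (ties get average ranks)
groups = [school_a_scores, school_b_scores, school_c_scores]
pooled = np.concatenate(groups)
ranks = stats.rankdata(pooled)
n_total = pooled.size

sum_term = 0.0
start = 0
for g in groups:
    group_ranks = ranks[start:start + len(g)]
    sum_term += group_ranks.sum() ** 2 / len(g)  # (rank sum)^2 / group size
    start += len(g)

h_uncorrected = 12.0 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)
h_manual = h_uncorrected / stats.tiecorrect(ranks)  # adjust for tied scores
print(f"H (manual): {h_manual:.4f}")
```

The tie correction matters here because several scores (78, 85, 88, 92) occur in more than one school.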