From Xin Xin
1. Chi-square test
Used for: Categorical (nominal or ordinal) data represented as counts or frequencies. Requires sufficiently large expected frequencies (at least 5 in each cell).
Purpose: Test whether two categorical variables are independent, e.g., is gender associated with voting preference?
Assumption: No normality assumption; observations are independent.

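2. Correlation coefficient
Used for: Two continuous variables.
Purpose: Pearson's r measures the strength and direction of the linear relationship between them, ranging from -1 to 1.
H0: No linear correlation (r = 0).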

3. T-test
Used for: Continuous (interval or ratio) data.
Purpose: Compare means to determine whether there is a statistically significant difference between them. e.g., comparing test scores between two classes.
Assumption: Data in each group are approximately normally distributed. The variances of the two groups are equal (in the independent t-test), or the test is adjusted for unequal variances (Welch's t-test).

4. ANOVA
Used for: Continuous data.
Purpose: Compare the means across three or more groups to see whether at least one group mean is statistically different from the others. If the result is significant, follow-up tests (post hoc tests) are typically used to identify which groups differ.
One-Way ANOVA: For one independent variable with three or more levels (groups).
Two-Way ANOVA: For studying the effect of two independent variables, including any interaction between them.
Assumption: Data are normally distributed within each group. Homogeneity of variances (each group has a similar variance). Observations are independent.


5. Paired t-test
Purpose: For comparing means from the same subjects measured at different times or under different conditions.

6. Runs test
Used for: Assessing the randomness of a sequence. It counts the number of runs, i.e., consecutive sequences of identical items (for example, a run of 0's or 1's in a binary string), and compares the observed number of runs to what would be expected if the data were produced randomly. Both too few runs (clustering) and too many runs (regular alternation) count as evidence against randomness.
H0: The sequence is random.

7. Spearmanr
Use when you need to assess the strength and direction of a monotonic relationship between two continuous or ordinal variables (especially when data are not normally distributed or contain outliers). Spearman's rho measures the rank-based association between two variables.
A monotonic relationship means that as one variable increases, the other consistently increases (or consistently decreases), without any reversal of direction, even if the rate of change varies.

8. Kstest
Used for: Comparing a sample with a specified reference distribution (Kolmogorov-Smirnov test).
H0: The data follow the specified distribution (no significant difference).

9. Mann-Whitney U test
Null Hypothesis (H₀): The distributions of the two groups are the same (no significant difference).
Alternative Hypothesis (H₁): One group has "systematically higher (or lower)" values than the other.
Unlike the t-test, it does not assume normality: instead of comparing means of the raw values, it compares the ranks of the combined data from both groups.

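10. Kruskal-Wallis H test
Used for: Three or more independent groups of continuous or ordinal data.
Purpose: Nonparametric alternative to one-way ANOVA. It extends the Mann-Whitney U test to more than two groups by comparing ranks.
H0: The distributions of all groups are the same.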

1. Chi-square test
The chi-square test determines whether the differences between the observed frequencies in categorical data and the frequencies expected under a given hypothesis (such as independence or a specific distribution) are larger than chance alone would explain.
If p-value < 0.05: reject H0 (no association or no difference). The result is statistically significant, e.g., the groups or treatments differ. A small p-value alone does not say how large the effect is.

In [4]:
from scipy.stats import chi2_contingency
import numpy as np

#observed = np.array([[10, 20, 30],
#                    [25, 15, 10]])
#chi2 =  16.369047619047624
#p value =  0.0002789372227504811

#observed = np.array([[10, 20, 30],
#                    [9, 15, 28]])
#chi2 =  0.2658104146307416
#p_value =  0.8755480837285884

observed = np.array([[10, 20, 30],
                    [9, 19, 28]])
#chi2 =  0.009318167212904068
#p_value =  0.9953517530873542

chi2, p, dof, expected = chi2_contingency(observed)

print("chi2 = ", chi2)
print("p_value = ", p)


chi2 =  0.009318167212904068
p_value =  0.9953517530873542
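
The summary at the top asks for expected frequencies of at least 5. As a minimal sketch, the `expected` array that `chi2_contingency` already returns can be inspected to check this for the last table. The threshold of 5 is the rule of thumb from the notes; SciPy does not enforce it.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[10, 20, 30],
                     [9, 19, 28]])

chi2, p, dof, expected = chi2_contingency(observed)

# Expected counts under independence: row_total * col_total / grand_total
min_expected = expected.min()

print("dof = ", dof)
print("min expected count = ", min_expected)
print("rule of thumb satisfied: ", min_expected >= 5)
```

If the smallest expected count falls below 5, the chi-square approximation becomes unreliable for that table.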


2. Correlation coefficient

In [7]:
from scipy.stats import pearsonr

#x = np.array([1,2,3,4,5])
#y = np.array([5,4,2,4,5])
#corr_coef =  0.0
#p_value =  1.0000000000000002

#x = np.array([1,2,3,4,5])
#y = np.array([2,3,4,5,7])
#corr_coef =  0.9863939238321437
#p_value =  0.00190127466019637

x = np.array([1,2,3,4,5])
y = np.array([2,3,4,5,2])
#corr_coef =  0.2425356250363329
#p_value =  0.6942488516293603

corr_coef, p_value = pearsonr(x, y)

print("corr_coef = ", corr_coef)
print("p_value = ", p_value)

corr_coef =  0.2425356250363329
p_value =  0.6942488516293603


3. T-test

In [15]:
from scipy import stats

#sample1 = np.array([23, 21, 19, 20, 25, 17])
#sample2 = np.array([30, 29, 33, 31, 28, 32])
#t_stat =  -6.932325934139483
#p_value =  4.032693788776049e-05

sample1 = np.array([23, 21, 19, 20, 25, 17])
sample2 = np.array([23, 21, 19, 20, 25, 18])
#t_stat =  -0.10552657229828256
#p_value =  0.9180447478749818

t_stat, p_value = stats.ttest_ind(sample1, sample2)

print("t_stat = ", t_stat)
print("p_value = ", p_value)

t_stat =  -0.10552657229828256
p_value =  0.9180447478749818
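
The notes mention adjusting the test when the two variances are unequal. A minimal sketch of that adjustment, assuming Welch's t-test is the intended one, is to pass `equal_var=False` to `ttest_ind`. The data are the first (commented-out) pair from the cell above.

```python
import numpy as np
from scipy import stats

sample1 = np.array([23, 21, 19, 20, 25, 17])
sample2 = np.array([30, 29, 33, 31, 28, 32])

# Default: Student's t-test, which assumes equal variances
t_student, p_student = stats.ttest_ind(sample1, sample2)

# equal_var=False switches to Welch's t-test (no equal-variance assumption)
t_welch, p_welch = stats.ttest_ind(sample1, sample2, equal_var=False)

print("t_student = ", t_student, " p_student = ", p_student)
print("t_welch   = ", t_welch, " p_welch   = ", p_welch)
```

Because both samples have the same size here, the two t statistics coincide and only the degrees of freedom (and therefore the p-value) differ.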


4. ANOVA

In [16]:
from scipy import stats
import numpy as np

group1 = np.array([23, 21, 19, 20, 26, 17])
group2 = np.array([30, 29, 33, 31, 28, 32])
group3 = np.array([24, 23, 25, 29, 26, 27])
#f_stat =  22.35779816513762
#p_value =  3.16246800865601e-05

f_stat, p_value = stats.f_oneway(group1, group2, group3)

print("f_stat = ", f_stat)
print("p_value = ", p_value)

f_stat =  22.35779816513762
p_value =  3.16246800865601e-05
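
The ANOVA result above only says that at least one group differs. As a rough sketch of a post hoc step, the block below runs pairwise t-tests with a Bonferroni correction. This is one simple choice of post hoc procedure, picked here for illustration; the notes do not name a specific one, and the group labels are made up for printing.

```python
from itertools import combinations

import numpy as np
from scipy import stats

groups = {
    "group1": np.array([23, 21, 19, 20, 26, 17]),
    "group2": np.array([30, 29, 33, 31, 28, 32]),
    "group3": np.array([24, 23, 25, 29, 26, 27]),
}

pairs = list(combinations(groups, 2))
adjusted = {}

for a, b in pairs:
    _, p_raw = stats.ttest_ind(groups[a], groups[b])
    # Bonferroni: multiply each p-value by the number of comparisons, cap at 1
    adjusted[(a, b)] = min(p_raw * len(pairs), 1.0)

for pair, p_adj in adjusted.items():
    print(pair, "adjusted p_value = ", p_adj)
```

Bonferroni is conservative; other post hoc procedures may flag differences that it misses.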


5. Paired t-test

In [17]:
from scipy.stats import ttest_rel

before = np.array([1, 2, 3, 4, 5])
after = np.array([2, 2.5, 3.5, 4.5, 5.5])
#t_stat =  -5.999999999999998
#p_value =  0.0038825370469605155

t_stat, p_value = ttest_rel(before, after)

print("t_stat = ", t_stat)
print("p_value = ", p_value)

t_stat =  -5.999999999999998
p_value =  0.0038825370469605155
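
To see what "paired" means in practice, the sketch below checks that the paired t-test gives the same result as a one-sample t-test on the per-subject differences against a mean of 0. It reuses the `before` and `after` arrays from the cell above.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

before = np.array([1, 2, 3, 4, 5])
after = np.array([2, 2.5, 3.5, 4.5, 5.5])

# Paired test on the two measurements
t_paired, p_paired = ttest_rel(before, after)

# One-sample test on the differences, H0: mean difference = 0
diff = before - after
t_diff, p_diff = ttest_1samp(diff, 0)

print("paired:      ", t_paired, p_paired)
print("differences: ", t_diff, p_diff)
```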


6. Runs test

In [24]:
from statsmodels.sandbox.stats.runs import runstest_1samp

#data = np.array([0, 1, 1, 0, 1, 0, 1, 1, 1, 0])
#z_stat =  0.49170755108038944
#p_value =  0.6229260993842225

data = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
#z_stat =  2.3478713763747794
#p_value =  0.01888104015609877

z_stat, p_value = runstest_1samp(data)

print("z_stat = ", z_stat)
print("p_value = ", p_value)

z_stat =  2.3478713763747794
p_value =  0.01888104015609877
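
The z statistic above can be reproduced by hand, which shows how the run count enters the test. This is a sketch of the textbook runs-test formulas with a 0.5 continuity correction; the correction is assumed here to match the small-sample default of `runstest_1samp`, since it reproduces the printed value.

```python
import numpy as np
from scipy.stats import norm

data = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

n1 = int(np.sum(data == 1))
n0 = int(np.sum(data == 0))
n = n1 + n0

# A new run starts wherever a value differs from its predecessor
n_runs = 1 + int(np.sum(data[1:] != data[:-1]))

# Mean and variance of the number of runs under H0 (randomness)
expected_runs = 2 * n1 * n0 / n + 1
var_runs = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n**2 * (n - 1))

# Continuity correction (assumed): shrink the deviation toward zero by 0.5
diff = n_runs - expected_runs
diff_corrected = diff - 0.5 * np.sign(diff) if abs(diff) > 0.5 else 0.0

z_manual = diff_corrected / np.sqrt(var_runs)
p_manual = 2 * norm.sf(abs(z_manual))

print("n_runs = ", n_runs, " expected = ", expected_runs)
print("z_manual = ", z_manual)
print("p_manual = ", p_manual)
```

The strictly alternating sequence has 10 runs where about 6 are expected, so randomness is rejected because the sequence is too regular, not too clustered.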


7. Spearmanr

In [29]:
from scipy.stats import spearmanr

#x = np.array([1, 2, 3, 4, 5])
#y = np.array([5, 6, 7, 8, 10])
#rho =  0.9999999999999999
#p_value =  1.4042654220543672e-24

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 15, 8, 1])
#rho =  -0.09999999999999999
#p_value =  0.8728885715695383

rho, p_value = spearmanr(x, y)

print("rho = ", rho)
print("p_value = ", p_value)


rho =  -0.09999999999999999
p_value =  0.8728885715695383
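
Since Spearman's rho is described as a rank-based association, a short sketch can confirm that it equals Pearson's r computed on the ranks of the data. The arrays are the same as in the cell above.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 15, 8, 1])

# Replace each value by its rank (ties would receive the average rank)
x_ranks = rankdata(x)
y_ranks = rankdata(y)

rho_manual, _ = pearsonr(x_ranks, y_ranks)
rho, _ = spearmanr(x, y)

print("y ranks = ", y_ranks)
print("rho_manual = ", rho_manual)
print("rho = ", rho)
```

Working on ranks is what makes the outlier 15 count only as "the largest value", so it has no extra pull on the result.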


8. Kstest

In [36]:
from scipy.stats import kstest

#data = np.array([1, 2, 3, 4, 4, 5, 5, 6, 7])
#d_stat =  0.8661387569407097
#p_value =  2.760204636913194e-08

# No random seed is set, so the values below change from run to run.
data = np.random.normal(0, 1, 10)
#d_stat =  0.2465276181710127
#p_value =  0.5017135875077006

d_stat, p_value = kstest(data, 'norm')

print("d_stat = ", d_stat)
print("p_value = ", p_value)

d_stat =  0.2465276181710127
p_value =  0.5017135875077006
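
Note that `kstest(data, 'norm')` compares against the standard normal N(0, 1), which is why the commented-out integer data above is rejected so strongly. As a sketch, the `args` parameter can pass a different mean and standard deviation. Estimating them from the same sample, as done here for illustration, tends to make the p-value too optimistic, so treat the result as a rough check only.

```python
import numpy as np
from scipy.stats import kstest

data = np.array([1, 2, 3, 4, 4, 5, 5, 6, 7])

# Against the standard normal N(0, 1)
d_std, p_std = kstest(data, 'norm')

# Against a normal with the sample's own mean and standard deviation
loc, scale = data.mean(), data.std(ddof=1)
d_fit, p_fit = kstest(data, 'norm', args=(loc, scale))

print("N(0, 1):      d_stat = ", d_std, " p_value = ", p_std)
print("N(mean, std): d_stat = ", d_fit, " p_value = ", p_fit)
```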


9. Mann-Whitney U test

In [40]:
from scipy.stats import mannwhitneyu

#x = np.array([1, 2, 3, 4, 5])
#y = np.array([5, 6, 7, 8, 9])
#u_stat =  0.5
#p_value =  0.01597069635378012

#x = np.array([1, 2, 3, 4, 5])
#y = np.array([20, 6, -39, 8, 9])
#u_stat =  5.0
#p_value =  0.15079365079365079

x = np.array([1, 2, 3, 4, 5])
y = np.array([-3, -2, -1, 0, 1])
#u_stat =  24.5
#p_value =  0.01597069635378012

u_stat, p_value = mannwhitneyu(x, y)

print("u_stat = ", u_stat)
print("p_value = ", p_value)

u_stat =  24.5
p_value =  0.01597069635378012
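
To make the "compares ranks" idea concrete, the sketch below computes U for `x` directly from the ranks of the combined data, using U = R - n(n + 1)/2, where R is the rank sum of `x` and n is its size. The result matches the u_stat printed above. Which of the two U values `mannwhitneyu` reports has varied across SciPy versions, so only the manual value is computed here.

```python
import numpy as np
from scipy.stats import rankdata

x = np.array([1, 2, 3, 4, 5])
y = np.array([-3, -2, -1, 0, 1])

# Rank all observations together; tied values share the average rank
ranks = rankdata(np.concatenate([x, y]))
rank_sum_x = ranks[:len(x)].sum()

# U for x: rank sum minus the smallest rank sum x could possibly have
n_x = len(x)
u_manual = rank_sum_x - n_x * (n_x + 1) / 2

print("rank_sum_x = ", rank_sum_x)
print("u_manual = ", u_manual)
```

U counts the (x, y) pairs in which the x value is larger, with ties counting one half; here that is 24.5 out of 25 possible pairs.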



10. Kruskal-Wallis H test

In [None]:
from scipy.stats import kruskal

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 8, 9])
z = np.array([1, 1, 2, 3, 4])
#h_stat =  9.500724637681161
#p_value =  0.008648561098752401

h_stat, p_value = kruskal(x, y, z)

print("h_stat = ", h_stat)
print("p_value = ", p_value)

h_stat =  9.500724637681161
p_value =  0.008648561098752401
