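Student's t-distribution is the distribution of the statistic t = (x̄ − μ) / (s/√n), which is what we get when the unknown population standard deviation σ in the z-statistic is replaced by the sample standard deviation s. For a sample of size n from a normal population, t has n − 1 degrees of freedom. Loosely speaking, it is the "best" we can do when σ is not known.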
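A quick simulation can make the definition concrete. This is a minimal sketch under assumed, made-up settings (n = 5 draws from a normal population with μ = 10 and σ = 2). It computes both the z-statistic, which uses the true σ, and the t-statistic, which uses s, for many samples, and compares their empirical 97.5% quantiles with the normal and t(n − 1) quantiles.

```python
import numpy as np
from scipy import stats

# Assumed, illustrative settings: small samples from a known normal population
rng = np.random.default_rng(0)
n, mu, sigma = 5, 10.0, 2.0
n_trials = 100_000

samples = rng.normal(mu, sigma, size=(n_trials, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)           # sample standard deviation

z = (xbar - mu) / (sigma / np.sqrt(n))    # uses the true sigma
t_stat = (xbar - mu) / (s / np.sqrt(n))   # sigma replaced by s

# Empirical 97.5% quantiles versus the theoretical ones
print("z:", np.quantile(z, 0.975), "vs N(0,1):", stats.norm.ppf(0.975))
print("t:", np.quantile(t_stat, 0.975), "vs t(n-1):", stats.t.ppf(0.975, n - 1))
```

The t-statistic has visibly heavier tails than z: with only 4 degrees of freedom its 97.5% quantile is near 2.78 rather than 1.96, which is the extra uncertainty from estimating σ.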



The t-distribution plays a role in a number of widely used statistical analyses, including Student's t-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and linear regression analysis. Student's t-distribution also arises in the Bayesian analysis of data from a normal family.
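As an illustration of the first two uses, the sketch below runs Student's (equal-variance) two-sample t-test by hand and builds a 95% confidence interval for the difference of the two population means, then checks the statistic against `scipy.stats.ttest_ind`. The two samples are made-up numbers, and the pooled-variance formula assumes both populations share a common variance.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two independent samples (illustrative values only)
a = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7])
b = np.array([4.6, 4.8, 5.0, 4.4, 5.2, 4.7])

n1, n2 = len(a), len(b)
df = n1 + n2 - 2
# Pooled variance: assumes both populations have the same variance
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))    # standard error of the difference

diff = a.mean() - b.mean()
t_manual = diff / se
p_manual = 2 * stats.t.sf(abs(t_manual), df)   # two-sided p-value

# 95% confidence interval for mu_a - mu_b
t_crit = stats.t.ppf(0.975, df)
ci = (diff - t_crit * se, diff + t_crit * se)

# SciPy's equal-variance two-sample t-test, for comparison
res = stats.ttest_ind(a, b, equal_var=True)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"scipy:  t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```

If the equal-variance assumption is doubtful, `ttest_ind(..., equal_var=False)` performs Welch's test instead, which uses a different standard error and degrees of freedom.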

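When should the t-distribution be used?
Use the t-distribution (or a t-table) when the population standard deviation σ is not known and has to be estimated from the sample, assuming the data come from an approximately normal population. This matters most when the sample size is small (a common rule of thumb is n < 30). The general rule: if σ is not known, the t-distribution is correct; if σ is known, the normal distribution is correct. For large samples the two give nearly the same answer, because the t-distribution approaches the normal distribution as the degrees of freedom grow.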
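The effect of sample size can be seen by comparing two-sided 95% critical values of t (with n − 1 degrees of freedom) against the normal value. The sample sizes below are arbitrary choices for illustration.

```python
from scipy import stats

# Two-sided 95% critical value of the standard normal
z_crit = stats.norm.ppf(0.975)

# Matching t critical values, with df = n - 1, for a few sample sizes
t_crit = {n: stats.t.ppf(0.975, n - 1) for n in [5, 10, 30, 100, 1000]}

for n, c in t_crit.items():
    print(f"n = {n:4d}: t = {c:.3f}   z = {z_crit:.3f}")
```

At n = 5 the t critical value is about 2.776 against 1.960 for the normal, so using z there would give intervals that are too narrow; by n = 1000 the two differ only in the third decimal place.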

What is the t-value?
When you perform a t-test, you are usually trying to find evidence of a significant difference between population means (2-sample t) or between a population mean and a hypothesized value (1-sample t). The t-value measures the size of the difference relative to the variation in your sample data: for a 1-sample test, t = (x̄ − μ₀) / (s/√n), the observed difference divided by its estimated standard error.
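A minimal sketch of the 1-sample case: the t-value is computed directly from the formula and checked against `scipy.stats.ttest_1samp`. The sample values and the hypothesized mean `mu0` are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized population mean (illustrative values only)
x = np.array([9.8, 10.2, 10.4, 9.9, 10.6, 10.1, 10.3, 10.5])
mu0 = 10.0

n = len(x)
se = x.std(ddof=1) / np.sqrt(n)          # estimated standard error of the mean
t_value = (x.mean() - mu0) / se           # difference relative to sample variation
p_value = 2 * stats.t.sf(abs(t_value), df=n - 1)   # two-sided p-value

res = stats.ttest_1samp(x, popmean=mu0)
print(f"manual: t = {t_value:.4f}, p = {p_value:.4f}")
print(f"scipy:  t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```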


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(1)

In [None]:
# Student's t critical value: df = 999, alpha = 0.05, two-tailed
# (equivalent to Excel TINV(0.05, 999))
print(stats.t.ppf(1 - 0.025, 999))

In [None]:
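from scipy.stats import t
fig, ax = plt.subplots(1, 1)

# Degrees of freedom (non-integer values are allowed)
df = 2.74335149908
# First four moments: mean, variance, skewness, kurtosis
mean, var, skew, kurt = t.stats(df, moments='mvsk')

# Plot the pdf between the 1st and 99th percentiles
x = np.linspace(t.ppf(0.01, df), t.ppf(0.99, df), 100)
ax.plot(x, t.pdf(x, df), 'r-', lw=5, alpha=0.6, label='t pdf')

# "Frozen" distribution object with df held fixed
rv = t(df)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')

# Check that cdf and ppf are inverses of each other
vals = t.ppf([0.001, 0.5, 0.999], df)
print(np.allclose([0.001, 0.5, 0.999], t.cdf(vals, df)))

# Draw random samples and compare their histogram with the pdf
# (recent Matplotlib versions no longer accept `normed`; use `density=True`)
r = t.rvs(df, size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()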

F Distribution

In probability theory and statistics, the F-distribution, also known as Snedecor's F distribution or the Fisher–Snedecor distribution, is a continuous probability distribution that arises frequently as the null distribution of a test statistic, most notably in the analysis of variance (ANOVA).
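A standard way to construct an F random variable is as a ratio of two independent chi-square variables, each divided by its degrees of freedom: F = (U/d₁) / (V/d₂). The sketch below simulates that ratio, using the same degrees of freedom as the plotting example further down (29 and 18), and compares the simulated mean and 95th percentile with `scipy.stats.f`. The seed and number of draws are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
dfn, dfd = 29, 18            # numerator and denominator degrees of freedom
n_draws = 200_000

# Independent chi-square variables, each scaled by its degrees of freedom
u = rng.chisquare(dfn, size=n_draws)
v = rng.chisquare(dfd, size=n_draws)
f_samples = (u / dfn) / (v / dfd)

print("simulated mean:      ", f_samples.mean())
print("theoretical mean:    ", stats.f.mean(dfn, dfd))     # dfd / (dfd - 2)
print("simulated 95th pct:  ", np.quantile(f_samples, 0.95))
print("theoretical 95th pct:", stats.f.ppf(0.95, dfn, dfd))
```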


Why do we use the F distribution?
The main use of the F-distribution is to test whether two independent samples have been drawn from normal populations with the same variance, or equivalently whether two independent estimates of the population variance are homogeneous. This is useful because it is often desirable to compare two variances rather than two means.
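A minimal sketch of the two-sample variance-ratio F-test, assuming both samples are independent and drawn from normal populations. The helper name `variance_ratio_test` is hypothetical (it is not a SciPy function), and the simulated data are for illustration only. Under the null hypothesis of equal variances, F = s₁²/s₂² follows an F-distribution with (n₁ − 1, n₂ − 1) degrees of freedom.

```python
import numpy as np
from scipy import stats

def variance_ratio_test(x, y):
    """Hypothetical helper: two-sided F-test of H0: var(x) == var(y).
    Assumes independent samples from normal populations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dfn, dfd = len(x) - 1, len(y) - 1
    F = x.var(ddof=1) / y.var(ddof=1)     # ratio of sample variances
    # Two-sided p-value: double the smaller tail probability, capped at 1
    p = 2 * min(stats.f.cdf(F, dfn, dfd), stats.f.sf(F, dfn, dfd))
    return F, min(p, 1.0)

# Simulated data: population standard deviations 1 and 2 (illustrative)
rng = np.random.default_rng(2)
x = rng.normal(0, 1.0, size=25)
y = rng.normal(0, 2.0, size=30)

F, p = variance_ratio_test(x, y)
print(f"F = {F:.3f}, p = {p:.4g}")
```

Swapping the two samples gives F' = 1/F and the same two-sided p-value, so the test does not depend on which sample is placed in the numerator. This F-test is known to be sensitive to non-normality; for non-normal data, tests such as `scipy.stats.levene` are usually preferred.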

What is the F test?
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.
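One-way ANOVA is a common instance: the F-statistic compares a model with a separate mean for each group against a model with a single common mean. The sketch below computes it from the between-group and within-group sums of squares and checks the result against `scipy.stats.f_oneway`. The three groups are made-up data.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups (illustrative values only)
groups = [
    np.array([4.2, 4.8, 5.1, 4.6, 5.0]),
    np.array([5.5, 5.9, 6.1, 5.7, 6.3]),
    np.array([4.9, 5.2, 5.4, 5.0, 5.6]),
]

k = len(groups)                          # number of groups
N = sum(len(g) for g in groups)          # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation explained by group membership vs. variation inside groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

F = (ss_between / (k - 1)) / (ss_within / (N - k))
p = stats.f.sf(F, k - 1, N - k)          # upper-tail probability under H0

res = stats.f_oneway(*groups)
print(f"manual: F = {F:.4f}, p = {p:.4g}")
print(f"scipy:  F = {res.statistic:.4f}, p = {res.pvalue:.4g}")
```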

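Can an F-statistic be negative?
No. An F-statistic is a ratio of two variance estimates (sums of squares divided by their degrees of freedom), and a sum of squares can never be negative, so any F-statistic is non-negative. For a given sample (for example in one-way ANOVA), it is possible to get 0 if all conditional (group) means are identical, or an undefined value if every observation exactly equals its conditional mean, but both are extremely unlikely to happen in practice even if the null hypothesis is completely true.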
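The edge cases can be checked with a small one-way ANOVA helper. The function `one_way_f` is hypothetical, written only for this illustration; returning `nan` when the within-group sum of squares is zero is a choice made here to represent "undefined".

```python
import numpy as np

def one_way_f(groups):
    """Hypothetical helper: one-way ANOVA F-statistic for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    if ss_within == 0:
        return float("nan")   # denominator is zero: the statistic is undefined
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Identical group means: the between-group sum of squares is 0, so F = 0
print(one_way_f([[1, 2, 3], [0, 2, 4], [2, 2, 2]]))   # 0.0

# Every observation equals its group mean: within-group SS is 0, F undefined
print(one_way_f([[1, 1], [3, 3]]))                     # nan

# Random data: F is a ratio of non-negative quantities, so it is never negative
rng = np.random.default_rng(3)
fs = [one_way_f([rng.normal(size=8) for _ in range(3)]) for _ in range(1000)]
print(min(fs) >= 0)                                    # True
```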


In [None]:
from scipy.stats import f
fig, ax = plt.subplots(1, 1)

# Calculate the first few moments
dfn, dfd = 29, 18
mean, var, skew, kurt = f.stats(dfn, dfd, moments='mvsk')

# Display the probability density function (pdf)
x = np.linspace(f.ppf(0.01, dfn, dfd), f.ppf(0.99, dfn, dfd), 100)
ax.plot(x, f.pdf(x, dfn, dfd), 'r-', lw=5, alpha=0.6, label='f pdf')

# Alternatively, the distribution object can be called (as a function) to fix the shape,
# location and scale parameters. This returns a "frozen" RV object holding the given parameters fixed.
# Freeze the distribution and display the frozen pdf
rv = f(dfn, dfd)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')

# Check the accuracy of cdf and ppf
vals = f.ppf([0.001, 0.5, 0.999], dfn, dfd)
print(np.allclose([0.001, 0.5, 0.999], f.cdf(vals, dfn, dfd)))

# Generate random numbers
r = f.rvs(dfn, dfd, size=1000)

# And compare the histogram (`density=True` replaces the `normed` argument,
# which recent Matplotlib versions no longer accept)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()

Chi-Square Test

The chi-square goodness-of-fit test is a statistical method for assessing how well a set of observed frequencies fits the frequencies expected under a theoretical distribution.


In [None]:
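from scipy.stats import chisquare
# Default: all categories are assumed equally likely (uniform expected frequencies)
print(chisquare([16, 18, 16, 14, 12, 12]))
# Explicit expected frequencies; they should sum to the same total (88) as the observed counts
print(chisquare([16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8]))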
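`chisquare` computes Pearson's statistic χ² = Σ (O − E)² / E over the k categories and compares it with a chi-square distribution with k − 1 degrees of freedom (with the default `ddof=0`). A minimal sketch reproducing both calls above by hand follows; the helper name `chi_square_gof` is hypothetical, and a uniform expected distribution is assumed when `exp` is omitted, matching SciPy's default.

```python
import numpy as np
from scipy import stats

observed = np.array([16, 18, 16, 14, 12, 12])

def chi_square_gof(obs, exp=None):
    """Hypothetical helper: Pearson goodness-of-fit test with k - 1 degrees of freedom.
    If exp is None, all categories are assumed equally likely."""
    obs = np.asarray(obs, dtype=float)
    if exp is None:
        exp = np.full_like(obs, obs.mean())   # uniform expected counts
    else:
        exp = np.asarray(exp, dtype=float)
    stat = ((obs - exp) ** 2 / exp).sum()     # sum of (O - E)^2 / E
    p = stats.chi2.sf(stat, len(obs) - 1)     # upper-tail probability
    return stat, p

print(chi_square_gof(observed))                           # statistic ≈ 2.0
print(chi_square_gof(observed, [16, 16, 16, 16, 16, 8]))  # statistic ≈ 3.5
```

For the uniform case every expected count is 88 / 6 ≈ 14.67, and the squared deviations sum to about 29.33, giving χ² = 2.0.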
