**scipy.stats.ttest_ind**

(a, b, axis=0, equal_var=True, nan_policy='propagate', permutations=None, random_state=None, alternative='two-sided', trim=0)

Calculate the T-test for the means of two independent samples of scores.

This is a test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.

In [2]:
import numpy as np
from scipy import stats
rng = np.random.default_rng(12345)  # set seed for reproducibility

**Features of scipy.stats**



In [7]:
from scipy.stats import norm
rv = norm()
dir(rv)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'a',
 'args',
 'b',
 'cdf',
 'dist',
 'entropy',
 'expect',
 'interval',
 'isf',
 'kwds',
 'logcdf',
 'logpdf',
 'logpmf',
 'logsf',
 'mean',
 'median',
 'moment',
 'pdf',
 'pmf',
 'ppf',
 'random_state',
 'rvs',
 'sf',
 'stats',
 'std',
 'support',
 'var']

**Common methods**
The main public methods for continuous RVs are:

**rvs**: Random Variates

**pdf**: Probability Density Function

**cdf**: Cumulative Distribution Function

**sf**: Survival Function (1-CDF)

**ppf**: Percent Point Function (Inverse of CDF)

**isf**: Inverse Survival Function (Inverse of SF)

**stats**: Return mean, variance, (Fisher’s) skew, or (Fisher’s) kurtosis

**moment**: non-central moments of the distribution
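
As a small illustration of how the methods listed above relate to one another, here is a sketch using a frozen normal distribution. The parameters (`loc=5`, `scale=10`), the evaluation point `x` and the seed are arbitrary choices made for this example.

```python
import numpy as np
from scipy.stats import norm

# Frozen normal distribution; loc and scale are illustrative values
rv = norm(loc=5, scale=10)

x = 12.0          # arbitrary evaluation point
p = rv.cdf(x)     # P(X <= x)

# sf is the complement of cdf
assert np.isclose(rv.sf(x), 1 - p)
# ppf inverts cdf, and isf inverts sf
assert np.isclose(rv.ppf(p), x)
assert np.isclose(rv.isf(rv.sf(x)), x)

# stats returns the requested moments; "mv" asks for mean and variance
mean, var = rv.stats(moments="mv")
print(float(mean), float(var))  # 5.0 100.0

# rvs draws random variates; the seed here is arbitrary
sample = rv.rvs(size=3, random_state=np.random.default_rng(0))
print(sample.shape)  # (3,)
```

Freezing the distribution once (`norm(loc=..., scale=...)`) avoids repeating `loc` and `scale` in every call.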

The code below draws two samples from a normal distribution using the stats.norm.rvs function in the scipy.stats package. 
Both samples come from the same population (mean 5, standard deviation 10), so their means should be close, but they do not contain the same numbers.

In [3]:
# rvs = random variates
# Shifting and scaling: all continuous distributions take loc and scale as keyword parameters to adjust the location and scale of the distribution;
# for the normal distribution, the location is the mean and the scale is the standard deviation.
# To generate a sequence of random variates, use the size keyword argument:
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
stats.ttest_ind(rvs1, rvs2)
# p-value is > 0.05, so you fail to reject the null hypothesis that the two population means are equal

Ttest_indResult(statistic=-0.664108846106885, pvalue=0.5067740450450016)
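
To make the default (`equal_var=True`) calculation more concrete, the pooled-variance t statistic can be computed by hand and compared with `ttest_ind`. This is only a sketch: the seed is arbitrary, so the numbers will not match the output above, and the variable names (`sp_sq`, `t_manual`, `p_manual`) are invented for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, not the one used in the cells above
a = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
b = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)

n1, n2 = len(a), len(b)
s1_sq, s2_sq = a.var(ddof=1), b.var(ddof=1)  # unbiased sample variances

# Pooled variance: average of the two variances, weighted by degrees of freedom
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom
t_manual = (a.mean() - b.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

res = stats.ttest_ind(a, b)
print(np.isclose(t_manual, res.statistic), np.isclose(p_manual, res.pvalue))  # True True
```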

In [4]:
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
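stats.ttest_ind(rvs1, rvs2, equal_var=False)  # Welch's t-test: does not assume equal variances
# rvs1 and rvs2 were redrawn in this cell, so this result is not directly comparable with the previous one.
# p-value is < 0.05, so the test rejects equal means even though both samples were drawn with loc=5 and scale=10;
# that is a chance false positive, not an effect of unequal variance.
# (The default equal_var=True test underestimates p when the variances are unequal, which is not the case here.)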

Ttest_indResult(statistic=2.3133511468559473, pvalue=0.02090595957658397)

In [12]:
# rvs3 has the same sample size and mean as rvs1, but a larger standard deviation.
# p-value here is > 0.05, so you fail to reject the null hypothesis that the population means are equal
rvs3 = stats.norm.rvs(loc=5, scale=20, size=500, random_state=rng) 
stats.ttest_ind(rvs1, rvs3)

Ttest_indResult(statistic=0.43319113173052654, pvalue=0.6649695277098774)

In [13]:
# With equal sample sizes the pooled and Welch statistics are identical; only the degrees of freedom differ,
# so the p-value barely changes even though rvs1 and rvs3 have different standard deviations
stats.ttest_ind(rvs1, rvs3, equal_var=False)

Ttest_indResult(statistic=0.43319113173052654, pvalue=0.6650039826255966)
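
The two results above share the same statistic because the sample sizes are equal. The sketch below checks this on freshly drawn data (arbitrary seed, so the values differ from the cells above) and computes the Welch-Satterthwaite degrees of freedom by hand; `welch_df` is a name introduced for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
x = rng.normal(loc=5, scale=10, size=500)
y = rng.normal(loc=5, scale=20, size=500)  # same size, larger spread

pooled = stats.ttest_ind(x, y)                  # assumes equal variances
welch = stats.ttest_ind(x, y, equal_var=False)  # Welch's t-test

# With n1 == n2 the pooled and Welch standard errors coincide
print(np.isclose(pooled.statistic, welch.statistic))  # True

# Welch-Satterthwaite degrees of freedom, computed by hand
v1 = x.var(ddof=1) / len(x)
v2 = y.var(ddof=1) / len(y)
welch_df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))

pooled_df = len(x) + len(y) - 2
print(pooled_df, welch_df)  # Welch uses fewer degrees of freedom
```

With samples this large, the t distribution is close to normal for either number of degrees of freedom, which is why the p-values differ only slightly.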

In [14]:
rvs4 = stats.norm.rvs(loc=5, scale=20, size=100, random_state=rng)
stats.ttest_ind(rvs1, rvs4)
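# p-value is < 0.05, so the test rejects the null hypothesis of equal means, even though both samples were drawn with loc=5.
# The pooled test is unreliable here: the variances and the sample sizes both differ, and the smaller sample has the larger variance,
# which makes the equal-variance test underestimate p.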

Ttest_indResult(statistic=3.367556612906114, pvalue=0.0008070826619777821)

In [15]:
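stats.ttest_ind(rvs1, rvs4, equal_var=False)
# Welch's test gives a smaller statistic and a larger p-value (about 0.034) than the pooled test, although it is still below 0.05 for this particular draw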

Ttest_indResult(statistic=2.149386766259157, pvalue=0.03383318995219999)
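
A single pair of samples says little about how reliable each variant is. One way to probe this is a small Monte Carlo sketch: draw many pairs of samples with equal means but unequal variances and sizes, then count how often each test rejects at the 0.05 level. The seed, the number of repetitions (`n_rep`) and the variable names are assumptions made for this example. In this configuration the pooled test would be expected to reject noticeably more often than 5%, while Welch's test should stay close to the nominal level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # arbitrary seed
n_rep = 2000                    # number of simulated experiments (illustrative)

# Both populations have mean 5, so every rejection is a false positive
big = rng.normal(loc=5, scale=10, size=(n_rep, 500))    # large sample, small spread
small = rng.normal(loc=5, scale=20, size=(n_rep, 100))  # small sample, large spread

# axis=1 runs one test per row, i.e. per simulated experiment
p_pooled = stats.ttest_ind(big, small, axis=1).pvalue
p_welch = stats.ttest_ind(big, small, axis=1, equal_var=False).pvalue

pooled_rate = np.mean(p_pooled < 0.05)
welch_rate = np.mean(p_welch < 0.05)
print(f"pooled false positive rate: {pooled_rate:.3f}")
print(f"Welch false positive rate:  {welch_rate:.3f}")
```

When the variances are not known to be equal, `equal_var=False` is usually the safer choice, since it costs little when the variances do happen to match.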

# References: #
https://docs.scipy.org/doc/scipy/tutorial/stats.html?highlight=scipy%20ttests
