# Analysis of one sample of data

This script shows how to

- Use a t-test for a single mean
- Use a non-parametric test (Wilcoxon signed-rank) to check the location (median) of a single sample without assuming normality
- Compare the values from the t-distribution with those of a normal distribution
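
As a quick illustration of the third point, here is a minimal sketch that compares the two-sided 95% critical values of the t-distribution with the value for the standard normal distribution. The degrees of freedom are arbitrary example values and are not taken from the data below.

```python
import scipy.stats as stats

# Two-sided 95% critical value of the standard normal distribution (about 1.96)
z_crit = stats.norm.ppf(0.975)

# Corresponding t critical values for a few illustrative degrees of freedom
t_crit = {df: stats.t.ppf(0.975, df) for df in [5, 10, 30, 100]}

for df, value in t_crit.items():
    print(f'df={df:3d}: t={value:.3f}, normal={z_crit:.3f}')
```

The t values run from roughly 2.57 (df=5) down to about 1.98 (df=100). They are always larger than 1.96, so for small samples the t-distribution gives wider intervals and larger p-values.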

Author: Thomas Haslwanter, Date: June 2014

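```python
# Filter warnings
import warnings
warnings.filterwarnings('ignore')

# %pylab inline (deprecated in recent IPython versions) loads numpy and matplotlib into the namespace
%pylab inline
import numpy as np
import scipy.stats as stats
from urllib.request import urlopen
```

```
Populating the interactive namespace from numpy and matplotlib
```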


## Check Mean

Using data from Altman, we check whether the mean value differs significantly from a reference value:

*Take the average daily energy intake (kJ) over 10 days of 11 healthy women, and compare it to the recommended level of 7725 kJ.*
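
The one-sample t-test compares the sample mean with a reference value `mu0` through `t = (mean - mu0) / (s / sqrt(n))`. Here `s` is the sample SD (`ddof=1`) and `n` is the sample size, and the test uses `n - 1` degrees of freedom. The following minimal sketch computes this by hand and checks the result against `scipy.stats.ttest_1samp`. The numbers are made up for illustration and are not the Altman data.

```python
import numpy as np
import scipy.stats as stats

# Made-up example values (illustration only, not the Altman data)
x = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 6.1])
mu0 = 5.0  # hypothetical reference value

n = len(x)
sem = np.std(x, ddof=1) / np.sqrt(n)   # standard error of the mean
t_manual = (np.mean(x) - mu0) / sem
# Two-sided p-value from the t-distribution with n-1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(x, mu0)
print(f't = {t_manual:.4f} (scipy: {t_scipy:.4f}), p = {p_manual:.4f} (scipy: {p_scipy:.4f})')
```

Doubling the one-sided tail probability of `|t|` gives the two-sided p-value, which is what `ttest_1samp` returns by default.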

```python
# Get the data from Altman
inFile = 'altman_91.txt'
url_base = 'https://raw.githubusercontent.com/thomas-haslwanter/statsintro_python/master/ipynb/Data/data_altman/'
url = url_base + inFile
data = np.genfromtxt(urlopen(url), delimiter=',')

# Watch out: by default np.std divides by N (ddof=0);
# ddof=1 gives the sample SD, which divides by N-1
myMean = np.mean(data)
mySD = np.std(data, ddof=1)
print('Mean and SD: {0:4.2f} and {1:4.2f}'.format(myMean, mySD))

# 95% confidence interval for the mean: mean +/- t_{0.975, n-1} * SEM
tf = stats.t(len(data)-1)
ci = myMean + stats.sem(data)*np.array([-1, 1])*tf.ppf(0.975)
print('The 95% confidence interval is {0:4.2f} to {1:4.2f}.'.format(ci[0], ci[1]))

# Check for significance with a one-sample t-test
checkValue = 7725
t, prob = stats.ttest_1samp(data, checkValue)
if prob < 0.05:
    print('{0:4.2f} is significantly different from the mean (p={1:5.3f}).'.format(checkValue, prob))
else:
    print('{0:4.2f} is not significantly different from the mean (p={1:5.3f}).'.format(checkValue, prob))

# For data that are not normally distributed, use the Wilcoxon signed-rank test
(wStat, pVal) = stats.wilcoxon(data-checkValue)
if pVal < 0.05:
    issignificant = 'unlikely'
else:
    issignificant = 'likely'

print('It is ' + issignificant + ' that the value is {0:d}.'.format(checkValue))
```

```
Mean and SD: 6753.64 and 1142.12
The 95% confidence interval is 5986.35 to 7520.93.
7725.00 is significantly different from the mean (p=0.018).
It is unlikely that the value is 7725.
```
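
The confidence interval above is mean ± t(0.975, n-1) · SEM. As a cross-check, this sketch shows that `scipy.stats.t.interval` gives the same interval. It also shows that `stats.sem` uses the sample SD (`ddof=1`), unlike the `np.std` default. The values are made up for illustration only.

```python
import numpy as np
import scipy.stats as stats

# Made-up example values (illustration only)
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

# np.std divides by N by default (ddof=0); ddof=1 divides by N-1
sd_pop = np.std(x)            # 2.0 for these values
sd_sample = np.std(x, ddof=1)

# stats.sem uses ddof=1 by default, i.e. sample SD / sqrt(n)
sem = stats.sem(x)

# Manual 95% CI: mean +/- t_{0.975, n-1} * SEM
half_width = stats.t.ppf(0.975, n - 1) * sem
ci_manual = (np.mean(x) - half_width, np.mean(x) + half_width)

# Same interval from scipy (confidence level passed positionally, which works across scipy versions)
ci_scipy = stats.t.interval(0.95, n - 1, loc=np.mean(x), scale=sem)
print(ci_manual, ci_scipy)
```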

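
The Wilcoxon signed-rank test ranks the absolute differences from the reference value. It then sums the ranks of the positive and the negative differences separately. The minimal sketch below reproduces the two-sided statistic reported by `scipy.stats.wilcoxon`, which is the smaller of the two rank sums. The differences are made up and contain no zeros or ties, so no tie or zero handling is needed.

```python
import numpy as np
import scipy.stats as stats

# Made-up differences from a reference value (no zeros, no tied magnitudes)
d = np.array([-3.1, 1.2, -0.4, -2.5, 0.9, -1.7, -4.2, 0.3])

# Rank the absolute differences (1 = smallest)
ranks = stats.rankdata(np.abs(d))

# Sum of the ranks for positive and for negative differences
w_plus = ranks[d > 0].sum()
w_minus = ranks[d < 0].sum()

# For a two-sided test, scipy reports the smaller of the two rank sums
w_stat = min(w_plus, w_minus)
res = stats.wilcoxon(d)
print(w_plus, w_minus, w_stat, res.statistic, res.pvalue)
```

Because only ranks are used, a single extreme value cannot dominate the result the way it can shift the mean in a t-test.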

## Compare with Normal Distribution

```python
# Generate the data (seeds numpy's global RNG, which scipy's rvs uses by default)
np.random.seed(12345)
normDist = stats.norm(loc=7, scale=3)
data = normDist.rvs(100)
checkVal = 6.5

# One-sample t-test
t, tProb = stats.ttest_1samp(data, checkVal)
print('t={0}, p={1}'.format(t, tProb))

# Comparison with the corresponding normal distribution:
# two-sided p-value of the z-statistic, based on the standard error of the mean
mmean = np.mean(data)
mstd = np.std(data, ddof=1)
z = (mmean - checkVal) / (mstd/np.sqrt(len(data)))
normProb = 2 * stats.norm.sf(abs(z))

# Compare
print('The probability from the t-test is {0:5.4f}, and from the normal distribution {1:5.4f}'.format(tProb, normProb))
```

```
t=1.9252254884316808, p=0.05707107880872914
The probability from the t-test is 0.0571, and from the normal distribution 0.0542
```
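
The two p-values differ only slightly here. With 100 samples, the t-distribution (99 degrees of freedom) is already close to the normal distribution. The sketch below holds an arbitrary test statistic fixed and varies the sample size to show how the gap shrinks as the sample size increases.

```python
import scipy.stats as stats

t_stat = 1.93  # arbitrary illustrative test statistic, kept fixed for every n

# Two-sided p-value from the standard normal distribution (independent of n)
p_norm = 2 * stats.norm.sf(t_stat)

# Two-sided p-values from the t-distribution with n-1 degrees of freedom
p_t = {}
for n in [5, 10, 30, 100, 1000]:
    p_t[n] = 2 * stats.t.sf(t_stat, df=n - 1)
    print(f'n={n:5d}: p_t={p_t[n]:.4f}, p_normal={p_norm:.4f}')
```

For small samples, the normal approximation understates the p-value. This is why the t-test is preferred when the SD has to be estimated from the data.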
