# Z-Test for Hypothesis Testing

## Suppose that after n = 100 flips we get h = 61 heads, and we choose a significance level of 0.05. Is the coin fair or not? Our null hypothesis is that the coin is fair (q = 1/2). We set these variables:

```python
import numpy as np
import scipy.stats as st
import scipy.special as sp

n = 100  # number of coin flips
h = 61  # number of heads
q = .5  # null hypothesis: fair coin
```


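## Let's compute the z-score, defined by the formula below, where $\bar{x} = h/n$ is the observed proportion of heads (the sample mean) and $\sqrt{q(1-q)/n}$ is its standard deviation under the null hypothesis:

$$z = \frac{\bar{x} - q}{\sqrt{q(1-q)/n}}$$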

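```python
xbar = float(h) / n  # observed proportion of heads
z = (xbar - q) * np.sqrt(n / (q * (1 - q)))
# Round to 4 decimals for display (the unrounded value is 2.1999999999999997
# because of floating-point error).
round(z, 4)
```

```
2.2
```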
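Plugging the numbers into the formula by hand confirms the value; the exact z-score is 2.2, and any trailing digits in the raw output are floating-point rounding:

$$z = \frac{0.61 - 0.5}{\sqrt{0.5 \cdot 0.5 / 100}} = \frac{0.11}{0.05} = 2.2$$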

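## Now, from the z-score, we compute the two-sided p-value $p = 2\,(1 - \Phi(|z|))$, where $\Phi$ is the cumulative distribution function of the standard normal distribution: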

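```python
# Two-sided test: probability of a z-score at least this extreme in either tail.
pval = 2 * (1 - st.norm.cdf(abs(z)))
pval
```

```
0.02780689502699718
```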
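The same computation can be wrapped in a small function that also handles one-sided alternatives. This is a minimal sketch: `proportion_ztest` is a hypothetical helper, not a library function, and the alternative names `'two-sided'`, `'larger'` and `'smaller'` are borrowed from the statsmodels convention documented below. It uses `st.norm.sf` (the survival function, 1 - CDF), which is numerically more accurate than `1 - cdf` far out in the tail.

```python
import numpy as np
import scipy.stats as st


def proportion_ztest(h, n, q=0.5, alternative="two-sided"):
    """Hypothetical helper: z-test of H0: p == q for h successes in n trials,
    using the null standard deviation sqrt(q*(1-q)/n) (normal approximation)."""
    z = (h / n - q) / np.sqrt(q * (1 - q) / n)
    if alternative == "two-sided":
        pval = 2 * st.norm.sf(abs(z))  # both tails
    elif alternative == "larger":
        pval = st.norm.sf(z)           # upper tail, H1: p > q
    elif alternative == "smaller":
        pval = st.norm.cdf(z)          # lower tail, H1: p < q
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return z, pval


z2, p_two = proportion_ztest(61, 100)                        # same test as above
_, p_larger = proportion_ztest(61, 100, alternative="larger")  # H1: biased towards heads
```

For a positive z-score, the one-sided `'larger'` p-value is exactly half the two-sided one.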

## This p-value is less than 0.05, so we reject the null hypothesis and conclude that the coin is probably not fair.
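An equivalent way to reach the same decision is to compare |z| with the critical value of the standard normal distribution at the chosen significance level. The sketch below hardcodes z = 2.2 from the computation above; `alpha`, `z_crit` and `reject` are illustrative names.

```python
import scipy.stats as st

alpha = 0.05
z = 2.2  # z-score computed above for h = 61 heads in n = 100 flips

# Two-sided test: put alpha/2 in each tail, so the critical value is the
# (1 - alpha/2) quantile of the standard normal (about 1.96 for alpha = 0.05).
z_crit = st.norm.ppf(1 - alpha / 2)
reject = abs(z) > z_crit  # True: 2.2 > 1.96, so H0 is rejected
```

Rejecting when |z| exceeds `z_crit` gives the same answer as rejecting when the two-sided p-value is below `alpha`.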

## `statsmodels.stats.weightstats.ztest`

```python
statsmodels.stats.weightstats.ztest(x1, x2=None, value=0, alternative='two-sided', usevar='pooled', ddof=1.0)
```

Test for the mean based on the normal distribution, for one or two samples.

In the two-sample case, the samples are assumed to be independent.
Parameters:

- `x1`, `x2` – two independent samples. `x2` defaults to `None`, in which case a one-sample test on `x1` is performed.
- `value` (float) – In the one-sample case, `value` is the mean of `x1` under the null hypothesis. In the two-sample case, `value` is the difference between the mean of `x1` and the mean of `x2` under the null hypothesis. The test statistic is `x1_mean - x2_mean - value`, divided by its estimated standard error.
- `alternative` (str) – The alternative hypothesis, H1, must be one of the following:
  - `'two-sided'`: H1: difference in means not equal to `value` (default)
  - `'larger'`: H1: difference in means larger than `value`
  - `'smaller'`: H1: difference in means smaller than `value`
- `usevar` (str, default `'pooled'`) – Currently, only `'pooled'` is implemented: the standard deviation of the two samples is assumed to be the same. See `CompareMeans.ztest_ind` for other options.
- `ddof` (int) – Degrees of freedom used in the calculation of the variance of the mean estimate. When comparing means this is one, but it can be adjusted for testing other statistics (proportion, correlation).
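A short sketch of how `value` and `alternative` interact in the two-sample case. The two arrays are made-up illustrative numbers, not real measurements, and the example assumes statsmodels is installed.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

# Made-up samples, for illustration only.
x1 = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.2])
x2 = np.array([4.8, 4.7, 5.0, 5.3, 4.9, 5.1, 4.6, 5.0])

# H0: mean(x1) - mean(x2) == 0 against H1: the difference is larger than 0.
z_stat, p_larger = ztest(x1, x2, value=0, alternative='larger')

# Same data, two-sided alternative (the default).
_, p_two = ztest(x1, x2, value=0)

# If `value` equals the observed difference of means, the statistic is ~0.
z_zero, _ = ztest(x1, x2, value=x1.mean() - x2.mean())
```

Because `z_stat` is positive here, the two-sided p-value is twice the `'larger'` one.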

Returns:

- `tstat` (float) – test statistic
- `pvalue` (float) – p-value of the z-test
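The coin example from the top of this notebook can be run through `ztest` by encoding the flips as 0/1 outcomes. This sketch assumes statsmodels is installed. Note one difference: `ztest` estimates the standard error from the sample (sample standard deviation with `ddof=1`), whereas the hand computation above uses the null value sqrt(q(1-q)/n), so the two z-scores differ slightly (about 2.24 versus 2.2). Both lead to rejecting H0 at the 0.05 level.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

n, h, q = 100, 61, 0.5
# Encode the flips as 0/1 outcomes: h heads (1) and n - h tails (0).
flips = np.r_[np.ones(h), np.zeros(n - h)]

# One-sample test of H0: mean == q; the standard error comes from the data.
z_sm, p_sm = ztest(flips, value=q)

# Hand-computed statistic from above, using the variance under H0.
z_null = (h / n - q) / np.sqrt(q * (1 - q) / n)
```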
