# Parameter estimation

One of the most basic data analysis problems is to estimate parameters of a **population**, such as the mean, based on our dataset, which is usually a small **sample** of the population. 

The characteristics of our sample, such as the mean and standard deviation, are (usually) easy to calculate. But how does our sample relate to the broader population that we want to know about? 

### A one-sample t-test

Imagine you work at a factory where the mean size of a product needs to be 10 cm. For quality control, a sample of 1000 items was measured.

In [19]:
# generate a pretend sample: 1000 sizes drawn from a normal distribution
# with mean 9.9 and standard deviation 1.0
import numpy
sample = numpy.random.normal(9.9, 1.0, 1000)
print(sample[0:10])

[ 10.80234123   9.59302331   8.28178776   9.99490039  10.87662616
  10.17129554   9.68596892  10.02494834   8.87973125  11.22704026]


In [20]:
# what's the mean?
print(numpy.mean(sample))

9.85997956223


So the mean size of items in the sample is somewhat less than 10. How do we know if this means that the true mean size of the product is less than 10 cm?

The classic statistical approach to this problem is a "one-sample t-test." This test basically takes a measure of the distance between the sample mean and the hypothesized mean of the population, and uses statistical distributions as the basis for figuring out how likely it is that our sample came from a population with that mean.
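
As a rough sketch of what this looks like in practice, SciPy provides `scipy.stats.ttest_1samp`, which takes a sample and a hypothesized population mean and returns a t statistic and a p-value. The seed and the regenerated sample below are assumptions made so the example is self-contained and reproducible; they are not the sample printed above.

```python
import numpy
from scipy import stats

# Assumption: a fixed seed and a fresh pretend sample, for reproducibility.
rng = numpy.random.default_rng(0)
sample = rng.normal(9.9, 1.0, 1000)

# Compare the sample against the target mean size of 10 cm.
result = stats.ttest_1samp(sample, popmean=10.0)
t_stat = result.statistic
p_value = result.pvalue

print("t = %.3f, p = %.4f" % (t_stat, p_value))
```

A t statistic far from zero means the sample mean lies many standard errors away from the hypothesized mean.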

The test compares two hypotheses:

- $H_0$, the "null" hypothesis: The mean of the population is NOT DIFFERENT from the comparison value.
- $H_1$, the "alternative" hypothesis: The mean of the population is DIFFERENT from the comparison value.

In classical statistical testing, we are usually trying to see if there is sufficient evidence *against* the null hypothesis to reject it in favour of the alternative. We'll take up the question of *how much* the sample mean needs to differ from the comparison value to justify rejecting the null hypothesis.

Note: We could ask whether the mean is smaller or larger instead of just different (that's the difference between a one- and two-sided t-test). 
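
To illustrate the difference, here is a minimal sketch using the `alternative` argument of `scipy.stats.ttest_1samp`, which is available in SciPy 1.6 and later. The seed and the regenerated sample are assumptions made for reproducibility.

```python
import numpy
from scipy import stats

# Assumption: a fixed seed and a fresh pretend sample, for reproducibility.
rng = numpy.random.default_rng(1)
sample = rng.normal(9.9, 1.0, 1000)

# Two-sided: is the population mean different from 10?
two_sided = stats.ttest_1samp(sample, popmean=10.0, alternative="two-sided")
# One-sided: is the population mean smaller than 10?
less = stats.ttest_1samp(sample, popmean=10.0, alternative="less")
# One-sided: is the population mean larger than 10?
greater = stats.ttest_1samp(sample, popmean=10.0, alternative="greater")

print("two-sided p:", two_sided.pvalue)
print("less p:     ", less.pvalue)
print("greater p:  ", greater.pvalue)
```

The t statistic is the same in all three cases; only the p-value changes. The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them.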


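The basic equation for a one-sample t-test is:

$t = \frac{Z}{s} = \frac{\bar{x} - \mu_0}{\hat{\sigma} / \sqrt{n}}$

where:

$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

and

$s = \frac{\hat{\sigma}}{\sigma}$

Here $\bar{x}$ is the sample mean, $\mu_0$ is the comparison value (10 cm in our example), $n$ is the sample size, $\hat{\sigma}$ is the sample standard deviation, and $\sigma$ is the population standard deviation. The unknown $\sigma$ cancels when $Z$ is divided by $s$, so $t$ can be calculated from the sample alone.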

Further, the t-test has three main assumptions:
- $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$
- $p s^2$ has a $\chi^2$ distribution with $p$ degrees of freedom under the null hypothesis, where $p$ is a positive constant ($p = n - 1$ for a one-sample test on $n$ measurements)
- $Z$ and $s$ are independent
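
When these assumptions hold, the t statistic can also be computed by hand. The sketch below takes the statistic to be the difference between the sample mean and the hypothesized mean, divided by the estimated standard error $\hat{\sigma}/\sqrt{n}$, and compares it to a t distribution with $n - 1$ degrees of freedom. The variable names (`mu_0`, `sigma_hat`, and so on), the seed, and the regenerated sample are illustrative assumptions.

```python
import numpy
from scipy import stats

# Assumption: a fixed seed and a fresh pretend sample, for reproducibility.
rng = numpy.random.default_rng(2)
sample = rng.normal(9.9, 1.0, 1000)

mu_0 = 10.0                     # hypothesized population mean (cm)
n = sample.size                 # number of measurements
x_bar = sample.mean()           # sample mean
sigma_hat = sample.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

# t statistic: distance from the hypothesized mean in units of standard error
t_manual = (x_bar - mu_0) / (sigma_hat / numpy.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2.0 * stats.t.sf(abs(t_manual), df=n - 1)

# Cross-check against SciPy's built-in one-sample t-test
reference = stats.ttest_1samp(sample, popmean=mu_0)

print("manual:", t_manual, p_manual)
print("scipy: ", reference.statistic, reference.pvalue)
```

The manual and SciPy results should agree to within floating-point error, which is a useful sanity check on the formula.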

## P-values

## What