# PUI2016 HW4 - jp4772
## Assignment 3

In [5]:
import numpy as np
import pandas as pd

## Hypotheses

To recreate the results in the journal article, we must first define the Null ($H_0$) and Alternative ($H_a$) hypotheses.

In social science, significance levels usually fall into three categories: 1 percent, 5 percent, and 10 percent. In this case, since we know the outcome is not significant, we'll choose the most lenient level, a significance level of 0.10 ($\alpha = 0.10$).

**Null Hypothesis**: The percent of former inmates convicted of a felony after participating in the rehabilitation program ($P_0$) is the *same as or higher than* the percent of inmates who did not participate in the program ($P_1$).

$$
H_0 : P_0 - P_1 \geq 0
$$ 

**Alternative Hypothesis**: The percent of former inmates convicted of a felony after participating in the rehabilitation program ($P_0$) is *lower* than the percent of inmates who did not participate in the program ($P_1$).

$$
H_a : P_0 - P_1 < 0
$$

## Z-test

From the table below, recreated from the article, we have the percentages for $P_0$ and $P_1$:

$P_0 = 0.10$ and $P_1 = 0.117$

| Outcome               | Program Group | Control Group | Difference | P-Value |
|:----------------------|:--------------|:--------------|:-----------|:--------|
| Convicted of a felony | 10.0          | 11.7          | -1.6       | 0.419   |
| Sample size           | 568           | 409           |            |         |
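The z-test in the cells below uses the pooled two-proportion statistic. Written out with the table's values, as a worked sketch taking $n_0 = 568$ for the program group and $n_1 = 409$ for the control group:

$$
\hat{P} = \frac{P_0 n_0 + P_1 n_1}{n_0 + n_1} = \frac{0.10 \cdot 568 + 0.117 \cdot 409}{977} = \frac{104.653}{977} \approx 0.1071
$$

$$
SE = \sqrt{\hat{P}(1 - \hat{P})\left(\frac{1}{n_0} + \frac{1}{n_1}\right)} \approx \sqrt{0.0956 \cdot 0.00421} \approx 0.0201
$$

$$
z = \frac{P_0 - P_1}{SE} \approx \frac{-0.017}{0.0201} \approx -0.85
$$

Because $H_a$ is one-sided (lower tail), the p-value is $\Phi(z) = 1 - \Phi(|z|)$.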

In [1]:
alpha = 0.10

p0 = 0.10
p1 = 0.117

if p0-p1 >= 0:
    print ("the Null holds")
else:
    print ("we must assess the statistical significance")

we must assess the statistical significance


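In [2]:
# sample sizes from the article's table: program group (P_0) and control group (P_1)
n0 = 568
n1 = 409

# let's get the counts by multiplying by the sample size
Nt0 = p0 * n0
Nt1 = p1 * n1

In [18]:
# pooled proportion under H0, standard error of the difference, and z statistic
pooled_p = lambda p0, p1, n0, n1: (p0 * n0 + p1 * n1) / (n0 + n1)
se = lambda p, n0, n1: np.sqrt(p * (1 - p) * (1.0 / n0 + 1.0 / n1))
zscore = lambda p0, p1, s: (p0 - p1) / s

# zscore(p1, p0, ...) gives (P_1 - P_0) / SE > 0; its upper-tail area equals
# the lower-tail area of (P_0 - P_1) / SE used by H_a
print('z-score: {:.3f}'.format(zscore(p1, p0, se(pooled_p(p0, p1, n0, n1), n0, n1))))

z-score: 0.848


Looking up $z \approx 0.85$ in the standard normal table below gives the cumulative probability $\Phi(0.85) = 0.8023$.

![](http://intersci.ss.uci.edu/wiki/images/3/3a/Normal01.jpg)

In [21]:
# one-tailed p-value = 1 - Phi(z), the area beyond z
p = 1 - 0.8023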

def report_result(p, a):
    print ('is the p value ' + 
           '{0:.2f} smaller than the critical value {1:.2f}?'.format(p,a))
    if p < a:
        print ("YES!")
    else: 
        print ("NO!")
    
    print ('the Null hypothesis is {}'.format(\
                            'rejected' if p < a  else 'not rejected') )

report_result(p, alpha)

is the p value 0.20 smaller than the critical value 0.10?
NO!
the Null hypothesis is not rejected


With a one-tailed p-value of 0.20, larger than $\alpha = 0.10$, **the Null hypothesis stands**.
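Rather than reading $\Phi(z)$ off the printed table, the one-tailed p-value can be computed directly. A minimal sketch, assuming SciPy is available (`scipy.stats.norm.cdf` is the standard normal CDF) and taking the program group as n = 568 and the control group as n = 409, as in the article's table:

```python
import numpy as np
from scipy.stats import norm

# Felony-conviction rates from the article's table
p_program, n_program = 0.10, 568   # program group (P_0)
p_control, n_control = 0.117, 409  # control group (P_1)
alpha = 0.10

# Pooled proportion under H0 (both groups share one conviction rate)
p_pool = (p_program * n_program + p_control * n_control) / (n_program + n_control)

# Standard error of the difference in proportions under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1.0 / n_program + 1.0 / n_control))

# H_a: P_0 - P_1 < 0, so the test is lower-tailed
z = (p_program - p_control) / se
p_value = norm.cdf(z)   # P(Z <= z)

print('z = {:.3f}, one-tailed p = {:.3f}'.format(z, p_value))
print('reject H0' if p_value < alpha else 'fail to reject H0')
```

This gives a p-value of about 0.20, in line with the table lookup, and the Null is not rejected.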

## $\chi^2$ (Chi-square) test

**Contingency Table**

| Convicted of a felony | Yes                   | No                     | Total |
|:----------------------|:----------------------|:-----------------------|:------|
| Program Group         | 0.10 * 568 = 56.8     | 0.90 * 568 = 511.2     | 568   |
| Control Group         | 0.117 * 409 = 47.853  | 0.883 * 409 = 361.147  | 409   |
| **Total**             | 104.653               | 872.347                | 977   |

In [13]:
# column totals and grand total of the contingency table
print('yes total: {:.3f}'.format((0.10 * 568) + (0.117 * 409)))
print('no total: {:.3f}'.format((0.90 * 568) + (0.883 * 409)))
print('grand total: {:.3f}'.format((0.10 * 568) + (0.117 * 409) + (0.90 * 568) + (0.883 * 409)))

yes total: 104.653
no total: 872.347
grand total: 977.000


In [10]:
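def evalChisq(values):
    '''
    Calculate Pearson's chi-square statistic (no continuity correction)
    from a 2D contingency table of observed counts.
    Written by Frederica Bianco
    '''
    # float dtype so expected counts are not truncated for integer input
    values = np.asarray(values, dtype=float)
    E = np.empty_like(values)
    for j in range(values.shape[1]):        # columns (outcomes)
        for i in range(values.shape[0]):    # rows (groups)
            # expected count = row total * column total / grand total
            E[i, j] = (values[i, :].sum() * values[:, j].sum()) / values.sum()
    return ((values - E)**2 / E).sum()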
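The double loop in `evalChisq` fills in each expected count as $E_{ij} = (\text{row}_i \text{ total} \times \text{column}_j \text{ total}) / \text{grand total}$. The same computation can be vectorized with NumPy's `np.outer`; a minimal sketch (the function name `chisq_vectorized` is hypothetical, not from the course material):

```python
import numpy as np

def chisq_vectorized(values):
    """Pearson chi-square statistic for an r x c contingency table
    (no continuity correction)."""
    values = np.asarray(values, dtype=float)
    row_totals = values.sum(axis=1)   # one total per group (row)
    col_totals = values.sum(axis=0)   # one total per outcome (column)
    # Expected counts if the outcome were independent of group membership
    expected = np.outer(row_totals, col_totals) / values.sum()
    return ((values - expected) ** 2 / expected).sum()

# rows: program, control groups; columns: convicted yes, no
table = [[0.10 * 568, 0.90 * 568],
         [0.117 * 409, 0.883 * 409]]
print('chisq: {:.3f}'.format(chisq_vectorized(table)))
```

`np.outer` builds the full r × c grid of row-total × column-total products in one call, so the function works for tables of any shape without explicit loops.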

In [23]:
Ntot = 977
# rows: program, control groups; columns: convicted yes, no
sample_values = np.array([[0.10 * 568, 0.90 * 568], [0.117 * 409, 0.883 * 409]])

print('chisq: {:.3f}'.format(evalChisq(sample_values)))

chisq: 0.718


![](http://passel.unl.edu/Image/Namuth-CovertDeana956176274/chi-sqaure%20distribution%20table.PNG)

Our problem has $(2 - 1)(2 - 1) = 1$ degree of freedom: the contingency table has 2 rows (control and program groups) and 2 columns (convicted of a felony: yes or no). With $\alpha = 0.10$, the most lenient of the usual levels, the table gives a critical value of 2.71, so $\chi^2$ must exceed 2.71 to reject the Null. Since our $\chi^2 = 0.718$ is well below 2.71, our **Null hypothesis stands**.
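The critical value and the statistic can be cross-checked with SciPy instead of the printed table. A minimal sketch, assuming SciPy is available; `chi2.ppf` gives the critical value and `chi2_contingency` with `correction=False` computes the same uncorrected Pearson statistic as `evalChisq`:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

table = np.array([[0.10 * 568, 0.90 * 568],     # program group: yes, no
                  [0.117 * 409, 0.883 * 409]])  # control group: yes, no

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof = (table.shape[0] - 1) * (table.shape[1] - 1)

# Critical value: chi-square must exceed this to reject H0 at alpha = 0.10
alpha = 0.10
critical = chi2.ppf(1 - alpha, dof)

# correction=False disables Yates' continuity correction, matching evalChisq
stat, p_value, dof_check, expected = chi2_contingency(table, correction=False)

print('dof = {}, critical value = {:.2f}'.format(dof, critical))
print('chisq = {:.3f}, p = {:.3f}'.format(stat, p_value))
```

For a 2 × 2 table without continuity correction, this $\chi^2$ equals the square of the pooled two-proportion z statistic computed with the program group's n = 568 and the control group's n = 409 ($0.848^2 \approx 0.72$). The $\chi^2$ p-value is two-sided, so it is about double the one-tailed z-test p-value.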



## Conclusion

#### Convicted of a Felony

For both the Z-test and the $\chi^2$ test the **Null hypothesis stands**. This means we cannot confidently say that the percent of former inmates convicted of a felony after participating in the rehabilitation program ($P_0$) is *lower* than the percent of inmates who did not participate in the program ($P_1$).

#### Employed after Release

This is not the case for the percent of former prisoners employed after release. Here the Null hypothesis is that the percent of former prisoners employed after release who took part in the program is the *same as or lower than* the percent of those who did not participate in the program. Both tests reject it: a z-score of 20.77 (a p-value below 0.0001) and a $\chi^2$ score of 436.22 easily pass the significance level of 0.05 set before the tests. Thus, the **Null hypothesis is rejected**.
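A printed table cannot resolve p-values this small. A minimal sketch that converts the reported employment statistics directly, assuming SciPy is available and assuming the employment outcome also comes from a 2 × 2 table (one degree of freedom):

```python
from scipy.stats import norm, chi2

# Reported statistics for the employment outcome (from the text above)
z_employed = 20.77
chisq_employed = 436.22
alpha = 0.05

# Upper-tail p-values: H_a is that program participants are employed MORE often
p_z = norm.sf(z_employed)                    # 1 - Phi(z)
p_chisq = chi2.sf(chisq_employed, df=1)      # assumes a 2 x 2 table

print('z-test p-value: {:.2e}'.format(p_z))
print('chi-square p-value: {:.2e}'.format(p_chisq))
print('reject H0' if max(p_z, p_chisq) < alpha else 'fail to reject H0')
```

Both p-values come out many orders of magnitude below 0.0001, which is why the table lookup bottoms out.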