In [1]:
from __future__ import print_function, division
import numpy as np

# CEO Transitional Jobs Tests

**Hypotheses Statements: CEO Transitional Jobs**

*Null Hypothesis:* The % of former prisoners employed in CEO transitional jobs after release is the same or lower for candidates who participated in the program than for the control group, at a significance level of alpha = 0.05.

*H0 Equation:* proportion employed in CEO jobs in program group <= proportion employed in CEO jobs in control group

*Alternative Hypothesis:* The % of former prisoners employed in CEO transitional jobs after release is higher for candidates who participated in the program than for the control group, at a significance level of alpha = 0.05.

*Ha Equation:* proportion employed in CEO jobs in program group > proportion employed in CEO jobs in control group

In [2]:
# Significance level
alpha=0.05
# Proportion of control group employed in CEO job
p_0 = 3.5*0.01 
# Proportion of program group employed in CEO job
p_1 = 70.1*0.01

if p_0-p_1 >= 0:
    print ("the Null holds")
else:
    print ("we must assess the statistical significance")

# Sample size of control group
n_0 = 409
# Sample size of program group
n_1 = 564

# Count of employed in control group
Nt_0 = p_0 * n_0
# Count of employed in program group
Nt_1 = p_1 * n_1

we must assess the statistical significance


## Z-Test

In [3]:
# Sample proportion (pooled proportion)
sp = (p_0 * n_0 + p_1 * n_1) / (n_1 + n_0)
print (sp)

0.421047276465


In [4]:
def sp_stdev(p, n):
    return(np.sqrt(p * (1 - p) / n[0] +  p * (1 - p) / n[1]))

sp_stdev_2y = sp_stdev(( Nt_0 + Nt_1) / (n_0 + n_1), [n_0, n_1])
print (p_0, n_0, n_1, sp_stdev_2y)

0.035 409 564 0.0320658086057


In [5]:
def zscore(p0, p1, s):
    return((p0 - p1) / s)

z_2y = zscore(p_1, p_0, sp_stdev_2y)
print (z_2y)

20.7697865408
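
The three cells above (pooled proportion, pooled standard error, z-score) can be folded into one helper. This is a minimal sketch rather than the notebook's own code; the function name `two_proportion_z` and its argument names are hypothetical, and it assumes the same pooled-variance formula used in `sp_stdev`.

```python
import math

def two_proportion_z(p_program, n_program, p_control, n_control):
    """Pooled two-proportion z-statistic (program minus control)."""
    # Pooled proportion: total successes over total sample size
    pooled = (p_program * n_program + p_control * n_control) / float(n_program + n_control)
    # Standard error of the difference in proportions under the null
    se = math.sqrt(pooled * (1 - pooled) * (1.0 / n_program + 1.0 / n_control))
    return (p_program - p_control) / se

# CEO transitional jobs: 70.1% of 564 (program) vs 3.5% of 409 (control)
z = two_proportion_z(0.701, 564, 0.035, 409)
print(round(z, 2))
```

Keeping the program group as the first argument means a positive z supports the alternative hypothesis stated above.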


In [6]:
# The z-value of 20.8 is larger than what's given in the z-table, so we'll take the area value for the highest available value, 0.9998
p_2y = 1 - 0.9998


def report_result(p,a):
    print ('is the p value {0:.2f} smaller than the critical value {1:.2f}? '.format(p,a))
    if p < a:
        print ("YES!")
    else: 
        print ("NO!")
    
    print ('the Null hypothesis is {}'.format( 'rejected' if p < a  else 'not rejected') )

    
report_result(p_2y,alpha)

is the p value 0.00 smaller than the critical value 0.05? 
YES!
the Null hypothesis is rejected
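
When a z-value runs off the end of the printed table, the upper-tail area can also be computed directly. The sketch below uses `math.erfc` from the standard library; the helper name `upper_tail_p` is an assumption made for illustration, not something defined elsewhere in this notebook.

```python
import math

def upper_tail_p(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_table = 1 - 0.9998                    # table-based bound used in the cell above
p_exact = upper_tail_p(20.7697865408)   # z-statistic printed earlier
# The exact tail area is far smaller than the table bound
print(p_exact < p_table)
```

The table value is therefore a conservative upper bound on the p-value, and the decision to reject is unchanged.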


**Z-Statistic Conclusion: CEO Transitional Job:**

From our z-statistic, we obtained a p-value of at most 0.0002 from the z-table (0.00 when rounded to two decimal places). Since this is smaller than our alpha level of 0.05, we reject the null hypothesis and conclude that the % of former prisoners employed in CEO transitional jobs after release is higher for candidates who participated in the program than for the control group at a significance level of alpha = 0.05.

## Chi-Square Test

### The following tables were calculated by hand:

**Observed:**

|Employed|Yes|No|Total|
|:--------------:|:------:|:----------:|:------:|
|test sample|0.701*564 = 395.364|0.299*564 = 168.636|564|
|control sample|0.035*409 = 14.315|0.965*409 = 394.685|409|
|Total|409.679|563.321|973|

**Expected:**

|Employed|Yes|No|Total|
|:--------------:|:------:|:----------:|:------:|
|test sample|(564*409.679)/973 = 237.47|(564*563.321)/973 = 326.53|564|
|control sample|(409*409.679)/973 = 172.21|(409*563.321)/973 = 236.79|409|
|Total|409.7|563.32|973|
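
The hand calculation of the expected table (row total times column total, divided by the grand total) can be cross-checked in a few lines. This is only a sketch; `expected_counts` is a hypothetical helper name, and the observed counts are the fractional values from the table above.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = float(sum(row_totals))
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[0.701 * 564, 0.299 * 564],   # program group: employed, not employed
            [0.035 * 409, 0.965 * 409]]   # control group: employed, not employed

expected = expected_counts(observed)
for row in expected:
    print([round(x, 2) for x in row])
```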

In [7]:
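def chisqstat(N, values, expect_num):
    # Shortcut for a 2x2 table: N * (a*d - b*c)^2 / (product of the row and column totals);
    # expect_num is that product of the four marginal totals
    return(((values[0][0] * values[1][1] - values[0][1] * values[1][0])**2) * N / expect_num)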

Ntot = 973
expected_num = 564 * 409 * 409.7 * 563.32
sample_values = [[0.701 * 564, 0.299 * 564], [0.035 * 409, 0.965 * 409]]
 

print (chisqstat(Ntot,  sample_values, expected_num))

431.362687242
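
As a cross-check on the shortcut formula in `chisqstat`, the statistic can be computed from its definition, summing (observed - expected)^2 / expected over the four cells. The sketch below assumes the same fractional observed counts; `pearson_chi2` is a hypothetical name. Its result may differ from the value printed above in the second decimal place, because the cell above plugs in the rounded column totals 409.7 and 563.32.

```python
def pearson_chi2(observed):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = float(sum(row_totals))
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # expected count for this cell
            chi2 += (obs - exp) ** 2 / exp
    return chi2

observed = [[0.701 * 564, 0.299 * 564],   # program group
            [0.035 * 409, 0.965 * 409]]   # control group
chi2_sum = pearson_chi2(observed)
print(round(chi2_sum, 1))
```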


**Chi-Square Test Conclusion: CEO Transitional Jobs:**

A chi-square value of 431 is much larger than 6.63, the largest critical value listed in the chi-square table for one degree of freedom, which corresponds to the smallest listed p-value of 0.01. Our p-value is therefore below 0.01 and well below our alpha level of 0.05, so we reject the null hypothesis and conclude that the % of former prisoners employed in CEO transitional jobs after release is higher for candidates who participated in the program than for the control group at a significance level of alpha = 0.05.

# Felony Conviction Tests

**Hypotheses Statements: Convicted of a felony**

*Null Hypothesis:* Those in the program group have the same or higher rates of felonies over three years after the program than those in the control group, alpha = 0.05.

*H0 Equation:* proportion felonies in program group >= proportion felonies in control group

*Alternative Hypothesis:* Those in the program group have lower rates of felonies over three years following the program than those in the control group. 

*Ha Equation:* proportion felonies in program group < proportion felonies in control group

In [8]:
# Significance level
alpha=0.05
# Proportion of control group convicted of a felony
p_0 = 11.7*0.01 
# Proportion of program group convicted of a felony
p_1 = 10.0*0.01

# The null (program >= control) is consistent with the sample only if p_1 - p_0 >= 0
if p_1-p_0 >= 0:
    print ("the Null holds")
else:
    print ("we must assess the statistical significance")

# Sample size of control group (with Recidivism data)
n_0 = 409
# Sample size of program group (with Recidivism data)
n_1 = 568

# Count of felony convicts in control group
Nt_0 = p_0 * n_0
# Count of felony convicts in program group
Nt_1 = p_1 * n_1

we must assess the statistical significance


## Z-Test

In [9]:
# Sample proportion (pooled proportion)
sp = (p_0 * n_0 + p_1 * n_1) / (n_1 + n_0)
print (sp)

0.107116683726


In [10]:
def sp_stdev(p, n):
    return(np.sqrt(p * (1 - p) / n[0] +  p * (1 - p) / n[1]))

sp_stdev_2y = sp_stdev(( Nt_0 + Nt_1) / (n_0 + n_1), [n_0, n_1])
print (p_0, n_0, n_1, sp_stdev_2y)

0.117 409 568 0.0200556791612


In [11]:
def zscore(p0, p1, s):
    return((p0 - p1) / s)

z_2y = zscore(p_1, p_0, sp_stdev_2y)
print (abs(z_2y))
# Absolute value used because our z-table only contains positive values
# Could have used the negative value and just not subtracted from 1 in the next step

0.84764020522
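
The comment above notes that the negative z-value could be used directly instead of taking the absolute value. A short sketch of that equivalence follows, using the standard normal CDF written with `math.erfc`; the helper name `normal_cdf` is an assumption for illustration. Because the result is computed rather than read from a table, it may differ slightly from a table lookup.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = -0.84764020522                        # program minus control, before abs()
p_lower = normal_cdf(z)                   # lower tail of the negative z
p_upper_of_abs = 1 - normal_cdf(abs(z))   # upper tail of |z|
print(round(p_lower, 3), round(p_upper_of_abs, 3))
```

By symmetry of the normal distribution the two numbers agree, so either route gives the same one-sided p-value.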


In [12]:
# Our z-statistic is about 0.85. The two table areas averaged below, 0.8023 and 0.8289, are the entries for z = 0.85 and z = 0.95, so the resulting p-value is only a rough approximation.
area = (0.8289 + 0.8023)/2
p_2y = 1 - area


def report_result(p,a):
    print ('is the p value {0:.2f} smaller than the critical value {1:.2f}? '.format(p,a))
    if p < a:
        print ("YES!")
    else: 
        print ("NO!")
    
    print ('the Null hypothesis is {}'.format( 'rejected' if p < a  else 'not rejected') )

    
report_result(p_2y,alpha)

is the p value 0.18 smaller than the critical value 0.05? 
NO!
the Null hypothesis is not rejected


**Z-Statistic Conclusion: Convicted of a felony:**

From our z-statistic, we obtained a p-value of 0.18 from the z-stat table. 0.18 is greater than our alpha level of 0.05, so we fail to reject the null hypothesis that those in the program group have the same or higher rates of felonies over three years after the program than those in the control group.

## Chi-Square Test

In [13]:
#Contingency table calculations
a = 568 * 0.1
b = 568 * (1 - 0.1)
c = 409 * 0.117
d = 409 * (1 - 0.117)
print(a, b, c, d)

56.8 511.2 47.853 361.147


In [14]:
rowtot = a + c
coltot = b + d
Ntot = 977
rowtot

104.653

In [15]:
coltot

872.347

**Observed:**

|Convicted felony|Yes|No|Total|
|:--------------:|:------:|:----------:|:------:|
|test sample|0.1*568 = 56.8|0.9*568 = 511.2|568|
|control sample|0.117*409 = 47.853|0.883*409 = 361.147|409|
|Total|104.653|872.347|977|

**Expected:**

|Convicted felony|Yes|No|Total|
|:--------------:|:------:|:----------:|:------:|
|test sample|(568*104.653)/977 = 60.84|(568*872.347)/977 = 507.16|568|
|control sample|(409*104.653)/977 = 43.81|(409*872.347)/977 = 365.19|409|
|Total|104.653|872.347|977|

In [16]:
def chisqstat(N, values, expect_num):
    return(((values[0][0] * values[1][1] - values[0][1] * values[1][0])**2) * N / expect_num)

Ntot = 977
expected_num = 568 * 409 * 104.653 * 872.347
sample_values = [[0.1 * 568, 0.9 * 568], [0.117 * 409, 0.883 * 409]]
 

print (chisqstat(Ntot,  sample_values, expected_num))

0.718493917505
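
For a 2x2 table, the chi-square statistic equals the square of the pooled two-proportion z-statistic, which gives a quick consistency check between the two tests. The sketch below recomputes both from the felony data; the variable names are chosen for illustration and are not the notebook's.

```python
import math

n1, n0 = 568, 409        # program and control sample sizes
p1, p0 = 0.100, 0.117    # felony conviction proportions

# Pooled two-proportion z-statistic (program minus control)
pooled = (p1 * n1 + p0 * n0) / float(n1 + n0)
z = (p1 - p0) / math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n0))

# 2x2 shortcut chi-square: N * (a*d - b*c)^2 / (product of the marginal totals)
a, b = p1 * n1, (1 - p1) * n1
c, d = p0 * n0, (1 - p0) * n0
chi2 = (a * d - b * c) ** 2 * (n1 + n0) / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(z ** 2, 4), round(chi2, 4))
```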


In [17]:
chisq_felonies = 0.718
# Chi-square value calculated by hand, matches above value

In [18]:
# Degrees of freedom = 1, since df = (rows - 1) * (columns - 1) for a contingency table, in this case (2 - 1) * (2 - 1)
# Our chi-square value of 0.72 falls between the critical values 0.455 and 1.32 in the table, giving a p-value between 0.25 and 0.5.
# For simplicity's sake, we'll take the midpoint of this range and use a p-value of 0.375.

# P-value
p_chi = 0.375


def report_result(p,a):
    print ('is the p value {0:.2f} smaller than the critical value {1:.2f}? '.format(p,a))
    if p < a:
        print ("YES!")
    else: 
        print ("NO!")
    
    print ('the Null hypothesis is {}'.format( 'rejected' if p < a  else 'not rejected') )

    
report_result(p_chi,alpha)

is the p value 0.38 smaller than the critical value 0.05? 
NO!
the Null hypothesis is not rejected
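
Instead of taking the midpoint of the table range, the p-value for one degree of freedom can be computed directly, since a chi-square variable with df = 1 is the square of a standard normal. This is a sketch using only the standard library; `chi2_sf_df1` is a hypothetical helper name and is valid only for df = 1.

```python
import math

def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

p = chi2_sf_df1(0.718)     # chi-square statistic from the cells above
print(0.25 < p < 0.5)      # consistent with the table range used above
```

Because the chi-square statistic ignores the sign of the difference, this p-value corresponds to a two-sided test.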


**Chi-Square Test Conclusion: Convicted of a felony:**

From our chi-square value, we obtained a p-value between 0.25 and 0.5 from the chi-square table. This range is greater than our alpha level of 0.05, so we fail to reject the null hypothesis that those in the program group have the same or higher rates of felonies over three years after the program than those in the control group.