## Assignment 4
### Author: Olaf Placha

**(Chi-square independence test).** 
You are given the results of IPSOS exit polls for the 2015 parliamentary elections in Poland in the table **data**. Decide whether we can assume that gender has no effect on voting preferences. To this end:
 * Compute row totals $r_i$, column totals $c_j$, and overall total $N$.
 * If the variables are independent, we expect to see $f_{ij} = r_i c_j / N$ in the $i$-th row and $j$-th column.
 * Compute the test statistic as before, i.e. $$ S = \sum_{ij} \frac{\left(f_{ij}-X_{ij}\right)^2}{f_{ij}},$$ where $X_{ij}$ is the observed count in the $i$-th row and $j$-th column.
 * Again, compare against the $\chi^2$ CDF. However, if the variables are independent, there are only $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns (the expected counts are fully determined by the row and column totals).
 * One obvious offender is the KORWiN party; try removing the last column and repeating the experiment.
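
The steps above can be checked by hand on a small example. The sketch below uses a made-up 2 x 2 table `X`, which is hypothetical and not part of the assignment data. For this table the expected counts are $f = \begin{pmatrix}12 & 18\\ 28 & 42\end{pmatrix}$ and $S = 4/12 + 4/18 + 4/28 + 4/42 = 50/63$.

```python
import numpy as np

# Hypothetical 2x2 table of observed counts X_ij, made up for illustration
X = np.array([[10, 20],
              [30, 40]])

N = X.sum()              # overall total: 100
r = X.sum(axis=1)        # row totals r_i: [30, 70]
c = X.sum(axis=0)        # column totals c_j: [40, 60]

# Expected counts under independence: f_ij = r_i * c_j / N
f = np.outer(r, c) / N   # [[12, 18], [28, 42]]

# Test statistic S = sum_ij (f_ij - X_ij)^2 / f_ij
S = ((f - X) ** 2 / f).sum()

# Degrees of freedom (r - 1)(c - 1), with r, c the numbers of rows and columns
df = (X.shape[0] - 1) * (X.shape[1] - 1)

print(f)
print(S, df)   # S = 50/63 (about 0.7937), df = 1
```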
 
**Note:** This kind of data is (to the best of our knowledge) not available online. It has been recreated from infographics and other tidbits of information found online. It is certainly not completely accurate, but hopefully it is not far off. Moreover, exit polls do not necessarily reflect the actual distribution of the population.

```python
import numpy as np
from scipy import stats
# Rows: women, men
# Columns: PiS, PO, Kukiz, Nowoczesna, Lewica, PSL, Razem, KORWiN
# data = np.array([ [39.7,26.4, 7.5,7.1,6.6,5.0,4.2,2.8], 
#                   [38.5,20.3,10.6,7.1,6.6,5.4,3.5,7.1]])
data = np.array([[ 17508, 11642,  3308,  3131,  2911,  2205,  1852, 1235],
                 [ 17672,  9318,  4865,  3259,  3029,  2479,  1606, 3259]])
```

Null hypothesis $H_0$: gender and voting preferences are independent. <br/>
Alternative hypothesis $H_a$: gender has an effect on voting preferences.
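
The assignment does not fix a significance level, so the sketch below assumes the common choice $\alpha = 0.05$. It shows two equivalent ways to turn the test into a decision. The first compares the p-value with $\alpha$. The second compares $S$ with the $(1-\alpha)$ quantile of the $\chi^2$ distribution. The helpers `decide` and `critical_value` are hypothetical names used only for illustration.

```python
from scipy import stats

ALPHA = 0.05  # assumed significance level; the assignment does not fix one

def decide(p_value, alpha=ALPHA):
    """Hypothetical helper: reject H0 when the p-value falls below alpha."""
    return "reject H0" if p_value < alpha else "cannot reject H0"

def critical_value(df, alpha=ALPHA):
    """Hypothetical helper: reject H0 when S exceeds this (1 - alpha) quantile."""
    return stats.chi2.ppf(1 - alpha, df)

print(critical_value(7))   # about 14.07, for a 2 x 8 table
print(critical_value(2))   # about 5.99, for a 2 x 3 table
print(decide(0.01), "/", decide(0.5))
```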

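```python
def computeChi2(data):
    """
    Return the chi^2 statistic S = sum_ij (f_ij - X_ij)^2 / f_ij for a 2D contingency table.
    """
    N = data.sum()                 # overall total
    rowSums = data.sum(axis=1)     # row totals r_i
    colSums = data.sum(axis=0)     # column totals c_j
    chi2 = 0.0

    nRows, nCols = data.shape
    for i in range(nRows):
        for j in range(nCols):
            # expected count in cell (i, j) under independence
            fij = rowSums[i] * colSums[j] / N
            chi2 += (fij - data[i, j]) ** 2 / fij

    return chi2
```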
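
The double loop can also be written with NumPy broadcasting. The sketch below defines a hypothetical vectorized variant, `computeChi2Vectorized`, which is not part of the original notebook. It then cross-checks the result against `scipy.stats.chi2_contingency`. That function computes the same Pearson statistic and also returns its p-value, the degrees of freedom and the expected table.

```python
import numpy as np
from scipy import stats

def computeChi2Vectorized(data):
    """
    Hypothetical vectorized variant of computeChi2: same statistic, no Python loops.
    """
    data = np.asarray(data, dtype=float)
    N = data.sum()
    # Expected counts f_ij = r_i * c_j / N for all cells at once
    expected = np.outer(data.sum(axis=1), data.sum(axis=0)) / N
    return ((expected - data) ** 2 / expected).sum()

data = np.array([[17508, 11642, 3308, 3131, 2911, 2205, 1852, 1235],
                 [17672,  9318, 4865, 3259, 3029, 2479, 1606, 3259]])

# Cross-check against SciPy. correction=False disables the Yates continuity
# correction, which SciPy would only apply with 1 degree of freedom anyway.
chi2_scipy, p_scipy, dof, expected = stats.chi2_contingency(data, correction=False)
print(computeChi2Vectorized(data), chi2_scipy, dof)
```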

```python
def chi2IndependenceTest(data):
    """
    Perform the chi^2 independence test on a 2D contingency table and return the p-value,
    i.e. the probability under H0 of a statistic at least as large as the observed one.
    """
    nRows, nCols = data.shape
    df = (nRows - 1) * (nCols - 1)          # degrees of freedom
    chi2 = computeChi2(data)
    score = 1 - stats.chi2.cdf(chi2, df)    # upper-tail probability (p-value)
    return score
```
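
A remark on precision: for a large statistic, `stats.chi2.cdf` rounds to exactly 1.0 in double precision. As a result, `1 - stats.chi2.cdf(...)` returns 0.0 even though the true upper-tail probability is positive. SciPy's survival function `stats.chi2.sf` computes the upper tail directly and keeps far more precision, although it can still underflow to 0.0 for extremely large statistics. The minimal illustration below uses an arbitrary statistic of 200 with 7 degrees of freedom.

```python
from scipy import stats

# Arbitrary example values: a large statistic with 7 degrees of freedom
x, df = 200.0, 7

# The CDF rounds to 1.0, so the complement collapses to exactly 0.0
print(1 - stats.chi2.cdf(x, df))   # 0.0
# The survival function computes the upper tail directly: tiny but positive
print(stats.chi2.sf(x, df))
```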

Now let's test our data

```python
chi2IndependenceTest(data)
```

```
0.0
```

The p-value is 0 to floating-point precision. This is strong evidence against $H_0$, i.e. gender does affect voting preferences. Now let's remove the KORWiN party and perform the test once again:

```python
chi2IndependenceTest(data[:, :-1])
```

```
0.0
```

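The conclusion is the same as before. However, if we consider only PiS, Nowoczesna and Lewica, the p-value is large enough that we cannot reject $H_0$ at any conventional significance level (e.g. 0.05). Among voters of these three parties, the data are therefore consistent with gender having no effect on voting. Note that this subset was chosen after looking at the data, so this is an exploratory observation rather than a formal test.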

```python
chi2IndependenceTest(data[:, [0, 3, 4]])
```

```
0.34393661614092386
```
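
To see which parties drive the statistic, one can split $S$ into per-column contributions by summing $(f_{ij}-X_{ij})^2/f_{ij}$ over the two gender rows. By construction these contributions add up to $S$. The sketch below is minimal; the `parties` list simply mirrors the column comment in the data cell.

```python
import numpy as np

parties = ["PiS", "PO", "Kukiz", "Nowoczesna", "Lewica", "PSL", "Razem", "KORWiN"]
data = np.array([[17508, 11642, 3308, 3131, 2911, 2205, 1852, 1235],
                 [17672,  9318, 4865, 3259, 3029, 2479, 1606, 3259]])

N = data.sum()
# Expected counts f_ij = r_i * c_j / N
expected = np.outer(data.sum(axis=1), data.sum(axis=0)) / N
# Per-cell terms (f_ij - X_ij)^2 / f_ij, summed over the two gender rows
contributions = ((expected - data) ** 2 / expected).sum(axis=0)

# Parties sorted from the largest to the smallest share of the statistic
for idx in np.argsort(contributions)[::-1]:
    print(f"{parties[idx]:>11}: {contributions[idx]:8.2f}")
```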