Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [112]:
NAME = "Hegyi Gáspár András"
COLLABORATORS = ""

---

# Assignment 1: Fisher's and $\chi^2$-test

In [113]:
# setup
import pandas as pd
import numpy as np
import scipy as sp
import scipy.stats
import math

def assert_approx_equal(value1,value2):
    assert np.isclose(value1,value2)

A web authoring company uses A/B testing to select the best website design. A/B testing involves creating multiple versions (*A* and *B*) of the website in a campaign to test their effectiveness. Marketers split their audience and direct them to different versions to determine which performs better.


**Workflow of A/B testing:**
1. **Hypothesis:** Formulate a hypothesis about changes that might improve the click-through rate.
2. **Variations:** Create two versions of the original site with specific changes.
3. **Traffic Split:** Divide incoming traffic equally between the two website versions.
4. **Testing Duration:** Run the test until statistically significant results are obtained.
5. **Data Analysis:** Analyze the data to determine which version performs better.

When comparing two phenomena (like the efficacy of drugs), we use statistical tests. For comparing, e.g., means of two continuous variables, Student's t-test is used. When we observe counts of occurrences of phenomena, we usually use the $\chi^2$-test. 
However, Fisher's exact test is more suitable when the observed counts are small and in the form of a four-field contingency table. 

A/B testing allows the company to take a scientific approach to marketing.
The company prepared two versions of the website, denoted as *A* and *B*. Users who visit the web page are shown one of the two new versions of the page. We will compare the efficacy of the versions *A* and *B*. We observed the number of clicks within the site.
The following contingency table summarizes the results. In the table, $N$ denotes the total number of visitors, $a$ is the number of visitors who saw version *A* and followed a link within *A*, $b$ is the number of visitors of version *A* who did not follow any link within the website, ...


#### Table 1
|          | Click YES | Click NO | | Row sum |	
|----------|-----------|----------|-|---------|
| Sample A |   $a$     |   $b$    | | $a+b$   |
| Sample B |   $c$     |   $d$    | | $c+d$   |
|__________|___________|__________| |_________|
| Sum      |  $a+c$    |  $b+d$   | | $N$     |

At first, we must formulate the so-called null hypothesis.

> Samples $A$ and $B$ were taken from the same probability distribution, and the differences between them
> were caused by chance only. In other words, the efficacies of both versions $A$ and $B$ are the same. 

## Task 1 
Assuming that the null hypothesis is true, **derive the formula for computing the probability** that the table with results will have the same values as in *Table 1* for given values $a$, $b$, $c$, and $d$ ($N=a+b+c+d$). 

**Remarks:**

* Stating that the probability corresponds to the hypergeometric probability distribution is insufficient. You should explain the meaning of all binomial coefficients or factorials in the formulas you use!
* You can write the complete derivation of the formula for the probability of *Table 1* by hand and submit it as a scanned picture (or an image captured by, e.g., a mobile phone) in a separate file.
* You can get at most **3 points** for this part.

**Here, you can write your answer or the file name where the derivation is.**

1. Assume the null hypothesis: samples A and B are taken from the same probability distribution. The question then becomes: what is the probability that exactly `a` visitors of website `A` clicked, given the marginal sums? Since the marginal sums are fixed, the value `a` determines the other table values (`b`, `c`, `d`).

2. Suppose the visitors are labeled. If all orderings are equally likely, there are $N!$ possible orderings of the visitors, where the first $a+b$ positions are shown website `A` and the remaining $c+d$ positions are shown website `B`.

3. Count the orderings in which exactly `a` visitors of website `A` clicked. In total, $a+c$ visitors clicked, and `a` of them must belong to website `A`. They can be chosen in $\binom{a+c}{a}$ ways.

4. By the same idea, the `b` visitors of website `A` who did not click can be chosen from the $b+d$ visitors who did not click in $\binom{b+d}{b}$ ways.

5. The $a+b$ visitors chosen for website `A` are labeled, so there are $(a+b)!$ ways of ordering them. Similarly, there are $(c+d)!$ ways of ordering the visitors of website `B`.

6. The probability is the number of favorable orderings (the product of the combinations and permutations above) divided by the number of all orderings. This is the hypergeometric distribution in the variable `a`.  
$$P(a_{11} = a)=\frac{\binom{a+c}{a}\binom{b+d}{b}(a+b)!\,(c+d)!}{N!} = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{N}{a+b}}$$  

7. Writing $s_1,s_2$ for the column sums and $r_1,r_2$ for the row sums, and expanding the binomial coefficients, we get the following equation:
$$P(a_{11} = a)=\frac{s_1!\,s_2!\,r_1!\,r_2!}{N!\,a!\,b!\,c!\,d!} = \frac{\binom{r_1}{a}\binom{r_2}{c}}{\binom{N}{s_1}}$$
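The binomial form from step 6 and the factorial form from step 7 can be checked against each other numerically. The following is a minimal sketch using only the standard library and exact rational arithmetic; the function names `prob_binomial` and `prob_factorial` are hypothetical and serve only this illustration.

```python
from fractions import Fraction
from math import comb, factorial

def prob_binomial(a, b, c, d):
    """Table probability as C(a+c, a) * C(b+d, b) / C(N, a+b)."""
    n_total = a + b + c + d
    return Fraction(comb(a + c, a) * comb(b + d, b), comb(n_total, a + b))

def prob_factorial(a, b, c, d):
    """Table probability as (product of marginal factorials) / (N! a! b! c! d!)."""
    n_total = a + b + c + d
    numerator = (factorial(a + c) * factorial(b + d)
                 * factorial(a + b) * factorial(c + d))
    denominator = (factorial(n_total) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return Fraction(numerator, denominator)

# Both forms must agree exactly for any valid table of counts.
for cells in [(5, 5, 5, 5), (4, 10, 7, 3), (38, 5, 20, 9)]:
    assert prob_binomial(*cells) == prob_factorial(*cells)

print(float(prob_binomial(5, 5, 5, 5)))
```

Using `Fraction` avoids rounding, so the equality check is exact rather than approximate.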

## Task 2
In a campaign, the following counts were observed:

#### *Table 2*
|          | Click YES | Click NO | | Row sum |	
|----------|-----------|----------|-|---------|
| Sample A |   4       |   10     | | 14      |
| Sample B |   7       |   3      | | 10      |
|__________|___________|__________| |_________|
| Sum      |  11       |   13     | | 24      |

Implement function `table_probability(t)` that computes the probability of *Table 1* (`t` is a 2-by-2 numpy array with the values $a$, $b$, $c$ and $d$ in two rows and two columns) when assuming the null hypothesis is valid, using the formula you have derived in **Task 1**. 

In [114]:

def getMarginals(t):
    """
    Returns
    -------
    s1,s2,r1,r2
    """
    a,b,c,d = t[0,0],t[0,1],t[1,0],t[1,1]
    s1 = a+c
    s2 = b+d
    r1 = a+b
    r2 = c+d
    return s1,s2,r1,r2

def hypergeometric(k,N,K,n):
    # P(k successes in n draws without replacement from N items, K of which are successes)
    return math.comb(K,k)*math.comb(N-K,n-k)/math.comb(N,n)

def table_probability(t):
    # Probability of the 2x2 table t under the null hypothesis (fixed marginal sums)
    a = t[0,0]
    s1,s2,r1,r2 = getMarginals(t)
    N = int(np.sum(t))
    k = int(a)    # visitors of A who clicked
    K = int(r1)   # all visitors of A
    n = int(s1)   # all visitors who clicked
    return hypergeometric(k,N,K,n)

Using the function, calculate the probability of *Table 2*. Additionally, your implementation should pass all the tests below and some additional hidden tests (**1 point for this part**).

In [115]:
table = np.array([[5,5], [5,5]])
assert_approx_equal(table_probability(table),0.343718)

table_2 = np.array([[4, 10], [7, 3]])
print(f"{table_2} has probability {table_probability(table_2)}")


[[ 4 10]
 [ 7  3]] has probability 0.04812222371786243


## Task 3

The difference between versions *A* and *B* of the website is evident. Is this difference statistically significant? That is, assuming that both samples *A* and *B* are from the same probability distribution, what is the probability that two samples differ to the same or an even greater extent? If this probability is small, e.g., at most $\alpha=0.05$, we can state with high confidence $(1-\alpha)=0.95$ that the null hypothesis is not valid. Based on the marginal sums ($a+b$, $c+d$, $a+c$, and $b+d$), we can easily compute that the expected value of the field $a$ is $(a+b)(a+c)/N = 14\cdot 11/24 \approx 6.42$, while the observed value is $a=4$. The notion of "differing to the same or even greater extent" can be understood in two ways:

1. one-sided &ndash; only the values of $a$ that lie on the same side of the expected value as the observed one; in our case, the values 1, 2, 3, and 4 (the marginal sums exclude $a=0$), or
2. two-sided &ndash; all the values of $a$ such that $|a-6.42|\ge 6.42-4$; in our case, the values 1, 2, 3, 4, 9, 10, and 11.
  
In case 1, we use a one-tailed test; in case 2, we use a two-tailed test.
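The two readings of "more extreme" can be made concrete with a short sketch. It is illustrated on the table `[[38, 5], [20, 9]]`, which also appears in the tests of Task 3. The function name `extreme_values` is hypothetical, and the rule for a table that matches the expected count exactly (use the lower side) is an assumption taken from the task description further on.

```python
def extreme_values(a, b, c, d):
    """Return the expected upper-left count and the one-sided and
    two-sided sets of 'at least as extreme' values of that count."""
    r1, r2, s1 = a + b, c + d, a + c
    n_total = r1 + r2
    expected = r1 * s1 / n_total
    # Feasible values of the upper-left cell for fixed marginal sums.
    feasible = range(max(0, s1 - r2), min(r1, s1) + 1)
    if a > expected:
        one_sided = [k for k in feasible if k >= a]
    else:  # below the expected value, or exactly equal to it
        one_sided = [k for k in feasible if k <= a]
    distance = abs(a - expected)
    two_sided = [k for k in feasible if abs(k - expected) >= distance]
    return expected, one_sided, two_sided

expected, one_sided, two_sided = extreme_values(38, 5, 20, 9)
print(expected, one_sided, two_sided)
```

Note that this sketch measures extremeness by distance from the expected count. A two-tailed Fisher's test is usually defined through probabilities instead (tables no more probable than the observed one), and the two definitions need not give the same set for every table.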

Answer the following question (**1 point**)

> **Question:** In general (with sufficient data), which of the four combinations of tests {one-tailed, two-tailed}×{Fisher's test, $\chi^2$-test} are meaningful?

**Your answer goes here.**

YOUR ANSWER HERE

Using function `table_probability`, implement the function `Fisher_p_value(table, alternative)` that for the contingency table `table` of dimension 2 $\times$ 2 computes the p-value of Fisher's test. The parameter `alternative` can have only two values:
1. For `alternative` equal to 'two-tailed', the function returns the p-value of the two-tailed Fisher's test.
2. For `alternative` equal to 'one-tailed', the function returns the p-value of the one-tailed Fisher's test, i.e., the probability that 
   the observed counts are as seen in `table` or more extreme, that is, further from the expected counts *in the same 
   direction* as `table`. If `table` contains exactly the expected counts, we consider the direction toward values in the upper left corner less than or equal to $a$.
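The rule described above can be sketched without any third-party dependency. This is a minimal illustration under stated assumptions, not the interface requested by the assignment: the name `fisher_p_value_sketch` and the four scalar arguments are hypothetical, and the two-tailed p-value is taken as the sum over all tables that are no more probable than the observed one. Probabilities are compared as exact integer weights, so ties are not affected by rounding.

```python
from math import comb

def fisher_p_value_sketch(a, b, c, d, alternative="two-tailed"):
    """p-value of Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, s1 = a + b, c + d, a + c
    n_total = r1 + r2
    # Feasible upper-left counts for fixed marginal sums.
    low, high = max(0, s1 - r2), min(r1, s1)
    # Integer weights proportional to the hypergeometric probabilities.
    weight = {k: comb(r1, k) * comb(r2, s1 - k) for k in range(low, high + 1)}
    total = comb(n_total, s1)  # equals the sum of all weights

    if alternative == "two-tailed":
        chosen = [w for w in weight.values() if w <= weight[a]]
    elif alternative == "one-tailed":
        # a > expected  <=>  a * N > r1 * s1 (integer comparison, no rounding)
        if a * n_total > r1 * s1:
            chosen = [w for k, w in weight.items() if k >= a]
        else:  # below the expected count, or exactly equal to it
            chosen = [w for k, w in weight.items() if k <= a]
    else:
        raise ValueError("alternative must be 'two-tailed' or 'one-tailed'")
    return sum(chosen) / total

print(fisher_p_value_sketch(5, 5, 5, 5, "one-tailed"))
print(fisher_p_value_sketch(38, 5, 20, 9, "two-tailed"))
```

Working with integer weights and dividing only once at the end is a design choice that keeps the comparison `w <= weight[a]` exact for symmetric tables.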

In [116]:
def tableProbabilities(t):
    marginals= getMarginals(t)
    minMarginal = np.argmin(marginals)
    s1,s2,r1,r2 = marginals
    if(minMarginal == 1):
        t = t[:,[1,0]]
        s1,s2 = s2,s1
    elif(minMarginal == 3):
        t = t[[1,0]]
        r1,r2 = r2,r1
    N = np.sum(t)
    return np.array([table_probability(np.array([[k,r1-k],[s1-k,N+k-s1-r1]])) for k in range(0,marginals[minMarginal]+1)])

def Fisher_p_value(table, alternative='two-tailed'):
    if (table < 0).any():
        raise ValueError("Table cannot contain negative values!")
    allProbabilities = tableProbabilities(table)
    tableProbability = table_probability(table)
    if alternative == 'two-tailed':
        return sum(allProbabilities[np.where(allProbabilities<=tableProbability)])
    ## Else get the minimum value of the array, then check its expected value, and based on that 
    ## decide the direction of the sums
        
    # return p-value of the Fisher's exact test for the contingency table table
    # For alternative equal 'two-tailed', it returns the p-value of the two-sided test    
    # For alternative equal 'one-tailed', it returns the p-value of the one-sided test
    # Implement the test using the function table_probability().
    # YOUR CODE HERE


In [117]:
minPos = np.argwhere(table == table.min())
minPos[0]

array([0, 0], dtype=int64)

Now, we can compute p-values for both alternatives of Fisher's test for *Table 2*. In the following cell, the function `Fisher_p_value` will be tested (including some hidden tests) and applied to `table_2` (**2 points** of the score).

In [118]:
t = np.array([[5,5], [5,5]])
alternative = 'two-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert_approx_equal(p, 1.0)

print('_'*30)

t = np.array([[5,5], [5,5]])
alternative = 'one-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert_approx_equal(p, 0.6718591)

print('='*20)

t = np.array([[38,5], [20,9]])
alternative = 'one-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert_approx_equal(p, 0.042128934)

alternative = 'two-tailed'
p = Fisher_p_value(t, alternative)
print(f"p-value of {alternative} Fisher's test for the table\n{t=}\n is {p}")
assert_approx_equal(p, 0.0667809135)

print("="*60 + "\n" + "="*60)
one_tailed_Fisher_p_value = Fisher_p_value(table_2, 'one-tailed')
two_tailed_Fisher_p_value = Fisher_p_value(table_2, 'two-tailed')
print(f"For the table\n{table_2}")
print(f"The p-value of the Fisher's one-tailed test is {one_tailed_Fisher_p_value}")
print(f"The p-value of the Fisher's two-tailed test is {two_tailed_Fisher_p_value}")


p-value of two-tailed Fisher's test for the table
t=array([[5, 5],
       [5, 5]])
 is 1.0
______________________________
p-value of one-tailed Fisher's test for the table
t=array([[5, 5],
       [5, 5]])
 is None


TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Ignoring the requirement that the $\chi^2$-test can be used only if all expected counts in the contingency table are at least 5, compute the $\chi^2$ statistic and its corresponding p-value for *Table 2* for all meaningful combinations of the $\chi^2$-test.

To compute the $\chi^2$-test, use suitable functions from the Python `scipy` module like `scipy.stats.chi2.pdf()`, `scipy.stats.chi2.cdf()`, `scipy.stats.chi2.sf()`, and `scipy.stats.chi2.isf()`. Your code should end by storing the value of the $\chi^2$ statistic of any of the meaningful combinations in the variable `x2_stat` and its corresponding p-value in the variable `x2_p_value`. 
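As a cross-check for such a computation, here is a dependency-free sketch of Pearson's statistic for a 2 $\times$ 2 table. It is not the requested `scipy`-based solution: the function name `chi2_2x2_sketch` is hypothetical, all marginal sums are assumed to be positive, and the p-value relies on the fact that for one degree of freedom the survival function of the $\chi^2$ distribution equals $\mathrm{erfc}(\sqrt{x/2})$, which is the quantity that `scipy.stats.chi2.sf(x, 1)` evaluates.

```python
from math import erfc, sqrt

def chi2_2x2_sketch(a, b, c, d):
    """Pearson's chi-square statistic and its p-value (1 degree of freedom)
    for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n_total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / n_total
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = chi2_2x2_sketch(4, 10, 7, 3)
print(stat, p_value)
```

The upper tail of the $\chi^2$ distribution is used because the statistic sums squared deviations, so deviations in either direction increase it.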

In [None]:
# here goes your code for computing  χ2-test using suitable functions 
# from the Python scipy module like scipy.stats.chi2.cdf(), 
# scipy.stats.chi2.sf() and scipy.stats.chi2.isf()
# BUT NOT scipy.stats.chi2_contingency()

x2_stat = ...
x2_p_value = ...
# YOUR CODE HERE
raise NotImplementedError()

NotImplementedError: 

In the following cell, your results (`x2_stat` and `x2_p_value`) will be evaluated (**1 point**).

In [None]:
# here, your solution will be evaluated
print(f"{x2_stat=}")
print(f"{x2_p_value=}")


Of course, `scipy` contains functions for computing Fisher's exact test and χ2-test. Compute the above tests for *Table 2* using the functions `scipy.stats.fisher_exact()` and `scipy.stats.chi2_contingency()`  (**1 point**).
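A minimal sketch of how these two library functions might be called for *Table 2* follows. It assumes `numpy` and `scipy` are installed; the variable names are illustrative. Note that `scipy.stats.chi2_contingency()` applies Yates' continuity correction by default for a 2 $\times$ 2 table (`correction=True`), so the uncorrected statistic has to be requested explicitly.

```python
import numpy as np
from scipy import stats

table_2 = np.array([[4, 10], [7, 3]])

# Fisher's exact test: two-sided, and one-sided in the direction of a
# smaller upper-left count than expected ("less").
odds_ratio, p_two_sided = stats.fisher_exact(table_2, alternative="two-sided")
_, p_less = stats.fisher_exact(table_2, alternative="less")

# Chi-square test with the default Yates correction and without it.
chi2_corr, p_corr, dof, expected = stats.chi2_contingency(table_2)
chi2_raw, p_raw, _, _ = stats.chi2_contingency(table_2, correction=False)

print(f"Fisher two-sided p = {p_two_sided}, one-sided (less) p = {p_less}")
print(f"chi2 with Yates correction = {chi2_corr}, p = {p_corr}")
print(f"chi2 without correction    = {chi2_raw}, p = {p_raw}")
```

The continuity correction is one likely source of differences between a hand-written $\chi^2$ computation and the library result.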

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## Final evaluation

1. State explicitly whether we can or cannot reject the null hypothesis for each test you have performed for *Table 2*.
2. Further, compare the results obtained with your implementation of the tests and the results obtained when using the functions 
   `scipy.stats.fisher_exact()` and `scipy.stats.chi2_contingency()`. If there are any differences, explain them (**1 point**). 

**Your answer goes here.**

YOUR ANSWER HERE