## Null Hypothesis
    In the classical setup, we have a null hypothesis, H_0, that represents some default position, and some alternative hypothesis, H_1, that we’d like to compare it with. We use statistics to decide whether we can reject H_0 as false or not.

In [6]:
from typing import Tuple
import math

def normal_approximation_to_binomial(n: int, p: float) -> Tuple[float, float]:
    """Returns mu and sigma corresponding to a Binomial(n, p)"""
    mu = p * n
    sigma = math.sqrt(p * (1 - p) * n)
    return mu, sigma

Estimate of the simulation

In [7]:
from scratch.probability import normal_cdf

# The normal cdf is the probability that the variable is below a threshold
normal_probability_below = normal_cdf

# It is above the threshold if it is not below the threshold
def normal_probability_above(lo: float,
                             mu: float = 0,
                             sigma: float = 1) -> float:
    """The probability that an N(mu, sigma) is greater than lo."""
    return 1 - normal_cdf(lo, mu, sigma)

# It is between if it is less than hi, but not less than lo
def normal_probability_between(lo: float,
                               hi: float,
                               mu: float = 0,
                               sigma: float = 1) -> float:
    """The probability that an N(mu, sigma) is between lo and hi."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# It is outside if it is not between
def normal_probability_outside(lo: float,
                               hi: float,
                               mu: float = 0,
                               sigma: float = 1) -> float:
    """The probability that an N(mu, sigma) is not between lo and hi."""
    return 1 - normal_probability_between(lo, hi, mu, sigma)


<font color = 'brown'> For example, if we want to find an interval centered at the mean and containing 60% probability, then we find the cutoffs where the upper and lower tails each contain 20% of the probability (leaving 60%).

In [8]:
from scratch.probability import inverse_normal_cdf
from typing import Tuple

def normal_upper_bound(probability: float,
                       mu: float = 0,
                       sigma: float = 1) -> float:
    """Returns the z for which P(Z <= z) = probability"""
    
    return inverse_normal_cdf(probability, mu, sigma)

def normal_lower_bound(probability: float,
                       mu: float = 0,
                       sigma: float = 1) -> float:
    """Returns the z for which P(Z >= z) = probability"""
    
    return inverse_normal_cdf(1 - probability, mu, sigma)

def normal_two_sided_bounds(probability: float,
                            mu: float = 0,
                            sigma: float = 1) -> Tuple[float, float]:
    """
    Returns the symmetric (about the mean) bounds
    that contain the specified probability
    """
    tail_probability = (1 - probability) / 2

    # upper bound should have tail_probability above it
    upper_bound = normal_lower_bound(tail_probability, mu, sigma)

    # lower bound should have tail_probability below it
    lower_bound = normal_upper_bound(tail_probability, mu, sigma)

    return lower_bound, upper_bound
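
The 60% example can be checked numerically. The sketch below is not the notebook's own implementation: it swaps in `statistics.NormalDist` from the Python standard library in place of `inverse_normal_cdf`, and the name `two_sided_bounds` is made up for illustration.

```python
from statistics import NormalDist
from typing import Tuple

def two_sided_bounds(probability: float,
                     mu: float = 0.0,
                     sigma: float = 1.0) -> Tuple[float, float]:
    """Symmetric bounds around mu that contain the given probability."""
    tail_probability = (1 - probability) / 2
    dist = NormalDist(mu, sigma)
    # lower bound has tail_probability below it, upper bound has it above
    return dist.inv_cdf(tail_probability), dist.inv_cdf(1 - tail_probability)

lo, hi = two_sided_bounds(0.60)

# probability that a standard normal lands between the two cutoffs
covered = NormalDist().cdf(hi) - NormalDist().cdf(lo)

print(round(lo, 4), round(hi, 4), round(covered, 4))
```

For a standard normal the cutoffs come out near -0.84 and 0.84, with 20% of the probability in each tail.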

In [9]:
mu_0, sigma_0 = normal_approximation_to_binomial(1000, 0.5)

# Analogous to flipping a fair coin 1000 times:
# the normal approximation has mean 500 and standard deviation sqrt(250), about 15.8

In [10]:
mu_0, sigma_0

(500.0, 15.811388300841896)
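
As a sanity check, the mean and standard deviation can also be computed directly from the exact Binomial(1000, 0.5) probabilities. This is only a sketch: `binomial_log_pmf` is a hypothetical helper, and it works in log space (via `math.lgamma`) so the large binomial coefficients do not overflow.

```python
import math

def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed in log space."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

n, p = 1000, 0.5
pmf = [math.exp(binomial_log_pmf(k, n, p)) for k in range(n + 1)]

# exact mean and standard deviation of the number of heads
exact_mean = sum(k * q for k, q in enumerate(pmf))
exact_sigma = math.sqrt(sum((k - exact_mean) ** 2 * q for k, q in enumerate(pmf)))

print(exact_mean, exact_sigma)
```

Both values should agree with `mu_0` and `sigma_0` up to floating-point error, since the approximation uses the binomial's own mean and standard deviation.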

### <font color = 'red'>Significance is the probability of a type 1 error (a false positive: rejecting the null hypothesis even though it is true), usually set at 5% or 1%
        i.e. at 5%, a true H_0 is rejected only about 1 time in 20, when X falls outside the expected range

Consider the test that rejects $H_0$ if X falls outside the bounds given by:

In [11]:
lower_bound, upper_bound = normal_two_sided_bounds(0.95, mu_0, sigma_0)

lower_bound,upper_bound

(469.01026640487555, 530.9897335951244)
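
One way to confirm that these bounds give a 5% test is to add up the probability in both tails under $H_0$. The sketch below assumes `statistics.NormalDist` as a stand-in for the notebook's `scratch.probability` helpers; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# normal approximation to Binomial(1000, 0.5) under the null hypothesis
null = NormalDist(mu=1000 * 0.5, sigma=math.sqrt(1000 * 0.5 * 0.5))

# symmetric 95% bounds: 2.5% of the probability in each tail
lower = null.inv_cdf(0.025)
upper = null.inv_cdf(0.975)

# chance of rejecting H_0 when it is actually true (type 1 error)
type_1_probability = null.cdf(lower) + (1 - null.cdf(upper))

print(round(lower, 2), round(upper, 2), round(type_1_probability, 4))
```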

## Power
    The probability of not making a type 2 error (a false negative), i.e. of correctly rejecting H_0 when it is false

<font color = 'magenta'> Let’s check what happens if p is really 0.55, so that the coin is slightly biased toward heads.

In [12]:
# 95% bounds based on the assumption that p = 0.50

lo, hi = normal_two_sided_bounds(0.95,mu_0,sigma_0)

# actual mu and sigma based on p = 0.55
mu_1, sigma_1 = normal_approximation_to_binomial(1000, 0.55)

# A type 2 error means we fail to reject the null hypothesis,
# which will happen when X is still in our original interval.

type_2_probability = normal_probability_between(lo,hi,mu_1,sigma_1)

power = 1 - type_2_probability

power, type_2_probability

(0.8865480012953671, 0.11345199870463285)

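    Now suppose the null hypothesis is that the coin is not biased toward heads (p <= 0.5).
    That calls for a one-sided test that rejects H_0 only when X is much larger than 500.
    A 5% significance test then uses normal_upper_bound to find the cutoff below which
    95% of the probability lies: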


In [13]:
hi = normal_upper_bound(0.95, mu_0, sigma_0)
hi

526.0073585242053

In [14]:
type_2_probability = normal_probability_below(hi, mu_1, sigma_1)
power = 1 - type_2_probability
power

0.9363794803307173
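
Both power calculations can be folded into one function of the true p. This is a sketch under stated assumptions: `power_of_test` and its parameters are hypothetical names, and `statistics.NormalDist` replaces the notebook's `normal_cdf` and `inverse_normal_cdf`.

```python
import math
from statistics import NormalDist

def power_of_test(true_p: float,
                  n: int = 1000,
                  null_p: float = 0.5,
                  significance: float = 0.05,
                  one_sided: bool = False) -> float:
    """Probability of rejecting H_0 (p = null_p) when the real p is true_p."""
    null = NormalDist(n * null_p, math.sqrt(n * null_p * (1 - null_p)))
    actual = NormalDist(n * true_p, math.sqrt(n * true_p * (1 - true_p)))
    if one_sided:
        # reject only when X is above the cutoff
        hi = null.inv_cdf(1 - significance)
        return 1 - actual.cdf(hi)
    # reject when X falls outside the symmetric bounds
    lo = null.inv_cdf(significance / 2)
    hi = null.inv_cdf(1 - significance / 2)
    return 1 - (actual.cdf(hi) - actual.cdf(lo))

two_sided_power = power_of_test(0.55)
one_sided_power = power_of_test(0.55, one_sided=True)

print(round(two_sided_power, 3), round(one_sided_power, 3))
```

The one-sided test is more powerful here because it spends the whole 5% on the tail where the bias actually lies. When `true_p` equals `null_p`, the two-sided power is just the significance level.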

## P values
    Assuming H_0 is true, compute the probability of seeing a value at least as extreme as the one actually observed

In [15]:
def two_sided_p_value(x: float, mu: float = 0, sigma: float = 1) -> float:
    
    """
    How likely are we to see a value at least as extreme as x (in either
    direction) if our values are from a N(mu, sigma)?
    """
    
    if x >= mu:
        
        # x is greater than the mean, so the tail is everything greater than x
        return 2 * normal_probability_above(x, mu, sigma)
    else:
        
        # x is less than the mean, so the tail is everything less than x
        return 2 * normal_probability_below(x, mu, sigma)

two_sided_p_value(529.5, mu_0, sigma_0)

0.06207721579598835
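
The argument 529.5 rather than 530 reads as a continuity correction (an assumption about the intent, since the notebook does not say so): a count of 530 or more heads corresponds to the continuous region above 529.5. The sketch below uses `statistics.NormalDist` and a hypothetical `two_sided_p` to compare 530 heads with 532 heads.

```python
import math
from statistics import NormalDist

def two_sided_p(x: float, mu: float, sigma: float) -> float:
    """Probability of a value at least as far from mu as x, in either direction."""
    z = abs(x - mu) / sigma
    return 2 * (1 - NormalDist().cdf(z))

mu_0, sigma_0 = 500.0, math.sqrt(1000 * 0.5 * 0.5)

p_530_heads = two_sided_p(529.5, mu_0, sigma_0)  # 530 or more heads
p_532_heads = two_sided_p(531.5, mu_0, sigma_0)  # 532 or more heads

print(round(p_530_heads, 4), round(p_532_heads, 4))
```

With 530 heads the p-value stays above 5%, while with 532 heads it drops below 5%.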

In [16]:
import random

extreme_value_count = 0
for _ in range(1000):
    num_heads = sum(1 if random.random() < 0.5 else 0    # Count # of heads
                    for _ in range(1000))                # in 1000 flips,
    if num_heads >= 530 or num_heads <= 470:             # and count how often
        extreme_value_count += 1                         # the # is 'extreme'

# p-value was 0.062 => ~62 extreme values out of 1000


##### The p-value is greater than 5%, so we do not reject the null hypothesis

In [17]:
extreme_value_count

# number of 'extreme' results out of 1000 simulations (varies from run to run)

63

## p - Hacking

In [18]:
# Experimenting

from typing import List
import random

def run_experiment() -> List[bool]:
    
    """Flips a fair coin 1000 times, True = heads, False = tails"""
    return [random.random() < 0.5 for _ in range(1000)]

def reject_fairness(experiment: List[bool]) -> bool:
    
    """Using the 5% significance levels"""
    num_heads = len([flip for flip in experiment if flip])
    return num_heads < 469 or num_heads > 531

random.seed(0)
experiments = [run_experiment() for _ in range(1000)]
num_rejections = len([experiment
                      for experiment in experiments
                      if reject_fairness(experiment)])

num_rejections, num_rejections/1000
# the rejection rate is close to 5%, as expected for a fair coin at the 5% significance level

(46, 0.046)
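
Since roughly 5% of experiments on a fair coin reject fairness by chance, running enough independent tests almost guarantees a "significant" result somewhere. A minimal sketch of that arithmetic, assuming independent tests (the function name is illustrative):

```python
def chance_of_false_rejection(num_tests: int, significance: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests
    rejects a true null hypothesis."""
    return 1 - (1 - significance) ** num_tests

for num_tests in (1, 5, 20):
    print(num_tests, round(chance_of_false_rejection(num_tests), 3))
```

With 20 independent tests the chance of at least one false rejection is already about 64%.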

## A/B Testing
    Testing the new ad that is developed xD

In [19]:
import math
from typing import Tuple

def estimated_parameters(N: int, n: int) -> Tuple[float, float]:
    """Returns the estimated click probability n/N and its standard error"""
    p = n / N
    sigma = math.sqrt(p * (1 - p) / N)
    return p, sigma


If the estimates of $p_A$ and $p_B$ are approximately normal and independent, their difference is also normal, with mean $p_B - p_A$ and standard deviation $\sqrt{\sigma_A^2 + \sigma_B^2}$.

In [20]:
def a_b_test_statistic(N_A:int, n_A:int, N_B: int, n_B: int) -> float:
    p_A, sigma_A = estimated_parameters(N_A, n_A)
    p_B, sigma_B = estimated_parameters(N_B,n_B)
    return (p_B - p_A) / math.sqrt(sigma_A ** 2 + sigma_B ** 2)

#### Under the null hypothesis that $p_A = p_B$, this statistic is approximately standard normal

    For example, if “tastes great” gets 200 clicks out of 1,000 views and “less bias” gets 180 clicks out of 1,000 views, the statistic equals:

In [21]:
z = a_b_test_statistic(1000,200,1000,180)
z

-1.1403464899034472

In [22]:
two_sided_p_value(z)

0.254141976542236
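
The same numbers can also be read as a confidence interval for the difference $p_B - p_A$. This is a related technique rather than something the notebook does: the sketch uses `statistics.NormalDist` for the normal quantile, and `difference_confidence_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist
from typing import Tuple

def difference_confidence_interval(N_A: int, n_A: int,
                                   N_B: int, n_B: int,
                                   confidence: float = 0.95) -> Tuple[float, float]:
    """Approximate confidence interval for p_B - p_A."""
    p_A, p_B = n_A / N_A, n_B / N_B
    # standard deviation of the difference of two independent estimates
    sigma = math.sqrt(p_A * (1 - p_A) / N_A + p_B * (1 - p_B) / N_B)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_B - p_A
    return diff - z * sigma, diff + z * sigma

lo, hi = difference_confidence_interval(1000, 200, 1000, 180)
print(round(lo, 4), round(hi, 4))
```

The interval contains 0, which matches the large p-value: this data cannot distinguish the two ads.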

# Beta Distribution

In [28]:
import math

'''When alpha and beta are both 1, this is the uniform distribution on [0, 1]'''


def B(alpha:float, beta:float) -> float:
    # a normalising constant so that the total probability is 1
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def A(x: float, alpha:float, beta:float) -> float:
    if x <= 0 or x >= 1:
        return 0
    return x ** (alpha-1) * (1-x) ** (beta - 1) / B(alpha, beta)


The weight of the distribution is centered at
    -> $\frac{\alpha}{\alpha + \beta}$
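
The normalization and the center can be checked by numerical integration. In this sketch `beta_pdf` mirrors the notebook's `A` (with `B` inlined), and `integrate` is a hypothetical midpoint-rule helper; the choice alpha = 4, beta = 8 is arbitrary.

```python
import math
from typing import Callable

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Beta density; zero outside the open interval (0, 1)."""
    if x <= 0 or x >= 1:
        return 0.0
    normalizer = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / normalizer

def integrate(f: Callable[[float], float], steps: int = 10_000) -> float:
    """Midpoint-rule integral of f over [0, 1]."""
    width = 1 / steps
    return sum(f((i + 0.5) * width) for i in range(steps)) * width

alpha, beta = 4, 8
total = integrate(lambda x: beta_pdf(x, alpha, beta))      # should be close to 1
mean = integrate(lambda x: x * beta_pdf(x, alpha, beta))   # close to alpha / (alpha + beta)

print(round(total, 6), round(mean, 6))
```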

![Screenshot%202021-12-06%20at%201.31.52%20AM.png](attachment:Screenshot%202021-12-06%20at%201.31.52%20AM.png)