
# Homework 5


For a simple counting experiment, let $b$ be the expected number of background events and $n$ the observed number of events.
The best estimator for the number of signal events $s$ is:
$$s=n-b.$$

There are different metrics to evaluate discovery significance.

* Simplified Z0
$$ Z_{0, simple} = s/\sqrt{b}$$

* Asymptotic Z0
$$ Z_{0, asymptotic} = \sqrt{2((s+b)\ln (1+s/b)-s)}$$

* Bayesian Z0
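$$ p\text{-value} = \sum_{k=n}^{\infty}\mathrm{Poisson}(k|b)$$
$$Z_{0, Bayesian} = \Phi^{-1}(1 - p\text{-value})$$
Here $\Phi^{-1}$ is the inverse CDF of the standard normal distribution, i.e. the one-sided Gaussian conversion of the p-value.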
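
To see how the first two metrics relate, it can help to expand the logarithm for $s \ll b$. This is a short worked expansion, not part of the original exercise. Writing $x = s/b$ and using $\ln(1+x) = x - x^2/2 + x^3/3 - \dots$:

$$ (s+b)\ln(1+s/b) - s = b\left[(1+x)\ln(1+x) - x\right] = b\left[\frac{x^2}{2} - \frac{x^3}{6} + O(x^4)\right] $$

$$ Z_{0, asymptotic} = \frac{s}{\sqrt{b}}\sqrt{1 - \frac{s}{3b} + O\left((s/b)^2\right)} \approx Z_{0, simple}\left(1 - \frac{s}{6b}\right) $$

So the simplified metric is the leading-order term of the asymptotic one, and it exceeds the asymptotic value once $s$ is no longer small compared with $b$.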


In this exercise, we will implement each of these metrics and compare their consistency.


In [None]:
# @title Install Iminuit
!pip install iminuit


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from iminuit import minimize
from tqdm import tqdm
import scipy
from scipy.stats import poisson, norm



In [None]:
# Define the test statistic q_0 for the frequentist approach.
# 1. We require a positive signal yield (s > 0).
#    Therefore, the test statistic q_0 is 0 if N_obs <= Nb.

def q0(N_obs, Nb):
    if N_obs <= Nb:
        q0_out = 0
    else:
        s = N_obs - Nb
        q0_out = 2 * ((s + Nb) * np.log(1 + s / Nb) - s)
    return q0_out
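
In the asymptotic limit, the discovery significance is the square root of this test statistic, $Z_0 = \sqrt{q_0}$, and a one-sided p-value follows from the Gaussian tail. The sketch below illustrates that conversion. The names `q0_statistic` and `z_from_q0` are hypothetical helpers introduced here for illustration; `q0_statistic` simply mirrors `q0` above so that the block runs on its own.

```python
import numpy as np
from scipy.stats import norm


def q0_statistic(n_obs, n_b):
    """Discovery test statistic for a single-bin counting experiment
    (hypothetical stand-alone copy of q0 above)."""
    if n_obs <= n_b:
        return 0.0
    s = n_obs - n_b
    return 2.0 * ((s + n_b) * np.log(1.0 + s / n_b) - s)


def z_from_q0(q0_value):
    """Asymptotic significance Z0 = sqrt(q0) and its one-sided p-value."""
    z = np.sqrt(q0_value)
    p_value = norm.sf(z)  # upper-tail probability of a standard normal
    return z, p_value


z, p = z_from_q0(q0_statistic(5, 0.5))
print(f"Z0 = {z:.3f}, p-value = {p:.2e}")
```

For `N_obs <= Nb` the statistic is 0, so this conversion gives $Z_0 = 0$ and a p-value of 0.5.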


In [None]:
# 2. Compute the Poisson log-likelihoods of
#    a) the background-only model
#    b) the signal+background model
#    and evaluate the -2 log-likelihood ratio between a) and b).
#    Note: each helper below already returns -2 * log-likelihood.
def poisson_log_likelihood(k, lambd):
    return -2 * poisson.logpmf(k, lambd)

def background_only_log_likelihood(N_obs, Nb):
    return poisson_log_likelihood(N_obs, Nb)

def signal_plus_background_log_likelihood(N_obs, Nb, s):
    total_expected = Nb + s
    return poisson_log_likelihood(N_obs, total_expected)
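
As a consistency check, the difference of the two $-2\ln L$ values, evaluated at the best-fit signal $\hat{s} = \max(n - b, 0)$, should reproduce the closed-form $q_0$. The sketch below assumes that choice of $\hat{s}$; the names `neg2_log_likelihood`, `q0_from_likelihoods`, and `q0_closed_form` are hypothetical and only mirror the helpers above so the block is self-contained.

```python
import numpy as np
from scipy.stats import poisson


def neg2_log_likelihood(k, lam):
    """-2 * log Poisson(k | lam)."""
    return -2.0 * poisson.logpmf(k, lam)


def q0_from_likelihoods(n_obs, n_b):
    """-2 log likelihood ratio: background-only vs. best-fit signal+background."""
    s_hat = max(n_obs - n_b, 0.0)  # assumed best-fit signal, clipped at zero
    return neg2_log_likelihood(n_obs, n_b) - neg2_log_likelihood(n_obs, n_b + s_hat)


def q0_closed_form(n_obs, n_b):
    """Closed-form q0 for comparison."""
    if n_obs <= n_b:
        return 0.0
    s = n_obs - n_b
    return 2.0 * ((s + n_b) * np.log(1.0 + s / n_b) - s)


n_obs, n_b = 5, 0.5
print(q0_from_likelihoods(n_obs, n_b), q0_closed_form(n_obs, n_b))
```

The factorial terms of the two Poisson likelihoods cancel in the difference, which is why the ratio reduces to the closed form.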


Implement the three metrics:

In [None]:
def SimplifiedZ0(N_obs, N_b):
    if N_obs <= N_b:
        return 0.0  # Z-score is 0 for N_obs <= N_b
    s = N_obs - N_b
    Zscore = s / np.sqrt(N_b)
    return Zscore

def AsymptoticZ0(N_obs, N_b):
    if N_obs <= N_b:
        return 0.0  # Z-score is 0 for N_obs <= N_b
    s = N_obs - N_b
    Zscore = np.sqrt(2 * ((s + N_b) * np.log(1 + s / N_b) - s))
    return Zscore


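def BayesianZ0(N_obs, N_b):
    # Tail probability of the background-only Poisson model.
    # Note: 1 - cdf(N_obs) is P(k > N_obs), which excludes k = N_obs itself;
    # poisson.sf(N_obs - 1, N_b) would give the inclusive tail P(k >= N_obs).
    pvalue = 1 - poisson.cdf(N_obs, N_b)
    # Convert the p-value to a one-sided Gaussian Z-score.
    Zscore = norm.ppf(1 - pvalue)
    return Zscore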
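
The p-value defined earlier starts its tail at $k = n$, while `1 - poisson.cdf(N_obs, N_b)` starts at $k = n + 1$. The sketch below compares the two conventions. The helper `poisson_tail_z` and its `inclusive` flag are hypothetical names introduced for illustration; it uses `poisson.sf` and `norm.isf`, which avoid the precision loss of `1 - cdf` for very small p-values.

```python
from scipy.stats import norm, poisson


def poisson_tail_z(n_obs, n_b, inclusive=True):
    """Poisson tail p-value and its one-sided Gaussian Z-score.

    inclusive=True  -> P(k >= n_obs)
    inclusive=False -> P(k >  n_obs), matching 1 - poisson.cdf(n_obs, n_b)
    """
    k = n_obs - 1 if inclusive else n_obs
    p_value = poisson.sf(k, n_b)  # sf(k) = P(X > k)
    z_score = norm.isf(p_value)   # inverse of the upper-tail probability
    return p_value, z_score


p_incl, z_incl = poisson_tail_z(5, 0.5, inclusive=True)
p_excl, z_excl = poisson_tail_z(5, 0.5, inclusive=False)
print(f"inclusive: p = {p_incl:.3e}, Z = {z_incl:.3f}")
print(f"exclusive: p = {p_excl:.3e}, Z = {z_excl:.3f}")
```

For small expected backgrounds, the single term $\mathrm{Poisson}(n|b)$ dominates the tail, so the choice of convention changes the Z-score noticeably.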


Now, let's apply our code for numerical calculations.

Consider a background-only model with expected yield b=0.5 and n=5 observed events.

Calculate the discovery significance for each of the metrics.

In [None]:
Nobs = 5
Nb = 0.5

# Calculate discovery significance for Simplified Z0.
# Results are stored under new names so the functions are not overwritten.
z0_simplified = SimplifiedZ0(Nobs, Nb)

# Calculate discovery significance for Asymptotic Z0
z0_asymptotic = AsymptoticZ0(Nobs, Nb)

# Calculate discovery significance for Bayesian Z0
z0_bayesian = BayesianZ0(Nobs, Nb)

print("Simplified Z0:", z0_simplified)
print("Asymptotic Z0:", z0_asymptotic)
print("Bayesian Z0:", z0_bayesian)

Simplified Z0: 6.363961030678928
Asymptotic Z0: 3.7451102693966782
Bayesian Z0: 4.186492134133442


Describe the consistency between different metrics.

Write your answers here:

We find the following points:

1. All three metrics give positive Z-scores, so each indicates an excess over the background expectation. They agree on the direction of the deviation but not on its size.
2. The simplified $Z_{0,simple}$ gives the highest $Z$-score (6.36, versus 3.75 for the asymptotic and 4.19 for the Bayesian metric) because it grows linearly with $s$, faster than the other metrics. It agrees with the asymptotic formula only when $s \ll b$; here $s = 4.5$ and $b = 0.5$, so it overestimates the significance.
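
The linear growth of the simplified metric can be illustrated by scanning the signal yield at fixed background. This is a minimal sketch: `z0_simple` and `z0_asymptotic` are hypothetical vectorized versions of the functions above, and the scan points are chosen arbitrarily for illustration.

```python
import numpy as np


def z0_simple(s, b):
    """Simplified significance s / sqrt(b)."""
    return s / np.sqrt(b)


def z0_asymptotic(s, b):
    """Asymptotic significance sqrt(2 * ((s + b) * ln(1 + s/b) - s))."""
    return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))


b = 0.5
s_values = np.array([0.005, 0.05, 0.5, 4.5])  # last point is n = 5, b = 0.5
ratio = z0_simple(s_values, b) / z0_asymptotic(s_values, b)

for s, r in zip(s_values, ratio):
    print(f"s/b = {s / b:6.2f}  ->  Z_simple / Z_asymptotic = {r:.4f}")
```

The ratio stays close to 1 for $s \ll b$ and grows steadily as $s/b$ increases, which is the regime of this exercise.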
