<a href="https://colab.research.google.com/github/sundarjhu/Astrostatistics2021/blob/main/Astrostatistics_Lecture09_20210512.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import numpy as np
from scipy.stats import norm, t

#**Module 1: Use methods from `scipy.stats.norm` to solve the following problems**

#1) Central probability: Find $a$ such that $P(|Z|<a) = 0.5$

#2) Asymmetric central probability: Find $a$ such that $P(-2a < Z < a) = 0.75$

#3) One-tailed extreme: Find $a$ such that $P(Z < -a) = 0.1$


#4) Two-tailed extreme: Find $a$ such that $P(|Z| > a) = 0.995$
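
One possible way to approach the four problems above is sketched below. It uses `norm.ppf` (the inverse CDF) and `norm.isf` (the inverse survival function). The asymmetric case has no closed-form quantile, so the sketch assumes a numerical root finder, `scipy.optimize.brentq`, which is not imported elsewhere in this notebook. The names `a1` to `a4` are illustrative only.

```python
from scipy.optimize import brentq
from scipy.stats import norm

# 1) P(|Z| < a) = 0.5  ==>  P(Z < a) = 0.5 + 0.5 / 2
a1 = norm.ppf(0.5 + 0.5 / 2)

# 2) P(-2a < Z < a) = 0.75 cannot be inverted directly, so find the root of
#    cdf(a) - cdf(-2a) - 0.75 numerically on the bracket [0, 5]
a2 = brentq(lambda a: norm.cdf(a) - norm.cdf(-2 * a) - 0.75, 0, 5)

# 3) P(Z < -a) = 0.1  ==>  -a is the 10th percentile
a3 = -norm.ppf(0.1)

# 4) P(|Z| > a) = 0.995  ==>  P(Z > a) = 0.995 / 2
a4 = norm.isf(0.995 / 2)

print(a1, a2, a3, a4)
```

Each answer can be checked by substituting it back into the corresponding probability statement with `norm.cdf` or `norm.sf`.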


#**Module 2: Use methods from `scipy.stats.t` to solve the following problems**

#1) Central probability: Find $a$ such that $P(|T_{_{\nu = 4}}| < a) = 0.5$

#2) One-tailed extreme: Find $a$ such that $P(T_{_{\nu = 4}} < -a) = 0.1$

#3) Two-tailed extreme: Find $a$ such that $P(|T_{_{\nu = 4}}| > a) = 0.995$
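
The same quantile approach carries over to the Student's t distribution by passing the degrees of freedom as `df`. The sketch below is one way to set this up, with the hypothetical names `nu`, `a1_t`, `a2_t`, and `a3_t`; the normal quantiles are computed alongside only to illustrate that the heavier tails of the t distribution push each cutoff outwards.

```python
from scipy.stats import norm, t

nu = 4  # degrees of freedom given in the problem statements

# 1) P(|T| < a) = 0.5  ==>  P(T < a) = 0.75
a1_t = t.ppf(0.75, df=nu)

# 2) P(T < -a) = 0.1  ==>  -a is the 10th percentile
a2_t = -t.ppf(0.1, df=nu)

# 3) P(|T| > a) = 0.995  ==>  P(T > a) = 0.995 / 2
a3_t = t.isf(0.995 / 2, df=nu)

# Standard normal counterparts, for comparison only
a1_z = norm.ppf(0.75)
a2_z = -norm.ppf(0.1)
a3_z = norm.isf(0.995 / 2)

print("t (nu = 4):", a1_t, a2_t, a3_t)
print("normal:    ", a1_z, a2_z, a3_z)
```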

#**Module 3: Use methods from `scipy.stats.norm` and `scipy.stats.t` to solve the following problems**

#**Case 1: Normally-distributed variable with known variance**

#A single measurement of the mass of a rock results in a value of 0.2 kg.
#The 1$\sigma$ measurement uncertainty due to the resolution of the mass-measuring device is 0.05 kg.


##1) Construct an 82\% confidence interval on the true mass of the rock.

In [3]:
from astropy import units as u
m = 0.2 * u.kg
sig_m = 0.05 * u.kg
#P(|Z| < a) = 0.82 ==> P(Z < -a) = (1 - 0.82) / 2
nsig = -norm.ppf((1 - 0.82) / 2) #This is the distance from the mean value in number of standard deviations
dm = sig_m * nsig #This is the distance from the mean value in physical units (mass)
CI = np.array([np.round((m - dm).value, decimals = 3), np.round((m + dm).value, decimals = 3)]) * u.kg
print("The 82% confidence interval is {}".format(CI))

The 82% confidence interval is [0.133 0.267] kg
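
As a cross-check, `scipy.stats.norm` also provides an `interval` method that returns the endpoints of the equal-tailed interval directly. The sketch below assumes plain floats (`m_val`, `sig_val`, both hypothetical names) instead of `astropy` quantities, and passes the confidence level positionally.

```python
from scipy.stats import norm

m_val = 0.2     # measured mass in kg
sig_val = 0.05  # 1-sigma measurement uncertainty in kg

# Equal-tailed interval enclosing 82% of the probability
lower, upper = norm.interval(0.82, loc=m_val, scale=sig_val)
print("82% CI from norm.interval: [{:.3f}, {:.3f}] kg".format(lower, upper))
```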


##2) What confidence is associated with the interval [0.00545, 0.3946] kg? What is the associated significance?

In [4]:
CI = np.array([0.00545, 0.3946])
#Standardise this interval: subtract the mean, then divide by the standard deviation
Z_CI = (CI - m.value) / sig_m.value
print("Standardised CI: {}".format(Z_CI))

Standardised CI: [-3.891  3.892]


In [5]:
#The above is a symmetric 3.89-sigma CI!
#The associated confidence is the total probability enclosed by this interval
confidence = norm.cdf(Z_CI[1]) - norm.cdf(Z_CI[0])
print("The confidence associated with the given CI is {} (significance: {})".\
      format(np.round(confidence, decimals = 6), np.format_float_scientific(1 - confidence, precision = 2)))

The confidence associated with the given CI is 0.9999 (significance: 9.96e-05)
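
The significance can also be obtained by adding the two tail probabilities directly, which avoids subtracting a number very close to 1 from 1. The sketch below is one way to do this, assuming the standardised endpoints printed above are typed in by hand as `z_lo` and `z_hi` (hypothetical names).

```python
from scipy.stats import norm

z_lo, z_hi = -3.891, 3.892  # standardised interval endpoints

# Probability outside the interval: lower tail plus upper tail
significance = norm.cdf(z_lo) + norm.sf(z_hi)
confidence = 1 - significance

print("confidence: {:.6f}, significance: {:.2e}".format(confidence, significance))
```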


#**Case 2: Normally-distributed variable with unknown variance**

#Three measurements of the rock's mass result in values of 0.2, 0.35, and 0.25 kg. The (constant, homoskedastic) measurement uncertainty is unknown.

In [6]:
m = np.array([0.2, 0.35, 0.25]) * u.kg

##1) Estimate the population mean and standard deviation of the rock's mass.
##Use Bessel's correction (`ddof = 1`) so that the underlying variance estimate is unbiased.

In [7]:
mu_hat = m.mean()
sig_hat = m.std(ddof = 1)
print("Estimates for the mean and standard deviation are {} and {} respectively.".format(np.round(mu_hat, decimals = 4), np.round(sig_hat, decimals = 4)))

Estimates for the mean and standard deviation are 0.2667 kg and 0.0764 kg respectively.
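
The effect of the `ddof` argument can be seen by computing the standard deviation both ways. This is a small sketch on plain floats (the names `masses`, `sig_biased`, and `sig_bessel` are illustrative): `ddof = 0` divides the sum of squared deviations by N, while `ddof = 1` divides by N - 1.

```python
import numpy as np

masses = np.array([0.2, 0.35, 0.25])  # measurements in kg

sig_biased = masses.std(ddof=0)  # divides by N
sig_bessel = masses.std(ddof=1)  # divides by N - 1 (Bessel's correction)

print("ddof = 0: {:.4f} kg".format(sig_biased))
print("ddof = 1: {:.4f} kg".format(sig_bessel))
```

For N = 3 the two estimates differ by a factor of sqrt(3/2), so the choice matters a great deal for small samples.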


##2) Compute the 95\% confidence interval on the rock's true mass.

In [8]:
#The sample mean was used to compute the sample standard deviation, so dof = N - 1
dof = len(m) - 1
#P(|T| < a) = 0.95 ==> P(T < -a) = (1 - 0.95) / 2
nsig = -t.ppf((1 - 0.95) / 2, df = dof) #This is the distance from the mean value in number of sample standard deviations
dm = sig_hat * nsig #This is the distance from the mean value in physical units (mass)
CI = np.array([np.round((mu_hat - dm).value, decimals = 4), np.round((mu_hat + dm).value, decimals = 4)]) * u.kg
print("The 95% confidence interval is {}".format(CI))

The 95% confidence interval is [-0.062   0.5953] kg


##3) What confidence is associated with the interval [-0.0484, 0.5824] kg?

In [11]:
CI = np.array([-0.0484, 0.5824])
#Studentise this interval: subtract the sample mean, then divide by the sample standard deviation
T_CI = (CI - mu_hat.value) / sig_hat.value
print("Studentised CI: {}".format(T_CI))

Studentised CI: [-4.125191    4.13391971]


In [12]:
#The above is a symmetric 4.13-sigma CI!
#The associated confidence is the total probability enclosed by this interval
confidence = t.cdf(T_CI[1], df = dof) - t.cdf(T_CI[0], df = dof)
print("The confidence associated with the given CI is {} (significance: {})".\
      format(np.round(confidence, decimals = 6), np.format_float_scientific(1 - confidence, precision = 2)))

The confidence associated with the given CI is 0.946061 (significance: 5.39e-02)
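
To illustrate why the t distribution matters here, the sketch below compares the probability enclosed by the same studentised interval under a t distribution with 2 degrees of freedom and under a standard normal. The endpoints are copied by hand from the output above into the hypothetical names `T_lo` and `T_hi`.

```python
from scipy.stats import norm, t

T_lo, T_hi = -4.125191, 4.13391971  # studentised interval endpoints
dof = 2                             # N - 1 for three measurements

# Probability enclosed under each distribution
conf_t = t.cdf(T_hi, df=dof) - t.cdf(T_lo, df=dof)
conf_norm = norm.cdf(T_hi) - norm.cdf(T_lo)

print("t (dof = 2): {:.6f}".format(conf_t))
print("normal:      {:.6f}".format(conf_norm))
```

With so few measurements, treating the interval as a normal "4-sigma" interval would substantially overstate the confidence.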
