# Confidence Intervals Code Appendix

## Student's t Distribution

### Coding Key Words:

- rvs: random variates
- pdf: probability density function
- cdf: cumulative distribution function
- ppf: percent point function or percentile (inverse of cdf)

### Required parameters:

- df: degrees of freedom, n - 1
- size: size of the simulated dataset
- x: location for probability calculation
- q: lower tail probability for percentile calculation

``` Python
# Import dependency
from scipy import stats

# Create the parameter values
df = 34
x = 1
q = 0.75

# Draw 1000 random values that follow a t distribution
t = stats.t.rvs(df=df, size=1000)

# Calculate the PDF at x
tx = stats.t.pdf(x=x, df=df)

# Calculate the probability of a value less than or equal to x
txx = stats.t.cdf(x=x, df=df)

# Calculate the 75th percentile
tpct = stats.t.ppf(q=q, df=df)
```
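Since `ppf` is described above as the inverse of `cdf`, a quick round trip can make that relationship concrete. The sketch below reuses the `df` and `q` values from the block above; the names `t_value` and `q_back` are introduced here only for illustration.

```python
from scipy import stats

df = 34
q = 0.75

# ppf maps a lower-tail probability to the matching t value ...
t_value = stats.t.ppf(q, df)

# ... and cdf maps that t value back to the lower-tail probability
q_back = stats.t.cdf(t_value, df)

print(f"t value at q={q}: {t_value}")
print(f"cdf at that t value: {q_back}")
```

If the two functions are true inverses, `q_back` should agree with `q` up to floating-point error.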

In [1]:
# Import the stats module from scipy
from scipy import stats

In [2]:
# Set the degrees of freedom
df = 20

# Draw 1000 random values that follow a t distribution
t = stats.t.rvs(df=df, size=1000)

print(f"Student's t Random Variable: \n{t}\n")

Student's t Random Variable: 
[ 1.24324058e+00 -5.87311818e-01  1.66729802e+00 -9.01639513e-02
 -6.61436492e-01 -3.46197891e+00 -1.22117262e+00 -9.59564998e-01
 -7.35780231e-01  1.41576368e+00  1.43124968e+00 -6.17119044e-01
 -7.28300933e-01 -7.74029787e-01 -8.62568994e-01  1.96627965e+00
 -1.13296696e+00  2.15601798e+00 -2.06909308e+00  1.10590833e+00
  7.82143161e-01 -4.15118800e-01 -1.41646803e-01 -1.10153053e+00
  6.42442932e-01  2.47228316e+00 -2.50228925e-02  9.10304447e-02
  2.22403012e-01 -2.90688191e-01 -2.19696093e-01  6.80741097e-01
 -2.48868108e-02  1.88232634e-01  3.12863669e-01  1.34520186e-01
  1.86573778e+00  1.11349138e+00  1.68408996e+00  1.89800170e+00
  1.91339141e+00  2.07876394e-01 -6.59553617e-02 -4.47802805e-01
 -3.00032369e-01  3.28278355e-01  4.04585121e-01 -8.63264566e-02
 -1.21617650e-01  2.39330871e+00  1.04841727e+00 -1.74377435e+00
 -2.18559345e+00  8.44766656e-01 -1.69555263e+00 -6.67808315e-01
  2.23362171e+00 -9.05583690e-01  2.89260144e-01  1.94239056

In [3]:
# Calculate the pdf at x = 1
stats.t.pdf(x=1, df=df)

0.23604564912670103

In [4]:
# Calculate the cdf at x = 1
stats.t.cdf(x=1, df=df)

0.8353717114141455

In [5]:
# Calculate the 50th percentile (the median) of the t distribution
stats.t.ppf(q=0.5, df=df)

6.72145004306781e-17
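The t distribution is symmetric about zero, so its median is exactly 0, and a value on the order of 1e-17 is floating-point noise rather than a meaningful offset. A minimal sketch of how one might check this, assuming the same `df = 20` as the cells above (the names `median`, `lower_q`, and `upper_q` are illustrative):

```python
import math
from scipy import stats

df = 20

# The median of a symmetric distribution centred at zero is zero
median = stats.t.ppf(0.5, df)
is_zero = math.isclose(median, 0.0, abs_tol=1e-12)

# Symmetry also means the 25th and 75th percentiles mirror each other
lower_q = stats.t.ppf(0.25, df)
upper_q = stats.t.ppf(0.75, df)

print(f"Median is numerically zero: {is_zero}")
print(f"25th percentile: {lower_q}, 75th percentile: {upper_q}")
```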

## Confidence Intervals for Student's t Distribution

``` python
# Import stats module
from scipy import stats

# Calculate the confidence interval
stats.t.interval(
    confidence=0.9,      # Confidence level (named alpha before SciPy 1.9)
    df=df,               # Degrees of freedom
    loc=sample_mean,     # Sample mean
    scale=standard_error # Estimated standard error for t-distribution
)
```
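The template above takes `df`, `sample_mean`, and `standard_error` as given. As a rough sketch of where those inputs could come from, the example below derives them from a small sample. The data values are hypothetical and chosen only for illustration. The confidence level is passed positionally so the call does not depend on the keyword name.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, for illustration only
sample = np.array([14.2, 15.1, 13.8, 16.0, 15.4, 14.9, 15.7, 14.5])

n = sample.size
df = n - 1                          # degrees of freedom
sample_mean = sample.mean()
standard_error = stats.sem(sample)  # s / sqrt(n), using the sample standard deviation (ddof=1)

# 90% confidence interval for the mean
lower, upper = stats.t.interval(0.9, df, loc=sample_mean, scale=standard_error)

print(f"90% CI: ({lower}, {upper})")
```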

### Confidence Intervals for Sample Means Distribution

When the population standard deviation is known, use the standard normal (z) distribution instead of the t distribution.

``` Python
# Import stats module
from scipy import stats

# Calculate the confidence interval 
stats.norm.interval(
    confidence=0.9,       # Confidence level (named alpha before SciPy 1.9)
    loc=sample_mean,      # Sample mean
    scale=standard_error  # Standard error for sample distribution
)
```
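With a known population standard deviation, the standard error is that standard deviation divided by the square root of the sample size. The sketch below works through one case; `sigma`, `n`, and `sample_mean` are hypothetical values picked for illustration.

```python
import math
from scipy import stats

# Hypothetical inputs, for illustration only
sigma = 4.0         # known population standard deviation
n = 64              # sample size
sample_mean = 15.0

# Standard error of the sample mean when sigma is known
standard_error = sigma / math.sqrt(n)

# 90% confidence interval from the normal distribution
lower, upper = stats.norm.interval(0.9, loc=sample_mean, scale=standard_error)

print(f"Standard error: {standard_error}")
print(f"90% CI: ({lower}, {upper})")
```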

In [6]:
# Import the stats module from scipy
from scipy import stats

In [8]:
# Calculate the 95% confidence interval
sample_mean = 15
standard_error = 1.2

stats.t.interval(
    confidence=0.95,
    df=df,
    loc=sample_mean,
    scale=standard_error
)

(12.496843863280997, 17.503156136719003)
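The same interval can be rebuilt by hand as the sample mean plus or minus a critical t value times the standard error, which shows what `interval` computes. This sketch assumes `df = 20` from the earlier cell; `t_crit` and `margin` are names introduced here for illustration.

```python
from scipy import stats

df = 20
sample_mean = 15
standard_error = 1.2
confidence = 0.95

# Two-sided interval: split the remaining 5% evenly between the two tails
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)

# Margin of error and interval bounds
margin = t_crit * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"Critical t value: {t_crit}")
print(f"95% CI: ({lower}, {upper})")
```

The bounds should agree with the output of `stats.t.interval` shown above.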

## Bootstrap

The bootstrap is a resampling method: it draws new samples, with replacement, from the original data to support statistical inference.

``` Python
from sklearn.utils import resample

data = [...]

boot = resample(data,              # original data set
                replace=True,      # resampling with replacement
                n_samples=10,      # number of observations to draw
                random_state=123   # random seed to ensure consistent result
)

print(f'Bootstrap Sample: \n{boot}')
```
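To connect resampling back to confidence intervals, one common approach is the percentile bootstrap: resample many times, compute the statistic on each resample, and read the interval off the percentiles of those statistics. The sketch below is one possible way to do that and uses NumPy's random generator instead of `resample` to stay self-contained. The data, the number of resamples `n_boot`, and the seed are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(123)

# Hypothetical data set, for illustration only
data = rng.normal(loc=15, scale=3, size=50)

# Draw many bootstrap samples and record the mean of each one
n_boot = 2000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    boot_sample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = boot_sample.mean()

# Percentile bootstrap: the middle 95% of the bootstrap means
lower, upper = np.percentile(boot_means, [2.5, 97.5])

print(f"Sample mean: {data.mean()}")
print(f"95% bootstrap CI for the mean: ({lower}, {upper})")
```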

In [9]:
# Import resample from sklearn.utils
from sklearn.utils import resample

In [10]:
# Draw a bootstrap sample of 100 values from the random variable t
boot = resample(t,
                replace=True,
                n_samples=100,
                random_state=777
)

print(f'Bootstrap Sample: \n{boot}')

Bootstrap Sample: 
[ 1.09714173e+00  7.13586449e-01 -1.84902587e+00 -3.49495598e-01
 -6.09457509e-01  1.83423079e+00  7.85944628e-01 -1.66020587e+00
 -6.38659486e-02 -1.33695515e+00 -1.49221574e-01  2.02080280e+00
  1.38724478e+00  2.86293656e-01  9.00717922e-01  6.55234791e-01
 -2.32056726e-02  2.26242531e+00 -1.67148073e-01 -2.48868108e-02
  6.52868268e-01 -5.83136678e-01 -1.11408128e+00  2.19304413e-01
 -1.22361629e-01  1.54337346e+00  2.77878424e-01  2.43319749e+00
  2.55627188e-01  1.09714173e+00  8.84563965e-01 -1.11408128e+00
  5.81408084e-01  2.60360439e+00  2.03213056e-03 -1.07432453e+00
 -9.06788224e-01 -1.51156284e+00  8.64904021e-01  4.42066043e-01
  1.81278116e+00  6.91340464e-01  2.03213056e-03 -1.58933512e-01
  9.54739185e-01  4.41932433e-01 -5.43754740e-01 -1.49349731e+00
 -2.79614097e-01  1.31465216e+00 -1.13296696e+00 -4.62023565e-01
 -3.35125009e-04  3.26548760e-01 -3.41210265e-01  1.32263620e-01
  7.58907979e-01  4.19940304e-01  1.60667258e+00 -6.09269428e-01
 -4.001