# Confidence Intervals Code Appendix

## Student's t Distribution

### Coding Key Words:

- rvs: random variates
- pdf: probability density function
- cdf: cumulative distribution function
- ppf: percent point function or percentile (inverse of cdf)

### Required parameters:

- df: degrees of freedom, n - 1
- size: size of the simulated dataset
- x: location for probability calculation
- q: lower tail probability for percentile calculation

``` Python
# Import dependency
from scipy import stats

# Create the parameter values
df = 34
x = 1
q = 0.75

# Draw 1000 random values from the t distribution
t = stats.t.rvs(df=df, size=1000)

# Calculate the PDF at x
tx = stats.t.pdf(x=x, df=df)

# Calculate the probability of a value less than or equal to x (the CDF at x)
txx = stats.t.cdf(x=x, df=df)

# Calculate the 75th percentile
tpct = stats.t.ppf(q=q, df=df)
```

In [1]:
# Import the stats module from scipy
from scipy import stats

In [2]:
# Create the degrees of freedom
df = 20

# Draw 1000 random values that follow the t distribution
t = stats.t.rvs(df=df, size=1000)

print(f"Student's t Random Variable: \n{t}\n")

Student's t Random Variable: 
[-2.39693291e-01  2.55965333e-02  1.27935558e+00  2.69659675e-01
 -1.09534612e+00 -9.65396315e-01 -7.78844160e-01 -2.66610246e+00
  1.68426179e+00 -1.13438365e+00  2.54762520e-01  5.27303762e-01
  5.06005170e-01 -1.88394756e+00  8.87790185e-02 -3.34268908e-01
 -8.02553153e-01  1.80526485e+00  7.36718485e-01  7.81460675e-01
 -1.51971537e+00 -8.39665083e-01 -1.58700210e+00 -7.73375178e-01
 -7.29083860e-01 -9.71455550e-01 -1.22151668e+00  2.24677702e+00
  7.98540109e-01 -1.63232574e-01 -1.68971778e-02 -5.79432764e-01
  1.33156147e+00  4.05024884e-01  1.20945192e+00  8.72894290e-02
  3.12387307e-01  1.47925861e+00 -1.29599078e+00  1.94336143e-01
  3.21449432e-01  6.07011990e-01  3.36644363e-01  4.56700799e-02
  1.84734144e+00  7.40495761e-01  1.90576139e+00  2.65405699e-01
 -9.43415987e-01 -9.10242855e-01  2.29003575e+00  2.71457097e-01
 -9.81967988e-01 -1.36686727e+00 -2.42618809e-01  5.38495684e-01
  1.63536733e+00 -1.24754459e+00 -4.21726306e-01 -1.07723054

In [3]:
# Calculate the pdf at x = 1
stats.t.pdf(x=1, df=df)

0.23604564912670092

In [4]:
# Calculate the cdf at x = 1
stats.t.cdf(x=1, df=df)

0.8353717114141455

In [5]:
# Calculate the 50th percentile (the median) of the t distribution
stats.t.ppf(q=0.5, df=df)

6.72145054561635e-17
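
The value above is floating-point noise around 0: the t distribution is symmetric about 0, so its median is 0. The short check below is a sketch that illustrates this, along with the statement from the key-word list that `ppf` is the inverse of `cdf`. The tolerances are arbitrary choices made for this example.

```python
import math
from scipy import stats

df = 20  # same degrees of freedom as the cells above

# The t distribution is symmetric about 0, so its median should be 0
median = stats.t.ppf(0.5, df=df)
print(math.isclose(median, 0.0, abs_tol=1e-12))

# ppf undoes cdf: start at x, take the cdf, then map the probability back
x = 1.0
p = stats.t.cdf(x, df=df)
x_back = stats.t.ppf(p, df=df)
print(math.isclose(x_back, x, rel_tol=1e-9))
```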

## Confidence Intervals for Student's t Distribution

``` python
# Import stats module
from scipy import stats

# Calculate the confidence interval
stats.t.interval(
    0.9,                 # Confidence level (keyword is `alpha` in older SciPy, `confidence` in newer releases)
    df=df,               # Degrees of freedom
    loc=sample_mean,     # Sample mean
    scale=standard_error # Estimated standard error for t-distribution
)
```
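
The template above takes `df`, `sample_mean`, and `standard_error` as given. As a minimal sketch of where those inputs might come from, the example below derives them from raw data. The `sample` array is made up for illustration, and the standard error is computed with `scipy.stats.sem`, which uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, for illustration only
sample = np.array([14.2, 15.1, 13.8, 16.0, 15.4, 14.9, 15.7, 14.5])

n = sample.size
df = n - 1                          # degrees of freedom
sample_mean = sample.mean()
standard_error = stats.sem(sample)  # s / sqrt(n), with s computed using ddof=1

# The confidence level is passed positionally, which avoids depending on
# the keyword name used by a particular SciPy version
lower, upper = stats.t.interval(0.9, df=df, loc=sample_mean, scale=standard_error)
print(f"90% CI for the mean: ({lower:.3f}, {upper:.3f})")
```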

### Confidence Intervals for the Sample Mean Distribution

When the population standard deviation is known, use the standard normal (z) distribution instead of the t distribution.

``` Python
# Import stats module
from scipy import stats

# Calculate the confidence interval 
stats.norm.interval(
    0.9,                  # Confidence level (keyword is `alpha` in older SciPy, `confidence` in newer releases)
    loc=sample_mean,      # Sample mean
    scale=standard_error  # Standard error for sample distribution
)
```
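
To make the known-standard-deviation case concrete, the sketch below computes the standard error as sigma / sqrt(n) and compares the z interval with the t interval built from the same inputs. The values of `sigma`, `n`, and `sample_mean` are hypothetical and chosen only for illustration.

```python
import math
from scipy import stats

# Hypothetical values, for illustration only
sample_mean = 15.0
sigma = 6.0  # assumed known population standard deviation
n = 25       # sample size

# Standard error of the mean when sigma is known
standard_error = sigma / math.sqrt(n)

# 95% intervals from the normal and t distributions with the same inputs
z_low, z_high = stats.norm.interval(0.95, loc=sample_mean, scale=standard_error)
t_low, t_high = stats.t.interval(0.95, df=n - 1, loc=sample_mean, scale=standard_error)

print(f"z interval: ({z_low:.3f}, {z_high:.3f})")
print(f"t interval: ({t_low:.3f}, {t_high:.3f})")
```

The t interval comes out wider than the z interval because the t distribution has heavier tails; the gap narrows as the degrees of freedom grow.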

In [6]:
# Import the stats module from scipy
from scipy import stats

In [7]:
# Calculate the 95% confidence interval
sample_mean = 15
standard_error = 1.2

stats.t.interval(
    0.95,
    df=df,
    loc=sample_mean,
    scale=standard_error
)

(12.496843863280997, 17.503156136719003)
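
The interval above can also be reproduced by hand as sample mean ± critical value × standard error, where the critical value comes from `ppf`. The sketch below reuses the values from the cell above (`df = 20`, `sample_mean = 15`, `standard_error = 1.2`); the names `confidence`, `t_crit`, and `margin` are introduced here for illustration.

```python
from scipy import stats

df = 20
sample_mean = 15
standard_error = 1.2
confidence = 0.95

# Two-sided critical value: leave (1 - confidence) / 2 in each tail
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=df)
margin = t_crit * standard_error

manual = (sample_mean - margin, sample_mean + margin)
builtin = stats.t.interval(confidence, df=df, loc=sample_mean, scale=standard_error)

print(manual)
print(builtin)
```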

## Bootstrap

Bootstrapping is a resampling method: it draws new samples, with replacement, from the observed data and uses them for statistical inference.

``` Python
from sklearn.utils import resample

data = [...]

boot = resample(data,              # original data set
                replace=True,      # resampling with replacement
                n_samples=10,      # number of observations to draw
                random_state=123   # random seed to ensure consistent result
)

print(f'Bootstrap Sample: \n{boot}')
```
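
A single call like the one above yields one bootstrap sample. To tie this back to confidence intervals, the sketch below repeats the resampling many times and takes percentiles of the resampled means (a percentile bootstrap interval). It uses NumPy's `Generator.choice` in place of `sklearn.utils.resample` to keep the example dependency-light; the data, the seed, and `n_boot = 2000` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(123)

# Hypothetical data standing in for an observed sample
data = rng.normal(loc=15, scale=4, size=50)

n_boot = 2000  # number of bootstrap samples (arbitrary choice)
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # One bootstrap sample: same size as the data, drawn with replacement
    sample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = sample.mean()

# Percentile interval: the middle 95% of the bootstrap means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lower:.3f}, {upper:.3f})")
```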

In [8]:
# Import resample from sklearn.utils
from sklearn.utils import resample

In [9]:
# Draw one bootstrap sample of 100 observations from the random variable t
boot = resample(t,
                replace=True,
                n_samples=100,
                random_state=777
)

print(f'Bootstrap Sample: \n{boot}')

Bootstrap Sample: 
[-0.09481868  0.76083545  0.04278206 -0.57746206  2.07203084 -2.72164322
  0.62968489 -0.4955621   0.71487155  0.43343615  0.34740735  0.39560306
 -0.96103073  0.57989555 -1.77287786  1.31591085 -0.07948742  0.93759549
  0.68937489  1.33156147 -3.6630728  -1.60769663 -0.68050983 -0.2247758
  1.38780447  1.16206344 -0.44881316 -0.3952025  -0.71190849 -0.09481868
 -0.06829741 -0.68050983  1.0342531   1.28673875  0.8775572  -1.14649843
  0.42227528  0.47992646 -1.36429066 -1.21549272  1.09354397  1.29335724
  0.8775572  -1.50064796  1.02517985  0.62870039  1.65476309  0.85932384
  1.48100591  0.42044811 -0.80255315 -2.76255888  0.07372509 -0.69535503
  1.50812572  0.95873654 -1.16745036 -0.70653506 -0.45273413  0.19591961
 -0.15633479  0.79854011 -0.6776798  -0.42974089 -1.8773467  -0.15817599
 -1.18700818 -0.09703437  0.20593333  1.06892644 -1.81028995  0.53405629
  0.97613149  1.14241814  0.35206201 -0.0393338  -1.10504145 -0.09361187
 -0.80255315  0.24559841 -0.817374