# Civil Engineering Probability and Statistics Reference Notebook

## Table of Contents
1. [Definitions](#definitions)
2. [When to Use Different Distributions and Tests](#when-to-use-different-distributions-and-tests)
3. [Reference Examples](#reference-examples)

---

## Definitions

### Probability and Statistics Terms

- **Bayes' Theorem**: 
  $$
  P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^k P(B|A_i)P(A_i)}
  $$
  - **Definition**: A formula to update the probability of an event given new evidence, combining the prior probability \(P(A_j)\) with the likelihood \(P(B|A_j)\).
  - **Use Cases**:
    1. Diagnosing structural integrity issues given prior inspection results.
    2. Updating the probability of project delays given new information on supply chain disruptions.
    3. Estimating the likelihood of construction cost overruns based on past project data.

- **Binomial Distribution**: 
  $$
  P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
  $$
  - **Definition**: Models the number of successes in a fixed number \(n\) of independent trials, each with two possible outcomes and the same success probability \(p\).
  - **Use Cases**:
    1. Determining the probability of passing a set number of safety inspections.
    2. Estimating the likelihood of a specific number of successful concrete pours.
    3. Predicting the number of days without accidents on a construction site.

- **Poisson Distribution**: 
  $$
  P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
  $$
  - **Definition**: Models the number of events in a fixed interval of time or space when events occur independently at a constant average rate \(\lambda\).
  - **Use Cases**:
    1. Counting the number of trucks arriving at a construction site per hour.
    2. Estimating the number of service requests a civil engineering firm receives per day.
    3. Modeling the number of cracks in a given length of pavement.

- **Normal Distribution**: 
  $$
  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
  $$
  - **Definition**: Symmetric distribution around the mean, representing real-valued random variables.
  - **Use Cases**:
    1. Analyzing the distribution of building heights in a city.
    2. Modeling the distribution of concrete strength measurements.
    3. Estimating the variability in construction material costs.

- **Exponential Distribution**: 
  $$
  f(x; \lambda) = \lambda e^{-\lambda x} \text{ for } x \ge 0
  $$
  - **Definition**: Models the time between events in a Poisson process.
  - **Use Cases**:
    1. Estimating the time until the next equipment failure.
    2. Modeling the duration between successive arrivals of construction vehicles.
    3. Calculating the time until a maintenance activity is needed.

- **Gamma Distribution**: 
  $$
  f(x; \alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}
  $$
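  - **Definition**: Continuous distribution with shape \(\alpha\) and rate \(\beta\), often used for waiting times; for integer \(\alpha\) it models the time until the \(\alpha\)-th event in a Poisson process.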
  - **Use Cases**:
    1. Modeling the time until multiple events occur (e.g., multiple equipment failures).
    2. Estimating the total duration of a series of project phases.
    3. Analyzing the distribution of rainfall amounts in a region.

- **Weibull Distribution**: 
  $$
  f(x; \lambda, k) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-\left(\frac{x}{\lambda}\right)^k}
  $$
  - **Definition**: Used to model time until failure of a material/system.
  - **Use Cases**:
    1. Predicting the lifespan of construction materials.
    2. Estimating the time until failure for structural components.
    3. Modeling the time until the next major maintenance event.

- **Lognormal Distribution**: 
  $$
  f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}
  $$
  - **Definition**: A distribution of a variable whose logarithm is normally distributed.
  - **Use Cases**:
    1. Modeling the distribution of income or property values.
    2. Estimating the size distribution of construction project costs.
    3. Analyzing the variability in time to project completion.

- **Chi-Squared Distribution**: 
  $$
  f(x; k) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2} \Gamma(k/2)}
  $$
  - **Definition**: Distribution of the sum of the squares of \(k\) independent standard normal variables.
  - **Use Cases**:
    1. Testing the goodness-of-fit for observed data to a theoretical distribution.
    2. Analyzing variance in sample data.
    3. Performing hypothesis tests for categorical data.

- **T-Distribution**: 
  $$
  f(t; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi \nu} \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
  $$
  - **Definition**: Symmetric distribution used for small sample sizes or unknown variances.
  - **Use Cases**:
    1. Comparing sample means when the sample size is small.
    2. Estimating the mean difference between two related samples.
    3. Testing hypotheses about the population mean.

- **Linear Regression**: 
  $$
  Y = \beta_0 + \beta_1X + \epsilon
  $$
  - **Definition**: Models the relationship between a dependent variable and one or more independent variables; the form shown above is the single-predictor (simple) case.
  - **Use Cases**:
    1. Predicting construction costs based on the number of workers.
    2. Estimating the impact of material quality on building strength.
    3. Analyzing the relationship between project duration and project budget.

- **Ordinary Least Squares (OLS)**: 
  - **Definition**: Estimates the parameters in a linear regression model by minimizing the sum of squared differences between the observed and predicted values.
  - **Use Cases**:
    1. Determining the best fit line for cost estimation.
    2. Modeling the relationship between environmental conditions and structural performance.
    3. Analyzing factors affecting construction time.

- **Confidence Interval**: 
  $$
  CI = \bar{X} \pm t_{\alpha/2} \left(\frac{s}{\sqrt{n}}\right)
  $$
  - **Definition**: Range of values that, at a stated confidence level (e.g., 95%), is expected to contain the population parameter.
  - **Use Cases**:
    1. Estimating the mean strength of concrete samples.
    2. Determining the range for expected project costs.
    3. Providing a range for anticipated project completion times.

- **Hypothesis Testing**: 
  - **Definition**: Method to make inferences about population parameters based on sample data.
  - **Use Cases**:
    1. Testing if a new material has higher strength than the current standard.
    2. Determining if project delays are significantly different from the planned schedule.
    3. Assessing the effectiveness of a new construction method.

- **P-Value**: 
  - **Definition**: Probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.
  - **Use Cases**:
    1. Evaluating the significance of differences in material properties.
    2. Determining the likelihood of observing extreme project cost overruns.
    3. Assessing the probability of unusually high/low project completion times.

- **R-Squared (R²)**: 
  $$
  R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
  $$
  - **Definition**: Proportion of variance in the dependent variable explained by the independent variables. Here \(SS_{res}\) is the SSE and \(SS_{tot}\) is the SST defined below.
  - **Use Cases**:
    1. Evaluating the fit of a cost estimation model.
    2. Determining the explanatory power of factors affecting project duration.
    3. Assessing the quality of a predictive model for material strength.

- **Sum of Squared Errors (SSE)**: 
  $$
  SSE = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2
  $$
  - **Definition**: Measure of discrepancy between data and estimation model.
  - **Use Cases**:
    1. Quantifying the error in cost predictions.
    2. Evaluating the accuracy of time-to-completion models.
    3. Assessing the fit of regression models for material properties.

- **Sum of Squares Total (SST)**: 
  $$
  SST = \sum_{i=1}^n (Y_i - \bar{Y})^2
  $$
  - **Definition**: Total variation in the dependent variable.
  - **Use Cases**:
    1. Determining the total variability in project costs.
    2. Analyzing the total variation in construction times.
    3. Assessing the overall variation in material strength measurements.

- **Covariance**: 
  $$
  \text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})
  $$
  - **Definition**: Measure of joint variability of two random variables.
  - **Use Cases**:
    1. Analyzing the relationship between temperature and concrete curing time.
    2. Estimating the co-variation between project cost and duration.
    3. Assessing the joint variability of material properties.

- **Correlation**: 
  $$
  r = \frac{\text{Cov}(X, Y)}{s_X s_Y}
  $$
  - **Definition**: Measure of the strength and direction of a linear relationship between two variables.
  - **Use Cases**:
    1. Assessing the correlation between project budget and project size.
    2. Determining the relationship between worker hours and project completion time.
    3. Analyzing the correlation between environmental conditions and material performance.
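
The regression-fit and association measures defined above (SSE, SST, R², covariance, correlation) can be checked numerically. The following is a minimal sketch using only the Python standard library; the data values and function names are hypothetical and chosen purely for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Cov(X, Y) with the n - 1 divisor used in the definition above."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson r: covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


def fit_line(xs, ys):
    """OLS intercept and slope for Y = b0 + b1 * X."""
    b1 = sample_covariance(xs, ys) / sample_covariance(xs, xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


def r_squared(xs, ys):
    """R^2 = 1 - SSE / SST for the fitted line."""
    b0, b1 = fit_line(xs, ys)
    y_bar = mean(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data: crew size (X) and project cost in millions (Y).
workers = [10, 20, 30, 40, 50]
cost = [1.2, 1.9, 3.2, 3.8, 5.1]

r = correlation(workers, cost)
print(f"cov = {sample_covariance(workers, cost):.3f}")
print(f"r = {r:.4f}, R^2 = {r_squared(workers, cost):.4f}")  # R^2 equals r^2 for one predictor
```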

---

## When to Use Different Distributions and Tests

### Distributions

- **Normal Distribution**
  - **Scenario**: When data is symmetrically distributed around the mean.
  - **Use Case 1**: Heights of buildings.
  - **Use Case 2**: Errors in measurements.
  - **Use Case 3**: Strength measurements of materials.
  - **Steps**:
    1. Determine the mean (µ) and standard deviation (σ).
    2. Calculate probabilities or z-scores as needed.
    3. Use the normal distribution table for specific probability values.

- **Binomial Distribution**
  - **Scenario**: Fixed number of trials, two possible outcomes.
  - **Use Case 1**: Number of successful project completions.
  - **Use Case 2**: Number of days without accidents.
  - **Use Case 3**: Number of defect-free materials.
  - **Steps**:
    1. Determine the number of trials (n) and the probability of success (p).
    2. Use the binomial formula to calculate probabilities.
    3. Sum probabilities if needed for cumulative probabilities.

- **Poisson Distribution**
  - **Scenario**: Number of events in a fixed interval.
  - **Use Case 1**: Arrival rates of vehicles.
  - **Use Case 2**: Service requests per day.
  - **Use Case 3**: Cracks in pavement.
  - **Steps**:
    1. Determine the average rate (λ).
    2. Use the Poisson formula to calculate probabilities.
    3. Use cumulative probabilities for specific ranges.

- **Exponential Distribution**
  - **Scenario**: Time between events in a Poisson process.
  - **Use Case 1**: Time until next equipment failure.
  - **Use Case 2**: Duration between vehicle arrivals.
  - **Use Case 3**: Time until maintenance.
  - **Steps**:
    1. Determine the rate parameter (λ).
    2. Use the exponential formula to calculate probabilities.
    3. Use cumulative probabilities as needed.

- **Gamma Distribution**
  - **Scenario**: Waiting times or life data analysis.
  - **Use Case 1**: Time until multiple failures.
  - **Use Case 2**: Total duration of project phases.
  - **Use Case 3**: Rainfall distribution.
  - **Steps**:
    1. Determine the shape (α) and rate (β) parameters.
    2. Use the gamma formula for probability calculations.
    3. Apply cumulative probabilities if necessary.

- **Weibull Distribution**
  - **Scenario**: Time until failure analysis.
  - **Use Case 1**: Lifespan of materials.
  - **Use Case 2**: Time until structural failure.
  - **Use Case 3**: Time until maintenance.
  - **Steps**:
    1. Determine the shape (k) and scale (λ) parameters.
    2. Use the Weibull formula for probability calculations.
    3. Calculate cumulative probabilities as needed.

- **Lognormal Distribution**
  - **Scenario**: Data that can be log-transformed to a normal distribution.
  - **Use Case 1**: Income distribution.
  - **Use Case 2**: Project cost distribution.
  - **Use Case 3**: Time to project completion.
  - **Steps**:
    1. Determine the log-transformed mean (µ) and standard deviation (σ).
    2. Use the lognormal formula for probability calculations.
    3. Calculate cumulative probabilities if necessary.
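
The "determine parameters, apply the formula, accumulate if needed" steps listed above can be sketched in Python with the standard library alone. All numeric inputs below are hypothetical, and the helper names are illustrative. The gamma CDF is left out because it needs the incomplete gamma function, which the standard library does not provide.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for an average rate lam per interval."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def exponential_cdf(x, lam):
    """P(T <= x) for rate lam."""
    return 1 - math.exp(-lam * x)


def weibull_cdf(x, scale, shape):
    """P(T <= x) for scale (lambda) and shape (k)."""
    return 1 - math.exp(-((x / scale) ** shape))


def lognormal_cdf(x, mu, sigma):
    """P(X <= x) when ln(X) is normal with mean mu and std dev sigma."""
    return NormalDist(mu, sigma).cdf(math.log(x))


# Normal: strength with mean 4000 psi and sd 300 psi; P(X < 3700) uses z = -1.
p_normal = NormalDist(4000, 300).cdf(3700)

# Binomial: at least 9 of 10 inspections pass when p = 0.9 (sum of two terms).
p_binomial = sum(binomial_pmf(k, 10, 0.9) for k in (9, 10))

# Poisson: at most 2 trucks in an hour when the average is 3 per hour.
p_poisson = sum(poisson_pmf(k, 3) for k in range(3))

# Exponential: a failure within 2 months at 0.5 failures per month.
p_exponential = exponential_cdf(2, 0.5)

for name, value in [("normal", p_normal), ("binomial", p_binomial),
                    ("poisson", p_poisson), ("exponential", p_exponential)]:
    print(f"{name}: {value:.4f}")
```

Cumulative probabilities for the discrete distributions are plain sums of the point probabilities, which is why the binomial and Poisson examples loop over `k`.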

### Hypothesis Tests

- **T-Tests**
  - **Scenario**: Comparing means of small sample sizes or unknown population variances.
  - **Use Case 1**: Comparing concrete strengths.
  - **Use Case 2**: Analyzing project completion times.
  - **Use Case 3**: Assessing material properties.
  - **Steps**:
    1. State the null and alternative hypotheses.
    2. Calculate the t-statistic.
    3. Compare to the critical t-value from the t-distribution table.

- **Chi-Squared Tests**
  - **Scenario**: Testing for independence or goodness of fit.
  - **Use Case 1**: Goodness-of-fit for observed data.
  - **Use Case 2**: Variance analysis.
  - **Use Case 3**: Categorical data testing.
  - **Steps**:
    1. State the null and alternative hypotheses.
    2. Calculate the chi-squared statistic.
    3. Compare to the critical chi-squared value from the chi-squared table.

- **ANOVA (Analysis of Variance)**
  - **Scenario**: Comparing means across multiple groups.
  - **Use Case 1**: Water quality measurements.
  - **Use Case 2**: Comparing project costs.
  - **Use Case 3**: Analyzing material strengths.
  - **Steps**:
    1. State the null and alternative hypotheses.
    2. Calculate the F-statistic.
    3. Compare to the critical F-value from the F-distribution table.

- **Regression Analysis**
  - **Scenario**: Modeling relationships between dependent and independent variables.
  - **Use Case 1**: Predicting construction costs.
  - **Use Case 2**: Estimating material strength.
  - **Use Case 3**: Analyzing project duration.
  - **Steps**:
    1. Collect and prepare data.
    2. Fit the regression model.
    3. Interpret the regression coefficients.
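
As a rough illustration of the "calculate the statistic, then compare to the critical value" steps for the chi-squared test and ANOVA, the sketch below computes both statistics by hand. The data are hypothetical, the function names are illustrative, and the critical values are hard-coded from standard tables rather than computed.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def anova_f_statistic(groups):
    """One-way ANOVA F: mean square between groups over mean square within groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical defect counts from four suppliers; H0 says all are equally likely.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]
chi2 = chi_squared_statistic(observed, expected)
chi2_critical = 7.815  # table value for df = 3, alpha = 0.05
print(f"chi-squared = {chi2:.2f}, reject H0: {chi2 > chi2_critical}")

# Hypothetical strength readings for three concrete mixes; H0 says equal means.
mixes = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = anova_f_statistic(mixes)
f_critical = 5.14  # table value for df = (2, 6), alpha = 0.05
print(f"F = {f_stat:.2f}, reject H0: {f_stat > f_critical}")
```

In both cases the statistic falls below the tabulated critical value, so the null hypothesis is not rejected for these made-up numbers.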

---

## Reference Examples

### Example 1: Poisson Distribution
- **Problem**: Number of cars arriving at a toll booth.
- **Data**: Average rate (\( \lambda \)) = 5 cars per minute.
- **Task**: Probability of 7 cars arriving in the next minute.
- **Solution**: 
  $$
  P(X = 7) = \frac{e^{-\lambda}\lambda^7}{7!}
  $$
- **Steps**:
  1. Identify the average rate (\( \lambda = 5 \)).
  2. Substitute into the Poisson formula:
     $$
     P(X = 7) = \frac{e^{-5}5^7}{7!} \approx 0.1044
     $$
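
The substitution in Example 1 can be reproduced in a few lines. This is a minimal sketch using the Python standard library; `poisson_pmf` is an illustrative helper name, not part of any library.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)


# Values from the example: lambda = 5 cars per minute, k = 7 cars.
p_seven = poisson_pmf(7, 5)
print(f"P(X = 7) = {p_seven:.4f}")  # 0.1044
```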

### Example 2: Confidence Interval
- **Problem**: Estimating the mean height of buildings.
- **Data**: Sample mean (\( \bar{X} \)) = 30 ft, sample standard deviation (\( s \)) = 5 ft, sample size (\( n \)) = 50.
- **Task**: 95% confidence interval for the mean height.
- **Solution**: 
  $$
  CI = \bar{X} \pm t_{\alpha/2} \left(\frac{s}{\sqrt{n}}\right)
  $$
- **Steps**:
  1. Calculate the standard error:
     $$
     SE = \frac{s}{\sqrt{n}} = \frac{5}{\sqrt{50}} \approx 0.707
     $$
  2. Determine the critical value for 95% confidence with \( df = n - 1 = 49 \): \( t_{0.025,\,49} \approx 2.010 \).
  3. Calculate the confidence interval:
     $$
     CI = 30 \pm 2.010 \times 0.707 \approx [28.579, 31.421]
     $$
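
The interval in Example 2 depends on which critical value is used. The sketch below compares the normal critical value (about 1.96) with the t critical value for \( df = 49 \). The t value of 2.010 is taken from a t-table and hard-coded as an assumption, since the Python standard library has no t-distribution; the function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, critical_value):
    """Return (lower, upper) for x_bar +/- critical_value * s / sqrt(n)."""
    margin = critical_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
t_crit = 2.010                        # assumed table value for df = 49, 95% two-sided

z_low, z_high = mean_confidence_interval(30, 5, 50, z_crit)
t_low, t_high = mean_confidence_interval(30, 5, 50, t_crit)

print(f"z-based: [{z_low:.3f}, {z_high:.3f}]")
print(f"t-based: [{t_low:.3f}, {t_high:.3f}]")
```

With \( n = 50 \) the two intervals differ only in the second decimal place; the gap widens as the sample size shrinks.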

### Example 3: Linear Regression
- **Problem**: Predicting construction cost.
- **Data**: Cost (Y) in millions, workers (X).
- **Task**: Slope of regression line.
- **Solution**: 
  $$
  \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
  $$
- **Steps**:
  1. Calculate the mean of X and Y.
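  2. Compute the numerator (\( \sum (X_i - \bar{X})(Y_i - \bar{Y}) \)), which equals \( \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} \).
  3. Compute the denominator (\( \sum (X_i - \bar{X})^2 \)), which equals \( \sum X_i^2 - \frac{(\sum X_i)^2}{n} \).
  4. Calculate the slope. The substitution below uses these computational forms, which implies summary values \( n = 41 \), \( \sum X_i = 113 \), \( \sum Y_i = 51 \), \( \sum X_i Y_i = 393 \), and \( \sum X_i^2 = 769 \):
     $$
     \hat{\beta}_1 = \frac{393 - \frac{113 \times 51}{41}}{769 - \frac{113^2}{41}} \approx 0.552
     $$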
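
The slope in Example 3 can be computed either from deviations about the means or from running sums. The sketch below shows both. Reading the numbers in the substitution as \( n = 41 \), \( \sum X = 113 \), \( \sum Y = 51 \), \( \sum XY = 393 \), \( \sum X^2 = 769 \) is an assumption inferred from the shape of the formula, and the small data set at the end is hypothetical.

```python
def slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Computational form: (Sxy - Sx*Sy/n) / (Sxx - Sx^2/n)."""
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sum_x2 - sum_x**2 / n
    return numerator / denominator


def slope_from_data(xs, ys):
    """Deviation form: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Summary values as read from the worked substitution (assumed interpretation).
beta_1 = slope_from_sums(n=41, sum_x=113, sum_y=51, sum_xy=393, sum_x2=769)
print(f"slope = {beta_1:.3f}")  # 0.552

# Hypothetical raw data to confirm that both forms agree.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
sums = (len(xs), sum(xs), sum(ys),
        sum(x * y for x, y in zip(xs, ys)), sum(x * x for x in xs))
print(slope_from_data(xs, ys), slope_from_sums(*sums))
```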

### Example 4: Hypothesis Testing
- **Problem**: Testing if average breaking strength of concrete is 5000 psi.
- **Data**: Sample mean = 5100 psi, sample standard deviation = 300 psi, sample size = 30.
- **Task**: Two-tailed test with \( \alpha = 0.05 \).
- **Solution**: 
  $$
  t = \frac{\bar{X} - \mu}{s/\sqrt{n}}
  $$
  - **Steps**:
     1. State the null and alternative hypotheses (\( H_0: \mu = 5000 \), \( H_1: \mu \neq 5000 \)).
     2. Calculate the t-statistic:
        $$
        t = \frac{5100 - 5000}{\frac{300}{\sqrt{30}}} = \frac{100}{54.77} \approx 1.83
        $$
     3. Compare to the critical value from the t-distribution table for \( df = 29 \) at \( \alpha/2 = 0.025 \) which is approximately 2.045.
     4. Since 1.83 < 2.045, we fail to reject \( H_0 \) at the 5% significance level.
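
The decision in Example 4 can be reproduced as follows. This is a minimal sketch: the critical value 2.045 is the table value quoted in the example and is hard-coded, and `one_sample_t` is an illustrative name.

```python
import math


def one_sample_t(x_bar, mu_0, s, n):
    """t = (x_bar - mu_0) / (s / sqrt(n))"""
    return (x_bar - mu_0) / (s / math.sqrt(n))


t_stat = one_sample_t(x_bar=5100, mu_0=5000, s=300, n=30)
t_critical = 2.045  # from the example: df = 29, alpha/2 = 0.025

# Two-tailed test: reject H0 only if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")  # t = 1.83, reject H0: False
```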

---

# Civil Engineering Probability and Statistics Problems

## Table of Contents
1. [Probability & Distributions](#probability--distributions)
2. [Joint Distributions](#joint-distributions)
3. [Linear Combinations](#linear-combinations)
4. [Confidence Intervals & Hypothesis Tests](#confidence-intervals--hypothesis-tests)
5. [Regression](#regression)

---

## Probability & Distributions

---

## Normal Distribution

### Relevant Parameters and Characteristics

- **Population Size (\( N \))**: Total number of observations in the population.
- **Sample Size (\( n \))**: Number of observations in the sample.
- **Population Mean (\( \mu \))**: The average value in the population.
- **Sample Mean (\( \bar{X} \))**: The average value in the sample.
- **Population Variance (\( \sigma^2 \))**: The measure of variability in the population.
- **Sample Variance (\( s^2 \))**: The measure of variability in the sample.
- **Population Standard Deviation (\( \sigma \))**: The square root of the population variance.
- **Sample Standard Deviation (\( s \))**: The square root of the sample variance.
- **Population Standard Error (Infinite Population)**:
  $$
  \text{SE} = \frac{\sigma}{\sqrt{n}}
  $$
- **Population Standard Error (Finite Population)**:
  $$
  \text{SE} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}
  $$
- **Sample Standard Error**:
  $$
  \text{SE} = \frac{s}{\sqrt{n}}
  $$
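
To see how much the finite population correction changes the standard error, the sketch below implements both formulas in one helper. The function name, its `population_size` parameter, and the numbers are hypothetical.

```python
import math


def standard_error(sd, n, population_size=None):
    """Standard error of the mean; applies sqrt((N - n) / (N - 1)) when N is given."""
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Hypothetical: sd = 300 psi, sample of 36 cylinders.
se_infinite = standard_error(300, 36)                     # 300 / 6 = 50
se_finite = standard_error(300, 36, population_size=200)  # smaller, since 18% of the lot is sampled

print(f"infinite population: {se_infinite:.2f}")
print(f"finite population (N = 200): {se_finite:.2f}")
```

The correction only matters when the sample is a sizeable fraction of the population; as \( N \) grows relative to \( n \), the factor approaches 1.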

### Functions

- **Probability Density Function (PDF)**:
  $$
  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
  $$

- **Cumulative Distribution Function (CDF)**:
  $$
  F(x) = \frac{1}{2} \left[ 1 + \text{erf} \left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right]
  $$
  Where \(\text{erf}\) is the error function.

- **Standard Normal Distribution**: Used to standardize normal distributions.
  $$
  Z = \frac{X - \mu}{\sigma}
  $$
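
The three formulas above translate directly into code. The sketch below writes them out with `math.erf` and cross-checks against `statistics.NormalDist` from the standard library; the deflection numbers are hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def normal_cdf(x, mu, sigma):
    """P(X <= x) via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical: beam deflection with mean 12 mm and sd 2 mm.
mu, sigma, x = 12.0, 2.0, 15.0
z = z_score(x, mu, sigma)
p_below = normal_cdf(x, mu, sigma)

print(f"z = {z:.2f}, P(X <= {x}) = {p_below:.4f}")
print(f"library check: {NormalDist(mu, sigma).cdf(x):.4f}")
```

Standardizing first and then evaluating the standard normal CDF at `z` gives the same probability, which is what a z-table lookup does.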

### Tests

- **Z-Test**: Used when the population standard deviation (\( \sigma \)) is known and either the population is normal or the sample size is large (commonly \( n \geq 30 \)).
  - **Test Statistic**:
    $$
    z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
    $$
  - **For Finite Population**:
    $$
    z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}
    $$

- **T-Test**: Used when the population standard deviation (\( \sigma \)) is unknown, especially for small sample sizes.
  - **Test Statistic**:
    $$
    t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}
    $$
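
A z-test can be carried through to a p-value with the standard library, since the standard normal CDF is available in `statistics.NormalDist`. The sketch below is illustrative only: the function name, the optional `population_size` argument, and the input numbers are assumptions.

```python
import math
from statistics import NormalDist


def z_test(x_bar, mu_0, sigma, n, population_size=None):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        # Finite population correction from the formula above.
        se *= math.sqrt((population_size - n) / (population_size - 1))
    z = (x_bar - mu_0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical: known sigma = 300 psi, n = 36, sample mean 5100 vs. claimed 5000.
z, p_value = z_test(x_bar=5100, mu_0=5000, sigma=300, n=36)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The t statistic has the same form with \( s \) in place of \( \sigma \), but its p-value needs the t-distribution, which is not in the standard library.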

### Confidence Intervals

- **For Population Mean with Known \( \sigma \)**:
  $$
  \bar{X} \pm z \left( \frac{\sigma}{\sqrt{n}} \right)
  $$
  - **For Finite Population**:
    $$
    \bar{X} \pm z \left( \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \right)
    $$

- **For Population Mean with Unknown \( \sigma \)**:
  $$
  \bar{X} \pm t \left( \frac{s}{\sqrt{n}} \right)
  $$

### Scenarios Where Normal Distribution Can Be Used Instead of Another Distribution

- **Central Limit Theorem (CLT)**: For large sample sizes (commonly \( n \geq 30 \)), the distribution of the sample mean is approximately normal regardless of the population distribution, provided the population variance is finite.
- **Approximating Binomial Distribution**: When \( n \) is large and \( p \) is not too close to 0 or 1 (\( np \geq 5 \) and \( n(1 - p) \geq 5 \)).
- **Approximating Poisson Distribution**: When the rate parameter \( \lambda \) is large (\( \lambda > 10 \)).
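
How good these approximations are can be checked by comparing exact cumulative probabilities with the normal ones. The sketch below does so for one binomial and one Poisson case. The inputs are hypothetical, and the continuity correction (evaluating the normal CDF at \( k + 0.5 \)) is an added assumption that is not stated in the rules above.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def poisson_cdf(k, lam):
    """Exact P(X <= k) for Poisson(lam)."""
    return sum(lam**i * math.exp(-lam) / math.factorial(i) for i in range(k + 1))


def binomial_cdf_normal(k, n, p):
    """Normal approximation with mean np and variance np(1 - p)."""
    return NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k + 0.5)


def poisson_cdf_normal(k, lam):
    """Normal approximation with mean lam and variance lam."""
    return NormalDist(lam, math.sqrt(lam)).cdf(k + 0.5)


# Binomial: n = 100, p = 0.3, so np = 30 and n(1 - p) = 70 both exceed 5.
exact_b = binomial_cdf(35, 100, 0.3)
approx_b = binomial_cdf_normal(35, 100, 0.3)

# Poisson: lambda = 20, which exceeds 10.
exact_p = poisson_cdf(25, 20)
approx_p = poisson_cdf_normal(25, 20)

print(f"binomial: exact {exact_b:.4f}, normal {approx_b:.4f}")
print(f"poisson:  exact {exact_p:.4f}, normal {approx_p:.4f}")
```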

### Summary for Normal Distribution

- **PDF**:
  $$
  f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
  $$

- **CDF**:
  $$
  F(x) = \frac{1}{2} \left[ 1 + \text{erf} \left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right]
  $$

- **Z-Scores and Z-Tests**:
  $$
  z = \frac{X - \mu}{\sigma}
  $$
  $$
  z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
  $$
  $$
  z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}
  $$

- **T-Tests**:
  $$
  t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}
  $$

- **Confidence Intervals**:
  $$
  \bar{X} \pm z \left( \frac{\sigma}{\sqrt{n}} \right)
  $$
  $$
  \bar{X} \pm z \left( \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \right)
  $$
  $$
  \bar{X} \pm t \left( \frac{s}{\sqrt{n}} \right)
  $$

---

## Normal Distribution

### Relevant Parameters and Characteristics

| **Term/Concept**                  | **Definition/Explanation**                                                                                 | **Formula/Notation**                                               |
|-----------------------------------|-------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Population Size (\( N \))         | Total number of observations in the population.                                                             | \( N \)                                                            |
| Sample Size (\( n \))             | Number of observations in the sample.                                                                       | \( n \)                                                            |
| Population Mean (\( \mu \))       | The average value in the population.                                                                        | \( \mu \)                                                          |
| Sample Mean (\( \bar{X} \))       | The average value in the sample.                                                                            | \( \bar{X} \)                                                      |
| Population Variance (\( \sigma^2 \)) | The measure of variability in the population.                                                              | \( \sigma^2 \)                                                     |
| Sample Variance (\( s^2 \))       | The measure of variability in the sample.                                                                   | \( s^2 \)                                                          |
| Population Standard Deviation (\( \sigma \)) | The square root of the population variance.                                                              | \( \sigma \)                                                       |
| Sample Standard Deviation (\( s \)) | The square root of the sample variance.                                                                   | \( s \)                                                            |
| Population Standard Error (Infinite Population) | Standard error of the mean for an infinite or very large population.                                    | \( \text{SE} = \frac{\sigma}{\sqrt{n}} \)                          |
| Population Standard Error (Finite Population) | Standard error of the mean for a finite population.                                                      | \( \text{SE} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \) |
| Sample Standard Error            | Standard error of the mean for a sample.                                                                    | \( \text{SE} = \frac{s}{\sqrt{n}} \)                               |

### Functions

| **Function**                      | **Explanation**                                                                                              | **Formula/Notation**                                               |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Probability Density Function (PDF)| Gives the density at \( x \); integrating it over a range gives the probability that the continuous random variable falls within that range. | \( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \) |
| Cumulative Distribution Function (CDF) | Used to find the probability that a random variable is less than or equal to a certain value.              | \( F(x) = \frac{1}{2} \left[ 1 + \text{erf} \left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right] \) |
| Standard Normal Distribution      | Used to standardize normal distributions.                                                                     | \( Z = \frac{X - \mu}{\sigma} \)                                   |

### Tests

| **Test**                          | **Explanation**                                                                                              | **Formula/Notation**                                               |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Z-Test                            | Used when the population standard deviation (\( \sigma \)) is known and either the population is normal or the sample size is large. | \( z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \)             |
|                                   |                                                                                                              | For Finite Population: \( z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}} \) |
| T-Test                            | Used when the population standard deviation (\( \sigma \)) is unknown, especially for small sample sizes.    | \( t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}} \)                  |

### Confidence Intervals

| **Confidence Interval**           | **Explanation**                                                                                              | **Formula/Notation**                                               |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| For Population Mean with Known \( \sigma \) | Confidence interval for the mean when the population standard deviation is known.                          | \( \bar{X} \pm z \left( \frac{\sigma}{\sqrt{n}} \right) \)          |
|                                   |                                                                                                              | For Finite Population: \( \bar{X} \pm z \left( \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \right) \) |
| For Population Mean with Unknown \( \sigma \) | Confidence interval for the mean when the population standard deviation is unknown.                        | \( \bar{X} \pm t \left( \frac{s}{\sqrt{n}} \right) \)               |

---
---

## Normal Distribution

### Relevant Parameters and Characteristics

| **Term/Concept**                  | **Definition/Explanation**                                                                                 | **Formula/Notation**                                               |
|-----------------------------------|-------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Population Size       | Total number of observations in the population.                                                             | $$ N $$                                                            |
| Sample Size            | Number of observations in the sample.                                                                       | $$ n $$                                                            |
| Population Mean       | The average value in the population.                                                                        | $$ \mu $$                                                          |
| Sample Mean      | The average value in the sample.                                                                            | $$ \bar{X} $$                                                      |
| Population Variance | The measure of variability in the population.                                                              | $$ \sigma^2 $$                                                     |
| Sample Variance       | The measure of variability in the sample.                                                                   | $$ s^2 $$                                                          |
| Population Standard Deviation | The square root of the population variance.                                                              | $$ \sigma $$                                                       |
| Sample Standard Deviation | The square root of the sample variance.                                                                   | $$ s $$                                                            |
| Population Standard Error (Infinite Population) | Standard error of the mean for an infinite or very large population.                                    | $$ \text{SE} = \frac{\sigma}{\sqrt{n}} $$                          |
| Population Standard Error (Finite Population) | Standard error of the mean for a finite population.                                                      | $$ \text{SE} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} $$ |
| Sample Standard Error            | Standard error of the mean for a sample.                                                                    | $$ \text{SE} = \frac{s}{\sqrt{n}} $$                               |

### Functions

| **Function**                      | **Explanation**                                                                                              | **Formula/Notation**                                               |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Probability Density Function (PDF)| Used to find the probability that a continuous random variable falls within a particular range.              | $$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$ |
| Cumulative Distribution Function (CDF) | Used to find the probability that a random variable is less than or equal to a certain value.              | $$ F(x) = \frac{1}{2} \left[ 1 + \text{erf} \left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right] $$ |
| Standard Normal Distribution      | Used to standardize normal distributions.                                                                     | $$ Z = \frac{X - \mu}{\sigma} $$                                   |

### Tests

| **Test**                          | **Explanation**                                                                                              | **Formula/Notation**                                               |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Z-Test                            | Used when the population standard deviation \( \sigma \) is known and the sample size is large.             | $$ z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} $$             |
|                                   |                                                                                                              | For Finite Population: $$ z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}} $$ |
| T-Test                            | Used when the population standard deviation \( \sigma \) is unknown, especially for small sample sizes; the statistic follows a t-distribution with \( n - 1 \) degrees of freedom. | $$ t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}} $$                  |

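The test statistics above can be computed with the Python standard library. This is a sketch under stated assumptions: the function names, the hypothesized mean of 30, and the sample values are hypothetical, and the two-sided p-value for the z-test is an extra step that uses `statistics.NormalDist`.

```python
import math
import statistics
from statistics import NormalDist


def z_statistic(x_bar, mu0, sigma, n, N=None):
    """z = (x_bar - mu0) / SE; applies the finite population correction if N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return (x_bar - mu0) / se


def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s computed using the n - 1 denominator."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (x_bar - mu0) / (s / math.sqrt(n))


# Hypothetical z-test: sample mean 31.2 from n = 36, known sigma = 4, testing mu0 = 30
z = z_statistic(31.2, 30, 4, 36)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_two_sided, 4))

# Hypothetical t-test: five measurements, sigma unknown, testing mu0 = 30
t = t_statistic([31, 33, 32, 34, 30], 30)
print(round(t, 3))  # compare against a t table with n - 1 = 4 degrees of freedom
```

The standard library has no t-distribution, so the t statistic here is meant to be compared against a tabulated critical value.
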
### Confidence Intervals

| **Confidence Interval**           | **Explanation**                                                                                              | **Formula/Notation**                                               |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| For Population Mean with Known \( \sigma \) | Confidence interval for the mean when the population standard deviation is known; \( z_{\alpha/2} \) is the standard normal critical value for confidence level \( 1 - \alpha \). | $$ \bar{X} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) $$          |
|                                   |                                                                                                              | For Finite Population: $$ \bar{X} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \right) $$ |
| For Population Mean with Unknown \( \sigma \) | Confidence interval for the mean when the population standard deviation is unknown; \( t_{\alpha/2,\,n-1} \) is the t critical value with \( n - 1 \) degrees of freedom. | $$ \bar{X} \pm t_{\alpha/2,\,n-1} \left( \frac{s}{\sqrt{n}} \right) $$               |

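A minimal Python sketch of the confidence interval formulas above. The function names and example inputs are hypothetical. The z critical value is computed with `statistics.NormalDist`; the t critical value is passed in as an argument because the standard library has no t-distribution, so it is assumed to come from a t table (2.262 is the two-sided 95% value for 9 degrees of freedom).

```python
import math
from statistics import NormalDist


def ci_known_sigma(x_bar, sigma, n, confidence=0.95, N=None):
    """Interval x_bar +/- z * SE, with optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return x_bar - z * se, x_bar + z * se


def ci_unknown_sigma(x_bar, s, n, t_crit):
    """Interval x_bar +/- t_crit * s / sqrt(n); t_crit is looked up for n - 1 df."""
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical example: x_bar = 31.2, sigma = 4, n = 36, 95% confidence
lo_z, hi_z = ci_known_sigma(31.2, 4, 36)
print(round(lo_z, 2), round(hi_z, 2))

# Hypothetical example: x_bar = 32.0, s = 1.5, n = 10, t table value for 9 df
lo_t, hi_t = ci_unknown_sigma(32.0, 1.5, 10, t_crit=2.262)
print(round(lo_t, 2), round(hi_t, 2))
```
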
### Scenarios Where Normal Distribution Can Be Used Instead of Another Distribution

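- **Central Limit Theorem (CLT)**: For large sample sizes (a common rule of thumb is \( n > 30 \)), the distribution of the sample mean is approximately normal regardless of the population distribution, provided the population variance is finite.
- **Approximating Binomial Distribution**: When \( n \) is large and \( p \) is not too close to 0 or 1 (rule of thumb: \( np \ge 5 \) and \( n(1 - p) \ge 5 \)). The approximating normal has mean \( np \) and variance \( np(1 - p) \).
- **Approximating Poisson Distribution**: When the rate parameter \( \lambda \) is large (rule of thumb: \( \lambda > 10 \)). The approximating normal has mean and variance both equal to \( \lambda \).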
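
To see how close the normal approximations in the list above are, the sketch below compares exact binomial and Poisson cumulative probabilities with their normal counterparts. The function names and example parameters are hypothetical, and the 0.5 continuity correction is an assumption added here; it is a standard refinement that the list does not mention.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam)."""
    return sum(lam**i * math.exp(-lam) / math.factorial(i) for i in range(k + 1))


def normal_approx_cdf(k, mean, variance):
    """Normal approximation to P(X <= k), with a 0.5 continuity correction."""
    return NormalDist(mean, math.sqrt(variance)).cdf(k + 0.5)


# Hypothetical binomial case: n = 50, p = 0.4 (np = 20 and n(1 - p) = 30, both >= 5)
n, p = 50, 0.4
binom_exact = binom_cdf(22, n, p)
binom_approx = normal_approx_cdf(22, n * p, n * p * (1 - p))

# Hypothetical Poisson case: lambda = 20 (> 10)
lam = 20
pois_exact = poisson_cdf(25, lam)
pois_approx = normal_approx_cdf(25, lam, lam)

print(round(binom_exact, 4), round(binom_approx, 4))
print(round(pois_exact, 4), round(pois_approx, 4))
```

The approximation tends to degrade as the parameters move toward the edge of the rule-of-thumb thresholds, so it is worth checking against the exact value when that is cheap to compute.
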
## Z-Score vs Z-Test

| **Concept**              | **Definition/Explanation**                                                                                              | **Formula/Notation**                                               |
|--------------------------|-------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| **Z-Score**              | Measures the number of standard deviations a data point is from the mean.                                                | $$ Z = \frac{X - \mu}{\sigma} $$                                   |
|                          | Used to standardize scores on the same scale by considering the mean and standard deviation of the data.                  |                                                                    |
| **Z-Test**               | Used to determine if there is a significant difference between the sample mean and the population mean.                  | $$ z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} $$             |
|                          | Applicable when the population standard deviation (\( \sigma \)) is known and the sample size is large (\( n > 30 \)).   |                                                                    |
|                          |                                                                                                                         | For Finite Population: $$ z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}} $$ |