# Statistical Analysis

## Awareness
1. Awareness of advanced statistical concepts and applications.
2. Understanding advanced statistical concepts, methods, and applications.

***

### Statistical Modelling
#### Generalized Linear Models (GLM)

Three components of the GLM:
1. **Random Component**: response variable, the $ Y $ component.
2. **Systematic Component**: explanatory variables, $ x_{1} $, $ x_{2} $, ..., $ x_{k} $.
3. **Link Function**: the link between the random and the systematic components, $ g(\mu) $.


The following are examples of GLMs:

##### 1. Simple Linear Regression (SLR)

$$ \mu_{i} = \beta_{0} + \beta x_{i} $$

1. **Random component**: the normal distribution of $ Y $ with mean $ \mu $ and constant variance $ \sigma^{2} $.
2. **Systematic Component**: linear in the parameters $ \beta_{0} + \beta x_{i} $.
3. **Link Function**: the identity link $ \eta = g(E(Y)) = E(Y) $ is used.
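As a minimal sketch of the model above: with a normal random component and the identity link, the maximum likelihood estimates of $ \beta_{0} $ and $ \beta $ coincide with ordinary least squares. The function name `fit_slr` and the data are hypothetical, chosen only for illustration.

```python
from statistics import mean

def fit_slr(x, y):
    """Ordinary least squares estimates (b0, b) for mu_i = b0 + b * x_i."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sample covariance of x and y divided by sample variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b * x_bar
    return b0, b

# Hypothetical data generated exactly from y = 1 + 2x (no noise).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

b0, b = fit_slr(x, y)
# Identity link: the linear predictor is itself the fitted mean.
fitted = [b0 + b * xi for xi in x]
```

Because the toy data contain no noise, the estimates recover the generating intercept and slope; with real data the fitted values would only approximate the observations.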


##### 2. Binary Logistic Regression (Logit)

$$ \operatorname{logit}(\pi_{i}) = \log \left( \frac{\pi_{i}}{1 - \pi_{i}} \right) = \beta_{0} + \beta x_{i} $$

1. **Random component**: the binomial distribution with a single trial and success probability $ E(Y) = \pi $.
2. **Systematic Component**: linear in the parameters, $ \beta_{0} + \beta x_{i} $.
3. **Link Function**: the log-odds or logit link $ \eta = g(\pi) = \log \left( \frac{\pi}{1 - \pi} \right) $ is used.
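The logit link and its inverse can be sketched as follows. The coefficient values `beta0` and `beta` are hypothetical, not fitted estimates; they only illustrate how the linear predictor maps to a success probability.

```python
import math

def logit(p):
    """Link function: maps a probability in (0, 1) to the log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients for illustration only.
beta0, beta = -1.0, 0.5

x_values = [0.0, 2.0, 4.0]
# pi_i = inv_logit(beta0 + beta * x_i)
probs = [inv_logit(beta0 + beta * x) for x in x_values]
```

With these assumed coefficients the linear predictor is zero at `x = 2.0`, so the success probability there is one half, and a positive `beta` makes the probability increase with `x`.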


##### 3. Poisson Regression

$$ \log \lambda_i=\beta_0+\beta x_i $$

1. **Random component**: the Poisson distribution with mean $ \lambda $.
2. **Systematic Component**: linear in the parameters, $ \beta_{0} + \beta x_{i} $.
3. **Link Function**: the log link $ \eta = g(\lambda) = \log \lambda $ is used.
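A short sketch of what the log link implies: the mean is $ \lambda_i = e^{\beta_0 + \beta x_i} $, so a one-unit increase in $ x $ multiplies the expected count by $ e^{\beta} $. The coefficients below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical coefficients for illustration only.
beta0, beta = 0.5, 0.3

def poisson_mean(x):
    """Inverse of the log link: lambda = exp(beta0 + beta * x)."""
    return math.exp(beta0 + beta * x)

def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Ratio of expected counts for a one-unit increase in x; equals exp(beta).
rate_ratio = poisson_mean(3.0) / poisson_mean(2.0)

# Probabilities of the counts 0..4 when x = 1.0.
lam = poisson_mean(1.0)
first_probs = [poisson_pmf(k, lam) for k in range(5)]
```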


Source: [Introduction to GLMs](https://online.stat.psu.edu/stat504/lesson/6/6.1)


#### Structural Equation Modeling (SEM)
A multivariate statistical technique for estimating complex relationships between observed and latent variables.

![image.png](attachment:image.png)

SEM model example
+ $ y_{k} $ are indicator (observed) variables.
+ $ \eta_{k} $ are latent factors.
+ $ x_{k} $ are observed variables (but not indicators).
+ Unidirectional arrows are regressions.
+ Bidirectional arrows are parameterised covariances.

Source: [semopy: A Python Package for Structural Equation Modeling](https://stat.paperswithcode.com/paper/semopy-2-a-structural-equation-modeling)

***

### Design of Experiments (DOE)
A statistical approach for studying the relationship between multiple input variables (factors) and key outputs (responses).

Source: [Design of Experiments](https://www.jmp.com/en_my/statistics-knowledge-portal/what-is-design-of-experiments.html)
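One common design is the full factorial, which runs every combination of factor levels. The sketch below enumerates such a design; the factor names and levels are hypothetical, and the full factorial is only one of several designs used in DOE.

```python
from itertools import product

# Hypothetical factors and levels for illustration only.
factors = {
    "temperature": [150, 180],
    "time": [30, 60],
    "catalyst": ["A", "B"],
}

names = list(factors)
# Each run is one combination of levels, one level per factor.
design = [dict(zip(names, levels)) for levels in product(*factors.values())]

for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

Three factors at two levels each give $ 2^{3} = 8 $ runs; the response would be measured once (or more) per run and then related back to the factors.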

***

### Bayesian Statistics
Bayesian statistics uses probability to describe degrees of belief in parameter values.

$$ p(\beta \mid Y) \propto f(Y \mid \beta) \times \pi(\beta) $$
$$ \text{Posterior} \propto \text{Likelihood} \times \text{Prior} $$

+ Prior distribution: what you know about parameter $ \beta $, excluding the information in the data.
+ Likelihood: based on model assumptions.

Source: [Bayesian Inference](https://faculty.washington.edu/kenrice/BayesIntroClassEpi2018.pdf)
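The proportionality $ p(\beta \mid Y) \propto f(Y \mid \beta) \times \pi(\beta) $ can be illustrated with a grid approximation. This is a minimal sketch under stated assumptions: the data (7 successes in 10 trials), the binomial likelihood, and the flat prior are all hypothetical choices, with $ \beta $ playing the role of a success probability.

```python
import math

# Hypothetical data: 7 successes in 10 independent trials.
successes, trials = 7, 10

# Candidate parameter values 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

# Flat prior: every candidate value is equally plausible before seeing data.
prior = [1.0 for _ in grid]

# Binomial likelihood of the data at each candidate value.
likelihood = [
    math.comb(trials, successes) * b ** successes * (1 - b) ** (trials - successes)
    for b in grid
]

# Posterior is proportional to likelihood times prior; normalise to sum to 1.
unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

posterior_mode = grid[posterior.index(max(posterior))]
```

With a flat prior the posterior has the same shape as the likelihood, so its mode sits at the observed proportion; an informative prior would pull the posterior toward the prior's high-density region.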

***

### Time Series Forecasting
Predicting future values based on historical data.
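As one simple illustration, simple exponential smoothing forecasts the next value as a weighted average that favours recent observations. The series and the smoothing weight `alpha` below are hypothetical, and this is only one of many forecasting methods.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series.

    alpha in (0, 1] controls how strongly recent observations are weighted.
    """
    level = series[0]  # initialise the level at the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical historical data.
history = [10.0, 12.0, 11.0, 13.0]
forecast = simple_exponential_smoothing(history, alpha=0.5)
```

Setting `alpha=1.0` reduces the method to the naive forecast, which simply repeats the last observed value.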

***

### Statistical Process Control
A data-driven approach to improve the quality of processes by identifying and eliminating sources of variation.
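A control chart is a typical tool here: points outside the control limits signal unusual variation worth investigating. The sketch below uses simplified three-sigma limits computed from the mean and standard deviation of hypothetical baseline measurements; practical charts often estimate sigma from moving ranges or subgroup ranges instead.

```python
from statistics import mean, stdev

# Hypothetical in-control baseline measurements.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]

center = mean(baseline)
sigma = stdev(baseline)  # sample standard deviation (simplifying assumption)

ucl = center + 3 * sigma  # upper control limit
lcl = center - 3 * sigma  # lower control limit

# Hypothetical new measurements to monitor against the limits.
new_points = [10.05, 9.7, 10.9]
out_of_control = [x for x in new_points if x > ucl or x < lcl]
```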

***

### Sampling
Process of selecting a representative subset of a population to collect data and learn about the whole population.
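Simple random sampling without replacement is the most basic scheme and can be sketched with the standard library. The population and sample size are hypothetical, and the seed is fixed only to make the example reproducible.

```python
import random

rng = random.Random(42)  # fixed seed, for reproducibility only

# Hypothetical population of 1000 unit identifiers.
population = list(range(1, 1001))

# Simple random sample without replacement: each unit is equally likely.
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)  # estimate of the population mean
```

The sample mean estimates the population mean; the estimate varies from sample to sample, and larger samples tend to give estimates closer to the population value.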

***

### Bootstrapping
A statistical technique that resamples the observed data with replacement to estimate the sampling distribution of a statistic, such as its standard error or confidence interval.
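A minimal sketch of a percentile bootstrap confidence interval for the mean follows. The function name `bootstrap_ci`, its defaults, and the data are hypothetical; the percentile interval is one of several bootstrap interval methods.

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed, for reproducibility only
    n = len(data)
    # Resample n observations with replacement and recompute the statistic.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical quantiles of the bootstrap estimates.
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]

# Hypothetical sample.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
low, high = bootstrap_ci(data)
```

Passing a different function as `stat` (for example `statistics.median`) gives an interval for that statistic without any change to the resampling logic.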

## Knowledge
1. Apply advanced statistical methods effectively.
2. Understand advanced statistical methods.
3. Able to use statistical software or packages with advanced statistical techniques.