## Hypothesis Testing
- Steps:
    1. Assume $H_0$, the null hypothesis.
<br/><br/>
    2. Check whether the data are consistent with $H_0$ (fail to reject the null hypothesis) or favor $H_1$ (reject the null hypothesis).
        - Under $H_0$, $\beta_1 = 0$ (the coefficient/parameter is zero); we fail to reject it when the 95% Confidence Interval <strong>includes</strong> zero.
        - Under $H_1$, $\beta_1 \neq 0$ (the coefficient/parameter is not zero); we reject $H_0$ in its favor when the 95% Confidence Interval <strong>does not include</strong> zero.
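
As a worked form of step 2, assuming a simple linear regression with normally distributed errors (an assumption these notes do not state), the check is usually made with a t-statistic, and the 95% Confidence Interval is built from the same standard error $SE(\hat\beta_1)$:

$$t = \frac{\hat\beta_1}{SE(\hat\beta_1)}, \qquad \hat\beta_1 \pm t_{0.975,\,n-2} \cdot SE(\hat\beta_1)$$

Here $\hat\beta_1$ is the estimated coefficient and $n$ is the number of observations. The interval excludes zero exactly when $|t| > t_{0.975,\,n-2}$, which is why the interval check and the two-sided test at the 5% level give the same answer.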

## p-value
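- Represents the probability of seeing an estimate at least as far from zero as the one observed, if the coefficient were actually zero. It is not the probability that the coefficient is zero.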
<br/><br/>
- If the 95% CI (Confidence Interval) does not include zero:
    - p < 0.05
    - Reject $H_0$
    - There is evidence of a relationship.
<br/><br/>
- If the 95% CI (Confidence Interval) includes zero:
    - p > 0.05
    - Fail to reject $H_0$
    - There is no evidence of a relationship (which is not proof that none exists).
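
The link between the p-value and the 95% CI can be checked numerically. The sketch below is only an illustration: the data are made up, and it assumes NumPy and SciPy are available. It uses `scipy.stats.linregress`, whose p-value tests $H_0$: slope $= 0$, and builds the interval by hand from the reported standard error.

```python
import numpy as np
from scipy import stats

# Illustrative data (made up for this sketch): y is roughly 2*x + 1 plus small noise.
x = np.arange(10, dtype=float)
noise = np.array([0.5, -0.3, 0.8, -0.6, 0.1, -0.9, 0.4, 0.7, -0.2, -0.5])
y = 2.0 * x + 1.0 + noise

# Simple linear regression; fit.pvalue is the two-sided p-value for H0: slope = 0.
fit = stats.linregress(x, y)

# 95% confidence interval for the slope: estimate +/- t_crit * standard error,
# using n - 2 degrees of freedom.
dof = len(x) - 2
t_crit = stats.t.ppf(0.975, dof)
ci_low = fit.slope - t_crit * fit.stderr
ci_high = fit.slope + t_crit * fit.stderr

ci_excludes_zero = bool(ci_low > 0 or ci_high < 0)
reject_h0 = bool(fit.pvalue < 0.05)

print(f"slope = {fit.slope:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
print(f"p-value = {fit.pvalue:.3g}, reject H0: {reject_h0}")
```

Because the test and the interval are built from the same t-distribution, `reject_h0` and `ci_excludes_zero` always agree in this setup.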

## $R^2$: How well does the model fit data?
- $R^2$ is used to evaluate a linear model.
<br/><br/>
- $R^2$ (the coefficient of determination) is the proportion of the variance in the response that the model explains.
<br/><br/>
- Mathematically:
<br/><br/>
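    - $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, with $R^2 \in [0,1]$ for a least-squares fit with an intercept evaluated on its training data (on other data it can be negative)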
<br/><br/>    
    - where $SS_{res} = \sum_i(y_i-f_i)^2 = \sum_ie_i^2$ (with $f_i$ the predicted value and $e_i$ the residual) and $SS_{tot}=\sum_i(y_i-\bar y)^2$
<br/><br/>
    - and $\bar y = \frac{1}{n}\sum y_i$ as the mean of the observed data
<br/><br/>
- A higher value of $R^2$ is better since it means that more of the variance is explained by the model.
<br/><br/>
- The threshold for a <strong>'good'</strong> $R^2$ depends on the domain, so $R^2$ is most useful as a tool for comparing different models.
<br/><br/>
- $R^2$ is susceptible to <strong>overfitting</strong>, so there is no guarantee that a model with a high $R^2$ value will generalize.
<br/><br/>
- <strong>Issue with $R^2$</strong>: on the training data, R-squared never decreases (and almost always increases) as you add more features to the model, even if they are unrelated to the response. So selecting the model with the highest $R^2$ is not a reliable approach for choosing the best linear model.
<br/><br/>
- <strong>So how do we get around this issue?</strong>
    - Adjusted $R^2$: penalizes model complexity, but it generally <strong>under-penalizes complexity</strong>.
    - Train/test split or cross-validation:
        - More reliable estimate of out-of-sample error (better for choosing which of the models will best <strong>generalize</strong> to out-of-sample data).
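
The points above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a recommended workflow: the data are synthetic, the helper names (`r_squared`, `adjusted_r_squared`, `fit_ols`, `predict`) are hypothetical, and the adjusted $R^2$ formula used is the common $1 - (1 - R^2)\frac{n-1}{n-p-1}$ with $p$ features. It fits one model with the real feature and one with five extra unrelated features, then reports training $R^2$, adjusted $R^2$, and $R^2$ on a held-out split.

```python
import numpy as np

def r_squared(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p features (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(X, coef):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

# Synthetic data (an assumption of this sketch): one real feature, five unrelated ones.
rng = np.random.default_rng(0)
n = 60
x_real = rng.normal(size=n)
x_junk = rng.normal(size=(n, 5))
y = 3.0 * x_real + rng.normal(size=n)

X_small = x_real.reshape(-1, 1)
X_big = np.column_stack([x_real, x_junk])

# Simple train/test split: first 40 rows for fitting, last 20 held out.
train, test = slice(0, 40), slice(40, n)

results = {}
for name, X in [("small", X_small), ("big", X_big)]:
    coef = fit_ols(X[train], y[train])
    r2_train = r_squared(y[train], predict(X[train], coef))
    r2_test = r_squared(y[test], predict(X[test], coef))
    adj = adjusted_r_squared(r2_train, n=40, p=X.shape[1])
    results[name] = (r2_train, adj, r2_test)
    print(f"{name}: train R^2={r2_train:.3f}, adjusted={adj:.3f}, test R^2={r2_test:.3f}")
```

The training $R^2$ of the larger model cannot be lower than that of the smaller one, because the smaller model is a special case of it. The held-out $R^2$ carries no such guarantee, which is why it is the more honest basis for comparing models.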

## Residual Plots
$$Residual = Observed - Predicted$$
- Positive values for the residual (on the y-axis) mean the prediction was too low, and negative values mean the prediction was too high. A residual of 0 means the prediction was exactly correct.
- Following are some examples of ideal residual plots:
![img](assets/ideal-residual.png)
![img](assets/ideal-residual1.png)
![img](assets/ideal-residual2.png)
![img](assets/ideal-residual3.png)
- In an ideal residual plot, the points:
    - are symmetrically distributed and tend to cluster around the center of the plot.
    - are clustered around the lower single digits of the y-axis (e.g. 0.5 or 1.5 instead of 30 or 150).
    - show no clear patterns.
<br/><br/>
- Following are some examples of bad residual plots:
![img](assets/bad-residual.png)
![img](assets/bad-residual1.png)
![img](assets/bad-residual2.png)
![img](assets/bad-residual3.png)
- In these plots, the points:
    - are not evenly distributed vertically, include outliers, or form a clear shape.
    - show an apparent pattern.
- So how much does this matter?
    - <strong>“Essentially, all models are wrong, but some are useful”</strong> - George Box
    - Performance and accuracy may be an issue if you are trying to publish your thesis in particle physics.
    - If you are trying to run a quick and dirty analysis, a less-than-perfect model might be good enough to answer whatever questions you have.
    - Most of the time a decent model is better than none at all!
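
To make the residual definition and the idea of a "clear pattern" concrete, here is a small sketch. It assumes NumPy, uses made-up data, and the helper names (`residuals`, `plot_residuals`) are hypothetical. A straight line is fitted to data that are actually quadratic, so the residuals average to zero but still follow an obvious curve. The plotting helper assumes matplotlib is installed and is only defined here, not called.

```python
import numpy as np

def residuals(observed, predicted):
    """Residual = Observed - Predicted."""
    return np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)

def plot_residuals(x_values, res, xlabel="x"):
    """Scatter the residuals around a zero line (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt
    plt.scatter(x_values, res)
    plt.axhline(0.0, linestyle="--")
    plt.xlabel(xlabel)
    plt.ylabel("Residual (observed - predicted)")
    plt.show()

# Made-up data: the true relationship is quadratic, so a straight line is the wrong model.
x = np.linspace(-3, 3, 61)
y_curved = x ** 2

# Fit a straight line by least squares; np.polyfit returns the highest degree first.
slope, intercept = np.polyfit(x, y_curved, deg=1)
res = residuals(y_curved, slope * x + intercept)

# The residuals average to (numerically) zero, yet they track x**2 almost perfectly,
# which is the kind of clear pattern a bad residual plot shows.
pattern_strength = np.corrcoef(res, x ** 2)[0, 1]

print(f"mean residual = {res.mean():.2e}")
print(f"correlation of residuals with x^2 = {pattern_strength:.3f}")
```

Calling `plot_residuals(x, res)` would draw the U-shaped residual plot for this fit. A model that includes the squared term would leave residuals with no such structure.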
<br/><br/>

Source:
- https://www.ritchieng.com/machine-learning-evaluate-linear-regression-model/
- https://en.wikipedia.org/wiki/Coefficient_of_determination#Definitions
- http://scott.fortmann-roe.com/docs/MeasuringError.html