# Interpreting diagnostic plots

The objective of this notebook is to provide some insight into diagnostic plots. When we judge a plot as "ok" or "not ok", what are we actually looking for?

_tl;dr: diagnostic plots are graphical representations of data that can be used to identify outliers, assess the normality of a dataset, detect heteroscedasticity, and assess the linearity of a regression model. They are used to diagnose potential problems with a dataset or model, and can help inform decisions about data pre-processing and model selection._

A good article delving into the various diagnostic plots can be found [here](https://data.library.virginia.edu/diagnostic-plots/). I have leaned heavily on this article throughout this notebook.

Common diagnostic plots include scatter plots, histograms, boxplots, residual plots, and Q-Q plots. Our focus in this notebook is on the four that R plots by default for a fitted linear model: _Residual vs Fitted_, _Normal Q-Q_, _Scale-Location_, and _Residual vs Leverage_. Each of these plots has a specific purpose and can help uncover different issues with a dataset or model. We will take a look at each one and break down what it measures, which will help us understand how we should be interpreting the model fit.

## Residual vs Fitted
The residual versus fitted plot is a diagnostic plot used to assess whether a regression model has captured the shape of the relationship in the data. It shows the residuals (the difference between the observed values and the predicted values) on the vertical axis and the predicted values (or fitted values) on the horizontal axis. If the points are randomly scattered around a horizontal line at zero, the plot is consistent with the linearity assumption: the model has left no obvious structure behind. If the points form a pattern, such as a U-shape or a curve, the model is missing a non-linear component of the relationship.

![residual vs fitted](./figures/22_11_29-hh-residual_vs_fitted.png)
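
As a rough illustration of the U-shape described above, the sketch below fits a straight line to data generated from a quadratic function and summarises the residuals by thirds of the fitted values. Everything in it is an assumption made for the example: the helper `fit_line`, the noise level, the seed and the choice of thirds are not taken from the figure above. It uses only the Python standard library and prints numbers rather than drawing the plot.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)                        # fixed seed, illustrative only
x = [i / 10 for i in range(0, 101)]           # 101 points on [0, 10]
y = [xi ** 2 + rng.gauss(0, 1) for xi in x]   # true function is quadratic

# Deliberately fit the wrong (linear) model.
intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Order the points as they would appear along the horizontal axis of the plot.
pairs = sorted(zip(fitted, residuals))
k = len(pairs) // 3

def mean_residual(chunk):
    return sum(r for _, r in chunk) / len(chunk)

low = mean_residual(pairs[:k])     # smallest fitted values
mid = mean_residual(pairs[k:-k])   # middle of the fitted range
high = mean_residual(pairs[-k:])   # largest fitted values

print(f"mean residual (low, mid, high fitted): {low:.2f}, {mid:.2f}, {high:.2f}")
```

Positive average residuals at both ends of the fitted range and negative ones in the middle are the numeric counterpart of a U-shaped residual vs fitted plot. Plotting `fitted` against `residuals` with any plotting library would show the same pattern visually.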

ToDo: Show a full example of what a misspecified model looks like. This should include the true function (quadratic), the model (linear), the residuals and the residual vs fitted plot.

## Normal Q-Q
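A normal Q-Q (quantile-quantile) plot is a diagnostic plot used to assess the normality of a dataset. It compares the ordered values of the dataset with the corresponding quantiles of a normal distribution. In the regression setting, the values being checked are the model's standardized residuals. If the points fall close to a straight line, the data are consistent with a normal distribution. If the points deviate systematically from the line, for example by curving away at one or both ends, the data are skewed or heavy-tailed rather than normal.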

![normal q-q](./figures/22_11_29-hh-normal_q_q.png)
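
To make the comparison of quantiles concrete, the sketch below builds the points of a normal Q-Q plot by hand for one normal sample and one skewed (exponential) sample. The function names `qq_points` and `straightness`, the sample sizes and the seed are hypothetical choices for this example. The plotting positions `(i - 0.5) / n` are one common convention among several, and the correlation of the Q-Q points is used here only as a crude numeric stand-in for "how close to a straight line".

```python
import math
import random
from statistics import NormalDist

def qq_points(sample):
    """Pair each ordered sample value with a standard normal quantile."""
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean nearly linear."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in points)
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    var_s = sum((s - mean_s) ** 2 for _, s in points)
    return cov / math.sqrt(var_t * var_s)

rng = random.Random(0)  # fixed seed, illustrative only
normal_sample = [rng.gauss(10, 2) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

r_normal = straightness(qq_points(normal_sample))
r_skewed = straightness(qq_points(skewed_sample))

print(f"normal sample: r = {r_normal:.3f}")
print(f"skewed sample: r = {r_skewed:.3f}")
```

The normal sample does not need mean 0 or standard deviation 1 to fall on a straight line: a different mean or spread only changes the intercept and slope of that line. The skewed sample bends away from the line, which shows up as a lower correlation.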

ToDo: Show examples of various Gaussian distributions and their corresponding Q-Q plots.

## Scale-Location

A scale-location plot is a diagnostic plot used to detect heteroscedasticity (non-constant variance) in the residuals of a model. It shows the square root of the absolute standardized residuals (the residuals divided by their estimated standard deviation) on the vertical axis and the predicted values (or fitted values) on the horizontal axis. If the points are randomly scattered around a roughly horizontal line, with a similar spread across the whole range of fitted values, then the variance is approximately constant. If the points trend upwards or downwards, or the spread widens like a funnel as the fitted values change, then the variance is not constant.

![scale location](./figures/22_11_29-hh-scale_location.png)
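
The sketch below computes the quantity on the vertical axis of a scale-location plot for two simulated datasets: one with constant noise and one where the noise grows with `x`. All names (`scale_location_points`, `spread_gap`), the simulated model `y = 2 + 3x` and the noise levels are assumptions made for this example. The standardization follows the usual textbook formula for simple linear regression, dividing each residual by the residual standard error times `sqrt(1 - leverage)`.

```python
import math
import random

def scale_location_points(x, y):
    """Fit y = a + b*x and return (fitted, sqrt(|standardized residual|)) pairs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error with n - 2 degrees of freedom (two coefficients).
    sigma = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    points = []
    for xi, fi, e in zip(x, fitted, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        standardized = e / (sigma * math.sqrt(1 - leverage))
        points.append((fi, math.sqrt(abs(standardized))))
    return points

def spread_gap(points):
    """Mean vertical value in the top third of fitted values minus the bottom third."""
    ordered = [v for _, v in sorted(points)]
    k = len(ordered) // 3
    return sum(ordered[-k:]) / k - sum(ordered[:k]) / k

rng = random.Random(1)  # fixed seed, illustrative only
x = [i / 10 for i in range(1, 101)]  # 100 points on [0.1, 10]
y_constant = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]        # constant variance
y_growing = [2 + 3 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # variance grows with x

gap_constant = spread_gap(scale_location_points(x, y_constant))
gap_growing = spread_gap(scale_location_points(x, y_growing))

print(f"constant variance: gap = {gap_constant:.2f}")
print(f"growing variance:  gap = {gap_growing:.2f}")
```

A gap near zero corresponds to a flat band of points in the plot, while a clearly positive gap corresponds to points that drift upwards as the fitted values increase, which is the visual signature of heteroscedasticity.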

ToDo: Show examples of heteroscedastic and homoscedastic models and their corresponding scale-location plots.

Diagnostic plots are useful because they can help identify potential problems with a dataset or model. By visually inspecting diagnostic plots, data scientists can quickly identify outliers, assess the normality of a dataset, detect heteroscedasticity, and assess the linearity of a regression model. This can help inform decisions about data pre-processing and model selection.