## TODO - Further Reading

A few interesting articles, mostly from Wikipedia:

Generalities
+ https://en.wikipedia.org/wiki/Sampling_distribution
+ https://en.wikipedia.org/wiki/Statistical_hypothesis_testing 

Probabilities
+ https://en.wikipedia.org/wiki/Probability_interpretations
+ https://en.wikipedia.org/wiki/Frequentist_probability
+ https://en.wikipedia.org/wiki/Bayesian_probability

Inference paradigms
+ https://en.wikipedia.org/wiki/Frequentist_inference
+ https://en.wikipedia.org/wiki/Bayesian_inference
+ https://en.wikipedia.org/wiki/Lindley%27s_paradox
+ https://www.stat.berkeley.edu/~stark/Preprints/611.pdf

Parametric tests vs ordinal data
+ https://tech.snmjournals.org/content/46/3/318.2
+ https://www.researchgate.net/post/What_is_the_most_suitable_statistical_test_for_ordinal_data_eg_Likert_scales


## TODO - LIMITS OF SUMMARY STATISTICS - ANSCOMBE'S QUARTET

[Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet) comprises four data sets of eleven data points (see below) that have nearly identical descriptive statistics, yet have very different distributions and appear very different when [graphed](https://matplotlib.org/3.2.1/gallery/specialty_plots/anscombe.html). They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers and other influential observations on statistical properties.
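
As a quick numerical check that the four sets really share nearly identical descriptive statistics, here is a minimal sketch using NumPy. The helper `summary` is hypothetical, written only for this illustration; it recomputes the statistics usually quoted for the quartet: mean and sample variance of x and y, Pearson's r, and the least-squares line.

```python
import numpy as np

# Anscombe's quartet (same values as in the data cell of this notebook)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    'I': (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    'II': (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    'III': (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    'IV': ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Hypothetical helper: the statistics that Anscombe's quartet keeps (nearly) fixed."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    slope, intercept = np.polyfit(xs, ys, deg=1)  # least-squares straight line
    return {
        'mean_x': xs.mean(),
        'var_x': xs.var(ddof=1),   # sample variance (n - 1 denominator)
        'mean_y': ys.mean(),
        'var_y': ys.var(ddof=1),
        'pearson_r': np.corrcoef(xs, ys)[0, 1],
        'slope': slope,
        'intercept': intercept,
    }

stats = {label: summary(xs, ys) for label, (xs, ys) in quartet.items()}
for label, s in stats.items():
    print(label, ', '.join(f'{k}={v:.2f}' for k, v in s.items()))
```

Every row should agree to about two decimal places (mean of x 9, variance of x 11, mean of y about 7.50, r about 0.816, line close to y = 3.00 + 0.500x), which is exactly why the plots are needed to tell the sets apart.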

1. A simple linear relationship with Gaussian noise.
1. A clear non-linear relationship between the variables; the Pearson correlation coefficient is not meaningful here. A more general regression and the corresponding [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination) would be more appropriate.
1. The relationship is linear, but one outlier has enough influence to offset the fitted regression line and lower the correlation coefficient from 1 to 0.816. A [robust regression](https://en.wikipedia.org/wiki/Robust_regression) would be more appropriate here.
1. An example where a single [high-leverage point](https://en.wikipedia.org/wiki/Leverage_(statistics)) is enough to produce a high correlation coefficient, even though the other data points do not indicate any relationship between the variables.
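
For data set II, comparing a straight-line fit with a degree-2 polynomial fit makes the non-linearity concrete. This sketch uses `np.polyfit`/`np.polyval` and a hypothetical helper `r_squared` implementing the usual definition `R^2 = 1 - SS_res / SS_tot`; picking a quadratic as the "more general regression" is our assumption, not something the text specifies.

```python
import numpy as np

# Data set II of Anscombe's quartet
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

def r_squared(y, y_hat):
    """Hypothetical helper: coefficient of determination 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Least-squares polynomial fits of degree 1 (straight line) and 2 (parabola)
r2_linear = r_squared(y2, np.polyval(np.polyfit(x, y2, deg=1), x))
r2_quadratic = r_squared(y2, np.polyval(np.polyfit(x, y2, deg=2), x))
print(f'linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quadratic:.6f}')
```

For a straight line R² equals r², so the linear fit explains only about two thirds of the variance (0.816² ≈ 0.67), while the parabola fits almost perfectly; the small remaining residual presumably comes from the values being rounded to two decimals.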

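For data set III, the effect of the single outlier can be measured by recomputing Pearson's r without it, and by comparing ordinary least squares (OLS) with a robust fit. This sketch uses the Theil-Sen estimator (median of all pairwise slopes), which is only one of several robust-regression methods and is our choice rather than one named in the text; the helper `theil_sen` is hypothetical and written in plain NumPy instead of calling a library routine.

```python
import itertools
import numpy as np

# Data set III of Anscombe's quartet; the outlier is the point (13, 12.74)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y3 = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])
outlier = int(np.argmax(y3))          # index of the single off-line point
keep = np.arange(len(x)) != outlier   # mask selecting the other ten points

r_all = np.corrcoef(x, y3)[0, 1]
r_without = np.corrcoef(x[keep], y3[keep])[0, 1]

def theil_sen(xs, ys):
    """Hypothetical helper: median pairwise slope, then median intercept."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i, j in itertools.combinations(range(len(xs)), 2)
              if xs[j] != xs[i]]      # skip pairs with equal x (undefined slope)
    slope = np.median(slopes)
    intercept = np.median(ys - slope * xs)
    return slope, intercept

ols_slope, _ = np.polyfit(x, y3, deg=1)
ts_slope, ts_intercept = theil_sen(x, y3)
print(f'r with outlier = {r_all:.3f}, without = {r_without:.5f}')
print(f'OLS slope = {ols_slope:.3f}, Theil-Sen slope = {ts_slope:.3f}')
```

Without the outlier r is essentially 1, and the Theil-Sen slope stays close to the roughly 0.35 slope of the other ten points, while the OLS slope is pulled up to about 0.50.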

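For data set IV, the influence of the single point at x = 19 shows up in its leverage, i.e. the diagonal of the hat matrix. For a straight-line fit this has the closed form `h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2`. The helper `leverage` below is hypothetical and implements only this simple-regression formula.

```python
import numpy as np

# Data set IV of Anscombe's quartet
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
y4 = np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])

def leverage(xs):
    """Hypothetical helper: hat-matrix diagonal for a fit y = b0 + b1 * x."""
    dx = xs - xs.mean()
    return 1.0 / len(xs) + dx ** 2 / np.sum(dx ** 2)

h = leverage(x4)
r = np.corrcoef(x4, y4)[0, 1]
others = np.arange(len(x4)) != np.argmax(h)        # every point except the high-leverage one
x_spread = x4[others].max() - x4[others].min()     # range of x among the remaining points
print(f'leverages: {np.round(h, 3)}')
print(f'r = {r:.3f}; x spread without the high-leverage point = {x_spread}')
```

The point at x = 19 gets leverage 1, the maximum possible, so the fitted line is forced through it; the other ten points all share x = 8 and, taken alone, carry no information about a slope at all.
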
In [None]:
# imports
import matplotlib.pyplot as plt
import numpy as np

# data: Anscombe's quartet (data sets I-III share the same x values)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

datasets = {
    'I': (x, y1),
    'II': (x, y2),
    'III': (x, y3),
    'IV': (x4, y4)
}

# create one panel per data set; shared axes make the panels directly comparable
fig, axs = plt.subplots(nrows=1, ncols=4, sharex=True, sharey=True, figsize=(15, 4))

# x range covering all four data sets, used to draw every regression line
x_lin = np.array([np.min(x + x4), np.max(x + x4)])

# loop variables xs, ys avoid shadowing the shared x list above
for ax, (label, (xs, ys)) in zip(axs.flat, datasets.items()):

    # least-squares linear regression: ys ~ p1 * xs + p0
    p1, p0 = np.polyfit(xs, ys, deg=1)
    y_lin = p1 * x_lin + p0

    # plot the data points and the fitted line
    ax.plot(xs, ys, 'o')
    ax.plot(x_lin, y_lin, 'r-', alpha=0.5, lw=2)

    # add title
    ax.set_title(label)

plt.tight_layout(rect=[0, 0, 0.9, 0.9])
plt.show()
