# Statistics for everyone

How to use statistics to make better decisions

In [1]:
import numpy as np
import pandas as pd
import altair as alt
from helpers.svg_wrapper import SVGImg
import helpers.plotting as pt
pt.enable_slide_theme()
pt.import_slide_theme_font_in_notebook()

## Intuitive decision making can lead you astray

- B. F. Skinner [showed](https://www.youtube.com/watch?v=TtfQlkGwE2U) that rewards reinforce behaviours.
- Hence, animals can be [trained](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1473025/) to do 
  [many tasks](https://www.theatlantic.com/technology/archive/2013/06/skinner-marketing-were-the-rats-and-facebook-likes-are-the-reward/276613/).
- Randomly rewarding [pigeons](https://www.all-about-psychology.com/support-files/superstition-in-the-pigeon.pdf) 
  [reinforced](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2615824/)<br> apparently random behaviours 
  ("[superstitions](https://www.psychologicalscience.org/observer/the-many-lives-of-superstition)").

In [2]:
SVGImg('images/pigeon.svg', width='100%', center=False, output_dir='slides')

## Reasoning and discussing isn't always enough

- Participants in the TV show [Trick or Treat (S2/E6)](https://en.wikipedia.org/wiki/Trick_or_Treat_%28TV_series%29) had to find out how to score points.
- They developed elaborate theories about how using the various available objects scored points.

In [3]:
SVGImg('images/experiment.svg', width='50%', output_dir='slides')

## Humans detect patterns where there are none

In the show, points increased every time a fish swam across an aquarium.

In [4]:
SVGImg('images/fish.svg', width='45%', output_dir='slides')

"We got to a hundred points, so we obviously did it right." - A participant.

## Focusing on spurious signals can be costly

- Participants missed actual hints that would have allowed them<br> to win more money more easily.
- See [part 1](https://www.youtube.com/watch?v=IDi2NlsA4nI) and [part 2](https://www.youtube.com/watch?v=yzXSSPp4Epg)

In [5]:
SVGImg('images/sign.svg', width='70%', output_dir='slides')

## Even quantitative data can mislead without proper analysis

In [6]:
SVGImg('images/spurious_correlations.svg', width='90%', output_dir='slides')

See also: [Spurious correlations](http://www.tylervigen.com/spurious-correlations), [Why not to use two axes](https://blog.datawrapper.de/dualaxis)
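
Strong correlations between unrelated quantities are easy to reproduce. The following is a minimal sketch, not part of the original slides: it generates pairs of independent random walks and counts how often they correlate strongly by chance alone. The function names, the threshold of 0.7, and the series length of 20 are arbitrary choices made for this illustration.

```python
import random

def random_walk(n_steps, rng):
    """Cumulative sum of independent standard normal steps."""
    position, walk = 0.0, []
    for _ in range(n_steps):
        position += rng.gauss(0, 1)
        walk.append(position)
    return walk

def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(0)
n_pairs, n_steps = 1000, 20

# Each pair consists of two series generated independently of each other
correlations = [
    pearson_correlation(random_walk(n_steps, rng), random_walk(n_steps, rng))
    for _ in range(n_pairs)
]
share_strong = sum(abs(r) > 0.7 for r in correlations) / n_pairs
print(f"Share of unrelated pairs with |r| > 0.7: {share_strong:.0%}")
```

Because each series drifts over time, a noticeable share of the pairs ends up strongly correlated even though no pair is related by construction.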

## Algorithms can make similar mistakes

- One common problem in machine learning is [over- or underfitting](https://en.wikipedia.org/wiki/Overfitting) variations in training data.
- The resulting models don't generalise well to future observations.

In [7]:
# Generate ten noisy samples from a downward-opening parabola
rng = np.random.RandomState(seed=1)
x = np.linspace(-1.5, 1., 10)
y = -x**2 + rng.normal(0, 0.1, size=x.shape)
df = pd.DataFrame({'x': x, 'y': y})

# Define the degree of the polynomial fits
degree_list = [1, 2, 9]
title_list = ['Underfit','Good Fit','Overfit']

yscale = alt.Scale(domain=[y.min()-1.5, y.max()+1.2])
xscale = alt.Scale(padding=1)

# Scatter plot of the data points, shared by all three panels
base = alt.Chart(df).mark_circle(color="black").encode(
    alt.X("x", axis=None, scale=xscale), 
    alt.Y("y", axis=None, scale=yscale)
)
# One panel per degree: the data points overlaid with the fitted polynomial
polynomial_fit = [
    base + base.transform_regression(
        "x", "y", method="poly", order=order
    )
    .mark_line()
    .properties(width=150, height=100, title=title)
    for title, order in zip(title_list, degree_list)
]

alt.hconcat(
    *polynomial_fit
).configure_view(
    strokeWidth=0
).configure_title(
    orient='bottom'
).display(
    renderer='svg'
)
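
The three fits can also be compared numerically. The sketch below is an addition to the original slides and rests on assumptions: it refits the same polynomial degrees with `np.polyfit` (rather than Altair's regression transform) and evaluates them on hypothetical "future observations", here fresh noisy samples drawn from the same parabola. The helper `mean_squared_error` and the test grid are made up for this illustration.

```python
import numpy as np

# Same synthetic training data as in the cell above: a noisy parabola
rng = np.random.RandomState(seed=1)
x_train = np.linspace(-1.5, 1.0, 10)
y_train = -x_train**2 + rng.normal(0, 0.1, size=x_train.shape)

# Hypothetical future observations: new noisy samples from the same curve
x_test = np.linspace(-1.4, 0.9, 50)
y_test = -x_test**2 + rng.normal(0, 0.1, size=x_test.shape)

def mean_squared_error(coefficients, x, y):
    """Mean squared difference between the fitted polynomial and the data."""
    return float(np.mean((np.polyval(coefficients, x) - y) ** 2))

errors = {}
for degree in [1, 2, 9]:
    # Least-squares polynomial fit on the training data only
    coefficients = np.polyfit(x_train, y_train, deg=degree)
    errors[degree] = {
        "train": mean_squared_error(coefficients, x_train, y_train),
        "test": mean_squared_error(coefficients, x_test, y_test),
    }
    print(degree, errors[degree])
```

The training error can only shrink as the degree grows, so it cannot reveal overfitting on its own. The error on data the model has not seen is the more informative number.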

## Statistics can help to tell the signal from the noise

### <br>For example:
- In 2016, many were certain that Clinton would win the US election.
- On Nov. 4, four days before the election, [a statistics website reported](https://fivethirtyeight.com/features/trump-is-just-a-normal-polling-error-behind-clinton):<br> "Trump Is Just A Normal Polling Error Behind Clinton"
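
To see why a lead of "a normal polling error" is far from safe, consider a rough sketch under stated assumptions. The numbers below are hypothetical and are not taken from the article: a lead of 3 percentage points, and a polling error on the margin that follows a normal distribution with a standard deviation of 3 points.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical numbers for illustration only, not taken from the article
poll_lead = 3.0           # leader's margin in the polls, in percentage points
typical_poll_error = 3.0  # assumed standard deviation of the error on the margin

# The trailing candidate is actually ahead if the polls overstate
# the lead by more than the lead itself
upset_probability = 1.0 - normal_cdf(poll_lead / typical_poll_error)
print(f"Chance of an upset under these assumptions: {upset_probability:.0%}")
```

Under these assumptions the trailing candidate is ahead in roughly one case out of six, which is unlikely but far from impossible.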

## You can get better at statistics!

- Follow this course to improve your decision-making skills.
- It is suitable for a general audience.

Next section: [An introductory experiment](1_introductory_card_experiment.slides.html)