## Bias
***

### Exercise 1

#### *Give three real-world examples of cognitive bias*

* Confirmation bias: selectively absorbing only the information that reinforces your existing beliefs. For example, if you lean towards a conservative political outlook and consume news only from like-minded outlets such as Fox News, you will inevitably end up with a distorted picture of the social and political landscape, one that validates your existing prejudices. Daniel Kahneman wrote about this phenomenon in his book "Thinking, Fast and Slow", in a chapter entitled "A Machine for Jumping to Conclusions": "Contrary to the rules of philosophers of science, who advise testing hypotheses by trying to refute them, people (and scientists, quite often) seek data that are likely to be compatible with the beliefs they currently hold".


* Hyperbolic discounting: "our inclination to choose immediate rewards over rewards that come later in the future, even when these immediate rewards are smaller." For example, we are all increasingly aware of the environmental cost of air travel, yet we continue to fly because it suits us in the short term, and we have no immediate sense of the long-term damage that the emissions cause to the environment.


* Bandwagon effect: "Uptick of beliefs and ideas increases the more that they have already been adopted by others." An example is the recent surge of investment in cryptocurrency and NFTs. Despite how little is known about such new and untested assets, and the extremely high risk involved, many people jumped at the chance to invest in crypto and NFTs purely because other (often high-profile) people had shown an interest in them.


#### *References:* 

https://positivepsychology.com/cognitive-biases/

Kahneman, Daniel. *Thinking, Fast and Slow*. Penguin Books, 2011.

### Mean

In [None]:
import numpy as np

# Generate a sample of 1000 values from a normal distribution.
x = np.random.normal(10.0, 1.0, 1000)
print(x)

In [None]:
# We expect the mean of the sample to be close to the mean of the population.
x.mean()

In [None]:
# Let's run a simulation of taking 1000 samples of size 1000.
samples = np.random.normal(10.0, 1.0, (1000, 1000))
samples

In [None]:
# Get the mean of the first sample.
samples[0].mean()

In [None]:
# Calculate the mean of all samples.
sample_means = samples.mean(axis=1)
sample_means

In [None]:
import matplotlib.pyplot as plt

# Plot a histogram of the sample means; it should be centred near 10.
plt.hist(sample_means)
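The histogram can be backed up with a quick numerical check. The sketch below is my own addition and makes a few assumptions: it uses a seeded generator (`rng`, with an arbitrary seed) so the numbers are reproducible, and it uses the illustrative names `check_samples` and `check_means` so that nothing in the cells above is overwritten. It compares the average of the sample means with the population mean of 10.0, and their spread with the theoretical value of sigma divided by the square root of n.

```python
import numpy as np

# Seeded generator so the check is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)

# 1000 samples, each of size 1000, from a normal distribution (mean 10, std 1).
check_samples = rng.normal(10.0, 1.0, (1000, 1000))
check_means = check_samples.mean(axis=1)

# The average of the sample means should sit very close to the population mean.
mean_of_means = check_means.mean()

# The spread of the sample means should be close to sigma / sqrt(n).
spread_of_means = check_means.std()
expected_spread = 1.0 / np.sqrt(1000)

print(mean_of_means, spread_of_means, expected_spread)
```

If the sample mean is an unbiased estimator, the first value should be very close to 10 and the last two values should be close to each other.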

### Standard Deviation

In [None]:
# A list of numbers - four small and one big.
numbers1 = np.array([1, 1, 1, 1, 10])

# Their mean.
np.mean(numbers1)

In [None]:
# A list of numbers - all close to each other.
numbers2 = np.array([2, 2, 3, 3, 4])

# Their mean.
np.mean(numbers2)

In [None]:
# Calculate the mean.
x_mean = x.mean()

# Subtract the mean from each of the values.
zeroed = x - x_mean

# Have a look at the zeroed values.
zeroed


In [None]:
# What do you think the mean of zeroed is?
zeroed.mean()

In [None]:
# Create a plot.
fig, ax = plt.subplots(figsize=(12, 6))

# Plot the zeroed array, each value spaced out evenly along the x axis.
# Note the x axis is just the position of the value in the zeroed array.
ax.plot(range(len(zeroed)), zeroed, 'k.')

# Plot the y=0 line.
ax.axhline(y=0.0, color='grey', linestyle='-');

In [None]:
# Absolute values.
np.abs(zeroed)

In [None]:
# Average absolute value.
np.mean(np.abs(zeroed))

In [None]:
# Square the values.
np.square(zeroed)

In [None]:
# Create a plot.
fig, ax = plt.subplots(figsize=(12, 6))

# Plot the squared zeroed array, each value spaced out evenly along the x axis.
# Note the x axis is just the position of the value in the zeroed array.
ax.plot(range(len(zeroed)), np.square(zeroed), color='green', marker='.', linestyle='none')

# Plot the zeroed array, each value spaced out evenly along the x axis.
# Note the x axis is just the position of the value in the zeroed array.
ax.plot(range(len(zeroed)), zeroed, 'k.')

# Plot the y=0 line.
ax.axhline(y=0.0, color='grey', linestyle='-');

In [None]:
# Calculate the average squared result.
np.mean(np.square(zeroed))

In [None]:
# Calculate the square root of the average squared result.
np.sqrt(np.mean(np.square(zeroed)))

In [None]:
# The full calculation using the original array.
np.sqrt(np.mean(np.square(x - np.mean(x))))

In [None]:
# Note that the function is built into numpy.
x.std()
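By default, numpy's `std` divides by n (its `ddof` parameter defaults to 0), which is the same calculation built up step by step above. A minimal sketch to confirm this, using a freshly generated array called `values` (an illustrative name, not one from the cells above) and an arbitrary seed:

```python
import numpy as np

# Seeded generator so the comparison is reproducible (arbitrary seed).
rng = np.random.default_rng(1)
values = rng.normal(10.0, 1.0, 1000)

# Manual calculation: square root of the mean squared deviation from the mean.
manual_std = np.sqrt(np.mean(np.square(values - np.mean(values))))

# Built-in calculation, with the default ddof=0 written out explicitly.
builtin_std = values.std(ddof=0)

# The two should agree up to floating-point rounding.
print(np.isclose(manual_std, builtin_std))
```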

### Exercise 2

#### *Calculate the standard deviations of the following two lists of numbers from earlier in the lecture:*

In [None]:
# list of numbers (4 small, 1 large)
numbers1 = np.array([1, 1, 1, 1, 10])
numbers1.std()

In [None]:
# list of numbers close together
numbers2 = np.array([2, 2, 3, 3, 4])
numbers2.std()

These results make sense: the first list spans from 1 to 10, so its values are spread much more widely than those in the second list, which are grouped tightly together. The first standard deviation (3.6) is therefore far larger than the second (roughly 0.75).
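It is worth noting that both lists have the same mean (2.8), so the mean alone cannot tell them apart. The sketch below works through the calculation for each list one step at a time; the helper name `std_steps` is my own and is only there for illustration.

```python
import numpy as np

numbers1 = np.array([1, 1, 1, 1, 10])
numbers2 = np.array([2, 2, 3, 3, 4])

def std_steps(numbers):
    """Return the mean, the mean squared deviation and its square root."""
    mean = numbers.mean()
    # Distance of each value from the mean.
    deviations = numbers - mean
    # Average of the squared distances (the variance).
    variance = np.mean(np.square(deviations))
    # The square root brings the result back to the original units.
    return mean, variance, np.sqrt(variance)

mean1, var1, std1 = std_steps(numbers1)
mean2, var2, std2 = std_steps(numbers2)

print(mean1, var1, std1)
print(mean2, var2, std2)
```

The single value of 10 in the first list sits 7.2 away from the mean, and squaring that distance is what drives its standard deviation up.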

### Exercise 3

#### *Show that the difference between the standard deviation calculations is greatest for small sample sizes.*

In [None]:
# Generate 10,000 small samples, each of size 5, from a normal distribution with standard deviation 2.
sample_small = np.random.normal(0.0, 2.0, (10000, 5))
sample_small

In [None]:
# Calculate standard deviation without correction.
stdevs_small = sample_small.std(axis=1)
stdevs_small

In [None]:
# Create histogram to verify if estimate is too small.
fig, ax = plt.subplots(figsize=(12, 6))

# Plot histogram.
plt.hist(stdevs_small, bins=10)

# Draw a vertical line where the actual standard deviation is.
plt.axvline(x=2.0, color='red');

As the histogram of the small samples shows, the uncorrected estimate is clearly off: the distribution is centred below the true standard deviation of 2 (the red line), so the estimate is systematically too small.
Next, I will perform the same calculations on a large sample size for comparison.

In [None]:
# Generate 100 large samples, each of size 10,000, from the same distribution.
sample_large = np.random.normal(0.0, 2.0, (100, 10000))
sample_large

In [None]:
# Calculate standard deviation without correction.
stdevs_large = sample_large.std(axis=1)
stdevs_large

In [None]:
# Create histogram to check accuracy of estimate
fig, ax = plt.subplots(figsize=(12, 6))

# Plot histogram.
plt.hist(stdevs_large, bins=10)

# Draw a vertical line where the actual standard deviation is.
plt.axvline(x=2.0, color='red');

As the histogram of the much larger samples shows, the estimate is far more accurate: the distribution is narrow and almost centred on the true standard deviation of 2.
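The two histograms only use the uncorrected calculation, so a direct comparison of the two calculations may help. The sketch below is my own addition, under a few assumptions: the sample sizes, the number of samples (500) and the seed are arbitrary choices, and `gaps` is an illustrative name. For each sample size it averages the uncorrected (`ddof=0`) and corrected (`ddof=1`) standard deviations and records the difference between them.

```python
import numpy as np

# Seeded generator so the comparison is reproducible (arbitrary seed).
rng = np.random.default_rng(2)
true_std = 2.0
sample_sizes = (5, 50, 500, 5000)

gaps = []
for n in sample_sizes:
    # 500 samples, each of size n.
    samples_n = rng.normal(0.0, true_std, (500, n))
    # Average estimate without the correction (divide by n).
    uncorrected = samples_n.std(axis=1, ddof=0).mean()
    # Average estimate with Bessel's correction (divide by n - 1).
    corrected = samples_n.std(axis=1, ddof=1).mean()
    gaps.append(corrected - uncorrected)
    print(n, uncorrected, corrected, corrected - uncorrected)
```

For any single sample, the corrected value is the uncorrected value multiplied by the square root of n / (n - 1). That factor is about 1.118 when n is 5 and approaches 1 as n grows, which is why the difference is greatest for small sample sizes.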

As the website statisticshowto.com states: "Warne (2017) advocates using Bessel’s correction only if you have a sufficiently large sample and if you are actually trying to approximate the population mean. If you are just interested in finding the sample mean, and don’t want to extrapolate your findings to the population, just omit the correction."

#### *References*

https://www.statisticshowto.com/bessels-correction/

Warne, R. T. (2017). *Statistics for the Social Sciences: A General Linear Model Approach*. Cambridge University Press.

## End
***