# Means of continuous distributions

In class, you've learned that for some continuous distributions the expected value is either infinite or not defined at all.

But what does this mean in practice? If you collect some samples from such a distribution, you can always compute a *sample mean*. So how do these distributions differ from the 'regular' ones, for which the mean is defined?

Let's try to find that out.

We will consider the following distributions:

- Exponential ($\lambda = 2$);
- Standard normal ($\mu = 0, \sigma=1$);
- Pareto ($\alpha = 1$);
- Cauchy ($x_0 = 0, \gamma = 1$).
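
To see where the trouble comes from, here is a quick check of the defining integrals for the last two distributions. It assumes the Pareto scale is $x_m = 1$, which the list above does not state.

$$E[X_{\text{Pareto}}] = \int_1^\infty x \cdot \frac{1}{x^2}\,dx = \int_1^\infty \frac{dx}{x} = \infty$$

$$\int_0^\infty \frac{x}{\pi(1+x^2)}\,dx = \left[\frac{\ln(1+x^2)}{2\pi}\right]_0^\infty = \infty, \qquad \int_{-\infty}^0 \frac{x}{\pi(1+x^2)}\,dx = -\infty$$

So the Pareto mean is infinite, while the Cauchy mean has the form $\infty - \infty$ and is not defined. For comparison, the exponential mean is $1/\lambda = 0.5$ and the standard normal mean is $\mu = 0$.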

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## A closer look at the distributions

To begin with, let's plot the PDFs of the distributions we'll be working with.

Let's start with Exponential and Pareto:



In [None]:
# Run this code

x = np.arange(0.001, 100, 0.01)

l = 2
y_exp = l*np.exp(-l*x)

a = 1
# Pareto PDF with scale x_m = 1 (assumed); the density is zero for x < 1
y_pareto = np.where(x >= 1, a/(x**(a+1)), 0)

fig, (ax1, ax2) = plt.subplots(1,2)
fig.set_size_inches((15,6))

ax1.plot(x, y_exp, label='lambda = ' + str(l))
ax1.set_title('Exponential distribution')
ax1.legend()

ax2.plot(x, y_pareto, label='alpha = ' + str(a))
ax2.set_title('Pareto distribution')
ax2.legend()

Do you observe any difference in shape when looking at the PDF plots for these two distributions?



Let's make similar plots for the Normal and Cauchy:

In [None]:
# Run this code

x = np.arange(-50, 50, 0.01)

y_normal = (1/np.sqrt(2*np.pi))*np.exp(-0.5*x**2)
y_cauchy = 1./(np.pi*(1+x**2))

fig, (ax1, ax2) = plt.subplots(1,2)
fig.set_size_inches((15,6))

ax1.plot(x, y_normal, label='mu = 0, s = 1')
ax1.set_title('Normal distribution')
ax1.legend()

ax2.plot(x, y_cauchy, label='x0 = 0, gamma = 1')
ax2.set_title('Cauchy distribution')
ax2.legend()

Do you observe any difference in shape when looking at the PDF plots for these two distributions?

## Investigating sample means

You've learned in class that two of the four distributions in question don't have a finite mean. Let's see what that means in practice.

Perform the following experiment. For each distribution in question, draw $n$ samples and compute their sample mean. Repeat this $k = 50$ times, recording the sample mean you obtain each time.
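
The experiment itself can be sketched on a distribution that is not part of the exercise. The helper name `repeated_sample_means` and the uniform distribution used here are illustrative assumptions, not part of the assignment; this is one possible way to organise the loop, not the required solution.

```python
import numpy as np

def repeated_sample_means(sampler, n, k):
    """Call sampler(n) k times and return the k sample means as an array."""
    return np.array([np.mean(sampler(n)) for _ in range(k)])

# Illustration on the uniform distribution on [0, 1), whose true mean is 0.5
np.random.seed(0)  # fixed seed so the run is repeatable
means_uniform = repeated_sample_means(
    lambda size: np.random.uniform(size=size), n=10000, k=50
)

# Smallest and largest of the 50 sample means
print(means_uniform.min(), means_uniform.max())
```

Passing the sampler as a function keeps the loop identical for every distribution; only the sampling call changes.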

In [None]:
n = 10000
k = 50

In [None]:
# EXPONENTIAL (lambda = 2)

means_exp = []

# lambda (rate); np.random.exponential expects the scale 1/lambda
l = 2

# Your code here
# use np.random.exponential(1./l, size=n) for sampling from the exponential


In [None]:
# STANDARD NORMAL

means_normal = []

# Your code here
# use np.random.normal(size=n) for sampling from the normal

In [None]:
# PARETO (a=1)

means_pareto = []
a = 1

# Your code here
# use np.random.pareto(a, n) + 1 for sampling from Pareto with x_m = 1
# (np.random.pareto alone draws from the shifted Lomax / Pareto II distribution)

In [None]:
# CAUCHY 

means_cauchy = []

# Your code here
# use np.random.standard_cauchy(size=n) to sample from Cauchy

Now, plot the 50 sample means for each distribution.
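
One possible layout is a row of scatter plots, one panel per distribution. The sketch below runs on its own by using stand-in data: the dictionary `demo_means` and its two entries are hypothetical placeholders, to be replaced with the lists of means you computed above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (hypothetical): 50 sample means each, 1000 samples per mean
rng = np.random.default_rng(0)
demo_means = {
    'light-tailed (stand-in)': rng.normal(size=(50, 1000)).mean(axis=1),
    'heavy-tailed (stand-in)': rng.standard_cauchy(size=(50, 1000)).mean(axis=1),
}

fig, axes = plt.subplots(1, len(demo_means))
fig.set_size_inches((15, 6))

# One panel per distribution: repetition index on x, sample mean on y
for ax, (name, means) in zip(axes, demo_means.items()):
    ax.plot(np.arange(1, len(means) + 1), means, 'o')
    ax.set_xlabel('repetition')
    ax.set_ylabel('sample mean')
    ax.set_title(name)
```

Separate panels matter here because the vertical ranges can differ by orders of magnitude between distributions.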

In [None]:
# Your code here

Do you see any difference between the sample mean plots for the distributions?

## Detecting heavy tails

Being able to detect heavy-tailed distributions is important in practice: such distributions can lack a finite mean or variance, so many standard statistical methods that rely on these quantities aren't applicable to them.

At the same time, you can't tell right away whether samples come from a heavy-tailed distribution just by looking at the histogram (or even at the true PDF, as we saw before).

Go back to the first section, *A closer look at the distributions*, and switch the scale of the PDF plots from linear to log-log. Do you see any difference between the heavy- and light-tailed distributions?

Hint: you can switch the scale of an axis to log by applying the *set_xscale('log')* or *set_yscale('log')* method to it, e.g. *ax1.set_xscale('log')* and *ax1.set_yscale('log')*.
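
As a minimal sketch of the hint, the snippet below draws the exponential and Pareto PDFs on a single log-log panel. Restricting $x$ to $[1, 100)$ and assuming the Pareto scale $x_m = 1$ are choices made here for illustration, not something fixed by the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Start at x = 1, the assumed Pareto scale x_m
x = np.arange(1, 100, 0.01)

l = 2
y_exp = l * np.exp(-l * x)

a = 1
y_pareto = a / x**(a + 1)

fig, ax = plt.subplots()
ax.plot(x, y_exp, label='Exponential, lambda = 2')
ax.plot(x, y_pareto, label='Pareto, alpha = 1')
ax.set_xscale('log')
ax.set_yscale('log')
ax.legend()

# On log-log axes a power law is a straight line:
# log y = log a - (a + 1) * log x, so the fitted slope is close to -(a + 1) = -2
slope = np.polyfit(np.log(x), np.log(y_pareto), 1)[0]
print(slope)
```

The straight line for the Pareto PDF, against the sharply bending exponential curve, is the visual signature to look for.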
