
# The Expected Value of a Random Variable

In [1]:
# Load the packages used for analysis and plotting.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import binom, uniform, norm, poisson

%matplotlib inline
sns.set(style='whitegrid')

Starting with some basic definitions:

Definition: Let $Y$ be a discrete random variable with the probability function $p(y)$. Then the **expected value** of $Y$, $E(Y)$, is defined to be:

$$E(Y)=\sum_y y p(y)$$

If $p(y)$ is an accurate characterization of the population frequency distribution, then $E(Y)=\mu$, the population mean.

Suppose $Y$ takes the values $y = 0, 1, 2$ with probabilities $p(y) = 1/4, 1/2, 1/4$, respectively. The expected value can be computed this way:

$$E(Y)=0(1/4) + 1(1/2) + 2(1/4)$$

In [2]:
0*.25 + 1*.5 + 2*.25

1.0

Theorem: Let $Y$ be a discrete random variable with probability function $p(y)$ and $g(Y)$ be a real-valued function of $Y$. Then the expected value of $g(Y)$ is:
$$E[g(Y)]=\sum_y g(y)\, p(y).$$

See the proof in the book.
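
To make the theorem concrete, here is a minimal sketch that applies it to the small table above ($y = 0, 1, 2$) with the illustrative choice $g(y) = y^2$. The names `y_table`, `p_table`, and `g` are assumptions made for this example and do not come from the book.

```python
import numpy as np

# Table from the earlier example: y = 0, 1, 2 with p(y) = 1/4, 1/2, 1/4.
y_table = np.array([0, 1, 2])
p_table = np.array([1/4, 1/2, 1/4])

def g(y):
    """Illustrative choice of g; any real-valued function of y would work."""
    return y**2

# E[g(Y)] = sum over y of g(y) * p(y)
expected_g = np.sum(g(y_table) * p_table)

# For comparison: g evaluated at E[Y], which is generally a different number.
g_of_expected = g(np.sum(y_table * p_table))

print(expected_g, g_of_expected)
```

Here $E[Y^2] = 1.5$ while $(E[Y])^2 = 1$, so in general $E[g(Y)]$ should not be computed by plugging $E(Y)$ into $g$.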

Definition: If $Y$ is a random variable with mean $E(Y)=\mu$, the variance of $Y$ is defined to be the expected value of $(Y-\mu)^2$. That is:
$$V(Y)= E[(Y-\mu)^2].$$

The standard deviation of $Y$ is the positive square root of the variance, $\sqrt{V(Y)}$.

Here's a brief example. Say we have the list $y=0,1,2,3$ and $p(y)=1/8, 1/4, 3/8, 1/4$. Let's calculate the expected value, variance and standard deviation.

In [7]:
#First, we define the lists as shown.
y_val1=[0,1,2,3]
pyval1=[1/8, 1/4, 3/8, 1/4]
#Turn them into arrays so we can multiply and sum them.
y=np.array(y_val1)
p=np.array(pyval1)
#Apply the expected value equation.
sum(y*p)

np.float64(1.75)

Applying the definition of variance to this example gives:

$$E[(Y-\mu)^2]=\sum_{y=0}^{3} (y-\mu)^2 p(y).$$

Before continuing, to make the computation easier, we introduce a theorem:

Combination Theorem: Let $Y$ be a discrete random variable with probability function $p(y)$, let $g(Y), g_1(Y), \ldots, g_k(Y)$ be functions of $Y$, and let $c$ be a constant. Then the following are true:

$E[c(g(Y))]= cE[g(Y)]$.

$E[g_1(Y) + g_2(Y)+...+g_k(Y)]=E[g_1(Y)]+E[g_2(Y)]+...+E[g_k(Y)]$.
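
As a numerical sanity check (not a proof), the two properties can be verified on the distribution from the example above, $y = 0, 1, 2, 3$. This is a sketch: the constant `c = 3.0` and the functions $g_1(y) = y$, $g_2(y) = y^2$ are arbitrary choices made for illustration.

```python
import numpy as np

# Distribution from the example above.
y_vals = np.array([0, 1, 2, 3])
p_vals = np.array([1/8, 1/4, 3/8, 1/4])

c = 3.0  # arbitrary constant chosen for illustration

# Left side: E[c*Y + Y^2], computed directly from the definition.
lhs = np.sum((c * y_vals + y_vals**2) * p_vals)

# Right side: c*E[Y] + E[Y^2], using both parts of the theorem.
rhs = c * np.sum(y_vals * p_vals) + np.sum(y_vals**2 * p_vals)

print(lhs, rhs)
```

Both sides evaluate to 9.25, as the theorem predicts.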

Important:
The variance satisfies $V(Y)=\sigma^2=E[(Y-\mu)^2]=E(Y^2)-\mu^2$.

Proof:

$E[(Y-\mu)^2]=E[Y^2-2Y\mu+\mu^2]=E[Y^2]-2E[Y]\mu + E[\mu^2]$.

Notice that $E[Y]=\mu$. Further, since $\mu^2$ is a constant, $E[\mu^2]=\sum_y \mu^2 p(y) = \mu^2 \sum_y p(y) = \mu^2$. Therefore:

$E[(Y-\mu)^2] = E[Y^2]-2\mu^2+\mu^2 = E[Y^2]-\mu^2$.

Now, we can compute the previous example in Python using these new results.



In [8]:
#Computing E[Y^2] first:
sum(y**2 * p)

np.float64(4.0)

In [9]:
#Now, we compute \mu^2:
1.75*1.75

3.0625

In [10]:
#Finally the Variance:
4-3.0625

0.9375
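
As a cross-check, the variance can also be computed directly from the definition $V(Y)=E[(Y-\mu)^2]$ and compared with the shortcut $E(Y^2)-\mu^2$. The sketch below redefines the example's values under the illustrative names `y_vals` and `p_vals` so that it runs on its own.

```python
import numpy as np

# Same distribution as the example: y = 0, 1, 2, 3.
y_vals = np.array([0, 1, 2, 3])
p_vals = np.array([1/8, 1/4, 3/8, 1/4])

# Mean: E[Y] = sum of y * p(y)
mu = np.sum(y_vals * p_vals)

# Variance from the definition: E[(Y - mu)^2]
var_direct = np.sum((y_vals - mu)**2 * p_vals)

# Variance from the shortcut: E[Y^2] - mu^2
var_shortcut = np.sum(y_vals**2 * p_vals) - mu**2

# Standard deviation: positive square root of the variance
sd = np.sqrt(var_direct)

print(mu, var_direct, var_shortcut, sd)
```

Both routes give a variance of 0.9375, and the standard deviation is about 0.968.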

Let's get some practice by wrapping these computations in a reusable function, even though existing libraries may already provide one.

In [17]:
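def measures_ensemble(y_vals, p_vals, verbose=True):
  """Expected value, variance, and standard deviation of a discrete distribution.

  Prints a summary when verbose is True; otherwise returns the results as a dict.
  """
  y = np.array(y_vals)
  p = np.array(p_vals)

  # A valid probability function must sum to 1.
  if not np.isclose(np.sum(p), 1.0):
    raise ValueError("Probabilities must sum to 1.")

  # E[Y] = sum of y * p(y)
  expected = np.sum(y * p)

  # V(Y) = E[Y^2] - mu^2; the standard deviation is its square root.
  expected_squared = np.sum((y**2) * p)
  variance = expected_squared - expected**2
  stndev = np.sqrt(variance)

  if verbose:
    print(f"Outcomes: {y_vals}")
    print(f"Probabilities: {p_vals}")
    print(f"Expected value E[Y]: {expected:.4f}")
    print(f"Variance Var[Y]: {variance:.4f}")
    print(f"Standard deviation SD[Y]: {stndev:.4f}")
  else:
    return {
        'E[Y]': expected,
        'Var[Y]': variance,
        'SD[Y]': stndev
    }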

In [18]:
y_vals=[1,3,5,7]
p_vals=[.1, .3, .5, .1]
measures_ensemble(y_vals, p_vals)

Outcomes: [1, 3, 5, 7]
Probabilities: [0.1, 0.3, 0.5, 0.1]
Expected value E[Y]: 4.2000
Variance Var[Y]: 2.5600
Standard deviation SD[Y]: 1.6000
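
These numbers can be cross-checked against SciPy, which provides `scipy.stats.rv_discrete` for building a finite discrete distribution from its values and probabilities. The sketch below is one possible check, not part of the original notebook; the names `outcomes`, `probs`, and `dist` are illustrative.

```python
import numpy as np
from scipy.stats import rv_discrete

# Same outcomes and probabilities as in the call above.
outcomes = [1, 3, 5, 7]
probs = [0.1, 0.3, 0.5, 0.1]

# Build a discrete distribution object from the table of values.
dist = rv_discrete(name="example_dist", values=(outcomes, probs))

# mean(), var(), and std() should agree with the hand-written function.
print(f"E[Y]:   {dist.mean():.4f}")
print(f"Var[Y]: {dist.var():.4f}")
print(f"SD[Y]:  {dist.std():.4f}")
```

Agreement between the two approaches is a useful sign that the hand-written formulas were implemented correctly.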
