# Random Variables

Frequently, when an experiment is performed, we are interested mainly in some function of the outcome as opposed to the actual outcome itself.

For instance,<br>
1) In a coin-flipping experiment, we may be interested in the total number of heads that occur and not care at all about the actual **Head (H)–Tail (T)** sequence that results. <br>

2) In throwing two dice, we are often interested in the sum of the two dice and are not really concerned about the separate values of each die. That is, we may be interested in knowing that the sum is 7 and may not be concerned over whether the actual outcome was (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), or (6, 1). <br>

These quantities of interest, or, more formally, these real-valued functions defined on the sample space, are known as '**Random Variables**'.
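
To make the definition concrete, here is a minimal sketch of the two-dice example above. The names `two_dice`, `dice_sum`, and `sum_is_7` are illustrative only and are not used elsewhere in this notebook.

```python
from itertools import product

# Sample space for two dice: all 36 ordered pairs (first die, second die).
two_dice = list(product(range(1, 7), repeat=2))

def dice_sum(outcome):
    """A random variable: maps an outcome (a pair of faces) to a real number."""
    return sum(outcome)

# All outcomes that this random variable maps to the value 7.
sum_is_7 = [outcome for outcome in two_dice if dice_sum(outcome) == 7]
print(sum_is_7)
# [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
```

Six different outcomes collapse to the single number 7, which is exactly the kind of summary we said we care about.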

## Let's do an experiment with Python to demonstrate
> ### *Why do we need Random Variables?*

## & show their importance

In [19]:
import numpy as np
import pandas as pd
from itertools import product

In [20]:
one_toss = np.array(['H', 'T'])

In [21]:
two_tosses = list(product(one_toss, repeat=2))
two_tosses

[('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

In [22]:
# For three tosses, just change the number of repetitions:
three_tosses = list(product(one_toss, repeat=3))
three_tosses

[('H', 'H', 'H'),
 ('H', 'H', 'T'),
 ('H', 'T', 'H'),
 ('H', 'T', 'T'),
 ('T', 'H', 'H'),
 ('T', 'H', 'T'),
 ('T', 'T', 'H'),
 ('T', 'T', 'T')]

As shown earlier in the slides,<br>
A *probability space* $(\Omega, P)$ is an outcome space accompanied by the probabilities of all the outcomes.
<br>If you assume all eight outcomes of three tosses are equally likely, the probabilities are all 1/8:


In [23]:
three_toss_probs = (1/8)*np.ones(8)

In [24]:
three_toss_space = pd.DataFrame({
    'Omega':three_tosses,
    'P(omega)':three_toss_probs
})
three_toss_space

| | Omega | P(omega) |
|---|---|---|
| 0 | (H, H, H) | 0.125 |
| 1 | (H, H, T) | 0.125 |
| 2 | (H, T, H) | 0.125 |
| 3 | (H, T, T) | 0.125 |
| 4 | (T, H, H) | 0.125 |
| 5 | (T, H, T) | 0.125 |
| 6 | (T, T, H) | 0.125 |
| 7 | (T, T, T) | 0.125 |
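
Returning to the coin example from the introduction, the number of heads is a function on this outcome space. The sketch below rebuilds the three-toss space so that it runs on its own; the names `tosses`, `heads_table`, `X(omega)`, and `heads_dist` are assumptions made for illustration.

```python
from itertools import product

import pandas as pd

# Rebuild the three-toss probability space with equally likely outcomes.
tosses = list(product("HT", repeat=3))
heads_table = pd.DataFrame({
    'Omega': tosses,
    'P(omega)': [1 / 8] * len(tosses),
})

# X maps each outcome to the number of heads it contains.
heads_table['X(omega)'] = heads_table['Omega'].map(lambda omega: omega.count('H'))

# Adding up P(omega) over the outcomes that share a value of X
# gives the chance of each possible number of heads.
heads_dist = heads_table.groupby('X(omega)')['P(omega)'].sum()
print(heads_dist)
```

Eight outcomes reduce to four possible values (0, 1, 2, or 3 heads), with chances 1/8, 3/8, 3/8, and 1/8.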


As you can see above, product spaces (probability spaces) get large very quickly.
If we are tossing 10 times, the outcome space would consist of the $2^{10}$  sequences of 10 elements where each element is H or T. <br>
The outcomes are a pain to list by hand, but computers are good at saving us that kind of pain.
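
As a quick check of that growth, the following sketch counts the sequences by brute force and compares the count with $2^n$. The helper `count_outcomes` is a hypothetical name introduced here.

```python
from itertools import product

def count_outcomes(n_tosses):
    """Count the H/T sequences of length n_tosses without storing them all."""
    return sum(1 for _ in product("HT", repeat=n_tosses))

for n in (2, 3, 10):
    # The brute-force count agrees with the formula 2**n.
    print(n, count_outcomes(n), 2 ** n)
```

For 10 tosses both numbers are 1024.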

Let's take the example of rolling a die.<br>
If we roll a die 5 times, there are almost 8,000 possible outcomes ($6^5 = 7776$):

In [52]:
die = np.arange(1, 7, 1)
five_rolls = list(product(die, repeat=5))
# five_rolls = [list(i) for i in product(die, repeat=5)]
# All 6**5 outcomes are equally likely, so each has probability 1/6**5.
five_roll_probs = (1/6**5)*np.ones(6**5)
five_roll_space = pd.DataFrame({
    'Omega':five_rolls,
    'P(omega)':five_roll_probs
})
five_roll_space

| | Omega | P(omega) |
|---|---|---|
| 0 | (1, 1, 1, 1, 1) | 0.000129 |
| 1 | (1, 1, 1, 1, 2) | 0.000129 |
| 2 | (1, 1, 1, 1, 3) | 0.000129 |
| 3 | (1, 1, 1, 1, 4) | 0.000129 |
| 4 | (1, 1, 1, 1, 5) | 0.000129 |
| 5 | (1, 1, 1, 1, 6) | 0.000129 |
| 6 | (1, 1, 1, 2, 1) | 0.000129 |
| 7 | (1, 1, 1, 2, 2) | 0.000129 |
| 8 | (1, 1, 1, 2, 3) | 0.000129 |
| 9 | (1, 1, 1, 2, 4) | 0.000129 |


## A Function on the Outcome Space

Suppose you roll a die five times and add up the number of spots you see. If that seems artificial, be patient for a moment and you'll soon see why it's interesting.

The sum of the rolls is a numerical function on the outcome space $\Omega$  of five rolls. The sum is thus a random variable. Let's call it  $S$ . Then, formally,
$S: \Omega \rightarrow \{ 5, 6, \ldots, 30 \}$
 
The range of  $S$  is the integers 5 through 30, because each die shows at least one and at most six spots. We can also use the equivalent notation

$\Omega \stackrel{S}{\rightarrow} \{ 5, 6, \ldots, 30 \}$
 
From a computational perspective, the elements of $\Omega$ are in the column `Omega` of `five_roll_space`. Let's apply this function and create a larger table.

In [54]:
five_rolls_sum = pd.DataFrame({
    'Omega':five_rolls,
    'S(omega)':five_roll_space['Omega'].map(lambda val: sum(val)),
    'P(omega)':five_roll_probs
})
five_rolls_sum

| | Omega | P(omega) | S(omega) |
|---|---|---|---|
| 0 | (1, 1, 1, 1, 1) | 0.000129 | 5 |
| 1 | (1, 1, 1, 1, 2) | 0.000129 | 6 |
| 2 | (1, 1, 1, 1, 3) | 0.000129 | 7 |
| 3 | (1, 1, 1, 1, 4) | 0.000129 | 8 |
| 4 | (1, 1, 1, 1, 5) | 0.000129 | 9 |
| 5 | (1, 1, 1, 1, 6) | 0.000129 | 10 |
| 6 | (1, 1, 1, 2, 1) | 0.000129 | 6 |
| 7 | (1, 1, 1, 2, 2) | 0.000129 | 7 |
| 8 | (1, 1, 1, 2, 3) | 0.000129 | 8 |
| 9 | (1, 1, 1, 2, 4) | 0.000129 | 9 |


### Functions of Random Variables
A random variable is a numerical function on  $\Omega$ . Therefore by composition, a numerical function of a random variable is also a random variable.

For example,  $S^2$  is a random variable, calculated as follows:

$S^2(\omega) = \big(S(\omega)\big)^2$
 
Thus for example  $S^2(\text{[6 6 6 6 6]}) = 30^2 = 900$.
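
In code, this composition amounts to one more column computed from $S$. The sketch below rebuilds the outcome space so that it runs independently of the cells above; the names `rolls`, `rolls_table`, and `S^2(omega)` are assumptions made for illustration.

```python
from itertools import product

import pandas as pd

# Outcome space of five rolls, rebuilt here so the cell is self-contained.
rolls = list(product(range(1, 7), repeat=5))
rolls_table = pd.DataFrame({'Omega': rolls})

# S maps each outcome to the sum of its five rolls.
rolls_table['S(omega)'] = rolls_table['Omega'].map(sum)

# S^2 is a function of S, so it is again a function on Omega.
rolls_table['S^2(omega)'] = rolls_table['S(omega)'] ** 2

# The last row is the outcome (6, 6, 6, 6, 6).
print(rolls_table.tail(1))
```

The last row shows $S = 30$ and $S^2 = 900$, matching the calculation above.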

### Events Determined by $S$ ###
From the table `five_rolls_sum` it is hard to tell how many rows show a sum of 6, or 10, or any other value. To better understand the properties of $S$, we have to organize the information in `five_rolls_sum`.

For any subset $A$ of the range of $S$, define the event $\{S \in A\}$ as

$$
\{S \in A \} = \{\omega: S(\omega) \in A \}
$$

That is, $\{ S \in A\}$ is the collection of all $\omega$ for which $S(\omega)$ is in $A$.

If that definition looks unfriendly, try it out in a special case. Take $A = \{5, 30\}$. Then the event $\{S \in A\}$ occurs if and only if either all the rolls show 1 spot or all the rolls show 6 spots. So 
$$
\{S \in A\} = \{\text{[1 1 1 1 1], [6 6 6 6 6]}\}
$$
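
The definition translates almost word for word into code. This is a minimal sketch in plain Python; `omega_space` and `event` are hypothetical names, and the built-in `sum` plays the role of $S$.

```python
from itertools import product

# Outcome space of five rolls of a die.
omega_space = list(product(range(1, 7), repeat=5))

def event(random_variable, A):
    """Return {omega : random_variable(omega) in A} as a list of outcomes."""
    return [omega for omega in omega_space if random_variable(omega) in A]

# S is the sum of the rolls, so the built-in sum serves as the random variable.
print(event(sum, {5, 30}))
# [(1, 1, 1, 1, 1), (6, 6, 6, 6, 6)]
```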

It is natural to ask about the chance the sum is a particular value, say 10. That's not easy to read off the table, but we can access the corresponding rows:

In [60]:
five_rolls_sum[five_rolls_sum['S(omega)']==10]

| | Omega | P(omega) | S(omega) |
|---|---|---|---|
| 5 | (1, 1, 1, 1, 6) | 0.000129 | 10 |
| 10 | (1, 1, 1, 2, 5) | 0.000129 | 10 |
| 15 | (1, 1, 1, 3, 4) | 0.000129 | 10 |
| 20 | (1, 1, 1, 4, 3) | 0.000129 | 10 |
| 25 | (1, 1, 1, 5, 2) | 0.000129 | 10 |
| 30 | (1, 1, 1, 6, 1) | 0.000129 | 10 |
| 40 | (1, 1, 2, 1, 5) | 0.000129 | 10 |
| 45 | (1, 1, 2, 2, 4) | 0.000129 | 10 |
| 50 | (1, 1, 2, 3, 3) | 0.000129 | 10 |
| 55 | (1, 1, 2, 4, 2) | 0.000129 | 10 |


There are 126 values of $\omega$ for which $S(\omega) = 10$. Since all the $\omega$ are equally likely, the chance that $S$ has the value 10 is 126/7776. 

We are informal with notation and write $\{ S = 10 \}$ instead of $\{ S \in \{10\} \}$:
$$
P(S = 10) = \frac{126}{7776} \approx 1.62\%
$$
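
The same counting can be done for every value of $S$ at once. The sketch below uses only the standard library, with exact fractions to avoid rounding; `outcomes`, `counts`, and `dist` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Outcome space of five rolls; every outcome is equally likely.
outcomes = list(product(range(1, 7), repeat=5))

# Count how many outcomes produce each possible sum.
counts = Counter(sum(omega) for omega in outcomes)

# Divide each count by the total number of outcomes to get P(S = s).
dist = {s: Fraction(n, len(outcomes)) for s, n in sorted(counts.items())}

print(counts[10])       # number of outcomes with S = 10
print(float(dist[10]))  # P(S = 10) as a decimal
```

The count for 10 is 126, in agreement with the table lookup, and the chances over all sums from 5 to 30 add up to 1.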

### This is how Random Variables help us quantify the results of experiments for the purpose of analysis. 
That is, random variables provide *numerical* summaries of the experiment in question. (Source: Harvard Stat 110, as is the paragraph below.)

>This definition is abstract but fundamental; one of the most important skills to develop when studying probability and statistics is the ability to go back and forth between abstract ideas and concrete examples. Relatedly, it is important to work on recognizing the essential pattern or structure of a problem and how it connects to problems you have studied previously. We will often discuss stories that involve tossing coins or drawing balls from urns because they are simple, convenient scenarios to work with, but many other problems are isomorphic: they have the same essential structure, but in a different guise.

We can now apply mathematical operations to these variables, since they are real-valued functions.