## Definition of a Real Random Variable 

In the introduction, we provided an informal definition for a random variable. The informal definition is helpful to build some intuition about what we mean by a random variable. However, we want to develop random variables in such a way that we can take advantage of all of the properties of the *probability spaces* that we developed in {doc}`../04-probability1/axiomatic-prob.ipynb`.  In this class, we will only consider the case of **real** random variables, but random variables can also be complex-valued.




Let's start by introducing some notation that we will use for  random variables and in our discussion below:
* We will generally use upper-case (capital) letters to represent a random variable, such as $X$, $Y$, or $Z$. We may choose the letter to represent a particular quantity, such as representing a random time by $T$ or a random rate by $R$. For a generic random variable, we will usually use $X$. 
* We want to ask about the probability that a random variable's value lies in some set. Until now, we have used $P$ to denote either a probability or a probability measure. We will draw a distinction between these below, where we will write $\operatorname{Pr}()$ to denote a probability and $P()$ to denote a particular probability measure.
* We will need notation for an arbitrary value that a random variable may take on. Such a value will be represented by the lower-case version of the letter used to denote the random variable. In other words, we can ask about the probability that the random variable $X$ takes on a value $x$, which we can write as $\operatorname{Pr}(X=x)$. In particular, we can define functions this way; for instance, we can create a function of $x$ that returns $\operatorname{Pr}(X=x)$ for each real $x$, like $p_X(x) = \operatorname{Pr}(X=x)$.
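
To make the notation $p_X(x) = \operatorname{Pr}(X=x)$ concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that $X$ is the value on the top face of a fair six-sided die; the function name `p_X` simply mirrors the notation and is not part of any library.

```python
from fractions import Fraction

# Assumption for illustration: X is the top face of a fair six-sided die.
def p_X(x):
    """Return Pr(X = x) for any real number x."""
    if x in {1, 2, 3, 4, 5, 6}:
        return Fraction(1, 6)
    # Any other real value is one that X cannot take on.
    return Fraction(0)

print(p_X(3))    # 1/6
print(p_X(2.5))  # 0
```

Note that the argument `x` is an ordinary number, while the random variable $X$ appears only inside the definition of the function.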

````{warning}

Many people learning probability get confused by the notation of using an upper-case letter for a random variable and using a lower-case letter for the value of a random variable. 

**This confusion is especially bad when it comes to handwritten math because many people use very similar letter styles for both the upper-case and lower-case versions of their letters.** 

Here are my recommendations to reduce this confusion:
* Write your random variables in a *sans serif* style. Serifs are decorative strokes at the ends of letters, so write your upper-case letters in a block style without serifs.
* Write the lower-case versions with curvy *serifs*.

Some examples are below:
```{image} writing-letters.jpg
:alt: Image of handwritten versions of X,Y,Z and x,y,z with upper-case letters drawn without serifs and lower-case letters drawn with curly serifs
:width: 300 px
```

I recommend crossing the stem of the upper-case $Z$ to help distinguish it from the number two.
````

As mentioned above, we wish to be able to evaluate the probabilities that a random variable takes on certain sets of values. There are several immediate problems:
* How can we define a random variable so that we can evaluate the probability that it will take on a set of values in a way that is consistent with our work on probability?
* As in the case of event classes and probability measures, it turns out that we cannot come up with a rule that will assign a meaningful probability to any arbitrary set of values that a random variable might take on. How can we choose reasonable sets of values to assign probability to, and how can we evaluate those probabilities?

Our approach may seem a bit strange at first, but it defines random variables in a way that directly leverages our work on probability spaces. Suppose we have a probability space $(S, \mathcal{F}, P)$, and we want to use the probability measure $P$ from this probability space to determine the probabilities for a random variable $X$. Then if $B$ is a set of real values, we want to translate $\operatorname{Pr}(X \in B)$ to some probability $P(E_B)$, where $E_B \in \mathcal{F}$ is an event in our probability space. To achieve this, the outcomes in $E_B$ (which do not have to be numbers) must correspond in some way to the real values in $B$. Because we need to be able to determine which outcomes in $E_B$ correspond to the set of reals $B$, there must be a **function** that maps from the outcomes in $E_B$ to the real values in $B$. We define the random variable $X$ as that function: it is a function that maps the outcomes in $S$ to the real line.

Formally, we do not write a random variable as $X$ but instead write it as $X(s)$ to indicate that it is a function of an outcome in the sample space. Note also that the function itself **is not random**. In other words, if we know that some particular outcome $s_1 \in S$ occurred, then $X(s_1)$ is a deterministic real number.
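
The idea that a random variable is a fixed function of the outcome can be sketched in Python. The sample space (two coin tosses) and the rule "count the heads" are assumptions chosen for illustration; the only randomness is in which outcome occurs.

```python
import random

# Assumed sample space: tossing a coin twice. Outcomes are strings, not numbers.
S = ["HH", "HT", "TH", "TT"]

def X(s):
    """A random variable: a fixed rule mapping each outcome to a real number.
    Here (as an illustration) it counts the heads in the outcome."""
    return s.count("H")

# The function is deterministic: a given outcome always maps to the same value.
s1 = "HT"
print(X(s1), X(s1))  # 1 1

# Randomness enters only through which outcome occurs
# (equally likely outcomes, assuming a fair coin).
s = random.choice(S)
print(s, X(s))
```

Running the last two lines repeatedly gives different outcomes `s`, but the value `X(s)` is always determined by `s`.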

```{note}
In the discussion below, we get into some very fine details about random variables, and this discussion is more mathematically challenging than is usually included in a book at this level.  I have chosen to include this level of detail for two reasons:
1. This careful approach establishes a solid foundation for further study of probability.
2. This approach will motivate our use of a particular function in working with random variables. 

**Do not be concerned if you do not understand every detail of these arguments: in practice, we will use a very pragmatic approach for working with random variables that does not require us to worry about the following details.**
```

Thinking back to our work on probability spaces, you should recall that we cannot define $P(A)$ for all $A \subset S$ if $S$ is uncountably infinite. We can only define $P(A)$ if $A \in \mathcal{F}$. Thus, we will need to introduce some additional restrictions on $X$ to ensure that the sets of values $B$ for which we define the probability $\operatorname{Pr}(X \in B)$ correspond to events $E_B \in \mathcal{F}$. In general, we cannot ensure that this will be true for every set of real values $B$, so we must also restrict the sets $B \subset \mathbb{R}$ for which we will define $\operatorname{Pr}(X \in B)$.

Let $S_X$ be the collection of subsets of $\mathbb{R}$ for which we will define probabilities; that is, we will define $\operatorname{Pr}(X \in B)$ whenever $B \in S_X$.
We want to at least be able to ask questions like what is $\operatorname{Pr}(X=x)$ and what is $\operatorname{Pr}(X \in [a,b))$, and we can easily see that we would also like to form unions of such sets. For instance, if $X$ represents the value on the top face of a six-sided die, we may want to determine the probability that the value is even. If $A$ represents the age of people who have been hospitalized for Covid, we may wish to assess the impact on people outside of the usual working age by determining the probability that $A$ is less than 18 or greater than or equal to 65. Thus, we can use the following generalization of the intervals:

````{panels}
DEFINITION
^^^
```{glossary}
Borel sets (of $\mathbb{R}$)
    The members of the smallest collection of subsets of $\mathbb{R}$ that contains every interval and is closed under complements, countable unions, and countable intersections.
```
````

```{index} Borel field
```
```{index} Borel $\sigma$-algebra
```

The collection of Borel sets is called the *Borel $\sigma$-algebra* or the *Borel field* (technically, including the operators $\cup$ and $\cap$). For conciseness, we will use the terminology *Borel field* to refer to the collection of Borel sets of $\mathbb{R}$.  

```{note}
The Borel field for $\mathbb{R}$ can be generated from the half-open intervals $(-\infty, x]$ for all $x \in \mathbb{R}$ by repeatedly taking countable unions, countable intersections, and complements.
```
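
To see why we want unions of points and intervals, here is a small sketch for the die example. It assumes a fair six-sided die and represents each set $B$ by a membership test; the names `pmf`, `prob`, `is_even`, and `outside_middle` are hypothetical helpers, not library functions.

```python
from fractions import Fraction

# Assumed model: fair six-sided die, so Pr(X = x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(B):
    """Pr(X in B), where B is a membership test (predicate) on real values."""
    return sum(p for x, p in pmf.items() if B(x))

def is_even(x):
    # A union of single points: {2} U {4} U {6}
    return x % 2 == 0

def outside_middle(x):
    # A union of two intervals: (-inf, 2) U [6, inf)
    return x < 2 or x >= 6

print(prob(is_even))         # 1/2
print(prob(outside_middle))  # 1/3
```

The second set has the same form as the age example, which asks about $(-\infty, 18) \cup [65, \infty)$.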

To be able to define the probability $\operatorname{Pr}(X \in B)$ for every Borel set $B$, we must require that if we collect the set of outcomes that result in $X \in B$, then that set must be an event. In mathematical notation, we require that $\left\{s \left \vert X(s) \in B \right. \right\} \in \mathcal{F}$.  

Putting all of this together, we have our formal definition for a random variable:



````{panels}
DEFINITION
^^^
```{glossary}
random variable
  *(Formal definition)* Let $(S, \mathcal{F}, P)$ be a probability space. Then a random variable $X$ is a *function* that maps from the sample space $S$ to the real line $\mathbb{R}$, such that if $B$ is in the Borel field for $\mathbb{R}$, then $\left\{s \left \vert X(s) \in B \right. \right\} \in \mathcal{F}$.
```
````
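
The requirement in this definition can be checked by hand on a small example. The sketch below assumes a sample space of two coin tosses and a deliberately coarse event class `F` in which only the first toss is observable; both are hypothetical choices for illustration. Because the sample space is finite, every set $\left\{s \left \vert X(s) \in B \right. \right\}$ is a finite union of the sets on which $X$ equals a single value, so it is enough to check those.

```python
# Assumed sample space: two coin tosses.
S = frozenset({"HH", "HT", "TH", "TT"})

# Hypothetical coarse event class: only the first toss is observable.
F = {
    frozenset(),
    frozenset({"HH", "HT"}),  # first toss is heads
    frozenset({"TH", "TT"}),  # first toss is tails
    S,
}

def preimage(X, B):
    """The set {s : X(s) in B}, where B is a membership test on real values."""
    return frozenset(s for s in S if B(X(s)))

def is_random_variable(X):
    """Check that {s : X(s) = v} is an event for every value v that X takes."""
    values = {X(s) for s in S}
    return all(preimage(X, lambda x, v=v: x == v) in F for v in values)

def first_is_heads(s):
    return 1 if s[0] == "H" else 0

def num_heads(s):
    return s.count("H")

print(is_random_variable(first_is_heads))  # True
print(is_random_variable(num_heads))       # False
```

The function `num_heads` fails the check because the set of outcomes with two heads, $\{HH\}$, is not an event in this coarse event class. With the event class containing all subsets of $S$, every function on $S$ would pass.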




Note that $\left\{s \left \vert X(s) \in B \right. \right\}$ is the set of all outcomes that $X$ maps into $B$. This set is called the *inverse image* of $B$ under $X$, so we can abbreviate it as $X^{-1}(B)$.
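
As a sketch of how $\operatorname{Pr}(X \in B)$ is computed as $P\left(X^{-1}(B)\right)$, consider two tosses of a fair coin with $X$ equal to the number of heads. The probability space, the choice of $X$, and the helper names `X_inv` and `Pr_X_in` are assumptions made for illustration; every subset of $S$ is treated as an event.

```python
from fractions import Fraction

# Assumed probability space: two tosses of a fair coin, equally likely outcomes.
S = ["HH", "HT", "TH", "TT"]
P_outcome = {s: Fraction(1, 4) for s in S}

def P(event):
    """Probability measure: takes an event (a set of outcomes)."""
    return sum(P_outcome[s] for s in event)

def X(s):
    """Random variable: number of heads in outcome s."""
    return s.count("H")

def X_inv(B):
    """Inverse image X^{-1}(B), with B given as a membership test on reals."""
    return {s for s in S if B(X(s))}

def Pr_X_in(B):
    """Pr(X in B), defined as P(X^{-1}(B))."""
    return P(X_inv(B))

def B(x):
    # B = [1, 3)
    return 1 <= x < 3

print(sorted(X_inv(B)))  # ['HH', 'HT', 'TH']
print(Pr_X_in(B))        # 3/4
```

Notice that `P` takes sets of outcomes, while `Pr_X_in` takes sets of real values; the inverse image is what connects the two.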

Let's investigate this with some simple examples. In each example, we show a graphical depiction of the sample space for an experiment with directed arrows that indicate the mapping from the sample space to values of the random variable, which are shown on a number line.

**Example 1**

Let's create a binary RV from tossing a fair coin. 

<!--The sample space is $S=\{H,T\}$, we can represent H with value 1 in the real line and T with value 0 in the real line. Then we can define the random variable $X$ that is going to represent the outcomes of the experiment "tossing a fair coin": 
$$X(x) = \begin{cases}1, & x=H \\0, & x=T\end{cases}$$ -->

**Example 2**

Let's create a binary RV from tossing a fair coin twice.

<!--The sample space is $S=\{HH,HT,TH,TT\}$.  We want the RV $Y$ to be binary so the only real value it can take are 0 or 1. Let's consider the case where we **map** the events $\{HH,HT,TH\}$ to 1 and the event $\{TT\}$ to 0. Then we can define the random variable $Y$ as: 
$$Y(y) = \begin{cases}1, & y=\{HH,HT,TH\} \\0, & y=\{TT\}\end{cases}$$ -->

**Example 3**

Let's create another RV from tossing a fair coin twice.

<!-- The sample space is $S=\{HH,HT,TH,TT\}$. Here we are not told $Z$ can only be binary, then we can **map** each possible event to a different real-value number. So, we can define the random variable $Z$ as: 
$$Z(z) = \begin{cases}1, & z=\{HH\} \\2, & z=\{HT\}\\3, & z=\{TH\}\\4, & z=\{TT\}\end{cases}$$ -->
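
One way to write down mappings like those in the three examples is as Python dictionaries from outcomes to real values. The particular assignments below are only one possible choice for each example; many other mappings would also define valid random variables.

```python
# Example 1: one toss, S = {H, T}; a binary RV.
X = {"H": 1, "T": 0}

# Example 2: two tosses; a binary RV that is 1 if at least one head occurs.
Y = {"HH": 1, "HT": 1, "TH": 1, "TT": 0}

# Example 3: two tosses; an RV that gives each outcome a different value.
Z = {"HH": 1, "HT": 2, "TH": 3, "TT": 4}

for name, rv in [("X", X), ("Y", Y), ("Z", Z)]:
    # The set of values that the random variable can take on.
    print(name, "takes values", sorted(set(rv.values())))
```

Here $Y$ and $Z$ are defined on the same sample space but are different random variables because they are different functions.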