# Random Variables {#sec-random-variables}



## Overview

The previous sections introduced some core concepts from probability theory, such as Bayes' rule and the
law of total probability. However, statistics and machine learning are concerned with data, so we need a way
to link sample spaces and events to data.
In this chapter, we introduce the concept of a <a href="https://en.wikipedia.org/wiki/Random_variable">random variable</a>, which provides this link between sample spaces and data.

A random variable is a function of an outcome and therefore depends on chance. Being a function, it has a domain on which it is defined; this domain is the sample space $\Omega$.
We will deal with two types of random variables here: discrete and continuous.

## Random Variables

The set of values that a random variable can take is called its range. When the range is finite or countable, its values can be listed or arranged in a sequence; when it is an interval of real numbers, they cannot. Formally, we have the following definition [1].


----
**Definition: Random Variable**

A random variable is a mapping



$$X: \Omega \rightarrow \mathbb{R}$$

that assigns a real number to each outcome $\omega \in \Omega$. We denote this number by $X(\omega)$.


----
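To make the definition concrete, here is a minimal Python sketch. It assumes the experiment is two tosses of a coin and takes $X(\omega)$ to be the number of heads in the outcome $\omega$; the names `omega` and `X` are illustrative choices, not part of the text above. The point is simply that $X$ is an ordinary function that turns each outcome into a real number.

```python
from itertools import product

# Sample space for two coin tosses (an assumed example experiment):
# each outcome is a tuple such as ('H', 'T').
omega = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: map an outcome to a real number (here, the number of heads)."""
    return float(outcome.count("H"))

for w in omega:
    print(w, "->", X(w))
```

Several outcomes can share the same value (here `('H', 'T')` and `('T', 'H')` both map to `1.0`); a random variable does not need to be one-to-one.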

We will distinguish below between discrete and continuous random variables.

### Discrete random variables

We have the following definition for a discrete variable [1]

----
**Definition: Discrete Random Variable**

A random variable $X$ is called discrete if it takes countably many values, i.e. the random variable can take values
from a countable set $\{x_1, x_2, \dots \}$. The probability function, also referred to as the probability mass function or PMF for short, is given by


\begin{equation}
f_{X}(x) = P(X = x)
\end{equation}



----
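The PMF can be computed directly from the mapping $X$: $f_X(x)$ adds up the probabilities of all outcomes $\omega$ with $X(\omega) = x$. The sketch below assumes the same two-toss experiment with a fair coin, so each of the four outcomes has probability $1/4$; `pmf` is a hypothetical helper name, and exact fractions are used to avoid round-off.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed experiment: two tosses of a fair coin, all outcomes equally likely.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, len(omega)) for w in omega}

def X(outcome):
    # Number of heads in the outcome.
    return outcome.count("H")

def pmf(X, P):
    """Return f_X as a dict x -> P(X = x), summing P(w) over all w with X(w) = x."""
    f = defaultdict(Fraction)  # Fraction() is 0
    for w, p in P.items():
        f[X(w)] += p
    return dict(f)

f_X = pmf(X, P)
print(f_X)  # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
```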


Since $f_X$ is a probability function, it must satisfy the following:

- $f_X(x) \geq 0$ for all $x$
- $\sum_i f_X(x_i) = 1$
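A PMF given as a table of values can be checked against these two properties mechanically. The following sketch stores the PMF as a Python dict from values $x_i$ to $f_X(x_i)$ and uses exact fractions so that the sum-to-one test is not affected by floating-point error; the fair-die example and the `is_valid_pmf` name are illustrative assumptions.

```python
from fractions import Fraction

def is_valid_pmf(f):
    """Check the two PMF properties for a dict mapping values x_i to f_X(x_i)."""
    nonnegative = all(p >= 0 for p in f.values())
    sums_to_one = sum(f.values()) == 1
    return nonnegative and sums_to_one

# Assumed example: the score of a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(is_valid_pmf(die))                                     # True
print(is_valid_pmf({0: Fraction(1, 2), 1: Fraction(1, 3)}))  # False: sums to 5/6
```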


It is important to realize that for every outcome $\omega$ the variable $X$ takes one and only one value $x$.
Thus the events $\{X=x_i\}$ are disjoint and exhaustive (their union is $\Omega$), which is why the probabilities $f_X(x_i)$ sum to one.
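The disjoint-and-exhaustive property can be verified on a small example. Assuming again two coin tosses with $X$ the number of heads, the sketch below groups the outcomes into the events $\{X = x\}$ and checks that no outcome lies in two events and that together the events cover $\Omega$. The variable names are illustrative.

```python
from itertools import product

# Assumed experiment: two coin tosses; X counts heads.
omega = set(product("HT", repeat=2))

def X(outcome):
    return outcome.count("H")

# The event {X = x} is the set of outcomes w with X(w) = x.
events = {x: {w for w in omega if X(w) == x} for x in {X(w) for w in omega}}

union = set().union(*events.values())
# Disjoint: the sizes add up to the size of the union only if no outcome is counted twice.
disjoint = sum(len(e) for e in events.values()) == len(union)
# Exhaustive: the union of the events is the whole sample space.
exhaustive = union == omega

print(events)
print(disjoint, exhaustive)  # True True
```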

---

**Remark** 

Discrete
variables don’t have to be integers. For example, the proportion of defective components in
a lot of 100 can be 0, 1/100, 2/100, ..., 99/100, or 1. This variable takes 101 different
values, so it is discrete, although most of its values are not integers.

---
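The remark's example can be reproduced by listing the possible proportions $k/100$ for $k = 0, 1, \dots, 100$. This is only an illustration of the range of the variable, not of its probabilities; `fractions.Fraction` keeps the values exact, and the name `values` is an arbitrary choice.

```python
from fractions import Fraction

# Possible proportions of defective components in a lot of 100: 0, 1/100, ..., 1.
values = [Fraction(k, 100) for k in range(101)]

print(len(values))                              # 101
print(values[:3])                               # [Fraction(0, 1), Fraction(1, 100), Fraction(1, 50)]
print(sum(v.denominator == 1 for v in values))  # 2 (only 0 and 1 are integers)
```

The range is finite, so it can be listed, which is exactly what makes the variable discrete.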

### Continuous random variables

The second type of random variable we are interested in is the continuous random variable. We have the following definition [1]


----
**Definition: Continuous Random Variable**

A random variable $X$ is called continuous if there exists a function $f_X$ with the following two properties:


- $f_X(x) \geq 0$ for all $x$
- $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$


These are analogous to the properties of the PMF above. In addition, for every $a \leq b$,

\begin{equation}
P(a < X < b) = \int_{a}^{b} f_X(x) dx 
\end{equation}

The function $f_X$ is called the probability density function, or PDF for short.

----
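Both defining properties and the interval formula can be checked numerically for a concrete density. The sketch below assumes the exponential density $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ (and $0$ otherwise) with $\lambda = 1$, an example not taken from the text, and approximates the integrals with a simple midpoint rule; `integrate` is a hypothetical helper, not a library function.

```python
import math

def f_X(x, lam=1.0):
    """Exponential density with rate lam (an assumed example PDF); zero for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Total mass: f_X vanishes for x < 0, and the tail beyond 50 is about e^-50,
# so integrating over [0, 50] should give a value very close to 1.
print(integrate(f_X, 0.0, 50.0))

# P(1 < X < 2) as the integral of the density, vs. the closed form e^-1 - e^-2.
print(integrate(f_X, 1.0, 2.0), math.exp(-1) - math.exp(-2))
```

Both printed probabilities for $(1, 2)$ should agree to many decimal places (about 0.2325).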



A continuous random variable assumes values in a whole interval. This can be a bounded interval, e.g. $(a, b)$, an unbounded one such as $(a, +\infty)$ or $(-\infty, b)$, or a union of such intervals. The important point is that intervals are uncountable, which means that, contrary to the discrete case, we cannot list the values of the random variable.
Moreover, since probabilities are obtained by integrating $f_X$ over an interval, and a single point is an interval of length zero, the probability that a continuous random variable takes any exact value is always zero, i.e.


$$P(X=x)=0$$
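One way to see this numerically: the event $\{X = x\}$ is contained in $\{x - h < X < x + h\}$ for every $h > 0$, so $P(X = x) \le P(x - h < X < x + h)$, and the right-hand side shrinks to zero with $h$. The sketch assumes, as an example, an exponential variable with rate $1$, whose interval probabilities can be computed from $F(x) = 1 - e^{-x}$ for $x > 0$; the helper names `cdf` and `prob_interval` are illustrative.

```python
import math

def cdf(x, lam=1.0):
    """F(x) = P(X <= x) for an exponential(lam) variable (an assumed example)."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def prob_interval(a, b, lam=1.0):
    """P(a < X < b) computed as F(b) - F(a)."""
    return cdf(b, lam) - cdf(a, lam)

x = 1.0
for k in range(1, 7):
    h = 10.0 ** (-k)
    p = prob_interval(x - h, x + h)
    print(f"h = {h:.0e}: P({x} - h < X < {x} + h) = {p:.3e}")
```

The printed probabilities shrink roughly in proportion to $h$. A practical consequence is that for a continuous variable $P(a < X < b) = P(a \leq X \leq b)$: including or excluding the endpoints does not change the probability.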


## Summary

## References

1. Larry Wasserman, _All of Statistics: A Concise Course in Statistical Inference_, Springer, 2003.
2. Wikipedia, <a href="https://en.wikipedia.org/wiki/Random_variable">Random variable</a>.