<p>Remember that a <a href="probability.html">Random Variable</a> is a mapping $ X: \Omega \rightarrow \mathbb{R}$ that assigns a real number $X(\omega)$ to each outcome $\omega$ in a sample space $\Omega$.  The definitions below are taken from Larry Wasserman’s All of Statistics.</p>

<h2 id="cumulative-distribution-function">Cumulative Distribution Function</h2>

<p>The <strong>cumulative distribution function</strong>, or the <strong>CDF</strong>, is a function</p>

<p>$$F_X : \mathbb{R} \rightarrow [0, 1],$$</p>

<p>defined by</p>

$$F_X (x) = p(X \le x).$$

<p>A note on notation: $X$ is a random variable while $x$ is a particular value of the random variable.</p>

<p>Let $X$ be the random variable representing the number of heads in two coin tosses. Then $x$ can take on values 0, 1 and 2. The CDF for this random variable can be drawn thus (taken from All of Stats):</p>

<p><img src="../images/2tosscdf.png" alt="" /></p>

<p>Notice that this function is right-continuous and defined for all $x$, even though $X$ itself only takes the values 0, 1 and 2 and nothing in between.</p>
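<p>As a minimal sketch of this example, the CDF can be computed by enumerating the sample space directly. The coin is assumed to be fair, and the names <code>outcomes</code>, <code>X</code> and <code>cdf</code> are chosen here for illustration only.</p>

```python
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses: (H,H), (H,T), (T,H), (T,T).
# Assumption: the coin is fair, so all four outcomes are equally likely.
outcomes = list(product("HT", repeat=2))

def X(omega):
    """Random variable: the number of heads in the outcome omega."""
    return omega.count("H")

def cdf(x):
    """F_X(x) = p(X <= x), defined for every real x."""
    favourable = sum(1 for omega in outcomes if X(omega) <= x)
    return Fraction(favourable, len(outcomes))

# The CDF is constant between the integers and jumps at 0, 1 and 2.
for x in (-1, 0, 0.5, 1, 1.5, 2, 3):
    print(x, cdf(x))
```

<p>Evaluating <code>cdf</code> at non-integer points such as 0.5 shows the flat steps of the plot: the function is defined everywhere, but it only changes where $X$ has mass.</p>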

<h2 id="probability-mass-and-distribution-function">Probability Mass and Density Functions</h2>

<p>$X$ is called a <strong>discrete random variable</strong> if it takes countably many values $\{x_1, x_2, \ldots\}$. We define the <strong>probability function</strong> or the <strong>probability mass function</strong> (<strong>pmf</strong>) for $X$ by:</p>

$$f_X(x) = p(X=x)$$

<p>$f_X$ <strong>is a probability</strong>.</p>

<p>The pmf for the number of heads in two coin tosses (taken from All of Stats) looks like this:</p>

<p><img src="../images/2tosspmf.png" alt="" /></p>

<p>On the other hand, a random variable is called a <strong>continuous random variable</strong> if there exists a function $f_X$ such that $f_X (x) \ge 0$ for all $x$, $\int_{-\infty}^{\infty} f_X (x) dx = 1$ and for every $a \le b$,</p>

$$
p(a < X < b) = \int_{a}^{b} f_X (x) dx $$

<p>The function $f_X$ is called the probability density function (pdf). We have the CDF:</p>

$$F_X (x) = \int_{-\infty}^{x}f_X (t) dt$$

<p>and $f_X (x) = \frac{d F_X (x)}{dx}$ at all points $x$ at which $F_X$ is differentiable.</p>

<p>Continuous variables are confusing. Note:</p>

<ol>
  <li>$p(X=x) = 0$ for every $x$. You <strong>cannot think</strong> of $f_X(x)$ as $p(X=x)$; that holds only for discrete random variables. You can only get probabilities from a pdf by integrating, even if only over a very small part of the space.</li>
  <li>A pdf can be bigger than 1, unlike a probability mass function, whose values are actual probabilities.</li>
</ol>
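<p>Both points can be illustrated with a rough numerical sketch. The density used here, a Uniform(0, 0.5) density equal to 2 on its support, is an assumption chosen for illustration and does not come from the text above. The helper <code>integrate</code> is a plain midpoint rule, not a library routine.</p>

```python
def pdf(x):
    """Density of a Uniform(0, 0.5) variable: 2 on [0, 0.5], 0 elsewhere."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The density at a point exceeds 1, so it cannot be a probability.
density_at_point = pdf(0.25)

# Probabilities only appear after integrating, even over a tiny interval.
prob_small_interval = integrate(pdf, 0.25, 0.26)

# The total area under the density is still (approximately) 1.
total_mass = integrate(pdf, -1.0, 2.0)

print(density_at_point, prob_small_interval, total_mass)
```

<p>The density value 2 is perfectly legal because only areas under the curve are probabilities: the small interval of width 0.01 carries probability of about 0.02.</p>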

<h3 id="a-continuous-example-the-uniform-distribution">A continuous example: the Uniform Distribution</h3>

<p>Suppose that $X$ has pdf
$$
f_X (x) =
\begin{cases}
1 & \text{for } 0 \leq x\leq 1\\
    0             & \text{otherwise.}
\end{cases} $$
A random variable with this density is said to have a Uniform (0,1) distribution. This is meant to capture the idea of choosing a point at random between 0 and 1. The cdf is given by:

$$
F_X (x) =
\begin{cases}
0 & x < 0\\
x & 0 \leq x \leq 1\\
1 & x > 1.
\end{cases} $$
and can be visualized like so (again from All of Stats):</p>

<p><img src="../images/unicdf.png" alt="" /></p>
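<p>The piecewise CDF above can be checked against simulated draws. This is a sketch only: the seed, the sample size and the function names are arbitrary choices made here, and Python's <code>random.random()</code> stands in for "choosing a point at random between 0 and 1".</p>

```python
import random

def uniform_cdf(x):
    """CDF of Uniform(0, 1): 0 below 0, x on [0, 1], 1 above 1."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return float(x)

random.seed(0)  # arbitrary fixed seed so the run is reproducible
samples = [random.random() for _ in range(100_000)]

def empirical_cdf(x):
    """Fraction of simulated draws that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)

# The exact and empirical CDFs should agree closely at every point.
for x in (-0.5, 0.3, 0.8, 1.5):
    print(x, uniform_cdf(x), empirical_cdf(x))
```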

<h3 id="a-discrete-example-the-bernoulli-distribution">A discrete example: the Bernoulli Distribution</h3>

<p>The <strong>Bernoulli Distribution</strong> represents the distribution of a coin flip. Let the random variable $X$ represent such a coin flip, where $X=1$ is heads, and $X=0$ is tails. Let us further say that the probability of heads is $p$ ($p=0.5$ is a fair coin).</p>

<p>We then say:</p>

$$X \sim Bernoulli(p)$$

<p>which is to be read as $X$ <strong>has distribution</strong> $Bernoulli(p)$. The pmf or probability function associated with the Bernoulli distribution is</p>

$$
f(x) =
\begin{cases}
1 - p & x = 0\\
p & x = 1.
\end{cases}$$

<p>for $p$ in the range 0 to 1. This pmf may be written as</p>

$$f(x) = p^x (1-p)^{1-x}$$

<p>for $x$ in the set $\{0,1\}$.</p>

<p>$p$ is called a parameter of the Bernoulli distribution.</p>
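<p>The compact form $p^x (1-p)^{1-x}$ translates directly into code. The following is a small sketch, with the function name <code>bernoulli_pmf</code> and the input checks being choices made here rather than part of the definition.</p>

```python
def bernoulli_pmf(x, p):
    """Return f(x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return p**x * (1 - p)**(1 - x)

p = 0.3  # illustrative value of the parameter
print(bernoulli_pmf(1, p))  # probability of heads, equal to p
print(bernoulli_pmf(0, p))  # probability of tails, equal to 1 - p
```

<p>Setting $x=1$ makes the second factor equal to 1 and leaves $p$; setting $x=0$ makes the first factor equal to 1 and leaves $1-p$, which recovers the two cases of the piecewise definition.</p>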

<h2 id="conditional-and-marginal-distributions">Conditional and Marginal Distributions</h2>

<p>Marginal mass functions are defined by analogy with <a href="probability.html">probabilities</a>. Thus:</p>

$$f_X(x) = p(X=x) =  \sum_y f(x, y);\,\, f_Y(y) = p(Y=y) = \sum_x f(x,y)$$

<p>Similarly, marginal densities are defined using integrals:</p>

$$f_X(x) = \int dy f(x,y);\,\, f_Y(y) = \int dx f(x,y)$$

<p>Notice that, in the continuous case, the marginal densities cannot be interpreted as probabilities. An example here is $f(x,y) = e^{-(x+y)}$ defined on the positive quadrant. The marginal is an exponential defined on the positive part of the line.</p>
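<p>As a worked check of this example, integrating $y$ out over the positive half-line gives the marginal of $X$:</p>

$$f_X(x) = \int_{0}^{\infty} e^{-(x+y)} \, dy = e^{-x} \int_{0}^{\infty} e^{-y} \, dy = e^{-x}, \quad x > 0.$$

<p>By symmetry, $f_Y(y) = e^{-y}$ for $y > 0$.</p>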

<p>The conditional mass function is, similarly, just a conditional probability. So:</p>

$$f_{X \mid Y}(x \mid y) = p(X=x \mid Y=y) = \frac{p(X=x, Y=y)}{p(Y=y)} = \frac{f_{XY}(x,y)}{f_Y(y)}$$
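<p>For discrete variables, the marginal and conditional mass functions can be computed from a table of joint probabilities. The sketch below uses a hypothetical joint pmf on $\{0,1\} \times \{0,1\}$; the numbers and the function names are made up purely for illustration.</p>

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y); the values are illustrative and sum to 1.
joint = {
    (0, 0): Fraction(1, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10),
    (1, 1): Fraction(4, 10),
}

def marginal_x(x):
    """f_X(x) = sum over y of f(x, y)."""
    return sum(prob for (xi, _), prob in joint.items() if xi == x)

def marginal_y(y):
    """f_Y(y) = sum over x of f(x, y)."""
    return sum(prob for (_, yi), prob in joint.items() if yi == y)

def conditional_x_given_y(x, y):
    """f_{X|Y}(x | y) = f(x, y) / f_Y(y), defined only when f_Y(y) > 0."""
    fy = marginal_y(y)
    if fy == 0:
        raise ValueError("cannot condition on a value of Y with zero probability")
    return joint.get((x, y), Fraction(0)) / fy

print(marginal_x(0), marginal_y(1))
print(conditional_x_given_y(0, 1), conditional_x_given_y(1, 1))
```

<p>For a fixed $y$, the conditional values sum to 1 over $x$, so the conditional mass function is itself a pmf.</p>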

<p>The corresponding formula for continuous densities might be expected to be a bit more complex, because we are conditioning on the event $Y=y$, which strictly speaking has probability 0. But it can be proved that the same formula holds for densities, with some additional requirements and interpretation:</p>

$$f_{X \mid Y}(x \mid y)  = \frac{f_{XY}(x,y)}{f_Y(y)}$$

<p>where we must assume that $f_Y(y) \gt 0$. Then we have the interpretation that for any event $A$:</p>

$$p(X \in A \mid Y=y) = \int_{x \in A} f_{X \mid Y}(x \mid y) \, dx$$

<p>An example of this is the uniform distribution on the unit square. Suppose then that $y=0.3$. Then the conditional density is a uniform density on the line between 0 and 1 at $y=0.3$.</p>
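<p>To spell this example out: the joint density is $f_{XY}(x,y) = 1$ on the unit square and 0 elsewhere, so</p>

$$f_Y(0.3) = \int_{0}^{1} 1 \, dx = 1, \qquad f_{X \mid Y}(x \mid 0.3) = \frac{f_{XY}(x, 0.3)}{f_Y(0.3)} = 1 \quad \text{for } 0 \le x \le 1.$$

<p>For instance, taking $A = [0.2, 0.5]$ (an interval chosen here only for illustration), $p(X \in A \mid Y = 0.3) = \int_{0.2}^{0.5} 1 \, dx = 0.3$.</p>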