In [1]:
from IPython.display import display, Math, Latex
from enum import Enum
# useful: https://archive.ph/eqE2W

In [18]:
%%latex
\begin{aligned}
&\text{Begin by thinking about how many ways you can select} \hspace{.1cm} 
    \textit{k} \hspace{.1cm}  \text{items from a set of} \hspace{.1cm} \textit{n} \hspace{.1cm}  \text{when you can replace them
     and you care about the order.}                                                                                                   
\\ &\text{A paradigm case is flipping a fair coin: you can get the same outcome (H or T) more than once.}
\\ &\text{With five flips, the number of outcomes at each stage is two, so the size of the sample space is} 
    \hspace{.1cm} n^k = 2^5 = 32. 
\\ &\text{Then,} \hspace{.1cm} \mathbb{P}(H T T H H)\text{, e.g., would be} \hspace{.1cm}  0.5^5 = 0.03125. 
    \\ & \hspace{0.5cm} \Rightarrow \text{Another way to see this is that our particular outcome is one 
    of the 32;} \hspace{.1cm} 1/32 = 0.03125.
\\[10pt] &\text{Now, consider a case where you} \hspace{0.1cm} \textbf{cannot} \hspace{0.1cm}  \text{get the same outcome more than once
    (you still care about order in this scenario).}
\\ &\text{A paradigm case is selecting, say, six billiard balls (a standard set has 16). How many ways can we do this?}
\\ &\text{For the first choice, we have 16 options; for the second, 15; the third, 14; and, so on. How can we 
        express this compactly?}
\\ & 16 \cdot 15 \cdot 14 \cdot 13 \cdot 12 \cdot 11 = 
    \frac{16 \cdot 15 \cdot 14 \cdot 13 \cdot 12 \cdot 11 \cdot 10 \cdot 9 \cdot ...}{10 \cdot 9 \cdot ...}
\\ & = \frac{16!}{10!} = \frac{n!}{(n-k)!} \hspace{.1cm} \text{where "n!" indicates} \hspace{.1cm} \prod_{i=1}^{n} i
\\[10pt] &\text{Now, consider a case where you} \hspace{0.1cm} \textbf{do not} \hspace{0.1cm}  \text{care about order.}
\\ &\text{In the billiard ball case, for example, we don't care whether we have the set} \hspace{.1cm} \{16, 15, 14\} \hspace{.1cm} \text{or the set} \hspace{.1cm} \{14, 15, 16\}.
\\ &\text{A helpful trick in these situations is to think about the concept of overcounting: does a way of counting
        that we've already} \cr & \hspace{.5cm} \text{learned somehow count what we want but overshoot it by a known factor?}
\\ &\text{Think about the billiard ball example. Counting ordered selections counts each unique set of balls once per ordering, so we have overcounted by a known factor.}
\\ &\text{What is that factor? It would be the number of ways to arrange a given set of k chosen items, i.e., k!}
\\ &\text{So, the number of possible combinations of size k from a set of n is} \hspace{.1cm} \frac{n!}{(n-k)! k!}
     \\ & \hspace{0.5cm} \Rightarrow \text{Notably, we get the same answer if we select not k items but (n-k) items.}
     \\ & \hspace{1cm} \Rightarrow \text{There is both a conceptual and a very simple mathematical proof, left as an exercise 
        (hint: just plug in (n-k) in place of k).}
\\ &\text{This is so important that we have special notation for this} \hspace{.1cm}  \textbf{binomial coefficient:} 
          \hspace{.1cm} {n \choose k}
\\ & \hspace{0.5cm} \Rightarrow \text{Remarkably, the rows of Pascal's triangle are binomial coefficients, among other things.} 
\\[10pt] &\text{Now, we need to derive the binomial distribution. The first step is to discuss the formula for binomial expansion.}
\\ &\text{In high school, you may have learned about "FOILing" or the "box-and-diamond" method for expanding} \hspace{.1cm} (X+Y)^2
\\ &\text{The rule: take each element of one binomial term (X+Y) and pair it with every element 
          from the other binomial term (see accompanying graphic).}
\\ &\text{We generalize this to the following algorithm: find every unique pairing of elements from each term.}
\\ &\text{This obviously leads to many duplicate terms; the question is "how many of each duplicate?" for an expansion} \hspace{.1cm} (X+Y)^n...
\\ &\text{...and the answer is...} \hspace{.1cm} {n \choose k} \text{, satisfyingly!}
     \\ & \hspace{0.5cm} \Rightarrow \text{The logic: each term represents a selection of} \hspace{.1cm} \textit{k X}\text{s} \hspace{.1cm} 
    \text{and} \hspace{.1cm} \textit{n-k Y}\text{s since there are n buckets from which to select an X or a Y}.
\\ &\text{This leads to the proper "Binomial Theorem":} \hspace{.1cm} (X+Y)^n = \sum_{k=0}^n {n \choose k} X^{k}Y^{n-k}
\\ &\text{The final step, then, is just to think about how this applies to the selection of k successes from n trials.}
\end{aligned}

<IPython.core.display.Latex object>
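
The counting rules above are easy to check numerically. The sketch below is not part of the original notebook: it assumes Python 3.8+ (for `math.perm` and `math.comb`), and every variable name and the small test case (5 items, choose 3; X = 2, Y = 3) are made up for illustration.

```python
import math
from itertools import combinations, permutations

# Ordered, with replacement: n options at each of k stages gives n**k outcomes.
coin_outcomes = 2 ** 5             # five coin flips
p_specific_sequence = 0.5 ** 5     # probability of one sequence, e.g. HTTHH

# Ordered, without replacement: n! / (n-k)!
n, k = 16, 6
ordered = math.factorial(n) // math.factorial(n - k)

# Unordered, without replacement: divide out the k! orderings of each set.
unordered = ordered // math.factorial(k)

# Brute-force cross-check on a small case (5 items, choose 3).
small_ordered = len(list(permutations(range(5), 3)))
small_unordered = len(list(combinations(range(5), 3)))

# Binomial theorem spot check with illustrative values X = 2, Y = 3, n = 5.
x, y, power = 2, 3, 5
expansion = sum(math.comb(power, j) * x**j * y ** (power - j) for j in range(power + 1))

print(coin_outcomes, p_specific_sequence)
print(ordered, math.perm(n, k))
print(unordered, math.comb(n, k), math.comb(n, n - k))
print(small_ordered, small_unordered)
print(expansion, (x + y) ** power)
```

Each printed line should show matching values, including the symmetry between choosing k items and choosing (n-k) items.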

In [15]:
%%latex
\begin{aligned}
&\text{First, define} \hspace{.1cm} \prod \hspace{.1cm} 
    \text{as the "product of a sequence" operator (much like} \hspace{.05cm} 
    \Sigma \hspace{.05cm} \text{is for addition).} \cr
&\prod_{i=1}^{n} X_i \hspace{.05cm} \text{would mean "multiply together every term from} 
    \hspace{.05cm} X_1 \hspace{.05cm} \text{to} \hspace{.05cm} X_n \text{".} \cr
&\text{So, formally, n! just means...} \cr
&\prod_{i=1}^{n} i
\end{aligned}

<IPython.core.display.Latex object>
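
As a quick sanity check on that definition, here is a minimal sketch (assuming Python 3.8+ for `math.prod`; the helper name `factorial_as_product` is hypothetical) that builds n! literally as a product and compares it with the standard library's factorial.

```python
import math


def factorial_as_product(n: int) -> int:
    """Return n! as the product of the integers 1..n (the empty product, 1, when n = 0)."""
    return math.prod(range(1, n + 1))


# Compare against the built-in factorial for a few values.
for n in (0, 1, 5, 10):
    assert factorial_as_product(n) == math.factorial(n)

print(factorial_as_product(5))  # 120
```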

In [2]:
%%latex
\begin{aligned}
\mathbb{P}(X = k) &= {n \choose k} p^{k}(1-p)^{n-k} \cr
\mathbb{P}(X = k) &= {10 \choose k} 0.7^{k}(0.3)^{10-k} \cr
\mathbb{P}(X = 7) &= {10 \choose 7} 0.7^{7}(0.3)^{3} \approx 0.2668
\end{aligned}

<IPython.core.display.Latex object>
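
The worked example (n = 10, p = 0.7, k = 7) can be evaluated directly. This is a small sketch rather than anything from the original notebook; `binomial_pmf` is a hypothetical helper name, and `math.comb` requires Python 3.8+.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): (n choose k) * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


prob_7_of_10 = binomial_pmf(7, 10, 0.7)

# The probabilities over k = 0..10 should sum to 1 (up to rounding error).
total = sum(binomial_pmf(k, 10, 0.7) for k in range(11))

print(round(prob_7_of_10, 4))  # 0.2668
print(round(total, 10))
```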

In [8]:
%%latex
\begin{array}{ll}
& X \sim B(n, p) \cr
& \mathbb{E}(X) = \sum_{k=0}^n k \cdot \mathbb{P}(X=k) \hspace{.2cm} \text{where k ranges over the possible outcomes} \cr
& \text{Our next step is to expand} \hspace{.1cm}  k \cdot {n \choose k} \cr
& k \cdot {n \choose k} = \frac{k \cdot n!}{(n-k)! k!} \cr
& = \frac{n!}{(k-1)! \cdot (n-k)!} \hspace{.2cm} \text{(for } k \geq 1 \text{, cancel k against k!)} \cr
& = \frac{n \cdot (n-1)!}{(k-1)! \cdot (n-k)!} = n \cdot {n-1 \choose k-1} \cr
& \text{Note that} \hspace{.1cm} {n-1 \choose k-1} = \frac{(n-1)!}{(n-1-k+1)!(k-1)!} = \frac{(n-1)!}{(n-k)!(k-1)!} \cr
\end{array}

<IPython.core.display.Latex object>
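
If the algebra feels slippery, the identity k * C(n, k) = n * C(n-1, k-1) can be checked exhaustively for small n. A minimal sketch, assuming Python 3.8+ (`identity_holds` is a made-up name, and the range of n is arbitrary):

```python
import math


def identity_holds(n: int, k: int) -> bool:
    """Check k * C(n, k) == n * C(n-1, k-1) using exact integer arithmetic."""
    return k * math.comb(n, k) == n * math.comb(n - 1, k - 1)


# Test every valid pair with 1 <= k <= n < 30.
all_hold = all(identity_holds(n, k) for n in range(1, 30) for k in range(1, n + 1))
print(all_hold)
```

The check starts at k = 1 because the right-hand side involves (k-1)!, which is only defined for k of at least 1.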

In [11]:
%%latex
\begin{aligned}
& X \sim B(n, p) \cr
&\mathbb{E}(X) = \sum_{k=0}^n k \cdot \mathbb{P}(X=k) \hspace{.2cm} \text{where k ranges over the possible outcomes} \cr
&\text{We now recall that} \hspace{.1cm} \mathbb{P}(X=k) = {n \choose k} p^{k}(1-p)^{n-k} \cr
&\mathbb{E}(X) = \sum_{k=0}^n k \cdot {n \choose k} p^{k}(1-p)^{n-k} \cr 
&\text{The k = 0 term is zero, so we can start at k = 1 and use} \hspace{.1cm} k \cdot {n \choose k} = n \cdot {n-1 \choose k-1} \cr
&= \sum_{k=1}^n n \cdot {n-1 \choose k-1} p^{k}(1-p)^{n-k} \cr 
&= n \cdot \sum_{k=1}^n {n-1 \choose k-1} p^{k}(1-p)^{n-k} \cr 
&\text{The next part is a bit tricky: we shift our index to run from 0 to n-1.} \cr
&\text{To get the same result, we replace every k in the summand with k+1.} \cr
&= n \cdot \sum_{k=0}^{n-1} {n-1 \choose k} p^{k+1}(1-p)^{n-k-1} \cr 
&= n \cdot p \sum_{k=0}^{n-1} {n-1 \choose k} p^{k}(1-p)^{n-k-1} \hspace{.2cm} \text{...factor out a p} \cr 
&\text{Finally, by the binomial theorem} \cr 
& [p + (1-p)]^{n-1} = \sum_{k=0}^{n-1} {n-1 \choose k} p^{k}(1-p)^{n-k-1}  \hspace{.2cm} \text{...so...} \cr
&= n \cdot p [p + (1-p)]^{n-1} \hspace{.2cm} \text{and since p + 1-p = 1} \cr
&= n \cdot p \hspace{.5cm} QED. 
\end{aligned}

<IPython.core.display.Latex object>
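
The result E(X) = np can also be confirmed by brute force: sum k * P(X = k) over every k and compare. This sketch is illustrative only; the function name and the (n, p) pairs are assumptions, not values from the notebook.

```python
import math


def binomial_mean_by_summation(n: int, p: float) -> float:
    """Compute E(X) for X ~ B(n, p) directly from the definition of expectation."""
    return sum(k * math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


# Compare the direct sum with the closed form n * p for a few made-up cases.
cases = [(10, 0.7), (25, 0.5), (40, 0.1)]
results = [(n, p, binomial_mean_by_summation(n, p), n * p) for n, p in cases]

for n, p, summed, closed_form in results:
    print(n, p, round(summed, 10), closed_form)
```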

In [47]:
%%latex
\begin{aligned}
\mathbb{V}(\sum_{i=1}^n X_i) &= \mathbb{V}(X_1 + X_2 + ... + X_n) \cr
&= \mathbb{E}\left[\left((X_1 + X_2 + ... + X_n)-(\overline{X_1+X_2+...+X_n})\right)^2\right] \cr
&\text{Now, by linearity of a mean when the sample size is the same for all variables...} \cr 
&= \mathbb{E}\left[\left((X_1 + X_2 + ... + X_n)-(\overline{X_1}+\overline{X_2}+...+\overline{X_n})\right)^2\right] \cr
&= \mathbb{E}\left[\left((X_1-\overline{X_1})+(X_2 - \overline{X_2}) +... + (X_n -\overline{X_n})\right)^2\right] \cr
&\text{Picture this as a field, gridded by the deviations on length and width.} \cr 
&\text{Then, to take the area, simply multiply every column by all the rows (changing the metaphor a bit).} \cr 
&\text{For the first X variable, this row of the grid will look like the following; then we'll generalize.} \cr 
& \mathbb{E}\left[(X_1-\overline{X_1})^2 + (X_1-\overline{X_1})(X_2 - \overline{X_2}) +... + (X_1-\overline{X_1})(X_n -\overline{X_n})\right] \cr
&\text{So, the pattern for one X is its variance plus its covariance with all other variables.} \cr 
&\text{Doing this will count each covariance twice (prove this to yourself by considering what happens
    when we move to} \hspace{.1cm} X_2). \cr 
&\text{Generalizing, we have...} \cr 
& \mathbb{V}(X_1 + X_2 + ... + X_n) = \sum_{i =1}^{n} \sum_{j =1}^{n} \text{Cov}(X_i, X_j) \hspace{.2cm} \text{where} \hspace{.1cm} \text{Cov}(X_i, X_i) = \mathbb{V}(X_i) \cr
&\text{The pattern in simpler terms is ...} \cr
& \mathbb{V}(X_1 + X_2 + ... + X_n) = \sum_{i =1}^{n} \mathbb{V}(X_i) + 2 \sum_{1 \leq i < j \leq n} \text{Cov}(X_i, X_j) \cr
&\text{That complicated index is a fancy way of saying "start with i = 1 and add up the covariances for every j greater than i.} \cr
&\text{Then move on to i = 2, and so on, always keeping j greater than i (when i = j, it's a variance and we already counted those)."} \cr 
&\text{Each time, the number of j greater than i falls, which reflects the fact that we are moving along columns of the covariance matrix...} \cr
&\text{...which I somewhat sneakily asked you to envision above with the "field" simile, without telling you}. \cr
&\text{Logically, as we go along, the number of entries below the main diagonal falls, i.e. i and j get closer.} \cr
&\text{We multiply by two because the upper triangle of the matrix mirrors the lower triangle.} \cr 
&\text{Finally, and crucially, if every covariance is zero, e.g. if the Xs are IID random variables, as would be...} \cr
&\text{...say, people in a sample, the variance of the sum is just the sum of the individual variances.} \cr
\end{aligned}

<IPython.core.display.Latex object>
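
To see the "field" picture in numbers, the sketch below builds three deliberately correlated variables from made-up random data and checks that the variance of their sum equals both the double sum over the covariance grid and the "variances plus twice the upper triangle" form. It uses only the standard library; the helper names (`mean`, `cov`) and the data-generating choices are assumptions for illustration, and population (divide-by-m) covariance is used throughout.

```python
import math
import random

random.seed(0)  # fixed seed so the made-up data is reproducible


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the population variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Three correlated variables, each observed m times.
m = 1000
x1 = [random.gauss(0, 1) for _ in range(m)]
x2 = [a + random.gauss(0, 1) for a in x1]
x3 = [random.gauss(0, 2) - 0.5 * b for b in x2]
variables = [x1, x2, x3]

# Left-hand side: variance of the row-wise sum X1 + X2 + X3.
totals = [sum(row) for row in zip(*variables)]
var_of_sum = cov(totals, totals)

# Right-hand side, form 1: every cell of the covariance grid.
n = len(variables)
double_sum = sum(cov(variables[i], variables[j]) for i in range(n) for j in range(n))

# Right-hand side, form 2: the diagonal plus twice the i < j triangle.
variances = sum(cov(v, v) for v in variables)
upper = sum(cov(variables[i], variables[j]) for i in range(n) for j in range(i + 1, n))

print(var_of_sum, double_sum, variances + 2 * upper)
```

The three printed numbers should agree up to floating-point rounding, because the identity is algebraic and holds for any data set, not just on average.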

In [21]:
%%latex
\begin{aligned}
\mathbb{P}(X > 450) = \sum_{k=451}^{1175} {1175 \choose k} 0.37^{k}(0.63)^{1175-k} \cr
\end{aligned}

<IPython.core.display.Latex object>
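
A sum like this is awkward to evaluate naively, since the binomial coefficients for n = 1175 can exceed the floating-point range while the powers of 0.37 shrink toward zero. One common workaround, sketched here as an assumption rather than the notebook's own method, is to compute each term on the log scale with `math.lgamma` and exponentiate afterwards. The function names are hypothetical.

```python
import math


def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ B(n, p), using lgamma(m + 1) = log(m!)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def upper_tail(threshold: int, n: int, p: float) -> float:
    """P(X > threshold): sum the PMF over k = threshold + 1, ..., n."""
    return sum(math.exp(log_binomial_pmf(k, n, p)) for k in range(threshold + 1, n + 1))


n, p = 1175, 0.37
tail = upper_tail(450, n, p)

# Complement, P(X <= 450), as a consistency check: the two should sum to 1.
lower = sum(math.exp(log_binomial_pmf(k, n, p)) for k in range(0, 451))

print(tail)
print(tail + lower)
```

Terms far out in the tail underflow quietly to 0.0 in `math.exp`, which is harmless here because they contribute nothing measurable to the sum.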

In [1]:
%%latex
\begin{aligned}
& \mathbb{P}(X = k) = {n \choose k} \theta^{k}(1-\theta)^{n-k}, \hspace{.1cm} n = \text{sample size}; \theta = 
    \text{probability of success on one trial}; k = \text{number of successes} \cr
& \mathbb{P}(X = k) = 
    \underbrace{{n \choose k}}_{\text{number of distinct groups of size k from set of size n}} \cdot
    \overbrace{\theta^{k}}^{\text{probability of k successes}} \cdot
    \underbrace{(1-\theta)^{n-k}}_{\text{probability of n-k failures}}
\end{aligned}

<IPython.core.display.Latex object>
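
The three annotated pieces can be seen separately by listing every possible sequence of successes (1) and failures (0) for a small case. The sketch below uses made-up values (n = 4, theta = 0.3) and hypothetical names; for each k it counts the sequences with exactly k successes and adds up their probabilities, then compares both against the formula.

```python
import math
from itertools import product

n, theta = 4, 0.3  # illustrative values, small enough to enumerate all 2**n sequences


def sequence_probability(seq, theta):
    """Probability of one specific success/failure sequence of independent trials."""
    successes = sum(seq)
    return theta**successes * (1 - theta) ** (len(seq) - successes)


# Maps k -> (number of sequences with k successes, brute-force prob, formula prob).
comparison = {}
for k in range(n + 1):
    matching = [s for s in product((0, 1), repeat=n) if sum(s) == k]
    brute_force = sum(sequence_probability(s, theta) for s in matching)
    formula = math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
    comparison[k] = (len(matching), brute_force, formula)

for k, (count, brute_force, formula) in comparison.items():
    print(k, count, round(brute_force, 6), round(formula, 6))
```

The count column reproduces the binomial coefficients, and every matching sequence has the same probability, which is why the formula is a simple product.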