```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
```

## Discrete Distributions

A __random variable__ is a variable that is assigned a numerical value based on the outcome of a chance event. One example is the monthly energy cost for homeowners in Chicago during January: the amount can and does vary. Another is the number of teenagers diagnosed with high blood pressure in a random sample of 100 teenagers in Texas: it could be any whole number between 0 and 100 and would vary from sample to sample.

Recall that a __discrete random variable__ is one whose possible values can be counted, and that a __continuous random variable__ is one that can take any value in an interval. We can now classify each of the random variables in the examples above as either "discrete" or "continuous":

+ The energy example is continuous because costs are measured rather than counted; they are not restricted to whole numbers.
+ The teenager example is discrete because the number of teenagers is something we count using whole numbers.

We will spend this chapter focusing on discrete random variables.  A large number of applied problems in probability and statistics involve situations where an experiment is performed several times, and in which there are only two possible outcomes on any given __trial__.  

A __trial__ is a word used to describe performing the experiment one time.

### Examples

+ Consider testing the effectiveness of a drug.  Several patients take the drug (the trials) and, for each patient, the drug is either effective or not effective (the two possible outcomes).
+ Consider the weekly sales of a car salesperson.  The salesperson has several customers during the week (the trials) and, for each customer, the salesperson either makes a sale or does not make a sale (the two possible outcomes).
+ Consider a taste test for colas.  A number of people taste two different colas (the trials) and, for each person, the preference is either for the first cola or the second cola (the two possible outcomes).

__Terminology__
+ __Trial__: one repetition of the basic experiment
+ __Success__: the outcome we are expecting
+ __Failure__: not the outcome we are expecting

__Notation__
+ S = a success
+ F = a failure
+ p = probability of a success on any trial
+ q = probability of a failure on any trial

$$p = P(S) = \text{the probability of success} \\
q = 1 - p = P(F) = \text{the probability of failure}$$

Let's take a quick look at where the formula for the probability of failure comes from:

$$q = 1 - p \;\;\text{because}\;\; P(S) + P(F) = 1 \;\therefore\; p + q = 1 \;\therefore\; q = 1 - p$$

__Bernoulli trials__ are repeated trials for which each of the following is true:
+ the result of each trial is classified as either the occurrence (success) or non-occurrence (failure) of a specified event
+ the trials are independent of one another
+ the probability, p, of a success remains the same from trial to trial
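
The three conditions can be illustrated with a short simulation. The following is only a sketch: the success probability `p = 0.3`, the seed, and the helper name `bernoulli_trials` are assumptions chosen for illustration and do not come from the examples above.

```python
import numpy as np

def bernoulli_trials(n, p, rng):
    """Simulate n independent trials, each a success (True) with probability p."""
    # Every draw is independent of the others and uses the same p,
    # which mirrors the conditions listed above.
    return rng.random(n) < p

rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility only
p = 0.3                               # assumed probability of success
outcomes = bernoulli_trials(10_000, p, rng)

print("S" if outcomes[0] else "F")    # result of the first trial
print(outcomes.mean())                # proportion of successes, close to p
```

With many trials, the observed proportion of successes should settle near p, and the proportion of failures near q = 1 - p.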

### Bottom Line
A __binomial experiment__ consists of running _n_ Bernoulli trials and counting the number of successes.  That is, X is said to be a binomial random variable if X is the number of successes in _n_ Bernoulli trials.
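
As a rough sketch of this definition, the code below runs one binomial experiment by simulating _n_ Bernoulli trials and counting the successes. The values `n = 20` and `p = 0.3` and the seed are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed, for reproducibility only
n, p = 20, 0.3                        # assumed values for illustration

# One binomial experiment: run n Bernoulli trials and count the successes.
trials = rng.random(n) < p
X = int(trials.sum())
print(X)                              # a whole number between 0 and n

# NumPy can also draw the count directly, without simulating each trial.
X_direct = rng.binomial(n, p)
print(X_direct)
```

Both `X` and `X_direct` are values of a binomial random variable; they will generally differ because they come from separate random draws.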

We return once again to the simple experiment of tossing a balanced coin twice and recording the results of each toss--remember our outcomes and associated probabilities.

|TT|TH|HT|HH|
|---|---|---|---|
|$$\frac{1}{4}$$|$$\frac{1}{4}$$|$$\frac{1}{4}$$|$$\frac{1}{4}$$|
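
The outcomes and probabilities in this table can be generated rather than written out by hand. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered results of two tosses: TT, TH, HT, HH.
outcomes = ["".join(tosses) for tosses in product("TH", repeat=2)]

# A balanced coin makes the four outcomes equally likely.
probability = {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}

for outcome, prob in probability.items():
    print(outcome, prob)
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with the fractions in the table.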

Now, we introduce a __random variable__ into the mix.  This enables us to organize the above table into a different table which displays the information in a more useful form for us.

$$\text{Define the random variable } X \text{ as: } \;\; X = \text{the number of heads}$$

One at a time consider the four outcomes:

+ TT --> X = 0
+ TH --> X = 1
+ HT --> X = 1
+ HH --> X = 2
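
In code, a random variable is just a function from outcomes to numbers. A small sketch, where `number_of_heads` is a hypothetical helper name:

```python
def number_of_heads(outcome):
    """Random variable X: the number of heads in an outcome such as 'TH'."""
    return outcome.count("H")

for outcome in ["TT", "TH", "HT", "HH"]:
    print(outcome, "--> X =", number_of_heads(outcome))
```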

So, we can redo our table from above:

|X=0|X=1|X=1|X=2|
|---|---|---|---|
|TT|TH|HT|HH|
|$$\frac{1}{4}$$|$$\frac{1}{4}$$|$$\frac{1}{4}$$|$$\frac{1}{4}$$|

Notice that we can now compute probabilities such as:

$$P(X=0) = P(TT) = \frac{1}{4} \;\;\;\text{or,} \\
P(X=1) = P(TH) + P(HT) =  \frac{1}{4} +  \frac{1}{4} =  \frac{1}{2}$$

Since X=1 corresponds to the two outcomes TH and HT, we create a new table that collapses those two outcomes into one column.  __This table is called a _Probability Distribution_ for the random variable X.__

|X|0|1|2|
|--|--|--|--|
|P(X)|$$\frac{1}{4}$$|$$\frac{1}{2}$$|$$\frac{1}{4}$$|
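
Collapsing the outcomes into a probability distribution amounts to adding up the probabilities of all outcomes that share the same value of X. A minimal sketch, with illustrative variable names:

```python
from collections import defaultdict
from fractions import Fraction

# Outcome probabilities for two tosses of a balanced coin.
outcome_probability = {
    "TT": Fraction(1, 4),
    "TH": Fraction(1, 4),
    "HT": Fraction(1, 4),
    "HH": Fraction(1, 4),
}

# Add up the probability of every outcome that gives the same X.
distribution = defaultdict(Fraction)   # missing keys start at Fraction(0)
for outcome, prob in outcome_probability.items():
    x = outcome.count("H")             # X = the number of heads
    distribution[x] += prob

for x in sorted(distribution):
    print(x, distribution[x])
```

The probabilities of a valid distribution must sum to 1, which is a useful check after collapsing.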

From the table we can see what the histogram would look like: bars of height 1/4, 1/2, and 1/4 above X = 0, 1, and 2.
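
One possible way to draw that histogram is a bar chart with one bar per value of X. This is a sketch only; the labels and bar styling are assumptions, and in a notebook with inline plotting the figure renders when the cell finishes.

```python
import matplotlib.pyplot as plt

x_values = [0, 1, 2]
probabilities = [0.25, 0.5, 0.25]

fig, ax = plt.subplots()
# width=1.0 makes adjacent bars touch, as in a histogram.
bars = ax.bar(x_values, probabilities, width=1.0, edgecolor="black")
ax.set_xticks(x_values)
ax.set_xlabel("X = number of heads")
ax.set_ylabel("P(X)")
ax.set_title("Probability distribution of X for two coin tosses")
```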

Now we can talk about things like _mean_ and _standard deviation_ of a probability distribution.

### Formulas for mean and standard deviation of a probability distribution

$$\mu = \sum X\;P(X) \\
\sigma^{2} = \sum (X - \mu)^{2}\;P(X) \\
\sigma = \sqrt{\sigma^{2}}$$

For our table above, the mean works out as follows:
$$\mu = \sum X\;P(X) = 0(\frac{1}{4}) + 1(\frac{1}{2}) + 2(\frac{1}{4}) = 1 \\
\mu = 1$$
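
The same weighted sum can be checked with NumPy. A minimal sketch, with the values of X and P(X) copied from the probability distribution table:

```python
import numpy as np

x = np.array([0, 1, 2])            # values of X
p = np.array([0.25, 0.5, 0.25])    # P(X) for each value

# Mean of the distribution: sum of X * P(X).
mu = np.sum(x * p)
print(mu)  # 1.0
```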

Now, let's set up the calculation for the standard deviation:

|$$X$$|$$\;\;\;\;0\;\;\;\;\;\;\;\;$$|$$\;\;\;\;1\;\;\;\;\;\;\;\;$$|$$\;\;\;\;2\;\;\;\;\;\;\;\;$$|
|---|---|---|---|
|$$P(X)$$|$$\frac{1}{4}$$|$$\frac{1}{2}$$|$$\frac{1}{4}$$|
|$$(X-\mu)$$|$$(0-1)$$|$$(1-1)$$|$$(2-1)$$|
|$$(X-\mu)^{2}$$|$$1$$|$$0$$|$$1$$|
|$$(X-\mu)^{2}\;P(X)$$|$$1(\frac{1}{4})$$|$$0(\frac{1}{2})$$|$$1(\frac{1}{4})$$|

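Summing the last row gives the variance, and taking its square root gives the standard deviation:

$$\sigma^{2} = \frac{1}{4} + 0 + \frac{1}{4} = 0.5 \\
\sigma = \sqrt{\sigma^{2}} = \sqrt{0.5} \approx 0.707$$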
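
The rows of the setup table map directly onto array operations. The following sketch reproduces the calculation with NumPy; the variable names are illustrative.

```python
import numpy as np

x = np.array([0, 1, 2])            # values of X
p = np.array([0.25, 0.5, 0.25])    # P(X) for each value

mu = np.sum(x * p)                       # mean
squared_dev = (x - mu) ** 2              # (X - mu)^2 row of the table
variance = np.sum(squared_dev * p)       # sum of (X - mu)^2 * P(X)
sigma = np.sqrt(variance)                # standard deviation

print(variance)                  # 0.5
print(round(float(sigma), 3))    # 0.707
```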