### Probability

In [2]:
# Topic: Probability Theory
# Question: Let A and B be events on the same sample space, with P(A) = 0.6 and P(B) = 0.7. Can these two events be disjoint?
# Disjoint means the two events cannot both occur: P(A \cap B) = 0
P_A = 0.6
P_B = 0.7
P_A + P_B

# Answer: No. If A and B were disjoint, P(A U B) would equal P(A) + P(B) = 1.3, which exceeds 1, so the events must intersect

1.2999999999999998
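The argument above can be made slightly more precise. Since P(A U B) can be at most 1, the overlap satisfies P(A \cap B) >= P(A) + P(B) - 1. The sketch below computes that lower bound; the helper name `min_overlap` is made up for illustration, and exact fractions are used to avoid the floating-point noise visible in the output above.

```python
from fractions import Fraction

def min_overlap(p_a, p_b):
    """Smallest possible P(A and B), using the fact that P(A or B) <= 1."""
    return max(Fraction(0), p_a + p_b - 1)

p_a, p_b = Fraction(6, 10), Fraction(7, 10)
overlap = min_overlap(p_a, p_b)
print(overlap)       # 3/10
print(overlap == 0)  # False, so A and B cannot be disjoint
```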

In [3]:
# Question: Alice has 2 kids and one of them is a girl. What is the probability that the other child is also a girl? You can assume that there are an equal number of males and females in the world.
# Outcome space for two kids: {BB, GG, GB, BG}
# Only BB can be eliminated; both GB and BG remain, since the girl can be either the first or the second child

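from math import factorial

def perm(n, k):
    # Number of ordered selections of k items from n: n! / (n-k)!
    return factorial(n) // factorial(n - k)

# Ways to pick 1 outcome out of the 3 that remain
print(int(perm(3, 1)))

# Remaining equally likely outcomes: {GG,GB,BG} -> 3_P_1 = 3
# Only GG has two girls: 1 / |{GG,GB,BG}| = 1/3
# Answer: 1/3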

3
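As a cross-check of the counting argument, the conditional probability can also be found by enumerating the sample space directly. This is a minimal sketch in Python 3 that assumes, as the question does, that all four ordered outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered (first child, second child) outcomes, each equally likely.
outcomes = list(product("BG", repeat=2))

# Condition on the given information: at least one child is a girl.
at_least_one_girl = [o for o in outcomes if "G" in o]

# Among those, count the outcomes where both children are girls.
both_girls = [o for o in at_least_one_girl if o == ("G", "G")]

p_other_is_girl = Fraction(len(both_girls), len(at_least_one_girl))
print(p_other_is_girl)  # 1/3
```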


In [17]:
# Question: Anita randomly picks 4 cards from a deck of 52 cards and places them back into the deck (any set of 4 cards is equally likely). Then, Babita randomly chooses 8 cards out of the same deck (any set of 8 cards is equally likely). Assume that the choice of 4 cards by Anita and the choice of 8 cards by Babita are independent. What is the probability that all 4 cards chosen by Anita are in the set of 8 cards chosen by Babita?

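from math import factorial

def comb(n, k):
    # Number of unordered selections of k items from n: n! / (k! (n-k)!)
    return factorial(n) // (factorial(k) * factorial(n - k))

# Favorable (Anita, Babita) pairs: any 4 cards for Anita, then Babita's other 4 cards from the remaining 48
# Total pairs: any 4 cards for Anita times any 8 cards for Babita
print(float(comb(52, 4) * comb(48, 4)) / (comb(52, 4) * comb(52, 8)))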

0.000258564964447
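The same probability can be reached from two directions, which is a useful sanity check. The sketch below (Python 3.8+, since it relies on `math.comb`) compares fixing Anita's cards against fixing Babita's cards; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb  # available in Python 3.8+

# View 1: fix Anita's 4 cards; Babita's other 4 cards come from the remaining 48.
p_fix_anita = Fraction(comb(48, 4), comb(52, 8))

# View 2: fix Babita's 8 cards; all 4 of Anita's cards must come from those 8.
p_fix_babita = Fraction(comb(8, 4), comb(52, 4))

print(p_fix_anita == p_fix_babita)  # True
print(p_fix_babita)                 # 2/7735
print(float(p_fix_babita))
```

The second view, C(8,4) / C(52,4), is the shorter calculation to do by hand.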


In [4]:
# Question: A fair six-sided die is rolled twice. What is the probability of getting 2 on the first roll and not getting 4 on the second roll?
# The two rolls are independent, so their probabilities can be multiplied directly
# P(any given face) = 1/6
# P(2 on first roll) = 1/6
# P(not 4 on second roll) = 5/6
# P(2 on first and not 4 on second) = P(2) * P(not 4) = 1/6 * 5/6 = 5/36

# Answer: 5/36
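The product rule can be confirmed by brute force over the 36 equally likely ordered pairs. A minimal sketch in Python 3, with illustrative variable names:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first roll, second roll) pairs.
rolls = list(product(range(1, 7), repeat=2))

# Pairs with a 2 on the first roll and anything but a 4 on the second.
favorable = [(a, b) for a, b in rolls if a == 2 and b != 4]

p = Fraction(len(favorable), len(rolls))
print(p)                                     # 5/36
print(p == Fraction(1, 6) * Fraction(5, 6))  # True
```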

In [11]:
# Question: Is this equation correct? P(A U B U C) = P(C) + P(A \cap C^c) + P(B \cap A^c \cap C^c)
# C, A \cap C^c, and B \cap A^c \cap C^c are pairwise disjoint, so their probabilities add
# A \cap C^c is the part of A outside C
# B \cap A^c \cap C^c is the part of B outside both A and C
# Together the three pieces cover A U B U C exactly once: P(C) + P(A \cap C^c) + P(B \cap A^c \cap C^c) = P(A U B U C)
# Answer: True
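One way to build confidence in an identity like this is to test it on randomly generated events over a small finite sample space. The sketch below is only a spot check, not a proof; the sample space `OMEGA`, the seed, and the helper names are all made up for illustration.

```python
from fractions import Fraction
import random

OMEGA = frozenset(range(20))  # made-up sample space of 20 equally likely outcomes

def prob(event):
    """Probability of an event (a subset of OMEGA) under equally likely outcomes."""
    return Fraction(len(event), len(OMEGA))

def identity_holds(a, b, c):
    """Check P(A U B U C) == P(C) + P(A n C^c) + P(B n A^c n C^c)."""
    lhs = prob(a | b | c)
    rhs = prob(c) + prob(a - c) + prob(b - a - c)
    return lhs == rhs

rng = random.Random(0)  # fixed seed, only for repeatability

def random_event():
    """Include each outcome independently with probability 1/2."""
    return {x for x in OMEGA if rng.random() < 0.5}

checks = [identity_holds(random_event(), random_event(), random_event())
          for _ in range(1000)]
print(all(checks))  # True
```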

In [12]:
# Question: Consider a tetrahedral die and roll it twice. What is the probability that the number on the first roll is strictly higher than the number on the second roll? A tetrahedral die has only four sides (1, 2, 3 and 4). 
# 4 * 4 = 16 equally likely ordered outcomes (first roll, second roll)
print(int(perm(4, 1)) * int(perm(4, 1)))
# Ties: {11,22,33,44} = 4
# First roll higher: {43,42,41,32,31,21} = 6
# Second roll higher: {12,13,14,23,24,34} = 6
# These come from the table of all ordered pairs:
# (1,1) ... (4,1)
#  ...       ...
# (1,4) ... (4,4)
# Answer: 6/16 = 3/8

16
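The table of ordered pairs can be generated rather than written out by hand. A minimal sketch in Python 3, with illustrative variable names, that also checks the symmetry shortcut (1 - P(tie)) / 2:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 5)  # a tetrahedral die shows 1, 2, 3 or 4

# All 16 equally likely (first roll, second roll) pairs.
outcomes = list(product(faces, repeat=2))
first_higher = [(a, b) for a, b in outcomes if a > b]
ties = [(a, b) for a, b in outcomes if a == b]

p_first_higher = Fraction(len(first_higher), len(outcomes))
p_tie = Fraction(len(ties), len(outcomes))

print(len(outcomes), len(first_higher), p_first_higher)  # 16 6 3/8
print(p_first_higher == (1 - p_tie) / 2)                 # True, by symmetry
```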


### Probability Distributions

In [13]:
# A probability distribution describes a specific kind of random process
# Typical question: you have a process that behaves like X; which distribution would you use to model it?
# Examples of random processes and the distributions commonly used to model them:
    # Single coin flip turns out to be heads (Bernoulli)
    # Number of coin flips out of 100 that turn out to be heads (binomial)
    # Number of trials until a coin flip turns out to be heads (geometric)
    # Urn with two types of marbles (red and green), drawing marbles without replacement (hypergeometric)
    # Number of taxis passing a street corner in a given hour, on avg 10/hr (Poisson)
    # Rolling a single fair die (discrete uniform)
    # IQ score (normal)
    # Time until a taxi will pass the street corner (exponential)
    # Useful in estimating a probability of success (beta)
    # Time until n events in a process with no memory (gamma)
    # Useful for goodness-of-fit tests (chi-square)
    # Useful as the null distribution of a test statistic in analysis of variance (ANOVA) or an F-test (F)
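Two of the processes listed above, the number of taxis per hour and the waiting time until the next taxi, are commonly modeled with the Poisson and exponential distributions respectively. The sketch below uses only the standard library; the function names are made up for illustration, and the rate of 10 per hour is taken from the taxi example.

```python
from math import exp, factorial

RATE_PER_HOUR = 10.0  # average taxis per hour, from the example above

def poisson_pmf(k, lam):
    """P(exactly k events) for a Poisson count with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)

def exponential_sf(t, rate):
    """P(waiting time > t) for an exponential wait with the given rate."""
    return exp(-rate * t)

# Probability of seeing exactly 10 taxis in one hour.
print(poisson_pmf(10, RATE_PER_HOUR))

# Probability of waiting more than 6 minutes (0.1 hour) for the next taxi.
print(exponential_sf(0.1, RATE_PER_HOUR))

# The two models agree: "zero taxis in an hour" is the same event as
# "the wait for the first taxi exceeds one hour".
print(poisson_pmf(0, RATE_PER_HOUR), exponential_sf(1.0, RATE_PER_HOUR))
```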

In [14]:
from math import factorial
# Number of coin flips out of 2 (or n) that turn out to be heads
# Modeled by the binomial distribution, P(X=k) = C(n,k) * p^k * (1-p)^(n-k); expectation: np
# (0,0) = zero heads = 1/4
# (1,0) = one head = 1/4
# (0,1) = one head = 1/4
# (1,1) = two heads = 1/4
# O = {(0,0),(1,0),(0,1),(1,1)}
n = 2
p = .5
for k in range(n + 1):
    c = factorial(n) // (factorial(k) * factorial(n - k))  # binomial coefficient C(n,k)
    print("probability of {k} heads in {n} coin flips: {prob}".format(k=k, n=n, prob=c * p**k * (1-p)**(n-k)))
# Expected number of heads out of 2 flips, from the definition sum(k * P(X=k))
E_X = 0*.25 + 1*.5 + 2*.25
print("expected value: {}".format(E_X))
print("binomial expectation: {}".format(n*p))
# Variance from the definition sum((k - E[X])^2 * P(X=k)); binomial variance: np(1-p)
print("variance: {}".format(((0-E_X)**2) * .25 + ((1-E_X)**2) * .5 + ((2-E_X)**2) * .25))
print("binomial variance: {}".format(n*p*(1-p)))

probability of 0 heads in 2 coin flips: 0.25
probability of 1 heads in 2 coin flips: 0.5
probability of 2 heads in 2 coin flips: 0.25
expected value: 1.0
binomial expectation: 1.0
variance: 0.5
binomial variance: 0.5
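The same check scales to larger n, such as the 100-flip example mentioned earlier. This is a sketch in Python 3.8+ (it uses `math.comb`); the function name `binomial_pmf` is illustrative, and the mean and variance are computed from the definition and compared with np and np(1-p).

```python
from math import comb  # available in Python 3.8+

def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 100, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance from the definition, summing over every possible k.
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(mean, n * p)                # both close to 50
print(variance, n * p * (1 - p))  # both close to 25
```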


### CLT and Hypothesis Testing

In [None]:
# Hypothesis testing questions are either about mapping a scenario to an appropriate test, or about elaborating on the key ideas of hypothesis testing: p-values, standard errors, etc.

# What is the CLT and how does it apply?
# The central limit theorem states that the mean of n i.i.d. draws from a population with finite variance is approximately normally distributed for large n, whatever the shape of the population distribution. It is what justifies z- and t-based tests on sample means.
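A small simulation can illustrate the theorem. The sketch below (Python 3.8+) draws sample means from a deliberately skewed population; the exponential population, the sample size of 50, the number of repetitions, and the seed are all arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed, only for repeatability

def sample_mean(n):
    """Mean of n i.i.d. draws from a skewed population (exponential, mean 1, sd 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(5000)]

# If the sample means are roughly normal, they center on the population mean (1)
# with standard deviation close to sigma / sqrt(n) = 1 / sqrt(50).
standard_error = 1 / n ** 0.5
print(statistics.fmean(means), statistics.stdev(means), standard_error)

# About 95% of sample means should fall within 1.96 standard errors of 1.
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in means) / len(means)
print(within)
```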

# What are the steps for Hypothesis Testing?
# 1. Define the null and alternative hypotheses
# 2. Choose a confidence level (e.g. 95%) or, equivalently, a significance level (e.g. 5%)
    # a 95% confidence level means that 95% of the confidence intervals built this way would include the true population parameter (e.g. out of 100 samples, about 95 of the resulting intervals would contain it)
# 3. Select a test statistic
    # 2 means or fewer: z (n >= 30), else t
    # 3 means or more: F test
    # non-parametric (e.g. categorical counts): chi-square
# 4. Calculate the p-value
    # the probability, given that the null hypothesis is true, of observing a test statistic at least as extreme as the one observed, in the direction of the alternative hypothesis
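The steps above can be sketched for the simplest case, a one-sample z-test with a known population standard deviation. Everything specific here is an assumption for illustration: the function names, the made-up numbers (sample mean 103, null mean 100, sigma 15, n = 100), and the choice of a two-sided alternative. For a one-sided alternative, the p-value would be the single tail in the direction of that alternative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: population mean == mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value

# Step 1: H0: mu = 100, H1: mu != 100 (made-up example values)
# Step 2: significance level
alpha = 0.05
# Steps 3 and 4: test statistic and p-value
z, p_value = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(z, p_value)
print("reject H0" if p_value < alpha else "fail to reject H0")
```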

### Bayesian Statistics

### Randomization and Inference

In [15]:
# How do we set up an A/B test properly?
# 1. Define the metric that measures the outcome of the treatment
# 2. Randomize subjects into equal-sized groups (e.g. one treatment group and one control group)
# 3. Limit the variance of the metric (minimize any confounding factors that also affect it)
    # e.g. Seasonal periods, intrinsic characteristics of users, etc.
# 4. Run the test for the full duration
# 5. Evaluate the results
    # The larger the effect size of the treatment, the easier the effect is to detect
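Steps 2 and 5 can be sketched in a few lines. This is one possible approach, not a prescribed procedure: the helper names are hypothetical, the user IDs and conversion counts are made-up numbers, and a pooled two-proportion z-test is assumed as the evaluation method.

```python
import random
from math import erf, sqrt

def assign_groups(user_ids, seed=0):
    """Shuffle users and split them into two equal-sized groups (control, treatment)."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:2 * half]

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Step 2: randomize 1000 hypothetical users into two groups of 500.
control, treatment = assign_groups(range(1000))
print(len(control), len(treatment))  # 500 500

# Step 5: made-up results, 50/500 conversions in control vs 70/500 in treatment.
z, p_value = two_proportion_z_test(50, 500, 70, 500)
print(z, p_value)
```

With these made-up counts the p-value lands just above 0.05, which shows how a visible difference in rates can still fall short of significance at this sample size.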

### Prediction and Machine Learning

### Time Series