# Probability &mdash; Distributions

This notebook is a collection of probability problems that can be solved using discrete and continuous probability distributions.

Before attempting these problems, you should review the notes on:

 * [Discrete random variables](https://kmurphy.bitbucket.io/modules/Computational_Physics/topics/02-Computational_Problems_involving_Probability/index.html#10-Discrete_Random_Variables)
 * [Continuous random variables](https://kmurphy.bitbucket.io/modules/Computational_Physics/topics/02-Computational_Problems_involving_Probability/index.html#20-Continuous_Random_Variables)


```python
import numpy as np
import matplotlib.pyplot as plt
# Matplotlib >= 3.6 renamed the bundled seaborn styles to "seaborn-v0_8-*"
plt.style.use("seaborn-v0_8-darkgrid")

from scipy import stats
```

## Discrete Distributions

---
### Question 1

Suppose it takes at least 9 votes from a 12-member jury to convict a defendant. Every defendant is either guilty or innocent. If the defendant is guilty, then every juror has a probability of 0.75 of casting a "guilty" vote. If the defendant is innocent, then every juror has a probability of 0.15 of casting a "guilty" vote. Each juror acts independently.

 __(a)__ If a defendant is actually guilty, find the probability that the jury renders the correct decision.  

 __(b)__ If a defendant is actually innocent, find the probability that the jury renders the correct decision.  
 
 __(c)__ If 70% of defendants are actually guilty, find the probability that the jury renders a correct decision.  
  
 __(d)__ Determine the percentage of defendants found guilty by the jury.
    

**(a) defendant is guilty ... probability ... the correct decision?**
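If jurors vote independently, the number of "guilty" votes for a guilty defendant is binomial, $G \sim \mathrm{Bin}(12, 0.75)$, and the jury is correct when $G \ge 9$. This is a minimal standard-library sketch. The helper name `binom_tail` is our own, and `stats.binom.sf(8, 12, 0.75)` should return the same value.

```python
from math import comb

def binom_tail(k_min, n, p):
    """Return P(X >= k_min) for X ~ Bin(n, p) by summing the PMF."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Guilty defendant: each juror votes "guilty" with probability 0.75,
# and the correct verdict (conviction) needs at least 9 of 12 votes.
p_correct_guilty = binom_tail(9, 12, 0.75)
print(f"P(correct | guilty) = {p_correct_guilty:.4f}")
```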

**(b) defendant is innocent ... probability ... the correct decision?**
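When the defendant is innocent, the correct verdict is an acquittal, meaning at most 8 "guilty" votes with $G \sim \mathrm{Bin}(12, 0.15)$. Here is a sketch under the same independence assumption. The helper `binom_cdf` is hypothetical, and `stats.binom.cdf(8, 12, 0.15)` is the SciPy equivalent.

```python
from math import comb

def binom_cdf(k_max, n, p):
    """Return P(X <= k_max) for X ~ Bin(n, p) by summing the PMF."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, k_max + 1))

# Innocent defendant: each juror votes "guilty" with probability 0.15,
# and the jury acquits (correctly) when there are at most 8 guilty votes.
p_correct_innocent = binom_cdf(8, 12, 0.15)
print(f"P(correct | innocent) = {p_correct_innocent:.6f}")
```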

**(c) If 70% of defendants are guilty ... probability ... correct decision**
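Conditioning on the defendant's true status and applying the law of total probability gives $P(\text{correct}) = 0.7\,P(G \ge 9 \mid \text{guilty}) + 0.3\,P(G \le 8 \mid \text{innocent})$. The sketch below recomputes both conditional probabilities so that it runs on its own.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

P_GUILTY = 0.7  # fraction of defendants who are actually guilty

# Correct outcome: conviction (>= 9 guilty votes) if guilty, acquittal (<= 8) if innocent.
p_correct_if_guilty = sum(binom_pmf(k, 12, 0.75) for k in range(9, 13))
p_correct_if_innocent = sum(binom_pmf(k, 12, 0.15) for k in range(0, 9))

# Law of total probability over the defendant's true status.
p_correct = P_GUILTY * p_correct_if_guilty + (1 - P_GUILTY) * p_correct_if_innocent
print(f"P(correct decision) = {p_correct:.4f}")
```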

**(d) Determine the percentage of defendants found guilty by the jury**
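We assume the same 70/30 split as in part (c). A conviction occurs whenever there are at least 9 guilty votes, regardless of the defendant's true status, so both groups contribute: $P(\text{convicted}) = 0.7\,P(\mathrm{Bin}(12,0.75) \ge 9) + 0.3\,P(\mathrm{Bin}(12,0.15) \ge 9)$. Here is a sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# A conviction needs at least 9 of 12 "guilty" votes.
p_convict_if_guilty = sum(binom_pmf(k, 12, 0.75) for k in range(9, 13))
p_convict_if_innocent = sum(binom_pmf(k, 12, 0.15) for k in range(9, 13))

# Assumption: 70% of defendants are guilty, as in part (c).
pct_found_guilty = 100 * (0.7 * p_convict_if_guilty + 0.3 * p_convict_if_innocent)
print(f"Defendants found guilty: {pct_found_guilty:.2f}%")
```

Wrongful convictions of innocent defendants contribute almost nothing because $P(\mathrm{Bin}(12, 0.15) \ge 9)$ is tiny.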

---
### Question 2

On average, Lyft Line receives 2 requests per 5 minutes for a particular route. A user requests the route and Lyft commits a car to take her. All users who request the route in the next five minutes are added to the car, as long as it has space. The car can fit up to three users. For the trip, Lyft earns 6 euro for each user in the car (the revenue) minus 7 euro (the operating cost).

 __(a)__ How much does Lyft expect to make from this trip?
 
 __(b)__ Lyft has one space left in the car and wants to wait to get another user. What is the probability that another user will make a request in the next 15 seconds?

**(a) How much does Lyft expect to make from this trip?**
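The problem leaves one detail open. This sketch assumes that the requesting user occupies one of the three seats, so at most two more riders can join. It also assumes that the number of requests in the five-minute window is $X \sim \mathrm{Poisson}(2)$. The number of extra riders is then $M = \min(X, 2)$, and by linearity of expectation the expected profit is $6\,(1 + \mathrm{E}[M]) - 7$.

```python
from math import exp, factorial

LAM = 2.0       # mean number of requests in the 5-minute window
SEATS_LEFT = 2  # assumption: the original requester fills one of the three seats

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

# M = min(X, SEATS_LEFT): k extra riders when X = k < SEATS_LEFT, otherwise the car is full.
p_below = [poisson_pmf(k, LAM) for k in range(SEATS_LEFT)]
expected_extra = sum(k * pk for k, pk in enumerate(p_below)) + SEATS_LEFT * (1 - sum(p_below))

# Profit = 6 euro per rider (including the first user) minus the 7 euro operating cost.
expected_profit = 6 * (1 + expected_extra) - 7
print(f"Expected profit: {expected_profit:.2f} euro")
```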

**(b) ...probability that another user will make a request in the next 15 seconds?**
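Modelling requests as a Poisson process gives a rate of $2/5 = 0.4$ requests per minute. A 15-second window is $0.25$ minutes, so the count in that window is $\mathrm{Poisson}(0.1)$. The process is memoryless, so the time already spent waiting does not matter. Here is a sketch:

```python
from math import exp

RATE_PER_MIN = 2 / 5  # requests per minute
WINDOW_MIN = 15 / 60  # 15 seconds expressed in minutes

# P(at least one request) = 1 - P(no requests) for a Poisson count with mean rate * window.
p_request = 1 - exp(-RATE_PER_MIN * WINDOW_MIN)
print(f"P(at least one request in 15 s) = {p_request:.4f}")
```

The exponential waiting-time CDF gives the same number, for example `stats.expon.cdf(0.25, scale=1 / 0.4)`.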

---
### Question 3

When sending binary data to satellites (or over any noisy channel), bits can be flipped with non-negligible probability. In 1947 Richard Hamming developed a system to send data more reliably. Using the (7,4) Hamming error-correcting code, you can send a block of 4 data bits together with 3 redundant parity bits. If zero or one of the seven bits is corrupted, the receiver can recover the original 4 bits.

Let's consider the case of sending a signal to a satellite where each bit is independently flipped with probability $p = 0.1$.


__(a)__ If you send 4 bits, what is the probability that the correct message was received (i.e. none of the bits are flipped)?

__(b)__ If you send 4 bits, with 3 Hamming error correcting bits, what is the probability that a correctable message was received?

__(c)__ Instead of using Hamming codes, you decide to send 5 copies of each of the four bits. If, for every one of the four bits, more than half of its copies are not flipped, the signal is correctable. What is the probability that a correctable message was received?

Note:
You should find that the probability of a correctable message is higher in __(c)__ than in __(b)__, but __(c)__ uses 20 bits, nearly three times the 7 bits used in __(b)__.

**(a) ... send 4 bits ... probability that the correct message was received**
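Each of the 4 bits arrives unflipped independently with probability $1 - p$, so the message is intact with probability $(1-p)^4$. This equals the probability of zero flips under $\mathrm{Bin}(4, p)$.

```python
p_flip = 0.1
# All 4 bits must arrive unflipped, and the bits flip independently.
p_correct = (1 - p_flip) ** 4
print(f"P(correct message) = {p_correct:.4f}")
```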

**(b) ... send 4 bits, with 3 Hamming error correcting bits ... probability ... correctable message**
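With 7 transmitted bits, the number of flips is $F \sim \mathrm{Bin}(7, 0.1)$. Since the Hamming code corrects zero or one error, the message is correctable when $F \le 1$. A sketch, with `stats.binom.cdf(1, 7, 0.1)` as the SciPy equivalent:

```python
from math import comb

p_flip, n_bits = 0.1, 7
# Correctable when F ~ Bin(7, 0.1) is 0 or 1.
p_correctable = sum(comb(n_bits, k) * p_flip**k * (1 - p_flip)**(n_bits - k) for k in range(2))
print(f"P(correctable, Hamming) = {p_correctable:.4f}")
```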

**(c) ... send 5 copies of each of the four bits ... probability ... correctable message**
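In this scheme a single data bit survives if at most 2 of its 5 copies flip, so that a majority vote recovers it. The four data bits are independent, so the per-bit probability is raised to the fourth power. Here is a sketch:

```python
from math import comb

p_flip, copies = 0.1, 5
# One bit survives majority voting when at most 2 of its 5 copies are flipped.
p_bit_ok = sum(comb(copies, k) * p_flip**k * (1 - p_flip)**(copies - k)
               for k in range(copies // 2 + 1))
# All four data bits must survive.
p_correctable = p_bit_ok ** 4
print(f"P(correctable, 5 copies) = {p_correctable:.4f}")
```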

## Continuous Distributions

---
### Question 4

Let $X$ be a continuous random variable with probability density function:
$$
f(x) = \left\{\begin{array}{cc}
    c\left(2-2 x^{2}\right) & -1<x<1 \\
    0 & \text { otherwise }
    \end{array}\right.
$$

__(a)__ What is the value of $c$?
 
__(b)__ Generate a graph of the PDF and the CDF of $X$.
 
__(c)__ What is $\mathrm{E}[X]$?
 
__(d)__ What is $\mathrm{Var}[X]$?

**(a) What is the value of $c$?**
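A density must integrate to 1 over its support, which fixes $c$:

$$
\int_{-1}^{1} c\left(2-2x^{2}\right)dx = c\left[2x-\frac{2x^{3}}{3}\right]_{-1}^{1} = c\left(\frac{4}{3}+\frac{4}{3}\right) = \frac{8c}{3} = 1 \quad\Longrightarrow\quad c = \frac{3}{8}
$$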

**(b) Generate a graph of the PDF and the CDF of $X$.**
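Take $c = 3/8$ and integrate the density from $-1$ to $x$. This gives $F(x) = \tfrac{1}{2} + \tfrac{3}{4}x - \tfrac{1}{4}x^{3}$ for $-1 \le x \le 1$, with $F = 0$ to the left of the support and $F = 1$ to the right. The plotting sketch below is minimal, and the function names `pdf` and `cdf` are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

C = 3 / 8  # normalizing constant, from requiring f to integrate to 1

def pdf(x):
    """f(x) = c(2 - 2x^2) on (-1, 1), zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x > -1) & (x < 1), C * (2 - 2 * x**2), 0.0)

def cdf(x):
    """F(x) = 1/2 + 3x/4 - x^3/4, with x clipped to [-1, 1] so F is 0 or 1 outside."""
    x = np.clip(np.asarray(x, dtype=float), -1, 1)
    return 0.5 + 0.75 * x - 0.25 * x**3

xs = np.linspace(-1.5, 1.5, 601)
fig, (ax_pdf, ax_cdf) = plt.subplots(1, 2, figsize=(10, 4))
ax_pdf.plot(xs, pdf(xs))
ax_pdf.set(title="PDF of $X$", xlabel="$x$", ylabel="$f(x)$")
ax_cdf.plot(xs, cdf(xs))
ax_cdf.set(title="CDF of $X$", xlabel="$x$", ylabel="$F(x)$")
plt.show()
```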

**(c) What is $\mathrm{E}[X]$?**
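The density is an even function on the symmetric interval $(-1, 1)$, so $x f(x)$ is odd and the mean vanishes. With $c = 3/8$:

$$
\mathrm{E}[X] = \int_{-1}^{1} x\cdot\frac{3}{8}\left(2-2x^{2}\right)dx = \frac{3}{8}\left[x^{2}-\frac{x^{4}}{2}\right]_{-1}^{1} = \frac{3}{8}\left(\frac{1}{2}-\frac{1}{2}\right) = 0
$$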

**(d) What is $\mathrm{Var}[X]$?**
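Because $\mathrm{E}[X] = 0$, $\mathrm{Var}[X] = \mathrm{E}[X^2] = \int_{-1}^{1} x^2 \cdot \tfrac{3}{8}(2 - 2x^2)\,dx = \tfrac{3}{8}\left[\tfrac{2x^3}{3} - \tfrac{2x^5}{5}\right]_{-1}^{1} = \tfrac{1}{5}$. The sketch below checks this analytic value numerically with a simple midpoint rule. The grid size is an arbitrary choice.

```python
import numpy as np

C = 3 / 8  # normalizing constant, from requiring f to integrate to 1

# Midpoint rule on (-1, 1); for this low-degree polynomial the error is far below 1e-6.
n = 200_000
dx = 2 / n
mid = -1 + dx * (np.arange(n) + 0.5)
f = C * (2 - 2 * mid**2)

mean = np.sum(mid * f) * dx
second_moment = np.sum(mid**2 * f) * dx
variance = second_moment - mean**2  # Var[X] = E[X^2] - E[X]^2
print(f"E[X] ~ {mean:.6f}, Var[X] ~ {variance:.6f}")
```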

---
### Question 5

On average, visitors leave your website after 5 minutes. Assume that the length of stay is exponentially distributed. What is the probability that a user stays more than 10 minutes?
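For an exponential length of stay $T$ with mean 5 minutes (rate $1/5$), the survival function is $P(T > t) = e^{-t/5}$. This gives $P(T > 10) = e^{-2}$. A sketch, with `stats.expon.sf(10, scale=5)` as the SciPy equivalent:

```python
from math import exp

MEAN_STAY = 5.0  # minutes

# Survival function of an exponential with mean MEAN_STAY: P(T > t) = exp(-t / MEAN_STAY)
p_stay_over_10 = exp(-10 / MEAN_STAY)
print(f"P(stay > 10 min) = {p_stay_over_10:.4f}")
```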

---
### Question 6

Your website has 100 users and each day each user independently has a 20% chance of logging into your website. 

__(a)__ Calculate the probability that more than 21 users log in.

__(b)__ Use a normal approximation to estimate the probability that more than 21 users log in.

**(a) Calculate the probability that more than 21 users log in.**
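The daily number of logins is $X \sim \mathrm{Bin}(100, 0.2)$, and $P(X > 21) = 1 - P(X \le 21)$. This is a standard-library sketch. The helper `binom_pmf` is hypothetical, and `stats.binom.sf(21, 100, 0.2)` computes the same tail directly.

```python
from math import comb

n, p = 100, 0.2

def binom_pmf(k):
    """P(X = k) for X ~ Bin(100, 0.2)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(X > 21) = 1 - P(X <= 21)
p_more_than_21 = 1 - sum(binom_pmf(k) for k in range(22))
print(f"Exact: {p_more_than_21:.4f}")
```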



**(b) Use a normal approximation to estimate the probability that more than 21 users log in.**
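The approximating normal has mean $np = 20$ and variance $np(1-p) = 16$, so its standard deviation is 4. The sketch applies a continuity correction, which is our choice and not stated in the problem. Since $X$ is integer-valued, $P(X > 21) = P(X \ge 22)$, which is approximated by $P(Y > 21.5)$. `stats.norm.sf(21.5, loc=20, scale=4)` is the SciPy equivalent.

```python
from math import erf, sqrt

n, p = 100, 0.2
mu = n * p                     # 20
sigma = sqrt(n * p * (1 - p))  # 4

# Continuity correction: P(X > 21) = P(X >= 22) ~ P(Y > 21.5)
z = (21.5 - mu) / sigma
p_normal = 0.5 * (1 - erf(z / sqrt(2)))  # 1 - Phi(z)
print(f"Normal approximation: {p_normal:.4f}")
```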

---
### Question 7

You are testing software and discover that your program has a non-deterministic bug that causes catastrophic failure (such bugs are sometimes called __hindenbugs__). Your program was tested for 400 hours and the bug occurred twice.

__(a)__ Each user runs your program to complete a three-hour task. If the bug manifests, the user immediately stops working. What is the probability that the bug manifests for a given user?

__(b)__ Your program is used by one million users, each completing the three-hour task. Use a normal approximation to the binomial to estimate the probability that more than 15,000 users experience the bug. Use your answer from part (a) for parameter $p$.

    

**(a) ... a three hour long task ... probability that the bug occurs**
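One reasonable model, and the one assumed here, treats bug occurrences as a Poisson process. Its rate is estimated from the test data as $2/400 = 0.005$ per hour. The number of failures in a 3-hour task is then $\mathrm{Poisson}(0.015)$, and the user is affected if at least one occurs.

```python
from math import exp

RATE_PER_HOUR = 2 / 400  # assumption: rate estimated from 2 failures in 400 test hours
TASK_HOURS = 3

# P(at least one failure) = 1 - P(zero failures) for a Poisson count.
lam = RATE_PER_HOUR * TASK_HOURS
p_bug = 1 - exp(-lam)
print(f"P(bug during task) = {p_bug:.6f}")
```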

**(b) ... used by one million users ... the probability that more than 15,000 users experience the bug.**
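The number of affected users is $X \sim \mathrm{Bin}(10^6, p)$, with $p$ taken from the Poisson-based sketch for part (a). The normal approximation uses $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, and the sketch applies a continuity correction, which is our choice. The helper `normal_sf` is hypothetical, and `stats.norm.sf(x, loc=mu, scale=sigma)` is the SciPy equivalent.

```python
from math import erf, exp, sqrt

def normal_sf(x, mu, sigma):
    """P(Y > x) for Y ~ N(mu, sigma^2)."""
    return 0.5 * (1 - erf((x - mu) / (sigma * sqrt(2))))

n = 1_000_000
p = 1 - exp(-(2 / 400) * 3)  # per-user probability under the Poisson assumption of part (a)

mu = n * p
sigma = sqrt(n * p * (1 - p))
# Continuity correction: P(X > 15000) = P(X >= 15001) ~ P(Y > 15000.5)
p_over_15000 = normal_sf(15_000.5, mu, sigma)
print(f"P(more than 15,000 users hit the bug) ~ {p_over_15000:.4f}")
```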

---
### Question 8
 
Two pseudo-random number generators are used to simulate a sequence of 300 independent flips of a fair coin (H denotes heads and T denotes tails). The two sequences, one from each generator, are shown below.

Construct a test to determine which is the better random number generator.

Sequence a:

    TTHHTHTTHTTTHTTTHTTTHTTHTHHTHHTHTHHTTTHHTHTHTTHTHHTTHTHHT
    HTTTHHTTHHTTHHHTHHTHTTHTHTTHHTHHHTTHTHTTTHHTTHTHTHTHTHTT
    HTHTHHHTTHTHTHHTHHHTHTHTTHTTHHTHTHTHTTHHTTHTHTTHHHTHTHTH
    TTHTTHHTTHTHHTHHHTTHHTHTTHTHTHTHTHTHTHHHTHTHTHTTHTHHTHTH
    TTHTTTHHTHTTTHTHHTHHHHTTTHHTHTHTHTHHHTTHHTHTTTHTHHTHTHTH
    HTHTTHTTHTHHTHTHTTT

Sequence b:

    HTHHHTHTTHHTTTTTTTTHHHTTTHHTTTTHHTTHHHTTHTHTTTTTTHTHTTTTH
    HHHTHTHTTHTTTHTTHTTTTHTHHTHHHHTTTTTHHHHTHHHTTTTHTHTTHHHH
    THHHHHHHHTTHHTHHTHHHHHHHTTHTHTTTHHTTTTHTHHTTHTTHTHTHTTHH
    HHHTTHTTTHTHTHHTTTTHTTTTTHHTHTHHHHTTTTHTHHHHHHTHTHTHTHHH
    THTTHHHTHHHHHHTHHHTHTTTHHHTTTHHTHTTHHTHHHTHTTHTTHTTTHHTH
    THTTTTHTHTHTTHTHTHT
    
The sequences are converted to integer arrays to simplify analysis in the following cell. 

```python
a = "TTHHTHTTHTTTHTTTHTTTHTTHTHHTHHTHTHHTTTHHTHTHTTHTHHTTHTHHTHTTTHHTTHHTTHHHTHHTHTTHTHTTHHTHHHTTHTHTTTHHTTHTHTHTHTHTTHTHTHHHTTHTHTHHTHHHTHTHTTHTTHHTHTHTHTTHHTTHTHTTHHHTHTHTHTTHTTHHTTHTHHTHHHTTHHTHTTHTHTHTHTHTHTHHHTHTHTHTTHTHHTHTHTTHTTTHHTHTTTHTHHTHHHHTTTHHTHTHTHTHHHTTHHTHTTTHTHHTHTHTHHTHTTHTTHTHHTHTHTTT"
b = "HTHHHTHTTHHTTTTTTTTHHHTTTHHTTTTHHTTHHHTTHTHTTTTTTHTHTTTTHHHHTHTHTTHTTTHTTHTTTTHTHHTHHHHTTTTTHHHHTHHHTTTTHTHTTHHHHTHHHHHHHHTTHHTHHTHHHHHHHTTHTHTTTHHTTTTHTHHTTHTTHTHTHTTHHHHHTTHTTTHTHTHHTTTTHTTTTTHHTHTHHHHTTTTHTHHHHHHTHTHTHTHHHTHTTHHHTHHHHHHTHHHTHTTTHHHTTTHHTHTTHHTHHHTHTTHTTHTTTHHTHTHTTTTHTHTHTTHTHTHT"
# Encode heads as 0 and tails as 1
a = np.array(list(a.replace("H", "0").replace("T", "1")), dtype=int)
b = np.array(list(b.replace("H", "0").replace("T", "1")), dtype=int)
```
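Many tests are possible. One simple option, sketched here, counts how often consecutive flips differ (a "switch"). For independent fair flips, the switch indicators are themselves independent fair Bernoulli variables, so the switch count in $n$ flips is exactly $\mathrm{Bin}(n-1, 1/2)$. A count far from $(n-1)/2$ in either direction is evidence against randomness. The function names `count_switches` and `switch_test` are hypothetical. `scipy.stats.binomtest(s, n - 1, 0.5)` offers a comparable two-sided test.

```python
from math import comb

import numpy as np

def count_switches(seq):
    """Number of positions i where seq[i] != seq[i + 1]."""
    seq = np.asarray(seq)
    return int(np.sum(seq[1:] != seq[:-1]))

def switch_test(seq):
    """Return (switch count, two-sided exact p-value).

    Under H0 (independent fair flips) the switch count is Bin(len(seq) - 1, 1/2).
    """
    m = len(seq) - 1
    s = count_switches(seq)
    # Exact binomial tails; Python ints keep comb() and 2**m exact before dividing.
    lower = sum(comb(m, k) for k in range(s + 1)) / 2**m
    upper = sum(comb(m, k) for k in range(s, m + 1)) / 2**m
    return s, min(1.0, 2 * min(lower, upper))
```

Applied to the arrays `a` and `b` from the cell above, `switch_test(a)` and `switch_test(b)` each return a switch count, to compare with the expected value of 149.5 for 300 flips, and a p-value. A very small p-value suggests that the generator's output does not behave like independent fair flips. Checks such as the proportion of heads or the longest run could be added in the same way.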