# Modeling 1

In [10]:
import numpy as np

## Predicting Scores

In [11]:
scores = [89, 92, 78, 94, 88]

We want to model this data using a Normal (Gaussian) distribution. This distribution takes two parameters: $\mu$ and $\sigma^2$.

Let's come up with some values for these parameters based on our data.

In [12]:
mu = np.mean(scores)
mu

88.2

In [13]:
sigmaSq = np.var(scores)
sigmaSq

30.559999999999995

What happens if we sample from a Normal distribution with mean mu and variance sigmaSq?

In [14]:
np.random.normal(mu, np.sqrt(sigmaSq))  # second argument is the standard deviation, not the variance

79.45314180111733

In [20]:
mySum = 0
for i in range(1000):
    mySum = mySum + np.random.normal(mu, np.sqrt(sigmaSq))  # scale is the standard deviation
mySum/1000

87.70659757379165
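The loop above can also be written as a single vectorized draw. The following is a minimal sketch, assuming NumPy's `default_rng` generator; the seed `0` and the names `rng`, `samples`, `sample_mean`, and `sample_std` are illustrative choices, not part of the original notebook. Note that the `scale` argument is the standard deviation, so the variance is passed through `np.sqrt` first.

```python
import numpy as np

scores = [89, 92, 78, 94, 88]
mu = np.mean(scores)
sigma = np.sqrt(np.var(scores))  # standard deviation, not variance

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(0)

# Draw all 1000 samples in one call instead of looping
samples = rng.normal(loc=mu, scale=sigma, size=1000)

sample_mean = samples.mean()
sample_std = samples.std()
print(sample_mean, sample_std)
```

With 1000 draws, the sample mean and standard deviation should land close to `mu` and `sigma`.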

We want to predict the next 5 scores. So, we sample 5 times from this distribution.

In [37]:
numSamples = 5
next5 = 0
for i in range(numSamples):
    next5 = next5 + np.random.normal(mu, np.sqrt(sigmaSq))  # scale is the standard deviation
next5

458.9406710867733

In [38]:
next5/numSamples

91.78813421735467

We have a mean of 88.2 on the first 5 scores.

We expect a mean of 88.2 on the next 5 scores.

This gives us an expected total of $88.2 \times 10 = 882$ for the sum of all 10 scores.

95% confidence interval on the **sum**: $882 \pm 2 \times 12.36 = 882 \pm 24.7$

Where does the $\pm 2 \times 12.36$ come from?

<br><br><br><br>
<br><br><br><br>




It's the standard error for the **sum** of the samples: $\sqrt{n}\,\sigma$, where $n$ is the number of samples.

In [39]:
np.sqrt(numSamples) * np.sqrt(sigmaSq)

12.361229712289955
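The value 12.36 follows from the fact that the variance of a sum of $n$ independent draws is $n\sigma^2$, so its standard deviation is $\sqrt{n}\,\sigma$. A minimal sketch that recomputes the interval on the sum from the data; the names `se_sum`, `expected_total`, `lower`, and `upper` are illustrative, not from the original notebook.

```python
import numpy as np

scores = [89, 92, 78, 94, 88]
num_samples = 5
mu = np.mean(scores)
sigma_sq = np.var(scores)

# Variance of a sum of n independent draws is n * sigma^2
se_sum = np.sqrt(num_samples * sigma_sq)

# Known sum of the first five scores plus the expected sum of the next five
expected_total = np.sum(scores) + num_samples * mu

# Approximate 95% interval: plus or minus 2 standard errors
lower = expected_total - 2 * se_sum
upper = expected_total + 2 * se_sum
print(expected_total, se_sum, (lower, upper))
```

Only the next five scores are uncertain, which is why the standard error uses $n = 5$ rather than 10.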

Divide by the number of samples to get the standard error for the **mean** of the predicted scores

In [40]:
np.sqrt(numSamples) * np.sqrt(sigmaSq) / numSamples

2.472245942457991

The 95% confidence interval is the sample value $\pm$ 2 standard errors.

In [31]:
12.36/5

2.472
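Both standard errors can be checked by simulation. The sketch below draws many batches of 5 scores and measures the spread of the batch sums and batch means. The seed, the batch count of 100,000, and the names `batches`, `sd_of_sums`, and `sd_of_means` are assumptions made for illustration.

```python
import numpy as np

scores = [89, 92, 78, 94, 88]
mu = np.mean(scores)
sigma = np.std(scores)  # standard deviation (square root of np.var)
num_samples = 5

rng = np.random.default_rng(1)  # arbitrary seed for repeatability

# 100,000 simulated batches of 5 future scores each
batches = rng.normal(mu, sigma, size=(100_000, num_samples))

sd_of_sums = batches.sum(axis=1).std()    # should be near sqrt(5) * sigma
sd_of_means = batches.mean(axis=1).std()  # should be near sigma / sqrt(5)
print(sd_of_sums, sd_of_means)
```

The two printed values should be close to 12.36 and 2.47 respectively.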

## Assignment Turn-in

Model using the Exponential Distribution:
https://en.wikipedia.org/wiki/Exponential_distribution

In [42]:
turnIn100 = [18,22,45,49,86]
turnIn100

[18, 22, 45, 49, 86]

We know that the mean of the exponential distribution is $\lambda^{-1}$, so we can estimate $\lambda$ as 1 over the sample mean.

In [43]:
np.mean(turnIn100)

44.0

In [45]:
lambdaInv = 1/np.mean(turnIn100)  # despite the name, this is the rate lambda = 1/mean
lambdaInv

0.022727272727272728

The CDF for the Exponential Distribution is

$F(x) = 1 - e^{-\lambda x}$

We want to evaluate the probability at 1 hour before the due date, so we compute $x$ as:





In [49]:
x = 167 - 100
x

67

Plugging in our values:


In [50]:
1 - np.exp(-lambdaInv * x)

0.781883787359828
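The same CDF value can be cross-checked against `scipy.stats.expon`. This is a sketch, not the notebook's own method: SciPy parameterizes the exponential by `scale`, which equals $1/\lambda$ (the mean), and the names `mean_time`, `rate`, `manual`, and `via_scipy` are illustrative.

```python
import numpy as np
from scipy.stats import expon

turn_in = [18, 22, 45, 49, 86]
mean_time = np.mean(turn_in)  # estimate of 1/lambda
rate = 1 / mean_time          # estimate of lambda
x = 167 - 100

# CDF written out by hand: 1 - exp(-lambda * x)
manual = 1 - np.exp(-rate * x)

# Same CDF through SciPy, where scale = 1/lambda
via_scipy = expon.cdf(x, scale=mean_time)
print(manual, via_scipy)
```

Both values should agree with the result computed above.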

The probability that each remaining person turns in the assignment by the deadline is about 0.78; the cells below use 0.781.

### Applying the model

In [58]:
b = np.random.binomial(1, 0.781)
b

0
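A single call with `n = 1` simulates one person. To simulate all 5 remaining people at once, the trial count can be set to 5 and the experiment repeated many times. The sketch below assumes independent turn-ins with the same probability; the seed, the repetition count, and the names `on_time_counts` and `frac_all_on_time` are illustrative.

```python
import numpy as np

p = 0.781
remaining = 5  # people who have not yet turned in

rng = np.random.default_rng(2)  # arbitrary seed for repeatability

# Each entry is the number of the 5 remaining people who turn in on time
on_time_counts = rng.binomial(remaining, p, size=200_000)

# Fraction of simulated runs in which all 5 turn in on time
frac_all_on_time = np.mean(on_time_counts == remaining)
print(frac_all_on_time)
```

The simulated fraction should be close to $0.781^5 \approx 0.29$.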

The Binomial Probability Mass Function takes, as parameters:

k = number of successes

n = number of trials

p = probability of success


In [74]:
from scipy.stats import binom
p = 0.781

Pr(N = 5): All 5 remaining people, and therefore all 10, turn in the assignment on time

In [71]:
binom.pmf(5,5,p)

0.29057294120790106

Pr(N ≥ 4): Probability 9+ turn in the assignment on time.

Recall that the CDF gives $\Pr(N \le k)$.

To get $\Pr(N \ge k)$, subtract the CDF at $k - 1$ from 1.

In [73]:
1 - binom.cdf(3, 5, p)

0.6979703427733961

Pr(N ≥ 3): Probability 8+ turn in the assignment on time.


In [75]:
1 - binom.cdf(2, 5, p)

0.926446734432406

Pr(N < 3): Probability < 8 turn in the assignment on time.


In [76]:
binom.cdf(2, 5, p)

0.07355326556759396
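The tail probabilities above can also be read off the full distribution. This sketch lists the PMF for every outcome and uses `binom.sf`, SciPy's survival function, where `sf(k) = 1 - cdf(k) = Pr(N > k)`. The names `pmf`, `p_at_least_4`, and `p_at_least_3` are illustrative.

```python
from scipy.stats import binom

n, p = 5, 0.781

# Probability of exactly k on-time turn-ins, for k = 0..5
pmf = [binom.pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(k, prob)

# Pr(N >= 4) = Pr(N > 3) and Pr(N >= 3) = Pr(N > 2)
p_at_least_4 = binom.sf(3, n, p)
p_at_least_3 = binom.sf(2, n, p)
print(p_at_least_4, p_at_least_3)
```

The six PMF values sum to 1, and the two tail values should match the `1 - binom.cdf(...)` results above.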