# Introduction to Modeling 1

In [None]:
import numpy as np

## Predicting Scores

In [None]:
scores = [89, 92, 78, 94, 88]

We want to model this data using a Normal (Gaussian) distribution. This distribution takes two parameters: $\mu$ and $\sigma^2$.

Let's come up with some values for these parameters based on our data.

In [None]:
mu = np.mean(scores)
mu

In [None]:
sigmaSq = np.var(scores)
sigmaSq

What happens if we sample from a Normal distribution with mean mu and variance sigmaSq?

In [None]:
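# np.random.normal expects the standard deviation, so take the square root of the variance
np.random.normal(mu, np.sqrt(sigmaSq))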

In [None]:
# Average 1000 draws; the result should be close to mu
mySum = 0
for i in range(1000):
    mySum = mySum + np.random.normal(mu, np.sqrt(sigmaSq))
mySum/1000
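
As a sanity check on the sampling step, here is a minimal sketch that draws many values at once and compares the sample mean and variance with `mu` and `sigmaSq`. It relies on the fact that the second argument of NumPy's normal sampler is the standard deviation (`scale`), not the variance. The names `rng`, `draws`, `sample_mean`, and `sample_var`, and the seed, are illustrative choices rather than part of the original notebook.

```python
import numpy as np

scores = [89, 92, 78, 94, 88]
mu = np.mean(scores)
sigma_sq = np.var(scores)       # population variance (ddof=0), as above
sigma = np.sqrt(sigma_sq)       # the sampler wants a standard deviation

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
draws = rng.normal(mu, sigma, size=100_000)

sample_mean = draws.mean()      # should land near mu
sample_var = draws.var()        # should land near sigma_sq
print(mu, sample_mean)
print(sigma_sq, sample_var)
```

If the variance were passed as `scale` by mistake, the sample variance would come out near `sigma_sq ** 2` instead.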

We want to predict the next 5 scores. So, we sample 5 times from this distribution.

In [None]:
numSamples = 5
next5 = 0  # running total of the 5 sampled scores
for i in range(numSamples):
    next5 = next5 + np.random.normal(mu, np.sqrt(sigmaSq))
next5

In [None]:
next5/numSamples

We have a mean of 88.2 on the first 5 scores.

We expect a mean of 88.2 on the next 5 scores.

This gives an expected sum of $88.2 \times 10 = 882$ over all 10 scores.

95% confidence on **sum**: $882 \pm 2 \times 12.36 = 882 \pm 24.7$

Where does the $\pm 2 \times 12.36$ come from? 

<br><br><br><br>
<br><br><br><br>




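It's two standard errors for the **sum** of the samples: 2 is the approximate multiplier for 95% coverage, and 12.36 is the standard error of the sum of the 5 predicted scores.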

In [None]:
np.sqrt(numSamples) * np.sqrt(sigmaSq)
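
The $\sqrt{n}\,\sigma$ form follows because the variances of independent draws add: the sum of $n$ draws has variance $n\sigma^2$. The sketch below checks this by simulation, assuming the 5 future scores are independent draws from the fitted Normal. The names `sums`, `theoretical_sd`, and `simulated_sd`, and the seed, are illustrative only.

```python
import numpy as np

scores = [89, 92, 78, 94, 88]
mu, sigma = np.mean(scores), np.std(scores)
num_samples = 5

rng = np.random.default_rng(42)  # arbitrary seed
# Each row is one simulated set of 5 future scores; sum across each row.
sums = rng.normal(mu, sigma, size=(200_000, num_samples)).sum(axis=1)

theoretical_sd = np.sqrt(num_samples) * sigma  # about 12.36
simulated_sd = sums.std()
print(theoretical_sd, simulated_sd)
```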

Divide by the number of samples to get the standard error of the mean of the predicted scores

In [None]:
np.sqrt(numSamples) * np.sqrt(sigmaSq) / numSamples

The 95% Confidence Interval is the sample value $\pm$ 2 standard errors.

In [None]:
12.36/5
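
Putting the pieces together, a minimal sketch of the interval for the mean of the next 5 scores might look like this. It uses the multiplier 2 as a round approximation of 1.96, and the names `se_mean`, `lower`, and `upper` are illustrative.

```python
import numpy as np

scores = [89, 92, 78, 94, 88]
mu, sigma = np.mean(scores), np.std(scores)
n = 5

# sqrt(n) * sigma / n simplifies to sigma / sqrt(n)
se_mean = sigma / np.sqrt(n)

# Approximate 95% interval: mean plus or minus 2 standard errors
lower, upper = mu - 2 * se_mean, mu + 2 * se_mean
print(se_mean, (lower, upper))
```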

## Assignment Turn-in

Model using the Exponential Distribution:
https://en.wikipedia.org/wiki/Exponential_distribution

In [None]:
turnIn100 = [18,22,45,49,86]
turnIn100

We know the mean of the exponential distribution is $\lambda^{-1}$, so we can estimate $\lambda$ as 1 over the sample mean.

In [None]:
np.mean(turnIn100)

In [None]:
# Despite its name, this variable holds the rate lambda = 1/mean (not its inverse)
lambdaInv = 1/np.mean(turnIn100)
lambdaInv

The CDF for the Exponential Distribution is

$1 - e^{-\lambda x}$

We want to evaluate this probability at 1 hour before the due date, so we set $x$ to the number of hours between hour 100 and hour 167:





In [None]:
x = 167 - 100
x

Plugging in our values:


In [None]:
1 - np.exp(-lambdaInv * x)
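
As a cross-check, the same value can be obtained from SciPy's exponential distribution. Note that `scipy.stats.expon` is parameterized by `scale`, which equals $1/\lambda$ (the mean), rather than by the rate itself. The names `rate`, `by_hand`, and `via_scipy` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import expon

turn_in = [18, 22, 45, 49, 86]
rate = 1 / np.mean(turn_in)  # lambda = 1/44
x = 167 - 100

by_hand = 1 - np.exp(-rate * x)
# expon takes scale = 1/lambda, i.e. the mean of the distribution
via_scipy = expon.cdf(x, scale=1 / rate)
print(by_hand, via_scipy)
```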

The probability of each remaining person turning in the assignment by the deadline is approximately 0.781.

### Applying the model

In [None]:
b = np.random.binomial(1, 0.781)
b
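
A binomial draw with `n = 1` is a single yes/no outcome for one person. The sketch below repeats that for many simulated groups, assuming 5 people remain (the value of `n` used with `binom` in this notebook) and that they act independently. The names `outcomes`, `on_time_counts`, and `frac_on_time`, and the seed, are illustrative.

```python
import numpy as np

p = 0.781
rng = np.random.default_rng(7)  # arbitrary seed

# One row per simulated group; each row holds 5 independent 0/1 outcomes.
outcomes = rng.binomial(1, p, size=(100_000, 5))
on_time_counts = outcomes.sum(axis=1)  # N for each simulated group

frac_on_time = outcomes.mean()  # should be close to p
print(frac_on_time)
print(np.bincount(on_time_counts, minlength=6) / len(on_time_counts))
```

The last line prints the simulated frequencies of N = 0 through N = 5, which should approximate the binomial probabilities.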

The Binomial Probability Mass Function takes, as parameters:

k = number of successes

n = number of trials

p = probability of success


In [None]:
from scipy.stats import binom
p = 0.781

Pr(N = 5): All 5 remaining people turn in on time, so all 10 turn in the assignment on time

In [None]:
binom.pmf(5,5,p)
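
To make the parameters concrete, here is a small sketch that evaluates the binomial PMF formula $\binom{n}{k} p^k (1-p)^{n-k}$ directly and compares it with `binom.pmf`. The helper `pmf_by_hand` is a hypothetical name introduced for illustration.

```python
from math import comb
from scipy.stats import binom

p = 0.781
n = 5

def pmf_by_hand(k, n, p):
    """Binomial PMF: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

all_five = binom.pmf(5, n, p)
# With k = n the formula reduces to p**n
print(all_five, pmf_by_hand(5, n, p), p**5)
```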

Pr(N ≥ 4): Probability 9+ turn in the assignment on time.

Recall that the CDF gives $\Pr(N \le k)$.

To get $\Pr(N \ge k)$, subtract the CDF at $k - 1$ from 1.

In [None]:
1 - binom.cdf(3, 5, p)
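
An equivalent way to get the upper tail, sketched here, is SciPy's survival function `binom.sf(k, n, p)`, which returns $\Pr(N > k)$. Summing the PMF over the tail gives the same number. The names `via_cdf`, `via_sf`, and `via_pmf` are illustrative.

```python
from scipy.stats import binom

p = 0.781
n = 5

via_cdf = 1 - binom.cdf(3, n, p)   # Pr(N >= 4) = 1 - Pr(N <= 3)
via_sf = binom.sf(3, n, p)         # sf(k) = Pr(N > k), same quantity
via_pmf = binom.pmf(4, n, p) + binom.pmf(5, n, p)
print(via_cdf, via_sf, via_pmf)
```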

Pr(N ≥ 3): Probability 8+ turn in the assignment on time.


In [None]:
1 - binom.cdf(2, 5, p)

Pr(N < 3): Probability < 8 turn in the assignment on time.


In [None]:
binom.cdf(2, 5, p)

Copyright ©2019 Christopher M Jermaine (cmj4@rice.edu), and Risa B Myers  (rbm2@rice.edu)

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.