# Before you start:
- Read the README.md file.
- Comment as much as you can and use the resources (README.md file).
- Happy learning!

In [None]:
# Libraries
from scipy import stats
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Challenge 1 - Generate and Plot Normal Distributions
#### Step 1: Generate samples and test them for normality.

Use mean=50, standard_deviation=5, and sample_size=[10, 50, 500, 5000] to generate 4 random samples that are normally distributed.

**Hint**: Read the documentation for `scipy.stats.norm.rvs`. The function will help you create the normal random samples.
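
Here is a minimal sketch of the `scipy.stats.norm.rvs` call from the hint. `loc` is the mean, `scale` is the standard deviation and `size` is the number of draws. The fixed `random_state=42` is an assumption added only to make the draws reproducible. The printed summary lets you compare each sample's mean and standard deviation with the true values 50 and 5. Larger samples tend to land closer to them.

```python
import numpy as np
from scipy import stats

mu, sigma = 50, 5
sizes = [10, 50, 500, 5000]

# One reproducible normal sample per size (random_state fixes the seed)
samples = {n: stats.norm.rvs(loc=mu, scale=sigma, size=n, random_state=42) for n in sizes}

# Compare each sample's mean and standard deviation with the true parameters
for n, sample in samples.items():
    print(f"n={n:5d}  mean={sample.mean():6.2f}  std={sample.std(ddof=1):5.2f}")
```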

In [None]:
# your code here

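mu = 50       # mean
sigma = 5     # standard deviation
sample_size = [10, 50, 500, 5000]

# Distributions: one normal sample per size, drawn with stats.norm.rvs as the hint suggests
dists = [stats.norm.rvs(loc=mu, scale=sigma, size=size) for size in sample_size]

# Tests: D'Agostino-Pearson normality test; a p-value above 0.05 means normality
# cannot be rejected. SciPy warns that the kurtosis part is only valid for n >= 20,
# so the result for n = 10 is unreliable.
tests = [stats.normaltest(dist) for dist in dists]

tests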

#### Step 2: Plot normal distributions.

To check the shape of the samples you have just created, you can use matplotlib. For example, you can use it to plot a histogram of each sample.

If you do, your plots should look similar to the ones below:

![normal distributions with different sample sizes](../images/ch-1.png)

#### Compare the distributions above. What do you observe? Explain with the Central Limit Theorem.

In [None]:
"""
your comments here
"""

#### Bonus: Plot normal distributions.

Even though you don't know how to use matplotlib yet, as a bonus challenge you can try to reproduce the plot above using the samples you have created. This plotting library will be introduced later this week, so don't worry if you don't get the results you want now: you can always come back to this challenge later.

In [None]:
# your code here
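# Create one figure with 4 side-by-side axes
fig, axes = plt.subplots(1, 4, figsize=(15, 4))

# Variables
bins = 20  # number of histogram bins

# Plot a histogram of each sample, titled with its sample size
for ax, size, dist in zip(axes, sample_size, dists):
    ax.set_title(f'n = {size}')
    ax.hist(dist, bins=bins)

plt.show()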

# Challenge 2 - Plot Probability Mass Function (PMF)

### Background knowledge

A [PMF](https://en.wikipedia.org/wiki/Probability_mass_function) gives the probability that a **discrete random variable** takes each of its possible values. A [discrete random variable](https://en.wikipedia.org/wiki/Random_variable#Discrete_random_variable) takes values from a countable set, typically integers, rather than from a continuous range. For example, the number of people in a household can only be a whole number such as 1, 2 or 3, never 2.5. Therefore the number of people in a household is a discrete variable.
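
Here is a small illustration of a PMF. This is a hypothetical example, not part of the exercise. A fair six-sided die is a discrete random variable with values 1 to 6, and each value has probability 1/6. `scipy.stats.randint(low, high)` models a discrete uniform distribution on `low, ..., high - 1`. Like every PMF, its probabilities sum to 1.

```python
import numpy as np
from scipy import stats

die = stats.randint(1, 7)  # discrete uniform on {1, 2, 3, 4, 5, 6}
faces = np.arange(1, 7)
probs = die.pmf(faces)

for face, prob in zip(faces, probs):
    print(f"P(X = {face}) = {prob:.4f}")

# The probabilities of all possible values add up to 1
print("sum =", probs.sum())
```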

### Challenge

We assume that the probability of clicking an advertisement in a YouTube video is 0.15. We have a sample of 5 people who watched the video, and we want to plot the PMF of the number of people who click, which follows a binomial distribution.

#### Step 1: Create the binomial distribution mentioned above. Store the result in a variable called `dist`.
**Hint**: use `scipy.stats.binom`. This object takes *n* and *p* as shape parameters, where *n* is the number of independent experiments and *p* is the probability of success of each experiment.
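
For reference, the binomial PMF gives the probability of exactly $k$ successes in $n$ independent trials: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$. The sketch below checks this formula against `stats.binom(n, p).pmf` for the exercise's values, $n = 5$ and $p = 0.15$. `math.comb` requires Python 3.8 or later.

```python
import math
import numpy as np
from scipy import stats

n, p = 5, 0.15
k = np.arange(n + 1)  # the support of a binomial is 0, 1, ..., n (n + 1 values)

# PMF from the closed-form formula
manual = np.array([math.comb(n, int(i)) * p**i * (1 - p)**(n - i) for i in k])

# PMF from SciPy's frozen binomial distribution
scipy_pmf = stats.binom(n, p).pmf(k)

print(np.round(manual, 4))
print(np.allclose(manual, scipy_pmf))
```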

In [None]:
# your code here
# Variables
n = 5                 # number of viewers (independent trials)
p = 0.15              # probability that a viewer clicks the ad
x = np.arange(n + 1)  # possible numbers of clicks: 0, 1, ..., n

# Binomial distribution
dist = stats.binom(n, p)

# Plot line + bullet point -> Discrete Distribution Representation
plt.plot(x, dist.pmf(x), 'bo', markersize=4)
plt.vlines(x, 0, dist.pmf(x), colors='b')
plt.title('Probability Mass Function', fontweight=700)
plt.xlabel('Number of clicks')
plt.ylabel('Probability')
plt.show()

#### Step 2: Plot the PMF of the distribution.
To do so, run the code in the cell below.

**Hint**: Your output should look like the one below:

![binom 10](../images/ch-2.png)

In [None]:
# run this code
x = range(n + 1)  # a binomial takes the values 0, 1, ..., n

fig, ax = plt.subplots(1, 1)

plt.plot(x, dist.pmf(x))

plt.show()

#### Step 3: Explain what you observe from the plot above.

In [None]:
"""
your comments here
"""

#### Step 4: Now plot the PMF for 50, 500, and 5000 visitors.
To plot the PMF, you can copy the code given above and replace the variable `dist` with the names of the variables where you have stored the new binomial objects for 50, 500 and 5000 visitors.

In [None]:
# your code here
# Create one figure with 3 side-by-side axes
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Variables
n_visitors = [50, 500, 5000]
p = 0.15

# One binomial object per number of visitors
binoms = {size: stats.binom(size, p) for size in n_visitors}

# Plot each PMF over its full support 0..size.
# For readability, these discrete distributions are drawn as continuous lines.
for ax, (size, binom_dist) in zip(axes, binoms.items()):
    x = np.arange(size + 1)
    ax.set_title(f'n = {size}')
    ax.plot(x, binom_dist.pmf(x))
    ax.set_xlabel('Number of clicks')
axes[0].set_ylabel('Probability')
plt.show()

#### Step 5: What did you notice from the distribution plots? Comment your findings.

In [None]:
"""
your comments here
"""

# Challenge 3
#### Step 1: Research the Poisson distribution. Write about your own understanding of the Poisson distribution.

In [None]:
"""
your comments here
"""

#### Step 2: A website has an average of 300 visits per day. What is the probability of getting 320 visitors in a day?

**Hint**: use `scipy.stats.poisson.pmf`.

In [None]:
# your code here
mu = 300
visits = 320

# Probability
stats.poisson.pmf(visits, mu)

#### Step 3: What is the probability of getting 60 visits in a day, with the same average of 300 visits per day?

In [None]:
# your code here
visits = 60

# Probability
stats.poisson.pmf(visits, mu)
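
To see why 60 visits is so much less likely than 320, measure each count in standard deviations from the mean. A Poisson(300) variable has standard deviation $\sqrt{300} \approx 17.3$. The sketch below computes these distances and the ratio of the two probabilities.

```python
import numpy as np
from scipy import stats

mu = 300
sd = np.sqrt(mu)  # Poisson variance equals its mean

for visits in [320, 60]:
    z = (visits - mu) / sd
    print(f"{visits} visits: {z:+.2f} standard deviations, P = {stats.poisson.pmf(visits, mu):.3e}")

ratio = stats.poisson.pmf(320, mu) / stats.poisson.pmf(60, mu)
print(f"320 visits is about {ratio:.1e} times as likely as 60 visits")
```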

#### Step 4: Create a list to store the Poisson distribution probabilities for 0 to 1000 visitors. Store your list in a variable called `arr`.

In [None]:
# your code here
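# Variables: visitor counts from 0 to 1000 inclusive (range's end is exclusive)
visits = range(1001)

# Poisson probabilities for each visitor count, stored as a list in `arr`
arr = stats.poisson.pmf(visits, mu).tolist()

# Plot of the distribution
plt.plot(visits, arr)
plt.title('Probability Mass Function', fontweight=700)
plt.xlabel('Number of visitors')
plt.ylabel('Probability')
plt.show()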

#### Step 5: Plot the probabilities.
To do so, run the code in the cell below. Your plot should look like the one below:

![poisson distribution](../images/ch-3.png)

In [None]:
# run this code
plt.plot(arr)
plt.show()

# Challenge 4 - Central Limit Theorem

A delivery company needs 35 minutes on average to deliver a package, with a standard deviation of 8 minutes. Suppose that in one day they deliver 200 packages.

**Hint**: `stats.norm.cdf` can help you find the answers.
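
Assume the 200 delivery times are independent. By the Central Limit Theorem, the mean delivery time is then approximately normal with the same mean and a standard deviation (the standard error) of $\sigma/\sqrt{n}$. Equivalently, the total time $T$ is approximately normal with mean $n\mu$ and standard deviation $\sigma\sqrt{n}$. As in Challenge 5, the second parameter of $N$ below is the standard deviation:

$$\bar{X} \approx N\left(35,\ \frac{8}{\sqrt{200}}\right) \approx N(35,\ 0.566), \qquad T = \sum_{i=1}^{200} X_i \approx N\left(200 \cdot 35,\ 8\sqrt{200}\right) \approx N(7000,\ 113.1)$$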

#### Step 1: What is the probability that the mean delivery time today is between 30 and 35 minutes?

In [None]:
# your code here
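# Variables
mu = 35          # mean delivery time per package (minutes)
stdev = 8        # standard deviation of a single delivery time (minutes)
n_samples = 200  # packages delivered in one day
sigma = stdev / np.sqrt(n_samples)  # standard error of the mean delivery time (CLT)

# Probability that the mean delivery time is between 30 and 35 minutes
p_3035 = stats.norm.cdf(35, mu, sigma) - stats.norm.cdf(30, mu, sigma)
p_3035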

#### Step 2: What is the probability that in total, it takes more than 115 hours to deliver all 200 packages?

In [None]:
# your code here
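# Variables
time = 115 * 60               # 115 hours in minutes
mean_time = time / n_samples  # equivalent mean time per package (34.5 minutes)

# Probability that the mean time per package (and therefore the total time) exceeds the limit
1 - stats.norm.cdf(mean_time, mu, sigma)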
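
You can also answer Step 2 directly with the total time $T$. Assuming independent deliveries, $T$ is approximately normal with mean $200 \cdot 35 = 7000$ minutes and standard deviation $8\sqrt{200} \approx 113.1$ minutes. This sketch checks that both routes give the same probability, about 0.81. It uses `stats.norm.sf`, the survival function, which equals `1 - stats.norm.cdf`.

```python
import numpy as np
from scipy import stats

mu, stdev, n = 35, 8, 200   # minutes per package, minutes, packages
limit = 115 * 60            # 115 hours in minutes

# Route 1: mean time per package, standard error stdev / sqrt(n)
p_mean = stats.norm.sf(limit / n, loc=mu, scale=stdev / np.sqrt(n))

# Route 2: total time, mean n * mu and standard deviation stdev * sqrt(n)
p_total = stats.norm.sf(limit, loc=n * mu, scale=stdev * np.sqrt(n))

print(p_mean, p_total)
```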

# Challenge 5 - Normal Variables
The value (in thousands) of the monthly sales of a publishing company follows a normal distribution with a mean equal to 200 and a standard deviation equal to 40.

<div align="center">$X → N(200,40)$</div>

**Hint**: `stats.norm.cdf` can help you find the answers.

#### Step 1: Find the probability that the monthly sales are more than 300.

In [None]:
# your code here
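# Variables
mu = 200     # mean monthly sales (thousands)
sigma = 40   # standard deviation (thousands)

# Probability: P(X > 300) = 1 - P(X <= 300)
1 - stats.norm.cdf(300, mu, sigma)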

#### Step 2: Find the probability that the monthly sales fall between 160 and 240.

In [None]:
# your code here
# Probability
stats.norm.cdf(240, mu, sigma) - stats.norm.cdf(160, mu, sigma)
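
Standardizing with $Z = (X - \mu)/\sigma$ shows why the answer to Step 2 is the familiar value of about 68%. The bounds 160 and 240 are exactly one standard deviation below and above the mean, so the probability equals $P(-1 < Z < 1)$ for a standard normal $Z$.

```python
from scipy import stats

mu, sigma = 200, 40
low, high = 160, 240

# Convert both bounds to z-scores
z_low, z_high = (low - mu) / sigma, (high - mu) / sigma

p_original = stats.norm.cdf(high, mu, sigma) - stats.norm.cdf(low, mu, sigma)
p_standard = stats.norm.cdf(z_high) - stats.norm.cdf(z_low)

print(z_low, z_high)          # -1.0 1.0
print(p_original, p_standard) # both about 0.6827
```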

#### Step 3: Find the probability that the monthly sales do not exceed 150.

In [None]:
# your code here
stats.norm.cdf(150, mu, sigma)

#### Step 4: Find the probability that the monthly sales exceed 3000.

In [None]:
# your code here
1 - stats.norm.cdf(3000, mu, sigma)

# Challenge 6 - Poisson distribution
The mean number of violent robberies per month registered in a particular neighborhood is 4.

**Hint**: `stats.poisson.cdf` can help you find the answers.

#### Step 1: Find the probability that in a particular month there is no violent robbery.

In [None]:
# your code here
mu = 4

# Probability
stats.poisson.cdf(0,mu)

#### Step 2: Find the probability that there is at least one robbery in a given month.

In [None]:
# your code here
1 - stats.poisson.cdf(0, mu)  # P(X >= 1) = 1 - P(X = 0)

#### Step 3: Find the probability that there are between 2 and 6 (inclusive) robberies in a given month.

In [None]:
# your code here
stats.poisson.cdf(6, mu) - stats.poisson.cdf(1, mu)  # P(2 <= X <= 6) = P(X <= 6) - P(X <= 1)
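
The Poisson distribution is discrete, so $P(2 \le X \le 6)$ includes the value 2. It therefore equals `cdf(6) - cdf(1)`; subtracting `cdf(2)` instead would drop $P(X = 2)$. Summing the individual PMF values gives a simple cross-check. Likewise, "at least one" is the complement of "none", which `stats.poisson.sf(0, mu)` computes directly as $1 - P(X \le 0)$.

```python
import numpy as np
from scipy import stats

mu = 4  # mean robberies per month

p_range_cdf = stats.poisson.cdf(6, mu) - stats.poisson.cdf(1, mu)
p_range_pmf = stats.poisson.pmf(np.arange(2, 7), mu).sum()  # k = 2, 3, 4, 5, 6

p_at_least_one = stats.poisson.sf(0, mu)  # P(X > 0) = 1 - P(X = 0)

print(p_range_cdf, p_range_pmf, p_at_least_one)
```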

#### Step 4: Find the probability that there are more than 2 robberies in 15 days.

In [None]:
# your code here
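# Assuming a 30-day month, 15 days is half a month, so the mean is mu / 2 = 2
1 - stats.poisson.cdf(2, mu / 2)  # P(X > 2) = 1 - P(X <= 2)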
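
Using `mu / 2` for 15 days relies on a property of the Poisson process: the count in a sub-interval is itself Poisson, with a proportionally scaled mean. The Monte Carlo sketch below checks this by simulating robberies as a Poisson process with exponentially distributed gaps between events. The 30-day month, the seed and the number of simulations are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
days_per_month = 30               # assumption: a month has 30 days
rate_per_day = 4 / days_per_month
window = 15                       # days
n_sims = 100_000

# Exponential gaps between events define a Poisson process; 30 gaps per run is
# far more than needed, since only about 2 events are expected in 15 days
gaps = rng.exponential(scale=1 / rate_per_day, size=(n_sims, 30))
arrivals = gaps.cumsum(axis=1)
counts = (arrivals <= window).sum(axis=1)

simulated = (counts > 2).mean()
exact = stats.poisson.sf(2, rate_per_day * window)  # P(X > 2) with mean 2
print(f"simulated: {simulated:.4f}, exact: {exact:.4f}")
```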