# 3. Thinking probabilistically: Discrete variables

**Generating random numbers using the np.random module**

In [0]:
# Import numpy, matplotlib, and seaborn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Seed the random number generator
np.random.seed(42)

# Initialize random numbers: random_numbers
random_numbers = np.empty(100000)


In [0]:
# Generate random numbers by looping over range(100000)
for i in range(100000):
    random_numbers[i] = np.random.random()


In [0]:
# Plot a histogram
_ = plt.hist(random_numbers)

# Show the plot
plt.show()

**How many defaults might we expect?**


Instructions
- Seed the random number generator to 42.
- Initialize n_defaults, an empty array, using np.empty(). It should contain 1000 entries,
since we are doing 1000 simulations.
- Write a for loop with 1000 iterations to compute the number of defaults per 100 loans using
the perform_bernoulli_trials() function. It accepts two arguments: the number of trials n
(in this case 100) and the probability of success p (in this case the probability of a default,
which is 0.05). On each iteration of the loop, store the result in an entry of n_defaults.
- Plot a histogram of n_defaults. Include the density=True keyword argument (called normed in
older Matplotlib releases) so that the height of the bars of the histogram indicates the probability.
- Show your plot.

In [0]:
def perform_bernoulli_trials(n, p):
    """Perform n Bernoulli trials with success probability p
    and return number of successes."""
    # Initialize number of successes: n_success
    n_success = 0

    # Perform trials
    for i in range(n):
        # Choose random number between zero and one: random_number
        random_number = np.random.random()

        # If less than p, it's a success so add one to n_success
        if random_number < p:
            n_success += 1

    return n_success

In [0]:
# Seed random number generator
np.random.seed(42)

# Initialize the number of defaults: n_defaults
n_defaults = np.empty(1000)

# Compute the number of defaults
for i in range(1000):
    n_defaults[i] = perform_bernoulli_trials(100, 0.05)

In [0]:
# Plot the histogram with default number of bins; label your axes
_ = plt.hist(n_defaults, density=True)
_ = plt.xlabel('number of defaults out of 100 loans')
_ = plt.ylabel('probability')

# Show the plot
plt.show()

**Plotting the Binomial PMF**

Instructions
- Using np.arange(), compute the bin edges such that the bins are centered on the integers.
Store the resulting array in the variable bins.
- Use plt.hist() to plot the histogram of n_defaults with the density=True and bins=bins
keyword arguments.
- Leave a 2% margin and label your axes.
- Show the plot.
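
To see why the bin edges are shifted by 0.5, here is a minimal sketch on a small made-up sample. The array `counts` is a hypothetical stand-in for n_defaults and is not part of the exercise. With edges at half-integers, every integer value sits at the center of its own bin.

```python
import numpy as np

# Hypothetical small sample of integer counts (stand-in for n_defaults)
counts = np.array([0, 1, 1, 2, 3, 3, 3])

# Edges at min - 0.5, ..., max + 0.5 so each integer is a bin center
bins = np.arange(counts.min(), counts.max() + 1.5) - 0.5

# Count how many samples fall into each integer-centered bin
hist, edges = np.histogram(counts, bins=bins)

print(edges)  # [-0.5  0.5  1.5  2.5  3.5]
print(hist)   # [1 2 1 3]
```

The same `bins` array can be passed to plt.hist() through the bins keyword argument.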

In [0]:
versicolor_petal_length = np.array([ 4.7,  4.5,  4.9,  4. ,  4.6,  4.5,  4.7,  3.3,  4.6,  3.9,  3.5,
        4.2,  4. ,  4.7,  3.6,  4.4,  4.5,  4.1,  4.5,  3.9,  4.8,  4. ,
        4.9,  4.7,  4.3,  4.4,  4.8,  5. ,  4.5,  3.5,  3.8,  3.7,  3.9,
        5.1,  4.5,  4.5,  4.7,  4.4,  4.1,  4. ,  4.4,  4.6,  4. ,  3.3,
        4.2,  4.2,  4.2,  4.3,  3. ,  4.1])

# Set default Seaborn style
sns.set()

# Plot histogram of versicolor petal lengths
_ = plt.hist(versicolor_petal_length)

# Show histogram
plt.show()

**Plotting the ECDF**

**Instructions**
- Use ecdf() to compute the ECDF of versicolor_petal_length. Unpack the output into x_vers and y_vers.
- Plot the ECDF as dots. Remember to include marker = '.' and linestyle = 'none' in addition to x_vers and y_vers as arguments inside plt.plot().
- Set the margins of the plot with plt.margins() so that no data points are cut off. Use a 2% margin.
- Label the axes. You can label the y-axis 'ECDF'.
- Show your plot.
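
As a quick illustration of what the ECDF arrays look like, the sketch below runs the same computation on four made-up measurements. The names `ecdf_sketch` and `toy` are hypothetical and exist only for this example. The sorted values become the x-data, and the y-data rises in equal steps of 1/n up to 1.

```python
import numpy as np

def ecdf_sketch(data):
    """Return sorted data x and y, where y[i] = (i + 1) / n."""
    n = len(data)
    x = np.sort(data)
    y = np.arange(1, n + 1) / n
    return x, y

# Hypothetical toy measurements
toy = np.array([4.5, 3.0, 5.1, 4.0])
x_toy, y_toy = ecdf_sketch(toy)

print(x_toy)  # [3.  4.  4.5 5.1]
print(y_toy)  # [0.25 0.5  0.75 1.  ]
```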

In [0]:
def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""

    # Number of data points: n
    n = len(data)

    # x-data for the ECDF: x
    x = np.sort(data)

    # y-data for the ECDF: y
    y = np.arange(1, n + 1) / n

    return x, y

In [0]:
# Seed random number generator
np.random.seed(42)

# Take 10,000 samples out of the binomial distribution: n_defaults
n_defaults = np.random.binomial(n=100, p=0.05, size=10000)

# Compute bin edges: bins
bins = np.arange(min(n_defaults), max(n_defaults) + 1.5) - 0.5

In [0]:
# Generate histogram
_ = plt.hist(n_defaults, density=True, bins=bins)

# Set margins
_ = plt.margins(0.02)

# Label axes
_ = plt.xlabel('number of defaults out of 100 loans')
_ = plt.ylabel('PMF')

# Show the plot
plt.show()

**Relationship between Binomial and Poisson distributions**

Instructions
- Using the np.random.poisson() function, draw 10000 samples from a Poisson distribution with
a mean of 10.
- Make a list of the n and p values to consider for the Binomial distribution. Choose n =
[20, 100, 1000] and p = [0.5, 0.1, 0.01] so that the product n * p is always 10.
- Using np.random.binomial() inside the provided for loop, draw 10000 samples from a Binomial
distribution with each n, p pair and print the mean and standard deviation of the samples.
There are 3 n, p pairs: (20, 0.5), (100, 0.1), and (1000, 0.01). These can be accessed inside the
loop as n[i], p[i].
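
For reference when reading the sampled results, the theoretical values can be worked out directly. This is a small sketch using the standard formulas: a Poisson distribution with mean m has standard deviation sqrt(m), and a Binomial distribution has mean n * p and standard deviation sqrt(n * p * (1 - p)). The variable names below are chosen for the example and are not part of the exercise.

```python
import math

# Poisson with mean 10: the standard deviation is sqrt(mean)
poisson_mean = 10
poisson_std = math.sqrt(poisson_mean)
print('Poisson: mean', poisson_mean, 'std', round(poisson_std, 3))

# Binomial(n, p): mean = n * p, std = sqrt(n * p * (1 - p))
pairs = [(20, 0.5), (100, 0.1), (1000, 0.01)]
binomial_stds = []
for n_trials, p_success in pairs:
    mean = n_trials * p_success
    std = math.sqrt(n_trials * p_success * (1 - p_success))
    binomial_stds.append(std)
    print('n =', n_trials, 'mean', round(mean, 3), 'std', round(std, 3))
```

The Binomial standard deviation moves toward the Poisson value as n grows and p shrinks, which is the relationship the sampled output should also show.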

In [0]:
# Seed random number generator
np.random.seed(42)

# Draw 10,000 samples out of Poisson distribution: samples_poisson
samples_poisson = np.random.poisson(10, size=10000)

# Print the mean and standard deviation
print('Poisson:     ', np.mean(samples_poisson),
      np.std(samples_poisson))

In [0]:
# Specify values of n and p to consider for Binomial: n, p
n = [20, 100, 1000]
p = [0.5, 0.1, 0.01]

# Draw 10,000 samples for each n,p pair: samples_binomial
for i in range(3):
    samples_binomial = np.random.binomial(n[i], p[i], 10000)

    # Print results
    print('n =', n[i], 'Binom:', np.mean(samples_binomial),
          np.std(samples_binomial))

**Sampling out of the Binomial distribution**

Instructions
- Draw samples out of the Binomial distribution using np.random.binomial(). You should use
parameters n = 100 and p = 0.05, and set the size keyword argument to 10000.
- Compute the CDF using your previously-written ecdf() function.
- Plot the CDF with axis labels. The x-axis here is the number of defaults out of 100 loans,
while the y-axis is the CDF.
- Show the plot.
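
For comparison with the sampled CDF, the exact Binomial CDF can be computed by summing the PMF. This is only a sketch: `binomial_cdf` is a hypothetical helper written for this example, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summing the PMF terms."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k + 1))

# Exact CDF at a few default counts for n = 100 loans, p = 0.05
for k in (0, 5, 10):
    print(k, binomial_cdf(k, 100, 0.05))
```

With 10000 samples, the dots of the sampled CDF should lie close to these exact values.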

In [0]:
# Seed random number generator
np.random.seed(42)

# Take 10,000 samples out of the binomial distribution: n_defaults
n_defaults = np.random.binomial(n=100, p=0.05, size=10000)

# Compute CDF: x, y
x, y = ecdf(n_defaults)

In [0]:
# Plot the CDF with axis labels
_ = plt.plot(x, y, marker='.', linestyle='none')
_ = plt.xlabel('Defaults out of 100')
_ = plt.ylabel('CDF')

# Show the plot
plt.show()

**The np.random module and Bernoulli trials**

Instructions
- Define a function with signature perform_bernoulli_trials(n, p).
    - Initialize to zero a variable n_success, the counter of Trues, which are Bernoulli trial successes.
    - Write a for loop where you perform a Bernoulli trial in each iteration and increment the number of successes if the result is True. Perform n iterations by looping over range(n).
    - To perform a Bernoulli trial, choose a random number between zero and one using np.random.random(). If the number you chose is less than p, increment n_success (use the += 1 operator to achieve this).
    - The function returns the number of successes n_success.
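
The loop described above can also be written without an explicit loop. The sketch below is an alternative, not the exercise solution: it draws all n random numbers at once and counts how many fall below p. It assumes NumPy 1.17 or later for np.random.default_rng, which is a different interface from the np.random.random() call used in this notebook, and the function name is hypothetical.

```python
import numpy as np

def bernoulli_successes_vectorized(n, p, rng):
    """Draw n uniform numbers at once and count how many fall below p."""
    draws = rng.random(n)            # n floats in [0, 1)
    return int(np.sum(draws < p))    # each True counts as 1

# Seeded generator so the run is repeatable
rng = np.random.default_rng(42)
n_success = bernoulli_successes_vectorized(10000, 0.65, rng)
print(n_success)
```

The result should land near n * p = 6500, as with the loop version, although the exact number differs because the random stream is not the same.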

In [0]:
def perform_bernoulli_trials(n, p):
    """Perform n Bernoulli trials with success probability p
    and return number of successes."""
    # Initialize number of successes: n_success
    n_success = 0

    # Perform trials
    for i in range(n):
        # Choose random number between zero and one: random_number
        random_number = np.random.random()

        # If less than p, it's a success so add one to n_success
        if random_number < p:
            n_success += 1

    return n_success


In [0]:
print(perform_bernoulli_trials(10000, 0.65))

**Was 2015 anomalous?**

Instructions
- Draw 10000 samples from a Poisson distribution with a mean of 251/115 and assign to n_nohitters.
- Determine how many of your samples had a result greater than or equal to 7 and assign to n_large.
- Compute the probability, p_large, of having 7 or more no-hitters by dividing n_large by the total
number of samples (10000).
- Print the probability that you calculated.
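
The sampled probability can be checked against the exact Poisson tail, obtained by subtracting the probabilities of 0 through 6 events from one. This is a sketch for comparison only, and `poisson_tail` is a hypothetical helper that is not part of the exercise.

```python
import math

def poisson_tail(k_min, mean):
    """Exact P(X >= k_min) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k)
                for k in range(k_min))
    return 1.0 - below

p_exact = poisson_tail(7, 251 / 115)
print('Exact probability of seven or more no-hitters:', p_exact)
```

The value estimated from 10000 samples should be close to this, with some scatter because so few samples reach seven or more.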

In [0]:
# Seed random number generator
np.random.seed(42)

# Draw 10,000 samples out of Poisson distribution: n_nohitters
n_nohitters = np.random.poisson((251 / 115), size=10000)

# Compute number of samples that are seven or greater: n_large
n_large = np.sum(n_nohitters >= 7)

# Compute probability of getting seven or more: p_large
p_large = n_large / 10000

In [0]:
# Print the result
print('Probability of seven or more no-hitters:', p_large)

**Will the bank fail?**

Instructions
- Compute the x and y values for the ECDF of n_defaults.
- Plot the ECDF, making sure to label the axes. Remember to include marker = '.' and
linestyle = 'none' in addition to x and y in your call plt.plot().
- Show the plot.
- Compute the total number of entries in your n_defaults array that were greater than or
equal to 10. To do so, compute a boolean array that tells you whether a given entry of
n_defaults is >= 10. Then sum all the entries in this array using np.sum(). For example,
np.sum(n_defaults <= 5) would compute the number of simulations with 5 or fewer defaults.
- The probability that the bank loses money is the fraction of n_defaults that are greater
than or equal to 10. Print this result.
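
The boolean-array counting step can be seen on a small example. The array `toy_defaults` below is made up for illustration. Summing a boolean array counts its True entries, and taking its mean gives the fraction directly.

```python
import numpy as np

# Hypothetical small set of simulated default counts
toy_defaults = np.array([3, 10, 7, 12, 5, 9, 10, 4])

# Boolean array: True where a simulation had 10 or more defaults
is_large = toy_defaults >= 10

# Summing booleans counts the Trues; the mean gives the fraction
n_large_toy = np.sum(is_large)
frac_large_toy = np.mean(is_large)

print(n_large_toy)     # 3
print(frac_large_toy)  # 0.375
```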

In [0]:
n_defaults = np.array([6,  5,  7,  8,  5,  5,  3,  2,  7,  6,  7,  3,  8,  3,  8,  5,  2,
                       6,  4,  5,  3,  2,  9,  5,  5,  3,  8,  7,  7,  5,  4,  3,  4,  5,
                       6,  1,  8,  4,  2,  9,  6,  5,  2,  6,  3,  6,  2,  6,  4,  4,  6,
                       4,  5,  4,  3,  4,  6,  3,  4,  8,  2,  4,  1,  6,  6,  6,  2,  2,
                       3,  8,  7,  2,  6,  6,  3,  6,  3, 10,  7,  6,  4,  5,  8,  4,  6,
                       4,  6,  1, 10,  4,  4,  4,  5,  4,  5,  2,  8,  7,  3,  7,  9,  6,
                       8,  2,  5,  4,  3,  6,  2,  6,  9,  5,  6,  6,  4,  4,  7,  6,  6,
                       7,  1,  5,  4,  1,  4,  6,  3,  2,  3,  8,  8,  6,  7,  6,  4,  4,
                       7,  2,  4,  7,  5,  4,  6,  6,  8,  4,  2,  5,  6,  3,  7,  6,  5,
                       10,  4,  4,  5,  7,  7,  6,  6,  4,  6,  9,  4,  4,  7,  4,  8,  5,
                       4,  3,  3,  6,  6,  1,  5,  7,  3,  7,  3,  4,  4,  3,  2,  2,  0,
                       7,  3,  7,  7,  3,  8,  6,  4,  3,  4,  4,  5,  2,  4,  4,  3,  7,
                       3,  6,  4,  2,  6,  6,  6,  7,  6,  4,  6,  6,  7,  8,  7,  3,  3,
                       3,  4,  7,  6,  5,  3,  6,  4,  2,  3,  2,  4,  6,  3,  4,  5,  1,
                       3,  6,  3,  6,  1,  6,  6,  7,  4,  7,  7,  4,  2,  8,  6,  3,  7,
                       3,  4,  4,  5,  1,  7,  1,  5,  3,  4,  8,  9,  3,  9,  8,  4,  0,
                       11,  7,  6,  7,  6,  7,  4,  5,  4,  5,  3,  6,  3,  3,  2,  5,  4,
                       6,  4,  3, 14,  5,  8,  1,  1,  4,  4,  4,  6,  4,  5,  3,  5,  6,
                       4,  5,  6,  3,  3,  8,  4,  1,  8,  5,  7,  5,  3,  2,  8,  4,  5,
                       4,  5,  8,  4,  2,  4,  3,  2,  9,  5,  6,  6,  3,  5,  8,  6,  4,
                       5,  1,  5,  8,  6,  5,  5, 10,  1,  0,  2,  3,  6, 10,  1,  6,  7,
                       5,  6, 13,  6,  4,  3,  3,  6,  3,  6,  3,  8,  4,  4,  6,  3,  5,
                       5,  1,  5,  7,  8,  9,  5,  7,  7,  6,  5,  7,  6,  2,  6,  4,  5,
                       6,  6,  2, 15,  8,  3,  5,  5,  6,  4,  4,  4,  2,  6,  7,  5,  3,
                       6,  7,  4,  2,  4,  7,  4,  5,  3,  8,  5,  3,  2,  8,  2,  4,  6,
                       3,  5,  1,  2,  5,  5,  4,  5,  5,  4,  7,  7,  4,  7,  6,  5,  6,
                       8,  5,  5,  4,  4,  1,  4,  3,  2,  4,  2,  6,  4,  6,  1,  6,  5,
                       5,  8,  7,  6,  3,  7,  6,  9,  5,  6,  1,  6,  6,  5,  6,  6,  5,
                       6,  2,  7,  4,  3, 13,  6,  7,  3,  4,  2,  8,  6,  5,  6,  4,  3,
                       7,  6,  4,  6,  6,  6,  6,  3,  7,  3,  7,  6,  2,  4,  4,  8,  6,
                       8,  4,  4,  5,  6,  7,  2,  8,  1,  6,  3,  9,  7,  1,  2,  7,  5,
                       4,  5, 11,  4,  3,  4,  7,  5,  7,  9,  4,  7,  7,  4,  7,  8,  4,
                       2,  7,  5,  3, 10,  3,  4,  4,  2,  2,  7,  3,  6,  7,  6,  6,  2,
                       2,  5,  7,  7,  6,  5,  3, 10,  6,  4,  4,  4,  4,  2,  6,  5,  7,
                       4,  5,  3,  8,  4,  2, 15,  3,  7,  9,  6,  3,  3,  3,  2,  3,  8,
                       8,  3,  8,  7,  3,  2,  8,  3,  5,  7,  1,  6,  5,  2,  4,  5,  2,
                       4,  4,  2,  5,  0,  6,  8,  5,  6,  3,  6,  5,  7,  3,  3,  4,  5,
                       7,  1,  6,  5,  7,  6,  6,  7,  2,  6,  3,  6,  4,  5,  5,  5,  7,
                       8,  3,  5,  4,  3,  8,  7,  3,  6,  4,  7,  4,  5,  4,  6,  1,  2,
                       2,  6,  3,  4,  7,  1,  7,  5,  6,  3,  7,  2, 10,  5,  3,  9,  7,
                       8,  2,  4,  5,  6,  5,  6,  5,  3,  4,  6,  8, 10,  7,  2,  3,  4,
                       5,  2,  5,  6,  6, 11,  4,  8,  4,  2,  4,  7,  3,  5,  1,  6,  4,
                       2,  5,  3,  3,  3,  3,  4,  3,  6,  6,  7,  6,  5,  6,  6,  9,  1,
                       2,  4,  4,  6,  8,  6,  4,  5,  5,  6,  6,  9,  4,  6,  3,  3,  6,
                       3,  2,  4,  6,  3,  6,  5,  6,  8,  5,  2,  4,  6,  7,  7,  9,  8,
                       4,  5,  3,  1,  4,  6,  1,  6,  2,  3,  5,  5,  3,  4,  6,  4,  5,
                       4,  7,  6,  3,  2,  4,  7,  6,  3,  4,  2,  5,  6,  7,  4,  8,  3,
                       3, 10,  3,  2,  5,  4,  5,  6,  6,  5,  2,  5,  2,  4,  6,  8,  4,
                       3,  6,  2,  1,  6,  3,  5,  9,  6,  2,  9,  4,  4,  6,  4,  5,  7,
                       5,  7,  4,  8,  4,  4,  4,  4,  7,  3,  7,  8,  2,  6,  2,  7,  7,
                       4,  3,  7,  9,  5,  5,  7,  4,  5,  3,  7,  7,  4,  3,  3,  3,  6,
                       5,  6,  8,  2,  9,  9,  4,  7,  8,  3,  4,  4,  7,  4,  2,  6,  8,
                       5,  6,  5,  3,  6,  9,  3,  1,  6,  4,  4,  3,  5,  5,  5, 12,  5,
                       10, 11,  3,  9,  3,  7,  5,  5,  5,  4,  5,  8,  9,  6,  6,  4,  6,
                       4,  6,  3,  6,  7,  5,  4,  2,  4,  6,  2,  3,  4,  6,  6,  1,  3,
                       7,  3,  7,  3,  1,  5,  5,  1,  8,  6,  8,  5,  5,  8,  6,  3,  2,
                       6,  4,  9,  6,  4,  8,  3,  8,  4,  3,  3,  4,  4,  3,  4,  5,  2,
                       4,  4, 10,  9,  7,  2,  9,  5,  4,  3,  5,  4,  5,  4,  4,  4,  4,
                       3,  4,  5,  7,  3,  5,  3,  2,  2,  5,  5,  7,  7,  2])

In [0]:
# Compute ECDF: x, y
x, y = ecdf(n_defaults)

# Plot the ECDF with labeled axes
_ = plt.plot(x, y, marker='.', linestyle='none')
_ = plt.xlabel('number of defaults out of 100')
_ = plt.ylabel('ECDF')

# Show the plot
plt.show()

In [0]:
# Compute the number of 100-loan simulations with 10 or more defaults: n_lose_money
n_lose_money = np.sum(n_defaults >= 10)

# Compute and print probability of losing money
print('Probability of losing money =', n_lose_money / len(n_defaults))