# Investigation into numpy.random package

### Introduction to numpy and numpy.random

NumPy is a Python library for numerical computing, widely used in mathematics, science, engineering, and data analysis. It provides fast mathematical and statistical operations and is particularly well suited to multi-dimensional arrays and matrix multiplication.

An array stores multiple items of the same data type. The facilities built around the array object are what make NumPy so convenient for mathematics and data manipulation.

Below I investigate the random module of the NumPy package. This module provides a variety of functions that generate random numbers, including samples from statistical distributions, in arrays of any given shape.

#### Simple Random Data

A simple random sample is a subset of a statistical population in which each member of the population has an equal probability of being chosen. A simple random sample is meant to be an unbiased representation of a group. An example would be the names of 20 employees being drawn out of a hat from a company of 200 employees. In this case, the population is all 200 employees, and the sample is random because each employee has an equal chance of being chosen.
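
The hat example can be sketched with `np.random.choice`. This is only an illustration: the employee IDs 1 to 200 are made up to stand in for the names, and `replace=False` is what keeps any employee from being drawn twice.

```python
import numpy as np

# Hypothetical employee IDs 1..200 stand in for the 200 names in the hat.
employees = np.arange(1, 201)

# replace=False draws without replacement, so nobody is picked twice.
# With no p argument, every employee is equally likely to be chosen.
sample = np.random.choice(employees, size=20, replace=False)
print(sample)
```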


#### Permutation

A permutation is an arrangement of all or part of a set of objects, with regard to the order of the arrangement.

Example: Suppose we have a set of three letters: A, B, and C. We might ask how many ways we can arrange 2 letters from that set. Each possible arrangement would be an example of a permutation. The complete list of possible permutations would be: AB, AC, BA, BC, CA, and CB.

When they refer to permutations, statisticians use a specific terminology. They describe permutations as n distinct objects taken r at a time. Translation: n refers to the number of objects from which the permutation is formed; and r refers to the number of objects used to form the permutation. Consider the example from the previous paragraph. The permutation was formed from 3 letters (A, B, and C), so n = 3; and the permutation consisted of 2 letters, so r = 2.
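
As a quick check of the A, B, C example, the standard library can list the arrangements directly. The sketch below uses `itertools.permutations` rather than NumPy, and compares the count with the usual formula n! / (n - r)!.

```python
from itertools import permutations
from math import factorial

letters = ["A", "B", "C"]
n, r = len(letters), 2

# Every ordered arrangement of r letters taken from the n available.
arrangements = ["".join(p) for p in permutations(letters, r)]
print(arrangements)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']

# The number of arrangements matches n! / (n - r)!
count = factorial(n) // factorial(n - r)
print(count)  # 6
```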

numpy.random provides two permutation functions, shuffle(x) and permutation(x):

- shuffle(x):

This function modifies a sequence in place by shuffling its contents.
For a multi-dimensional array it shuffles only along the first axis: the order of the sub-arrays changes, but their contents remain the same.

Examples:

In [9]:
import numpy as np
array = np.arange(20)
np.random.shuffle(array)
array

array([ 9, 10, 16,  0, 18, 17, 13, 11,  1,  5,  4, 15, 12,  2,  7, 19, 14,
        8,  6,  3])
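
To see the first-axis behaviour on a multi-dimensional array, here is a small sketch with a 4 x 3 array (the name `grid` is just for illustration). The rows can end up in a different order, but each row keeps its own values.

```python
import numpy as np

# Rows before shuffling: [0 1 2], [3 4 5], [6 7 8], [9 10 11]
grid = np.arange(12).reshape(4, 3)

# shuffle works in place and returns None, so there is nothing to assign.
np.random.shuffle(grid)

# Only the order of the rows may have changed; each row is still intact.
print(grid)
```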

- permutation(x):

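The permutation function randomly permutes a sequence or returns a permuted range. If x is an integer, it permutes np.arange(x); if x is an array, it shuffles a copy and leaves the original untouched.
If x is a multi-dimensional array, it is only shuffled along its first axis.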

Examples:

In [6]:
np.random.permutation(30)

array([27,  5, 23, 16,  2,  4,  0,  9, 20, 15,  8, 26,  3, 28, 11, 10, 22,
       24, 21,  1, 14, 13, 12, 29,  6, 18,  7, 19, 17, 25])

In [11]:
np.random.permutation([2, 6, 8, 14, 16])

array([16, 14,  2,  6,  8])
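
The main practical difference from shuffle is that permutation returns a new array instead of changing its input. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

original = np.array([2, 6, 8, 14, 16])

# permutation shuffles a copy and returns it.
shuffled = np.random.permutation(original)

print(original)  # still in its original order
print(shuffled)  # same values, possibly in a different order
```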

#### Probability Distribution

A probability distribution tells you the probability of each possible outcome of an event. Probability distributions can describe simple events, like picking a card or flipping a coin. They can also describe more complex events, like the probability that a new drug causes side effects in someone who takes it.

There are many different types of probability distributions in statistics including:

- Basic probability distributions which can be shown on a probability distribution table.
- Binomial distributions, which have “Successes” and “Failures.”
- Normal distributions (also called a Bell Curve).

Below I will discuss in more depth 5 of the functions from numpy.random:

- binomial
- hypergeometric
- multinomial
- negative binomial
- uniform

#### Binomial

The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of trials, where each trial has exactly two possible outcomes (success or failure). Its underlying assumptions are that each trial has the same probability of success and that the trials are independent of each other.

Samples are drawn from a binomial distribution with specified parameters, n trials and probability of success p, where n is an integer >= 0 and p is in the interval [0, 1]. (n may be input as a float, but it is truncated to an integer in use.)

Examples:


In [1]:
import numpy as np
# n = 20, the number of trials
# p = 0.5, the probability of success on each trial
example = np.random.binomial(20, 0.5, 50)

print(example)
# Number of heads in 20 flips of a fair coin, repeated 50 times.

[11  8  6 13  6 12  9 11 10  8 11 11 13 11 10  8 12  5  8 11 11 12 12 10
 10 13  9  8 10  8 14  8  9 14 12 15  6 10 12 10  9  8 13 13  8 11 13 12
 12  9]
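
One way to sanity-check these draws is to compare a larger sample with the theoretical mean n * p and variance n * p * (1 - p) of the binomial distribution. The sample size of 100000 below is an arbitrary choice for this sketch.

```python
import numpy as np

n, p = 20, 0.5

# A much larger sample than above, so the averages settle down.
draws = np.random.binomial(n, p, 100000)

# Theory: mean = n * p = 10, variance = n * p * (1 - p) = 5
print("sample mean:", draws.mean(), "theory:", n * p)
print("sample variance:", draws.var(), "theory:", n * p * (1 - p))
```

The sample values should land close to 10 and 5, though they will differ slightly on every run.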


#### Hypergeometric

The hypergeometric distribution is used to calculate probabilities when sampling without replacement. It is a discrete probability distribution that describes the probability of k successes (random draws for which the object drawn has a specified feature) in n draws, without replacement, from a finite population of size N that contains exactly K objects with that feature, wherein each draw is either a success or a failure. In contrast, the binomial distribution describes the probability of k successes in n draws with replacement.



Example:

Suppose you have an urn of 20 marbles, 10 red and 10 green. If you randomly select 10 marbles without replacement, how likely is it that 8 or more of them are the same color?

In [15]:
import numpy as np
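# ngood = 10 red, nbad = 10 green, nsample = 10 marbles drawn, repeated 100000 times
s = np.random.hypergeometric(10, 10, 10, 100000)
# s counts the red marbles drawn: 8 or more red, or 2 or fewer red (8 or more green)
sum(s>=8)/100000. + sum(s<=2)/100000 #quite unlikely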

0.022600000000000002

After running this experiment, we could compare it with a binomial experiment, but the binomial distribution requires the probability of success to be constant on every trial. In the experiment above, the probability of success changes with every draw, because the marbles are not replaced.
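
The simulated value can also be checked against the exact hypergeometric probability, computed from binomial coefficients. This is a sketch using only the standard library (`math.comb` needs Python 3.8 or later); the helper name `pmf` is my own.

```python
from math import comb

N, K, n = 20, 10, 10  # population size, red marbles, marbles drawn

def pmf(k):
    """Probability of drawing exactly k red marbles out of n."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# 8 or more red, or 8 or more green (which means 2 or fewer red).
exact = sum(pmf(k) for k in (0, 1, 2, 8, 9, 10))
print(round(exact, 4))
```

The exact value is 4252 / 184756, roughly 0.023, which agrees well with the simulated 0.0226 above.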

#### Multinomial

A multinomial distribution arises from an event with several possible outcomes, usually more than two, although the number of outcomes must be finite. Rolling a die is a good example, since there are six possible outcomes. Repeating the event a given number of times then lets us find the probability of any particular combination of outcomes.

Example: roll a die 5 times and find the probability of getting two 3s and three 4s.





A multinomial distribution is the probability distribution of the outcomes from a multinomial experiment. The multinomial distribution is a generalization of the binomial distribution. It models the probability of counts for rolling a k-sided die n times. For n independent trials each of which leads to a success for exactly one of k categories, with each category having a given fixed success probability, the multinomial distribution gives the probability of any particular combination of numbers of successes for the various categories.

In [14]:
import numpy as np
np.random.multinomial(5, [1/6.]*6, size=7)


array([[1, 0, 1, 0, 1, 2],
       [0, 2, 1, 0, 1, 1],
       [0, 0, 0, 0, 3, 2],
       [1, 0, 0, 1, 3, 0],
       [2, 1, 1, 0, 1, 0],
       [1, 1, 1, 1, 1, 0],
       [0, 0, 0, 2, 2, 1]])
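
The dice question from earlier (two 3s and three 4s in five rolls) can be answered exactly with the multinomial probability formula, n! / (x1! ... xk!) * p1^x1 ... pk^xk. The sketch below works it out by hand with the standard library. The variable names are only illustrative.

```python
from math import factorial

n = 5
counts = [0, 0, 2, 3, 0, 0]  # faces 1..6: two 3s and three 4s
p = [1 / 6] * 6              # a fair die

# Multinomial coefficient: n! / (x1! * x2! * ... * x6!)
coef = factorial(n)
for x in counts:
    coef //= factorial(x)

# Multiply by the probability of one specific sequence with these counts.
prob = float(coef)
for x, p_i in zip(counts, p):
    prob *= p_i ** x

print(coef, prob)
```

There are 10 orderings of two 3s and three 4s, each with probability (1/6)^5, so the result is 10 / 7776, or about 0.0013.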

#### Negative Binomial

The negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs. For example, if we define a 1 as failure, all non-1s as successes, and we throw a die repeatedly until 1 appears the third time (r = three failures), then the probability distribution of the number of non-1s that appeared will be a negative binomial distribution. Note that numpy.random.negative_binomial(n, p) uses the mirrored convention: it returns the number of failures that occur before n successes are reached, where p is the probability of success.



Example:
A company drills wild-cat oil exploration wells, each with an estimated probability of success of 0.1. What is the probability that the first success has occurred by each successive well, that is, after drilling 1 well, after 2 wells, and so on?

In [18]:
import numpy as np
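# Each draw is the number of failed wells before the first success (n = 1, p = 0.1)
s = np.random.negative_binomial(1, 0.1, 100000)
for i in range(1, 11):
    # s < i means the first success came within the first i wells
    probability = sum(s<i) / 100000.
    print(i, "wells drilled, probability of one success =", probability)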

1 wells drilled, probability of one success = 0.0993
2 wells drilled, probability of one success = 0.18901
3 wells drilled, probability of one success = 0.27033
4 wells drilled, probability of one success = 0.34332
5 wells drilled, probability of one success = 0.4082
6 wells drilled, probability of one success = 0.46757
7 wells drilled, probability of one success = 0.52238
8 wells drilled, probability of one success = 0.56995
9 wells drilled, probability of one success = 0.6133
10 wells drilled, probability of one success = 0.65143
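
These simulated values can be compared with a closed form. Since `negative_binomial(1, 0.1)` counts the failures before the first success, the first success falls within i wells unless all i wells fail, which gives 1 - 0.9^i. A short sketch of that check, with `exact_probs` as an illustrative name:

```python
p = 0.1  # probability of success for a single well

# P(at least one success in i wells) = 1 - P(all i wells fail)
exact_probs = [1 - (1 - p) ** i for i in range(1, 11)]

for i, exact in enumerate(exact_probs, start=1):
    print(i, "wells drilled, exact probability =", round(exact, 5))
```

For 10 wells this gives about 0.65132, close to the simulated 0.65143.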
