# 5 Generating Data with NumPy
# 5_6 Probability Distributions in NumPy
- We'll simulate datasets to resemble a process that follows a specific probability distribution.
- Normal, Poisson, Binomial, Logistic...
- https://numpy.org/doc/stable/reference/random/generator.html


### Poisson - numpy.random.poisson
- random.poisson(lam=1.0, size=None) -> Draw samples from a Poisson distribution.
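- The Poisson distribution is the limit of the binomial distribution for large N, with the expected count N*p held fixed at lam.
- Lambda = 1 --> Over a fixed interval of time, distance or space we expect the event to occur once *on average*. Any single interval may still contain 0, 1, 2 or more events.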
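
The two points above can be checked empirically. The following is a minimal sketch, not part of the original notebook: the sample size `n_draws`, the trial count `n_trials`, and the variable names are illustrative assumptions. It draws Poisson samples with `lam=1`, then binomial samples with many trials and a small success probability chosen so that `n_trials * p` equals `lam`, and compares the two.

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(seed=365))

lam = 1.0
n_draws = 100_000   # illustrative sample size (assumption)
n_trials = 10_000   # stands in for "large N" (assumption)

poisson_samples = rng.poisson(lam=lam, size=n_draws)
# Binomial with many trials and a small p, keeping n_trials * p == lam
binomial_samples = rng.binomial(n=n_trials, p=lam / n_trials, size=n_draws)

poisson_mean = poisson_samples.mean()     # should be close to lam
binomial_mean = binomial_samples.mean()   # should also be close to lam
# Fraction of intervals in which the event occurred exactly once
share_exactly_one = (poisson_samples == 1).mean()

print(poisson_mean, binomial_mean, share_exactly_one)
```

Both means should land near 1, while only a little over a third of the intervals contain exactly one event (the exact Poisson probability is e^-1, about 0.368).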

### Binomial - numpy.random.binomial
- random.binomial(n, p, size=None) -> Draw samples from a binomial distribution.
- Samples are drawn from a binomial distribution with specified parameters, n trials and p probability of success, where n is an integer >= 0 and p is in the interval [0,1].
- Measures how many times a certain outcome appears over a series of independent trials, where each trial has only 2 possible outcomes.
- n: number of trials, p: probability of getting our desired outcome on a single trial.
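
To make the "count of successes over n trials" reading concrete, here is a small sketch under stated assumptions (the number of experiments and all variable names are made up for illustration). It draws binomial samples directly and also builds the same kind of count by hand from individual yes/no trials.

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(seed=365))

n, p = 100, 0.4
n_experiments = 20_000  # illustrative number of experiments (assumption)

# Direct draw: each value is the number of successes in n trials
samples = rng.binomial(n=n, p=p, size=n_experiments)
sample_mean = samples.mean()   # theory: n * p = 40
sample_var = samples.var()     # theory: n * p * (1 - p) = 24

# Manual version: a trial succeeds when a uniform draw in [0, 1) is below p,
# and each row of n trials is summed into one success count
manual = (rng.random(size=(n_experiments, n)) < p).sum(axis=1)
manual_mean = manual.mean()

print(sample_mean, sample_var, manual_mean)
```

The two means should agree closely, which is a sanity check that `binomial` behaves like counting successes across independent two-outcome trials.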

### Logistic - numpy.random.logistic
- random.logistic(loc=0.0, scale=1.0, size=None) -> Draw samples from a logistic distribution.
- Samples are drawn from a logistic distribution with specified parameters, loc (location or mean, also median), and scale (>0).
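
As a quick check that `loc` acts as both mean and median, the sketch below draws a large logistic sample and summarizes it. The sample size and variable names are illustrative assumptions. The variance formula used for comparison, scale^2 * pi^2 / 3, is the standard result for the logistic distribution and is not stated in the notes above.

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(seed=365))

loc, scale = 9.0, 1.2
samples = rng.logistic(loc=loc, scale=scale, size=200_000)

sample_mean = samples.mean()        # should be close to loc
sample_median = np.median(samples)  # should also be close to loc
sample_var = samples.var()
# Theoretical variance of a logistic distribution
theoretical_var = scale**2 * np.pi**2 / 3

print(sample_mean, sample_median, sample_var, theoretical_var)
```

Unlike Poisson and binomial draws, these samples are continuous floats and can fall on either side of `loc`.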

In [22]:
import numpy as np
np.__version__

'2.1.1'

In [23]:
# Generator class imported as gen, so gen(...) is numpy.random.Generator(...)
from numpy.random import Generator as gen
# Bit generator PCG64 imported as pcg, so pcg(...) is numpy.random.PCG64(...)
from numpy.random import PCG64 as pcg

In [24]:
array_RG = gen(pcg(seed=365))
array_RG.poisson(size=(5,5))
# With the default lam=1, small counts dominate: this draw holds only
# 0s, 1s, and 2s, although larger values can occasionally occur.

array([[2, 0, 1, 1, 2],
       [1, 1, 0, 1, 1],
       [1, 2, 1, 1, 0],
       [0, 1, 0, 2, 1],
       [0, 1, 0, 0, 2]])

In [25]:
# Expect the event to occur 10 times on average
array_RG = gen(pcg(seed=365))
array_RG.poisson(lam = 10, size=(5,5))
# Statistically speaking, this result means that if we run an experiment
# 25 times (5,5), in some of those runs the event occurs only 7 times
# (and in others as many as 15), even though it occurs 10 times per
# experiment on average. Expecting exactly 10 occurrences in every
# experiment is NOT realistic.

array([[11, 12, 12, 14, 13],
       [ 9, 10, 11, 11,  8],
       [11,  8, 10,  9, 14],
       [ 7,  8,  9, 15, 15],
       [13,  8,  8,  7,  9]])
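
The spread visible in the grid above can be quantified. The following is a rough sketch, with an assumed sample size and made-up variable names, that estimates how often a Poisson draw with `lam=10` equals exactly 10 and compares it with the exact Poisson probability P(X = k) = exp(-lam) * lam^k / k!.

```python
import math
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(seed=365))

lam = 10
samples = rng.poisson(lam=lam, size=100_000)  # illustrative sample size

# Empirical share of experiments with exactly lam occurrences
share_exactly_lam = (samples == lam).mean()
# Exact Poisson probability of observing exactly lam events
exact_prob = math.exp(-lam) * lam**lam / math.factorial(lam)
# Share of experiments landing within 3 of lam (7 through 13)
share_within_3 = (np.abs(samples - lam) <= 3).mean()

print(share_exactly_lam, exact_prob, share_within_3)
```

Only about one experiment in eight hits exactly 10, which is why counts such as 7 or 15 in a 5x5 grid are unremarkable.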

In [26]:
array_RG = gen(pcg(seed=365))
display(array_RG.binomial(n=100, p=0.4))
array_RG.binomial(n=100, p=0.4, size=(5,5))
# Each element is the number of times (out of 100) we've gotten
# our preferred outcome, given a 40% probability of getting what
# we want on each of the 100 individual trials in a given experiment.

42

array([[44, 30, 36, 45, 36],
       [41, 38, 42, 41, 35],
       [31, 35, 46, 29, 41],
       [41, 46, 34, 48, 45],
       [45, 45, 40, 43, 47]])
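
The cell above calls `binomial` once without `size` and once with `size=(5,5)`. As a small illustration of how the `size` argument in the signatures shapes the result (variable names here are hypothetical), the sketch below requests a single value, a 1-D array, and a 2-D array from the same generator.

```python
import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(seed=365))

single_draw = rng.binomial(n=100, p=0.4)        # size=None: one value
row = rng.binomial(n=100, p=0.4, size=5)        # 1-D array of 5 draws
grid = rng.binomial(n=100, p=0.4, size=(5, 5))  # 5x5 array of draws

print(np.ndim(single_draw), row.shape, grid.shape)
```

The three calls consume the generator's stream one after another, so the values in `row` and `grid` differ from what a freshly seeded generator would give for the same call.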

In [27]:
array_RG = gen(pcg(seed=365))
display(array_RG.logistic(loc=9, scale=1.2))
array_RG.logistic(loc=9, scale=1.2, size=(5,5))
# Each element is a possible outcome of a logistic distribution
# with loc=9 and scale=1.2

10.377678220948258

array([[10.42451863,  9.63404367,  7.36153427,  9.82286787,  5.81223125],
       [10.09354231,  6.46790532, 11.38740256,  8.97147918, 10.85844698],
       [ 8.79081317,  5.962079  ,  9.99560681,  8.34539118,  7.97105522],
       [ 8.9981544 ,  8.93530194,  9.6253307 ,  9.23850869,  9.73729284],
       [ 5.3090678 , 10.13723528, 11.04372782,  7.11078651, 10.1929009 ]])