In [41]:
import numpy as np

### Pseudorandom number generation
- Pseudorandom numbers are not truly random: a deterministic algorithm produces them, but they simulate randomness well.
- The sequence is computed from an initial value known as a seed, so the same seed always yields the same sequence.
- Pseudorandom sequences exhibit statistical properties similar to those of random sequences, making them suitable for applications where true randomness is not feasible.

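- In NumPy, pseudorandom number generation is handled by the `numpy.random` submodule.
- The sequence of numbers generated can be controlled by setting a seed, either globally with `numpy.random.seed()` or per generator with `numpy.random.default_rng(seed)` (used later in this notebook). This is important for reproducibility in experiments and simulations.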
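
To make the reproducibility point concrete, here is a minimal sketch that seeds the global state twice with the same value and compares the draws. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# 42 is an arbitrary seed chosen for illustration.
np.random.seed(42)
first = np.random.standard_normal(3)

np.random.seed(42)  # reset to the same starting state
second = np.random.standard_normal(3)

print(np.array_equal(first, second))  # True: same seed, same sequence
```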

Properties of pseudorandom sequences:
- uniformity: The numbers generated should be uniformly distributed over the specified range
- independence: Each number should be statistically independent of the others
- periodicity: The sequence should have a long period, meaning it takes a very large number of draws before it starts repeating
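
The first two properties can be checked informally on a sample. The sketch below is a rough sanity check, not a rigorous statistical test; the seed, sample size, and bin count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
u = rng.random(100_000)         # floats in [0, 1)

# Uniformity: each of 10 equal-width bins should hold roughly 10% of the draws.
counts, _ = np.histogram(u, bins=10, range=(0.0, 1.0))
bin_fractions = counts / u.size

# Independence (rough proxy): correlation between each draw and the next one.
lag1_corr = np.corrcoef(u[:-1], u[1:])[0, 1]

print(bin_fractions.round(3))  # every entry should be near 0.1
print(round(lag1_corr, 4))     # should be near 0
```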

Practical Applications: 
- simulations
- machine learning
- games and graphics

The **standard normal distribution** is the normal (Gaussian) distribution with a mean of 0 and a standard deviation of 1. It describes numbers that cluster symmetrically around a central value, with values becoming rarer the further they are from that center. Real measurements (like people's heights or test scores) often cluster around their average in a similar way.

### Simple Breakdown
- **Center at 0**: The center (the mean) of the standard normal distribution is 0. Values are scattered symmetrically around 0, and values near 0 are the most likely.
- **Spread of 1**: The *standard deviation* (a way to measure spread) is fixed at 1. About 68% of values fall within 1 unit of the center, and about 95% fall within 2 units.

### Shape of the Distribution
- If you made a graph of all the numbers, it would look like a smooth hill or a bell shape, where:
  - Most numbers are clustered around the center (close to 0).
  - Fewer numbers are further away from the center.
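
The center, spread, and bell shape described above can be checked empirically. This is a small sketch with an arbitrary seed and sample size; the printed fractions should land near the theoretical 68% and 95%.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
z = rng.standard_normal(100_000)

within_1 = np.mean(np.abs(z) < 1)  # fraction of values within 1 unit of 0
within_2 = np.mean(np.abs(z) < 2)  # fraction of values within 2 units of 0

print(f"mean {z.mean():.3f}, std {z.std():.3f}")
print(f"within 1: {within_1:.3f}, within 2: {within_2:.3f}")
```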
  
### Why It’s Useful
1. **Predictability**: It helps to make predictions about data. For example, if something follows a standard normal distribution, you know most values will be close to 0, and only a few will be far away.
2. **Comparing Data**: If you have different groups of data, you can rescale each one to mean 0 and standard deviation 1 (standardization) and compare them on this common scale.
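
One common way to use the standard normal as a yardstick is to convert values to z-scores. The sketch below assumes two hypothetical data sets (heights in centimetres and exam scores) invented purely for illustration; `standardize` is a helper defined here, not a NumPy function.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical data sets on very different scales.
heights_cm = rng.normal(loc=170, scale=10, size=50_000)
exam_scores = rng.normal(loc=500, scale=100, size=50_000)

def standardize(x):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    return (x - x.mean()) / x.std()

z_heights = standardize(heights_cm)
z_scores = standardize(exam_scores)

# With the nominal parameters, 185 cm and a score of 650 sit equally far above average.
print((185 - 170) / 10, (650 - 500) / 100)  # 1.5 1.5
```

After standardization both arrays have mean 0 and standard deviation 1, so a value of 1.5 means the same thing in either data set.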


In [42]:
samples = np.random.standard_normal(size=(4,4))
samples

array([[ 1.36914012, -0.14884643, -0.5830833 ,  1.81981158],
       [-1.77155212, -0.13457015,  2.07750097, -1.17244064],
       [ 0.73490699, -0.76893758, -0.88157469,  0.85363248],
       [-1.17467656,  1.62875786, -0.7789911 ,  1.13503535]])

In [43]:
from random import normalvariate
import random

In Python's `random` module, the `normalvariate()` function generates random numbers following a **normal distribution** (also known as a Gaussian distribution), which is characterized by two parameters:

- **Mean (μ)**: This is the central or average value around which the numbers will be spread.
- **Standard deviation (σ)**: This represents the spread or “width” of the distribution. A higher standard deviation means values are more spread out from the mean.

### Syntax
```python
random.normalvariate(mu, sigma)
```

- `mu`: The mean (average) of the distribution.
- `sigma`: The standard deviation (spread) of the distribution.

### How It Works
- When you call `normalvariate(mu, sigma)`, it returns a random float number from a normal distribution with the specified mean (`mu`) and standard deviation (`sigma`).
- The numbers it generates are more likely to be close to the mean, with fewer numbers appearing as they get further from the mean, forming the typical "bell curve" shape.
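
A quick empirical check of this behaviour: the sketch below draws many values and compares the sample mean and standard deviation with `mu` and `sigma`. It assumes a private `random.Random` instance with an arbitrary seed (2024) so the run is repeatable; the sample size is also arbitrary.

```python
import random
import statistics

gen = random.Random(2024)  # private, seeded generator (arbitrary seed)
mu, sigma = 100, 15

values = [gen.normalvariate(mu, sigma) for _ in range(50_000)]

sample_mean = statistics.fmean(values)
sample_std = statistics.stdev(values)

# Both should be close to mu and sigma respectively.
print(round(sample_mean, 1), round(sample_std, 1))
```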


### Use Cases
- **Simulating Real-World Data**: Many natural quantities (like heights, test scores, measurement errors) approximately follow a normal distribution, so `normalvariate()` can simulate such data.
- **Sampling and Randomness**: It’s useful for simulations and generating sample data, where you want random values that follow a predictable, normal pattern.

### Difference from Standard Normal Distribution
- The `normalvariate()` function is not limited to a mean of 0 and a standard deviation of 1 (as in the standard normal distribution). You can set `mu` and `sigma` to whatever values you need for your specific situation.

So, `normalvariate()` is a handy function for generating normally distributed random values with any specified mean and spread!

In [44]:
random_number = random.normalvariate(100, 15)
random_number
# If you run this multiple times, the values cluster around 100; about 68% fall within one standard deviation (15) of the mean, i.e. between 85 and 115

99.160812883207

In [45]:
N = 1_000_000

In [46]:
%timeit samples = [normalvariate(0,1) for _ in range(N)]

1.22 s ± 146 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [47]:
%timeit np.random.standard_normal(N)

39.8 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

NumPy draws all one million values in compiled code, so in this run it is roughly 30 times faster than the pure-Python list comprehension (39.8 ms versus 1.22 s).


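#### Configure an explicit generator

`np.random.default_rng()` returns a `Generator` object with its own state, independent of the global state used by the module-level `np.random` functions above.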

In [48]:
rng = np.random.default_rng(seed=12345)
data = rng.standard_normal((2,3))
data

array([[-1.42382504,  1.26372846, -0.87066174],
       [-0.25917323, -0.07534331, -0.74088465]])

In [49]:
type(rng)

numpy.random._generator.Generator
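
A short sketch of why an explicit generator is convenient: two generators built from the same seed produce identical streams, and drawing from the global `np.random` functions does not disturb them. The variable names here are illustrative only.

```python
import numpy as np

rng_a = np.random.default_rng(seed=12345)
rng_b = np.random.default_rng(seed=12345)

# Same seed, same stream.
same = np.array_equal(rng_a.standard_normal((2, 3)), rng_b.standard_normal((2, 3)))

# Drawing from the global state does not advance rng_a or rng_b.
np.random.standard_normal(10)
still_same = np.array_equal(rng_a.standard_normal(3), rng_b.standard_normal(3))

print(same, still_same)  # True True
```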

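### NumPy random number generator methods

Most examples below call the legacy module-level functions (`np.random.<name>`). The same names are also available as methods on a `Generator` such as `rng`; `integers` exists only as a `Generator` method.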

**permutation(x)**: returns a random permutation of a sequence

In [50]:
# If x is an integer, it returns a randomly permuted range of integers from 0 to x-1. If x is an array, it returns a randomly shuffled copy of the array
np.random.permutation(5)

array([4, 1, 0, 2, 3])

In [51]:
np.random.permutation([1,2,3,4,5])

array([1, 4, 3, 5, 2])

**shuffle(x)**: shuffles the input array in place and returns `None`, instead of returning a new array

In [52]:
arr = np.array([1, 2, 3, 4])
np.random.shuffle(arr) # arr is now shuffled
arr

array([3, 4, 1, 2])
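
The difference between `permutation` and `shuffle` is easiest to see side by side. This sketch uses the `Generator` versions of both methods with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

original = np.arange(5)
shuffled_copy = rng.permutation(original)  # returns a new, shuffled array
after_permutation = original.copy()        # snapshot: original is still 0..4

return_value = rng.shuffle(original)       # reorders `original` itself

print(after_permutation)  # [0 1 2 3 4]
print(return_value)       # None
```

Use `permutation` when the original order is still needed, and `shuffle` when avoiding a copy matters more.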

**uniform(low=0.0, high=1.0, size=None)**: draws samples from a uniform distribution over the interval [low, high)

In [53]:
np.random.uniform(1,11,5)

array([ 2.99254024, 10.26350994,  2.12924562, 10.26078162,  7.52135578])
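
As a sanity check on the interval, the sketch below confirms that the draws stay inside [low, high) and that rescaling draws from [0, 1) by hand gives a sample with the same mean. The seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
low, high = 1, 11

x = rng.uniform(low, high, size=100_000)

# Equivalent construction: stretch and shift draws from [0, 1).
manual = low + (high - low) * rng.random(100_000)

print(x.min() >= low, x.max() < high)
print(round(x.mean(), 2), round(manual.mean(), 2))  # both near (low + high) / 2 = 6
```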

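**integers(low, high=None, size=None, dtype=np.int64, endpoint=False)**: draws random integers from [low, high); pass `endpoint=True` to include `high`. This is a `Generator` method only; the legacy module-level equivalent is `np.random.randint`.

In [55]:
# np.random.integers does not exist, so call the method on the Generator created above
ints = rng.integers(1, 10, size=5)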
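
`integers` is a method of `Generator` objects rather than a module-level `np.random` function. A minimal sketch of the `endpoint` behaviour, using simulated die rolls (the seed and number of rolls are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed

# Simulated die rolls: two equivalent ways to get integers 1..6.
rolls_exclusive = rng.integers(1, 7, size=10_000)                 # high=7 is excluded
rolls_inclusive = rng.integers(1, 6, size=10_000, endpoint=True)  # high=6 is included

print(np.unique(rolls_exclusive))
print(np.unique(rolls_inclusive))
```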

**standard_normal(size=None)**: draws samples from the standard normal distribution (mean 0, standard deviation 1)

In [56]:
np.random.standard_normal(5) 


array([ 2.31414015,  1.21808064, -0.75086824,  0.15143224,  1.27793444])

**binomial(n, p, size=None)**: draws samples from a binomial distribution with n trials and probability of success p

In [57]:
np.random.binomial(10, 0.5, size=5)  # Five samples from a binomial distribution with 10 trials, 0.5 success probability


array([3, 2, 3, 6, 4])
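
Each binomial sample counts the successes in `n` independent trials. The sketch below compares `binomial` with a hand-built version that counts uniform draws below `p`; the seed and sample size are arbitrary, and `manual` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed
n, p = 10, 0.5

draws = rng.binomial(n, p, size=100_000)

# By hand: each row holds n trials; a trial succeeds when its uniform draw is below p.
manual = (rng.random((100_000, n)) < p).sum(axis=1)

# Both sample means should be close to n * p = 5.
print(round(draws.mean(), 2), round(manual.mean(), 2))
```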

**normal(loc=0.0, scale=1.0, size=None)**: draws samples from a normal (Gaussian) distribution with mean `loc` and standard deviation `scale`

In [59]:
np.random.normal(5, 2, 10)  # Ten samples from a normal distribution with mean 5 and standard deviation 2

array([3.16260202, 2.32883006, 5.65320482, 6.41057212, 7.66854188,
       1.98168718, 5.03909869, 5.96474926, 2.80807702, 1.26211323])
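
Note that `scale` is the standard deviation, not the variance. The sketch below checks this empirically and shows that shifting and scaling standard normal draws gives the same distribution; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(13)  # arbitrary seed
loc, scale = 5, 2

x = rng.normal(loc, scale, size=100_000)

# Equivalent construction from standard normal draws.
y = loc + scale * rng.standard_normal(100_000)

print(round(x.mean(), 2), round(x.std(), 2))  # near 5 and 2
print(round(y.mean(), 2), round(y.std(), 2))  # near 5 and 2
```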

**beta(a, b, size=None)**: draws samples from a beta distribution with shape parameters `a` and `b`. Commonly used in Bayesian statistics, for example to model uncertainty about a probability

In [60]:
np.random.beta(0.5,0.5,10)

array([0.29591079, 0.10513296, 0.95002605, 0.042954  , 0.45723243,
       0.37029304, 0.70949989, 0.99865987, 0.98299856, 0.98661603])
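
As one illustration of the Bayesian use, the sketch below estimates a success probability after observing some trials. It assumes a uniform Beta(1, 1) prior and hypothetical counts of 7 successes and 3 failures, which give a Beta(8, 4) posterior; none of these numbers come from the notebook.

```python
import numpy as np

rng = np.random.default_rng(23)  # arbitrary seed

successes, failures = 7, 3          # hypothetical observations
a, b = 1 + successes, 1 + failures  # uniform Beta(1, 1) prior updated with the data

posterior = rng.beta(a, b, size=100_000)
lower, upper = np.percentile(posterior, [2.5, 97.5])

print(f"posterior mean ~ {posterior.mean():.3f} (exact: {a / (a + b):.3f})")
print(f"95% credible interval ~ ({lower:.2f}, {upper:.2f})")
```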

**chisquare(df, size=None)**: Draws samples from a chi-square distribution with degrees of freedom `df`. Often used in hypothesis testing

In [61]:
np.random.chisquare(2, size=5)  # Five samples from a chi-square distribution with 2 degrees of freedom


array([0.38406269, 2.77766033, 2.89555772, 0.53583832, 1.57877623])
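
A chi-square variable with `df` degrees of freedom is the sum of `df` squared standard normal variables. The sketch below compares `chisquare` with that construction; the seed and sample size are arbitrary, and `built` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(17)  # arbitrary seed
df = 2

direct = rng.chisquare(df, size=100_000)

# By hand: square df standard normal draws per sample and add them up.
built = (rng.standard_normal((100_000, df)) ** 2).sum(axis=1)

# Both sample means should be close to df.
print(round(direct.mean(), 2), round(built.mean(), 2))
```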

**gamma(shape, scale=1.0, size=None)**: draws samples from a gamma distribution with the given `shape` and `scale`, often used to model waiting times

In [62]:
np.random.gamma(2, 2, 5)  # Five samples from a gamma distribution with shape=2 and scale=2


array([4.85850324, 4.40552689, 1.72981068, 2.51139238, 4.86453441])
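
The waiting-time interpretation can be sketched directly: for an integer `shape`, a gamma variable is the total of `shape` independent exponential waiting times with the same `scale`. The seed and sample size below are arbitrary, and `summed` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(19)  # arbitrary seed
shape, scale = 2, 2

direct = rng.gamma(shape, scale, size=100_000)

# By hand: add up `shape` exponential waiting times per sample.
summed = rng.exponential(scale, size=(100_000, shape)).sum(axis=1)

# Both sample means should be close to shape * scale = 4.
print(round(direct.mean(), 2), round(summed.mean(), 2))
```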