![Numpy Logo](Numpy_logo.jpg)

# Assignment exploring the numpy.random package in Python

## Problem statement
1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions.
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.

## Numpy explained
NumPy (Numerical Python) is an open source Python library regarded as the fundamental package for scientific computing in Python. It is the standard Python library for working with numerical data in almost every field of science and engineering, and its API is used extensively in other data science and scientific Python packages such as Pandas, SciPy, Matplotlib, scikit-learn and scikit-image. The NumPy package provides multidimensional array and matrix data structures and enables fast operations on arrays, including mathematical, logical, shape manipulation, sorting, basic linear algebra, statistical operations and random simulation. [1,2]

## 1. Numpy.random explained
#### What is a random number?
"*Random - made, done, or happening without method or conscious decision*" (definition - Oxford English Dictionary). A random number does not have to be a different number every time; random means something that cannot be predicted logically. [3]  
  
The ability to generate random numbers is an important part of the configuration and evaluation of many numerical and machine learning algorithms. From shuffling datasets to splitting data into random subsets, being able to generate random numbers (in practice, repeatable pseudorandom numbers) is an essential part of data science. [2]
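
The shuffling and splitting mentioned above can be sketched as follows. This is only an illustration: the ten-element `data` array is a stand-in dataset, the seed `0` is an arbitrary value, and the 8/2 split is an assumption rather than anything prescribed by NumPy.

```python
import numpy as np

data = np.arange(10)            # stand-in dataset of 10 records (illustrative)
rng = np.random.default_rng(0)  # the seed 0 is arbitrary

# Shuffle a copy of the data, then split it into two random subsets.
shuffled = rng.permutation(data)
train, held_out = shuffled[:8], shuffled[8:]

# Re-creating the generator with the same seed reproduces the same shuffle.
rng_again = np.random.default_rng(0)
shuffled_again = rng_again.permutation(data)

print(np.array_equal(shuffled, shuffled_again))
```

Using the same seed twice is what makes the "random" split repeatable from one run to the next.
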
#### How it works  
NumPy’s random module produces pseudorandom numbers by combining a BitGenerator, which creates sequences of random bits, with a Generator, which uses those sequences to sample from different statistical distributions:
  
* **BitGenerators:** Objects that generate random numbers. These are typically unsigned integer words filled with sequences of either 32 or 64 random bits.
  
* **Generators:** Objects that transform sequences of random bits from a BitGenerator into sequences of numbers that follow a specific probability distribution (such as uniform, Normal or Binomial) within a specified interval. [4]
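
The relationship between the two kinds of object can be sketched by building a Generator by hand. This is a minimal illustration, not the only way to do it: `PCG64` is one of the BitGenerators shipped with NumPy, and the seed `42` is an arbitrary value chosen for the example.

```python
import numpy as np

# The BitGenerator produces the raw stream of random bits.
bit_generator = np.random.PCG64(42)  # 42 is an arbitrary illustrative seed

# The Generator wraps the BitGenerator and turns its bits into samples
# from specific probability distributions.
rng = np.random.Generator(bit_generator)

uniform_sample = rng.uniform(0.0, 1.0, size=3)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=3)
binomial_sample = rng.binomial(n=10, p=0.5, size=3)

# default_rng() is the convenience constructor used in the rest of this
# notebook; at the time of writing it also wraps a PCG64 BitGenerator.
convenience_rng = np.random.default_rng(42)
print(type(convenience_rng.bit_generator).__name__)
```
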
  
For the purpose of this investigation the NumPy random module can be split into four parts:
* Simple Random Data
* Permutations
* Distributions
* Random Generator
***
##### Code

In [1]:
# First import the numpy package
import numpy as np
# Next check the installed version (this notebook uses 1.19)
np.__version__

'1.19.2'

In [2]:
# Next we import matplotlib and seaborn for data visualisation and charts
import matplotlib.pyplot as plt 
import seaborn as sns
# call "magic function" for matplotlib to show charts in jupyter notebook
# Ref - https://stackoverflow.com/questions/43027980/purpose-of-matplotlib-inline
%matplotlib inline

## 2. "Simple random data" and "Permutations" functions explained
According to the NumPy random documentation [5], the following **simple random data** functions are available:

| Numpy Function | Description |
|:---------------|:------------|
| **integers**(low[, high, size, dtype, endpoint]) | Return random integers from low (inclusive) to high (exclusive), or if endpoint=True, low (inclusive) to high (inclusive)|
| **random**([size, dtype, out]) | Return random floats in the half-open interval [0.0, 1.0)|
| **choice**(a[, size, replace, p, axis, shuffle]) | Generates a random sample from a given 1-D array|
| **bytes**(length) | Return random bytes |  
  
***

#### _integers( )_ function
As described in the table above, `integers()` returns random integers from low (inclusive) to high (exclusive), or to high (inclusive) if endpoint=True.

In [3]:
# Create a random number generator (rng) and draw 10 integers from [0, 2)
rng = np.random.default_rng()
rng.integers(2, size=10)

array([0, 1, 1, 1, 0, 0, 0, 0, 1, 0])

According to the documentation [6], the call above produces a 1-D array of 10 random integers between 0 (inclusive) and 2 (exclusive). When only one bound is given, it is treated as `high` and `low` defaults to 0.

In [4]:
rng.integers(3,10, size=(2,4)) # integers between 3 (incl) and 10 (excl) size 2 rows and 4 columns

array([[6, 7, 8, 5],
       [7, 7, 6, 8]])
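
The `endpoint` parameter listed in the table changes whether `high` is included. The sketch below simulates a six-sided die in both styles; the die and the sample size of 1000 are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng()

# Default behaviour: high is exclusive, so a six-sided die needs high=7.
rolls_exclusive = rng.integers(1, 7, size=1000)

# With endpoint=True, high is inclusive, so high=6 covers the same range.
rolls_inclusive = rng.integers(1, 6, size=1000, endpoint=True)

print(rolls_exclusive.min(), rolls_exclusive.max())
print(rolls_inclusive.min(), rolls_inclusive.max())
```

Both arrays only ever contain values from 1 to 6; the two calls differ in how the upper bound is written, not in the range they cover.
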

***
#### _random( )_ function
As described in the table above, `random()` returns random floats in the half-open interval [0.0, 1.0), i.e. 0.0 is included and 1.0 is excluded. [7]

In [5]:
# Use random function to generate a random number between 0 (incl) and 1 (excl)
rng = np.random.default_rng()
rng.random()

0.3790530837506113

In [6]:
# Create a 3 x 2 array of random floating point numbers
rng = np.random.default_rng()
rng.random((3,2)) # the shape is passed as a tuple, hence the double brackets

array([[0.85335229, 0.91076116],
       [0.66382549, 0.37755545],
       [0.47400118, 0.33570134]])
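
Because `random()` only covers the unit interval, other ranges are usually obtained by scaling and shifting the result. A minimal sketch, where `low` and `high` are hypothetical bounds chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng()

low, high = -5.0, 0.0  # hypothetical bounds for the target interval

# Stretch the unit interval to width (high - low), then shift it to start at low.
samples = (high - low) * rng.random(1000) + low

print(samples.min(), samples.max())
```
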

***
#### _choice( )_ function
`choice()` generates a random sample from a given 1-D array. [8]

Parameters:

* **a** : {array_like, int} - If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated from np.arange(a).
* **size** : {int, tuple[int]}, optional - Output shape.
* **replace** : bool, optional - Whether the sample is drawn with or without replacement.
* **p** : 1-D array_like, optional - The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
* **axis** : int, optional - The axis along which the selection is performed. The default, 0, selects by row.
* **shuffle** : bool, optional - Whether the sample is shuffled when sampling without replacement. The default is True; False provides a speedup.

In [7]:
rng = np.random.default_rng()
# Generate a uniform random sample from np.arange(6) of size 6
rng.choice(6,6)

array([0, 5, 4, 0, 4, 0])

In [8]:
# Generate a non-uniform random sample from np.arange(5) of size 5 without replacement (no repeated values):

rng.choice(5, 5, replace=False, p=[0.1, 0.2, 0.2, 0.3, 0.2])

array([0, 1, 2, 3, 4])
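
Drawing all 5 values out of 5 without replacement always returns every value exactly once, so the effect of `p` is easier to see with a smaller sample or with replacement. The sketch below reuses the same weights; the sample sizes of 3 and 10,000 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng()
weights = [0.1, 0.2, 0.2, 0.3, 0.2]

# Without replacement, no value can appear twice in the sample.
subset = rng.choice(5, 3, replace=False, p=weights)

# With replacement and many draws, the observed frequencies
# should be close to the weights passed in p.
draws = rng.choice(5, 10_000, replace=True, p=weights)
frequencies = np.bincount(draws, minlength=5) / draws.size

print(subset)
print(frequencies)
```
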

***
The function can also be used with any array-like input, not just integers, e.g.:

In [9]:
counties = ['Wicklow', 'Dublin', 'Cork', 'Galway', 'Limerick']
rng.choice(counties, 5, replace=True)

array(['Galway', 'Galway', 'Dublin', 'Dublin', 'Dublin'], dtype='<U8')
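
The `axis` parameter matters once the input has more than one dimension. As a small sketch, assuming a hypothetical 3 x 4 array named `table`, whole rows or whole columns can be sampled like this:

```python
import numpy as np

rng = np.random.default_rng()
table = np.arange(12).reshape(3, 4)  # hypothetical 3 x 4 array

# axis=0 (the default) picks whole rows.
rows = rng.choice(table, 2, axis=0, replace=False)

# axis=1 picks whole columns instead.
cols = rng.choice(table, 2, axis=1, replace=False)

print(rows.shape, cols.shape)
```
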

***
#### _bytes( )_ function
`bytes()` generates random bytes. [9]

Parameters:

* **length** : int - Number of random bytes.

In [10]:
rng.bytes(12)

b'`k\xcaNK\xbe\xe3\xa6\xeb\x04\xd4\xc6'
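
The returned value is an ordinary Python `bytes` object, so it can be inspected or converted with standard tools. A short sketch follows; the conversions shown are just two possible ways of looking at the result.

```python
import numpy as np

rng = np.random.default_rng()
raw = rng.bytes(12)

print(type(raw), len(raw))

# View the same 12 bytes as unsigned 8-bit integers (values 0 to 255).
as_ints = np.frombuffer(raw, dtype=np.uint8)

# Or render them as a hexadecimal string, two characters per byte.
as_hex = raw.hex()

print(as_ints)
print(as_hex)
```
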

***

### References
[1] https://numpy.org/doc/stable/user/whatisnumpy.html#whatisnumpy  
[2] https://numpy.org/doc/stable/user/absolute_beginners.html#generating-random-numbers  
[3] https://www.w3schools.com/python/numpy_random.asp  
[4] https://numpy.org/doc/stable/reference/random/index.html  
[5] Numpy Random Generator; https://numpy.org/doc/stable/reference/random/generator.html  
[6] numpy.random.Generator.integers; https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html#numpy.random.Generator.integers  
[7] numpy.random.Generator.random; https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.random.html#numpy.random.Generator.random  
[8] numpy.random.Generator.choice; https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html#numpy.random.Generator.choice  
[9] numpy.random.Generator.bytes; https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.bytes.html#numpy.random.Generator.bytes