# Programming for Data Analysis Assignment on the numpy.random package in Python.
### Lecturer: Brian McGinley
### Submission Date: 11/11/2019

# Introduction
The tasks to be covered, as set out in the assignment sheet, are as follows:

a) Explain the overall purpose of the package

b) Explain the use of the “Simple random data” and “Permutations” functions.

c) Explain the use and purpose of at least five “Distributions” functions.

d) Explain the use of seeds in generating pseudorandom numbers.

# a. Overall Purpose of the numpy.random Package in Python
##### Random
To explain the numpy.random package, it is useful to first look at Python's own random module. The summary below is adapted from the SciPy docs and NumPy.org tutorials.

The random module generates pseudorandom numbers, and random() is its basic function. Many of the other functions in the module depend on random().

It returns a random floating point number in the half-open range [0.0, 1.0), meaning 0.0 may be returned but 1.0 never is. It can be used in Python as follows:

In [5]:
import random
print("Generating a random number within the range [0.0, 1.0) using random.random()")
print(random.random())

Generating a random number within the range [0.0, 1.0) using random.random()
0.5479062961083409


In [4]:
import random
print("Generating a different random number within the range [0.0, 1.0) using random.random()")
print(random.random())

Generating a different random number within the range [0.0, 1.0) using random.random()
0.5813546012531416


As seen, each number lies within the range above, and each time the code is run a new random number is generated.
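As a quick check of the range described above, the following sketch draws a large number of values and confirms that none falls outside it. The sample size of 10,000 is an arbitrary choice for illustration.

```python
import random

# Draw many values and confirm that each lies in the half-open interval [0.0, 1.0).
draws = [random.random() for _ in range(10_000)]
in_range = all(0.0 <= x < 1.0 for x in draws)
print(in_range)
```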

##### NumPy

NumPy is an abbreviation of Numerical Python. It is "a library consisting of multidimensional array objects and a collection of routines for processing those arrays" (Tutorialspoint, 2019), "including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more" (NumPy, 2017). It is "one of the fundamental packages used for scientific computing in Python" (Phuong and Czygan, 2015, p. 8). To import the NumPy package in Python we use the following; np is the conventional alias, but any name works as long as it is used consistently:

In [4]:
>>> import numpy as np

Primary features of NumPy, adapted from SciPy's quickstart tutorial (2019):

(i). NumPy's primary object is the multidimensional array.

(ii). Its dimensions are called axes. 

(iii). The array class is called ndarray.

(iv). Some of the main attributes and methods of an ndarray object a, each accessed as a.name:

"ndim" = number of axes in the array.

"shape" = dimensions of the array as a tuple; for a 2-D array this is (rows, columns).

"size" = total number of elements in the array.

"dtype" = type of the elements in the array.

"reshape()" = returns the same data arranged in a new shape.

In addition, the NumPy function "arange(n)" creates an array of the integers from 0 to n - 1.

Please see the following example of NumPy:

In [6]:
>>> import numpy as eg
>>> a = eg.arange(20).reshape(4, 5)
>>> a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [7]:
>>> a.shape

(4, 5)

In [8]:
>>> a.ndim

2

In [9]:
>>> a.size

20

In [10]:
type(a)

numpy.ndarray

In [11]:
>>> a.dtype.name

'int32'

##### Numpy vs. Python

The main differences between NumPy arrays and Python sequences, adapted from SciPy.org's page "What is NumPy?", are set out below to explain why NumPy is used and why it matters for this assignment:

~ NumPy arrays have a fixed size at creation. Changing the size of an array creates a new array and deletes the original, whereas Python lists can grow dynamically.

~ NumPy elements must, for the most part, all be of the same data type and therefore the same size in memory. The exception is an array of Python objects (including NumPy objects), which allows elements of different sizes.

~ NumPy allows advanced mathematical and other operations on large quantities of data. These typically run more efficiently, and need less code, than the equivalent using Python's built-in sequences.

~ A growing number of scientific and mathematical Python packages use NumPy arrays. Although these typically accept Python sequences as input, they convert them to NumPy arrays before processing and often output NumPy arrays, so knowing both is important for working efficiently.

~ NumPy fully supports an object-oriented approach, while still leaving the programmer free to code in whichever paradigm they see fit.
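Some of the differences listed above can be seen in a short sketch. The variable names and values are chosen purely for illustration.

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_array = np.array(py_list)

# Element-wise arithmetic: one vectorised expression for the array,
# an explicit loop (here a list comprehension) for the list.
doubled_array = np_array * 2
doubled_list = [x * 2 for x in py_list]

# A Python list grows in place; np.append returns a new array instead.
py_list.append(5)
grown_array = np.append(np_array, 5)

# Mixed numeric inputs are coerced to a single common dtype.
mixed = np.array([1, 2.5, 3])

print(doubled_array, doubled_list)
print(np_array.size, grown_array.size)
print(mixed.dtype)
```

Note that np_array still has four elements after the call to np.append, because the original array is not modified.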

This is merely a basic introduction to NumPy and its importance; there are many more elements and layers to the NumPy package that can be investigated further. Further information and detailed documentation are available on NumPy.org, SciPy.org and many other websites.

##### NumPy Functions, Modules and Objects

NumPy contains functions, modules and objects, all of which are covered in the NumPy manual and quickstart tutorials. The reference sheet on SciPy.org divides the main parts of NumPy as follows:

- Array Objects
- Universal Functions
- Routines
- Packaging
- NumPy C-API
- NumPy Internals
- NumPy and SWIG.

##### NumPy Routines

For this assignment we are concerned with NumPy routines, which are subdivided by functionality as listed below. Explanations of each can be found in the NumPy user guides:

- Array Creation Routines
- Array Manipulation Routines
- Binary operations
- String operations
- C-Types Foreign Function Interface
- Datetime Support Functions
- Data type routines
- Optionally Scipy-accelerated routines
- Mathematical functions with automatic domain
- Floating point error handling
- Discrete Fourier transform
- Financial Functions
- Functional Programming
- NumPy-specific help functions
- Indexing routines
- Input and output
- Linear algebra
- Logic functions
- Masked array functions
- Mathematical functions
- Matrix library
- Miscellaneous routines
- Padding arrays
- Polynomials
- Random Sampling
- Set routines
- Sorting, searching and counting
- Statistics
- Test Support
- Window functions.


# numpy.random

"An important part of any simulation is the ability to generate random numbers" (Phuong and Czygan, 2015, p.8), and so we are concerned with the random sampling routines within NumPy, as listed above. Once NumPy has been imported as above, these routines are accessed through the numpy.random module (np.random). To produce its samples, numpy.random uses the Mersenne Twister, one of the most widely used pseudorandom number generators (PRNGs). An example of random sampling is displayed below:

In [41]:
>>> import numpy as np
>>> np.random.rand(5)

array([0.04722731, 0.94975842, 0.45156508, 0.21591379, 0.54388535])

The above example displays an array of 5 random numbers, as requested, in the range 0.0 to 1.0 (1.0 itself is excluded).

The rand() function in numpy.random returns random values in the given shape and is one of the functions in the Simple Random Data subset. The numpy.random routines are divided into four subsets: the aforementioned Simple Random Data, Permutations, Distributions, and Random Generator. These are discussed hereafter.

# b. "Simple Random Data" and "Permutations" Functions.

##### Simple Random Data
The Simple Random Data subset of the random sampling routines in the NumPy package concerns functions that return random integers, floats, bytes, or selections from an existing array.

If further information or help with numpy.random is required, one can run the following, which displays an inline manual explaining all the functions within numpy.random:

- help(np.random)

The subset includes the following functions. Each has a brief explanation (adapted from the SciPy.org and GeeksforGeeks reference manuals, with the direct quotations contained within the ReadMe) followed by an example:

- rand() - returns random values in the given shape. The shape can be expanded from the above example by passing more dimensions. In the following example rand() produces uniformly distributed random numbers in the range [0, 1) in a 2D array.



In [42]:
>>> import numpy as np
>>> np.random.rand(2, 2)

array([[0.68701446, 0.57261188],
       [0.43969989, 0.70405878]])

- randn() - creates an array of the specified shape and fills it with random values drawn from the standard normal (Gaussian) distribution, which has mean 0 and standard deviation 1.

In [18]:
array = np.random.randn(2, 2, 2)
print("Creating 3D array of shape (2, 2, 2) with 8 random values: \n", array)

Creating 3D array of shape (2, 2, 2) with 8 random values: 
 [[[-0.54562728 -0.59849528]
  [ 0.25404277  0.19053193]]

 [[ 0.0959733   1.31923107]
  [ 0.35971666 -1.02613858]]]


- randint() - returns an array of the given shape filled with random integers from low (inclusive) to high (exclusive), i.e. the range [low, high). The arguments are passed as (low, high, size, dtype).

In [17]:
np.random.randint(low = 0, high = 5, size = 2)

array([2, 3])
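To illustrate that the upper bound of randint() is excluded, and that size can be a tuple, a minimal sketch follows. The sample size of 1,000 is arbitrary.

```python
import numpy as np

# 1,000 draws from the half-open interval [0, 5): the value 5 never appears.
draws = np.random.randint(low=0, high=5, size=1000)
print(draws.min() >= 0, draws.max() <= 4)

# size may be a tuple, giving a multidimensional array.
grid = np.random.randint(low=0, high=5, size=(2, 3))
print(grid.shape)
```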

- random_integers() - returns random integers of type np.int, passed in the form (low, high, size), between low and high inclusive. If high is not given, the results range from 1 to low, so low acts as the upper bound, as shown. This function has been deprecated since NumPy 1.11 in favour of randint().

In [16]:
np.random.random_integers(low = 10, size = (4, 4))

array([[ 4,  5,  6,  6],
       [ 2,  9,  5, 10],
       [ 8,  6,  9,  6],
       [ 3,  6,  6,  8]])
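Because random_integers() is deprecated, the same result can be obtained with randint(). The sketch below assumes the intent is integers from 1 to 10 inclusive, as in the example above.

```python
import numpy as np

# Equivalent of random_integers(low=10, size=(4, 4)): integers from 1 to 10 inclusive.
# randint excludes its upper bound, hence high = 10 + 1.
values = np.random.randint(1, 10 + 1, size=(4, 4))
print(values)
```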

- random_sample(), ranf(), sample() and random() - Return random floats from the “continuous uniform” distribution over the half-open interval [0.0, 1.0).

In [15]:
np.random.random_sample(4)

array([0.9957291 , 0.22593236, 0.67928434, 0.67087228])

The interval and size of random_sample() can be changed using the format:

- (b - a) * random_sample((x, y)) + a

This gives an array of shape (x, y) with values in the range [a, b).

Thus, with a = -10 and b = 10, the following shows a 2 by 2 array in the range -10 to 10:

In [31]:
20 * np.random.random_sample((2,2)) -10

array([[-3.71306334,  8.95450415],
       [-9.57387172, -0.08441035]])
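The scale-and-shift formula can be wrapped in a small helper. The function name uniform_between is hypothetical and is used here only for illustration; it is not part of NumPy.

```python
import numpy as np

def uniform_between(a, b, shape):
    """Return samples from the half-open interval [a, b) (hypothetical helper)."""
    return (b - a) * np.random.random_sample(shape) + a

sample = uniform_between(-10, 10, (2, 2))
print(sample)
```

NumPy's own uniform(low, high, size) function covers the same use case directly.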

- To show how the size argument changes the output, still over the range [0.0, 1.0), the following examples were created:

In [9]:
import numpy as np
eg = np.random.sample()
print("Single random sample (a float) is: ", eg)

Single random sample (a float) is:  0.03317157910870705


In [12]:
import numpy as np
eg = np.random.sample(2)
print("1D random sample of 2 values is: ", eg)

1D random sample of 2 values is:  [0.85418997 0.84045991]


In [13]:
import numpy as np
eg = np.random.sample(3)
print("1D random sample of 3 values is: ", eg)

1D random sample of 3 values is:  [0.81949387 0.90310735 0.59306734]


- choice() - Generates a random sample from a given 1-D array. The signature, with its defaults, is:

numpy.random.choice(a, size = None, replace = True, p = None)

[a = input: a 1-D array, or an integer n, in which case the sample is drawn from np.arange(n)],

[size = shape of the output; None returns a single value],

[replace = boolean (True/False): whether a value may be selected more than once],

[p = probabilities associated with each entry in a; if None, a uniform distribution over a is assumed]

In [37]:
np.random.choice(4,4) #similar to np.random.randint(0, 4, 4)

array([2, 3, 2, 1])

Example including list of probabilities;

In [42]:
np.random.choice(4, 4, p=[0.1, 0.2, 0.3, 0.4])

array([2, 2, 3, 3], dtype=int64)

Example of sampling without replacement, using replace = False:

In [44]:
np.random.choice(4, 4, replace = False, p=[0.1, 0.2, 0.3, 0.4])

array([2, 1, 0, 3])
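The examples above pass an integer as a. As a further sketch, choice() can also draw from an existing array of values. The colours array is made-up data used only for illustration.

```python
import numpy as np

colours = np.array(["red", "green", "blue"])  # illustrative data, not from the assignment

# With replacement (the default): the same element may be drawn more than once.
with_repl = np.random.choice(colours, size=5)

# Without replacement: size cannot exceed the number of elements, and no element repeats.
without_repl = np.random.choice(colours, size=3, replace=False)

print(with_repl)
print(without_repl)
```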

- bytes() - Returns a bytes object of random bytes whose length is given by the input.

In [45]:
np.random.bytes(10)

b'\xf0\x1f\x99Q\xc4O\xf6\xf4*\xd9'

##### Permutations

https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html#simple-random-data

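Permutations are rearrangements of the elements of a sequence. The two functions in this subset reorder the contents of an array at random.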

- shuffle() - Modifies an array in place by shuffling its contents. We use arange() to create an array of the 20 integers from 0 to 19.

In [53]:
shuffexample = np.arange(20)
shuffexample

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [55]:
np.random.shuffle(shuffexample)
shuffexample

array([ 9, 13,  6, 10, 14,  1, 16, 19,  5,  3,  7,  8, 15, 12,  2,  0, 17,
       18, 11,  4])

This example is on a single-dimensional array. In multidimensional cases only the first axis is shuffled: the order of the sub-arrays (rows) changes but the contents of each remain the same. For the example we use reshape to make the array multidimensional and, as seen, only the order of the rows changes.

In [61]:
shuffexample = np.arange(20).reshape(4,5)
np.random.shuffle(shuffexample)
shuffexample

array([[15, 16, 17, 18, 19],
       [10, 11, 12, 13, 14],
       [ 5,  6,  7,  8,  9],
       [ 0,  1,  2,  3,  4]])

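- permutation() - Randomly permutes a sequence, or returns a permuted range.
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.permutation.html

In [62]:
np.random.permutation(10)

The function name is singular: calling np.random.permutations() raises "AttributeError: module 'numpy.random' has no attribute 'permutations'". Given an integer n, permutation(n) returns a shuffled copy of np.arange(n). Given an array, it returns a shuffled copy and leaves the original unchanged.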
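The difference between permutation() and shuffle() can be sketched as follows: permutation() returns a new array, while shuffle() works in place. The array of ten integers is an arbitrary choice for illustration.

```python
import numpy as np

original = np.arange(10)

# permutation() returns a shuffled copy; the input is untouched.
shuffled_copy = np.random.permutation(original)
print(original)
print(shuffled_copy)

# Passing an integer n is shorthand for permuting np.arange(n).
print(np.random.permutation(5))

# shuffle() reorders its argument in place and returns None.
result = np.random.shuffle(original)
print(result)
```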

# c. Distribution Functions Explanation and Examples

###### Distributions

https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html#simple-random-data

- beta(a, b[, size]) - Draw samples from a Beta distribution.
- binomial(n, p[, size]) - Draw samples from a binomial distribution.
- chisquare(df[, size]) - Draw samples from a chi-square distribution.
- dirichlet(alpha[, size]) - Draw samples from the Dirichlet distribution.
- exponential([scale, size]) - Draw samples from an exponential distribution.
- f(dfnum, dfden[, size]) - Draw samples from an F distribution.
- gamma(shape[, scale, size]) - Draw samples from a Gamma distribution.
- geometric(p[, size]) - Draw samples from the geometric distribution.
- gumbel([loc, scale, size]) - Draw samples from a Gumbel distribution.
- hypergeometric(ngood, nbad, nsample[, size]) - Draw samples from a Hypergeometric distribution.
- laplace([loc, scale, size]) - Draw samples from the Laplace or double exponential distribution with specified location (or mean) and scale (decay).
- logistic([loc, scale, size]) - Draw samples from a logistic distribution.
- lognormal([mean, sigma, size]) - Draw samples from a log-normal distribution.
- logseries(p[, size]) - Draw samples from a logarithmic series distribution.
- multinomial(n, pvals[, size]) - Draw samples from a multinomial distribution.
- multivariate_normal(mean, cov[, size, ...]) - Draw random samples from a multivariate normal distribution.
- negative_binomial(n, p[, size]) - Draw samples from a negative binomial distribution.
- noncentral_chisquare(df, nonc[, size]) - Draw samples from a noncentral chi-square distribution.
- noncentral_f(dfnum, dfden, nonc[, size]) - Draw samples from the noncentral F distribution.
- normal([loc, scale, size]) - Draw random samples from a normal (Gaussian) distribution.
- pareto(a[, size]) - Draw samples from a Pareto II or Lomax distribution with specified shape.
- poisson([lam, size]) - Draw samples from a Poisson distribution.
- power(a[, size]) - Draws samples in [0, 1] from a power distribution with positive exponent a - 1.
- rayleigh([scale, size]) - Draw samples from a Rayleigh distribution.
- standard_cauchy([size]) - Draw samples from a standard Cauchy distribution with mode = 0.
- standard_exponential([size]) - Draw samples from the standard exponential distribution.
- standard_gamma(shape[, size]) - Draw samples from a standard Gamma distribution.
- standard_normal([size]) - Draw samples from a standard Normal distribution (mean=0, stdev=1).
- standard_t(df[, size]) - Draw samples from a standard Student’s t distribution with df degrees of freedom.
- triangular(left, mode, right[, size]) - Draw samples from the triangular distribution over the interval [left, right].
- uniform([low, high, size]) - Draw samples from a uniform distribution.
- vonmises(mu, kappa[, size]) - Draw samples from a von Mises distribution.
- wald(mean, scale[, size]) - Draw samples from a Wald, or inverse Gaussian, distribution.
- weibull(a[, size]) - Draw samples from a Weibull distribution.
- zipf(a[, size]) - Draw samples from a Zipf distribution.
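As a minimal sketch of how five of the functions listed above are called, the following draws a large sample from each and compares the sample mean with the theoretical mean. The parameter values and the sample size n are assumptions chosen for illustration only.

```python
import numpy as np

n = 100_000  # sample size chosen for illustration

samples = {
    "uniform(0, 10)": np.random.uniform(low=0, high=10, size=n),
    "normal(5, 2)": np.random.normal(loc=5, scale=2, size=n),
    "binomial(10, 0.5)": np.random.binomial(n=10, p=0.5, size=n),
    "poisson(3)": np.random.poisson(lam=3, size=n),
    "exponential(2)": np.random.exponential(scale=2, size=n),
}

# Theoretical means: (0 + 10) / 2 = 5, loc = 5, n * p = 5, lam = 3, scale = 2.
for name, values in samples.items():
    print(f"{name:>18}: mean = {values.mean():.2f}, std = {values.std():.2f}")
```

With a sample this large, each printed mean should sit close to its theoretical value, although the exact figures differ on every run unless a seed is set.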

# d. Use of Seeds in Generating Pseudorandom Numbers

### Random generator

https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html#simple-random-data

- RandomState - Container for the Mersenne Twister pseudo-random number generator.
- seed([seed]) - Seed the generator.
- get_state() - Return a tuple representing the internal state of the generator.
- set_state(state) - Set the internal state of the generator from a tuple.
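The four entries above can be sketched together as follows. The seed value 42 is arbitrary; any integer would do. The point is that the same seed, or the same restored state, reproduces the same "random" sequence.

```python
import numpy as np

# Seeding with the same value reproduces the same sequence.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))

# get_state() / set_state() capture and restore the generator mid-sequence.
state = np.random.get_state()
a = np.random.rand(3)
np.random.set_state(state)
b = np.random.rand(3)
print(np.array_equal(a, b))

# RandomState gives an independent generator, leaving the global one alone.
rs = np.random.RandomState(42)
c = rs.rand(3)
print(np.array_equal(c, first))
```

All three comparisons print True, which is what makes seeded simulations repeatable.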

https://www.sharpsightlabs.com/blog/numpy-random-seed/

PRNG - pseudorandom number generator.

Explanation - numpy.random uses a "BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions":

- BitGenerators: Objects that generate random numbers. These are typically unsigned integer words filled with sequences of either 32 or 64 random bits.

- Generators: Objects that transform sequences of random bits from a BitGenerator into sequences of numbers that follow a specific probability distribution (such as uniform, Normal or Binomial) within a specified interval.
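A minimal sketch of the BitGenerator and Generator pairing is shown below. It assumes NumPy 1.17 or later, where default_rng, Generator and PCG64 are available; the seed values are arbitrary.

```python
import numpy as np
from numpy.random import Generator, PCG64

# default_rng builds a Generator on top of NumPy's default BitGenerator.
rng = np.random.default_rng(seed=12345)
print(rng.random(3))                        # uniform floats in [0.0, 1.0)
print(rng.normal(loc=0, scale=1, size=3))   # normally distributed floats

# The two layers can also be assembled explicitly: a BitGenerator wrapped by a Generator.
rng_a = Generator(PCG64(2019))
rng_b = Generator(PCG64(2019))
draws_a = rng_a.integers(0, 10, size=5)
draws_b = rng_b.integers(0, 10, size=5)

# Two generators seeded identically produce identical sequences.
print(np.array_equal(draws_a, draws_b))
```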

https://www.geeksforgeeks.org/random-sampling-in-numpy-random_sample-function/amp/


DISTRIBUTIONS

np.random.normal(loc, scale, size) produces Gaussian random numbers with the specified mean (loc) and standard deviation (scale).

np.random.uniform(low, high, size) produces uniform random numbers in the range [low, high).

https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/
