# <font color='blue'>Programming for Data Analysis - Assignment</font> <img src="https://i0.wp.com/www.simplifiedpython.net/wp-content/uploads/2018/11/Python-NumPy-14.png?w=672&ssl=1" width="350" height="350" align="right"/>

- **Course** Higher Diploma in Data Analytics, GMIT, Ireland 
- **Lecturer** Brian McGinley
- **Author** Mark Cotter
- **Email** g00376335@gmit.ie
- **Dates** October 2019 to November 2019

### <font color='blue'>Introduction</font>
This document is my Jupyter notebook for the GMIT module 'Programming for Data Analysis' assignment. The assignment includes a review of the NumPy.Random(**1**) package included in the Python(**2**) library NumPy(**3**). This Jupyter notebook uses the Python programming language to illustrate routines included in this package.

### <font color='blue'>Random Numbers and Seeds</font>
Modern computers use randomly generated numbers for a multitude of purposes. Two important properties of a sequence of random numbers(**4**) are independence and uniformity.
- Each number must be statistically independent of the previous number, i.e. the probability of one number occurring does not depend on another number
- The numbers must have an equal/uniform probability of occurring, i.e. a sample of the numbers should be evenly distributed across the range
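
The two properties above can be checked informally on a sample. The following is a minimal sketch rather than a formal statistical test: the seed value, the sample size and the use of `np.random.RandomState` are illustrative assumptions chosen so the check is repeatable.

```python
import numpy as np

# Fixed seed (arbitrary value) so the check is repeatable
rng = np.random.RandomState(12345)
sample = rng.rand(100000)

# Uniformity: each of 10 equal-width bins should hold roughly 10% of the values
counts, _ = np.histogram(sample, bins=10, range=(0.0, 1.0))
print("Bin counts:", counts)

# Independence: the correlation between each value and the next should be close to 0
lag1_corr = np.corrcoef(sample[:-1], sample[1:])[0, 1]
print("Lag-1 correlation: {:.4f}".format(lag1_corr))
```

Roughly equal bin counts and a correlation near zero are consistent with uniformity and independence, although they do not prove either property.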

Computers cannot really generate random numbers(**4**) on their own. They are limited by their programming and follow the rules of the algorithms it contains. Programs can generate what appears to be a random sequence of numbers, called **pseudo-random** numbers, but these in fact form a pattern with a very long repeat period. No matter how complex an algorithm is, if you know which algorithm was used and the starting point of the computation, the results can be predicted and repeated. This starting point is referred to as the **seed** of the algorithm. A seed can be either a fixed number or a value that changes from run to run, such as the current time in milliseconds.
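
The effect of the seed can be illustrated with a short sketch. It assumes the legacy `np.random.RandomState` class is used to hold each generator, and the seed values 2019 and 2020 are arbitrary.

```python
import numpy as np

# Two generators started from the same (arbitrary) seed
gen_a = np.random.RandomState(2019)
gen_b = np.random.RandomState(2019)
# A third generator started from a different seed
gen_c = np.random.RandomState(2020)

seq_a = gen_a.rand(5)
seq_b = gen_b.rand(5)
seq_c = gen_c.rand(5)

print("Same seed, same sequence:", np.array_equal(seq_a, seq_b))
print("Different seed, same sequence:", np.array_equal(seq_a, seq_c))
```

Generators that share a seed produce identical sequences, which is what makes pseudo-random results repeatable.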

Sharing a **pseudo-random seed** is often used for synchronising security measures, where the seed is known to two or more remote pieces of equipment but unknown to an outside observer. If the starting point of the algorithm is unknown, the output of the algorithm cannot be easily replicated.

Another variation of computer-based random number generation is the **'true random number generator'**, where the randomness of the number generated is based on a physical source connected to the computer, such as background noise or the unpredictable decay of a radioactive source. This type of generator does not require a **seed**.

Where large datasets have been gathered, sampling random(**4**) selections of the data can be used to reduce the quantity of data, thereby reducing the processing time for data analysis. Random numbers are also useful in data analysis for simulating data. Simulation(**5**) is often used to verify solutions for mathematical models of natural scientific systems. This allows analysts to make inferences and predictions from the models without having to undertake real experiments. Such experiments are often referred to as _Monte Carlo Methods_ (**6**). **'Pseudo-random number generators'** are more widely used for these experiments as the processes can be repeated after changing some variables and produce comparable results if the **seed** remains constant.
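
As a small illustration of a Monte Carlo method with a constant seed, the sketch below estimates pi from the fraction of random points that fall inside a quarter circle. The function name `estimate_pi`, the number of points and the seed are hypothetical choices made for this example only.

```python
import numpy as np

def estimate_pi(n_points, seed):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = np.random.RandomState(seed)
    x = rng.rand(n_points)
    y = rng.rand(n_points)
    # A point lies inside the quarter circle of radius 1 if x^2 + y^2 <= 1
    inside = (x**2 + y**2) <= 1.0
    # Area of quarter circle / area of unit square = pi / 4
    return 4.0 * inside.mean()

first_run = estimate_pi(200000, seed=1)
repeat_run = estimate_pi(200000, seed=1)
print("Estimate: {:.4f}, repeat: {:.4f}".format(first_run, repeat_run))
```

Because the seed is the same in both calls, the two estimates are identical, so any later change in the result can be attributed to a changed variable rather than to the random numbers.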

### <font color='blue'>Description and Purpose of NumPy.Random</font>

#### <font color='blue'>NumPy Python Library</font>
NumPy(**3**) is a library module for Python. NumPy is an abbreviation of 'Numerical Python' or 'Numeric Python' and is generally used for scientific computing and number crunching. The core of NumPy is the homogeneous multidimensional array object referred to as _'ndarray'_. These objects can only contain items of the same type and size, are defined by their shape and are usually of fixed size. NumPy is often used with other Python libraries such as Pandas to overcome some of these limitations.
NumPy utilises elements of compiled C and C++ programming code in the background so that operations undertaken on NumPy objects are very efficient, which is very desirable when dealing with large quantities of data.
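
The shape and single element type of an ndarray can be inspected directly. This is a minimal sketch with arbitrary values; the exact integer type printed depends on the platform.

```python
import numpy as np

# A 2 x 3 array built from a nested list (values are arbitrary)
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)   # dimensions of the array
print(arr.dtype)   # single element type shared by every item

# Mixing integers and floats: NumPy converts every item to one common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```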

The NumPy library, its sub-packages such as NumPy.Random(**1**) and many other libraries are not loaded when Python starts and have to be imported into a live Python session.

In [None]:
# Import NumPy library
import numpy as np
# Import seaborn library
import seaborn as sns

#### <font color='blue'>NumPy.Random Purpose</font>
Python's NumPy library contains a package called NumPy.Random(**1**). This package includes various routines for **pseudo-random number generation**. The random sequences of numbers generated by NumPy.Random routines are generally output as NumPy ndarray objects of the required value type, size and shape. The random number generation routines are divided into four main categories as follows:
- Simple random data
- Permutations
- Distributions
- Random generator

### <font color='blue'>Simple random data</font>
Routines included in this NumPy.Random category include rand(), randn(), randint(), random(), choice() and bytes(). The following code explores a number of these routines.

#### <font color='blue'>NumPy.Random.rand()</font>
NumPy.Random(**1**) routine rand() when called can take a variable number of arguments or dimensions (d0,d1,...,dn).  
When run without an argument <font color='blue'>[1]</font>, it returns a random floating point number in the range 0.0 to 1.0. 

When the rand() routine is run with one or more integer number arguments <font color='blue'>[2]</font>, it returns a NumPy ndarray containing random floating point numbers. The number of dimensions of the ndarray depends on the number of arguments, and the size of each dimension depends on their numerical values. The values of the floating point numbers generated are uniformly distributed between 0.0 and 1.0.
The NumPy.Random(**1**) webpage for rand() notes that the range is **[0, 1)** meaning that the value range is **(0 <= value < 1)** so 0.0 is a possible number, but 1.0 is not a possible number.
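
The link between the arguments, the array shape and the half-open range can be checked with a short sketch. The seed and the dimensions (4, 3, 2) are arbitrary, and `np.random.RandomState` is used here only to keep the result repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed
values = rng.rand(4, 3, 2)

# Three arguments give a three-dimensional array of shape (4, 3, 2)
print(values.shape)
print(values.ndim)

# Every value should satisfy 0 <= value < 1
print(((values >= 0.0) & (values < 1.0)).all())
```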

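Negative values and floating point values are not acceptable argument values for the rand() routine <font color='blue'>[3]</font>; these generate a ValueError and a TypeError respectively. A zero value argument is accepted and returns an empty ndarray. An AttributeError only arises if the routine is called as np.rand() rather than np.random.rand(), because rand() is not part of the top-level NumPy namespace.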

<font color='blue'>[1] _No arguments_ </font>

In [None]:
# Display type
print(type(np.random.rand()))
# rand() routine without an argument
np.random.rand()

<font color='blue'>[2] _One or more integer arguments_ </font>

In [None]:
# Display variable type
print(type(np.random.rand(2)))
# rand() routine with one integer argument
np.random.rand(2)

In [None]:
# rand() routine with two integer arguments
np.random.rand(2,3)

In [None]:
# rand() routine with three integer arguments
np.random.rand(2,3,2)

In [None]:
# Create test 1-d array with 100,000 values
test = np.random.rand(100000)
# Print Min and Max value in 'test' to 10 decimal places
print("Limits of values in 'test' Min: {:,.10f} and Max: {:,.10f}".format(test.min(), test.max()))
# Plot distribution of values in test
# Set seaborn plot size. Code adapted from https://stackoverflow.com/a/47955814
sns.set(rc={'figure.figsize':(10,5)})
# Code adapted from https://seaborn.pydata.org/tutorial/distributions.html
sns.distplot(test);

<font color='blue'>[3] _Argument errors_ </font>

In [None]:
# rand() routine with a zero value argument returns an empty ndarray
np.random.rand(0)
# Calling np.rand(0) instead (without '.random') generates AttributeError

In [None]:
# rand() routine with negative value argument
#np.random.rand(-2) # generates ValueError

In [None]:
# rand() routine with one floating point number argument
#np.random.rand(2.2) # generates TypeError

#### <font color='blue'>NumPy.Random.randn()</font>
NumPy.Random(**1**) routine randn() when called can take a variable number of arguments or dimensions (d0,d1,...,dn).  
When run without an argument <font color='blue'>[1]</font>, it returns a random floating point number that can be positive or negative.

When the randn() routine is run with one or more integer number arguments <font color='blue'>[2]</font>, it returns a NumPy ndarray containing random positive or negative floating point numbers. The number of dimensions of the ndarray depends on the number of arguments, and the size of each dimension depends on their numerical values. The values of the floating point numbers generated are drawn from the standard normal (Gaussian) distribution, which has a mean of 0 and a standard deviation of 1.
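
randn() samples the standard normal distribution (mean 0, standard deviation 1), which can be checked on a large sample. The sketch below is illustrative only: the seed, the sample size and the values of `mu` and `sigma` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.RandomState(7)  # arbitrary seed
sample = rng.randn(100000)
print("Mean: {:.3f}, standard deviation: {:.3f}".format(sample.mean(), sample.std()))

# Shift and scale to a normal distribution with a chosen mean and standard deviation
mu, sigma = 10.0, 2.0  # illustrative values
scaled = mu + sigma * rng.randn(100000)
print("Mean: {:.3f}, standard deviation: {:.3f}".format(scaled.mean(), scaled.std()))
```

The sample mean and standard deviation should be close to, but not exactly equal to, the target values.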

<font color='blue'>[1] _No arguments_ </font>

In [None]:
# Display type
print(type(np.random.randn()))
# randn() routine without an argument
np.random.randn()

<font color='blue'>[2] _One or more arguments_ </font>

In [None]:
# Display type
print(type(np.random.randn(2)))
# randn() routine with one integer argument
np.random.randn(2)

In [None]:
# randn() routine with two integer arguments
np.random.randn(2,3)

In [None]:
# randn() routine with three integer arguments
np.random.randn(2,3,2)

#### <font color='blue'>NumPy.Random.randint()</font>
NumPy.Random(**1**) routine randint() when called can take a variable number of arguments (low, high=None, size=None, dtype='l'). randint() requires at least one argument <font color='blue'>[1]</font>. Using no argument generates a TypeError.

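When one argument is used <font color='blue'>[2]</font>, randint() generates a random integer from 0 up to, but not including, the argument value. When two arguments are used, it generates a random integer from the first value (low) up to, but not including, the second value (high).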



<font color='blue'>[1] _No arguments_ </font>

In [None]:
# randint() routine without an argument
#np.random.randint() # Generates a TypeError and notes the variable takes at least 1 positional argument


<font color='blue'>[2] _One or more integer arguments_ </font>

In [None]:
# Display variable type
print(type(np.random.randint(2)))
# randint() routine with one integer argument
np.random.randint(10)

In [None]:
# randint() routine with two integer arguments
np.random.randint(10,20)
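
The ranges produced by randint() can be checked by drawing many values at once with the size argument. This is a minimal sketch: the seed, the sample sizes and the dice example are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(3)  # arbitrary seed

# One argument: integers from 0 up to, but not including, 10
single = rng.randint(10, size=1000)
# Two arguments: integers from 10 up to, but not including, 20
double = rng.randint(10, 20, size=1000)
# size can also be a tuple giving the shape of the output array
grid = rng.randint(1, 7, size=(2, 3))  # e.g. six dice rolls

print(single.min(), single.max())
print(double.min(), double.max())
print(grid.shape)
```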

### <font color='blue'>Permutations</font>


### <font color='blue'>Distributions</font>


### <font color='blue'>Research References</font>

**(1)** _NumPy.Random_
- https://docs.scipy.org/doc/numpy-1.16.0/reference/routines.random.html
- https://numpy.org/doc/1.17/reference/random/index.html

**(2)** _Python Programming Language_
- https://www.python.org/

**(3)** _NumPy_
- https://docs.scipy.org/doc/numpy-1.16.0/
- https://www.quora.com/What-is-NumPy
- https://docs.scipy.org/doc/numpy-1.16.0/user/whatisnumpy.html
- https://numpy.org/devdocs/reference/arrays.ndarray.html
- https://cloudxlab.com/blog/numpy-pandas-introduction/

**(4)** _Random Numbers_
- https://analyticstraining.com/random-numbers-applications/
- https://engineering.mit.edu/engage/ask-an-engineer/can-a-computer-generate-a-truly-random-number/
- https://en.wikipedia.org/wiki/Random_seed
- https://www.random.org/randomness/
- https://www.eg.bucknell.edu/~xmeng/Course/CS6337/Note/master/node37.html
- https://en.wikipedia.org/wiki/Independence_(probability_theory)
- https://www.ques10.com/p/3213/explain-the-properties-of-random-numbers/
- https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/

**(5)** _Computer Simulation_
- https://en.wikipedia.org/wiki/Computer_simulation
- https://www.britannica.com/technology/computer-simulation

**(6)** _Monte Carlo Methods_
- https://towardsdatascience.com/an-overview-of-monte-carlo-methods-675384eb1694

**(7)** _Markdown Formatting and Image Sources_
- https://stackoverflow.com/questions/46439874/display-image-jupyter-notebook-aligned-centre
- https://stackoverflow.com/questions/19746350/how-does-one-change-color-in-markdown-cells-ipython-jupyter-notebook
- https://i0.wp.com/www.simplifiedpython.net/wp-content/uploads/2018/11/Python-NumPy-14.png?w=672&ssl=1
