<center> <h1> Programming for Data Analytics 2021 NumPy Assignment </h1> </center>


<div align="center"> <b> Student Name: Kate McGrath </b> </div>
<div align="center"> <b> Student Number: G00398908 </b>  </div>
<div align="center"> <b> Submission Date: 11/11/2021 </b> </div>

<center> <h1> Part One: Overview of NumPy Random Package </h1> </center>

### Introduction

NumPy, or Numerical Python, is one of the most widely used scientific libraries in Python programming [1]. 

An open-source library, created in 2005 by Travis Oliphant [2], it has become the de facto standard for working with numerical data in Python, and is used by everyone from beginners to experienced researchers at the cutting edge of scientific and commercial R&D [3]. Additionally, several other Python analytical libraries have been built on top of NumPy, including Pandas, Scikit-learn and Matplotlib [1]. 

The NumPy random package is a module within NumPy that allows for the generation of pseudorandom numbers. Random numbers are those whose sequence of generation cannot be predicted. Computers are deterministic: they follow fixed rules and algorithms, so the same input always yields the same output. Hence, a computer cannot generate truly random numbers from an algorithm alone. The main purpose of the NumPy random module is to overcome this limitation by generating pseudorandom numbers from a variety of distributions; these are sequences of numbers that have the statistical properties expected of random numbers but are generated deterministically. 

This notebook will discuss the overall purpose of NumPy and the random package, and provide an overview of some of the features and distributions available within the random module. 
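The deterministic nature of pseudorandom numbers described above can be seen by seeding two generators with the same value. The sketch below is illustrative only: the seed values 42 and 43 and the variable names are arbitrary assumptions, and it uses `np.random.default_rng`, which is available in NumPy 1.17 and later.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed value.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.random(5)
draws_b = rng_b.random(5)

# Identical seeds give identical "random" sequences.
same_seed_match = np.array_equal(draws_a, draws_b)
print(draws_a)
print(draws_b)
print("Same seed, same sequence:", same_seed_match)

# A generator with a different seed produces a different sequence.
rng_c = np.random.default_rng(seed=43)
draws_c = rng_c.random(5)
print("Different seed, same sequence:", np.array_equal(draws_a, draws_c))
```

Because the sequence is fully determined by the seed, fixing the seed is a common way to make an analysis reproducible.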

### Getting Started with NumPy
The only prerequisite to installing NumPy is Python itself. For convenience, the Anaconda distribution is recommended for beginners as it comes with NumPy, Python and other statistical Python packages pre-installed. 

To install NumPy on its own, the following command can be used:


In [None]:
pip install numpy

In order to use NumPy Random, the library should be imported using the following command:

In [None]:
import numpy as np
from numpy import random

## Overview of NumPy Arrays ##

NumPy's core object is the multidimensional array (ndarray), which is essentially a fast and flexible container, built to facilitate batch numerical operations on blocks of data. 

### NumPy Arrays vs Python Lists

The key advantage of using NumPy arrays for numerical computations rather than Python lists is that arrays are designed for efficiency on large blocks of data. NumPy arrays are significantly faster than Python lists for the following reasons:

1. NumPy arrays are homogeneous, i.e. they hold a single data type, and are stored in a contiguous block of memory. To access the next element stored in an array, the program simply moves to the next memory address. In contrast, Python lists are heterogeneous (not confined to a single data type): a list stores references to separate objects that may sit at scattered memory locations. Both of these factors contribute to processing overhead.
2. NumPy operations are vectorized: a whole-array task is handed to pre-compiled loops rather than being interpreted element by element. Depending on the operation and on how NumPy was built, parts of the work may also be processed in parallel.
3. NumPy's core routines are written in C, C++ and Fortran. These are compiled, lower-level languages and therefore execute faster than equivalent interpreted Python code.

Source: https://www.geeksforgeeks.org/why-numpy-is-faster-in-python/
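The first point in the list above, homogeneous data held in one contiguous block, can be inspected directly. This is a minimal sketch; the names `mixed_list` and `arr` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# A list may mix types; each element is a separate Python object.
mixed_list = [1, 2.5, "three"]
list_types = [type(item).__name__ for item in mixed_list]
print(list_types)  # ['int', 'float', 'str']

# An array stores one data type; mixing ints and floats upcasts to float64.
arr = np.array([1, 2.5, 3])
print(arr.dtype)  # float64

# The elements sit side by side in a single memory block.
print(arr.flags["C_CONTIGUOUS"])  # True
print(arr.itemsize)  # 8 (bytes per float64 element)
print(arr.strides)   # (8,) -> step 8 bytes to reach the next element
```

The `strides` value shows the "move to the next memory address" idea: each element is found a fixed number of bytes after the previous one.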

In addition to the speed advantages, NumPy arrays allow an operation to be applied to every element of the array without explicit loops, and they consume less memory than their Python list counterparts. 
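As a small illustration of loop-free operations, the sketch below doubles every value in a list and in an array. The `prices` data and the variable names are made up for the example.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# With a list, an explicit loop (here a comprehension) is needed.
doubled_list = [p * 2 for p in prices]

# With an array, the operation is applied to every element at once.
price_array = np.array(prices)
doubled_array = price_array * 2

print(doubled_list)   # [20.0, 40.0, 60.0]
print(doubled_array)  # [20. 40. 60.]

# Multiplying a list by 2 repeats the list instead of doubling its values.
repeated_list = prices * 2
print(repeated_list)  # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
```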

The code below illustrates how lists and arrays differ in terms of speed. Two Python lists and two NumPy arrays of equal size are initialized. First, the elements of list 1 and list 2 are paired and multiplied; this process is then repeated for the NumPy arrays. Finally, the time taken to perform each operation is compared. When the code is run, we can see a significant difference in processing speed between the two approaches. Source for code: https://www.geeksforgeeks.org/why-numpy-is-faster-in-python/.

In [6]:
import numpy as np
import time
from sys import getsizeof

size = 1000000

# Build two real Python lists (range() on its own returns a lazy range object)
list1 = list(range(size))
list2 = list(range(size))

# Record the start time so it can be subtracted from the finish time
startTime = time.time()

newlist = []

# zip() pairs each list1 value with the corresponding list2 value;
# each pair is multiplied and the product is appended to newlist
for a, b in zip(list1, list2):
    newlist.append(a * b)

TimeNow = time.time()
TimeTaken = TimeNow - startTime
print("Time taken to complete in seconds: {}".format(TimeTaken))

#Declaring Arrays
array1 = np.arange(size)  
array2 = np.arange(size)

startTimeNumpy = time.time()

NewArray = array1* array2

TimeNowNumpy = time.time()
TimeTakenNumpy = TimeNowNumpy - startTimeNumpy
print("Time taken to complete in seconds: {}".format(TimeTakenNumpy))
print("Time difference in seconds: {}".format(TimeTaken - TimeTakenNumpy))

Time taken to complete in seconds: 0.41204047203063965
Time taken to complete in seconds: 0.003030538558959961
Time difference in seconds: 0.4090099334716797
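A single `time.time()` measurement can vary from run to run. As a hedged alternative, the sketch below repeats each operation using the standard-library `timeit` module and keeps the best result. The function names, the smaller `size`, and the explicit `dtype=np.int64` (chosen so the products cannot overflow where the default integer is 32-bit) are assumptions made for this example.

```python
import timeit
import numpy as np

size = 100_000  # smaller than the example above so the repeats run quickly
list1 = list(range(size))
list2 = list(range(size))
array1 = np.arange(size, dtype=np.int64)
array2 = np.arange(size, dtype=np.int64)

def multiply_lists():
    # Element-by-element multiplication in pure Python.
    return [a * b for a, b in zip(list1, list2)]

def multiply_arrays():
    # Vectorized multiplication in NumPy.
    return array1 * array2

# Call each function 10 times, repeat that 3 times, and keep the best total.
list_time = min(timeit.repeat(multiply_lists, number=10, repeat=3))
array_time = min(timeit.repeat(multiply_arrays, number=10, repeat=3))

print("Lists:  {:.6f} seconds for 10 runs".format(list_time))
print("Arrays: {:.6f} seconds for 10 runs".format(array_time))
```

The exact figures depend on the machine, so no expected output is shown here.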


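Furthermore, in this run the Python list consumes more than twice the amount of memory used by the NumPy array. Note that `getsizeof` counts only the list's own storage of references, not the integer objects those references point to, so the true gap is larger.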

In [5]:
#Checking difference in memory consumption between numpy array and python list
totalMemoryList = getsizeof(newlist)
totalMemoryNumpy = getsizeof(NewArray)

print("Total memory consumed by Python List: {} bytes".format(totalMemoryList))
print("Total memory consumed by Numpy Array: {} bytes".format(totalMemoryNumpy))
print("Memory difference in bytes: {}".format(totalMemoryList - totalMemoryNumpy))

Total memory consumed by Python List: 8697456 bytes
Total memory consumed by Numpy Array: 4000104 bytes
Memory difference in bytes: 4697352
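For a fuller picture of memory use, the size of each integer object held by the list can be added to the size of the list itself, and the array's data buffer can be read from its `nbytes` attribute. This is a rough sketch under stated assumptions: the variable names and the small `size` are illustrative, and the list total is an estimate rather than an exact measurement.

```python
import sys
import numpy as np

size = 1000
values = list(range(size))
array = np.arange(size, dtype=np.int64)

# getsizeof on a list counts the list object and its references only.
list_container_bytes = sys.getsizeof(values)

# Adding the size of each integer object gives a fuller estimate.
list_total_bytes = list_container_bytes + sum(sys.getsizeof(v) for v in values)

# nbytes is the size of the array's data buffer: size * itemsize.
array_data_bytes = array.nbytes

print("List container only: {} bytes".format(list_container_bytes))
print("List plus its integers: {} bytes".format(list_total_bytes))
print("Array data buffer: {} bytes".format(array_data_bytes))
```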


### NumPy Array Attributes

NumPy arrays are defined by the following key attributes:

1. **Dimensions**: NumPy arrays can be considered as a grid of values. A two-dimensional array, for example, has a defined number of rows and columns. In NumPy, each dimension is called an axis. A vector consists of a single dimension, a matrix contains two dimensions, and a tensor comprises three or more dimensions. Axes are numbered starting at 0 and are used to locate and operate on elements along a specific dimension. Below is an illustration of one-, two- and three-dimensional arrays and their axes/elements. 

<p align="center">
    <img src="numpy-array-xyz-axis.png"
     alt="One-, two- and three-dimensional NumPy arrays and their axes"
     style="float: left; margin-right: 10px;"/> 
</p>


Run the code below to see examples of one-, two- and three-dimensional arrays.

In [9]:
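oneDArray = np.array([1, 2, 3])
print(oneDArray)

twoDArray = np.array([(1, 2), (3, 4)])
print(twoDArray)

# A three-dimensional array needs three levels of nesting:
# here, two 2x2 blocks stacked along axis 0
threeDArray = np.array([[(1, 2), (3, 4)], [(5, 6), (7, 8)]])
print(threeDArray)

[1 2 3]
[[1 2]
 [3 4]]
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]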
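The number of dimensions and the role of each axis can also be checked in code. The sketch below uses a small illustrative matrix (not taken from the notebook) together with the `ndim` and `shape` attributes and the `axis` argument of `sum`.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # 2 -> two dimensions (axes)
print(matrix.shape)  # (2, 3) -> 2 rows along axis 0, 3 columns along axis 1

# axis=0 runs down the rows, so summing along it gives one total per column.
column_totals = matrix.sum(axis=0)
print(column_totals)  # [5 7 9]

# axis=1 runs across the columns, so summing along it gives one total per row.
row_totals = matrix.sum(axis=1)
print(row_totals)  # [ 6 15]
```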


## References
1. Analytics Vidhya. 2021. The Ultimate NumPy Tutorial for Data Science Beginners. [online] Available at: <https://www.analyticsvidhya.com/blog/2020/04/the-ultimate-numpy-tutorial-for-data-science-beginners/> [Accessed 16 October 2021].

2. W3schools.com. 2021. Introduction to NumPy. [online] Available at: <https://www.w3schools.com/python/numpy/numpy_intro.asp> [Accessed 16 October 2021].

3. Numpy.org. 2021. NumPy: the absolute basics for beginners — NumPy v1.21 Manual. [online] Available at: <https://numpy.org/doc/stable/user/absolute_beginners.html> [Accessed 16 October 2021].

4. Numpy.org. 2021. NumPy quickstart — NumPy v1.21 Manual. [online] Available at: <https://numpy.org/doc/stable/user/quickstart.html> [Accessed 16 October 2021].

5. Medium. 2021. NumPy for Dummies. [online] Available at: <https://medium.datadriveninvestor.com/numpy-for-dummies-3dbd5f946731> [Accessed 16 October 2021].

6. Numpy.org. 2021. Installing Numpy. [online] Available at: <https://numpy.org/install/> [Accessed 16 October 2021].

7. DataCamp. 2021. [online] Available at: <https://www.datacamp.com/community/tutorials/python-numpy-tutorial#a> [Accessed 16 October 2021].