# Short NumPy Tutorial

* NumPy is a Python package for numerical computation. 
* The core data type is the array.
* NumPy functions operate on arrays.

NumPy (conventionally imported as np) supports creating arrays with different characteristics:
* **np.array([list of numbers])** - array from a list
* **np.arange(start, stop, step)** - array whose entries run from start up to, but not including, stop, each entry being step larger than the previous one
* **np.zeros((rows, cols))** - array of zeros of shape (rows, cols)
* **np.ones((rows, cols))** - array of ones of shape (rows, cols)
* **np.random.randint(start, stop, num)** - array of num random integers from start (inclusive) to stop (exclusive)
* **np.random.rand(num)** - array of num random floats in the interval [0, 1)

as well as a wide variety of arithmetic operations on arrays
* **np.add()** - Addition (e.g., 1 + 1 = 2)
* **np.subtract()** - Subtraction (e.g., 3 - 2 = 1)
* **np.negative()** - Unary negation (e.g., -2)
* **np.multiply()** - Multiplication (e.g., 2 * 3 = 6)
* **np.divide()** - Division (e.g., 3 / 2 = 1.5)
* **np.floor_divide()** - Floor division (e.g., 3 // 2 = 1)
* **np.power()** - Exponentiation (e.g., 2 ** 3 = 8)
* **np.mod()** - Modulus/remainder (e.g., 9 % 4 = 1)

and standard trigonometry functions
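* **np.sin()** - sine
* **np.cos()** - cosine
* **np.tan()** - tangent
* **np.arcsin()** - arc sine (np.asin is an alias in NumPy 2.0 and later)
* **np.arccos()** - arc cosine (np.acos is an alias in NumPy 2.0 and later)
* **np.arctan()** - arc tangent (np.atan is an alias in NumPy 2.0 and later)
* **np.hypot()** - given the two legs of a right triangle, returns the hypotenuse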
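
The functions listed above can be tried out in a single cell. The following is a minimal sketch rather than part of the original notebook; all variable names and input values are made up for illustration.

```python
import numpy as np

# --- Array creation ---
from_list = np.array([1, 2, 3])        # array built from a Python list
evens = np.arange(0, 10, 2)            # 0, 2, 4, 6, 8 (the stop value 10 is excluded)
zeros = np.zeros((2, 3))               # 2x3 array filled with 0.0
ones = np.ones((2, 3))                 # 2x3 array filled with 1.0
dice = np.random.randint(1, 7, 5)      # five random integers from 1 to 6
fractions = np.random.rand(4)          # four random floats in [0, 1)

# --- Element-wise arithmetic ---
a = np.array([3, 9, 2])
b = np.array([2, 4, 3])
total = np.add(a, b)                   # same as a + b
quotient = np.divide(a, b)             # same as a / b (result is floating point)
floored = np.floor_divide(a, b)        # same as a // b
remainder = np.mod(a, b)               # same as a % b
powered = np.power(a, b)               # same as a ** b

# --- Trigonometry (angles are in radians) ---
angles = np.array([0.0, np.pi / 2])
sines = np.sin(angles)
hypotenuse = np.hypot(3.0, 4.0)        # right triangle with legs 3 and 4

print(evens, total, floored, remainder, powered)
print(sines, hypotenuse)
```

Each arithmetic function has an equivalent operator (noted in the comments), so `a + b` and `np.add(a, b)` give the same array.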

Note: this is a small fraction of the functions provided by NumPy. For more information see:
* https://numpy.org/doc/stable/reference/index.html
* https://www.w3schools.com/python/numpy_intro.asp

#### Note: You can also write functions in Jupyter Notebooks

In [1]:
import numpy as np
import pandas as pd

In [2]:
def hello(name):
    print("Hello", name)

hello("DATA")

Hello DATA


#### Installing NumPy

Option A: **Using Anaconda**

In [3]:
#type in a terminal
#conda install numpy

Option B: **Using pip**

In [4]:
#type in a terminal
#pip install numpy

#### Using NumPy


In [5]:
import numpy as np
print(np.__version__)


2.3.5


In [6]:
#importing numpy
import numpy as np

In [7]:
#list with % of low emission Licensed Taxis operating in York per year from 2011 to 2019
yorkTaxis = [0.00, 1.00, 3.00, 6.00, 8.00, 13.00, 16.10, 17.60, 19.90]

#list with % of low emission Licensed Taxis operating in Leeds per year from 2011 to 2019
leedsTaxis = [0.50, 1.20, 2.30, 5.00, 7.80, 14.30, 17.10, 17.30, 20.90]


In [8]:
#convert lists into numpy array
york  = np.array(yorkTaxis)
leeds = np.array(leedsTaxis)

In [9]:
#print the shape of the arrays
print(york.shape)
print(leeds.shape)

(9,)
(9,)


In [10]:
#print the data types of the arrays
print(york.dtype)
print(leeds.dtype)

float64
float64


In [11]:
#get the values for York and Leeds from 2015 until 2019
#note: index 4 corresponds to 2015
print(york[4:])
print(leeds[4:])

[ 8.  13.  16.1 17.6 19.9]
[ 7.8 14.3 17.1 17.3 20.9]


In [12]:
#find the min, max, and mean of each city
yorkMin  = np.min(york)
yorkMax  = np.max(york)
yorkMean = np.mean(york)
print(yorkMin, yorkMax, yorkMean)

leedsMin  = np.min(leeds)
leedsMax  = np.max(leeds)
leedsMean = np.mean(leeds)
print(leedsMin, leedsMax, leedsMean)

0.0 19.9 9.399999999999999
0.5 20.9 9.600000000000001


In [13]:
#concatenate the two arrays
allData = np.concatenate((york, leeds))
print(allData)

[ 0.   1.   3.   6.   8.  13.  16.1 17.6 19.9  0.5  1.2  2.3  5.   7.8
 14.3 17.1 17.3 20.9]


In [14]:
#sort the concatenated array
sortedAllData = np.sort(allData)
print(sortedAllData)

[ 0.   0.5  1.   1.2  2.3  3.   5.   6.   7.8  8.  13.  14.3 16.1 17.1
 17.3 17.6 19.9 20.9]


In [15]:
#reshape the concatenated array into a 2D array with dimensions 2x9
allData2D = allData.reshape(2, 9)
print(allData2D)

[[ 0.   1.   3.   6.   8.  13.  16.1 17.6 19.9]
 [ 0.5  1.2  2.3  5.   7.8 14.3 17.1 17.3 20.9]]


In [None]:
#transform the concatenated array into a 2D array where each column holds values for each city
allData2D = allData.reshape(2, 9).T
print(allData2D)
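
Once the cities are laid out as columns, a single city or a single year can be picked out by indexing. The sketch below re-declares the York and Leeds figures so that it runs on its own; the name `by_city` is introduced here for illustration only.

```python
import numpy as np

# Same figures as the york and leeds arrays used in this tutorial
york = np.array([0.00, 1.00, 3.00, 6.00, 8.00, 13.00, 16.10, 17.60, 19.90])
leeds = np.array([0.50, 1.20, 2.30, 5.00, 7.80, 14.30, 17.10, 17.30, 20.90])

# Stack the two cities as rows, then transpose so each city becomes a column
by_city = np.concatenate((york, leeds)).reshape(2, 9).T

print(by_city.shape)    # (9, 2): nine years, two cities
print(by_city[:, 0])    # first column: York
print(by_city[:, 1])    # second column: Leeds
print(by_city[0])       # first row: both cities in 2011
```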

## A few tasks on using NumPy 

**T1) Find the difference of low emission licensed taxis between York and Leeds each year**

In [17]:
#Write your code here
#Leeds minus York: a negative value means York had the higher percentage that year
difference = np.subtract(leeds, york)
print("the difference of low emission licensed taxis is :", difference)

the difference of low emission licensed taxis is : [ 0.5  0.2 -0.7 -1.  -0.2  1.3  1.  -0.3  1. ]


**T2) Complete the function minMaxDiff which returns the minimum and maximum differences in % of low emission licensed taxis between York and Leeds.**

In [None]:
def minMaxDiff(diff):
    #Write your code here
    #work on the diff argument rather than the global difference array
    min_difference = np.min(diff)
    max_difference = np.max(diff)
    return (min_difference, max_difference)

diff = difference
minMaxDiff(diff)

**T3) Write a function that counts in how many years York had a higher percentage of low emission taxis than Leeds**

Hint: you may want to look at the NumPy functions _count_nonzero_ or _where_
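
As a rough illustration of the two functions named in the hint, here is a sketch on made-up numbers that are unrelated to the taxi data. The names `values`, `is_negative`, `how_many` and `positions` are hypothetical.

```python
import numpy as np

# Hypothetical values, unrelated to the taxi data
values = np.array([4, -1, 0, 7, -3])

is_negative = values < 0                    # boolean mask: [False, True, False, False, True]
how_many = np.count_nonzero(is_negative)    # True counts as nonzero
positions = np.where(is_negative)[0]        # indices where the mask is True

print(how_many)     # 2
print(positions)    # [1 4]
```

`np.count_nonzero` answers "how many entries match", while `np.where` answers "which entries match".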

In [None]:
def countEmissions(a):
    #Write your code here
    #a holds Leeds minus York, so negative entries are years where York was higher
    return np.count_nonzero(np.asarray(a) < 0)

countEmissions(diff)

**T4) Using the data for York taxis, create a 2D array in which each entry from the york array is associated with its year, e.g., [[2011, 2012, ...], [0.00, 1.00, ...]]**

Hints: 
* use the NumPy commands for array creation from above
* check the vstack NumPy function
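
To give a feel for what vstack does, the sketch below stacks two small made-up arrays. The names `ids`, `counts` and `table` are hypothetical and the values are unrelated to the taxi data.

```python
import numpy as np

# Hypothetical values, unrelated to the taxi data
ids = np.arange(1, 4)                 # [1 2 3]
counts = np.array([10, 20, 30])

table = np.vstack((ids, counts))      # each input array becomes one row
print(table.shape)                    # (2, 3)
print(table[0])                       # first row: the ids
print(table[1])                       # second row: the counts
```

The input arrays must have the same length, since each one becomes a row of the result.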


In [18]:
#Write your code here
import numpy as np
Year = np.arange(2011,2020)
York_array = np.array(yorkTaxis)
York_2D = np.vstack((Year,York_array))
print("is",York_2D)

is [[2.011e+03 2.012e+03 2.013e+03 2.014e+03 2.015e+03 2.016e+03 2.017e+03
  2.018e+03 2.019e+03]
 [0.000e+00 1.000e+00 3.000e+00 6.000e+00 8.000e+00 1.300e+01 1.610e+01
  1.760e+01 1.990e+01]]


## Enabling pretty printing

In [None]:
#suppress=True turns off scientific notation when printing floats
np.set_printoptions(suppress=True)
York_2D

**Notes** 

* NumPy arrays are (primarily) homogeneous, i.e., every entry has the same data type. Stacking the integer years with the float percentages therefore converts the years to floats, which is why they were printed above in scientific notation: $2.011e+03 = 2.011 \times 10^3 = 2011$.
* NumPy arrays can hold heterogeneous data types, but this is not ideal. For this reason, the following practicals switch to the Pandas library, which has native support for manipulating datasets with heterogeneous data types.
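
The type conversion described in these notes can be observed directly. The following is a minimal sketch using the first three years and York values; the names `years`, `shares`, `combined` and `mixed` are introduced here for illustration.

```python
import numpy as np

years = np.arange(2011, 2014)             # integer array
shares = np.array([0.0, 1.0, 3.0])        # float64 array (first three York values)

combined = np.vstack((years, shares))
print(years.dtype.kind)    # 'i' (integer)
print(combined.dtype)      # float64: the years were converted to floats

# Temporarily switch off scientific notation for this print only
with np.printoptions(suppress=True):
    print(combined)

# Mixing numbers and text forces every entry to become a string
mixed = np.array([2011, "York"])
print(mixed.dtype.kind)    # 'U' (Unicode string)
```

`np.printoptions` is used here as a context manager so the setting does not leak into later cells; `np.set_printoptions(suppress=True)` changes it for the rest of the session.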

#### Read Chapter 2 - Introduction to NumPy from the Python Data Science Handbook
https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html