# Introduction to NumPy


The learning objectives of this section are:

* Understand the advantages of vectorised mathematics using NumPy over standard Python approaches
* Create NumPy arrays
    * Convert lists and tuples to numpy arrays 
    * Create (initialise) arrays
* Inspect the structure and content of arrays
* Subset, slice, index and iterate through arrays
* Compare computation times in NumPy and standard Python lists

### NumPy Basics

NumPy is a library written for scientific computing and data analysis. Its name stands for Numerical Python.

The most basic object in NumPy is the ```ndarray```, or simply an ```array```, which is an **n-dimensional, homogeneous** array. By homogeneous, we mean that all the elements in a NumPy array have to be of the **same data type**, which is commonly numeric (float or integer). 
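To see what homogeneity means in practice, here is a small sketch (the variable names are just illustrative). When you pass values of mixed types, NumPy does not store them side by side; it converts every element to one common type.

```python
import numpy as np

# Mixing integers and floats: every element is converted to float64
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)    # float64

# Mixing numbers and a string: every element is converted to a string
mixed_types = np.array([1, "a"])
print(mixed_types.dtype)      # a Unicode string dtype, such as <U21
print(mixed_types[0] == "1")  # True: the integer 1 is now the string "1"
```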

Let's see some examples of arrays.

In [None]:
# Import the numpy library
# np is simply an alias; you may use any other alias, though np is the standard convention
import numpy as np

In [None]:
# Creating a 1-D array using a list
array_1d = np.array([2, 4, 5, 6, 7, 9])
print(array_1d)
print(type(array_1d))

In [None]:
# Creating a 2-D array using two lists
array_2d = np.array([[2, 3, 4], [5, 8, 7]])
print(array_2d)

In NumPy, dimensions are called **axes**. The 2-D array above has two axes: the first has 2 elements (the rows) and the second has 3 (the columns).  
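A minimal sketch of how the axes of the same 2-D array show up in code: ```shape``` lists the length of each axis, and many functions accept an ```axis``` argument that says which axis to collapse.

```python
import numpy as np

array_2d = np.array([[2, 3, 4], [5, 8, 7]])

# shape lists the number of elements along each axis: (axis 0, axis 1)
print(array_2d.shape)        # (2, 3)

# Summing along axis 0 collapses the rows, giving one value per column
print(array_2d.sum(axis=0))  # [ 7 11 11]

# Summing along axis 1 collapses the columns, giving one value per row
print(array_2d.sum(axis=1))  # [ 9 20]
```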

### Advantages of NumPy 

What is the advantage of arrays over lists, specifically for data analysis? Put crudely, it is **speed and convenience**:<br>
1. You can write **vectorised** code on NumPy arrays, but not on lists, which is more **convenient**. 
2. NumPy is **much faster** than standard Python approaches to computation.

Let's see an example of convenience first; we'll see one for speed later. 

Say you have two lists of numbers and want to calculate their element-wise product. With standard Python lists, you would need to map a lambda function over the two lists (or, worse, write a ```for``` loop), whereas with NumPy you simply multiply the arrays.

In [None]:
list_1 = [3, 6, 7, 5]
list_2 = [4, 5, 1, 7]

# the list way to do it: map a product function to the two lists
product_list = list(map(lambda x, y: x*y, list_1, list_2))
print(product_list)


In [None]:
# The numpy array way to do it: simply multiply the two arrays
array_1 = np.array(list_1)
array_2 = np.array(list_2)

array_3 = array_1*array_2
print(array_3)
print(type(array_3))

As you can see, the numpy way is clearly more convenient than the list way. 
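Multiplication is not special here. The following sketch, reusing the same two lists, shows that other arithmetic and comparison operators are also applied element by element, and that a single number (a scalar) is applied to every element.

```python
import numpy as np

array_1 = np.array([3, 6, 7, 5])
array_2 = np.array([4, 5, 1, 7])

# Element-wise addition and power work the same way as multiplication
print(array_1 + array_2)   # [ 7 11  8 12]
print(array_1 ** 2)        # [ 9 36 49 25]

# A scalar is applied to every element
print(array_1 * 10)        # [30 60 70 50]

# Comparisons are element-wise too, returning a boolean array
print(array_1 > array_2)   # [False  True  True False]
```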

### Creating NumPy Arrays 

There are multiple ways to create NumPy arrays, the most common ones being:
* Convert lists or tuples to arrays using ```np.array()```, as done above
* Initialise arrays of fixed size (when the size is known) 


In [None]:
# Convert lists or tuples to arrays using np.array()
# Note that np.array(2, 5, 6, 7) will throw an error - you need to pass a list or a tuple
array_from_list = np.array([2, 5, 6, 7]) 
array_from_tuple = np.array((4, 5, 8, 9))

print(array_from_list)
print(array_from_tuple)

The other common way is to initialise arrays. You do this when you know the size of the array beforehand.

The following ways are commonly used:
* ```np.ones()```: Create array of 1s
* ```np.zeros()```: Create array of 0s
* ```np.random.random()```: Create array of random numbers in the interval [0, 1)
* ```np.arange()```: Create array with increments of a fixed step size
* ```np.linspace()```: Create array with a fixed number of evenly spaced values between two endpoints

In [None]:
# Tip: Use help to see the syntax when required
help(np.ones)

In [None]:
# Creating a 5 x 3 array of ones
np.ones((5, 3))

In [None]:
# Notice that, by default, numpy creates data type = float64
# You can set the data type explicitly with the dtype argument
np.ones((5, 3), dtype=int)

In [None]:
# Creating array of zeros
np.zeros(4, dtype=int)

In [None]:
# Array of random numbers
np.random.random([3, 4])

In [None]:
# np.arange()
# Generating a sequence of numbers from 10 to 95 with a step of 5
# np.arange() is the numpy equivalent of range()
# Notice that 10 is included but 100 is not, as with Python's built-in range()

numbers = np.arange(10, 100, 5)
print(numbers)

In [None]:
# np.linspace()
# Sometimes you know the number of elements, not the step size
# E.g. create an array of 25 evenly spaced values from 15 to 18 (both included)
np.linspace(15, 18, 25)
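The main practical difference between the two functions is the stop value: ```np.arange()``` excludes it, while ```np.linspace()``` includes it by default. A short sketch comparing them (the NumPy documentation also recommends ```np.linspace()``` over ```np.arange()``` when the step is not an integer, since floating-point rounding can make the length of an ```arange``` result hard to predict):

```python
import numpy as np

# np.arange() excludes the stop value: 0, 0.25, 0.5, 0.75
stepped = np.arange(0, 1, 0.25)

# np.linspace() includes the stop value by default: 0, 0.25, 0.5, 0.75, 1
spaced = np.linspace(0, 1, 5)

# endpoint=False excludes it, matching the arange result here
spaced_open = np.linspace(0, 1, 4, endpoint=False)

print(stepped)
print(spaced)
print(spaced_open)

# retstep=True also returns the step: (18 - 15) / (25 - 1) = 0.125
values, step = np.linspace(15, 18, 25, retstep=True)
print(step)  # 0.125
```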

### Inspect the Structure and Content of Arrays

It is helpful to inspect the structure of NumPy arrays, especially when working with large arrays that are hard to read when printed. Some attributes of NumPy arrays are:
* ```shape```: Shape of array, as the length along each axis, e.g. (n, m)
* ```dtype```: data type (int, float etc.)
* ```ndim```: Number of dimensions (or axes)
* ```itemsize```: Memory used by each array element in bytes


Let's say you are working with a moderately large array of size 1000 x 300 (which is actually small compared to what you get in most real problems). First, you would want to get a feel for the basic shape and size of the array. 

In [None]:
# Initialising a random 1000 x 300 array
rand_array = np.random.random((1000, 300))

# Print the first row (index 0)
print(rand_array[0])

In [None]:
# Inspecting shape, dtype, ndim and itemsize
print("Shape: {}".format(rand_array.shape))
print("dtype: {}".format(rand_array.dtype))
print("Dimensions: {}".format(rand_array.ndim))
print("Item size: {}".format(rand_array.itemsize))
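Two related attributes, ```size``` and ```nbytes```, give the total number of elements and the total memory the elements occupy. A small sketch on an array of the same shape, showing that ```nbytes``` is simply ```size``` times ```itemsize```:

```python
import numpy as np

rand_array = np.random.random((1000, 300))

# size is the total number of elements: 1000 * 300
print(rand_array.size)    # 300000

# nbytes is the memory used by all elements: size * itemsize
print(rand_array.nbytes)  # 2400000, since each float64 element takes 8 bytes
print(rand_array.size * rand_array.itemsize == rand_array.nbytes)  # True
```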

Printed 3-D arrays are harder to read, because a screen or page can show at most two dimensions at a time, so NumPy prints higher-dimensional arrays according to a fixed convention:
* The last axis is printed from left to right
* The second-to-last axis is printed from top to bottom
* The other axes are also printed top-to-bottom, with each slice separated from the next by an empty line 

Let's see some examples.

In [97]:
# Creating a 3-D array
# reshape() returns the same 24 values arranged in a new shape, here 2 x 3 x 4
array_3d = np.arange(24).reshape(2, 3, 4)
print(array_3d)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


* The last axis has 4 elements and is printed from left to right.
* The second-to-last axis has 3 elements and is printed from top to bottom.
* The first axis has 2 elements, printed as two blocks separated by an empty line.
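To connect the printed layout to the axes, here is a small sketch indexing the same array with one index per axis (block, row, column); the values can be checked against the printout above.

```python
import numpy as np

array_3d = np.arange(24).reshape(2, 3, 4)

# One index per axis: block, row, column
print(array_3d[0, 1, 2])  # 6: first block, second row, third column
print(array_3d[1, 2, 3])  # 23: second block, last row, last column

# A single index selects a whole block, which is a 3 x 4 array
print(array_3d[1].shape)  # (3, 4)
```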

### Subset, Slice, Index and Iterate through Arrays

For **one-dimensional arrays**, indexing and slicing work **much like Python lists**: indexing starts at 0.

In [None]:
# Indexing and slicing one dimensional arrays
array_1d = np.arange(10)
print(array_1d)

In [None]:
# Third element
print(array_1d[2])

# Specific elements
# Notice that array_1d[2, 5, 6] will throw an error; you need to provide the indices as a list
print(array_1d[[2, 5, 6]])

# Slice third element onwards
print(array_1d[2:])

# Slice first three elements
print(array_1d[:3])

# Slice third to seventh elements
print(array_1d[2:7])

# Every second element, starting at index 0
print(array_1d[0::2])
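One important difference from lists: slicing a Python list creates a new list, but a basic slice of a NumPy array is a *view* that shares memory with the original. A minimal sketch of the consequence, and of ```.copy()``` as the way to get an independent array:

```python
import numpy as np

array_1d = np.arange(10)

# A basic slice is a view: changing it changes the original array
view = array_1d[2:5]
view[0] = 100
print(array_1d[2])   # 100

# .copy() creates an independent array
array_1d = np.arange(10)
independent = array_1d[2:5].copy()
independent[0] = 100
print(array_1d[2])   # 2: the original is unchanged

# Negative indices count from the end, as with lists
print(array_1d[-1])  # 9
```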

In [None]:
# Iterations are also similar to lists
for i in array_1d:
    print(i**2)

**Multidimensional arrays** are indexed using as many indices as the number of dimensions or axes. For instance, to index a 2-D array, you need two indices - ```array[x, y]```. 

Each axis has an index starting at 0. The following figure shows the axes and their indices for a 2-D array.

<img src="2_d_array.png" style="width: 350px; height: 300px">


In [None]:
# Creating a 2-D array
array_2d = np.array([[2, 5, 7, 5], [4, 6, 8, 10], [10, 12, 15, 19]])
print(array_2d)

In [None]:
# Third row second column
print(array_2d[2, 1])

In [None]:
# Slicing the second row and all columns
# Notice that the result is itself a 1-D array
print(array_2d[1, :])
print(type(array_2d[1, :]))

In [None]:
# Slicing all rows and the third column
print(array_2d[:, 2])

In [None]:
# Slicing all rows and the first three columns
print(array_2d[:, :3])
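Row and column slices can be combined freely, and negative indices and steps work on every axis. A short sketch on the same 2-D array:

```python
import numpy as np

array_2d = np.array([[2, 5, 7, 5], [4, 6, 8, 10], [10, 12, 15, 19]])

# Rows from index 1 onwards, columns 1 and 2
print(array_2d[1:, 1:3])
# [[ 6  8]
#  [12 15]]

# Last row, last column
print(array_2d[-1, -1])  # 19

# All rows, every second column (columns 0 and 2)
print(array_2d[:, ::2])
```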

When **iterating over 2-D arrays**, the loop runs along the first axis, so each iteration yields one row (the second axis holds the columns). 

In [98]:
# Iterating over 2-D arrays
for row in array_2d:
    print(row)

[2 5 7 5]
[ 4  6  8 10]
[10 12 15 19]
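Since iteration always follows the first axis, a common trick for looping over columns is to iterate over the transpose (```.T```). To visit every element regardless of shape, ```.flat``` gives a flat iterator. A minimal sketch:

```python
import numpy as np

array_2d = np.array([[2, 5, 7, 5], [4, 6, 8, 10], [10, 12, 15, 19]])

# Iterate over columns by iterating over the transpose
for column in array_2d.T:
    print(column)

# Iterate over every element, row by row, with .flat
total = 0
for value in array_2d.flat:
    total += value
print(total)  # 103
```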


In [100]:
# Iterating over 3-D arrays: Done with respect to the first axis
array_3d = np.arange(24).reshape(2, 3, 4)
print(array_3d)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


In [103]:
# Iterating yields the two 2-D blocks along the first axis
for block in array_3d:
    print(block)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]


### Compare Computation Times in NumPy and Standard Python Lists

We mentioned that the key advantages of numpy are convenience and speed of computation. 

You'll often work with extremely large datasets, so it is important to understand how much computation time (and memory) you can save using NumPy instead of standard Python lists, and to base your choice of program design on that.   

Let's compare the computation times of arrays and lists for a simple task of calculating the element-wise product of numbers. 

In [109]:
## Comparing time taken for computation
import time

list_1 = [i for i in range(1000000)]
list_2 = [j**2 for j in range(1000000)]

# list multiplication
# store start time, time after computation, and take the difference
t0 = time.time()
product_list = list(map(lambda x, y: x*y, list_1, list_2))
t1 = time.time()
list_time = t1 - t0
print(list_time)


# numpy array multiplication
array_1 = np.array(list_1)
array_2 = np.array(list_2)

t0 = time.time()
array_3 = array_1*array_2
t1 = time.time()
numpy_time = t1 - t0
print(numpy_time)

print("The ratio of time taken is {}".format(list_time/numpy_time))
print(array_3.dtype)

0.41635656356811523
0.010005950927734375
The ratio of time taken is 41.61089401448723
float64


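In this run, NumPy was about 40 times faster than the list approach, **more than an order of magnitude**. The exact ratio will vary with your machine and NumPy version. 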
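A single ```time.time()``` measurement can be noisy. A somewhat more robust sketch uses the standard-library ```timeit``` module to repeat each computation several times. The list size and the repeat count of 5 are arbitrary choices, and ```dtype=np.int64``` is set explicitly (an assumption, to keep the products from overflowing on any platform).

```python
import timeit

import numpy as np

n = 100_000
list_1 = list(range(n))
list_2 = [j**2 for j in range(n)]

# Products reach about 1e15, which fits comfortably in int64
array_1 = np.array(list_1, dtype=np.int64)
array_2 = np.array(list_2, dtype=np.int64)

# timeit runs each callable `number` times and returns the total seconds
list_time = timeit.timeit(lambda: list(map(lambda x, y: x*y, list_1, list_2)), number=5)
numpy_time = timeit.timeit(lambda: array_1 * array_2, number=5)

print("List:  {:.4f} s".format(list_time))
print("NumPy: {:.4f} s".format(numpy_time))
print("Ratio: {:.1f}".format(list_time / numpy_time))
```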


### Applying Operations (Functions) on NumPy arrays

You can apply almost all common operations on NumPy arrays, such as mean, median, sum, etc.
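There are two ways to apply functions to arrays, as shown in these examples:
1. As an array method, e.g. ```numpy_array.mean()```
2. As a NumPy function, e.g. ```np.median(numpy_array)```

Not every function is available both ways: ```median```, for instance, exists only as ```np.median()```, not as an array method.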

In [None]:
# mean of an array
print(array_1.mean())

# max value in an array
print(array_1.max())

# median
print(np.median(array_1))
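On multidimensional arrays, these functions reduce over all elements by default, or along a single axis when you pass ```axis```. A small sketch on the 2-D array used earlier:

```python
import numpy as np

array_2d = np.array([[2, 5, 7, 5], [4, 6, 8, 10], [10, 12, 15, 19]])

# Without axis, the function reduces over all elements
print(array_2d.sum())     # 103
print(np.sum(array_2d))   # 103: the function form gives the same result

# With axis, it reduces along one axis only
print(array_2d.mean(axis=0))         # mean of each column
print(np.median(array_2d, axis=1))   # median of each row: 5.0, 7.0, 13.5
```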


### Applying your own functions to NumPy arrays

You can apply your own functions to NumPy arrays. For instance, say you want to apply the operation x/(x+1) to each element of an array.

In [None]:
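# a numpy array
some_array = np.arange(100, 150)

# map() returns a lazy map object, not a list or an array
new_array = map(lambda x: x/(x+1), some_array)
print(type(new_array))

# A map object can be consumed only once, so convert it to a list and keep the list
new_list = list(new_array)
print(new_list)


In [None]:
# Say you want to round this off to two decimals

# The standard round() function, applied to each element of the list with map()
new_list_rounded = list(map(lambda x: round(x, 2), new_list))
print(new_list_rounded)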
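For a formula built from arithmetic like x/(x+1), you don't actually need ```map()```: writing the expression on the whole array is vectorised and returns an array directly. For a Python function that cannot be written with array operations, ```np.vectorize``` wraps it so it accepts arrays; the NumPy documentation notes it is provided for convenience, not performance, since it essentially loops in Python. A sketch of both:

```python
import numpy as np

some_array = np.arange(100, 150)

# Arithmetic on arrays is already element-wise, so no map() is needed
ratios = some_array / (some_array + 1)
print(type(ratios))  # <class 'numpy.ndarray'>

# np.round() rounds every element at once
ratios_rounded = np.round(ratios, 2)
print(ratios_rounded)

# np.vectorize wraps an ordinary Python function so it can be called on an array
ratio = np.vectorize(lambda x: x / (x + 1))
print(np.allclose(ratio(some_array), ratios))  # True
```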