### NumPy Basics




In [1]:
# Import the numpy library
# np is simply an alias; any other alias works, but np is the standard convention
import numpy as np

In [2]:
# Creating a 1-D array using a list
# np.array() takes a list or a tuple as its argument and converts it into an array
array_1d = np.array([2, 4, 5, 6, 7, 9])
array_1dt = np.array((2,3,4,5,6,7,8)) 
print(array_1d)
print(type(array_1d))
print(array_1dt)
print(type(array_1dt))


[2 4 5 6 7 9]
<class 'numpy.ndarray'>
[2 3 4 5 6 7 8]
<class 'numpy.ndarray'>


In [3]:
# Creating a 2-D array using two lists / tuples
array_2d = np.array([(2, 3, 4), (5, 8, 7)])
print(array_2d)

[[2 3 4]
 [5 8 7]]


In NumPy, dimensions are called **axes**. The 2-D array above has two axes: the first has length 2 (two rows) and the second has length 3 (three columns).

In NumPy terminology, for 2-D arrays:
* ```axis = 0``` refers to the rows
* ```axis = 1``` refers to the columns

<img src="numpy_axes.jpg" style="width: 600px; height: 400px">
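To make the axis convention concrete, here is a minimal sketch that reuses the 2-D array above. It relies on the fact that aggregating along an axis collapses that axis: summing along `axis=0` moves down the rows and leaves one value per column, while summing along `axis=1` moves across the columns and leaves one value per row. The names `col_sums` and `row_sums` are illustrative only.

```python
import numpy as np

array_2d = np.array([(2, 3, 4), (5, 8, 7)])

# shape lists the length of each axis: 2 rows (axis 0), 3 columns (axis 1)
print(array_2d.shape)   # (2, 3)

# Summing along axis 0 collapses the rows, leaving one value per column
col_sums = array_2d.sum(axis=0)
print(col_sums)         # [ 7 11 11]

# Summing along axis 1 collapses the columns, leaving one value per row
row_sums = array_2d.sum(axis=1)
print(row_sums)         # [ 9 20]
```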

In [4]:
list_1 = [3, 6, 7, 5]
list_2 = [4, 5, 1, 7]

# the list way to do it: map a function to the two lists
product_list = list(map(lambda x, y: x*y, list_1, list_2))
print(product_list)


[12, 30, 7, 35]


In [5]:
# The numpy array way to do it: simply multiply the two arrays
array_1 = np.array(list_1)
array_2 = np.array(list_2)

array_3 = array_1*array_2
print(array_3)
print(type(array_3))

array_4 = array_1+array_2
print(array_4)
print(type(array_4))

[12 30  7 35]
<class 'numpy.ndarray'>
[ 7 11  8 12]
<class 'numpy.ndarray'>


As you can see, the NumPy way of multiplication is clearly more concise.

Even simple mathematical operations on lists require an explicit loop or a list comprehension, unlike with arrays. For example, to calculate the square of every number in a list:

In [6]:
# Square a list
list_squared = [i**2 for i in list_1]

# Square a numpy array
array_squared = array_1**2

print(list_squared)
print(array_squared)

[9, 36, 49, 25]
[ 9 36 49 25]


In [7]:
array_2d = np.array([(1,2,3),(3,2,1)])
array_2d1 = np.array([(1,2,3),(3,2,1)])

prod_array = array_2d * array_2d1 
print(prod_array)

[[1 4 9]
 [9 4 1]]


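The first examples used 1-D arrays and the last one used 2-D arrays. You'll often work with 2-D arrays (matrices), where the difference is even greater. With lists, you have to store a matrix as a list of lists and loop through it. With NumPy, a single expression such as ```array_2d * array_2d1``` operates on the whole array (element by element, as in the output above).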
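Because "multiplying matrices" can mean two different things, the following sketch contrasts them on the arrays from the cell above. The `*` operator multiplies matching elements, whereas the `@` operator computes the matrix product from linear algebra. The names `a`, `b`, `elementwise` and `matrix_product` are illustrative.

```python
import numpy as np

a = np.array([(1, 2, 3), (3, 2, 1)])
b = np.array([(1, 2, 3), (3, 2, 1)])

# * multiplies matching elements, so both arrays must have compatible shapes
elementwise = a * b
print(elementwise)
# [[1 4 9]
#  [9 4 1]]

# @ is the matrix product; b.T has shape (3, 2), so the result is 2 x 2
matrix_product = a @ b.T
print(matrix_product)
# [[14 10]
#  [10 14]]
```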

### Creating NumPy Arrays 

There are multiple ways to create NumPy arrays, the most common ones being:
* Convert lists or tuples to arrays using ```np.array()```, as done above
* Initialise arrays of fixed size (when the size is known) 


In [None]:
# Convert lists or tuples to arrays using np.array()
# Note that np.array(2, 5, 6, 7) will throw an error - you need to pass a list or a tuple
array_from_list = np.array([2, 5, 6, 7]) 
array_from_tuple = np.array((4, 5, 8, 9))

print(array_from_list)
print(array_from_tuple)
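The error mentioned in the comment above can be observed safely with a `try`/`except` block. This is only a sketch: the exact exception type and message depend on the NumPy version, so both `TypeError` and `ValueError` are caught here. The flag `error_raised` is an illustrative name.

```python
import numpy as np

# Passing the values as separate arguments fails, because np.array()
# expects the data as a single sequence in its first argument
try:
    np.array(2, 5, 6, 7)
    error_raised = False
except (TypeError, ValueError) as exc:
    # The exact exception type and message depend on the NumPy version
    error_raised = True
    print(type(exc).__name__, exc)

# Wrapping the values in a list (or tuple) works as intended
array_ok = np.array([2, 5, 6, 7])
print(array_ok)
```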

The other common way is to initialise arrays. You do this when you know the size of the array beforehand.

The following ways are commonly used:
* ```np.ones()```: Create array of 1s
* ```np.zeros()```: Create array of 0s
* ```np.random.random()```: Create array of random numbers between 0 (inclusive) and 1 (exclusive)
* ```np.arange()```: Create array with increments of a fixed step size
* ```np.linspace()```: Create array of fixed length, with evenly spaced values between two endpoints

In [8]:
# Tip: Use help to see the syntax when required
help(np.ones)

Help on function ones in module numpy:

ones(shape, dtype=None, order='C', *, like=None)
    Return a new array of given shape and type, filled with ones.
    
    Parameters
    ----------
    shape : int or sequence of ints
        Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        The desired data-type for the array, e.g., `numpy.int8`.  Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional, default: C
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    like : array_like, optional
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.
    
        .. versionad

In [9]:
# Creating a 5 x 3 array of ones
np.ones((5, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [10]:
# Notice that, by default, np.ones() creates an array of data type float64
# You can choose a different type by passing the dtype argument explicitly
np.ones((5, 3), dtype = int)

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [11]:
# Creating array of zeros
np.zeros(4, dtype = int)

array([0, 0, 0, 0])

In [None]:
help(np.random.random)

In [12]:
# Array of random numbers
np.random.random([3, 4])

array([[0.45515677, 0.37433189, 0.59901021, 0.41666659],
       [0.76255845, 0.44080835, 0.2720474 , 0.56854066],
       [0.95536265, 0.84703043, 0.93100334, 0.1419591 ]])
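The values printed above change on every run. If repeatable numbers are needed, one option is a seeded generator. The sketch below assumes a NumPy version that provides `np.random.default_rng()` (1.17 or later); the seed `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random((3, 4))
sample_b = rng_b.random((3, 4))

print(sample_a.shape)                      # (3, 4)
print(np.array_equal(sample_a, sample_b))  # True

# Values lie in the half-open interval [0, 1)
in_range = bool(sample_a.min() >= 0.0 and sample_a.max() < 1.0)
print(in_range)                            # True
```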

In [15]:
# np.arange()
# np.arange() is the numpy equivalent of range()
# Notice that the start (10) is included and the stop (100) is not, as with Python's range()

# From 10 to 100 with a step of 5
numbers = np.arange(10, 100, 5)
print(numbers)

[10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95]


In [16]:
# Create a 4 x 4 random array of integers ranging from 0 to 9
np.random.randint(0, 10, (4,4))

array([[2, 1, 2, 5],
       [9, 0, 2, 3],
       [2, 1, 1, 7],
       [0, 7, 3, 2]])

### Inspect the Structure and Content of Arrays

It is helpful to inspect the structure of numpy arrays, especially while working with large arrays. Some attributes of numpy arrays are:
* ```shape```: Shape of array (n x m)
* ```dtype```: data type (int, float etc.)
* ```ndim```: Number of dimensions (or axes)
* ```itemsize```: Memory used by each array element in bytes


Let's say you are working with a moderately large array of size 1000 x 300. First, you would want to wrap your head around the basic shape and size of the array. 

In [17]:
# Initialising a random 1000 x 300 array
rand_array = np.random.random((1000, 300))

# Print the second row
print(rand_array[1, :])

[0.70478082 0.16297164 0.08890915 0.49381254 0.9884108  0.35847187
 0.37678624 0.32327901 0.90131595 0.56368189 0.36023197 0.51778407
 0.4344683  0.09592862 0.75736363 0.53232351 0.02245858 0.69690701
 0.0450395  0.78535211 0.43456676 0.21296539 0.48366765 0.68655802
 0.5947007  0.48931634 0.9744963  0.19813651 0.83869325 0.59789335
 0.61855726 0.22926408 0.29117175 0.95120069 0.43866688 0.29589006
 0.41651319 0.81591879 0.89256804 0.65229197 0.48646444 0.93331284
 0.0993758  0.04694213 0.92532751 0.19703003 0.16003144 0.93247595
 0.55468829 0.78316748 0.58768722 0.93250914 0.44852402 0.46735241
 0.68394213 0.81387445 0.86503697 0.33009742 0.21814458 0.14631919
 0.5129243  0.33221098 0.89549164 0.60249201 0.05766939 0.16131526
 0.77364727 0.88609644 0.48836471 0.76554731 0.31146545 0.92666038
 0.05687473 0.28249831 0.16258777 0.26638967 0.69132352 0.92489686
 0.44573352 0.63585998 0.95229777 0.89397928 0.69728325 0.80282796
 0.08457976 0.09786796 0.26017267 0.42490968 0.0212282  0.7264

In [18]:
# Inspecting shape, dtype, ndim and itemsize
print("Shape: {}".format(rand_array.shape))
print("dtype: {}".format(rand_array.dtype))
print("Dimensions: {}".format(rand_array.ndim))
print("Item size: {}".format(rand_array.itemsize))

Shape: (1000, 300)
dtype: float64
Dimensions: 2
Item size: 8
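Two further attributes, `size` and `nbytes`, tie these numbers together: the total memory used by the elements is the number of elements times the item size. The sketch below uses an array of zeros with the same shape and dtype as above so that the result is deterministic; `big_array` is an illustrative name.

```python
import numpy as np

# A 1000 x 300 float64 array, as above (zeros instead of random values)
big_array = np.zeros((1000, 300))

n_elements = big_array.size        # total number of elements
total_bytes = big_array.nbytes     # total memory used by the elements

print(n_elements)                                      # 300000
print(total_bytes)                                     # 2400000
print(total_bytes == n_elements * big_array.itemsize)  # True
```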


Reading printed 3-D arrays is not obvious, because only two dimensions can be laid out on a page. NumPy therefore prints higher-dimensional arrays according to the following conventions:
* The last axis is printed from left to right
* The second-to-last axis is printed from top to bottom
* The other axes are also printed top-to-bottom, with each slice separated from the next by an empty line

Let's see some examples.

In [19]:
# Creating a 3-D array
# np.arange(24) creates a 1-D array of 24 elements; reshape() rearranges it into shape (2, 3, 4)
array_3d = np.arange(24).reshape(2, 3, 4)
print(array_3d)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
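To connect the printed layout with indexing, the following sketch picks out pieces of the same 3-D array one axis at a time. The variable names are illustrative.

```python
import numpy as np

array_3d = np.arange(24).reshape(2, 3, 4)

# First axis selects a block: the second printed block has shape (3, 4)
second_block = array_3d[1]
print(second_block.shape)    # (3, 4)

# Second axis selects a row within the block (printed top to bottom)
last_row = array_3d[1, 2]
print(last_row)              # [20 21 22 23]

# Third axis selects a position within the row (printed left to right)
last_value = array_3d[1, 2, 3]
print(last_value)            # 23
```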


In [20]:
# np.linspace()
# Sometimes, you know the length of the array, not the step size

# Array of 25 evenly spaced values from 15 to 18 (both endpoints included)
np.linspace(15, 18, 25)

array([15.   , 15.125, 15.25 , 15.375, 15.5  , 15.625, 15.75 , 15.875,
       16.   , 16.125, 16.25 , 16.375, 16.5  , 16.625, 16.75 , 16.875,
       17.   , 17.125, 17.25 , 17.375, 17.5  , 17.625, 17.75 , 17.875,
       18.   ])
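As a rough illustration of how `np.linspace()` relates to `np.arange()`, the sketch below asks `linspace` to report the step it computed (via the `retstep` argument) and then builds the corresponding `arange` call. Note that `linspace` includes the stop value by default, while `arange` excludes it.

```python
import numpy as np

# retstep=True also returns the spacing that linspace worked out
values, step = np.linspace(15, 18, 25, retstep=True)
print(len(values))             # 25
print(step)                    # 0.125
print(values[0], values[-1])   # 15.0 18.0

# np.arange() needs the step up front and leaves out the stop value
arange_values = np.arange(15, 18, 0.125)
print(len(arange_values))      # 24
```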

Apart from the methods mentioned above, there are a few more NumPy functions that you can use to create special NumPy arrays:

* ```np.full()```: Create a constant array of any number 'n'
* ```np.tile()```: Create a new array by repeating an existing array for a particular number of times
* ```np.eye()```: Create an identity matrix of any dimension
* ```np.random.randint()```: Create a random array of integers within a particular range


In [22]:
# Creating a 4 x 3 array of 5s using np.full()
# The data type is inferred from the fill value, so it is int here
np.full((4,3), 5)

array([[5, 5, 5],
       [5, 5, 5],
       [5, 5, 5],
       [5, 5, 5]])

In [23]:
# Given an array, np.tile() creates a new array by repeating the given array as many times as you want
# The result has the same data type as the input, which is int here
arr = [0, 1, 2]
np.tile(arr, 3)

array([0, 1, 2, 0, 1, 2, 0, 1, 2])

In [24]:
# You can also create multidimensional arrays using np.tile()
np.tile(arr, (3,2))

array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
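Of the functions listed above, `np.eye()` is not demonstrated in the cells, so here is a minimal sketch. It builds a 3 x 3 identity matrix and checks the defining property that a matrix product with the identity leaves a matrix unchanged. The names `identity` and `matrix` are illustrative.

```python
import numpy as np

# 3 x 3 identity matrix; like np.ones(), the default dtype is float64
identity = np.eye(3)
print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Multiplying any 3 x 3 matrix by the identity leaves it unchanged
matrix = np.arange(9).reshape(3, 3)
unchanged = np.array_equal(matrix @ identity, matrix)
print(unchanged)   # True
```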

# Subset, Slice, Index and Iterate through Arrays
For one-dimensional arrays, indexing and slicing work much as they do for Python lists, with indexing starting at 0.

In [25]:
# Indexing and slicing one dimensional arrays
array_1d = np.arange(10)
print(array_1d)

[0 1 2 3 4 5 6 7 8 9]


In [27]:
# Third element
print(array_1d[2])

# Specific elements
# Notice that array_1d[2, 5, 6] will throw an error; you need to provide the indices as a list
print(array_1d[[2, 5, 6]])

# Slice third element onwards
print(array_1d[2:])

# Slice first three elements
print(array_1d[:3])

# Slice third to seventh elements
print(array_1d[2:7])

# Every second element, starting at index 0
print(array_1d[0::2])

2
[2 5 6]
[2 3 4 5 6 7 8 9]
[0 1 2]
[2 3 4 5 6]
[0 2 4 6 8]


In [28]:
# Iterations are also similar to lists
for i in array_1d:
    print(i**2)

0
1
4
9
16
25
36
49
64
81
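Although arrays can be iterated like lists, the same squares can be computed without a loop, in line with the earlier comparison of lists and arrays. A minimal sketch, with `squares` as an illustrative name:

```python
import numpy as np

array_1d = np.arange(10)

# One vectorised expression replaces the explicit for loop
squares = array_1d ** 2
print(squares)   # [ 0  1  4  9 16 25 36 49 64 81]
```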


**Multidimensional arrays** are indexed using as many indices as the number of dimensions or axes. For instance, to index a 2-D array, you need two indices - array[x, y].

Each axis has an index starting at 0. The following figure shows the axes and their indices for a 2-D array.

In [29]:
# Creating a 2-D array
array_2d = np.array([[2, 5, 7, 5], [4, 6, 8, 10], [10, 12, 15, 19]])
print(array_2d)

[[ 2  5  7  5]
 [ 4  6  8 10]
 [10 12 15 19]]


In [30]:
# Third row second column
print(array_2d[2, 1])

12
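The comma-separated form `array_2d[2, 1]` gives the same element as chaining two single indices, as the sketch below shows. The chained form first extracts a whole row and then indexes into it, so the comma form is usually preferred. The names `direct` and `chained` are illustrative.

```python
import numpy as np

array_2d = np.array([[2, 5, 7, 5], [4, 6, 8, 10], [10, 12, 15, 19]])

# Single indexing step with one index per axis
direct = array_2d[2, 1]

# Two steps: first take row 2 as a 1-D array, then index into it
chained = array_2d[2][1]

print(direct, chained)   # 12 12
```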


In [31]:
# Slicing the second row, and all columns
# Notice that the resultant is itself a 1-D array
print(array_2d[1, :])
print(type(array_2d[1, :]))

[ 4  6  8 10]
<class 'numpy.ndarray'>


In [32]:
# Slicing all rows and the third column
print(array_2d[:, 2])

[ 7  8 15]


In [33]:
# Slicing all rows and the first three columns
print(array_2d[:, :3])

[[ 2  5  7]
 [ 4  6  8]
 [10 12 15]]


In [34]:
# Slicing the second and third rows, and the second to fourth columns
print(array_2d[1:3, 1:4])

[[ 6  8 10]
 [12 15 19]]


**Iterating on 2-D arrays** is done with respect to the first axis, so each iteration yields one row (the second axis corresponds to the columns).

In [None]:
# Iterating over 2-D arrays
for row in array_2d:
    print('row:', row)
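Since iteration always runs over the first axis, one way to visit the columns instead is to iterate over the transpose. This is a sketch of that idea rather than the only approach; `columns` is an illustrative name.

```python
import numpy as np

array_2d = np.array([[2, 5, 7, 5], [4, 6, 8, 10], [10, 12, 15, 19]])

# Transposing swaps the axes, so iterating over array_2d.T yields columns
columns = [column.tolist() for column in array_2d.T]
for column in columns:
    print('column:', column)
```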

In [None]:
# Iterating over 3-D arrays: Done with respect to the first axis
array_3d = np.arange(24).reshape(2, 3, 4)
print(array_3d)

In [None]:
# Prints the two blocks
for row in array_3d:
    print(row)

In [None]:
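# Mean along axis 1: averages the three rows within each block, giving a 2 x 4 result
np.mean(array_3d, axis=1)

In [None]:
# Returns a sorted copy; by default, sorting is done along the last axis
np.sort(array_3d)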

# NumPy Aggregate Functions

Sorting : https://numpy.org/doc/stable/reference/generated/numpy.sort.html

Mean : https://numpy.org/doc/stable/reference/generated/numpy.mean.html

Median : https://numpy.org/doc/stable/reference/generated/numpy.median.html
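The following sketch shows the three functions linked above on small arrays whose results are easy to check by hand. The array `unsorted` and the result names are illustrative, not part of the original notebook.

```python
import numpy as np

array_3d = np.arange(24).reshape(2, 3, 4)

# axis=1 averages over the three rows inside each block -> shape (2, 4)
block_means = np.mean(array_3d, axis=1)
print(block_means)
# [[ 4.  5.  6.  7.]
#  [16. 17. 18. 19.]]

unsorted = np.array([[3, 1, 2], [9, 7, 8]])

# np.sort() returns a sorted copy; by default it sorts along the last axis
sorted_rows = np.sort(unsorted)
print(sorted_rows)
# [[1 2 3]
#  [7 8 9]]

# Median of each row
row_medians = np.median(unsorted, axis=1)
print(row_medians)   # [2. 8.]
```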

# Compare Computation Times in NumPy and Standard Python Lists

Let's compare the computation times of arrays and lists for a simple task of calculating the element-wise product of numbers.


In [None]:

## Comparing time taken for computation
list_1 = [i for i in range(1000000)]
list_2 = [j**2 for j in range(1000000)]

# list multiplication
import time

# store start time, time after computation, and take the difference
t0 = time.time()
product_list = list(map(lambda x, y: x*y, list_1, list_2))
t1 = time.time()
list_time = t1 - t0 
print(t1-t0)

# numpy array 
array_1 = np.array(list_1)
array_2 = np.array(list_2)

t0 = time.time()
array_3 = array_1*array_2
t1 = time.time()
numpy_time = t1 - t0

print(t1-t0)

print("The ratio of time taken is {}".format(list_time/numpy_time))

0.13862943649291992
0.00394892692565918
The ratio of time taken is 35.10559681217171

In this run, NumPy is about 35 times faster than lists, i.e. more than an order of magnitude; the exact ratio varies between runs and machines. This is with lists and arrays of a million elements, but you may work with much larger arrays, with sizes in the order of billions, where the absolute time saved is far greater.

Some reasons for this difference in speed are:

* NumPy is largely written in C, which is what actually executes behind the scenes
* NumPy arrays are more compact than lists, i.e. they take much less storage space than lists

The following discussions demonstrate the differences in speeds of NumPy and standard Python:

* https://stackoverflow.com/questions/8385602/why-are-numpy-arrays-so-fast
* https://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists
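A single `time.time()` measurement, as used in the comparison above, can be noisy. As a hedged alternative, the sketch below uses the standard-library `timeit` module, which repeats each statement several times and lets you keep the best result. The size `n`, the repeat counts and the use of `zip` in place of `map` are choices made for this illustration, and no particular ratio is guaranteed on any given machine.

```python
import timeit

import numpy as np

n = 100_000  # smaller than the example above so the sketch runs quickly
list_1 = list(range(n))
list_2 = [j**2 for j in range(n)]
array_1 = np.array(list_1, dtype=np.int64)
array_2 = np.array(list_2, dtype=np.int64)

# Run each statement 20 times per measurement, repeat 5 times, keep the best
list_time = min(timeit.repeat(
    lambda: [x * y for x, y in zip(list_1, list_2)], number=20, repeat=5))
numpy_time = min(timeit.repeat(
    lambda: array_1 * array_2, number=20, repeat=5))

print("list:  {:.6f} s per run".format(list_time / 20))
print("numpy: {:.6f} s per run".format(numpy_time / 20))
print("ratio: {:.1f}".format(list_time / numpy_time))

# Both approaches compute the same products
same_result = [x * y for x, y in zip(list_1, list_2)] == (array_1 * array_2).tolist()
print(same_result)
```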