## NumPy

* Stands for Numerical Python.
* The fundamental package for scientific computing in Python.
* It provides an extensive set of tools for performing mathematical computations effectively and efficiently.

## Why use NumPy?

* You may be wondering why, since Python already has lists, which are extremely useful.
* Lists are awesome, but NumPy arrays have some advantages over them. One of them is speed.
* When performing operations on large arrays, NumPy is often much faster than equivalent code using ordinary lists, sometimes by orders of magnitude.
* NumPy has memory-efficient arrays and optimized algorithms.
* NumPy also has multidimensional array data structures that can represent vectors and matrices. (Many machine learning algorithms rely on matrix operations.)
* NumPy also has a large number of optimized built-in mathematical functions. These let you run a wide variety of complex mathematical computations quickly with very little code (i.e., without writing explicit loops), which makes programs easier to understand.
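
As a minimal sketch of the "no explicit loops" point, the snippet below squares one million integers twice: once with a Python list comprehension and once with a single vectorized NumPy expression. The size is arbitrary and no timings are shown, since the actual speedup depends on your machine.

```python
import numpy as np

n = 1_000_000

# Pure Python: an explicit loop (via a list comprehension) over a list
py_list = list(range(n))
py_squares = [v * v for v in py_list]

# NumPy: one vectorized expression; the loop runs inside compiled code.
# dtype=np.int64 avoids overflow on platforms whose default integer is 32-bit.
np_array = np.arange(n, dtype=np.int64)
np_squares = np_array * np_array

print(np_squares[:5])                     # [ 0  1  4  9 16]
print(py_squares[-1] == np_squares[-1])   # True
```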

## Creating NumPy ndarrays

* nd stands for n-dimensional.
* ndarrays contain elements of the same type.

In [3]:
import numpy as np

* ndarrays can be created in two ways:
  * using Python lists
  * using built-in NumPy functions

In [4]:
# using Python lists

x = np.array([1, 2, 3, 4, 5])
print("x = {}".format(x))

x = [1 2 3 4 5]


## Terminologies

* **Rank** - In general, an N-dimensional array has rank N, e.g. a 2D array is a rank 2 array. NumPy exposes the rank through the `.ndim` attribute.
* **Shape** - the number of elements along each dimension of an array (for a 2D array, the number of rows and columns). The shape of an ndarray can be obtained using the `.shape` attribute, which returns a tuple of N non-negative integers giving the size of each dimension.
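
To make rank and shape concrete, the sketch below builds arrays of rank 1, 2 and 3 and prints `.ndim` (the rank), `.shape` and `.size` (the total number of elements, i.e. the product of the shape entries) for each. The array contents are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])                 # rank 1
b = np.array([[1, 2, 3], [4, 5, 6]])    # rank 2: 2 rows, 3 columns
c = np.zeros((2, 3, 4))                 # rank 3

for name, arr in [("a", a), ("b", b), ("c", c)]:
    # ndim is the rank, shape has one entry per dimension, size is their product
    print(name, arr.ndim, arr.shape, arr.size)

# a 1 (3,) 3
# b 2 (2, 3) 6
# c 3 (2, 3, 4) 24
```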

In [5]:
print("x has dimensions: {}".format(x.shape))
print("x is an object of type: {}".format(type(x)))
print("The elements in x are of type: {}".format(x.dtype))

x has dimensions: (5,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64


In [7]:
x = np.array(["Hello", "World"])
print("x has dimensions: {}".format(x.shape))
print("x is an object of type: {}".format(type(x)))
print("The elements in x are of type: {}".format(x.dtype))

x has dimensions: (2,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: <U5


## Major difference between ndarrays and lists

* Python lists can contain elements of multiple datatypes while ndarray elements must be of the same datatype.

In [8]:
x = np.array([1, 2, "Hello", "World"])

print("x has dimensions: {}".format(x.shape))
print("x is an object of type: {}".format(type(x)))
print("The elements in x are of type: {}".format(x.dtype))

x has dimensions: (4,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: <U21


Although x was created from a mix of integers and strings, its dtype (`<U21`, a Unicode string of up to 21 characters) shows that NumPy converted every element to a _Unicode string_.
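
If you really need to keep the original Python types, one option is to pass `dtype=object`, which makes the array store references to ordinary Python objects. The sketch below contrasts this with the default string conversion; keep in mind that object arrays give up most of NumPy's speed advantages.

```python
import numpy as np

as_strings = np.array([1, 2, "Hello", "World"])
as_objects = np.array([1, 2, "Hello", "World"], dtype=object)

print(as_strings.dtype)   # a Unicode string dtype such as <U21
print(as_objects.dtype)   # object

# In the default array, the integer 1 has become the string '1' ...
print(as_strings[0] == "1")   # True
# ... while the object array still holds the Python int 1
print(as_objects[0] + 1)      # 2
```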

## Multidimensional NumPy arrays

In [13]:
y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

print("Ed Sheeran - I'm in love with the shape of you. (-_-)")
print("Just kidding! The shape of y is ...")
print()
print(y)
print()

print("y has dimensions: {}".format(y.shape))

print("y has a total of {} elements.".format(y.size))
print("y is an object of type: {}".format(type(y)))
print("the elements in y are of type: {}".format(y.dtype))

Ed Sheeran - I'm in love with the shape of you. (-_-)
Just kidding! The shape of y is ...

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

y has dimensions: (4, 3)
y has a total of 12 elements.
y is an object of type: <class 'numpy.ndarray'>
the elements in y are of type: int64


## What is the dtype when we create ndarrays with other data types?

In [15]:
x = np.array([1, 2, 3])
y = np.array([1.0, 2.0, 3.0])
z = np.array([1, 2.5, 4])

print("The elements in x are of type: {}".format(x.dtype))
print("The elements in y are of type: {}".format(y.dtype))
print("The elements in z are of type: {}".format(z.dtype))

The elements in x are of type: int64
The elements in y are of type: float64
The elements in z are of type: float64


The `z` array is created with elements of different types but its dtype is `float64`. NumPy converts the ndarray elements to float in order to avoid losing precision in numerical computations. This is called _upcasting_.

Even though NumPy selects the dtype of the ndarray automatically, you can also specify the dtype you want for its elements explicitly.
You do this with the `dtype` keyword argument of the `np.array()` function when creating the ndarray.

In [17]:
x = np.array([1.5, 2.2, 3.7, 4.0, 5.9], dtype=np.int64)

print("This is x")
print()
print(x)
print()
print("The elements of x are of type {}".format(x.dtype))

This is x

[1 2 3 4 5]

The elements of x are of type int64


NumPy converted the elements to `int64` by truncating the decimal part (note that 5.9 became 5, so the values were not rounded).
Specifying the `dtype` can be important when you don't want NumPy to accidentally choose the wrong datatype.
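
Because this conversion truncates, a value like 5.9 silently becomes 5. If you want rounding instead, one approach is to round first with `np.round` and then convert with the `.astype()` method, as in this sketch:

```python
import numpy as np

values = np.array([1.5, 2.2, 3.7, 4.0, 5.9])

truncated = values.astype(np.int64)           # drops the decimal part
rounded = np.round(values).astype(np.int64)   # rounds to the nearest integer first

print(truncated)  # [1 2 3 4 5]
print(rounded)    # [2 2 4 4 6]
```

Note that `np.round` rounds exact halves to the nearest even value, so 1.5 becomes 2 but 2.5 would also become 2.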

Once you create an ndarray, you may want to save it in a file to be read later or used by another program.

In [19]:
x = np.array([1, 2, 3, 4, 5])

# save x in the current directory as x_np_array.npy (np.save adds the .npy extension)
np.save('x_np_array', x)

In [20]:
y = np.load('x_np_array.npy')

print("This is y")
print()
print(y)
print()
print("The elements of y are of type {}".format(y.dtype))

This is y

[1 2 3 4 5]

The elements of y are of type int64


> Use `np.array()`, not `np.ndarray()`, to create an ndarray from a list. `np.ndarray([1, 2, 3, 4, 5])` treats the list as a *shape*, so it returns an uninitialized rank 5 array of shape `(1, 2, 3, 4, 5)` filled with arbitrary leftover values from memory.

## Creating NumPy ndarrays using built-in functions

NumPy provides the ability to create ndarrays using built-in functions.
These functions allow us to create certain kinds of ndarrays using just one line of code.

### np.zeros(shape)
This function creates an ndarray of the given shape, filled with zeros.

In [26]:
# e.g. a rank 2 array with 3 rows and 4 columns
x = np.zeros((3, 4))

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("Elements of x are type: {}".format(x.dtype))


[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

This is the shape of x: (3, 4)
Elements of x are type: float64


> By default, `np.zeros` creates an ndarray with dtype `float64`. A different datatype can be chosen with the `dtype` keyword argument.
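
For example, a minimal sketch that asks for integer zeros instead of the default `float64`:

```python
import numpy as np

# Same 3 X 4 shape as above, but with integer zeros
x = np.zeros((3, 4), dtype=np.int64)

print(x)
# [[0 0 0 0]
#  [0 0 0 0]
#  [0 0 0 0]]
print(x.dtype)  # int64
```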

### np.ones(shape)

Similarly, we can create an ndarray full of ones.

In [25]:
x = np.ones((3, 2))

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))


[[1. 1.]
 [1. 1.]
 [1. 1.]]

This is the shape of x: (3, 2)


Again, `np.ones()` creates an ndarray of dtype `float64`. This behaviour can be modified through the use of the `dtype` kwarg when creating the array.

### np.full(shape, constant_value)

The `np.full` function takes two arguments: the shape and the constant value with which to populate the ndarray.

In [27]:
x = np.full((4, 3), 34)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))


[[34 34 34]
 [34 34 34]
 [34 34 34]
 [34 34 34]]

This is the shape of x: (4, 3)


`np.full` creates an ndarray of the same dtype as the constant value.
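
A short sketch of this behaviour: an integer constant gives an integer array, a float constant gives `float64`, and the `dtype` keyword argument overrides the inferred type. The exact integer dtype (e.g. `int64`) depends on the platform's default integer.

```python
import numpy as np

int_full = np.full((2, 2), 34)                   # integer constant -> integer dtype
float_full = np.full((2, 2), 3.5)                # float constant -> float64
forced = np.full((2, 2), 34, dtype=np.float64)   # dtype keyword overrides the default

print(int_full.dtype, float_full.dtype, forced.dtype)
```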

### np.eye(rows)

This is used to create an identity matrix.

An identity matrix is a square matrix (rows == columns) that only has ones in its main diagonal and zeroes everywhere else.

In [29]:
x = np.eye(6)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[[1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]

This is the shape of x: (6, 6)
This is the dtype of x: float64


> The default dtype of `np.eye` is `float64`. This can be changed via the `dtype` kwarg at the point of array creation.
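
A quick way to see why it is called an *identity* matrix: multiplying a matrix by it (with the `@` matrix multiplication operator) leaves the matrix unchanged. This sketch also uses the `dtype` keyword argument to get an integer identity matrix; the 3 X 3 matrix `m` is arbitrary.

```python
import numpy as np

identity = np.eye(3, dtype=np.int64)   # integer identity matrix instead of float64
m = np.arange(9).reshape(3, 3)         # an arbitrary 3 X 3 matrix

print(identity)
# Multiplying by the identity matrix returns the original matrix
print(np.array_equal(m @ identity, m))   # True
```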

### np.diag(diag)

Given a list of values, this creates a **square** matrix with those values along its main diagonal and zeros everywhere else.

In [30]:
x = np.diag([10, 20, 30, 40])

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[[10  0  0  0]
 [ 0 20  0  0]
 [ 0  0 30  0]
 [ 0  0  0 40]]

This is the shape of x: (4, 4)
This is the dtype of x: int64
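
`np.diag` also works in the other direction: given a 2D array instead of a list, it returns the values on the main diagonal as a rank 1 array. A minimal sketch:

```python
import numpy as np

x = np.diag([10, 20, 30, 40])   # build a 4 X 4 matrix from a diagonal
d = np.diag(x)                  # extract the main diagonal again

print(d)   # [10 20 30 40]

# Works on any 2D array, not just ones built with np.diag
print(np.diag(np.arange(9).reshape(3, 3)))   # [0 4 8]
```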


### np.arange(start, stop, step)

This creates an ndarray that has evenly spaced values within a specified interval.

Can be used with either one, two or three arguments.

In [31]:
# when used with only one argument, it creates an ndarray of rank 1 with consecutive integers between 0 and `stop` -1
x = np.arange(10)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[0 1 2 3 4 5 6 7 8 9]

This is the shape of x: (10,)
This is the dtype of x: int64


In [32]:
# when used with two arguments, it creates an ndarray with evenly spaced values between `start` and `stop` - 1
x = np.arange(4, 10)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[4 5 6 7 8 9]

This is the shape of x: (6,)
This is the dtype of x: int64


In [33]:
# when used with three arguments, it creates an ndarray with values from `start` up to `stop` - 1, spaced `step` apart
x = np.arange(1, 14, 3)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[ 1  4  7 10 13]

This is the shape of x: (5,)
This is the dtype of x: int64


> `np.arange` allows non-integer steps, but the results may be inconsistent with what you expect due to finite floating point precision. When non-integer steps are required, it's usually better to use `np.linspace()`.
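
Here is a small illustration of that pitfall. With standard IEEE 754 double precision, the number of elements `np.arange` computes for the interval below rounds up, so a value equal to roughly 1.3 is included even though `stop` is supposed to be excluded: you get 4 elements instead of the 3 you might expect. `np.linspace` takes the number of elements directly, so there is no such ambiguity.

```python
import numpy as np

# The stop value 1.3 should be excluded, but floating point rounding
# in the length calculation lets a value close to 1.3 slip in.
a = np.arange(1.0, 1.3, 0.1)
print(len(a), a)

# np.linspace is told how many elements to produce
b = np.linspace(1.0, 1.2, 3)
print(len(b), b)
```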

### np.linspace(start, stop, num)

Creates an ndarray of evenly spaced elements where both `start` and `stop` are inclusive (unlike `np.arange()`). It must be called with at least two arguments, `start` and `stop`.
When `num` (the number of elements) is not provided, a default of 50 is used.

In [37]:
x = np.linspace(0, 25, 10)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[ 0.          2.77777778  5.55555556  8.33333333 11.11111111 13.88888889
 16.66666667 19.44444444 22.22222222 25.        ]

This is the shape of x: (10,)
This is the dtype of x: float64


`np.linspace` works better than `np.arange` for non-integer spacing because its third argument, `num`, is the number of elements we want within the interval, rather than the spacing between elements.
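
With the endpoint included, the spacing between consecutive elements works out to (stop - start) / (num - 1). The sketch below checks this using `np.linspace`'s `retstep=True` option, which returns the spacing it used alongside the array:

```python
import numpy as np

start, stop, num = 0, 25, 10

x, step = np.linspace(start, stop, num, retstep=True)

print(step)                          # spacing chosen by np.linspace
print((stop - start) / (num - 1))    # the same spacing, computed by hand
```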

By default, the stop value is included in `np.linspace`. To exclude it, use the `endpoint` boolean flag when creating the ndarray.

In [38]:
x = np.linspace(0, 25, 10, endpoint=False)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[ 0.   2.5  5.   7.5 10.  12.5 15.  17.5 20.  22.5]

This is the shape of x: (10,)
This is the dtype of x: float64


### np.reshape(nd_array, shape)

`np.arange` and `np.linspace` can also be used to create rank 2 ndarrays. This can be done by passing them to `np.reshape(nd_array, shape)`.

This converts `nd_array` to the specified `shape`.

It's important to note that `shape` must be compatible with the number of elements in the given `nd_array`, e.g.

* you can convert a 6-element rank 1 array to a 3 X 2 or a 2 X 3 rank 2 array, since both shapes contain 6 elements in total.
* you cannot reshape a 6-element rank 1 array to a 3 X 3 rank 2 array, because the rank 2 array needs 9 elements in total.
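
The sketch below walks through both bullet points with a 6-element array. It also shows passing `-1` for one dimension, which asks NumPy to infer that dimension from the number of elements.

```python
import numpy as np

x = np.arange(6)

print(np.reshape(x, (3, 2)).shape)   # (3, 2)
print(np.reshape(x, (2, 3)).shape)   # (2, 3)

# -1 lets NumPy infer the missing dimension: 6 elements / 3 rows = 2 columns
print(np.reshape(x, (3, -1)).shape)  # (3, 2)

# 6 elements cannot fill a 3 X 3 array, so NumPy raises a ValueError
try:
    np.reshape(x, (3, 3))
except ValueError as err:
    print("ValueError:", err)
```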

In [39]:
x = np.arange(20)

print("Original x")
print()
print(x)
print()
x = np.reshape(x, (4,5))
print("Reshaped x")
print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))

Original x

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

Reshaped x

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

This is the shape of x: (4, 5)
This is the dtype of x: int64


One great feature of NumPy is that these calls can also be chained in a single line!

In [41]:
x = np.arange(20).reshape(4, 5)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

This is the shape of x: (4, 5)
This is the dtype of x: int64


> Notice that when we chain, we call the array's `.reshape()` method instead of the `np.reshape()` function, so only the shape is passed, e.g. `.reshape(4, 5)`.
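
To confirm that the function form and the method form are interchangeable, this sketch builds the same 4 X 5 array both ways. It also shows that reshaping returns a new array object and leaves the original `x` with its old shape.

```python
import numpy as np

x = np.arange(20)

as_function = np.reshape(x, (4, 5))   # function form: array passed as the first argument
as_method = x.reshape(4, 5)           # method form: called on the array itself

print(np.array_equal(as_function, as_method))  # True
print(x.shape)                                 # (20,) since x itself is unchanged
```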

Chaining can also be done with `np.linspace`:

In [44]:
x = np.linspace(0, 50, 10, endpoint=False).reshape(5, 2)

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[[ 0.  5.]
 [10. 15.]
 [20. 25.]
 [30. 35.]
 [40. 45.]]

This is the shape of x: (5, 2)
This is the dtype of x: float64


### np.random.random(shape)

NumPy also has random ndarrays. They (obviously) contain random numbers.

`np.random.random` creates an ndarray of the specified shape with random floats in the half-open interval `[0.0, 1.0)` (0.0 can occur, 1.0 cannot).

In [45]:
x = np.random.random((3, 3))

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[[0.32055626 0.68693361 0.12290858]
 [0.02191445 0.06862056 0.33535179]
 [0.73944632 0.1167699  0.32899875]]

This is the shape of x: (3, 3)
This is the dtype of x: float64
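
Because the values change on every run, the output above will differ on your machine. For reproducible results, one option is NumPy's newer `Generator` interface: `np.random.default_rng(seed)` returns a generator whose `.random()` method also draws floats from `[0.0, 1.0)`. The seed value 42 below is arbitrary.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

a = rng_a.random((3, 3))   # floats in [0.0, 1.0)
b = rng_b.random((3, 3))   # same seed, so the same numbers

print(np.array_equal(a, b))  # True
```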


### np.random.randint(start, stop, size=shape)

NumPy can also create ndarrays with random integers within a particular interval. The lower bound `start` is inclusive and the upper bound `stop` is exclusive.

In [46]:
x = np.random.randint(4, 15, size=(3, 2))

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))


[[12 14]
 [10  8]
 [ 4  5]]

This is the shape of x: (3, 2)
This is the dtype of x: int64
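
The sketch below draws many samples to check the bounds: with `start=4` and `stop=15`, every value lies between 4 and 14. The `Generator` method `integers` follows the same convention (upper bound excluded by default); the seed is arbitrary.

```python
import numpy as np

x = np.random.randint(4, 15, size=1000)

# Lower bound inclusive, upper bound exclusive: values lie in 4..14
print(x.min() >= 4, x.max() <= 14)   # True True

# Equivalent draw with the newer Generator interface
rng = np.random.default_rng(0)
y = rng.integers(4, 15, size=1000)
print(y.min() >= 4, y.max() <= 14)   # True True
```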


### np.random.normal(mean, standard_deviation, size=shape)

Sometimes you want to create an ndarray that contains random numbers that satisfy certain statistical properties, e.g. if you want random numbers in the ndarray to have an average of 0.

Let's create a 1000 X 1000 ndarray of random floating point numbers drawn from a normal distribution with a mean of zero and a standard deviation of 0.1.

In [49]:
x = np.random.normal(0, 0.1, size=(1000, 1000))

print()
print(x)
print()
print("This is the shape of x: {}".format(x.shape))
print("This is the dtype of x: {}".format(x.dtype))
print("The elements of x have a mean of: {}".format(x.mean()))
print("The maximum value amongst elements of x is: {}".format(x.max()))
print("The minimum value amongst elements of x is: {}".format(x.min()))
print("x has {} positive numbers".format((x > 0).sum()))
print("x has {} negative numbers".format((x < 0).sum()))


[[-0.01940528  0.13361145  0.02067963 ...  0.00226071 -0.03509071
   0.16002329]
 [-0.0515206   0.03298996  0.22345196 ...  0.02005954 -0.02434448
   0.01308222]
 [-0.04315404  0.08915572 -0.01486874 ... -0.22164326  0.25286332
   0.00625993]
 ...
 [ 0.03215816  0.0261805  -0.01513131 ...  0.12141768  0.12100384
   0.01555877]
 [-0.3145045   0.16148513  0.06168342 ...  0.07133627 -0.05692987
   0.00296736]
 [ 0.03392221 -0.14894621  0.02711902 ... -0.07130806 -0.07728966
  -0.13272103]]

This is the shape of x: (1000, 1000)
This is the dtype of x: float64
The elements of x have a mean of: -1.6341100875723564e-05
The maximum value amongst elements of x is: 0.5133890246931032
The minimum value amongst elements of x is: -0.47046337917440556
x has 499641 positive numbers
x has 500359 negative numbers
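
With a million samples, the sample statistics should land very close to the parameters passed in. The sketch below checks this with `x.mean()` and `x.std()`, and also checks the familiar rule that roughly 68% of normally distributed values lie within one standard deviation of the mean. The seeded generator is an illustrative choice, not part of the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 0.1, size=(1000, 1000))

print(x.mean())   # close to 0
print(x.std())    # close to 0.1

# Fraction of values within one standard deviation (0.1) of the mean
print(np.mean(np.abs(x) < 0.1))   # roughly 0.68
```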
