## Sections:

1. [NumPy](#numpy)
2. [Installing and importing NumPy](#installing-numpy)
3. [NumPy arrays](#numpy-arrays)
4. [NumPy array data types](#numpy-array-data-types)
5. [Indexing and slicing](#indexing-and-slicing)
6. [Operations on arrays](#operations-on-arrays)
7. [Reshaping and joining arrays](#reshaping-arrays)

# 1. NumPy  <a id='numpy'></a>

**NumPy** is a library for performing efficient operations on arrays of data. A NumPy array is an object that holds a collection of items, similarly to a `list`. The benefit of NumPy arrays is that many operations on them run significantly faster than the equivalent operations written with plain Python lists and loops. This efficiency becomes increasingly important as the amount of data we work with grows, along with the number of operations we want to perform on that data, which is why NumPy is such a useful library.


# 2. Installing and importing NumPy  <a id='installing-numpy'></a>

The NumPy package is not included with Python, and therefore we have to install it before we can import the "numpy" module. The easiest way to install NumPy is to use **pip** - the official **P**ackage **I**nstaller for **P**ython. In order to install the NumPy package using pip, we can run the following command in the terminal:

`pip install numpy`

However, since we are using a Jupyter notebook, we can also install NumPy by running the command in the cell below (which will run the above command in the terminal for us):

In [1]:
!pip install numpy

Collecting numpy
  Using cached numpy-1.23.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Installing collected packages: numpy
Successfully installed numpy-1.23.2


Once we have installed the NumPy package, we can import the "numpy" module by running the following code:

In [2]:
import numpy as np

The line above imports the "numpy" module and assigns it to the variable `np` - this is a commonly used abbreviation for the "numpy" module.
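To confirm that the installation worked, a quick check (a small addition, not part of the original cells) is to print the installed version through the module's `__version__` attribute. The exact version string depends on your environment.

```python
import numpy as np

# `__version__` is a string such as "1.23.2"; yours may differ
version = np.__version__
print(version)
```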

# 3. NumPy arrays  <a id='numpy-arrays'></a>

As mentioned previously, a NumPy array is an object that can hold a collection of items, similarly to a `list`. There are many ways to create a NumPy array. For example, we can call the `array()` function and pass in a list of items as an argument:

In [71]:
x = np.array([1, 2, 3, 4])
x

array([1, 2, 3, 4])

We can use the `type()` function to see that `x` is indeed an array:

In [72]:
type(x)

numpy.ndarray

`type(x)` returns `numpy.ndarray`, indicating that this object is an `ndarray` from the "numpy" module. The name `ndarray` stands for N-dimensional array, because NumPy arrays can have any number of dimensions (N dimensions). We can check how many dimensions an array has by accessing its `ndim` attribute.

**Note:** A variable that belongs to an object is called an attribute. Attributes can be accessed by writing the `.` symbol followed by the name of the attribute.

In [65]:
x.ndim

1

As we can see above, the `x` array has one dimension. If we want to create an array with two dimensions, we can do so by nesting lists within a list, and then passing the top-level list to the `array()` function:

In [78]:
y = [[1, 2, 3], [4, 5, 6]]
y = np.array(y)
y

array([[1, 2, 3],
       [4, 5, 6]])

In [79]:
y.ndim

2

As we can see above, the `y` array has two dimensions. A two-dimensional array can be thought of as a table or a matrix composed of rows and columns. In this case, the `y` array has two rows and three columns.

![matrix-diagram.png](attachment:matrix-diagram.png)

The number of items along each dimension (here, 2 rows and 3 columns) is referred to as the shape of the array. The shape of an array can be checked via the `shape` attribute.

In [81]:
y.shape

(2, 3)

The shape of an array is represented as a `tuple`. A `tuple` is similar to a `list`, as it also holds a collection of items; the beginning and end of a `tuple` are marked by round brackets `()`. The main difference between a `tuple` and a `list` is that a `tuple` cannot be changed (it is immutable), whereas a `list` can have items added, changed or removed. You can think of a `tuple` as a frozen `list`.
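A minimal sketch of the difference: changing an item of a `list` works, while trying the same on a `tuple` raises a `TypeError`. The variable names are just for illustration.

```python
# A list can be modified in place
numbers_list = [2, 3]
numbers_list[0] = 5
print(numbers_list)  # [5, 3]

# A tuple cannot: item assignment raises a TypeError
numbers_tuple = (2, 3)
try:
    numbers_tuple[0] = 5
except TypeError as error:
    print("TypeError:", error)
```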

The `shape` attribute returns a `tuple` containing the size of each dimension. The size of a dimension refers to the number of items in that dimension. In the case of the `y` array, the first dimension contains two items (both of which are arrays), which is why the first item in the `shape` tuple is `2`. Each of those arrays contains 3 items (all of which are integers), which is why the second item in the `shape` tuple is `3`. 
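The relationship between `shape` and `ndim` can be checked directly: the shape tuple has one entry per dimension. The sketch below also uses the `size` attribute (not covered elsewhere in this notebook), which holds the total number of items, i.e. the product of the entries in `shape`.

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6]])

# One entry in `shape` per dimension, so its length equals `ndim`
print(len(y.shape) == y.ndim)  # True

# The entries of the shape tuple can be unpacked like any tuple
rows, columns = y.shape
print(rows, columns)  # 2 3

# `size` is the total number of items: 2 * 3
print(y.size)  # 6
```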

Wrapping your head around multi-dimensional arrays can take a bit of practice. Below are some more examples of two-dimensional arrays. Try figuring out the shape of the array by looking at the data, before running the code that will print out the shape.

In [51]:
data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
]

arr = np.array(data)

**Note:** when writing nested lists, it can be a good idea to spread them out across multiple lines, in order to improve the readability of the code.

In [52]:
arr.shape

(4, 3)

In [4]:
data = [
    [1, 2],
    [3, 4],
    [5, 6],
]

arr = np.array(data)

In [5]:
arr.shape

(3, 2)

In [6]:
data = [
    [1, 2, 3, 4, 5]
]

arr = np.array(data)

In [7]:
arr.shape

(1, 5)

Arrays can also have three or more dimensions. A three-dimensional array can be thought of as a collection of tables or matrices. Alternatively, it can be visualized as a cube filled with numbers. The illustration below contains visual representations of multi-dimensional arrays:

![tensor-diargram.png](attachment:tensor-diargram.png)
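To connect the "collection of tables" picture to code, here is a small sketch with illustrative values: a three-dimensional array built from two 2 x 3 tables. Selecting the first index (as with a nested list) gives back one of the tables.

```python
import numpy as np

# Two "tables", each with 2 rows and 3 columns
cube = np.array([
    [[1, 2, 3],
     [4, 5, 6]],
    [[7, 8, 9],
     [10, 11, 12]],
])
print(cube.ndim)   # 3
print(cube.shape)  # (2, 2, 3)

# The first index picks out one of the tables
first_table = cube[0]
print(first_table.shape)  # (2, 3)
```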

An array can also have zero dimensions. For example, if we pass a single integer to the `array()` function, we create a zero-dimensional `ndarray`, whose shape is an empty tuple:

In [11]:
arr = np.array(3)
print(arr.ndim)
print(arr.shape)

0
()


To summarize, the code cell below creates arrays with 0, 1, 2 and 3 dimensions:

In [28]:
a = np.array(1)
b = np.array([1, 2, 3])
c = np.array([[1, 2], [1, 2]])
d = np.array([[[1, 2], [1, 2]], [[1, 2], [1, 2]]])

print(a.ndim, a.shape)
print(b.ndim, b.shape)
print(c.ndim, c.shape)
print(d.ndim, d.shape)

0 ()
1 (3,)
2 (2, 2)
3 (2, 2, 2)


**Note:** In the code cell above, we pass multiple arguments to the `print()` function. `print()` accepts an arbitrary number of arguments and prints them on the same line, separated by a single space `" "`.

There are also other ways to create arrays in NumPy, including many functions (other than `array()`) that conveniently create certain kinds of arrays. It is not important to know all of these functions; however, it is worth knowing that they exist, because some are more convenient than others depending on the kind of array we want to create.

Below are some examples:

The `arange()` function creates an array containing a range of numbers specified by the arguments passed in. As with Python's built-in `range()`, the start value is included and the stop value is excluded:

In [42]:
np.arange(1, 5)

array([1, 2, 3, 4])
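Like `range()`, `arange()` also accepts a single argument (the range then starts at `0`) and an optional third argument for the step. A short sketch of both forms:

```python
import numpy as np

# With one argument, the range starts at 0 and stops before that value
up_to_five = np.arange(5)
print(up_to_five)  # [0 1 2 3 4]

# A third argument sets the step between consecutive values
every_third = np.arange(0, 10, 3)
print(every_third)  # [0 3 6 9]

# The step may also be a floating-point number (gives 0.0, 0.25, 0.5, 0.75)
quarters = np.arange(0, 1, 0.25)
print(quarters)
```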

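The `zeros()` function creates an array filled with the value `0`. The argument we pass in determines the shape of the array. By default the values are floating-point numbers, which is why they are printed with a trailing `.`: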

In [29]:
np.zeros(3)

array([0., 0., 0.])

We can create a multi-dimensional array by passing in a `list` or a `tuple` of values, which represent the shape of the array.

In [33]:
np.zeros([2, 3])

array([[0., 0., 0.],
       [0., 0., 0.]])

In [32]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

The `ones()` function works analogously to the `zeros()` function, filling an array with the value `1`.

In [20]:
np.ones(1)

array([1.])

In [19]:
np.ones((3, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
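Both `zeros()` and `ones()` also accept a `dtype` argument if floating-point values are not wanted. A related function, `full()` (not covered above), fills an array of a given shape with any value we choose. A short sketch:

```python
import numpy as np

# `dtype` overrides the default floating-point type
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)

# `full()` fills an array of the given shape with the given value
sevens = np.full((2, 3), 7)
print(sevens)
```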

The "numpy" module also contains a submodule called "random", which provides functions for creating arrays filled with random numbers. For example, the `uniform()` function creates an array of random floating-point numbers sampled uniformly from a specified range of values ("sampled uniformly" means that every value within the range is equally likely to be chosen).

The `uniform()` function takes in three arguments:

1. The first argument determines the lower-bound of the range of values (the value itself is included in the range).
2. The second argument determines the upper-bound of the range of values (the value itself is excluded from the range).
3. The third argument determines the shape of the array.

In [36]:
np.random.uniform(0, 1, 3)

array([0.4961275 , 0.66281566, 0.53734364])

In [37]:
np.random.uniform(0, 1, (2, 3))

array([[0.50961483, 0.7198284 , 0.29486337],
       [0.75636579, 0.81119542, 0.91071251]])

In [38]:
np.random.uniform(-15, 15, (2, 3))

array([[  2.49448695, -12.02803112,   6.50248066],
       [-14.89821916,   4.58227313,  -8.44230171]])
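The numbers above change every time the cells are run. When reproducible results are needed, one option is to create a random number generator with a fixed seed via `np.random.default_rng()` (the newer generator interface, available since NumPy 1.17); its `uniform()` method takes the same lower bound, upper bound and shape arguments. The seed `42` below is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.uniform(0, 1, (2, 3))
sample_b = rng_b.uniform(0, 1, (2, 3))

print(np.array_equal(sample_a, sample_b))  # True
```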

There is also a `randint()` function, which works like the `uniform()` function but produces integers. As with `uniform()`, the lower bound is included and the upper bound is excluded:

In [56]:
np.random.randint(0, 10, 5)

array([3, 6, 7, 4, 8])

In [39]:
np.random.randint(0, 10, (4, 2))

array([[8, 4],
       [3, 7],
       [6, 6],
       [6, 7]])

In [40]:
np.random.randint(-30, -20, (4, 2))

array([[-23, -27],
       [-21, -21],
       [-23, -26],
       [-30, -29]])
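We can check that the upper bound is excluded by drawing many integers and listing the distinct values that occurred with `np.unique()`. With 10,000 draws from `randint(0, 3)`, the values `0`, `1` and `2` will almost certainly all appear, and `3` never will.

```python
import numpy as np

# Possible values are 0, 1 and 2; the upper bound 3 is excluded
draws = np.random.randint(0, 3, 10_000)

# np.unique lists the distinct values that actually occurred
print(np.unique(draws))  # almost certainly [0 1 2]
```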

There are also other functions that create various kinds of arrays, but, as mentioned previously, it is not important to memorize them all. What matters is knowing that many such functions exist, so that we can look for one (via a search engine or on the [official numpy documentation page](https://numpy.org/doc/stable/user/index.html)) that solves our particular problem.

# 4. NumPy array data types  <a id='numpy-array-data-types'></a>

The code cell below creates three arrays - each containing items of a different data type (`int`, `float`, `str`). If we use the `type()` function, we can see that each of those arrays is an `ndarray` regardless of the type of data it contains.

In [3]:
a = np.array([0, 1, 2])
b = np.array([0.34, 27.42, 5.39])
c = np.array(["a", "b", "c"])

print(type(a))
print(type(b))
print(type(c))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


However, a NumPy array also contains information about the data type of the items stored within it. The data type of the items within an array can be accessed via the `dtype` attribute:

In [40]:
print(a.dtype)
print(b.dtype)
print(c.dtype)

int64
float64
<U1


As we can see, the `dtype` attribute returns a different data type for each `ndarray`.

1. **int64** - is an integer which occupies 64 bits in the computer's memory (it is the default integer type on most platforms). In pure Python, we only have an `int` type, without any specification of the number of bits it occupies. In NumPy, we can be more precise and choose how many bits each integer occupies. The more memory we dedicate, the wider the range of integers we can store. However, if we know that we will not be using big integers, we can dedicate less memory, which saves memory overall and can speed up operations. For example, we could use an `int8`, which dedicates 8 bits of memory per integer; this gives us a range of numbers from -128 to 127. Or if we are only interested in non-negative numbers, we could use a `uint8` (unsigned integer with 8 bits of memory), which does not permit negative values and therefore allows a range of integers from 0 to 255.


2. **float64** - is a floating-point number that occupies 64 bits in the computer's memory. Floating-point types that occupy less memory (such as `float32` or `float16`) are also available; they save memory at the cost of a smaller range of values and lower precision.


3. **<U1** - is a unicode string of length 1. Indeed, when creating the `c` array, we passed in a list containing strings of length 1. NumPy stores each character of such strings using 32 bits (4 bytes); the `<` at the start of the type name indicates the byte order (little-endian).
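The limits and sizes described in the list above can be inspected from code. `np.iinfo()` and `np.finfo()` report the properties of integer and floating-point types, and `np.dtype(...).itemsize` gives the number of bytes per item. The last lines sketch what happens when a result does not fit in `uint8`: in array arithmetic, the value silently wraps around.

```python
import numpy as np

# Range of integer types
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255

# Approximate number of significant decimal digits of float types
print(np.finfo(np.float32).precision)  # 6
print(np.finfo(np.float64).precision)  # 15

# Bytes per item: int64 uses 8 bytes, each "<U" character uses 4 bytes
print(np.dtype(np.int64).itemsize)  # 8
print(np.dtype("<U1").itemsize)     # 4

# 250 + 10 = 260 does not fit in uint8, so it wraps around to 260 - 256 = 4
small = np.array([250], dtype=np.uint8)
print(small + 10)  # [4]
```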

One of the things that differentiates an `ndarray` from a `list` is that an `ndarray` can contain items of only one data type (whereas a `list` can hold items of many different data types). When we pass the `array()` function a `list` containing items of differing data types, NumPy picks a single data type that can represent all of them and converts the items accordingly. Take a look at the examples below:

In [7]:
x = np.array([1, 2.648, 3])
print(x.dtype)
print(x)

float64
[1.    2.648 3.   ]


As we can see above, NumPy has converted all of the items in the array `x` into the data type `float64`, even though the items in the list at index `0` and index `2` were of type `int`. We can see that they have indeed been converted to floating-point numbers by the `.` added to them.

Take a look at what happens if we create an array from a list that contains integers and booleans:

In [34]:
x = np.array([5, True, False])
print(x.dtype)
print(x)

int64
[5 1 0]


As can be seen above, the value `True` is converted to the integer `1`, while the value `False` is converted to the integer `0`. And therefore, the `dtype` of the `ndarray` is `int64`.
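If numbers are mixed with strings, NumPy falls back to a string data type and converts the numbers to text. The exact string length it chooses (for example `<U21`) may vary between NumPy versions, so the sketch below only checks the kind of the `dtype` (`"U"` for unicode strings).

```python
import numpy as np

# Mixing a number and a string: the number becomes the string "1"
mixed = np.array([1, "a"])
print(mixed.dtype.kind)  # U
print(mixed)             # ['1' 'a']
```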

Take a look at what happens if we create an array with varying lengths of strings:

In [35]:
x = np.array(["a", "bb", "ccc"])
print(x.dtype)
print(x)

<U3
['a' 'bb' 'ccc']


The `dtype` is `<U3`, indicating that the array holds unicode strings of length `3`. Even though the first string is only 1 character long and the second string only 2 characters long, both of these strings occupy the same amount of memory as the third string. Namely, every item in the `x` array occupies 96 bits of memory (3 * 32 bits, since NumPy uses 32 bits per unicode character).

NumPy requires every item in an array to occupy an equal amount of memory because this lets it store the items next to each other in one contiguous block, so the location of any item can be computed directly from its index. This layout is a large part of what makes operations on arrays efficient.
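The fixed width can be observed with the `itemsize` (bytes per item) and `nbytes` (bytes for all items) attributes. One consequence worth knowing: a string longer than the fixed width that is assigned later is silently truncated to fit.

```python
import numpy as np

x = np.array(["a", "bb", "ccc"])

# Every item gets the same number of bytes: 3 characters * 4 bytes
print(x.itemsize)  # 12
# Memory used by the data of all 3 items: 3 * 12 bytes
print(x.nbytes)    # 36

# A longer string assigned later is truncated to the fixed width of 3
x[0] = "dddd"
print(x)  # ['ddd' 'bb' 'ccc']
```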

It is also worth knowing that we can specify the data type of an array when creating it with the `array()` function, by passing it as the second argument (the `dtype` parameter). The data we pass in will be converted to the specified `dtype`:

In [20]:
x = np.array(["a", "bb", "ccc"], "<U10")
print(x)
print(x.dtype)

['a' 'bb' 'ccc']
<U10


Now each item in the array has 320 bits (10 * 32) reserved in the computer's memory. Similarly, if we specify the `dtype` to be `int32` but pass in a list of floating-point numbers, NumPy will convert those floating-point numbers into 32-bit integers by dropping their fractional part:

In [21]:
x = np.array([8.23, 5.14, 7.21], "int32")
print(x)
print(x.dtype)

[8 5 7]
int32
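Dropping the fractional part means truncating toward zero, which matters for negative numbers: `-1.7` becomes `-1`, not `-2`. If the nearest integer is wanted instead, one option is to round first with `np.round()` and then convert. The values below are illustrative.

```python
import numpy as np

values = np.array([-1.7, -1.2, 1.2, 1.7])

# Converting to an integer type truncates toward zero
truncated = values.astype("int32")
print(truncated)  # [-1 -1  1  1]

# Rounding first gives the nearest integers instead
rounded = np.round(values).astype("int32")
print(rounded)    # [-2 -1  1  2]
```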


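We can also convert the data type of an `ndarray` after it has been created with the `astype()` method. Since `astype()` returns a new array rather than modifying the existing one, we assign the result back to `x`: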

In [22]:
x = x.astype("float64")
print(x)
print(x.dtype)

[8. 5. 7.]
float64
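A short sketch showing that `astype()` leaves the original array untouched, which is why the result has to be assigned to a variable:

```python
import numpy as np

original = np.array([8, 5, 7], dtype="int32")
converted = original.astype("float64")

# `astype()` returned a new array; the original keeps its dtype
print(original.dtype)   # int32
print(converted.dtype)  # float64
```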


# 5. Indexing and slicing <a id='indexing-and-slicing'></a>

When it comes to indexing, an array is similar to a `list` or a `str` in many ways. For example, we can access items in an array by their index:

In [10]:
arr = np.arange(1, 10)

print(arr)
print(arr[3])
print(arr[-2])

[1 2 3 4 5 6 7 8 9]
4
8


When an array has more than one dimension, we can pass one index per dimension, separated by commas. Passing a single index selects a whole sub-array (here, a row):

In [44]:
arr = np.random.randint(0, 10, (3, 5))
arr

array([[4, 3, 4, 8, 0],
       [4, 3, 5, 0, 1],
       [0, 2, 3, 7, 8]])

In [45]:
print(arr[1])
print(arr[1, 2])
print(arr[-1, -4])

[4 3 5 0 1]
5
2
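`arr[1, 2]` gives the same result as the list-style `arr[1][2]`, which first selects row `1` and then item `2` of that row. The comma form does it in one step, and the two forms stop being equivalent once slices are involved. The sketch below copies the values printed above so that it is deterministic.

```python
import numpy as np

arr = np.array([[4, 3, 4, 8, 0],
                [4, 3, 5, 0, 1],
                [0, 2, 3, 7, 8]])

# Both forms select row 1, column 2
print(arr[1, 2])  # 5
print(arr[1][2])  # 5

# With slices they differ: arr[:][2] is row 2, arr[:, 2] is column 2
print(arr[:][2])  # [0 2 3 7 8]
print(arr[:, 2])  # [4 5 3]
```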


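We can also slice arrays, using `start:stop` for each dimension. A `:` on its own selects everything along that dimension, so `arr[:, 2]` selects column `2` from every row: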

In [46]:
arr[1:3]

array([[4, 3, 5, 0, 1],
       [0, 2, 3, 7, 8]])

In [47]:
arr[:-1]

array([[4, 3, 4, 8, 0],
       [4, 3, 5, 0, 1]])

In [48]:
arr[:, 2]

array([4, 5, 3])

In [49]:
arr[:, 2:4]

array([[4, 8],
       [5, 0],
       [3, 7]])

In [50]:
arr[1:, 2:4]

array([[5, 0],
       [3, 7]])
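One difference from lists is worth knowing: slicing a NumPy array does not copy the data. The slice is a view that shares memory with the original array, so changing the slice changes the original. Calling `copy()` creates an independent array.

```python
import numpy as np

arr = np.arange(10)

# A slice is a view: writing to it writes to `arr`
view = arr[2:5]
view[0] = 99
print(arr)  # [ 0  1 99  3  4  5  6  7  8  9]

# `copy()` creates an independent array
independent = arr[2:5].copy()
independent[0] = -1
print(arr[2])  # 99 (unchanged)
```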

NumPy also allows us to create a mask which filters out certain items based on a logical condition:

In [19]:
x = np.arange(0, 10)

mask = x > 5

x[mask]

array([6, 7, 8, 9])

As we can see, only values greater than `5` are present in the array when we apply the mask. If we look at the `mask` variable, we will see that it is in fact an array containing booleans:

In [20]:
print(mask)
print(mask.shape)

[False False False False False False  True  True  True  True]
(10,)


The `False` values indicate which items should be filtered out, while the `True` values indicate which items should remain.

In [24]:
x[x != 4]

array([0, 1, 2, 3, 5, 6, 7, 8, 9])

In [26]:
x[x <= 3]

array([0, 1, 2, 3])
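Conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. Python's `and`/`or` keywords do not work element-wise on arrays and raise an error instead.

```python
import numpy as np

x = np.arange(0, 10)

# Items strictly between 2 and 7
between = x[(x > 2) & (x < 7)]
# Items below 2 or above 7
outside = x[(x < 2) | (x > 7)]

print(between)  # [3 4 5 6]
print(outside)  # [0 1 8 9]
```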

We can also access items by passing a list containing the indices of items:

In [43]:
x = np.arange(10, 35, 5)
print(x)

indices = [0, 1, 3] 
print(x[indices])
print(x[[2, 4]])
print(x[[0, 0, 1, 1]])

[10 15 20 25 30]
[10 15 25]
[20 30]
[10 10 15 15]

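The same technique works for multi-dimensional arrays: a list of indices selects whole rows, and it can be combined with a slice that selects columns:
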
In [3]:
x = np.arange(10, 85, 5).reshape(3, 5)
print(x)
print("-" * 20)

print(x[[0, 2]])
print("-" * 20)

print(x[[0, 2], -3:])
print("-" * 20)

[[10 15 20 25 30]
 [35 40 45 50 55]
 [60 65 70 75 80]]
--------------------
[[10 15 20 25 30]
 [60 65 70 75 80]]
--------------------
[[20 25 30]
 [70 75 80]]
--------------------
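Passing one list of indices per dimension pairs them up element by element, so `x[[0, 2], [1, 3]]` selects `x[0, 1]` and `x[2, 3]`. Also note that, unlike a slice, indexing with a list returns a copy, so modifying the result does not affect `x`.

```python
import numpy as np

x = np.arange(10, 85, 5).reshape(3, 5)

# Paired index lists: selects x[0, 1] and x[2, 3]
pairs = x[[0, 2], [1, 3]]
print(pairs)  # [15 75]

# Indexing with a list returns a copy, not a view
picked = x[[0, 2]]
picked[0, 0] = -1
print(x[0, 0])  # 10 (unchanged)
```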


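It is worth mentioning that we can also assign values to specific locations in an array via indexing and slicing. If the assigned values have fewer dimensions than the selected region, NumPy repeats them to fill it; in the second cell below, the single row `[97, 98, 99]` is written into both selected rows: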

In [5]:
x = np.arange(0, 5)
x[2] = 99
x

array([ 0,  1, 99,  3,  4])

In [16]:
x = np.arange(0, 15).reshape(3, 5)
print(x)
print("-" * 20)

x[1:, 1:4] = [97, 98, 99]
print(x)
print("-" * 20)

x[1:, 1:4] = [[97, 98, 99], [100, 101, 102]]
print(x)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
--------------------
[[ 0  1  2  3  4]
 [ 5 97 98 99  9]
 [10 97 98 99 14]]
--------------------
[[  0   1   2   3   4]
 [  5  97  98  99   9]
 [ 10 100 101 102  14]]
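Assignment also works with a boolean mask, which is a convenient way to replace every item that satisfies a condition. A short sketch:

```python
import numpy as np

x = np.arange(0, 10)

# Set every item greater than 5 to 0
x[x > 5] = 0
print(x)  # [0 1 2 3 4 5 0 0 0 0]
```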


# 6. Operations on arrays <a id='operations-on-arrays'></a>

NumPy allows us to perform operations between arrays (collections of values) and scalars (single values). This allows us to express an operation we want to have performed on every item in an array without having to write loops:

In [37]:
x = np.ones(5)

print(x)
print(x + 5)
print(x / 3)

[1. 1. 1. 1. 1.]
[6. 6. 6. 6. 6.]
[0.33333333 0.33333333 0.33333333 0.33333333 0.33333333]


**Note:** this way of writing code is called vectorization. Vectorized operations in NumPy run as compiled loops inside NumPy, which makes them much faster than equivalent `for` loops written in pure Python.
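To make the comparison concrete, the sketch below computes `value * 2 + 1` for every item twice: once with a Python loop and once as a single vectorized expression. Both give the same result. The speed difference only becomes noticeable for large arrays and can be measured, for example, with the `%timeit` magic in a Jupyter notebook.

```python
import numpy as np

values = np.arange(1, 6)

# Loop version: builds the result one item at a time in Python
loop_result = []
for value in values:
    loop_result.append(value * 2 + 1)

# Vectorized version: one expression applied to the whole array
vectorized_result = values * 2 + 1

print(loop_result == vectorized_result.tolist())  # True
```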

NumPy also allows for operations between two arrays of the same shape. In that case, the operation is performed element-wise, pairing items at the same position:

In [3]:
x = np.arange(1, 6)

print(x)
print(x + x)
print(x * x)
print(x - x)
print(x / x)

[1 2 3 4 5]
[ 2  4  6  8 10]
[ 1  4  9 16 25]
[0 0 0 0 0]
[1. 1. 1. 1. 1.]
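Arrays of different shapes can sometimes be combined too. NumPy calls this broadcasting: here a one-dimensional array is applied to every row of a two-dimensional array with a matching number of columns. When the shapes cannot be matched, NumPy raises a `ValueError`. A minimal sketch:

```python
import numpy as np

table = np.ones((2, 3))
row = np.array([1, 2, 3])

# `row` is added to each of the 2 rows of `table`
result = table + row
print(result)
# [[2. 3. 4.]
#  [2. 3. 4.]]

# 3 columns cannot be matched with 2 values
try:
    table + np.array([1, 2])
except ValueError as error:
    print("ValueError:", error)
```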


There are also many different statistical operations in NumPy, which can be accessed via functions or methods. For example, the sum of an array can be calculated in the following ways:

In [4]:
print(np.sum(x))
print(x.sum())

15
15


Both of the ways above are equivalent. Many operations in NumPy are available both as functions in the "numpy" module and as methods of an `ndarray`.

In [5]:
print(np.min(x))
print(x.min())

1
1


In [6]:
print(np.max(x))
print(x.max())

5
5


In [14]:
print(np.mean(x))
print(x.mean())

3.0
3.0


In [15]:
# `std()` calculates the standard deviation (dividing by N, i.e. the population standard deviation)
print(np.std(x))
print(x.std())

1.4142135623730951
1.4142135623730951


In [16]:
# `var()` calculates the variance
print(np.var(x))
print(x.var())

2.0
2.0
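These functions and methods accept further arguments. Two useful ones are `axis`, which computes the statistic along a single dimension instead of over all items, and `ddof` for `std()` and `var()`: with the default `ddof=0` they divide by N, while `ddof=1` divides by N - 1 (the sample standard deviation or variance). A short sketch:

```python
import numpy as np

y = np.array([[1, 2, 3],
              [4, 5, 6]])

print(y.sum())        # 21 (all items)
print(y.sum(axis=0))  # [5 7 9] (down each column)
print(y.sum(axis=1))  # [ 6 15] (across each row)

x = np.arange(1, 6)
# Sum of squared deviations from the mean (3) is 10
print(x.var())        # 2.0 (10 / 5)
print(x.var(ddof=1))  # 2.5 (10 / 4)
```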


NumPy arrays also support the `in` operator, which checks whether a value is present in the array:

In [10]:
print(x)
print(3 in x)
print(6 in x)

[1 2 3 4 5]
True
False


NumPy arrays are also iterable, which means we can loop over them with a `for` loop:

In [8]:
for item in x:
    print(item)

1
2
3
4
5
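For a multi-dimensional array, a `for` loop iterates over the first dimension, so looping over a two-dimensional array yields its rows, each of which is a one-dimensional array:

```python
import numpy as np

y = np.array([[1, 2, 3],
              [4, 5, 6]])

# Each `row` is a 1-D array
for row in y:
    print(row)
# [1 2 3]
# [4 5 6]
```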


# 7. Reshaping and joining arrays <a id='reshaping-arrays'></a>

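It is also worth knowing that arrays can be reshaped after they have been created, as long as the new shape holds the same total number of items (here 4 * 3 = 12). The `reshape()` method returns a reshaped array and leaves `x` itself unchanged: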

In [19]:
x = np.arange(12)

print(x)
print()
print(x.reshape(4,3))

[ 0  1  2  3  4  5  6  7  8  9 10 11]

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
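One dimension passed to `reshape()` may be `-1`, in which case NumPy works out its size from the total number of items. A shape that does not hold exactly the same number of items raises a `ValueError`:

```python
import numpy as np

x = np.arange(12)

# 12 items in 3 rows means 4 columns
print(x.reshape(3, -1).shape)  # (3, 4)

# 5 * 3 = 15 does not match the 12 items
try:
    x.reshape(5, 3)
except ValueError as error:
    print("ValueError:", error)

# x itself keeps its original shape
print(x.shape)  # (12,)
```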


We can also concatenate two arrays to create a new array of a different shape:

In [65]:
x = np.arange(6)
y = np.arange(6) + 6

print(x)
print(y)

z = np.concatenate([x, y])
print(z)

[0 1 2 3 4 5]
[ 6  7  8  9 10 11]
[ 0  1  2  3  4  5  6  7  8  9 10 11]


We can also concatenate multi-dimensional arrays. In this case, we can pass a second argument to the `concatenate()` function (the `axis` parameter), which determines along which dimension the arrays are joined: `0` stacks them on top of each other, while `1` places them side by side.

In [63]:
x = x.reshape(2, 3)
y = y.reshape(2, 3)

print(x)
print("-" * 20)
print(y)
print("-" * 20)
print(np.concatenate([x, y], 0))
print("-" * 20)
print(np.concatenate([x, y], 1))

[[0 1 2]
 [3 4 5]]
--------------------
[[ 6  7  8]
 [ 9 10 11]]
--------------------
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
--------------------
[[ 0  1  2  6  7  8]
 [ 3  4  5  9 10 11]]
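For two-dimensional arrays, `np.vstack()` and `np.hstack()` are shorthand for concatenating along axis `0` and axis `1` respectively. The arrays must have the same size along every dimension except the one they are joined along; otherwise `concatenate()` raises a `ValueError`. A short sketch:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.arange(6, 12).reshape(2, 3)

# vstack matches axis 0, hstack matches axis 1 (for 2-D arrays)
print(np.array_equal(np.vstack([x, y]), np.concatenate([x, y], axis=0)))  # True
print(np.array_equal(np.hstack([x, y]), np.concatenate([x, y], axis=1)))  # True

narrow = np.ones((2, 2))
# Stacking vertically needs the same number of columns (3 vs 2)
try:
    np.concatenate([x, narrow], axis=0)
except ValueError as error:
    print("ValueError:", error)

# Side by side only needs the same number of rows
print(np.concatenate([x, narrow], axis=1).shape)  # (2, 5)
```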


There is more that can be said about NumPy arrays, but this notebook covers some of their main features. Hopefully, you now have some intuition for what an array is and what can be done with it.

As always, you do not need to remember all of the details of the NumPy library, because you can always look up the names of functions (or whether a function even exists) online; there are plenty of great resources that cover various aspects of the library. Alternatively, you can go straight to the [official NumPy documentation](https://numpy.org/doc/) for more detailed information.