![](https://miro.medium.com/max/875/1*cyXCE-JcBelTyrK-58w6_Q.png)

# Introduction

#### Q: What is numpy?
A: A numerical computing and linear algebra library for Python. Almost all of the libraries in the PyData ecosystem rely on numpy as their base building block. Its core is written in C with bindings to Python, so it is much faster than writing and optimizing equivalent functions in pure Python. This is why it is a critical package for data science in Python.
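
As a rough illustration of the speed claim, the sketch below computes the same sum of squares with a pure-Python loop and with a numpy array, then times both. The array size and the function names are arbitrary choices for this example, and the timings will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for this illustration


def python_sum_of_squares():
    # pure-Python loop: every multiplication runs in the interpreter
    return sum(i * i for i in range(n))


def numpy_sum_of_squares():
    # vectorised: the element-wise loop runs in compiled code
    values = np.arange(n, dtype=np.int64)
    return int((values * values).sum())


loop_result = python_sum_of_squares()
numpy_result = numpy_sum_of_squares()
print(loop_result == numpy_result)

# time five runs of each; absolute numbers depend on the machine
loop_time = timeit.timeit(python_sum_of_squares, number=5)
numpy_time = timeit.timeit(numpy_sum_of_squares, number=5)
print(f"python loop: {loop_time:.4f}s  numpy: {numpy_time:.4f}s")
```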

#### Q: What are numpy arrays?
A: Numpy arrays are the main way we will use the numpy library in this tutorial. They come in two flavours:
- vectors - strictly 1 dimensional arrays
- matrices - 2 dimensional and beyond. Note that a matrix may still have only one row or one column.
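
To make the vector/matrix distinction concrete, here is a small sketch that inspects the `ndim` and `shape` attributes of a vector, a one-row matrix and a one-column matrix. The variable names are only for illustration.

```python
import numpy as np

vector = np.array([1, 2, 3])             # 1 dimensional
row_matrix = np.array([[1, 2, 3]])       # 2 dimensional, one row
col_matrix = np.array([[1], [2], [3]])   # 2 dimensional, one column

for name, a in [("vector", vector), ("row", row_matrix), ("col", col_matrix)]:
    print(f"{name}: ndim={a.ndim} shape={a.shape}")
```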


# Numpy Arrays

In [1]:
import numpy as np



### Casting python objects to numpy arrays

In [2]:
# create a standard python list
my_list = [1,2,3,4,5]
print(type(my_list))

<class 'list'>


In [3]:
# cast the list object to a numpy array type
arr = np.array(my_list)
print(type(arr))

<class 'numpy.ndarray'>


In [4]:
# create a list of lists 
mat = [[1,2,3],[4,5,6],[7,8,9]]

# cast to a numpy array, creating a 2d array
arr = np.array(mat)
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

### Using numpy built-in methods for array generation

In [5]:
# similar to python built-in range function.
# also uses similar inclusive/exclusive rules
# for the start, stop values. 
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The numpy `arange` function also accepts a third parameter, `step`, which sets the difference between consecutive values in the sequence.

In [6]:
# get all even numbers from 0 up to (but not including) 10
evens = np.arange(0,10,2)
evens

array([0, 2, 4, 6, 8])

In [7]:
# get all the odds between 1,10
odds = np.arange(1,10,2)
odds

array([1, 3, 5, 7, 9])

We can also use other methods of array generation.

In [14]:
# use numpy to generate an array of zeros.

# a single integer creates a vector
arr = np.zeros(5)
print("vector", '\n', type(arr), '\n', arr, '\n')

# create a multi-dimensional array
# note: the tuple is interpreted as
# (rows, columns)
arr = np.zeros((5,5))
print("multidimensional", '\n', type(arr), '\n', arr)

vector 
 <class 'numpy.ndarray'> 
 [0. 0. 0. 0. 0.] 

multidimensional 
 <class 'numpy.ndarray'> 
 [[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [15]:
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
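
The trailing dots in the outputs above show that `zeros` and `ones` produce floating point values by default. A minimal sketch, assuming you want integers instead, is to pass the `dtype` parameter:

```python
import numpy as np

float_zeros = np.zeros(3)              # default dtype is float64
int_ones = np.ones((2, 2), dtype=int)  # ask for integers explicitly

print(float_zeros, float_zeros.dtype)
print(int_ones, int_ones.dtype)
```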

Another array generation tool is `linspace`. It returns evenly spaced values between `start` and `stop` (both included by default), with a third parameter that sets the total number of points to generate.

In [16]:
arr = np.linspace(5,10, 20)
arr

array([ 5.        ,  5.26315789,  5.52631579,  5.78947368,  6.05263158,
        6.31578947,  6.57894737,  6.84210526,  7.10526316,  7.36842105,
        7.63157895,  7.89473684,  8.15789474,  8.42105263,  8.68421053,
        8.94736842,  9.21052632,  9.47368421,  9.73684211, 10.        ])
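
A smaller example makes the difference between `linspace` and `arange` easier to see: `linspace` takes a count of points and includes the stop value, while `arange` takes a step and excludes it. The values below are chosen only for illustration.

```python
import numpy as np

# 5 points from 0 to 1, both endpoints included
points = np.linspace(0, 1, 5)
print(points)

# step of 0.25 from 0 up to (but not including) 1
stepped = np.arange(0, 1, 0.25)
print(stepped)
```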

We can also create an identity matrix: a square matrix with ones on the main diagonal and zeros everywhere else.

In [17]:
arr = np.eye(5)
arr

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
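
The identity matrix gets its name from the fact that multiplying a matrix by it leaves the matrix unchanged. A short sketch of that property, using the `@` matrix multiplication operator on an arbitrary 3x3 example:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)
identity = np.eye(3)

# matrix product with the identity returns the same values
product = m @ identity
print(product)
print(np.array_equal(product, m))
```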

### np.random methods

In [18]:
# rand draws from a uniform distribution over [0, 1).
# note this function does not require wrapping the
# dimensions as a tuple.
np.random.rand(3,5)

array([[0.11981978, 0.86894861, 0.76954324, 0.19842932, 0.96308298],
       [0.51635323, 0.77324231, 0.61682914, 0.04742738, 0.9466581 ],
       [0.81606513, 0.56100625, 0.59874398, 0.63614977, 0.76867576]])

In [20]:
# randn draws samples from the standard normal distribution
# (mean 0, standard deviation 1)
np.random.randn(5,5)

array([[ 0.55376005, -0.9703    , -0.28753457,  0.81665987,  0.14824513],
       [ 0.1486342 , -0.85716824, -0.50412266, -0.85944944, -0.17401128],
       [-0.81195704, -0.82018481,  0.46985104, -0.52101469, -0.8436827 ],
       [ 0.7520208 ,  0.15908852, -0.83722555, -0.07817761,  0.16397485],
       [ 0.54840988, -0.31715268, -1.02201918,  0.91979636,  0.50754323]])
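
With only a handful of values it is hard to tell the two distributions apart, so here is a hedged sketch that draws a larger sample from each and summarises it. The seed and the sample size are arbitrary choices made so that the example is repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

uniform = np.random.rand(10_000)   # uniform over [0, 1)
normal = np.random.randn(10_000)   # standard normal

print(f"rand:  min={uniform.min():.3f} max={uniform.max():.3f}")
print(f"randn: mean={normal.mean():.3f} std={normal.std():.3f}")
```

The `rand` values should stay inside [0, 1), while the `randn` sample should have a mean close to 0 and a standard deviation close to 1.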

In [21]:
# random int between a start and stop
# typical inclusive/exclusive ruling
# if only a single parameter is passed it
# is assumed as the stop and an implicit
# zero is taken as a start. 
np.random.randint(20)

8

In [23]:
# five random ints from 1 up to (but not including) 51
arr = np.random.randint(1,51, 5)
arr

array([19, 40, 29, 42, 13])
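
To check the inclusive/exclusive rule for `randint`, the sketch below draws many values from a deliberately small range and lists the distinct results. The seed and the range are arbitrary choices for this example.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only for repeatability

# start=1 is included, stop=4 is excluded
draws = np.random.randint(1, 4, 1000)
print(np.unique(draws))
```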

### Reshaping

Note: to use the `.reshape()` method without error, the new shape must account for every element of the original array. E.g. if you have 30 items you can successfully reshape to (2,15), (3,10) or (5,6). You cannot use a larger shape and leave part of it blank or None. The easy check: do the dimensions of your intended shape multiply to the number of elements? If not, you _will_ get an error.

In [30]:
arr = np.arange(25)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [38]:
print(f"original shape: {arr.shape}")

# reshape does not modify arr; it returns the reshaped array
newarr = arr.reshape(5,5)
print(f"reshape: {newarr.shape}")

original shape: (25,)
reshape: (5, 5)
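
The element-count rule for `.reshape()` can be sketched with the 30 item case. The example also shows that numpy accepts `-1` for one dimension and works out its size from the others, and what happens when the shape does not fit. The variable names are only for illustration.

```python
import numpy as np

items = np.arange(30)

# 3 * 10 == 30, so this works
print(items.reshape(3, 10).shape)

# -1 asks numpy to infer the size of that dimension
print(items.reshape(5, -1).shape)

# 4 * 8 == 32, which does not match 30 elements
error_message = None
try:
    items.reshape(4, 8)
except ValueError as err:
    error_message = str(err)
    print(f"reshape failed: {error_message}")
```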


### Max, min, argument indexes

You can find the maximum or minimum of an array with the `max` and `min` methods. To find the index of those values, use `argmax` and `argmin`.

In [39]:
print(f"max: {arr.max()} argmax: {arr.argmax()}")
print(f"min: {arr.min()} argmin: {arr.argmin()}")


max: 24 argmax: 24
min: 0 argmin: 0
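
In the output above each value happens to equal its own index, which hides the difference between `max` and `argmax`. A small sketch with unordered, made-up values shows it more clearly:

```python
import numpy as np

scores = np.array([3, 7, 1, 9, 4])

# the largest value is 9 and it sits at index 3
print(f"max: {scores.max()} argmax: {scores.argmax()}")

# the smallest value is 1 and it sits at index 2
print(f"min: {scores.min()} argmin: {scores.argmin()}")
```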


### Indexing & selection

In [41]:
arr = np.arange(0,11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [43]:
# to understand slicing we need to understand the indexing of list items
# list indexes start at zero.
# list index of the last element will be the length -1 (because of starting
# at zero of course.)

#our current arr
for idx, item in enumerate(arr):
    print(f"{idx}\t{item}")

0	0
1	1
2	2
3	3
4	4
5	5
6	6
7	7
8	8
9	9
10	10


In [44]:
arr = np.arange(10,21)
arr

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

In [45]:
#our current arr
for idx, item in enumerate(arr):
    print(f"{idx}\t{item}")

0	10
1	11
2	12
3	13
4	14
5	15
6	16
7	17
8	18
9	19
10	20


Numpy supports the same slicing syntax as Python itself, with the same rules: the start index is included and the stop index is excluded.

In [46]:
arr[2:5]

array([12, 13, 14])
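
To confirm that the rules match, this sketch applies the same slices to a plain Python list and to a numpy array built from it. Negative indexes, which count from the end, also behave the same way.

```python
import numpy as np

values = list(range(10, 21))
values_arr = np.array(values)

print(values[2:5])      # list slice
print(values_arr[2:5])  # same positions from the array

# negative indexes count back from the end
print(values[-3:])
print(values_arr[-3:])
```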

### Multi-dimensional array slicing 

Slicing takes a part of a numpy array (or of a list in Python itself): we are slicing off a piece of the original. It uses standard Python indexing, i.e. it starts at zero, and the standard inclusive/exclusive rules: the start index **is included** and the stop index is **not included**. For a multi-dimensional array, supply one slice per dimension separated by commas: rows first, then columns.

In [47]:
# declare a multi-dimensional array
arr = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [64]:
# we take a partial with a blank start index (left side of the colon),
# which means an implicit zero, up until but not including index 3.
# So for the columns of the first row we can expect:
# index 0 : 1
# index 1 : 2
# index 2 : 3
# The same applies to the rows: we take from row zero up until
# but not including row index 3.
arr[:3,:3]

array([[ 1,  2,  3],
       [ 6,  7,  8],
       [11, 12, 13]])

This means we are taking the partial as: 
- row 0, index 0 (1), index 1 (2), index 2 (3)
- row 1, index 0 (6), index 1 (7), index 2 (8)
- row 2, index 0 (11), index 1 (12), index 2 (13)

In [52]:
arr[3:,3:]

array([[19, 20],
       [24, 25]])
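
The same `[rows, columns]` pattern also selects a single row, a single column or a single element. A minimal sketch on a 5x5 array holding the same values as above (built here with `arange` and `reshape` for brevity):

```python
import numpy as np

grid = np.arange(1, 26).reshape(5, 5)

print(grid[1])        # the whole of row index 1
print(grid[:, 1])     # column index 1 from every row
print(grid[1, 2])     # single element: row index 1, column index 2
print(grid[3:, 3:])   # rows 3 onwards, columns 3 onwards
```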

In [53]:
# comparing an array to a value gives a boolean array of the same shape
bool_array = arr > 5

In [55]:
arr[arr>5]

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
       23, 24, 25])

In [56]:
arr[arr<3]

array([1, 2])

In [58]:
arr[arr < 10]

array([1, 2, 3, 4, 5, 6, 7, 8, 9])
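
The selections above work because a comparison produces a boolean array, and indexing with that array keeps only the positions marked `True`. The sketch below prints the mask itself and then combines two conditions with `&`. Each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import numpy as np

nums = np.arange(1, 11)

mask = nums > 5
print(mask)         # one True/False per element
print(nums[mask])   # only the elements where the mask is True

# combine conditions: greater than 3 AND less than 8
in_range = nums[(nums > 3) & (nums < 8)]
print(in_range)
```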

In [65]:
# nifty one liner for creating a specific shape
# of values. 
arr_2d = np.arange(50).reshape(5,10)
arr_2d

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

In [63]:
# show the same slicing strategy
arr_2d[:3,:2]

array([[ 0,  1],
       [10, 11],
       [20, 21]])

### Numpy Operations

In [73]:
# array addition.

arr1 = np.arange(10)
arr2 = np.arange(10,20)
arr3 = arr1 + arr2

# arr3 already holds the element-wise sums
for i in range(len(arr3)):
    print(f"arr1:{arr1[i]} + arr2:{arr2[i]} = {arr3[i]}")

arr1:0 + arr2:10 = 10
arr1:1 + arr2:11 = 12
arr1:2 + arr2:12 = 14
arr1:3 + arr2:13 = 16
arr1:4 + arr2:14 = 18
arr1:5 + arr2:15 = 20
arr1:6 + arr2:16 = 22
arr1:7 + arr2:17 = 24
arr1:8 + arr2:18 = 26
arr1:9 + arr2:19 = 28


In [74]:
# Multiplication example
arr = np.arange(10)
arr = arr * arr
arr

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [75]:
arr = np.arange(100,90, -1)
arr

array([100,  99,  98,  97,  96,  95,  94,  93,  92,  91])

In [76]:
arr = arr / 10
arr

array([10. ,  9.9,  9.8,  9.7,  9.6,  9.5,  9.4,  9.3,  9.2,  9.1])
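
The operations above are applied element by element, and a single number on one side is applied to every element of the array. A short sketch of a few more cases, using an arbitrary small array:

```python
import numpy as np

base = np.arange(5)

print(base + 10)         # the scalar is added to every element
print(base ** 2)         # element-wise power
print((base / 2).dtype)  # true division gives floats, as in the cell above
print(np.sqrt(base))     # numpy functions also work element-wise
```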