# Numpy

Numpy (Numerical Python) is an open-source Python library for scientific computing. It lets us work with arrays and matrices more naturally than with lists, where we have to loop through individual elements to perform a numerical operation.

As a refresher, here are basic descriptions of arrays and matrices:
 - Arrays are collections of values of the same type indexed by integers--think of a list
 - Matrices are two-dimensional arrays indexed by row and column--think of nested lists. Arrays with more dimensions follow the same idea

When doing mathematical operations, using the Numpy library is highly recommended because it is designed with high performance in mind--Numpy is largely written in C, which makes computations much faster than equivalent pure Python code. In addition, Numpy arrays are stored more compactly than equivalent built-in Python data structures such as lists.
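As a rough illustration of the difference in style, the sketch below squares the same values once with a Python loop and once with a single Numpy expression. The variable names are illustrative only, and no timings are shown because they depend on the machine and the array size.

```python
import numpy as np

values = list(range(5))

# Pure Python: visit each element explicitly
squares_list = [v ** 2 for v in values]

# Numpy: one expression applied to the whole array at once
squares_arr = np.array(values) ** 2

print(squares_list)  # [0, 1, 4, 9, 16]
print(squares_arr)   # [ 0  1  4  9 16]
```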

Numpy is a third-party module, which means it is not part of Python's suite of built-in libraries. You don't have to worry about this, since it is already installed in the environment we have set up for you.

**Import**

To use numpy, we have to import it first.

In [1]:
import numpy as np

The `as` keyword allows us to create an alias for our imported library. In this case, we renamed `numpy` to `np`. This is the common naming convention for numpy, and you'll see almost all code using numpy on the web use this alias.

We can also do the following to check its version.

In [2]:
print(np.__version__)

1.19.1


**Numpy Array Basics**

Some notes on numpy arrays:
 - all elements in a numpy array must be of the same type.
 - the size cannot be changed once constructed.
 - they support “vectorized” operations such as element-wise addition and multiplication.
 
## 1. Attributes
 
Numpy arrays have built-in attributes that we can use. Here are some of them:
 - ndarray.ndim - number of axes or dimensions of the array.
 - ndarray.shape - the dimension of the array--a tuple of integers indicating the size of the array in each dimension.
 - ndarray.dtype - the type of the elements in the array. Numpy provides its own `int16`, `int32`, `float64` data types, among others.
 - ndarray.itemsize - size in bytes of each element of the array. For example, an array of elements of type `float64` has an itemsize of $\frac{64}{8} = 8$ and one of type `int32` has an itemsize of $\frac{32}{8} = 4$.
 - ndarray.size - total number of elements in the array.

In [3]:
arr = np.array([1, 2, 3, 4], dtype=float)
print('Type: ',type(arr))
print('Shape: ',arr.shape)
print('Dimension: ',arr.ndim)
print('Itemsize: ',arr.itemsize)
print('Size: ',arr.size)

Type:  <class 'numpy.ndarray'>
Shape:  (4,)
Dimension:  1
Itemsize:  8
Size:  4
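The same attributes can be read from a two-dimensional array. The following is a minimal sketch; the array `mat` is made up for illustration, and `nbytes` (not listed above) is simply `size * itemsize`.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(mat.ndim)      # 2  (rows and columns)
print(mat.shape)     # (2, 3)
print(mat.size)      # 6  (2 * 3 elements)
print(mat.itemsize)  # 4  (32 bits / 8)
print(mat.nbytes)    # 24 (size * itemsize)
```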


**Mixed data types**

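If we try to construct a numpy array from a list that mixes numbers and strings, numpy automatically converts every element to a string so that all elements share one type. But if we force a numeric data type, say `float32`, the conversion fails with an error because the string cannot be turned into a number.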

In [4]:
arr = np.array([1, 2.0, "dsi"])  # notice that we did not pass an argument to dtype parameter
print("Datatype: ", arr.dtype)

Datatype:  <U32
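To see the error case described above, the sketch below forces `float32` on the same mixed list and catches the resulting exception. The flag `conversion_failed` is only there to make the outcome explicit.

```python
import numpy as np

mixed = [1, 2.0, "dsi"]

try:
    np.array(mixed, dtype=np.float32)
    conversion_failed = False
except ValueError as err:
    # "dsi" cannot be interpreted as a float
    conversion_failed = True
    print("Conversion failed:", err)

# Without the string, integers and floats are promoted to a common float type
arr = np.array([1, 2.0])
print(arr.dtype)  # float64
```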


## 2. Creating Arrays

The following are different ways of creating an array in Numpy aside from passing a list as seen in the examples above.

**np.arange**

`np.arange` creates an array based on the arguments passed. If only a single argument is passed--let's call this `n1`--it creates an array of size `n1` with values from 0 to `n1`-1. If two arguments (`n1` and `n2`) are passed, it creates an array with values from `n1` to `n2`-1.

In [5]:
np.arange(5, dtype=float)

array([0., 1., 2., 3., 4.])

In [6]:
np.arange(2, 5, dtype=float)

array([2., 3., 4.])
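`np.arange` also accepts an optional third argument, the step between consecutive values. A short sketch, with illustrative variable names:

```python
import numpy as np

evens = np.arange(0, 10, 2)      # start, stop (exclusive), step
countdown = np.arange(5, 0, -1)  # a negative step counts down

print(evens)      # [0 2 4 6 8]
print(countdown)  # [5 4 3 2 1]
```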

**np.ones** and **np.zeros**

`np.ones` creates an array filled with 1s and `np.zeros` creates an array filled with 0s.

In [7]:
np.ones(4)

array([1., 1., 1., 1.])

In [8]:
np.zeros(4)

array([0., 0., 0., 0.])

**np.linspace**

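Creates an array of evenly spaced values. The first two arguments are the start and end values (both included by default), and the third is the number of values to generate.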

In [9]:
np.linspace(1, 10, 10, dtype=float)

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [10]:
np.linspace(1, 10, 4, dtype=float)

array([ 1.,  4.,  7., 10.])
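The spacing follows from the arguments: with 4 points between 1 and 10 the step is $(10 - 1) / (4 - 1) = 3$. The sketch below uses the `retstep` and `endpoint` parameters of `np.linspace` to make this visible; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between points
points, step = np.linspace(1, 10, 4, retstep=True)
print(points)  # [ 1.  4.  7. 10.]
print(step)    # 3.0

# endpoint=False leaves out the stop value
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)  # [0.   0.25 0.5  0.75]
```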

**np.ones_like** and **np.zeros_like**

Creates an array of 1s (`np.ones_like`) or 0s (`np.zeros_like`) with the same shape and data type as the input array/matrix.

In [11]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

In [12]:
np.ones_like(arr)

array([[1, 1, 1],
       [1, 1, 1]])

In [13]:
np.zeros_like(arr)

array([[0, 0, 0],
       [0, 0, 0]])

**np.diag**

Creates a square matrix with the given values on the main diagonal and 0s everywhere else.

In [14]:
arr = [1, 5, 4]
np.diag(arr)

array([[1, 0, 0],
       [0, 5, 0],
       [0, 0, 4]])
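`np.diag` also works in the other direction: given a 2-d array, it returns the values on the main diagonal. A minimal sketch, with made-up input values:

```python
import numpy as np

mat = np.arange(9).reshape(3, 3)
diagonal = np.diag(mat)  # extract the main diagonal of a 2-d array
print(diagonal)          # [0 4 8]

# Building a diagonal matrix and extracting its diagonal round-trips
values = np.array([1, 5, 4])
round_trip = np.diag(np.diag(values))
print(round_trip)        # [1 5 4]
```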

**np.random**

Creates an array/matrix of random values.

In [15]:
np.random.randint(0, 10, size=(2, 3))  # matrix with dimension 2x3 containing integer values ranging from 0-9

array([[6, 0, 0],
       [0, 8, 0]])

In [16]:
np.random.random(size=(2, 3))  # matrix with dimension 2x3 containing float values in the half-open interval [0, 1)

array([[0.02271835, 0.37488761, 0.08307888],
       [0.90593719, 0.20331943, 0.32805565]])
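Because the values are random, the outputs above will differ on every run. When reproducible results are needed, one option is a seeded generator created with `np.random.default_rng` (available since Numpy 1.17). The sketch below is only an illustration; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

ints_a = rng_a.integers(0, 10, size=(2, 3))  # integers from 0 to 9
ints_b = rng_b.integers(0, 10, size=(2, 3))
print(np.array_equal(ints_a, ints_b))  # True

floats = rng_a.random(size=(2, 3))  # floats in [0, 1)
print(floats.shape)  # (2, 3)
```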

## 3. Accessing and Manipulating Arrays

Numpy allows us to do manipulations on an array/matrix.

**Indexing** and **Slicing**

This is similar to how you index/slice a list.

In [17]:
arr = np.arange(3, 10)
arr[6]

9

In [18]:
arr[:4]

array([3, 4, 5, 6])

We can also specify a step size by adding another colon `:` and an integer after the slice syntax.

In [19]:
arr[:4:2]

array([3, 5])
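For a 2-d array, an index or slice can be given for each dimension, separated by a comma. The sketch below also shows one difference from lists: a basic Numpy slice is a view, so modifying it modifies the original array. The array `mat` is made up for illustration.

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(mat[1, 2])     # 6  (row 1, column 2)
print(mat[:, 0])     # [0 4 8]  (first column)
print(mat[:2, 1:3])  # rows 0-1, columns 1-2
# [[1 2]
#  [5 6]]

# A slice is a view on the same data, not a copy
row = mat[0]
row[0] = 99
print(mat[0, 0])     # 99
```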

**Reshaping**

To reshape a matrix in Numpy, we use the `reshape()` method. It accepts the new dimensions of the matrix, either as a tuple or as separate integers. The new shape must hold the same number of elements as the original.

In [20]:
arr = np.arange(10)
arr.reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [21]:
arr.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
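Two details of `reshape()` are worth a quick sketch: passing `-1` lets Numpy infer one dimension from the others, and a shape with the wrong number of elements raises an error. The flag `reshape_failed` is only there to make the outcome explicit.

```python
import numpy as np

arr = np.arange(10)

# -1 asks Numpy to work out this dimension: 10 elements / 2 rows = 5 columns
auto = arr.reshape(2, -1)
print(auto.shape)  # (2, 5)

try:
    arr.reshape(3, 4)  # 3 * 4 = 12 does not match the 10 elements
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```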

**Concatenating**

We use the following functions in Numpy to do concatenation:
 - `np.concatenate()` - joins arrays along an existing axis (the first axis by default); for 1-dimensional arrays this simply joins them end to end
 - `np.hstack()` - joins arrays horizontally (column-wise)
 - `np.vstack()` - joins arrays vertically (row-wise)

In [22]:
arr1 = np.arange(5)
arr2 = np.arange(5, 10)
arr3 = np.arange(10, 15)
np.concatenate([arr1, arr2])

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [23]:
np.concatenate([arr1, arr2, arr3])

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [24]:
arr1 = np.random.random((2,1))
arr2 = np.random.random((2,3))

print('Array 1:\n', arr1)
print('Array 2:\n', arr2)

np.hstack([arr1, arr2])

Array 1:
 [[0.78811603]
 [0.76780946]]
Array 2:
 [[0.51714119 0.45220059 0.42040362]
 [0.696543   0.55529237 0.29080178]]


array([[0.78811603, 0.51714119, 0.45220059, 0.42040362],
       [0.76780946, 0.696543  , 0.55529237, 0.29080178]])

In [25]:
arr1 = np.random.random((1,2))
arr2 = np.random.random((4,2))

print('Array 1:\n', arr1)
print('Array 2:\n', arr2)

np.vstack([arr1, arr2])

Array 1:
 [[0.4971306  0.17554895]]
Array 2:
 [[0.68257367 0.68154157]
 [0.14494759 0.05046671]
 [0.84548115 0.47933083]
 [0.75885983 0.09451713]]


array([[0.4971306 , 0.17554895],
       [0.68257367, 0.68154157],
       [0.14494759, 0.05046671],
       [0.84548115, 0.47933083],
       [0.75885983, 0.09451713]])
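For 2-d arrays, `np.vstack()` and `np.hstack()` give the same result as `np.concatenate()` with `axis=0` and `axis=1` respectively. A minimal sketch, using arrays of zeros and ones so the result is easy to check:

```python
import numpy as np

top = np.zeros((1, 2))
bottom = np.ones((4, 2))
stacked_rows = np.concatenate([top, bottom], axis=0)  # join along rows
print(stacked_rows.shape)  # (5, 2)
print(np.array_equal(stacked_rows, np.vstack([top, bottom])))  # True

left = np.zeros((2, 1))
right = np.ones((2, 3))
stacked_cols = np.concatenate([left, right], axis=1)  # join along columns
print(stacked_cols.shape)  # (2, 4)
print(np.array_equal(stacked_cols, np.hstack([left, right])))  # True
```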

**Splitting**

Splitting is the opposite of the concatenation functions we've seen earlier. We use the following functions:
 - `np.split()` - splits an array along an axis (the first axis by default). The first argument is the array we want to split. The second argument is a sequence of indices where we want the array to be split.
 - `np.hsplit()` - splits an array horizontally (column-wise)
 - `np.vsplit()` - splits an array vertically (row-wise)

In [26]:
arr = np.arange(10)
np.split(arr, (1, 3, 6))

[array([0]), array([1, 2]), array([3, 4, 5]), array([6, 7, 8, 9])]

In [27]:
arr = np.arange(20)
np.split(arr, (1, 10))

[array([0]),
 array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])]

In [28]:
arr = np.random.random((6,5))
arr

array([[0.65433343, 0.77543641, 0.80061794, 0.22271445, 0.40918432],
       [0.83196024, 0.41290561, 0.50827685, 0.26174452, 0.26500302],
       [0.13453823, 0.24721646, 0.18870987, 0.21859293, 0.5548548 ],
       [0.1657117 , 0.25703694, 0.69648366, 0.76032382, 0.75141316],
       [0.5327367 , 0.72691193, 0.54856732, 0.76685818, 0.31767612],
       [0.70828076, 0.21462249, 0.43582992, 0.30508301, 0.48342847]])

In [29]:
arr1, arr2 = np.hsplit(arr, [2])
print('Split 1:\n', arr1)
print('Split 2:\n', arr2)

Split 1:
 [[0.65433343 0.77543641]
 [0.83196024 0.41290561]
 [0.13453823 0.24721646]
 [0.1657117  0.25703694]
 [0.5327367  0.72691193]
 [0.70828076 0.21462249]]
Split 2:
 [[0.80061794 0.22271445 0.40918432]
 [0.50827685 0.26174452 0.26500302]
 [0.18870987 0.21859293 0.5548548 ]
 [0.69648366 0.76032382 0.75141316]
 [0.54856732 0.76685818 0.31767612]
 [0.43582992 0.30508301 0.48342847]]


In [30]:
arr1, arr2, arr3 = np.vsplit(arr, [1,3])
print('Split 1:\n', arr1)
print('Split 2:\n', arr2)
print('Split 3:\n', arr3)

Split 1:
 [[0.65433343 0.77543641 0.80061794 0.22271445 0.40918432]]
Split 2:
 [[0.83196024 0.41290561 0.50827685 0.26174452 0.26500302]
 [0.13453823 0.24721646 0.18870987 0.21859293 0.5548548 ]]
Split 3:
 [[0.1657117  0.25703694 0.69648366 0.76032382 0.75141316]
 [0.5327367  0.72691193 0.54856732 0.76685818 0.31767612]
 [0.70828076 0.21462249 0.43582992 0.30508301 0.48342847]]
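The second argument of `np.split()` can also be a single integer, in which case the array is divided into that many equal parts (an error is raised if an equal division is not possible). The sketch below also checks that stacking the parts back together restores the original array.

```python
import numpy as np

arr = np.arange(12).reshape(6, 2)

# An integer asks for that many equal parts along the first axis
thirds = np.split(arr, 3)
print([part.shape for part in thirds])  # [(2, 2), (2, 2), (2, 2)]

# Stacking the pieces vertically undoes the split
restored = np.vstack(thirds)
print(np.array_equal(restored, arr))  # True
```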


## 4. Matrix Operations


### 4.1 Arithmetic Operations

We can perform arithmetic operations on Numpy matrices as in linear algebra. Be careful of the dimensions! Make sure the shapes are compatible for the particular operation you are using.

In [31]:
arr1 = np.arange(9).reshape((3,3))
arr2 = np.ones(9).reshape((3,3))

arr1 + arr2

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [32]:
arr1 - arr2

array([[-1.,  0.,  1.],
       [ 2.,  3.,  4.],
       [ 5.,  6.,  7.]])

In [33]:
arr1 * arr2  # note that this is an element-wise multiplication

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

In [34]:
arr1 / arr2  # note that this is an element-wise division

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

To do a proper matrix multiplication, we use the `np.dot` function.

In [35]:
np.dot(arr1, arr2)

array([[ 3.,  3.,  3.],
       [12., 12., 12.],
       [21., 21., 21.]])
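For 2-d arrays, the `@` operator gives the same result as `np.dot`. The sketch below uses non-square matrices to show the shape rule: an (m, n) matrix times an (n, p) matrix gives an (m, p) matrix, and mismatched inner dimensions raise an error. The names `a`, `b`, and `matmul_failed` are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

product = a @ b  # (2, 3) @ (3, 4) -> (2, 4)
print(product.shape)  # (2, 4)
print(np.array_equal(product, np.dot(a, b)))  # True

try:
    b @ a  # (3, 4) @ (2, 3): inner dimensions 4 and 2 differ
    matmul_failed = False
except ValueError:
    matmul_failed = True
print(matmul_failed)  # True
```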

### 4.2 Broadcasting

Broadcasting allows us to apply an arithmetic operation between arrays of different shapes. The simplest case is combining a whole matrix with a scalar value. For example:

In [36]:
arr = np.arange(9).reshape((3,3))
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [37]:
arr + 5

array([[ 5,  6,  7],
       [ 8,  9, 10],
       [11, 12, 13]])

Notice that all the elements in the array have increased by 5. We can also do this for other arithmetic operations.

In [38]:
arr - 5

array([[-5, -4, -3],
       [-2, -1,  0],
       [ 1,  2,  3]])

In [39]:
arr * 5

array([[ 0,  5, 10],
       [15, 20, 25],
       [30, 35, 40]])

In [40]:
arr / 5

array([[0. , 0.2, 0.4],
       [0.6, 0.8, 1. ],
       [1.2, 1.4, 1.6]])

We can also broadcast using a 1-d array, as long as its length matches the last dimension of the matrix.

In [41]:
arr1 = np.arange(12).reshape((4,3))
arr2 = np.ones(3)/2  # dividing by the scalar 2 is itself broadcast, giving [0.5, 0.5, 0.5]
arr1

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [42]:
arr2

array([0.5, 0.5, 0.5])

In [43]:
arr1 - arr2

array([[-0.5,  0.5,  1.5],
       [ 2.5,  3.5,  4.5],
       [ 5.5,  6.5,  7.5],
       [ 8.5,  9.5, 10.5]])

In [44]:
arr1 * arr2

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ],
       [4.5, 5. , 5.5]])
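A 1-d array of length 3 is applied to every row of a 4x3 matrix. To apply a different value to each row instead, one option is a column of shape (4, 1). The sketch below shows both cases, plus a shape that cannot be broadcast; all names and values are made up for illustration.

```python
import numpy as np

mat = np.arange(12).reshape(4, 3)

row_offsets = np.array([10, 20, 30])                  # shape (3,)
col_offsets = np.array([[100], [200], [300], [400]])  # shape (4, 1)

print((mat + row_offsets)[0])     # [10 21 32]  (one offset per column)
print((mat + col_offsets)[:, 0])  # [100 203 306 409]  (one offset per row)

try:
    mat + np.ones(4)  # length 4 does not match the last dimension (3)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```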

### 4.3 Other functions

Here are other useful methods that we typically use:

**Transpose**

This swaps the rows and columns of the original matrix.

In [45]:
arr = np.arange(12).reshape((4,3))
arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [46]:
arr.T

array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])
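In terms of indices, the element at row `i`, column `j` of the original ends up at row `j`, column `i` of the transpose, so the shape is reversed. A short check:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

print(arr.shape)    # (4, 3)
print(arr.T.shape)  # (3, 4)

# Element (1, 2) of the original is element (2, 1) of the transpose
print(arr.T[2, 1] == arr[1, 2])  # True

# Transposing twice gives back the original matrix
print(np.array_equal(arr.T.T, arr))  # True
```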

**Aggregation methods**

We can use aggregation methods such as `sum`, `max`, `min`, and `std`.

In [47]:
arr = np.arange(12).reshape((4,3))
arr.sum()

66

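We can also specify the axis along which to aggregate: `axis=0` aggregates down the rows (one result per column) and `axis=1` aggregates across the columns (one result per row).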

In [48]:
arr.sum(axis=0)

array([18, 22, 26])

In [49]:
arr.sum(axis=1)

array([ 3, 12, 21, 30])

In [50]:
arr.max()

11

In [51]:
arr.max(axis=0)

array([ 9, 10, 11])

In [52]:
arr.min()

0

In [53]:
arr.std()

3.452052529534663

In [54]:
arr.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658, 0.81649658])
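A few related options, sketched on the same 4x3 matrix: `mean` works like the other aggregations, `keepdims=True` keeps the aggregated axis with length 1, and `ddof=1` makes `std` divide by N - 1 (the sample standard deviation) instead of the default N.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

col_means = arr.mean(axis=0)  # one mean per column
print(col_means)  # [4.5 5.5 6.5]

# keepdims=True keeps the result 2-d, which is convenient for broadcasting
row_sums = arr.sum(axis=1, keepdims=True)
print(row_sums.shape)  # (4, 1)

# Default std divides by N; ddof=1 divides by N - 1
sample_std = arr.std(axis=1, ddof=1)
print(sample_std)  # [1. 1. 1. 1.]
```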