# NumPy

NumPy is the most fundamental, yet powerful, package for working with data in Python. At its core, NumPy provides the ndarray object, short for n-dimensional array. You can skip this notebook at first, since most of its content is explained in the Machine Learning Notebooks.

<img src="./resources/numpy.png"  style="height: 200px"/>

## 1. Creating a NumPy array

The most common way to create an array is to pass a list to the `np.array` function.

In [3]:
# create a 1d array from a list
import numpy as np

list1 = [0, 1, 2, 3, 4]
arr1d = np.array(list1)

# print the array and its type
print(type(arr1d))
arr1d

<class 'numpy.ndarray'>


array([0, 1, 2, 3, 4])

What is the key difference between an array and a list? Arrays are designed to handle vectorized operations, while Python lists are not. This means that when you apply an arithmetic operation or a NumPy function to an array, it is performed __on every item in the array__ rather than on the array object as a whole.

In [4]:
# list1 + 2  # error: raises TypeError (a list can only be concatenated with another list)

# add 2 to each element of arr1d
arr1d = arr1d + 2
arr1d

array([2, 3, 4, 5, 6])
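A minimal sketch contrasting the two behaviours. Apart from `list1` and `arr1d`, the variable names are illustrative; the `*` operator is used because it is valid for both types but means something different for each.

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]
arr1d = np.array(list1)

# with a plain list, element-wise work needs an explicit loop or comprehension
list_plus_two = [x + 2 for x in list1]

# with an array, the same operation is applied to every element at once
arr_plus_two = arr1d + 2

# the * operator shows the difference clearly: lists repeat, arrays multiply
list_times_two = list1 * 2   # the list repeated twice (10 items)
arr_times_two = arr1d * 2    # every element doubled (still 5 items)

# NumPy functions such as np.sqrt are also applied element-wise
arr_sqrt = np.sqrt(arr1d)

print(list_plus_two)
print(arr_plus_two)
print(list_times_two)
print(arr_times_two)
print(arr_sqrt)
```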

Another characteristic is that the size of a NumPy array is fixed once it is created. To add elements, you have to create a new array. Despite this limitation, arrays offer many advantages over lists.
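A short sketch of what "create a new array" looks like in practice, using `np.append` (not mentioned in the original notebook). Unlike `list.append`, it does not modify the array in place but returns a new, larger array.

```python
import numpy as np

arr = np.array([0, 1, 2])

# np.append does not grow `arr`; it allocates and returns a new array
bigger = np.append(arr, [3, 4])

print(arr)                    # [0 1 2]  (unchanged)
print(bigger)                 # [0 1 2 3 4]
print(arr.size, bigger.size)  # 3 5
```

Because every call copies the data, appending inside a loop is slow for large arrays; collecting values in a list first and converting once with `np.array` is usually the better choice.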

In [5]:
# create a 2d array from a list of lists
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(list2)
arr2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

You can also specify the data type by setting the `dtype` argument. Common values include 'float', 'int', 'bool', 'str' and 'object'.

In [6]:
# create a float 2d array
arr2d_f = np.array(list2, dtype='float')
arr2d_f

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
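A sketch of a few other `dtype` values applied to the same `list2`, plus conversion of an existing array with the `astype` method (not shown in the original notebook). The exact integer type chosen for 'int' depends on the platform, so its printed dtype is not given here.

```python
import numpy as np

list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# 0 becomes False, any other number becomes True
arr_bool = np.array(list2, dtype='bool')

# every number is stored as a string
arr_str = np.array(list2, dtype='str')

# astype converts an existing array and returns a new array
arr_f = np.array(list2, dtype='float')
arr_i = arr_f.astype('int')

print(arr_bool[0])   # first row: False, True, True
print(arr_str[0])    # first row: '0', '1', '2'
print(arr_f.dtype, arr_i.dtype)
```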

Every array has attributes that describe it, such as its shape, data type, number of elements and number of dimensions.

In [7]:
# create a 2d array with 3 rows and 4 columns
list2 = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
print(arr2)

# shape
print('Shape: ', arr2.shape)
# dtype
print('Datatype: ', arr2.dtype)
# size
print('Size: ', arr2.size)
# ndim
print('Num Dimensions: ', arr2.ndim)

[[1. 2. 3. 4.]
 [3. 4. 5. 6.]
 [5. 6. 7. 8.]]
Shape:  (3, 4)
Datatype:  float64
Size:  12
Num Dimensions:  2
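These attributes are related to each other. A small sketch, assuming the same `arr2` as above, that checks the relationships and shows two further attributes, `itemsize` and `nbytes`, which the original cell does not print:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# size is the product of the entries in shape; ndim is the length of shape
print(arr2.size == np.prod(arr2.shape))  # True
print(arr2.ndim == len(arr2.shape))      # True

# itemsize: bytes per element (8 for float64); nbytes: size * itemsize
print('Item size: ', arr2.itemsize)  # 8
print('Total bytes: ', arr2.nbytes)  # 96
```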


## 2. Extracting specific items from an array

You can extract specific portions of an array using indexing and slicing, with indices starting at 0. Unlike lists, NumPy arrays accept one index or slice per dimension inside a single pair of square brackets, separated by commas.

In [8]:
print(arr2)

# extract the first 2 rows and columns
arr2[:2, :2]

# list2[:2, :2]  # error: raises TypeError (list indices must be integers or slices, not tuple)

[[1. 2. 3. 4.]
 [3. 4. 5. 6.]
 [5. 6. 7. 8.]]


array([[1., 2.],
       [3., 4.]])
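A few more ways to pick items out of the same `arr2`, sketched below: a single element, a full row or column, negative indices, a step in the slice, and a boolean mask. The variable names are illustrative.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

elem = arr2[1, 2]              # single element: row 1, column 2 -> 5.0
first_row = arr2[0]            # whole first row
second_col = arr2[:, 1]        # whole second column
last = arr2[-1, -1]            # negative indices count from the end -> 8.0
every_other = arr2[::2, ::2]   # rows 0 and 2, columns 0 and 2

# a boolean condition selects the matching elements as a 1d array (row by row)
greater_than_5 = arr2[arr2 > 5]

print(elem, last)
print(first_row)
print(second_col)
print(every_other)
print(greater_than_5)
```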

## 3. Computing mean, min, max

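NumPy arrays have methods to compute the mean, minimum and maximum of the whole array. To compute these values per column or per row instead, pass the `axis` argument: `axis=0` works down the columns and `axis=1` across the rows.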

In [9]:
print(arr2)

# mean, max and min
print("Mean value is: ", arr2.mean())
print("Max value is: ", arr2.max())
print("Min value is: ", arr2.min())

# row wise and column wise min
print("Column wise minimum: ", np.amin(arr2, axis=0))
print("Row wise minimum: ", np.amin(arr2, axis=1))

[[1. 2. 3. 4.]
 [3. 4. 5. 6.]
 [5. 6. 7. 8.]]
Mean value is:  4.5
Max value is:  8.0
Min value is:  1.0
Column wise minimum:  [1. 2. 3. 4.]
Row wise minimum:  [1. 3. 5.]
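The array methods accept the same `axis` argument as `np.amin`. A short sketch with the same `arr2`:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

col_means = arr2.mean(axis=0)   # one value per column
row_max = arr2.max(axis=1)      # one value per row

print(col_means)  # [3. 4. 5. 6.]
print(row_max)    # [4. 6. 8.]
```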


## 4. Creating a new array from an existing array

If you assign a slice of an array to a new variable, the result is not a new array but a view that __refers to the parent array__ in memory. This means that any change you make to the view is also reflected in the parent array.

To avoid modifying the parent array, make an explicit copy with the `copy()` method.

In [10]:
# arr2a is a view of arr2, not a new array
arr2a = arr2[:2, :2]
arr2a[:1, :1] = 100  # 100 will reflect in arr2
print(arr2)

# copy portion of arr2 to arr2b
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101  # 101 will not reflect in arr2
print(arr2)

[[100.   2.   3.   4.]
 [  3.   4.   5.   6.]
 [  5.   6.   7.   8.]]
[[100.   2.   3.   4.]
 [  3.   4.   5.   6.]
 [  5.   6.   7.   8.]]
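If you are unsure whether you are holding a view or a copy, you can check. A sketch, assuming a freshly created `arr2`: a view keeps a reference to its parent in the `.base` attribute, and `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

view = arr2[:2, :2]
arr_copy = arr2[:2, :2].copy()

# a view points back to its parent; a copy owns its data
print(view.base is arr2)                 # True
print(arr_copy.base is None)             # True
print(np.shares_memory(view, arr2))      # True
print(np.shares_memory(arr_copy, arr2))  # False
```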


## 5. Reshaping and flattening multidimensional arrays

In [11]:
# reshape a 3x4 array to 4x3 array
print(arr2.reshape(4, 3))

print(arr2)

[[100.   2.   3.]
 [  4.   3.   4.]
 [  5.   6.   5.]
 [  6.   7.   8.]]
[[100.   2.   3.   4.]
 [  3.   4.   5.   6.]
 [  5.   6.   7.   8.]]
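As the output shows, `reshape` returns a new array with the requested shape and leaves `arr2` itself unchanged. A sketch of two further details: passing `-1` for one dimension lets NumPy infer it, and the new shape must contain the same number of elements as the original.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# -1 lets NumPy infer that dimension from the array's size (12 elements)
two_rows = arr2.reshape(2, -1)
flat = arr2.reshape(-1)
print(two_rows.shape)  # (2, 6)
print(flat.shape)      # (12,)

# 5 x 3 = 15 elements does not match 12, so NumPy raises a ValueError
error_message = None
try:
    arr2.reshape(5, 3)
except ValueError as err:
    error_message = str(err)
print(error_message)
```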


There are two common ways to flatten an array: the `flatten()` method and the `ravel()` method. The difference is that `flatten()` always returns a copy, while `ravel()` returns a view of the parent array whenever possible, so changing the raveled array can change the parent.

In [12]:
list2 = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')

# changing the flattened array does not change parent
b1 = arr2.flatten()  
b1[0] = 100  # changing b1 does not affect arr2
print(b1)
print(arr2)

# changing the raveled array changes the parent also.
b2 = arr2.ravel()  
b2[0] = 101  # changing b2 changes arr2 also
print(b2)
print(arr2)

[100.   2.   3.   4.   3.   4.   5.   6.   5.   6.   7.   8.]
[[1. 2. 3. 4.]
 [3. 4. 5. 6.]
 [5. 6. 7. 8.]]
[101.   2.   3.   4.   3.   4.   5.   6.   5.   6.   7.   8.]
[[101.   2.   3.   4.]
 [  3.   4.   5.   6.]
 [  5.   6.   7.   8.]]
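"Whenever possible" matters: `ravel()` can only return a view when the elements are already laid out contiguously in the requested order. A sketch using a transposed array (an example not in the original notebook) where `ravel()` has to fall back to a copy:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# arr2 is stored row by row, so ravel() can return a view
ravel_shares = np.shares_memory(arr2.ravel(), arr2)

# the transpose is not stored row by row, so ravel() must copy
transposed_ravel_shares = np.shares_memory(arr2.T.ravel(), arr2)

# flatten() always copies
flatten_shares = np.shares_memory(arr2.flatten(), arr2)

print(ravel_shares)             # True
print(transposed_ravel_shares)  # False
print(flatten_shares)           # False
```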


## 6. Creating sequences

The `np.arange` function comes in handy for creating customised number sequences as ndarrays. As with Python's `range`, the upper limit is excluded.

In [13]:
# lower limit is 0 by default
print(np.arange(5))  

# 0 to 9
print(np.arange(0, 10))  

# 0 to 9 with step of 2
print(np.arange(0, 10, 2))  

# 10 to 1, decreasing order
print(np.arange(10, 0, -1))

[0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]
[10  9  8  7  6  5  4  3  2  1]
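`np.arange` also accepts a float step. When you care about the number of points rather than the step size, the related function `np.linspace` (not covered in the original notebook) is often more convenient, and it includes the stop value by default. A small sketch:

```python
import numpy as np

# float step: the stop value 1 is still excluded
stepped = np.arange(0, 1, 0.25)

# 5 evenly spaced points from 0 to 1, stop value included
lin = np.linspace(0, 1, 5)

print(stepped)  # values 0.0, 0.25, 0.5, 0.75
print(lin)      # values 0.0, 0.25, 0.5, 0.75, 1.0
```

With float steps, rounding can make the number of elements returned by `np.arange` hard to predict, which is one reason `np.linspace` is usually preferred for non-integer sequences.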


## 7. Visual Intro to NumPy

Since NumPy is such an important library for data science, there are many tutorials out there.
One of the best we have seen so far is the one by Jay Alammar, who explains NumPy in a visually stunning way.
Link: https://jalammar.github.io/visual-numpy/

<img src="./resources/Visual_Repr_numpy.png"  style="height: 400px"/>