# NumPy and Pandas 


### NumPy

NumPy is a fundamental package for scientific computing in Python. It gives Python an extensive math library for performing numerical computations efficiently.

One great feature of NumPy is its multidimensional array data structure, which can represent vectors and matrices. Many machine learning algorithms rely on matrix operations. For example, training a neural network often involves many matrix multiplications. NumPy is optimized for matrix operations and lets us carry out linear algebra efficiently, which makes it well suited to machine learning problems.

At the core of NumPy is the ndarray, where nd stands for n-dimensional. An ndarray is a multidimensional array of elements that all have the same type. In other words, an ndarray is a grid that can take on many shapes and can hold either numbers or strings. Machine learning problems use ndarrays in many ways. For instance, an ndarray can hold the pixel values of an image that will be fed into a neural network for image classification.

We can create ndarrays from array-like objects such as Python lists (using the np.array function) or by using NumPy's built-in array-creation functions.

#### Creating ndarrays from np.array

In [11]:
import numpy as np

x = np.array([1, 2, 3, 4, 5])
x

array([1, 2, 3, 4, 5])

In [12]:
type(x)

numpy.ndarray

NumPy arrays have a number of attributes that give us further info about them.

In [15]:
x.dtype

# dtype gives the data type of the elements in the array, not the type of the array object itself. Here the elements are stored in memory as 32-bit integers (the default integer size depends on the platform)

dtype('int32')

In [16]:
x.shape

(5,)

In [17]:
x.size

5

In [19]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

In [20]:
a

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [21]:
a.shape

(4, 3)

The number of dimensions of an array is also called its rank.

In [22]:
a.ndim

# rank also equals 2

2

In [25]:
# specifying dtype
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype= np.float64)

In [26]:
a

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])

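To save an array to disk, use np.save(). It writes a binary .npy file and appends the .npy extension if the file name does not already have it.
To load the array back, use np.load() with the full file name.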

In [27]:
np.save('my_array', a)

In [29]:
d = np.load('my_array.npy')

In [30]:
d

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])
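The round trip above can be checked in a self-contained way. The sketch below is only an illustration: it assumes a temporary directory (not part of the original notebook) so that no file is left behind, and it confirms that the values and dtype survive saving and loading.

```python
import os
import tempfile

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

# Work in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_array")  # no extension given
    np.save(path, a)                          # NumPy writes my_array.npy
    saved_files = os.listdir(tmp_dir)
    d = np.load(path + ".npy")

print(saved_files)            # ['my_array.npy']
print(np.array_equal(a, d))   # True
print(d.dtype)                # float64
```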

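N/B: If an array is created from mixed data types, NumPy upcasts every element to a single common type. For example, mixing integers and floats gives an array of floats, and mixing integers and strings gives an array of strings.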
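A minimal sketch of upcasting, using two made-up input lists. It checks the dtype kind rather than the exact string dtype, because the exact string width can differ between platforms.

```python
import numpy as np

ints_and_floats = np.array([1, 2.5, 3])
ints_and_strings = np.array([1, 2, "three"])

print(ints_and_floats.dtype)         # float64
print(ints_and_strings.dtype.kind)   # U (a Unicode string type)

# The integers became strings, so + concatenates instead of adding.
print(ints_and_strings[0] + ints_and_strings[1])   # 12
```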

#### Using built-in functions

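N/B: Most of these functions accept a dtype argument to set the element type, e.g. dtype=int. np.zeros, np.ones and np.eye return floats by default.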

In [31]:
# Generate an array of zeros. Takes shape as an argument
np.zeros((4,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [33]:
# Generate an array of ones. Takes shape as an argument
np.ones((3,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [34]:
np.ones((3,4), dtype = int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [35]:
# Generate an array full of a constant
np.full((3,4), 7)

array([[7, 7, 7, 7],
       [7, 7, 7, 7],
       [7, 7, 7, 7]])

In [37]:
# Generate an identity matrix (a square matrix with ones along its main diagonal and zeros everywhere else)
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [38]:
# Generate a diagonal matrix with particular numbers along its diagonal
np.diag([23, 32, 48, 64, 65])

array([[23,  0,  0,  0,  0],
       [ 0, 32,  0,  0,  0],
       [ 0,  0, 48,  0,  0],
       [ 0,  0,  0, 64,  0],
       [ 0,  0,  0,  0, 65]])

In [39]:
# Generate an array of evenly spaced values. It takes three arguments: start, stop (exclusive) and step
np.arange(2, 20, 2)

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [43]:
# When we want non-integer steps, it's preferable to use linspace. It returns n evenly spaced numbers from start to stop.
# It takes start, stop, and n
# If n is not specified, it defaults to 50
# The stop value is included by default. To exclude it, set endpoint=False
np.linspace(1, 10, 5)

array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])
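To make the difference between the two functions concrete, here is a small sketch with illustrative values (0 to 1 in quarters). arange is driven by a step size, while linspace is driven by a count, which makes the length of the result predictable when the step is fractional.

```python
import numpy as np

# arange with a fractional step: stop is always excluded.
steps = np.arange(0, 1, 0.25)

# linspace with endpoint=False gives the same values from a count instead of a step.
points = np.linspace(0, 1, 4, endpoint=False)

# With the default endpoint=True the stop value is included.
closed = np.linspace(0, 1, 5)

print(steps.tolist())               # [0.0, 0.25, 0.5, 0.75]
print(np.allclose(steps, points))   # True
print(closed[-1])                   # 1.0
```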

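The reshape function changes the shape of an array without changing its data. The new shape must hold the same number of elements. It can also be used as a method.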

In [49]:
np.reshape(np.linspace(2, 40, 35, endpoint=False), (7,5))

array([[ 2.        ,  3.08571429,  4.17142857,  5.25714286,  6.34285714],
       [ 7.42857143,  8.51428571,  9.6       , 10.68571429, 11.77142857],
       [12.85714286, 13.94285714, 15.02857143, 16.11428571, 17.2       ],
       [18.28571429, 19.37142857, 20.45714286, 21.54285714, 22.62857143],
       [23.71428571, 24.8       , 25.88571429, 26.97142857, 28.05714286],
       [29.14285714, 30.22857143, 31.31428571, 32.4       , 33.48571429],
       [34.57142857, 35.65714286, 36.74285714, 37.82857143, 38.91428571]])

In [50]:
np.linspace(2, 40, 35, endpoint=False).reshape(7,5)

array([[ 2.        ,  3.08571429,  4.17142857,  5.25714286,  6.34285714],
       [ 7.42857143,  8.51428571,  9.6       , 10.68571429, 11.77142857],
       [12.85714286, 13.94285714, 15.02857143, 16.11428571, 17.2       ],
       [18.28571429, 19.37142857, 20.45714286, 21.54285714, 22.62857143],
       [23.71428571, 24.8       , 25.88571429, 26.97142857, 28.05714286],
       [29.14285714, 30.22857143, 31.31428571, 32.4       , 33.48571429],
       [34.57142857, 35.65714286, 36.74285714, 37.82857143, 38.91428571]])

To generate random numbers:

In [53]:
# Between 0 and 1
np.random.random((3,4))

array([[0.06736711, 0.16084858, 0.39170001, 0.07898757],
       [0.56180501, 0.92584618, 0.55386197, 0.03261331],
       [0.76832232, 0.65041827, 0.70347703, 0.85431548]])

In [54]:
# Random integers within a particular interval. Upper bound is exclusive
np.random.randint(2, 10, (5,5))

array([[9, 7, 8, 5, 9],
       [3, 5, 3, 2, 9],
       [9, 4, 4, 8, 3],
       [2, 7, 3, 5, 3],
       [3, 6, 4, 4, 2]])

In [56]:
# You may want NumPy arrays with random numbers that satisfy certain statistical properties, e.g. mean=0
# You may also want a random array with numbers drawn from a particular statistical distribution
# Here: a normal distribution with mean 0 and standard deviation 0.1
np.random.normal(0,0.1, size=(5,5))

array([[-0.04249837,  0.13793983, -0.06123047, -0.03379478, -0.11895154],
       [-0.07039167, -0.0374524 , -0.19799631,  0.03460748,  0.02979442],
       [-0.05763738, -0.05489279, -0.07056053, -0.04360023,  0.11102362],
       [ 0.00728827, -0.12557582, -0.0390334 ,  0.147382  , -0.19473876],
       [ 0.07031462,  0.06034209, -0.05135857,  0.07420587, -0.02627166]])
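Random outputs like the ones above change on every run. If reproducible numbers are needed, one option is a seeded generator. The sketch below uses np.random.default_rng, which is NumPy's newer random interface and is not what the cells above use. The seed 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.normal(0, 0.1, size=(5, 5))
sample_b = rng_b.normal(0, 0.1, size=(5, 5))

print(np.array_equal(sample_a, sample_b))   # True: same seed, same numbers

# With many samples the empirical mean and standard deviation
# should land close to the requested 0 and 0.1.
big = rng_a.normal(0, 0.1, size=100_000)
print(big.mean(), big.std())
```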

#### Accessing, Deleting and Inserting Elements in ndarrays
NumPy arrays are mutable and ordered, so elements can be accessed by index and changed after the array is created.

In [58]:
x = np.array([1, 2, 3, 4, 5])
x

array([1, 2, 3, 4, 5])

In [59]:
x[0]

1

In [60]:
x[-1]

5

In [61]:
x[4]

5

Modifying elements in arrays

In [62]:
x[3] = 20

In [63]:
x

array([ 1,  2,  3, 20,  5])

In [73]:
y = np.reshape(np.arange(1, 10),(3,3))
y

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [72]:
y[0,0]

1

In [74]:
y[1,2] = 28

In [75]:
y

array([[ 1,  2,  3],
       [ 4,  5, 28],
       [ 7,  8,  9]])

Deleting elements in ndarrays

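To delete elements we use the np.delete function. Its arguments are the array, the index (or list of indices) to remove, and, for rank 2 arrays, the axis along which to delete. It returns a new array and leaves the original unchanged.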

In [77]:
x

array([ 1,  2,  3, 20,  5])

In [78]:
np.delete(x, 2)

array([ 1,  2, 20,  5])

In [79]:
x

array([ 1,  2,  3, 20,  5])

In [80]:
np.delete(x, [1,-1])

array([ 1,  3, 20])

In [81]:
y

array([[ 1,  2,  3],
       [ 4,  5, 28],
       [ 7,  8,  9]])

In [83]:
np.delete(y, 2, axis=1)

array([[1, 2],
       [4, 5],
       [7, 8]])

In [84]:
np.delete(y, [2,1], axis=1)

array([[1],
       [4],
       [7]])
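The cells above show x and y unchanged after each call. A short sketch of that behavior, and of the usual way to keep the result by rebinding the name:

```python
import numpy as np

x = np.array([1, 2, 3, 20, 5])

# np.delete builds and returns a new array; x keeps all five elements.
shorter = np.delete(x, 2)
print(shorter.tolist())   # [1, 2, 20, 5]
print(x.tolist())         # [1, 2, 3, 20, 5]

# To make the deletion stick, assign the returned array back to the name.
x = np.delete(x, 2)
print(x.size)             # 4
```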

In [85]:
y[1,1] = 39

In [86]:
y

array([[ 1,  2,  3],
       [ 4, 39, 28],
       [ 7,  8,  9]])

#### Add elements to a NumPy array
Here, we use the np.append function.
Its arguments are the array, the values to add, and the axis. Like np.delete, it returns a new array.

In [95]:
np.append(x, 10)

array([ 1,  2,  3, 20,  5, 10])

In [96]:
np.append(x, [10, 335])

array([  1,   2,   3,  20,   5,  10, 335])

In [92]:
np.append(y, [[10, 20, 30]], axis = 0)

array([[ 1,  2,  3],
       [ 4, 39, 28],
       [ 7,  8,  9],
       [10, 20, 30]])

In [94]:
np.append(y, [[10], [20], [30]], axis = 1)

array([[ 1,  2,  3, 10],
       [ 4, 39, 28, 20],
       [ 7,  8,  9, 30]])
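The shape of the appended values matters once an axis is given. The sketch below reuses the y array from above and shows three cases: no axis, a matching rank 2 row, and a rank 1 list that NumPy rejects.

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 39, 28], [7, 8, 9]])

# Without axis, both inputs are flattened before joining.
flat = np.append(y, [10, 20, 30])
print(flat.shape)        # (12,)

# With axis=0 the values must be rank 2 with the same number of columns as y.
rows = np.append(y, [[10, 20, 30]], axis=0)
print(rows.shape)        # (4, 3)

# A rank 1 list with axis=0 is rejected because the ranks differ.
try:
    np.append(y, [10, 20, 30], axis=0)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```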

#### Insert elements at a position into NumPy arrays
To insert elements at a given position we use the np.insert function. Its arguments are the array, the index, the elements, and the axis.

In [97]:
x

array([ 1,  2,  3, 20,  5])

In [98]:
np.insert(x, 2, 8)

array([ 1,  2,  8,  3, 20,  5])

In [99]:
y

array([[ 1,  2,  3],
       [ 4, 39, 28],
       [ 7,  8,  9]])

In [100]:
np.insert(y, 1, [[23], [54], [76]], axis=1)

array([[ 1, 23, 54, 76,  2,  3],
       [ 4, 23, 54, 76, 39, 28],
       [ 7, 23, 54, 76,  8,  9]])

In [101]:
np.insert(y, 1, [23, 54, 76], axis=1)

array([[ 1, 23,  2,  3],
       [ 4, 54, 39, 28],
       [ 7, 76,  8,  9]])

In [102]:
np.insert(y, 1, [23, 54, 76], axis=0)

array([[ 1,  2,  3],
       [23, 54, 76],
       [ 4, 39, 28],
       [ 7,  8,  9]])

In [103]:
np.insert(y, 1, 5, axis=0)

array([[ 1,  2,  3],
       [ 5,  5,  5],
       [ 4, 39, 28],
       [ 7,  8,  9]])

In [104]:
np.insert(y, 1, 5, axis=1)

array([[ 1,  5,  2,  3],
       [ 4,  5, 39, 28],
       [ 7,  5,  8,  9]])

#### Stacking arrays
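Arrays can be stacked vertically with the vstack function or horizontally with the hstack function. For rank 2 arrays, vertical stacking requires the same number of columns and horizontal stacking requires the same number of rows.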

In [106]:
x

array([ 1,  2,  3, 20,  5])

In [107]:
y

array([[ 1,  2,  3],
       [ 4, 39, 28],
       [ 7,  8,  9]])

In [108]:
b = np.delete(x, [-1, -2])

In [109]:
b

array([1, 2, 3])

In [112]:
np.vstack((y, b))

array([[ 1,  2,  3],
       [ 4, 39, 28],
       [ 7,  8,  9],
       [ 1,  2,  3]])

In [114]:
np.hstack((y, b.reshape(3,1)))

array([[ 1,  2,  3,  1],
       [ 4, 39, 28,  2],
       [ 7,  8,  9,  3]])
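Both stacking functions can be expressed with np.concatenate along a chosen axis. The sketch below reuses y and b from above to show the equivalence and why b has to be reshaped for the horizontal case.

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 39, 28], [7, 8, 9]])
b = np.array([1, 2, 3])

# vstack treats the rank 1 array b as a single row of shape (1, 3).
v = np.vstack((y, b))
v_concat = np.concatenate((y, b.reshape(1, 3)), axis=0)

# hstack needs b as a column of shape (3, 1) to line up with y's rows.
h = np.hstack((y, b.reshape(3, 1)))
h_concat = np.concatenate((y, b.reshape(3, 1)), axis=1)

print(v.shape, h.shape)              # (4, 3) (3, 4)
print(np.array_equal(v, v_concat))   # True
print(np.array_equal(h, h_concat))   # True
```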

#### Slicing ndarrays
* ndarray[start:end]
* ndarray[start:]
* ndarray[:end]

* For 2D arrays, you have:
* ndarray[row slice, column slice]

In [116]:
x[2:]

array([ 3, 20,  5])

In [121]:
e = np.arange(20).reshape(4, 5)
e

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [125]:
w = e[1:,2:5]
w

array([[ 7,  8,  9],
       [12, 13, 14],
       [17, 18, 19]])

In [126]:
w

array([[ 7,  8,  9],
       [12, 13, 14],
       [17, 18, 19]])

A slice is a view of the original array, not a copy. When we save a slice into a new variable, any change we make through the new variable also changes the parent array. To prevent this, we use the copy function, which can also be used as a method.

In [127]:
w[1,1] = 333

In [128]:
w

array([[  7,   8,   9],
       [ 12, 333,  14],
       [ 17,  18,  19]])

In [129]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12, 333,  14],
       [ 15,  16,  17,  18,  19]])

In [130]:
w = e[1:,2:5].copy()  # call copy() so that w is an independent array
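The difference between a slice and a copy can be checked directly. The sketch below starts from a fresh e and uses np.shares_memory, which is not used elsewhere in this notebook, to show which of the two is tied to the parent array.

```python
import numpy as np

e = np.arange(20).reshape(4, 5)

view = e[1:, 2:5]                  # a slice: shares memory with e
independent = e[1:, 2:5].copy()    # copy(): owns its own data

print(np.shares_memory(e, view))          # True
print(np.shares_memory(e, independent))   # False

independent[1, 1] = 333    # does not touch e
print(e[2, 3])             # 13

view[1, 1] = 333           # writes through to e
print(e[2, 3])             # 333
```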

We can also use an array of indices to select elements from an array.

In [131]:
indices = np.array([1,3])
Y = e[indices,:]

In [132]:
Y

array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19]])

In [133]:
e[[1,3], :]

array([[ 5,  6,  7,  8,  9],
       [15, 16, 17, 18, 19]])

Selecting random rows

In [134]:
X = np.random.randint(1,20, size=(50,5))
row_indices = np.random.randint(0,50, size=10)
X_subset = X[row_indices, :]

In [135]:
X_subset

array([[ 6, 19, 12, 17, 19],
       [19,  1, 16, 16,  4],
       [19,  3,  6, 10, 17],
       [ 8, 10, 10, 15, 10],
       [10, 15,  1, 15, 19],
       [ 3,  4,  6, 13,  5],
       [ 3, 18,  9, 17, 10],
       [ 9,  2, 16, 19, 13],
       [11, 17, 17,  9, 14],
       [19,  3,  6, 10, 17]])
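Because np.random.randint draws indices independently, the same row can be picked more than once (the first and last rows of the subset above are identical). If distinct rows are wanted, one option is np.random.choice with replace=False, sketched here as an alternative rather than as what the notebook itself does.

```python
import numpy as np

X = np.random.randint(1, 20, size=(50, 5))

# replace=False draws 10 distinct row indices from 0..49.
row_indices = np.random.choice(X.shape[0], size=10, replace=False)
X_subset = X[row_indices, :]

print(X_subset.shape)                 # (10, 5)
print(len(np.unique(row_indices)))    # 10, so no index is repeated
```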

Selecting a single column with an integer index returns a rank 1 array, so the result is not shaped like a column. To keep the column shape, use a slice (e.g. 2:3) for the column index instead.

In [137]:
q = X[:,2]
q

array([ 7, 13, 12,  6,  5, 11, 16, 13,  9,  7,  5,  6,  7,  2,  2, 15, 17,
        9,  1, 11,  2,  5, 10,  7, 19,  1,  6, 19, 16, 17, 19, 14, 16, 10,
        5, 12, 16, 11, 16, 10, 15,  4, 11,  5, 19, 17, 19,  6,  3,  3])

In [138]:
R = X[:,2:3]
R

array([[ 7],
       [13],
       [12],
       [ 6],
       [ 5],
       [11],
       [16],
       [13],
       [ 9],
       [ 7],
       [ 5],
       [ 6],
       [ 7],
       [ 2],
       [ 2],
       [15],
       [17],
       [ 9],
       [ 1],
       [11],
       [ 2],
       [ 5],
       [10],
       [ 7],
       [19],
       [ 1],
       [ 6],
       [19],
       [16],
       [17],
       [19],
       [14],
       [16],
       [10],
       [ 5],
       [12],
       [16],
       [11],
       [16],
       [10],
       [15],
       [ 4],
       [11],
       [ 5],
       [19],
       [17],
       [19],
       [ 6],
       [ 3],
       [ 3]])
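A compact way to see the difference is to compare shapes. The sketch below uses a small 4 x 5 array instead of the random X, and also shows two other ways to end up with a column that are not used above: a one-element index list and reshape(-1, 1).

```python
import numpy as np

X = np.arange(20).reshape(4, 5)

as_vector = X[:, 2]                  # integer index drops the column axis
as_column = X[:, 2:3]                # slice keeps it
also_column = X[:, [2]]              # a one-element index list keeps it too
reshaped = X[:, 2].reshape(-1, 1)    # or restore the axis afterwards

print(as_vector.shape)     # (4,)
print(as_column.shape)     # (4, 1)
print(also_column.shape)   # (4, 1)
print(np.array_equal(as_column, reshaped))   # True
```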

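We can extract the elements along the diagonal of a NumPy array by using the diag function. It takes the optional argument k to select a diagonal above (k > 0) or below (k < 0) the main diagonal. When given a 1D sequence instead, diag builds a diagonal matrix from it.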

In [139]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12, 333,  14],
       [ 15,  16,  17,  18,  19]])

In [140]:
np.diag([10, 20, 30, 40])

array([[10,  0,  0,  0],
       [ 0, 20,  0,  0],
       [ 0,  0, 30,  0],
       [ 0,  0,  0, 40]])

In [141]:
np.diag(e)

array([ 0,  6, 12, 18])

In [142]:
np.diag(e, k=1)

array([  1,   7, 333,  19])

In [143]:
np.diag(e, k=-1)

array([ 5, 11, 17])

We can also obtain the unique elements in a NumPy array by using the unique function. The result is a sorted 1D array.

In [145]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12, 333,  14],
       [ 15,  16,  17,  18,  19]])

In [146]:
np.unique(e)

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        14,  15,  16,  17,  18,  19, 333])

#### Boolean Indexing, Set Operations, and Sorting

In [147]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12, 333,  14],
       [ 15,  16,  17,  18,  19]])

In [148]:
e[e > 10]

array([ 11,  12, 333,  14,  15,  16,  17,  18,  19])

In [151]:
e[(e > 10) & (e < 15)]

array([11, 12, 14])

We can also use boolean indexing for assignment


In [153]:
e[(e > 10) & (e < 15)] = 12

In [154]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  12,  12, 333,  12],
       [ 15,  16,  17,  18,  19]])
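Boolean indexing works because a comparison on an array produces an array of True/False values with the same shape. The sketch below starts from a fresh e and makes that intermediate mask explicit.

```python
import numpy as np

e = np.arange(20).reshape(4, 5)

# Each comparison is element-wise; the parentheses are needed because
# & binds more tightly than > and <.
mask = (e > 10) & (e < 15)
print(mask.dtype, mask.shape)   # bool (4, 5)
print(mask.sum())               # 4 (the elements 11, 12, 13, 14)

# Use & for and, | for or, ~ for not; Python's `and`/`or` are not element-wise.
outside = e[~mask]
print(outside.size)             # 16
```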

Set operations using NumPy arrays

In [155]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  12,  12, 333,  12],
       [ 15,  16,  17,  18,  19]])

In [156]:
y

array([[ 1,  2,  3],
       [ 4, 39, 28],
       [ 7,  8,  9]])

In [159]:
print(np.intersect1d(e, y))
print(np.setdiff1d(e, y))
print(np.union1d(e, y))

[1 2 3 4 7 8 9]
[  0   5   6  10  12  15  16  17  18  19 333]
[  0   1   2   3   4   5   6   7   8   9  10  12  15  16  17  18  19  28
  39 333]
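As the outputs above suggest, the set functions flatten their inputs and return sorted, unique, rank 1 arrays. A smaller sketch with made-up values that contain duplicates:

```python
import numpy as np

e = np.array([[3, 1, 2], [3, 2, 1]])
y = np.array([2, 4, 2])

# Inputs of any shape are flattened; duplicates are dropped from the result.
print(np.intersect1d(e, y).tolist())   # [2]
print(np.setdiff1d(e, y).tolist())     # [1, 3]
print(np.union1d(e, y).tolist())       # [1, 2, 3, 4]

# setdiff1d is not symmetric: swapping the arguments changes the result.
print(np.setdiff1d(y, e).tolist())     # [4]
```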


NumPy also allows sorting the elements of an array by using sort either as a function or as a method.
As a function (np.sort), the array is sorted out of place: a sorted copy is returned and the original is unchanged.
As a method (ndarray.sort), the array is sorted in place.

N/B: The powerful scientific computations that arrays make possible are what make storing our elements in arrays so appealing.

In [161]:
np.sort(e)

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  12,  12,  12, 333],
       [ 15,  16,  17,  18,  19]])

In [162]:
e.sort()

In [163]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  12,  12,  12, 333],
       [ 15,  16,  17,  18,  19]])

In [165]:
# for rank 2 arrays, we can also set the axis: axis=0 sorts each column, axis=1 sorts each row
np.sort(y, axis=0)

array([[ 1,  2,  3],
       [ 4,  8,  9],
       [ 7, 39, 28]])

In [166]:
np.sort(y, axis=1)

array([[ 1,  2,  3],
       [ 4, 28, 39],
       [ 7,  8,  9]])
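The in-place versus out-of-place distinction is easy to miss in the cells above because e was already sorted by the time it was displayed. A small sketch with a made-up array that makes the difference visible:

```python
import numpy as np

a = np.array([[3, 1, 2], [9, 7, 8]])

sorted_copy = np.sort(a)       # function: returns a new sorted array
print(a.tolist())              # [[3, 1, 2], [9, 7, 8]] (unchanged)
print(sorted_copy.tolist())    # [[1, 2, 3], [7, 8, 9]]

result = a.sort()              # method: sorts a itself and returns None
print(result)                  # None
print(a.tolist())              # [[1, 2, 3], [7, 8, 9]]

# With axis=None, np.sort flattens the array before sorting.
flat_sorted = np.sort(np.array([[3, 1], [2, 0]]), axis=None)
print(flat_sorted.tolist())    # [0, 1, 2, 3]
```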

#### Arithmetic Operations

In [167]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  12,  12,  12, 333],
       [ 15,  16,  17,  18,  19]])

In [168]:
y

array([[ 1,  2,  3],
       [ 4, 39, 28],
       [ 7,  8,  9]])

In [170]:
x = np.array([1,2,3,4])
y = np.array([5.5,6.5,7.5,8.5])

In [171]:
x - y

array([-4.5, -4.5, -4.5, -4.5])

In [172]:
np.add(x,y)

array([ 6.5,  8.5, 10.5, 12.5])

In [173]:
# Element-wise functions: np.add, np.subtract, np.divide, np.multiply (equivalent to +, -, / and *)

In [174]:
X = np.array([1,2,3,4]).reshape(2,2)
Y = np.array([5.5,6.5,7.5,8.5]).reshape(2,2)

In [176]:
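np.exp(x)       # e raised to each element
np.sqrt(x)      # square root of each element
np.power(x,2)   # each element squared; only this last result is displayed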

array([ 1,  4,  9, 16], dtype=int32)
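The rank 2 arrays X and Y defined above are a good place to contrast element-wise multiplication with the matrix multiplication mentioned in the introduction. A minimal sketch, using the @ operator for the matrix product:

```python
import numpy as np

X = np.array([1, 2, 3, 4]).reshape(2, 2)
Y = np.array([5.5, 6.5, 7.5, 8.5]).reshape(2, 2)

elementwise = X * Y       # same as np.multiply(X, Y)
matrix_product = X @ Y    # same as np.matmul(X, Y)

print(elementwise.tolist())      # [[5.5, 13.0], [22.5, 34.0]]
print(matrix_product.tolist())   # [[20.5, 23.5], [46.5, 53.5]]
```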

NumPy also has functions for computing statistics:
* np.sum()
* np.mean()
* np.std()
* np.min()
* np.max()
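These functions work on the whole array by default and accept an axis argument to work per column or per row. A short sketch on a made-up 2 x 3 array, using the method forms for brevity:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.sum())                   # 21, over all elements
print(a.sum(axis=0).tolist())    # [5, 7, 9], one value per column
print(a.sum(axis=1).tolist())    # [6, 15], one value per row
print(a.mean())                  # 3.5
print(a.min(), a.max())          # 1 6

# Standard deviation of each row.
row_std = np.std(a, axis=1)
print(row_std)
```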

In [179]:
np.add(e, 3)

array([[  3,   4,   5,   6,   7],
       [  8,   9,  10,  11,  12],
       [ 13,  15,  15,  15, 336],
       [ 18,  19,  20,  21,  22]])

In [180]:
e

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  12,  12,  12, 333],
       [ 15,  16,  17,  18,  19]])
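Adding the scalar 3 to every element of e is an instance of broadcasting, where NumPy stretches the smaller operand to match the larger one. The sketch below starts from a fresh e and extends the idea to a row and a column with made-up values. As above, e itself is never modified.

```python
import numpy as np

e = np.arange(20).reshape(4, 5)

plus_scalar = e + 3                            # the scalar is applied to every element
row = np.array([10, 20, 30, 40, 50])           # shape (5,), matches e's columns
plus_row = e + row                             # the row is added to each of the 4 rows
col = np.array([[100], [200], [300], [400]])   # shape (4, 1), matches e's rows
plus_col = e + col                             # the column is added to each of the 5 columns

print(plus_scalar[0].tolist())   # [3, 4, 5, 6, 7]
print(plus_row[1].tolist())      # [15, 26, 37, 48, 59]
print(plus_col[:, 0].tolist())   # [100, 205, 310, 415]
print(e[0].tolist())             # [0, 1, 2, 3, 4] (e is unchanged)
```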