What is NumPy?

NumPy stands for ‘Numerical Python’ or ‘Numeric Python’.
It is an open-source Python library that provides fast mathematical computation on arrays and matrices.
Since arrays and matrices are an essential part of the Machine Learning ecosystem, NumPy and libraries
such as Scikit-learn, Pandas, Matplotlib and TensorFlow together
make up the Python Machine Learning ecosystem.

NumPy provides the essential multi-dimensional, array-oriented computing functionality designed
for high-level mathematical functions and scientific computation.

### For NumPy official documentation <a href="https://numpy.org/doc/stable/index.html" title = "NumPy"> Click here</a>

NumPy is conventionally imported into the notebook under the alias np:

In [2]:
import numpy as np

NumPy’s main object is the homogeneous multidimensional array (ndarray).
It is a table of elements that all share the same type (for example all integers, all floats or all strings), usually numbers.
In NumPy, dimensions are called axes. The number of axes is called the rank, and is available as the ndim attribute.

There are several ways to create an array in NumPy, such as np.array, np.zeros, np.ones, etc.
Each of them offers a different way to specify the array's initial values, shape and element type.
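A short sketch of how np.array builds arrays from Python lists and how it enforces a single element type (the variable names are illustrative only):

```python
import numpy as np

# A list of ints becomes an integer ndarray
ints = np.array([1, 2, 3])

# Mixing ints and floats: NumPy upcasts every element to one common type (float64)
mixed = np.array([1, 2, 3.5])

# A nested list of equal-length rows becomes a 2-D array (2 axes)
table = np.array([[1, 2, 3], [4, 5, 6]])

print(ints.dtype.kind, mixed.dtype, table.ndim, table.shape)
```

Because the array is homogeneous, the integers 1 and 2 in mixed are stored as 1.0 and 2.0.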

![Screenshot%202022-09-12%20at%204.08.32%20PM.png](attachment:Screenshot%202022-09-12%20at%204.08.32%20PM.png)


![Screenshot%202022-09-12%20at%204.09.23%20PM.png](attachment:Screenshot%202022-09-12%20at%204.09.23%20PM.png)

![Screenshot%202022-09-12%20at%204.10.14%20PM.png](attachment:Screenshot%202022-09-12%20at%204.10.14%20PM.png)

Some of the important attributes of a NumPy array are:

1. ndim: the number of dimensions (axes) of the array
2. shape: a tuple of integers giving the size of the array along each axis
3. size: the total number of elements in the array
4. dtype: the type of the elements in the array, e.g., int64 or float64
5. itemsize: the size in bytes of each element

A closely related method is reshape(), which returns the same data arranged in a new shape.

NumPy array elements can be accessed using indexing. Below are some useful examples:

1. A[2:5] returns items 2 to 4 (indexing in NumPy arrays starts from 0, and the end index is excluded)
2. A[2::2] returns every second item, from index 2 to the end
3. A[::-1] returns the array in reverse order
4. A[1:] returns everything from index 1 to the end (from the second row onward, for a 2-D array)
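The attributes and indexing patterns listed above can be tried on a small array. This sketch assumes a 12-element integer array; the exact integer dtype (and therefore itemsize) depends on the platform:

```python
import numpy as np

A = np.arange(12)        # 1-D array: 0, 1, ..., 11
M = A.reshape(3, 4)      # the same 12 values arranged as 3 rows x 4 columns

print(M.ndim)            # 2 axes
print(M.shape)           # (3, 4)
print(M.size)            # 12 elements in total
print(M.dtype)           # platform-dependent integer type, e.g. int64
print(M.itemsize)        # bytes per element (8 for int64)

print(A[2:5])            # indices 2, 3 and 4
print(A[2::2])           # every second item from index 2: 2, 4, 6, 8, 10
print(A[::-1])           # reversed: 11 down to 0
print(M[1:])             # rows 1 and 2 of the 2-D array
```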

Vectors and Machine learning

Machine learning uses vectors. Vectors are one-dimensional arrays, and a vector can be represented either as a row or as a column.

What are vectors? A vector quantity is one that is defined by both a magnitude and a direction. For example, force is a vector quantity: it is defined by its magnitude as well as its direction. It can be represented as an array [a, b] of 2 numbers, e.g. [2, 180], where a = 2 is the magnitude in newtons and b = 180 is the direction as an angle in degrees.

Another example: say a rocket is going up at a slight angle. It has a vertical speed of 5,000 m/s, a slight speed towards the East of 10 m/s, and a slight speed towards the North of 50 m/s. The rocket’s velocity may be represented by the vector [10, 50, 5000], which gives the speed along the x (East), y (North) and z (up) directions.

Similarly, vectors have several usages in Machine Learning, most notably to represent observations and predictions.

For example, say we built a Machine Learning system to classify videos into 3 categories (good, spam, clickbait) based on what we know about them. For each video, we would have a vector representing what we know about it, such as [10.5, 5.2, 3.25, 7.0]. This vector could represent a video that lasts 10.5 minutes, where only 5.2% of viewers watch for more than a minute, that gets 3.25 views per day on average, and that was flagged 7 times as spam.

As you can see, each axis may have a different meaning. Based on this vector, our Machine Learning system may predict that there is an 80% probability that it is a spam video, 18% that it is clickbait, and 2% that it is a good video. This could be represented as the following vector: class_probabilities = [0.8,0.18,0.02].

As can be observed, vectors can be used in Machine Learning to represent both observations and predictions. The properties describing the video, such as its duration and the percentage of viewers watching for more than a minute, are called features.
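As a small sketch, the video example can be written with NumPy arrays. The feature values and probabilities are the illustrative numbers from the text; class_names and the use of np.argmax to pick the predicted class are assumptions added for this example:

```python
import numpy as np

# Observation: [duration in minutes, % of viewers watching > 1 minute,
#               views per day, times flagged as spam]
video = np.array([10.5, 5.2, 3.25, 7.0])

# Prediction: probabilities for each class (hypothetical class order)
class_names = np.array(["spam", "clickbait", "good"])
class_probabilities = np.array([0.8, 0.18, 0.02])

print(video.shape)                                   # (4,) -> 4 features
print(class_probabilities.sum())                     # 1.0, up to floating-point rounding
print(class_names[np.argmax(class_probabilities)])   # most likely class: spam
```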

Since most of the time spent building machine learning models goes into data processing, it is important to be familiar with the libraries that help process such data.

Why NumPy and Pandas over regular Python arrays?

In Python, a vector can be represented in many ways, the simplest being a regular Python list of numbers. Since Machine Learning requires lots of scientific calculations, it is much better to use NumPy’s ndarray, which provides convenient and optimized implementations of essential mathematical operations on vectors.

Vectorized operations run faster than the equivalent matrix manipulations written with Python loops, because the looping happens in compiled code rather than in the Python interpreter. For example, for a 100 × 100 matrix multiplication, the vectorized NumPy version can be two or more orders of magnitude faster than a pure-Python loop.
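A rough way to see the difference is to time a naive triple loop against NumPy's matrix product on the same 100 × 100 matrices. This is a sketch: the helper matmul_loops is hypothetical, and the measured times depend entirely on the machine:

```python
import time
import numpy as np

def matmul_loops(a, b):
    """Naive triple-loop matrix multiplication on Python lists."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            out[i][j] = s
    return out

rng = np.random.default_rng(0)
A = rng.random((100, 100))
B = rng.random((100, 100))

t0 = time.perf_counter()
slow = matmul_loops(A.tolist(), B.tolist())
t1 = time.perf_counter()
fast = A @ B                         # vectorized matrix product
t2 = time.perf_counter()

print(f"loops: {t1 - t0:.4f} s, NumPy: {t2 - t1:.6f} s")
print(np.allclose(slow, fast))       # both approaches give the same result
```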

Some ways in which NumPy arrays differ from regular Python lists are:

1. If you assign a single value to an ndarray slice, it is copied across the whole slice. This makes it easier to assign values to part of a NumPy array than to part of a list, where a single value cannot be assigned to a slice directly and a loop (or a repeated list) is needed instead.


![Screenshot%202022-09-12%20at%204.14.08%20PM.png](attachment:Screenshot%202022-09-12%20at%204.14.08%20PM.png)

2. ndarray slices are views on the same data buffer. If you modify a slice, the original ndarray is modified as well.

![Screenshot%202022-09-12%20at%204.17.09%20PM.png](attachment:Screenshot%202022-09-12%20at%204.17.09%20PM.png)

If we need an independent copy, we use the copy method, as in another_slice = a[2:6].copy(). If we then modify another_slice, a remains unchanged.

3. Multidimensional arrays are accessed differently in NumPy than nested Python lists: instead of chaining brackets (lst[row][col]), a single pair of brackets takes one index or slice per axis. The generic format for a 2-D NumPy array is:
array[row_start_index:row_end_index, column_start_index:column_end_index]
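The three differences above can be sketched in a few lines (the array contents are arbitrary example values):

```python
import numpy as np

a = np.arange(10)

# 1. Assigning a single value to a slice fills the whole slice
a[2:5] = -1                  # a is now 0, 1, -1, -1, -1, 5, 6, 7, 8, 9
# (a Python list would need an iterable here, e.g. lst[2:5] = [-1] * 3)

# 2. Slices are views: writing through the slice changes the original
view = a[5:8]
view[0] = 100
print(a[5])                  # 100

# An explicit copy is independent of the original
another_slice = a[2:6].copy()
another_slice[:] = 0
print(a[2:6])                # unchanged: -1, -1, -1, 100

# 3. Multidimensional slicing uses one pair of brackets: [rows, columns]
m = np.arange(12).reshape(3, 4)
print(m[0:2, 1:3])           # rows 0-1 and columns 1-2: [[1, 2], [5, 6]]
```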

NumPy arrays can also be accessed using boolean indexing. For example,

In [8]:
a = np.arange(12).reshape(3, 4)
print(a)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [9]:
rows_on = np.array([True, False, True])
rows_on

array([ True, False,  True])

In [10]:
a[rows_on , : ]

array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])

NumPy arrays are capable of performing all basic operations such as addition, subtraction, element-wise product, matrix dot product, element-wise division, element-wise modulo, element-wise exponents and conditional operations.
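A quick sketch of these operations on two small 2 × 2 arrays (the values are arbitrary):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

print(x + y)        # element-wise addition
print(x - y)        # element-wise subtraction
print(x * y)        # element-wise product (NOT the matrix product)
print(x.dot(y))     # matrix dot product, same as x @ y: [[19, 22], [43, 50]]
print(y / x)        # element-wise (true) division -> floats
print(y % x)        # element-wise modulo
print(x ** 2)       # element-wise exponent
print(x > 2)        # conditional operation -> boolean array
```

Note that * multiplies element by element; the matrix product needs dot() or the @ operator.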

An important feature of NumPy arrays is broadcasting.

![Screenshot%202022-09-12%20at%204.25.50%20PM.png](attachment:Screenshot%202022-09-12%20at%204.25.50%20PM.png)

In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called broadcasting rules.

Basically, there are 2 rules of broadcasting to remember:

1. If the arrays do not have the same rank, a 1 is prepended to the shape of the lower-rank array until the ranks match. For example, when adding arrays A and B of shapes (3,3) and (3,) [rank 2 and rank 1], a 1 is prepended to the shape of B to make it (1,3) [rank 2]. The shapes are then compared dimension by dimension, starting from the last one; two dimensions are compatible when they are equal or when one of them is 1.
2. When one of the dimensions being compared is 1, the other is used. In other words, dimensions of size 1 are stretched or “copied” to match the other. For example, when adding a 2D array A of shape (3,3) to a 2D array B of shape (1,3), NumPy applies this rule: it stretches B by replicating its single row 3 times, making it (3,3), and then performs the operation.
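The two rules can be checked with the (3,3) and (3,) example from the text. The sketch uses np.tile to perform the "stretching" by hand for comparison; the array values are arbitrary:

```python
import numpy as np

A = np.arange(9).reshape(3, 3)   # shape (3, 3), rank 2
B = np.array([10, 20, 30])       # shape (3,),  rank 1

# Rule 1: B is treated as shape (1, 3)
# Rule 2: its single row is stretched to shape (3, 3)
C = A + B
print(C)

# The same result with the stretching written out explicitly
B_stretched = np.tile(B.reshape(1, 3), (3, 1))
print(np.array_equal(C, A + B_stretched))   # True

# Shapes (3, 3) and (2,) are incompatible: 3 != 2 and neither is 1
try:
    A + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```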

NumPy provides basic mathematical and statistical functions like mean, min, max, sum, prod, std, var, summation across different axes, transposing of a matrix, etc.
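A minimal sketch of these functions on a 2 × 3 array; axis=0 works down the columns and axis=1 across the rows:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.mean(), m.min(), m.max())   # 3.5 1 6
print(m.sum(), m.prod())            # 21 720
print(m.std(), m.var())             # population std and variance (ddof=0 by default)
print(m.sum(axis=0))                # column sums: 5, 7, 9
print(m.sum(axis=1))                # row sums: 6, 15
print(m.T)                          # transpose, shape (3, 2)
```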

A particular feature of interest is solving a system of linear equations. NumPy provides np.linalg.solve for this, and SciPy, which builds on NumPy arrays, offers the equivalent scipy.linalg.solve used in the cell below. For example, the system
2x + 6y = 6
5x + 3y = -9

can be solved using

In [17]:
import numpy as np
from scipy import linalg
coeffs  = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = linalg.solve(coeffs, depvars)
solution

array([-3.,  2.])
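The same system can be solved with NumPy alone through np.linalg.solve; this sketch also verifies the answer by multiplying it back:

```python
import numpy as np

# 2x + 6y = 6
# 5x + 3y = -9
coeffs = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])

solution = np.linalg.solve(coeffs, depvars)
print(solution)                                  # approximately x = -3, y = 2

# Check: coeffs @ solution should reproduce the right-hand side
print(np.allclose(coeffs @ solution, depvars))   # True
```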

# NUMPY EXAMPLES: CREATING ARRAYS

In [24]:
# create an array of ten integer zeros
np.zeros(10, dtype='int')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [25]:
#creating a 3 row x 5 column matrix
np.ones((3,5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [26]:
#creating a matrix with a predefined value
np.full((3,5),1.23)

array([[1.23, 1.23, 1.23, 1.23, 1.23],
       [1.23, 1.23, 1.23, 1.23, 1.23],
       [1.23, 1.23, 1.23, 1.23, 1.23]])

In [27]:
#create an array with a set sequence
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [28]:
# create 5 evenly spaced values from 0 to 1 (both ends included)
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [29]:
# create a 3x3 array of random values from a normal distribution with mean 0 and standard deviation 1
np.random.normal(0, 1, (3,3))

array([[ 0.46532332,  2.00427159, -0.34529543],
       [ 0.51593714, -1.60872286,  0.06505919],
       [-0.29798514, -0.21038031,  0.2669534 ]])

In [30]:
#create an identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [31]:
#set a random seed
np.random.seed(0)


x1 = np.random.randint(10, size=6) #one dimension
x2 = np.random.randint(10, size=(3,4)) #two dimension
x3 = np.random.randint(10, size=(3,4,5)) #three dimension


print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size:  60


# NUMPY EXAMPLES: ARRAY INDEXING

The important thing to remember is that indexing in Python starts at zero.

In [32]:
x1 = np.array([4, 3, 4, 4, 8, 4])
x1

array([4, 3, 4, 4, 8, 4])

In [33]:
# access the value at index zero
x1[0]

4

In [34]:
# access the fifth value (index 4)
x1[4]

8

In [35]:
#get the last value
x1[-1]

4

In [36]:
#get the second last value
x1[-2]

8

In [37]:
#in a multidimensional array, we need to specify row and column index
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [38]:
# value in the 3rd row (index 2) and 4th column (index 3)
x2[2,3]

7

In [39]:
# value in the 3rd row and the last column
x2[2,-1]

7

In [40]:
#replace value at 0,0 index
x2[0,0] = 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

# Array Slicing

Now, we'll learn to access multiple or a range of elements from an array.

In [41]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [42]:
# first five elements (indices 0 to 4)
x[:5]

array([0, 1, 2, 3, 4])

In [43]:
# from index 4 (the 5th element) to the end
x[4:]

array([4, 5, 6, 7, 8, 9])

In [44]:
# indices 4 to 6
x[4:7]

array([4, 5, 6])

In [45]:
# elements at even indices
x[ : : 2]

array([0, 2, 4, 6, 8])

In [46]:
# elements from index 1 onward, with a step of two
x[1::2]

array([1, 3, 5, 7, 9])

In [47]:
#reverse the array
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

# Array Concatenation

Many a time, we are required to combine different arrays. So, instead of typing each of their elements manually, you can use array concatenation to handle such tasks easily.

In [48]:
#You can concatenate two or more arrays at once.
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [21,21,21]
np.concatenate([x, y,z])

array([ 1,  2,  3,  3,  2,  1, 21, 21, 21])

In [49]:
# concatenate also works on 2-dimensional arrays (stacked row-wise by default, axis=0)
grid = np.array([[1,2,3],[4,5,6]])
np.concatenate([grid,grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [50]:
# with axis=1, the arrays are concatenated column-wise
np.concatenate([grid,grid],axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

Until now, we have used the concatenation function on arrays of equal dimension. But what if you need to combine a 2D array with a 1D array? In such situations np.concatenate raises an error, because the inputs must have the same number of dimensions. Instead, you can use np.vstack or np.hstack. Let's see how!

In [51]:
x = np.array([3,4,5])
grid = np.array([[1,2,3],[17,18,19]])
np.vstack([x,grid])

array([[ 3,  4,  5],
       [ 1,  2,  3],
       [17, 18, 19]])

In [54]:
# similarly, you can add a column using np.hstack
z = np.array([[9],[9]])
z

array([[9],
       [9]])

In [55]:
np.hstack([grid,z])

array([[ 1,  2,  3,  9],
       [17, 18, 19,  9]])

Also, we can split the arrays based on pre-defined positions. Let's see how!

In [56]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [58]:
x1,x2,x3 = np.split(x,[3,6])
print (x1,x2,x3)

[0 1 2] [3 4 5] [6 7 8 9]


In [59]:
grid = np.arange(16).reshape((4,4))
grid
upper,lower = np.vsplit(grid,[2])
print (upper, lower)

[[0 1 2 3]
 [4 5 6 7]] [[ 8  9 10 11]
 [12 13 14 15]]
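Splitting also works column-wise. This sketch uses np.hsplit, the axis argument of np.split, and np.array_split, which (unlike np.split) accepts a number of sections that does not divide the array evenly:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# Split column-wise before column 2: two blocks of shape (4, 2)
left, right = np.hsplit(grid, [2])
print(left)
print(right)

# np.split with axis=0 is equivalent to np.vsplit
upper, lower = np.split(grid, [2], axis=0)
print(np.array_equal(upper, grid[:2]))   # True

# np.array_split tolerates uneven sections
parts = np.array_split(np.arange(10), 3)
print([p.tolist() for p in parts])       # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```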


# Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

1. argpartition()

np.argpartition() finds the indices of the N largest values without fully sorting the array. The returned indices are in no guaranteed order, so we sort the values afterwards if needed.

In [60]:
x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
index_val = np.argpartition(x, -4)[-4:]
index_val

array([1, 8, 2, 0])

In [61]:
np.sort(x[index_val])

array([10, 12, 12, 16])
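If the indices themselves are needed in order, one option is to sort only the N selected indices by their values. This is a sketch; the variable names are illustrative:

```python
import numpy as np

x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
n = 4

# Indices of the n largest values, in no guaranteed order
top_idx = np.argpartition(x, -n)[-n:]

# Sort just those n indices by value, largest first
top_idx_sorted = top_idx[np.argsort(x[top_idx])[::-1]]
print(x[top_idx_sorted])   # values 16, 12, 12, 10
```

Only n elements are sorted in the second step, instead of the whole array.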

2. allclose()
allclose() checks whether two arrays are element-wise equal within a tolerance and returns a single boolean. It returns False if any pair of items differs by more than the tolerance, which combines a relative part (rtol, the third positional argument) and an absolute part (atol). This makes it a convenient way to compare floating-point arrays, where exact equality is rarely appropriate.

In [62]:
array1 = np.array([0.12,0.17,0.24,0.29])
array2 = np.array([0.13,0.19,0.26,0.31])
# with a relative tolerance (rtol) of 0.1, it returns False:
np.allclose(array1, array2, rtol=0.1)

False

In [63]:
# with a relative tolerance (rtol) of 0.2, it returns True:
np.allclose(array1, array2, rtol=0.2)

True
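The element-wise test behind np.allclose is |a - b| <= atol + rtol * |b|. This sketch reproduces it by hand for the rtol = 0.1 case and uses np.isclose to show which individual elements fail:

```python
import numpy as np

array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])
rtol, atol = 0.1, 1e-08   # atol shown at its default value

# The test np.allclose applies; it scales by |b|, so it is not symmetric in a and b
manual = np.all(np.abs(array1 - array2) <= atol + rtol * np.abs(array2))

print(manual, np.allclose(array1, array2, rtol=rtol, atol=atol))   # False False
print(np.isclose(array1, array2, rtol=rtol))   # only the second pair is out of tolerance
```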

3. clip()
clip() is used to keep the values of an array within an interval. Sometimes we need values to stay between a lower and an upper limit, and NumPy’s clip() does exactly that: given an interval, values outside it are clipped to the interval edges.

In [64]:
x = np.array([3, 17, 14, 23, 2, 2, 6, 8, 1, 2, 16, 0])
np.clip(x,2,5)

array([3, 5, 5, 5, 2, 2, 5, 5, 2, 2, 5, 2])
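Clipping is equivalent to taking the element-wise maximum with the lower bound and then the minimum with the upper bound. Passing None for one bound clips on one side only, as this sketch shows:

```python
import numpy as np

x = np.array([3, 17, 14, 23, 2, 2, 6, 8, 1, 2, 16, 0])

# clip(x, 2, 5) == min(max(x, 2), 5), element by element
manual = np.minimum(np.maximum(x, 2), 5)
print(np.array_equal(np.clip(x, 2, 5), manual))   # True

# Upper bound only: values above 10 become 10, small values are untouched
print(np.clip(x, None, 10))
```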

4. extract()
extract(), as the name suggests, is used to extract the elements of an array that satisfy a condition. Conditions can be combined with & (and) and | (or).

In [65]:
# Random integers
array = np.random.randint(20, size=12)
array

array([15,  3, 12,  4,  8, 14, 15,  3, 15, 13, 16, 17])

In [66]:
#  Divide by 2 and check if remainder is 1
cond = np.mod(array, 2)==1
cond

array([ True,  True, False, False, False, False,  True,  True,  True,
        True, False,  True])

In [67]:
# Use extract to get the values
np.extract(cond, array)

array([15,  3, 15,  3, 15, 13, 17])

In [68]:
# Apply condition on extract directly
np.extract(((array < 3) | (array > 15)), array)

array([16, 17])
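For a 1-D array, np.extract(cond, arr) gives the same result as the boolean indexing arr[cond] shown earlier. This sketch uses the values printed above as fixed input so the result is reproducible:

```python
import numpy as np

values = np.array([15, 3, 12, 4, 8, 14, 15, 3, 15, 13, 16, 17])
cond = values % 2 == 1

print(np.array_equal(np.extract(cond, values), values[cond]))   # True

# Combining conditions: the parentheses are required because & binds tighter than < and >
print(values[(values > 3) & (values < 10)])   # 4 and 8
```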

5. where()
where() has two modes. Called with only a condition, it returns the index positions of the elements that satisfy it. Called with a condition and two values, it returns a new array that takes elements from the first value where the condition holds and from the second where it does not. This is somewhat similar to the WHERE clause in SQL, as the examples below demonstrate.

In [69]:
y = np.array([1,5,6,8,1,7,3,6,9])
# Where y is greater than 5, returns index position
np.where(y>5)

(array([2, 3, 5, 7, 8]),)

In [70]:
# The second argument replaces the values that match the condition,
# the third replaces the values that do not
np.where(y>5, "Hit", "Miss")

array(['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Hit', 'Miss', 'Hit', 'Hit'],
      dtype='<U4')
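On a 2-D array, the one-argument form returns one index array per axis, and the three-argument form can mix original values with a replacement. A small sketch with arbitrary values:

```python
import numpy as np

m = np.array([[1, 7],
              [9, 3]])

# One argument: a tuple of index arrays, one per axis (same as np.nonzero)
rows, cols = np.where(m > 5)
print(rows, cols)              # positions (0, 1) and (1, 0)

# Three arguments: keep values above 5, replace the rest with 0
print(np.where(m > 5, m, 0))   # [[0, 7], [9, 0]]
```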

6. percentile()
percentile() computes the q-th percentile of the array elements along the specified axis.

In [71]:
a = np.array([1,5,6,8,1,7,3,6,9])
print("50th Percentile of a, axis = 0 : ",  
      np.percentile(a, 50, axis =0))

50th Percentile of a, axis = 0 :  6.0


In [72]:
b = np.array([[10, 7, 4], [3, 2, 1]])
print("30th Percentile of b, axis = 0 : ",  
      np.percentile(b, 30, axis =0))

30th Percentile of b, axis = 0 :  [5.1 3.5 1.9]
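A few more uses, assuming the default linear interpolation: the 50th percentile equals the median, axis=1 computes one percentile per row, and a list of q values returns several percentiles at once:

```python
import numpy as np

a = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
b = np.array([[10, 7, 4], [3, 2, 1]])

# The 50th percentile is the median
print(np.percentile(a, 50) == np.median(a))   # True

# axis=1: one value per row instead of per column
print(np.percentile(b, 30, axis=1))           # approximately 5.8 and 1.6

# Several percentiles at once
print(np.percentile(a, [25, 50, 75]))         # 3.0, 6.0, 7.0
```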
