# INTRODUCTION

This notebook highlights useful functions in the numpy module that can play a crucial role in a data analysis or machine learning project. It was inspired by several articles published on <b><i>Medium</i></b>.

In [1]:
import numpy as np
import pandas as pd

# NUMPY

## ARGPARTITION

This function helps you find the indices of the N largest values in an array without sorting the whole array.

In [2]:
x = np.array([10, 12, 3, 8, 7, 55, 29, 34, 6, 0, 1])

index_value = np.argpartition(x, -4)

x[index_value]

array([ 6,  1,  3,  8,  7,  0, 10, 12, 29, 34, 55])

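What happened here?

Argpartition returned a reordering of the indices of x that splits the array into two parts around a pivot value.

The first part holds the indices of elements smaller than or equal to 12, and the second part holds the indices of elements greater than or equal to 12. The order within each part is not guaranteed.

12 is the pivot because it is the element at position -4 of the sorted array, i.e. the 4th largest value of x. The last four indices, index_value[-4:], therefore point to the four largest values.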
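As a minimal sketch of the N-largest use case described at the top of this section, the last N entries of the argpartition result can be sliced off and, if order matters, sorted afterwards. The variable names n, top_idx and top_idx_sorted are illustrative only.

```python
import numpy as np

x = np.array([10, 12, 3, 8, 7, 55, 29, 34, 6, 0, 1])
n = 4  # number of largest values wanted (illustrative choice)

# The last n positions of the partitioned index array point at the
# n largest values, in no guaranteed order.
top_idx = np.argpartition(x, -n)[-n:]

# Order those n indices by value, largest first, when order matters.
top_idx_sorted = top_idx[np.argsort(x[top_idx])[::-1]]

print(top_idx_sorted)     # indices of the 4 largest values, largest first
print(x[top_idx_sorted])  # the values themselves
```

Only the n selected elements are fully sorted here, which is the main reason to prefer this over sorting the whole array when n is small.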

## ALLCLOSE

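This compares two arrays element-wise and reports whether they match within a tolerance. The third positional argument is the relative tolerance (rtol), which is scaled by the values of the second array.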

In [5]:
x1 = np.array([0.12, 0.17, 0.24, 0.29])
x2 = np.array([0.13, 0.19, 0.26, 0.31])

print("Match : {} with a tolerance of {}".format(np.allclose(x1, x2, rtol=0.1), 0.1))
print("Match : {} with a tolerance of {}".format(np.allclose(x1, x2, rtol=0.2), 0.2))


Match : False with a tolerance of 0.1
Match : True with a tolerance of 0.2
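The tolerance passed above is relative: allclose checks abs(a - b) <= atol + rtol * abs(b) for every pair of elements. The sketch below contrasts that with a purely absolute tolerance; the thresholds 0.03 and 0.01 are arbitrary values picked for illustration.

```python
import numpy as np

x1 = np.array([0.12, 0.17, 0.24, 0.29])
x2 = np.array([0.13, 0.19, 0.26, 0.31])

# Relative tolerance: the allowed difference grows with abs(x2).
rel_10 = np.allclose(x1, x2, rtol=0.1)

# Absolute tolerance only: rtol=0 switches the relative part off.
abs_003 = np.allclose(x1, x2, rtol=0, atol=0.03)
abs_001 = np.allclose(x1, x2, rtol=0, atol=0.01)

# isclose applies the same check but returns one result per element.
per_element = np.isclose(x1, x2, rtol=0.1)

print(rel_10, abs_003, abs_001)
print(per_element)
```

Using isclose is a quick way to see which elements cause allclose to return False.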


## CLIP

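This limits the values of an array to a given interval: values below the lower bound are set to the lower bound, and values above the upper bound are set to the upper bound. A good use case is outlier treatment of continuous variables.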

In [6]:
x = np.array([2, 10, 12, 15, 6, 18, 19, 22, 34, 8])

np.clip(x, 2, 20)

array([ 2, 10, 12, 15,  6, 18, 19, 20, 20,  8])
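For the outlier-treatment use case, the bounds are often derived from the data rather than hard-coded. The following is a rough sketch that clips at the 10th and 90th percentiles; those two cut-offs are an assumption made for illustration, not a recommendation.

```python
import numpy as np

x = np.array([2, 10, 12, 15, 6, 18, 19, 22, 34, 8])

# Data-driven bounds (the 10/90 cut-offs are an illustrative choice).
lower, upper = np.percentile(x, [10, 90])

# Values outside [lower, upper] are pulled back to the nearest bound.
# The result is a float array because the bounds are floats.
clipped = np.clip(x, lower, upper)

print(lower, upper)
print(clipped)
```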

## EXTRACT

This extracts the elements of an array that satisfy a given condition.

In [10]:
x = np.random.randint(30, size = 10)

print("Input Array: {}".format(x))

print("Elements greater than 5 and less than 15 : {}".format(np.extract((x > 5)&(x < 15), x)))

Input Array: [ 8  7  4  5 26 13 15  7 15 16]
Elements greater than 5 and less than 15 : [ 8  7 13  7]
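For a boolean condition, extract gives the same result as indexing the array with the mask directly. The short sketch below reuses the sample input printed above as a fixed array so that the result is reproducible.

```python
import numpy as np

# Fixed copy of the random sample shown above, for reproducibility.
x = np.array([8, 7, 4, 5, 26, 13, 15, 7, 15, 16])

mask = (x > 5) & (x < 15)

via_extract = np.extract(mask, x)  # function form
via_mask = x[mask]                 # boolean-mask indexing

print(via_extract)
print(np.array_equal(via_extract, via_mask))
```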


## WHERE

This function is similar to "extract", but it returns the indices of the matching elements instead of the elements themselves.

In [17]:
x = np.random.randint(30, size = 10)

print("Input Array: {}".format(x))

print("Indices of elements greater than 5 and less than 15 : {}".format(np.where((x > 5)&(x < 15))))

Input Array: [ 7 19  7  6  6 14 16 28 16 12]
Indices of elements greater than 5 and less than 15 : (array([0, 2, 3, 4, 5, 9], dtype=int64),)
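Besides returning indices, where also has a three-argument form that picks values from one of two sources depending on the condition. Here is a small sketch using the sample input above as a fixed array; the placeholder value -1 is an arbitrary choice.

```python
import numpy as np

# Fixed copy of the random sample shown above, for reproducibility.
x = np.array([7, 19, 7, 6, 6, 14, 16, 28, 16, 12])
mask = (x > 5) & (x < 15)

# One-argument form: a tuple with one index array per dimension.
idx = np.where(mask)[0]

# Three-argument form: keep x where the mask is True, else use -1.
replaced = np.where(mask, x, -1)

print(idx)
print(replaced)
```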


## PERCENTILE

Useful for calculating percentiles of an array, either over all elements or along a specified axis.

In [19]:
a = np.array([1, 5, 6, 8, 10, 11, 7, 2])

np.percentile(a, 50)

6.5

In [20]:
b = np.array([[5, 8, 9], [11, 12, 13]])
np.percentile(b, 50, axis = 0)

array([ 8., 10., 11.])
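Several percentiles can be requested in one call by passing a list. The sketch below uses this to compute the quartiles and the interquartile range of the array a from the first example; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 5, 6, 8, 10, 11, 7, 2])

# One call returns all requested percentiles
# (linear interpolation between data points by default).
q1, median, q3 = np.percentile(a, [25, 50, 75])

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1

print(q1, median, q3, iqr)
```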

## ARGMIN, ARGMAX, ARGSORT

These functions return the index of the minimum value, the index of the maximum value, and the indices that would sort the array, respectively. They are a great help if you don't want to implement that logic yourself.

In [26]:
scores = np.array([55, 80, 7, 10, 34, 88, 42, 13])

print("Index of minimum value: {}".format(scores.argmin()))

print("Index of maximum value: {}".format(scores.argmax()))

print("Indices of sorted values: {}".format(scores.argsort()))

Index of minimum value: 2
Index of maximum value: 5
Indices of sorted values: [2 3 7 4 6 0 1 5]
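A common use of argsort is to reorder a second, parallel array by the first one. The sketch below ranks a made-up names array by the scores above and takes the top three; the names are hypothetical and exist only for this example.

```python
import numpy as np

scores = np.array([55, 80, 7, 10, 34, 88, 42, 13])
# Hypothetical labels, one per score (not part of the original data).
names = np.array(["ann", "bob", "cy", "dee", "eli", "fay", "gus", "hal"])

# argsort is ascending, so reverse it for a highest-first ranking.
order = np.argsort(scores)[::-1]
top3 = order[:3]

print(names[top3])
print(scores[top3])
```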


## INTERSECT1D

This returns the sorted, unique values that two arrays have in common, which is a quick way to pick out the items shared by two long lists.

In [27]:
x = np.random.randint(20, size = 10)
y = np.random.randint(20, size = 15)

print("Common items between x and y are : {}".format(np.intersect1d(x, y)))

Common items between x and y are : [ 0  6  8 16]
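When the positions of the common items are needed as well, intersect1d accepts return_indices=True (available in NumPy 1.15 and later). The following sketch uses small hand-picked arrays instead of random ones so that the result is reproducible.

```python
import numpy as np

# Hand-picked inputs for illustration; x contains a duplicate 6.
x = np.array([0, 6, 8, 16, 3, 6])
y = np.array([16, 8, 1, 0, 6])

# common is sorted and unique; x_idx and y_idx give the position of
# the first occurrence of each common value in x and y.
common, x_idx, y_idx = np.intersect1d(x, y, return_indices=True)

print(common)
print(x_idx, y_idx)
```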


# CONCLUSION

Numpy is a great library for most data analysis work and has many useful tools to help in any project. I will keep adding to this notebook as I learn more about this beautiful library.