**NumPy**

NumPy, short for ‘Numerical Python’, is a Python library used extensively for efficient numerical computing.

It lets users store large amounts of data in less memory than built-in Python containers and operate on that data efficiently.

It provides optimised, simpler functions for these operations on homogeneous one-dimensional and multidimensional arrays.

NumPy arrays are homogeneous: every element has the same data type.

**NumPy vs Python list**



*   Lists are slow for numerical work; NumPy arrays are fast.
*   A list can store heterogeneous data, whereas a NumPy array stores homogeneous data.

*   The element size of a NumPy array can be controlled when initialising it (for example, int32 takes 4 bytes and int16 takes 2 bytes). Each value in a list is a full Python object that carries a reference count, a type pointer and a size field alongside the value itself, which comes to about 28 bytes for a small integer on a typical 64-bit CPython build, plus an 8-byte pointer held by the list. A NumPy array stores only the value (8, 4 or 2 bytes, depending on the dtype), so there are far fewer bytes to read.

*   A NumPy array doesn't require type checking of each element when iterating.

*   Elements of a NumPy array are stored in contiguous memory, whereas the elements of a Python list are scattered across memory.
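
The memory difference described above can be checked roughly with the sketch below. It assumes CPython; `sys.getsizeof` reports sizes that vary by Python version and platform, so the list figure is only an approximation, while the array figures follow directly from the dtype. The names `py_list`, `arr32` and `arr16` are illustrative only.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr32 = np.arange(n, dtype=np.int32)
arr16 = np.arange(n, dtype=np.int16)

# Size of the list's own pointer table plus every int object it refers to
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print("list (approx.):", list_bytes, "bytes")
print("int32 array data:", arr32.nbytes, "bytes")  # n * 4
print("int16 array data:", arr16.nbytes, "bytes")  # n * 2
```

`nbytes` counts only the element buffer; the array object itself adds a small fixed overhead that does not grow with the number of elements.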

In [1]:
import numpy as np

#### **Basics**

In [2]:
a = np.array([1,2,3])

In [3]:
b = np.array([[1.0,2.0,3.0], [4.0,5.0, 6.0]])

In [4]:
# create 128 evenly spaced values from -pi to pi (endpoint included)
z = np.linspace(-np.pi, np.pi, 128, endpoint=True)

In [5]:
# get dimension (1-D, 2-D or N-D)
b.ndim

2

In [6]:
# get shape (no of rows and cols)
b.shape

(2, 3)

In [7]:
# get type
a.dtype

dtype('int64')

In [8]:
# get item size (single element size)
a.itemsize

8

In [9]:
# get total size (total size of array)
a.nbytes

24

#### **Accessing and changing specific elements, rows, columns, etc.**

In [10]:
c = np.array([[1,2,3,4,5,6,7], [8,9,10,11,12,13,14]])

In [11]:
# accessing specific element [r, c]
c[1,6]

14

In [12]:
# specific row
c[0, :]

array([1, 2, 3, 4, 5, 6, 7])

In [13]:
# specific col
c[:, 5]

array([ 6, 13])

In [14]:
# step
c[:, 1::2]

array([[ 2,  4,  6],
       [ 9, 11, 13]])
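
The same index expressions can appear on the left of an assignment to change values in place. A minimal sketch, using a separate array `m` (a hypothetical name, holding the same values as `c`) so the cells above are unaffected:

```python
import numpy as np

m = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]])

m[1, 5] = 20       # change a single element
m[:, 2] = [1, 2]   # change a whole column, one value per row
m[0, :3] = 0       # assign one scalar to every element of a slice
print(m)
```

Assignments like these modify the array itself rather than returning a new one.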

In [15]:
# 3-D array

three_d = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]])
three_d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

#### **Initialising different types of arrays**

In [16]:
# all 0's matrix
np.zeros(3)
np.zeros((2,3)) # 2-D array

array([[0., 0., 0.],
       [0., 0., 0.]])

In [17]:
# all 1's matrix
np.ones(5)
np.ones((2,2))
np.ones((3,3), dtype='int32')

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int32)

In [18]:
# initialise with any number
np.full((2,2), 88)

array([[88, 88],
       [88, 88]])

In [19]:
# random decimal numbers, uniform in [0, 1)
np.random.rand(3,2)

array([[0.7231911 , 0.33597445],
       [0.52770483, 0.15146943],
       [0.19567208, 0.38163853]])

In [20]:
# random integers from 50 (inclusive) to 100 (exclusive)
np.random.randint(50,100, size=(2,2))

array([[67, 74],
       [85, 80]])

In [21]:
# identity matrix
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [22]:
# repeat an array
arr = np.array([[1,2,3]])
arr1 = np.repeat(arr,3,axis=0)
print(arr1)

[[1 2 3]
 [1 2 3]
 [1 2 3]]


#### **Mathematics**

[numpy linear algebra docs](https://numpy.org/doc/stable/reference/routines.linalg.html)

In [23]:
d = np.array([1,2,3])
np.sin(d)
np.cos(d)
np.tan(d)

array([ 1.55740772, -2.18503986, -0.14254654])


#### **Broadcasting**

In [24]:
mat1 = np.ones((2,3), dtype='int32')
mat2 = np.full((3,2), 2)

# mat1 * mat2 (element-wise) raises a ValueError: shapes (2, 3) and (3, 2) cannot be broadcast together

np.matmul(mat1, mat2)

# matmul requires the number of columns of the first matrix to equal
# the number of rows of the second matrix

array([[6, 6],
       [6, 6]])
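
For contrast with `matmul`, here is a small sketch of element-wise broadcasting itself: NumPy stretches an operand along any dimension where its size is 1 (or missing) so that both shapes match. The names `mat`, `row` and `col` are illustrative only.

```python
import numpy as np

mat = np.ones((2, 3), dtype="int32")
row = np.array([10, 20, 30])   # shape (3,)
col = np.array([[1], [2]])     # shape (2, 1)

row_sum = mat + row    # row is repeated for each of the 2 rows
col_prod = mat * col   # col is repeated across the 3 columns
scaled = mat * 5       # a scalar broadcasts to every element
print(row_sum)
print(col_prod)
print(scaled)

# Shapes (2, 3) and (3, 2) are not compatible for element-wise operations
try:
    mat * np.full((3, 2), 2)
except ValueError as err:
    print("not broadcastable:", err)
```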

In [25]:
# determinant
e = np.identity(3, dtype='int32')
# determinant of the identity matrix
np.linalg.det(e)

1.0

#### **Statistics**

In [26]:
data = np.array([[1,2,3], [88,5,6], [98, 74 ,9]])

In [27]:
# min of data
np.min(data) 
# min of each row
np.min(data, axis=1)

array([1, 5, 9])

In [28]:
# max of data
np.max(data)
# max of each row
np.max(data, axis=1)

array([ 3, 88, 98])

In [29]:
np.sum(data)

286
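
The `axis` argument decides which dimension is collapsed: `axis=1` reduces along each row, as above, while `axis=0` reduces down each column. A short sketch on the same values (the result names are illustrative):

```python
import numpy as np

data = np.array([[1, 2, 3], [88, 5, 6], [98, 74, 9]])

col_min = np.min(data, axis=0)   # min of each column
col_sum = np.sum(data, axis=0)   # sum of each column
row_sum = np.sum(data, axis=1)   # sum of each row
print(col_min, col_sum, row_sum)
print(np.mean(data))             # mean over all elements
```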

#### **Reorganising array**

In [30]:
# reshape
before = np.array([[1,2,3,4],[5,6,7,8]])
after = np.reshape(before, (4,2))
after

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [31]:
# vertically stacking matrices (col num should be same)

v1 = np.array([1,2,3,4])
v2 = np.array([5,6,7,8])
np.vstack((v1,v2))

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [32]:
# horizontal stacking matrices (row num should be same)

h1 = np.ones((2,4))
h2 = np.zeros((2,2))
np.hstack((h1,h2))

array([[1., 1., 1., 1., 0., 0.],
       [1., 1., 1., 1., 0., 0.]])

#### **Advanced indexing and boolean masking**

Boolean masking is typically an efficient way to select a sub-collection from a collection. Masking, in Python and data science, means selecting or manipulating the data in a collection based on some criterion. The criterion evaluates to True or False for each element, hence the "boolean" part. Boolean masks can also be used for indexing, but they behave differently from index arrays: a mask picks the elements where it is True, whereas an index array picks the elements at the listed positions.



In [33]:
arr = np.array([85,95,4,1,2,24,51,8,97])
arr[arr > 50]

array([85, 95, 51, 97])

In [34]:
arr[[2,3,7]]

array([4, 1, 8])
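
A mask is just a boolean array, so it can be stored, combined with other conditions, and used on the left of an assignment. A minimal sketch with the same values as `arr`; the names `values`, `mask`, `in_range` and `capped` are illustrative only:

```python
import numpy as np

values = np.array([85, 95, 4, 1, 2, 24, 51, 8, 97])

mask = values > 50   # boolean array with the same shape as values
print(mask)

# Combine conditions with & (and) or | (or); each condition needs parentheses
in_range = values[(values > 5) & (values < 90)]
print(in_range)

capped = values.copy()
capped[mask] = 50    # assign to every element where the mask is True
print(capped)
print(np.count_nonzero(mask))   # number of elements matching the criterion
```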

In [36]:
soln = np.ones((5,5), dtype='int32')
soln = np.vstack(([1,2,3,4,5], soln))
soln

array([[1, 2, 3, 4, 5],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])