# Data manipulation with NumPy and pandas



* Python can handle large datasets elegantly. The simplest approach is to use plain Python lists, but they are slow for numerical work.
* NumPy and pandas are two great libraries for dealing with tabular datasets.
* NumPy is used for homogeneous n-dimensional data (such as matrices), where every element has the same type.
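
As a minimal sketch of the points above, assuming "plain arrays" means ordinary Python lists, the snippet below contrasts an element-by-element list operation with the equivalent vectorized NumPy expression, and shows what "homogeneous" means in practice. The variable names are illustrative only.

```python
import numpy as np

values = [15, 18, 22, 32, 45]   # plain Python list
arr = np.array(values)          # NumPy array with a single dtype

# With a list, every element is visited in Python code.
doubled_list = [v * 2 for v in values]

# With an array, the same operation is one vectorized expression.
doubled_arr = arr * 2

print(doubled_list)
print(doubled_arr)

# Homogeneous storage: mixing ints and floats upcasts the whole array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```

The vectorized form runs its loop in compiled code, which is the usual reason NumPy is faster than lists on large inputs.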

In [15]:
import numpy as np

In [16]:
# generate a random 3 x 5 array; values are drawn uniformly from [0.0, 1.0)

X_rand = np.random.random((3, 5))  

print(X_rand)

[[0.22760737 0.66008358 0.63729631 0.07820294 0.53955239]
 [0.64474176 0.71449377 0.77295792 0.23149222 0.81522973]
 [0.31327108 0.21820111 0.75955633 0.78145032 0.13065796]]


In [17]:
# create a 2D array with 3 rows and 5 columns
X = np.array([[15, 18, 22, 32, 45],
              [13, 6, 17, 9, 20],
              [21, 4, 49, 2, 8]])

print(X)

[[15 18 22 32 45]
 [13  6 17  9 20]
 [21  4 49  2  8]]


### Accessing elements

In [18]:
X[0, 0]

15

In [19]:
# getting a row
X[1]

array([13,  6, 17,  9, 20])

In [20]:
# getting a column
X[:, 1]

array([18,  6,  4])

In [21]:
# transposing an array
X.T

array([[15, 13, 21],
       [18,  6,  4],
       [22, 17, 49],
       [32,  9,  2],
       [45, 20,  8]])

In [22]:
print(X.shape)
# change the layout of the matrix: the same 15 values, read row by row, refilled into 5 x 3
print(X.reshape(5, 3)) 

(3, 5)
[[15 18 22]
 [32 45 13]
 [ 6 17  9]
 [20 21  4]
 [49  2  8]]
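
The outputs of `X.T` and `X.reshape(5, 3)` above have the same shape but different contents. The sketch below, which rebuilds the same `X`, makes that difference explicit: transposing swaps the axes, while reshaping keeps the row-by-row order of the elements and only changes where the rows break.

```python
import numpy as np

X = np.array([[15, 18, 22, 32, 45],
              [13, 6, 17, 9, 20],
              [21, 4, 49, 2, 8]])

transposed = X.T
reshaped = X.reshape(5, 3)

# Same shape, different element layout.
print(transposed.shape == reshaped.shape)     # True
print(np.array_equal(transposed, reshaped))   # False

# reshape preserves the flattened (row-major) order of the original.
print(np.array_equal(reshaped.ravel(), X.ravel()))  # True

# Passing -1 lets NumPy infer one dimension from the total size.
inferred = X.reshape(-1, 3)
print(inferred.shape)  # (5, 3)
```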


In [23]:
# select columns 3, 1 and 0, in that order, using an array of indices
indices = np.array([3, 1, 0])
print(indices)
X[:, indices]

[3 1 0]


array([[32, 18, 15],
       [ 9,  6, 13],
       [ 2,  4, 21]])
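
One detail worth knowing when accessing elements this way: slicing such as `X[:, 1]` returns a view onto the original data, while indexing with an array of indices returns a copy. The following sketch, using a fresh copy of `X` and illustrative variable names, shows the consequence when the results are modified.

```python
import numpy as np

X = np.array([[15, 18, 22, 32, 45],
              [13, 6, 17, 9, 20],
              [21, 4, 49, 2, 8]])

col_view = X[:, 1]               # basic slicing: a view of X
picked = X[:, np.array([3, 1, 0])]  # index array: an independent copy

print(np.shares_memory(X, col_view))  # True
print(np.shares_memory(X, picked))    # False

# Writing to the copy leaves X untouched.
picked[0, 0] = -1
print(X[0, 3])  # 32

# Writing to the view changes X as well.
col_view[0] = -1
print(X[0, 1])  # -1
```

Call `.copy()` on a slice when an independent array is needed.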

### Operations along an axis

In [24]:
X

array([[15, 18, 22, 32, 45],
       [13,  6, 17,  9, 20],
       [21,  4, 49,  2,  8]])

In [25]:
X.shape

(3, 5)

In [26]:
# sum of all values
np.sum(X) 

281

In [27]:
# sum along axis 1 (across the columns): one value per row
np.sum(X, axis=1)

array([132,  65,  84])

In [28]:
# maximum along axis 0 (down the rows): one value per column
np.max(X, axis=0)

array([21, 18, 49, 32, 45])
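
A way to read the `axis` argument is that it names the axis that gets collapsed: for a `(3, 5)` array, `axis=0` leaves 5 values (one per column) and `axis=1` leaves 3 values (one per row). The sketch below checks this on the same `X` and also shows `keepdims=True`, which keeps the collapsed axis with length 1 so the result can be combined with the original array.

```python
import numpy as np

X = np.array([[15, 18, 22, 32, 45],
              [13, 6, 17, 9, 20],
              [21, 4, 49, 2, 8]])

col_sums = np.sum(X, axis=0)   # collapses the 3 rows
row_sums = np.sum(X, axis=1)   # collapses the 5 columns

print(col_sums.shape)  # (5,)
print(row_sums.shape)  # (3,)

# keepdims=True keeps the reduced axis as size 1: shape (3, 1).
row_sums_kept = np.sum(X, axis=1, keepdims=True)
print(row_sums_kept.shape)  # (3, 1)

# The (3, 1) result lines up with X, so each row can be divided by its own sum.
row_fractions = X / row_sums_kept
print(row_fractions.sum(axis=1))
```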