# Basics of Numpy and Pandas

---

This notebook covers the basics of the two most important Python libraries for data analytics and statistical modeling: `Numpy` and `Pandas`.

### Numpy

---

* Numpy array - from list, special functions
* Array operations
* 2-D arrays
* Indexing and slicing
* Conditional subsetting
* Array-array operations

### Pandas

---

* Pandas series
* DataFrame - creation, read from files
* Quickly inspecting a DataFrame
* Descriptive stats on DataFrame
* Indexing, slicing, conditional subsetting
* Operations on specific rows/columns

## Importing Libraries
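There are several ways to import a library in Python. The three shown here are `import ... as`, `from ... import ...`, and the wildcard form `from ... import *`.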

## Numpy array from a Python list
Numpy arrays behave like **true numerical vectors**, not ordinary lists: arithmetic is applied element by element. That is why they are used for mathematical operations and machine learning algorithms, and why they serve as the basis of the Pandas DataFrame for data analytics.

## 1- import as

In [1]:
import numpy as np

In [2]:
lst1=[1,2,3]
array1 = np.array(lst1)
array1

array([1, 2, 3])

In [3]:
type(lst1)

list

In [4]:
np.log(array1)

array([0.        , 0.69314718, 1.09861229])

## 2- import from Library

In [9]:
from numpy import array, log

In [6]:
lst3=[1,2,3]
array3 = array(lst3)  # array was imported by name, so no np. prefix is needed
array3

array([1, 2, 3])

In [10]:
log(array3)

array([0.        , 0.69314718, 1.09861229])

## 3- import Library using `*`

In [1]:
from numpy import *

In [4]:
log(array3)

array([0.        , 0.69314718, 1.09861229])
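One reason the `import numpy as np` form is usually preferred over the wildcard form is that Numpy defines functions whose names match Python built-ins, such as `sum`. The sketch below deliberately avoids a wildcard import. It only compares the two functions side by side, to show how the same call can give a different result once a built-in name has been replaced.

```python
import builtins
import numpy as np

nested = [[1, 2], [3, 4]]

# The built-in sum, given a list as its start value, concatenates the inner lists.
builtin_result = builtins.sum(nested, [])

# Numpy's sum treats the nested list as a 2-D array and adds up every element.
numpy_result = np.sum(nested)

print("built-in sum:", builtin_result)
print("numpy sum:   ", numpy_result)
```

Keeping the `np.` prefix makes it clear at each call site which of the two functions is meant.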

In [9]:
import numpy as np

In [7]:
lst1=[1,3,5]
array1 = np.array(lst1)
array1

array([1, 3, 5])

In [8]:
type(lst1)

list

In [9]:
type(array1)

numpy.ndarray

In [10]:
lst2=[10,11,12]
array2 = np.array(lst2)

In [11]:
print(f"Adding two lists {lst1} and {lst2} together: {lst1+lst2}")

Adding two lists [1, 3, 5] and [10, 11, 12] together: [1, 3, 5, 10, 11, 12]


In [12]:
print(f"Adding two numpy arrays {array1} and {array2} together: {array1+array2}")

Adding two numpy arrays [1 3 5] and [10 11 12] together: [11 14 17]
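The same contrast shows up with a scalar: `*` repeats a list but multiplies an array element by element. A minimal sketch (the variable names are illustrative only):

```python
import numpy as np

lst = [1, 3, 5]
arr = np.array(lst)

repeated = lst * 2   # list repetition: the list is concatenated with itself
doubled = arr * 2    # element-wise multiplication
shifted = arr + 10   # the scalar is applied to every element

print(repeated)
print(doubled)
print(shifted)
```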


## Mathematical operations with/on Numpy arrays

In [13]:
array1*array2

array([10, 33, 60])

In [14]:
array2/array1

array([10.        ,  3.66666667,  2.4       ])

In [15]:
array2**array1

array([    10,   1331, 248832], dtype=int32)

In [16]:
array1

array([1, 3, 5])

In [17]:
# sine function
print("Sine: ",np.sin(array1))
# logarithm
print("Natural logarithm: ",np.log(array1))
print("Base-10 logarithm: ",np.log10(array1))
print("Base-2 logarithm: ",np.log2(array1))
# Exponential
print("Exponential: ",np.exp(array1))

Sine:  [ 0.84147098  0.14112001 -0.95892427]
Natural logarithm:  [0.         1.09861229 1.60943791]
Base-10 logarithm:  [0.         0.47712125 0.69897   ]
Base-2 logarithm:  [0.         1.5849625  2.32192809]
Exponential:  [  2.71828183  20.08553692 148.4131591 ]


## How to generate arrays easily?
* `np.zeros`
* `np.ones`
* `np.arange`
* `np.linspace`

In [18]:
print("A series of zeroes:",np.zeros(7))
print("A series of ones:",np.ones(9))

A series of zeroes: [0. 0. 0. 0. 0. 0. 0.]
A series of ones: [1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [19]:
print("A series of numbers:",np.arange(5,16))

print("Numbers spaced apart by 2:",np.arange(0,11,2))

print("Numbers spaced apart by float:",np.arange(0,11,2.5))

A series of numbers: [ 5  6  7  8  9 10 11 12 13 14 15]
Numbers spaced apart by 2: [ 0  2  4  6  8 10]
Numbers spaced apart by float: [ 0.   2.5  5.   7.5 10. ]


In [20]:
print("Every 5th number from 30 in reverse order: ",np.arange(30,-1,-5))

Every 5th number from 30 in reverse order:  [30 25 20 15 10  5  0]


In [21]:
print("11 linearly spaced numbers between 1 and 5: ",np.linspace(1,5,11))

11 linearly spaced numbers between 1 and 5:  [1.  1.4 1.8 2.2 2.6 3.  3.4 3.8 4.2 4.6 5. ]
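The practical difference between the two generators is what you specify: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default. A small sketch with illustrative values:

```python
import numpy as np

# arange: fixed step, stop value excluded
stepped = np.arange(0, 1, 0.25)

# linspace: fixed number of points, stop value included by default;
# retstep=True also returns the spacing that was used
spaced, step = np.linspace(0, 1, 5, retstep=True)

print("arange:  ", stepped)
print("linspace:", spaced, "with step", step)
```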


## Multi-dimensional arrays

In [22]:
my_mat = [[1,2,3],[4,5,6],[7,8,9]]

mat = np.array(my_mat)

print("Type/Class of this object:",type(mat))
print("Here is the matrix\n----------\n",mat,"\n----------")

Type/Class of this object: <class 'numpy.ndarray'>
Here is the matrix
----------
 [[1 2 3]
 [4 5 6]
 [7 8 9]] 
----------


In [23]:
mat.shape
# my_mat.shape  # would raise AttributeError: a plain list has no .shape attribute

(3, 3)

In [24]:
mat[0][0] = 10

In [25]:
mat

array([[10,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])
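Two details about the assignment above are worth checking: `np.array` copies the data out of the list, so the original list is not changed, and a 2-D array can also be indexed with a single `[row, column]` pair. A short sketch under those assumptions:

```python
import numpy as np

my_mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat = np.array(my_mat)   # np.array copies the values out of the list

mat[0, 0] = 10           # same element as mat[0][0], addressed in one indexing step

print(mat[0, 0])         # value stored in the array
print(my_mat[0][0])      # the original list is not affected
```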

In [26]:
my_tuple = [(1.5,2,3), (4,5,6)]  # a list of tuples
mat_tuple = np.array(my_tuple)
print (mat_tuple)

[[1.5 2.  3. ]
 [4.  5.  6. ]]


In [27]:
mat_tuple

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [28]:
q = (1,2,3,5)

In [29]:
qq = np.array(q)
qq[0] = 55
qq

array([55,  2,  3,  5])

## Dimension, shape, size, and data type of the 2D array

In [30]:
print("Dimension of this matrix: ",mat.ndim)

print("Size of this matrix: ", mat.size) 

print("Shape of this matrix: ", mat.shape)

print("Data type of this matrix: ", mat.dtype)

Dimension of this matrix:  2
Size of this matrix:  9
Shape of this matrix:  (3, 3)
Data type of this matrix:  int32


## Zeros, Ones, Random, and Identity Matrices and Vectors

In [31]:
print("Vector of zeros: ",np.zeros(5))

print("Matrix of zeros: \n",np.zeros((3,4)))

Vector of zeros:  [0. 0. 0. 0. 0.]
Matrix of zeros: 
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [32]:
print("Vector of ones: ",np.ones(4))
print("\n" + "Matrix of ones: \n",np.ones((4,2)))

Vector of ones:  [1. 1. 1. 1.]

Matrix of ones: 
 [[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]


In [33]:
print("Matrix of 5’s: \n",5*np.ones((3,3)))

print("Identity matrix of dimension 2:",np.eye(2))
print("Identity matrix of dimension 4:",np.eye(4))

Matrix of 5’s: 
 [[5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]]
Identity matrix of dimension 2: [[1. 0.]
 [0. 1.]]
Identity matrix of dimension 4: [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [34]:
print("Random matrix of shape (4,3):\n",np.random.randint(low=1,high=10,size=(4,3)))

Random matrix of shape (4,3):
 [[1 6 4]
 [3 5 9]
 [6 8 1]
 [4 4 9]]


In [35]:
print("Random matrix of shape (4,3):\n",np.random.randint(1,10,(4,3)))

Random matrix of shape (4,3):
 [[4 6 9]
 [8 7 1]
 [5 3 4]
 [3 8 4]]
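The random matrices above change on every run. Below is a minimal sketch of reproducible draws, assuming the newer Generator interface (`np.random.default_rng`) is acceptable in place of `np.random.randint`. The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators built from the same seed produce the same sequence of numbers.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.integers(low=1, high=10, size=(4, 3))   # high is exclusive
sample_b = rng_b.integers(low=1, high=10, size=(4, 3))

print(sample_a)
print("Identical draws:", np.array_equal(sample_a, sample_b))
```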


## Reshaping, Ravel, Min, Max, Sorting

In [36]:
a = np.random.randint(1,100,30)

b = a.reshape(2,3,5)

c = a.reshape(6,5)

print ("Shape of a:", a.shape)
print ("Shape of b:", b.shape)
print ("Shape of c:", c.shape)

Shape of a: (30,)
Shape of b: (2, 3, 5)
Shape of c: (6, 5)


In [37]:
a

array([23, 60,  6, 30, 34, 46, 79, 37, 37, 44, 95, 13, 84, 31,  7, 37, 82,
       45, 55,  4, 25, 39,  3, 91, 95, 33, 96, 78, 76, 52])

In [38]:
d = a.reshape(3,-1)
print ("Shape of d:", d.shape)

Shape of d: (3, 10)


In [39]:
d

array([[23, 60,  6, 30, 34, 46, 79, 37, 37, 44],
       [95, 13, 84, 31,  7, 37, 82, 45, 55,  4],
       [25, 39,  3, 91, 95, 33, 96, 78, 76, 52]])

In [40]:
print("\na looks like:\n",a)
print("\nb looks like:\n",b)
print("\nc looks like:\n",c)
print("\nd looks like:\n",d)


a looks like:
 [23 60  6 30 34 46 79 37 37 44 95 13 84 31  7 37 82 45 55  4 25 39  3 91
 95 33 96 78 76 52]

b looks like:
 [[[23 60  6 30 34]
  [46 79 37 37 44]
  [95 13 84 31  7]]

 [[37 82 45 55  4]
  [25 39  3 91 95]
  [33 96 78 76 52]]]

c looks like:
 [[23 60  6 30 34]
 [46 79 37 37 44]
 [95 13 84 31  7]
 [37 82 45 55  4]
 [25 39  3 91 95]
 [33 96 78 76 52]]

d looks like:
 [[23 60  6 30 34 46 79 37 37 44]
 [95 13 84 31  7 37 82 45 55  4]
 [25 39  3 91 95 33 96 78 76 52]]


In [41]:
e = a.reshape(1,-1)
e

array([[23, 60,  6, 30, 34, 46, 79, 37, 37, 44, 95, 13, 84, 31,  7, 37,
        82, 45, 55,  4, 25, 39,  3, 91, 95, 33, 96, 78, 76, 52]])

In [42]:
b_flat = b.ravel()
print(b_flat)

[23 60  6 30 34 46 79 37 37 44 95 13 84 31  7 37 82 45 55  4 25 39  3 91
 95 33 96 78 76 52]
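The cells above cover reshaping and `ravel`. For the min, max, and sorting part of this section, here is a small sketch on a fixed array (values chosen only so the results are easy to check by eye):

```python
import numpy as np

values = np.array([23, 60, 6, 30, 34, 46])
grid = values.reshape(2, 3)

print("Min:", values.min(), "at index", values.argmin())
print("Max:", values.max(), "at index", values.argmax())
print("Column-wise max:", grid.max(axis=0))   # one value per column
print("Row-wise min:", grid.min(axis=1))      # one value per row

sorted_copy = np.sort(values)                 # returns a new, sorted array
print("Sorted copy:", sorted_copy)
print("Original unchanged:", values)
```

`np.sort` leaves its input untouched, whereas the method form `values.sort()` sorts the array in place.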


## Indexing and slicing

In [44]:
arr = np.arange(0,11)

print("Array:",arr)
print("\nElement at 7th index is:", arr[7])
print("\nElements from 3rd to 5th index are:", arr[3:6])
print("\nElements before index 4 are:", arr[:4])

Array: [ 0  1  2  3  4  5  6  7  8  9 10]

Element at 7th index is: 7

Elements from 3rd to 5th index are: [3 4 5]

Elements before index 4 are: [0 1 2 3]


In [45]:
arr[1:9:4]
# start:stop:step (stop is exclusive)

array([1, 5])


In [46]:
print("Elements from last backwards are:", arr[-1::-1])


print("\n3 Elements from last backwards are:", arr[-1:-6:-2])

Elements from last backwards are: [10  9  8  7  6  5  4  3  2  1  0]

3 Elements from last backwards are: [10  8  6]


In [47]:
arr2 = np.arange(0,21,2)
print("New array:",arr2)
print("\nElements at 2nd, 4th, and 9th index are:", arr2[[2,4,9]]) # Pass a list as an index to subset

New array: [ 0  2  4  6  8 10 12 14 16 18 20]

Elements at 2nd, 4th, and 9th index are: [ 4  8 18]
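The same slicing syntax extends to 2-D arrays by giving one index or slice per axis, separated by a comma. A minimal sketch on a small 3x3 array (the names are illustrative):

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)   # rows: [1 2 3], [4 5 6], [7 8 9]

second_row = grid[1, :]       # all columns of row 1
second_col = grid[:, 1]       # all rows of column 1
top_right = grid[0:2, 1:3]    # rows 0-1, columns 1-2
corner = grid[2, 0]           # single element: row 2, column 0

print(second_row)
print(second_col)
print(top_right)
print(corner)
```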


In [48]:
type(mat)

numpy.ndarray

## Conditional subsetting

In [49]:
mat = np.random.randint(10,100,15).reshape(3,5)

print("Matrix of random 2-digit numbers\n",mat)
print ("\nElements greater than 50\n", mat[mat>50])

Matrix of random 2-digit numbers
 [[26 63 84 73 34]
 [11 95 49 13 62]
 [77 55 83 28 23]]

Elements greater than 50
 [63 84 73 95 62 77 55 83]


In [50]:
mat>50

array([[False,  True,  True,  True, False],
       [False,  True, False, False,  True],
       [ True,  True,  True, False, False]])

In [51]:
mat*(mat>50)

array([[ 0, 63, 84, 73,  0],
       [ 0, 95,  0,  0, 62],
       [77, 55, 83,  0,  0]])
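Boolean masks can be combined, and `np.where` gives the shape-preserving version of the `mat*(mat>50)` trick. A short sketch on a fixed matrix, so the result does not depend on random values:

```python
import numpy as np

mat = np.array([[26, 63, 84],
                [11, 95, 49]])

# Combine conditions element-wise with & (and) or | (or); the parentheses are required.
in_band = mat[(mat > 30) & (mat < 90)]

# np.where keeps the original shape: the value where the condition holds, otherwise 0.
masked = np.where(mat > 50, mat, 0)

print(in_band)
print(masked)
```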

## Array operations (array-array, array-scalar, universal functions)

In [52]:
mat1 = np.random.randint(1,10,9).reshape(3,3)
mat2 = np.random.randint(1,10,9).reshape(3,3)

print("\n1st Matrix of random single-digit numbers\n",mat1)
print("\n2nd Matrix of random single-digit numbers\n",mat2)

print("\nAddition\n", mat1+mat2)
print("\nElement-wise multiplication\n", mat1*mat2)
print("\nDivision\n", mat1/mat2)
print("\nLinear combination: 3*A - 2*B\n", 3*mat1-2*mat2)

print("\nAddition of a scalar (100)\n", 100+mat1)

print("\nExponentiation, matrix cubed here\n", mat1**3)
print("\nExponentiation, sq-root using pow function\n",pow(mat1,0.5))


1st Matrix of random single-digit numbers
 [[1 7 8]
 [5 6 7]
 [6 2 6]]

2nd Matrix of random single-digit numbers
 [[9 8 2]
 [6 1 7]
 [2 9 9]]

Addition
 [[10 15 10]
 [11  7 14]
 [ 8 11 15]]

Element-wise multiplication
 [[ 9 56 16]
 [30  6 49]
 [12 18 54]]

Division
 [[0.11111111 0.875      4.        ]
 [0.83333333 6.         1.        ]
 [3.         0.22222222 0.66666667]]

Linear combination: 3*A - 2*B
 [[-15   5  20]
 [  3  16   7]
 [ 14 -12   0]]

Addition of a scalar (100)
 [[101 107 108]
 [105 106 107]
 [106 102 106]]

Exponentiation, matrix cubed here
 [[  1 343 512]
 [125 216 343]
 [216   8 216]]

Exponentiation, sq-root using pow function
 [[1.         2.64575131 2.82842712]
 [2.23606798 2.44948974 2.64575131]
 [2.44948974 1.41421356 2.44948974]]


In [54]:
mat1

array([[1, 7, 8],
       [5, 6, 7],
       [6, 2, 6]])

In [55]:
mat2

array([[9, 8, 2],
       [6, 1, 7],
       [2, 9, 9]])

In [66]:
# First row of mat1 times first column of mat2: the top-left entry of the matrix product
1*9 + 7*6 + 8*2

67

In [57]:
print('\ndot product\n', np.dot(mat1,mat2))


dot product
 [[ 67  87 123]
 [ 95 109 115]
 [ 78 104  80]]
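For 2-D arrays the `@` operator computes the same matrix product as `np.dot`, which is different from the element-wise `*`. The sketch below rebuilds the two matrices printed above and checks the top-left entry against the hand calculation:

```python
import numpy as np

mat1 = np.array([[1, 7, 8], [5, 6, 7], [6, 2, 6]])
mat2 = np.array([[9, 8, 2], [6, 1, 7], [2, 9, 9]])

product = mat1 @ mat2                # matrix product, same as np.dot(mat1, mat2) here
top_left = mat1[0, :] @ mat2[:, 0]   # first row of mat1 dotted with first column of mat2

print(product)
print("Top-left entry:", top_left)
print("Equal to element-wise product?", np.array_equal(product, mat1 * mat2))
```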


## Pandas series

In [61]:
import pandas as pd

In [56]:
labels = ['a','b','c']
my_data = [10,20,30]
arr = np.array(my_data)
d = {'a':10,'b':20,'c':30}


print ("Labels:", labels)
print("My data:", my_data)
print("Dictionary:", d)

Labels: ['a', 'b', 'c']
My data: [10, 20, 30]
Dictionary: {'a': 10, 'b': 20, 'c': 30}


In [47]:
s1=pd.Series(data=my_data)
print(s1)

0    10
1    20
2    30
dtype: int64


In [48]:
s2=pd.Series(data=my_data, index=labels)
print(s2)

a    10
b    20
c    30
dtype: int64


In [49]:
s3=pd.Series(arr, labels)
print(s3)

a    10
b    20
c    30
dtype: int32


In [50]:
s4=pd.Series(d)
print(s4)

a    10
b    20
c    30
dtype: int64
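What the index labels buy you becomes visible when a Series is looked up or combined with another one: access is by label, and arithmetic aligns on labels rather than positions. A minimal sketch with two illustrative Series:

```python
import pandas as pd

s_left = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s_right = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

print(s_left['b'])   # label-based lookup

# Addition aligns on labels; a label missing on either side gives NaN in the result.
total = s_left + s_right
print(total)
```

Because NaN is a floating-point value, the result is a float Series even though both inputs hold integers.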


## Pandas DataFrame

In [58]:
matrix_data = np.random.randint(1,20,size=20).reshape(5,4)
row_labels = ['A','B','C','D','E']
column_headings = ['W','X','Y','Z']

df = pd.DataFrame(data=matrix_data, index=row_labels, columns=column_headings)
print("\nThe data frame looks like\n",'-'*45)
print(df)


The data frame looks like
 ---------------------------------------------
    W   X   Y   Z
A  12  13  17  12
B   2   2  15   3
C   5  11  17   1
D  13   9  17   7
E  19  13  15   1


In [13]:
d={'a':[10,20],'b':[30,40],'c':[50,60]}
df2=pd.DataFrame(data=d,index=['X','Y'])
print(df2)

    a   b   c
X  10  30  50
Y  20  40  60


## DataFrame can be created reading directly from a CSV or an Excel file

In [62]:
df3 = pd.read_csv("country-data.csv")

In [63]:
df3

Unnamed: 0,country,child_mort,exports,health,income,inflation
0,Afghanistan,90.2,10.0,7.58,1610,9.44
1,Albania,16.6,28.0,6.55,9930,4.49
2,Algeria,27.3,38.4,4.17,12900,16.10
3,Angola,119.0,62.3,2.85,5900,22.40
4,Antigua and Barbuda,10.3,45.5,6.03,19100,1.44
...,...,...,...,...,...,...
140,Uzbekistan,36.3,31.7,5.81,4240,16.50
141,Vanuatu,29.2,46.6,5.25,2950,2.62
142,Vietnam,23.3,72.0,6.84,4490,12.10
143,Yemen,56.3,30.0,5.18,4480,23.60


In [65]:
df3.country

0              Afghanistan
1                  Albania
2                  Algeria
3                   Angola
4      Antigua and Barbuda
              ...         
140             Uzbekistan
141                Vanuatu
142                Vietnam
143                  Yemen
144                 Zambia
Name: country, Length: 145, dtype: object

In [62]:
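# Excel files can be read in a similar way with pd.read_excel:
# df4 = pd.read_excel("./Data/Height_Weight.xlsx")

# Relative path: the file is looked up in the current working directory
df4 = pd.read_csv("Glass.csv")

# Absolute path: the raw string (r'...') keeps the Windows backslashes literal
df20 = pd.read_csv(r'C:\Users\ip3\Desktop\ML402_Datasets\Glass.csv')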

In [69]:
df4

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1
...,...,...,...,...,...,...,...,...,...,...
209,1.51623,14.14,0.00,2.88,72.61,0.08,9.18,1.06,0.0,7
210,1.51685,14.92,0.00,1.99,73.06,0.00,8.40,1.59,0.0,7
211,1.52065,14.36,0.00,2.02,73.42,0.00,8.44,1.64,0.0,7
212,1.51651,14.38,0.00,1.94,73.61,0.00,8.48,1.57,0.0,7
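`pd.read_csv` also accepts a file-like object, which makes it possible to try it without the data files. The sketch below uses an in-memory stand-in for a CSV file with three rows copied from the country table above. The `index_col` argument, which turns a column into the row labels, is an optional extra that the cells above do not use.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file on disk
csv_text = """country,child_mort,exports,health,income,inflation
Afghanistan,90.2,10.0,7.58,1610,9.44
Albania,16.6,28.0,6.55,9930,4.49
Algeria,27.3,38.4,4.17,12900,16.10
"""

sample = pd.read_csv(io.StringIO(csv_text))                            # default 0..n-1 index
by_country = pd.read_csv(io.StringIO(csv_text), index_col="country")   # country names as row labels

print(sample.dtypes)
print(by_country.loc["Albania", "income"])
```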


## Quickly inspecting DataFrames
* `.head()`
* `.tail()`
* `.sample()`
* `.info()`
* `.describe()`

In [65]:
df3.head()

Unnamed: 0,country,child_mort,exports,health,income,inflation
0,Afghanistan,90.2,10.0,7.58,1610,9.44
1,Albania,16.6,28.0,6.55,9930,4.49
2,Algeria,27.3,38.4,4.17,12900,16.1
3,Angola,119.0,62.3,2.85,5900,22.4
4,Antigua and Barbuda,10.3,45.5,6.03,19100,1.44


In [66]:
df3.head(3)

Unnamed: 0,country,child_mort,exports,health,income,inflation
0,Afghanistan,90.2,10.0,7.58,1610,9.44
1,Albania,16.6,28.0,6.55,9930,4.49
2,Algeria,27.3,38.4,4.17,12900,16.1


In [73]:
df3.tail(7)

Unnamed: 0,country,child_mort,exports,health,income,inflation
138,United Kingdom,5.2,28.2,9.64,36200,1.57
139,Uruguay,10.6,26.3,8.35,17100,4.91
140,Uzbekistan,36.3,31.7,5.81,4240,16.5
141,Vanuatu,29.2,46.6,5.25,2950,2.62
142,Vietnam,23.3,72.0,6.84,4490,12.1
143,Yemen,56.3,30.0,5.18,4480,23.6
144,Zambia,83.1,37.0,5.89,3280,14.0


In [69]:
df3.sample(5)

Unnamed: 0,country,child_mort,exports,health,income,inflation
64,India,58.8,22.6,4.05,4410,8.98
52,Gambia,80.3,23.8,5.69,1660,4.3
105,Paraguay,24.1,55.1,5.87,7290,6.1
30,Chile,8.7,37.7,7.96,19400,8.96
88,Maldives,13.2,77.6,6.33,10500,2.88


In [70]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145 entries, 0 to 144
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     145 non-null    object 
 1   child_mort  145 non-null    float64
 2   exports     145 non-null    float64
 3   health      145 non-null    float64
 4   income      145 non-null    int64  
 5   inflation   145 non-null    float64
dtypes: float64(4), int64(1), object(1)
memory usage: 6.9+ KB


In [38]:
df4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   RI      214 non-null    float64
 1   Na      214 non-null    float64
 2   Mg      214 non-null    float64
 3   Al      214 non-null    float64
 4   Si      214 non-null    float64
 5   K       214 non-null    float64
 6   Ca      214 non-null    float64
 7   Ba      214 non-null    float64
 8   Fe      214 non-null    float64
 9   Target  214 non-null    int64  
dtypes: float64(9), int64(1)
memory usage: 16.8 KB


In [74]:
df3.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
child_mort,145.0,36.223448,34.789344,2.6,8.7,19.8,58.8,137.0
exports,145.0,37.637924,18.598185,0.109,23.8,34.8,50.5,86.9
health,145.0,6.784069,2.466066,1.97,5.07,6.33,8.41,11.9
income,145.0,13969.482759,12790.798819,609.0,3370.0,9920.0,20100.0,45500.0
inflation,145.0,6.685345,6.073896,-1.9,1.77,4.91,9.81,23.6


In [72]:
df3.describe()

Unnamed: 0,child_mort,exports,health,income,inflation
count,145.0,145.0,145.0,145.0,145.0
mean,36.223448,37.637924,6.784069,13969.482759,6.685345
std,34.789344,18.598185,2.466066,12790.798819,6.073896
min,2.6,0.109,1.97,609.0,-1.9
25%,8.7,23.8,5.07,3370.0,1.77
50%,19.8,34.8,6.33,9920.0,4.91
75%,58.8,50.5,8.41,20100.0,9.81
max,137.0,86.9,11.9,45500.0,23.6


## Basic descriptive statistics on a DataFrame
* `mean()`
* `std()`
* `var()`
* `min()` and `max()`

In [75]:
df3.mean(numeric_only=True)

child_mort       36.223448
exports          37.637924
health            6.784069
income        13969.482759
inflation         6.685345
dtype: float64

In [76]:
df3.std(numeric_only=True)

child_mort       34.789344
exports          18.598185
health            2.466066
income        12790.798819
inflation         6.073896
dtype: float64

In [77]:
df3.var(numeric_only=True)

child_mort    1.210298e+03
exports       3.458925e+02
health        6.081480e+00
income        1.636045e+08
inflation     3.689221e+01
dtype: float64

In [78]:
df3.min()

country       Afghanistan
child_mort            2.6
exports             0.109
health               1.97
income                609
inflation            -1.9
dtype: object
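When a DataFrame mixes text and numeric columns, as the country data does, numeric reductions need to be told to skip the text column; newer pandas releases may otherwise raise an error instead of dropping it silently. The sketch below shows two ways of doing that, plus the `axis` argument for row-wise results, on a three-row frame built from values shown above:

```python
import pandas as pd

frame = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "child_mort": [90.2, 16.6, 27.3],
    "health": [7.58, 6.55, 4.17],
})

col_means = frame.mean(numeric_only=True)   # skips the text column
picked = frame[["child_mort", "health"]]    # or select the numeric columns explicitly
row_max = picked.max(axis=1)                # axis=1: one value per row instead of per column

print(col_means)
print(row_max)
```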

## Indexing, slicing columns and rows of a DataFrame

In [79]:
df4

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1
...,...,...,...,...,...,...,...,...,...,...
209,1.51623,14.14,0.00,2.88,72.61,0.08,9.18,1.06,0.0,7
210,1.51685,14.92,0.00,1.99,73.06,0.00,8.40,1.59,0.0,7
211,1.52065,14.36,0.00,2.02,73.42,0.00,8.44,1.64,0.0,7
212,1.51651,14.38,0.00,1.94,73.61,0.00,8.48,1.57,0.0,7


In [80]:
print("\nThe 'RI' column\n",'-'*25)
print(df4['RI'])

print("\nType of the column: ", type(df4['RI']))
print("\nThe 'RI' and 'Na' columns indexed by passing a list\n",'-'*55)
print(df4[['RI','Na']])
print("\nType of the pair of columns: ", type(df4[['RI','Na']]))


The 'RI' column
 -------------------------
0      1.52101
1      1.51761
2      1.51618
3      1.51766
4      1.51742
        ...   
209    1.51623
210    1.51685
211    1.52065
212    1.51651
213    1.51711
Name: RI, Length: 214, dtype: float64

Type of the column:  <class 'pandas.core.series.Series'>

The 'RI' and 'Na' columns indexed by passing a list
 -------------------------------------------------------
          RI     Na
0    1.52101  13.64
1    1.51761  13.89
2    1.51618  13.53
3    1.51766  13.21
4    1.51742  13.27
..       ...    ...
209  1.51623  14.14
210  1.51685  14.92
211  1.52065  14.36
212  1.51651  14.38
213  1.51711  14.23

[214 rows x 2 columns]

Type of the pair of columns:  <class 'pandas.core.frame.DataFrame'>


In [81]:
df

Unnamed: 0,W,X,Y,Z
A,12,13,17,12
B,2,2,15,3
C,5,11,17,1
D,13,9,17,7
E,19,13,15,1


In [82]:
print("\nLabel-based 'loc' method can be used for selecting row(s)\n",'-'*60)

print("\nSingle row\n")
print(df.loc['C'])

print("\nMultiple rows\n")
print(df.loc[['B','C']])

print("\nIndex position based 'iloc' method can be used for selecting row(s)\n",'-'*70)

print("\nSingle row\n")
print(df.iloc[2])

print("\nMultiple rows\n")
print(df.iloc[[1,2]])


Label-based 'loc' method can be used for selecting row(s)
 ------------------------------------------------------------

Single row

W     5
X    11
Y    17
Z     1
Name: C, dtype: int32

Multiple rows

   W   X   Y  Z
B  2   2  15  3
C  5  11  17  1

Index position based 'iloc' method can be used for selecting row(s)
 ----------------------------------------------------------------------

Single row

W     5
X    11
Y    17
Z     1
Name: C, dtype: int32

Multiple rows

   W   X   Y  Z
B  2   2  15  3
C  5  11  17  1


In [86]:
print(df.iloc[[2,3]])


    W   X   Y  Z
C   5  11  17  1
D  13   9  17  7
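Both `loc` and `iloc` also accept a second entry for the columns, so rows and columns can be selected in one step. The sketch below rebuilds the small frame from the values printed above. Note that label slices include the end label, while position slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame(
    [[12, 13, 17, 12], [2, 2, 15, 3], [5, 11, 17, 1], [13, 9, 17, 7], [19, 13, 15, 1]],
    index=['A', 'B', 'C', 'D', 'E'],
    columns=['W', 'X', 'Y', 'Z'],
)

cell_by_label = df.loc['C', 'Y']          # row label, column label
cell_by_position = df.iloc[2, 2]          # row position, column position
block = df.loc[['B', 'C'], ['W', 'Z']]    # sub-frame selected by labels
label_slice = df.loc['B':'D', 'X']        # rows B, C and D (end label included)
position_slice = df.iloc[1:3, 0]          # rows at positions 1 and 2 (end excluded)

print(cell_by_label, cell_by_position)
print(block)
```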


## Conditional subsetting

In [88]:
df4

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1
...,...,...,...,...,...,...,...,...,...,...
209,1.51623,14.14,0.00,2.88,72.61,0.08,9.18,1.06,0.0,7
210,1.51685,14.92,0.00,1.99,73.06,0.00,8.40,1.59,0.0,7
211,1.52065,14.36,0.00,2.02,73.42,0.00,8.44,1.64,0.0,7
212,1.51651,14.38,0.00,1.94,73.61,0.00,8.48,1.57,0.0,7


In [90]:
df4['Mg']>1.8

0       True
1       True
2       True
3       True
4       True
       ...  
209    False
210    False
211    False
212    False
213    False
Name: Mg, Length: 214, dtype: bool

In [91]:
df4[df4['Mg']>1.8]

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1
...,...,...,...,...,...,...,...,...,...,...
185,1.51131,13.69,3.20,1.81,72.81,1.76,5.43,1.19,0.0,7
186,1.51838,14.32,3.26,2.22,71.25,1.46,5.79,1.63,0.0,7
187,1.52315,13.44,3.34,1.23,72.38,0.60,8.83,0.00,0.0,7
188,1.52247,14.86,2.20,2.06,70.26,0.76,9.76,0.00,0.0,7


Which samples have **Mg greater than 1.8 and Na less than 14**?

In [92]:
df4[(df4['Mg']>1.8) & (df4['Na']<14)]

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1
...,...,...,...,...,...,...,...,...,...,...
164,1.51915,12.73,1.85,1.86,72.69,0.60,10.09,0.00,0.0,5
165,1.52171,11.56,1.88,1.56,72.86,0.47,11.41,0.00,0.0,5
177,1.51937,13.79,2.41,1.19,72.76,0.00,9.77,0.00,0.0,6
185,1.51131,13.69,3.20,1.81,72.81,1.76,5.43,1.19,0.0,7
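The `&` filter above has counterparts for "or" (`|`) and "not" (`~`), and the same condition can be written as a string with `DataFrame.query`. A minimal sketch on four rows taken from the glass table (only three of its columns are kept here):

```python
import pandas as pd

glass = pd.DataFrame({
    "Na": [13.64, 13.89, 14.14, 14.92],
    "Mg": [4.49, 3.60, 0.00, 0.00],
    "Target": [1, 1, 7, 7],
})

both = glass[(glass["Mg"] > 1.8) & (glass["Na"] < 14)]       # AND
either = glass[(glass["Mg"] > 1.8) | (glass["Na"] > 14.5)]   # OR
negated = glass[~(glass["Mg"] > 1.8)]                        # NOT
same_as_both = glass.query("Mg > 1.8 and Na < 14")           # string form of the AND filter

print(both)
print(either)
print(negated)
```

Each condition needs its own parentheses because `&`, `|` and `~` bind more tightly than the comparison operators.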


## Operations on specific columns/rows

In [93]:
df3.head()

Unnamed: 0,country,child_mort,exports,health,income,inflation
0,Afghanistan,90.2,10.0,7.58,1610,9.44
1,Albania,16.6,28.0,6.55,9930,4.49
2,Algeria,27.3,38.4,4.17,12900,16.1
3,Angola,119.0,62.3,2.85,5900,22.4
4,Antigua and Barbuda,10.3,45.5,6.03,19100,1.44


#### What are the standard deviations of child_mort and health in the Country dataset?

In [94]:
df3[['child_mort','health']].std()

child_mort    34.789344
health         2.466066
dtype: float64

#### What is the range of child_mort in the Country dataset?

In [99]:
range_child_mort = df3['child_mort'].max()- df3['child_mort'].min()
print("The range of child_mort is: ", round(range_child_mort,3))

The range of child_mort is:  134.4


#### Which countries are in the top 5 percent by income?

In [95]:
np.percentile(df3['income'],95)

41100.0

In [96]:
df3[df3['income'] >= 41100.0]

Unnamed: 0,country,child_mort,exports,health,income,inflation
7,Australia,4.8,19.8,8.73,41400,1.16
8,Austria,4.3,51.3,11.0,43200,0.873
11,Bahrain,8.6,69.5,4.97,41100,7.44
15,Belgium,4.5,76.4,10.7,41100,1.88
41,Denmark,4.1,50.5,11.4,44000,3.22
99,Netherlands,4.5,72.0,11.9,45500,0.848
102,Oman,11.7,65.7,2.77,45300,15.6
114,Saudi Arabia,15.7,49.6,4.29,45400,17.2
127,Sweden,3.0,46.2,9.63,42900,0.991


In [102]:
df3[df3['income']>=41100.0][['child_mort','exports','health']].mean()

child_mort     6.800000
exports       55.666667
health         8.376667
dtype: float64
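The cutoff does not have to be copied by hand into the filter. A sketch of the same idea with the threshold kept in a variable, using pandas' own `quantile` (which takes a fraction between 0 and 1 rather than a percentage); the five income values are illustrative:

```python
import numpy as np
import pandas as pd

income = pd.Series([1610, 9930, 12900, 5900, 19100], name="income")

cutoff_np = np.percentile(income, 95)   # percentile on a 0-100 scale
cutoff_pd = income.quantile(0.95)       # same quantity on a 0-1 scale

top = income[income >= cutoff_pd]       # rows at or above the cutoff

print(cutoff_np, cutoff_pd)
print(top)
```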

## Create a new column as a function of mathematical operations on existing columns

In [97]:
df4

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1
...,...,...,...,...,...,...,...,...,...,...
209,1.51623,14.14,0.00,2.88,72.61,0.08,9.18,1.06,0.0,7
210,1.51685,14.92,0.00,1.99,73.06,0.00,8.40,1.59,0.0,7
211,1.52065,14.36,0.00,2.02,73.42,0.00,8.44,1.64,0.0,7
212,1.51651,14.38,0.00,1.94,73.61,0.00,8.48,1.57,0.0,7


In [98]:
#df4['Sum of RI and Ca'] = df4['RI']+(df4['Ca']/100)**2

df4['Sum of RI and Ca'] = df4['RI'] + df4['Ca']

df4

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target,Sum of RI and Ca
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1,10.27101
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1,9.34761
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1,9.29618
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1,9.73766
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1,9.58742
...,...,...,...,...,...,...,...,...,...,...,...
209,1.51623,14.14,0.00,2.88,72.61,0.08,9.18,1.06,0.0,7,10.69623
210,1.51685,14.92,0.00,1.99,73.06,0.00,8.40,1.59,0.0,7,9.91685
211,1.52065,14.36,0.00,2.02,73.42,0.00,8.44,1.64,0.0,7,9.96065
212,1.51651,14.38,0.00,1.94,73.61,0.00,8.48,1.57,0.0,7,9.99651


In [102]:
df4.sort_values(by='RI')

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target,Sum of RI and Ca
184,1.51115,17.38,0.00,0.34,75.41,0.00,6.65,0.00,0.00,6,8.16115
185,1.51131,13.69,3.20,1.81,72.81,1.76,5.43,1.19,0.00,7,6.94131
56,1.51215,12.99,3.47,1.12,72.98,0.62,8.35,0.00,0.31,1,9.86215
180,1.51299,14.40,1.74,1.54,74.55,0.00,7.59,0.00,0.00,6,9.10299
171,1.51316,13.02,0.00,3.04,70.48,6.21,6.96,0.00,0.00,5,8.47316
...,...,...,...,...,...,...,...,...,...,...,...
103,1.52725,13.80,3.15,0.66,70.57,0.08,11.64,0.00,0.00,2,13.16725
111,1.52739,11.02,0.00,0.75,73.08,0.00,14.96,0.00,0.00,2,16.48739
112,1.52777,12.64,0.00,0.67,72.02,0.06,14.40,0.00,0.00,2,15.92777
106,1.53125,10.73,0.00,2.10,69.81,0.58,13.30,3.15,0.28,2,14.83125


## Use `inplace=True` to apply the changes to the original DataFrame

In [103]:
df4

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target,Sum of RI and Ca
0,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.0,1,10.27101
1,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.0,1,9.34761
2,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.0,1,9.29618
3,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.0,1,9.73766
4,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.0,1,9.58742
...,...,...,...,...,...,...,...,...,...,...,...
209,1.51623,14.14,0.00,2.88,72.61,0.08,9.18,1.06,0.0,7,10.69623
210,1.51685,14.92,0.00,1.99,73.06,0.00,8.40,1.59,0.0,7,9.91685
211,1.52065,14.36,0.00,2.02,73.42,0.00,8.44,1.64,0.0,7,9.96065
212,1.51651,14.38,0.00,1.94,73.61,0.00,8.48,1.57,0.0,7,9.99651


In [104]:
df4.sort_values(by='RI',inplace=True)

In [105]:
df4

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target,Sum of RI and Ca
184,1.51115,17.38,0.00,0.34,75.41,0.00,6.65,0.00,0.00,6,8.16115
185,1.51131,13.69,3.20,1.81,72.81,1.76,5.43,1.19,0.00,7,6.94131
56,1.51215,12.99,3.47,1.12,72.98,0.62,8.35,0.00,0.31,1,9.86215
180,1.51299,14.40,1.74,1.54,74.55,0.00,7.59,0.00,0.00,6,9.10299
171,1.51316,13.02,0.00,3.04,70.48,6.21,6.96,0.00,0.00,5,8.47316
...,...,...,...,...,...,...,...,...,...,...,...
103,1.52725,13.80,3.15,0.66,70.57,0.08,11.64,0.00,0.00,2,13.16725
111,1.52739,11.02,0.00,0.75,73.08,0.00,14.96,0.00,0.00,2,16.48739
112,1.52777,12.64,0.00,0.67,72.02,0.06,14.40,0.00,0.00,2,15.92777
106,1.53125,10.73,0.00,2.10,69.81,0.58,13.30,3.15,0.28,2,14.83125


In [109]:
df4.drop('RI', axis=1)

Unnamed: 0,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target,Sum of RI and Ca
184,17.38,0.00,0.34,75.41,0.00,6.65,0.00,0.00,6,8.16115
185,13.69,3.20,1.81,72.81,1.76,5.43,1.19,0.00,7,6.94131
56,12.99,3.47,1.12,72.98,0.62,8.35,0.00,0.31,1,9.86215
180,14.40,1.74,1.54,74.55,0.00,7.59,0.00,0.00,6,9.10299
171,13.02,0.00,3.04,70.48,6.21,6.96,0.00,0.00,5,8.47316
...,...,...,...,...,...,...,...,...,...,...
103,13.80,3.15,0.66,70.57,0.08,11.64,0.00,0.00,2,13.16725
111,11.02,0.00,0.75,73.08,0.00,14.96,0.00,0.00,2,16.48739
112,12.64,0.00,0.67,72.02,0.06,14.40,0.00,0.00,2,15.92777
106,10.73,0.00,2.10,69.81,0.58,13.30,3.15,0.28,2,14.83125
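Without `inplace=True`, methods such as `drop` and `sort_values` return a new DataFrame and leave the original as it was. Reassigning the result is a common alternative to `inplace=True`. A small sketch on three rows (the `reset_index` step is optional and not part of the cells above):

```python
import pandas as pd

frame = pd.DataFrame({"RI": [1.52101, 1.51761, 1.51618],
                      "Na": [13.64, 13.89, 13.53]})

without_ri = frame.drop("RI", axis=1)   # returns a new DataFrame
print(list(frame.columns))              # the original still has 'RI'

frame = frame.sort_values(by="RI")      # reassigning keeps the sorted result
frame = frame.reset_index(drop=True)    # optional: renumber the rows 0..n-1
print(frame)
```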


In [110]:
df4.corr()

Unnamed: 0,RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Target,Sum of RI and Ca
RI,1.0,-0.191885,-0.122274,-0.407326,-0.542052,-0.289833,0.810403,-0.000386,0.14301,-0.164237,0.811133
Na,-0.191885,1.0,-0.273732,0.156794,-0.069809,-0.266087,-0.275442,0.326603,-0.241346,0.502898,-0.275376
Mg,-0.122274,-0.273732,1.0,-0.481799,-0.165927,0.005396,-0.44375,-0.492262,0.08306,-0.744993,-0.443244
Al,-0.407326,0.156794,-0.481799,1.0,-0.005524,0.325958,-0.259592,0.479404,-0.074402,0.598829,-0.260011
Si,-0.542052,-0.069809,-0.165927,-0.005524,1.0,-0.193331,-0.208732,-0.102151,-0.094201,0.151565,-0.209526
K,-0.289833,-0.266087,0.005396,0.325958,-0.193331,1.0,-0.317836,-0.042618,-0.007719,-0.010054,-0.317905
Ca,0.810403,-0.275442,-0.44375,-0.259592,-0.208732,-0.317836,1.0,-0.112841,0.124968,0.000952,0.999999
Ba,-0.000386,0.326603,-0.492262,0.479404,-0.102151,-0.042618,-0.112841,1.0,-0.058692,0.575161,-0.112647
Fe,0.14301,-0.241346,0.08306,-0.074402,-0.094201,-0.007719,0.124968,-0.058692,1.0,-0.188278,0.125057
Target,-0.164237,0.502898,-0.744993,0.598829,0.151565,-0.010054,0.000952,0.575161,-0.188278,1.0,0.000601
