# Basics of Numpy and Pandas
---

This notebook covers the basics of the two most important Python libraries for data analytics and statistical modeling: `Numpy` and `Pandas`.

### Numpy

---

* Numpy array - from list, special functions
* Array operations
* 2-D arrays
* Indexing and slicing
* Conditional subsetting
* Array-array operations

### Pandas

---

* Pandas series
* DataFrame - creation, read from files
* Quick checking DataFrame
* Descriptive stats on DataFrame
* Indexing, slicing, conditional subsetting
* Operations on specific rows/columns

## Numpy array from a Python list
Numpy arrays behave like **true numerical vectors**, not ordinary lists: arithmetic on them works element by element. That is why they underpin most numerical computing and machine learning code in Python, and serve as the basis of the Pandas DataFrame for data analytics.

In [1]:
import numpy as np
lst1=[1,2,3]
array1 = np.array(lst1)

In [2]:
type(lst1)

list

In [3]:
type(array1)

numpy.ndarray

In [4]:
lst2=[10,11,12]
array2 = np.array(lst2)

In [5]:
print(f"Adding two lists {lst1} and {lst2} together: {lst1+lst2}")

Adding two lists [1, 2, 3] and [10, 11, 12] together: [1, 2, 3, 10, 11, 12]


In [6]:
print(f"Adding two numpy arrays {array1} and {array2} together: {array1+array2}")

Adding two numpy arrays [1 2 3] and [10 11 12] together: [11 13 15]
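The same contrast shows up with scalar multiplication. The following is a minimal sketch; the names `lst`, `arr`, `repeated`, and `scaled` are illustrative and not part of the cells above.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array(lst)

# Multiplying a list by an integer repeats the list
repeated = lst * 2
# Multiplying an array by a scalar scales every element
scaled = arr * 2

print(repeated)  # [1, 2, 3, 1, 2, 3]
print(scaled)    # [2 4 6]
```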


## Mathematical operations with/on Numpy arrays

In [7]:
print("array2 multiplied by array1: ",array1*array2)
print("array2 divided by array1: ",array2/array1)
print("array2 raised to the power of array1: ",array2**array1)

array2 multiplied by array1:  [10 22 36]
array2 divided by array1:  [10.   5.5  4. ]
array2 raised to the power of array1:  [  10  121 1728]


In [8]:
# sine function
print("Sine: ",np.sin(array1))
# logarithm
print("Natural logarithm: ",np.log(array1))
print("Base-10 logarithm: ",np.log10(array1))
print("Base-2 logarithm: ",np.log2(array1))
# Exponential
print("Exponential: ",np.exp(array1))

Sine:  [0.84147098 0.90929743 0.14112001]
Natural logarithm:  [0.         0.69314718 1.09861229]
Base-10 logarithm:  [0.         0.30103    0.47712125]
Base-2 logarithm:  [0.        1.        1.5849625]
Exponential:  [ 2.71828183  7.3890561  20.08553692]


## How to generate arrays easily?
* `np.zeros`
* `np.ones`
* `np.arange`
* `np.linspace`

In [9]:
print("A series of zeroes:",np.zeros(7))
print("A series of ones:",np.ones(9))
print("A series of numbers:",np.arange(5,16))
print("Numbers spaced apart by 2:",np.arange(0,11,2))
print("Numbers spaced apart by float:",np.arange(0,11,2.5))
print("Every 5th number from 30 in reverse order: ",np.arange(30,-1,-5))
print("11 linearly spaced numbers between 1 and 5: ",np.linspace(1,5,11))

A series of zeroes: [0. 0. 0. 0. 0. 0. 0.]
A series of ones: [1. 1. 1. 1. 1. 1. 1. 1. 1.]
A series of numbers: [ 5  6  7  8  9 10 11 12 13 14 15]
Numbers spaced apart by 2: [ 0  2  4  6  8 10]
Numbers spaced apart by float: [ 0.   2.5  5.   7.5 10. ]
Every 5th number from 30 in reverse order:  [30 25 20 15 10  5  0]
11 linearly spaced numbers between 1 and 5:  [1.  1.4 1.8 2.2 2.6 3.  3.4 3.8 4.2 4.6 5. ]
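One difference between the two generators is easy to miss: `np.arange` takes a step and leaves out the stop value, while `np.linspace` takes a count and includes the stop value unless told otherwise. A small sketch with illustrative variable names:

```python
import numpy as np

# arange: step of 0.25, stop value 1 is excluded
stepped = np.arange(0, 1, 0.25)
# linspace: 5 points, stop value 1 is included by default
spaced = np.linspace(0, 1, 5)
# endpoint=False makes linspace exclude the stop value as well
open_ended = np.linspace(0, 1, 4, endpoint=False)

print(stepped)     # [0.   0.25 0.5  0.75]
print(spaced)      # [0.   0.25 0.5  0.75 1.  ]
print(open_ended)  # [0.   0.25 0.5  0.75]
```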


## Multi-dimensional arrays

In [10]:
my_mat = [[1,2,3],[4,5,6],[7,8,9]]
mat = np.array(my_mat)
print("Type/Class of this object:",type(mat))
print("Here is the matrix\n----------\n",mat,"\n----------")

Type/Class of this object: <class 'numpy.ndarray'>
Here is the matrix
----------
 [[1 2 3]
 [4 5 6]
 [7 8 9]] 
----------


In [11]:
# A list of tuples works just like a list of lists
my_tuple = [(1.5,2,3), (4,5,6)]
mat_tuple = np.array(my_tuple)
print (mat_tuple)

[[1.5 2.  3. ]
 [4.  5.  6. ]]


## Dimension, shape, size, and data type of the 2D array

In [12]:
print("Dimension of this matrix: ",mat.ndim,sep='') 
print("Size of this matrix: ", mat.size,sep='') 
print("Shape of this matrix: ", mat.shape,sep='')
print("Data type of this matrix: ", mat.dtype,sep='')

Dimension of this matrix: 2
Size of this matrix: 9
Shape of this matrix: (3, 3)
Data type of this matrix: int64
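The default integer type reported above can differ between platforms and Numpy versions, so it may be worth setting `dtype` explicitly when it matters. A hedged sketch (the arrays `ints`, `floats`, and `mixed` are made up for illustration):

```python
import numpy as np

# Request a specific integer type instead of relying on the default
ints = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
# astype returns a converted copy; the original array is unchanged
floats = ints.astype(np.float64)
# A mix of ints and floats is stored as floats
mixed = np.array([1, 2.5, 3])

print(ints.dtype, floats.dtype, mixed.dtype)  # int32 float64 float64
print(ints.ndim, ints.shape, ints.size)       # 2 (2, 3) 6
```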


## Zeros, Ones, Random, and Identity Matrices and Vectors

In [13]:
print("Vector of zeros: ",np.zeros(5))
print("Matrix of zeros: ",np.zeros((3,4)))
print("Vector of ones: ",np.ones(4))
print("Matrix of ones: ",np.ones((4,2)))
print("Matrix of 5’s: ",5*np.ones((3,3)))
print("Identity matrix of dimension 2:",np.eye(2))
print("Identity matrix of dimension 4:",np.eye(4))
print("Random matrix of shape (4,3):\n",np.random.randint(low=1,high=10,size=(4,3)))

Vector of zeros:  [0. 0. 0. 0. 0.]
Matrix of zeros:  [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
Vector of ones:  [1. 1. 1. 1.]
Matrix of ones:  [[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]
Matrix of 5’s:  [[5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]]
Identity matrix of dimension 2: [[1. 0.]
 [0. 1.]]
Identity matrix of dimension 4: [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Random matrix of shape (4,3):
 [[6 6 6]
 [8 7 6]
 [6 4 8]
 [2 1 7]]
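The random matrix above changes on every run. If reproducible numbers are needed, one option is a seeded generator from `np.random.default_rng`. This is a sketch, not the notebook's method; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng = np.random.default_rng(seed=42)
m1 = rng.integers(low=1, high=10, size=(4, 3))

rng = np.random.default_rng(seed=42)
m2 = rng.integers(low=1, high=10, size=(4, 3))

# As with randint, `high` is excluded, so values fall in 1..9
print(np.array_equal(m1, m2))  # True
```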


## Reshaping, Ravel, Min, Max, Sorting

In [14]:
a = np.random.randint(1,100,30)
b = a.reshape(2,3,5)
c = a.reshape(6,5)
print ("Shape of a:", a.shape)
print ("Shape of b:", b.shape)
print ("Shape of c:", c.shape)

Shape of a: (30,)
Shape of b: (2, 3, 5)
Shape of c: (6, 5)


In [15]:
print("\na looks like:\n",a)
print("\nb looks like:\n",b)
print("\nc looks like:\n",c)


a looks like:
 [74 68 97  4 39 44 80 47 98 28 35 71 55  4 87 34 44 96 29  1 33 78 72 68
 77 81 79 39 28 85]

b looks like:
 [[[74 68 97  4 39]
  [44 80 47 98 28]
  [35 71 55  4 87]]

 [[34 44 96 29  1]
  [33 78 72 68 77]
  [81 79 39 28 85]]]

c looks like:
 [[74 68 97  4 39]
 [44 80 47 98 28]
 [35 71 55  4 87]
 [34 44 96 29  1]
 [33 78 72 68 77]
 [81 79 39 28 85]]


In [16]:
b_flat = b.ravel()
print(b_flat)

[74 68 97  4 39 44 80 47 98 28 35 71 55  4 87 34 44 96 29  1 33 78 72 68
 77 81 79 39 28 85]
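The cells above cover reshaping and `ravel`. For the min, max, and sorting part of this section, here is a short sketch on a small fixed array (the values are the first six numbers of `a` above, chosen only so the results are easy to check):

```python
import numpy as np

a = np.array([74, 68, 97, 4, 39, 44])
m = a.reshape(2, 3)

print(a.min(), a.max())        # 4 97
print(a.argmin(), a.argmax())  # 3 2  (positions of min and max)
print(np.sort(a))              # [ 4 39 44 68 74 97]
print(m.max(axis=0))           # [74 68 97]  (max of each column)
print(m.max(axis=1))           # [97 44]     (max of each row)
```

`np.sort(a)` returns a sorted copy, whereas `a.sort()` sorts the array in place.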


## Indexing and slicing

In [17]:
arr = np.arange(0,11)
print("Array:",arr)
print("Element at 7th index is:", arr[7])
print("Elements from 3rd to 5th index are:", arr[3:6])
print("Elements up to 4th index are:", arr[:4])
print("Elements from last backwards are:", arr[-1::-1])
print("3 Elements from last backwards are:", arr[-1:-6:-2])

arr2 = np.arange(0,21,2)
print("New array:",arr2)
print("Elements at 2nd, 4th, and 9th index are:", arr2[[2,4,9]]) # Pass a list as an index to subset

Array: [ 0  1  2  3  4  5  6  7  8  9 10]
Element at 7th index is: 7
Elements from 3rd to 5th index are: [3 4 5]
Elements up to 4th index are: [0 1 2 3]
Elements from last backwards are: [10  9  8  7  6  5  4  3  2  1  0]
3 Elements from last backwards are: [10  8  6]
New array: [ 0  2  4  6  8 10 12 14 16 18 20]
Elements at 2nd, 4th, and 9th index are: [ 4  8 18]
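One thing worth knowing about the two styles of indexing shown above: a basic slice is a view onto the original array, while indexing with a list returns a copy. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.arange(0, 11)

# A basic slice is a view: writing to it changes the original
view = arr[3:6]
view[0] = 99
print(arr[3])   # 99

# Indexing with a list returns a copy: the original is untouched
picked = arr[[7, 8]]
picked[0] = -1
print(arr[7])   # 7

# Use .copy() when an independent slice is needed
safe = arr[3:6].copy()
safe[1] = -5
print(arr[4])   # 4
```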


In [18]:
mat = np.random.randint(10,100,15).reshape(3,5)
print("Matrix of random 2-digit numbers\n",mat)

print("\nDouble bracket indexing\n")
print("Element in row index 1 and column index 2:", mat[1][2])

print("\nSingle bracket with comma indexing\n")
print("Element in row index 1 and column index 2:", mat[1,2])
print("\nRow or column extract\n")

print("Entire row at index 2:", mat[2])
print("Entire column at index 3:", mat[:,3])

print("\nSubsetting sub-matrices\n")
print("Matrix with row indices 1 and 2 and column indices 3 and 4\n", mat[1:3,3:5])
print("Matrix with row indices 0 and 1 and column indices 1 and 3\n", mat[0:2,[1,3]])

Matrix of random 2-digit numbers
 [[35 97 24 94 11]
 [21 92 11 51 11]
 [56 17 14 84 35]]

Double bracket indexing

Element in row index 1 and column index 2: 11

Single bracket with comma indexing

Element in row index 1 and column index 2: 11

Row or column extract

Entire row at index 2: [56 17 14 84 35]
Entire column at index 3: [94 51 84]

Subsetting sub-matrices

Matrix with row indices 1 and 2 and column indices 3 and 4
 [[51 11]
 [84 35]]
Matrix with row indices 0 and 1 and column indices 1 and 3
 [[97 94]
 [92 51]]
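When both the rows and the columns are given as lists, Numpy pairs them up element by element instead of forming a sub-matrix. `np.ix_` is one way to get the sub-matrix. A sketch on a small fixed matrix (not the random one above):

```python
import numpy as np

mat = np.arange(15).reshape(3, 5)

# Sub-matrix of rows 0 and 2 with columns 1 and 3
block = mat[np.ix_([0, 2], [1, 3])]
print(block)
# [[ 1  3]
#  [11 13]]

# Two lists without np.ix_ pick the elements (0,1) and (2,3)
pairs = mat[[0, 2], [1, 3]]
print(pairs)   # [ 1 13]
```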


## Conditional subsetting

In [19]:
mat = np.random.randint(10,100,15).reshape(3,5)
print("Matrix of random 2-digit numbers\n",mat)
print ("\nElements greater than 50\n", mat[mat>50])

Matrix of random 2-digit numbers
 [[93 45 77 56 91]
 [36 74 30 95 93]
 [20 74 51 18 26]]

Elements greater than 50
 [93 77 56 91 74 95 93 74 51]


In [20]:
mat>50

array([[ True, False,  True,  True,  True],
       [False,  True, False,  True,  True],
       [False,  True,  True, False, False]])

In [21]:
mat*(mat>50)

array([[93,  0, 77, 56, 91],
       [ 0, 74,  0, 95, 93],
       [ 0, 74, 51,  0,  0]])
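Boolean masks can also be combined, counted, and used for conditional replacement. The sketch below uses a small fixed matrix so the results can be checked by eye; `between`, `capped`, and `count` are illustrative names.

```python
import numpy as np

mat = np.array([[93, 45, 77],
                [36, 74, 30]])

# Combine conditions with & (and) or | (or); the parentheses are required
between = mat[(mat > 40) & (mat < 80)]
print(between)   # [45 77 74]

# np.where(condition, value_if_true, value_if_false)
capped = np.where(mat > 50, 50, mat)
print(capped)
# [[50 45 50]
#  [36 50 30]]

# True counts as 1, so summing a mask counts the matches
count = (mat > 50).sum()
print(count)     # 3
```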

## Array operations (array-array, array-scalar, universal functions)

In [22]:
mat1 = np.random.randint(1,10,9).reshape(3,3)
mat2 = np.random.randint(1,10,9).reshape(3,3)
print("\n1st Matrix of random single-digit numbers\n",mat1)
print("\n2nd Matrix of random single-digit numbers\n",mat2)

print("\nAddition\n", mat1+mat2)
print("\nMultiplication\n", mat1*mat2)
print("\nDivision\n", mat1/mat2)
print("\nLinear combination: 3*A - 2*B\n", 3*mat1-2*mat2)

print("\nAddition of a scalar (100)\n", 100+mat1)

print("\nExponentiation, matrix cubed here\n", mat1**3)
print("\nExponentiation, sq-root using pow function\n",pow(mat1,0.5))


1st Matrix of random single-digit numbers
 [[7 8 7]
 [7 7 4]
 [2 8 3]]

2nd Matrix of random single-digit numbers
 [[1 1 1]
 [1 6 7]
 [6 9 8]]

Addition
 [[ 8  9  8]
 [ 8 13 11]
 [ 8 17 11]]

Multiplication
 [[ 7  8  7]
 [ 7 42 28]
 [12 72 24]]

Division
 [[7.         8.         7.        ]
 [7.         1.16666667 0.57142857]
 [0.33333333 0.88888889 0.375     ]]

Linear combination: 3*A - 2*B
 [[19 22 19]
 [19  9 -2]
 [-6  6 -7]]

Addition of a scalar (100)
 [[107 108 107]
 [107 107 104]
 [102 108 103]]

Exponentiation, matrix cubed here
 [[343 512 343]
 [343 343  64]
 [  8 512  27]]

Exponentiation, sq-root using pow function
 [[2.64575131 2.82842712 2.64575131]
 [2.64575131 2.64575131 2.        ]
 [1.41421356 2.82842712 1.73205081]]
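Note that `*` between two arrays multiplies matching elements; it is not the matrix product from linear algebra. The `@` operator gives the matrix product. A short sketch with two small made-up matrices `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise product
elementwise = A * B
print(elementwise)
# [[ 5 12]
#  [21 32]]

# Matrix product (rows of A times columns of B)
product = A @ B
print(product)
# [[19 22]
#  [43 50]]

# np.sqrt is the universal-function form of pow(A, 0.5)
roots = np.sqrt(A)
```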


## Pandas series

In [23]:
import pandas as pd

In [24]:
labels = ['a','b','c']
my_data = [10,20,30]
arr = np.array(my_data)
d = {'a':10,'b':20,'c':30}

print ("Labels:", labels)
print("My data:", my_data)
print("Dictionary:", d)

Labels: ['a', 'b', 'c']
My data: [10, 20, 30]
Dictionary: {'a': 10, 'b': 20, 'c': 30}


In [25]:
s1=pd.Series(data=my_data)
print(s1)

0    10
1    20
2    30
dtype: int64


In [26]:
s2=pd.Series(data=my_data, index=labels)
print(s2)

a    10
b    20
c    30
dtype: int64


In [27]:
s3=pd.Series(arr, labels)
print(s3)

a    10
b    20
c    30
dtype: int64


In [28]:
s4=pd.Series(d)
print(s4)

a    10
b    20
c    30
dtype: int64
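The index labels are what set a Series apart from a plain array: arithmetic between two Series matches values by label, not by position. A minimal sketch with two made-up Series:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Labels present in only one Series produce NaN
total = s1 + s2
print(total)
# a     NaN
# b    21.0
# c    32.0
# d     NaN
# dtype: float64

# fill_value treats a missing label as 0 instead
filled = s1.add(s2, fill_value=0)
print(filled['a'], filled['d'])   # 10.0 3.0
```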


## Pandas DataFrame

In [29]:
matrix_data = np.random.randint(1,20,size=20).reshape(5,4)
row_labels = ['A','B','C','D','E']
column_headings = ['W','X','Y','Z']

df = pd.DataFrame(data=matrix_data, index=row_labels, columns=column_headings)
print("\nThe data frame looks like\n",'-'*45, sep='')
print(df)


The data frame looks like
---------------------------------------------
    W   X   Y   Z
A   2   3  16  19
B   2  16   5  18
C   1   9   1  16
D   4   2   3  12
E  10   1  19  16


In [30]:
d={'a':[10,20],'b':[30,40],'c':[50,60]}
df2=pd.DataFrame(data=d,index=['X','Y'])
print(df2)

    a   b   c
X  10  30  50
Y  20  40  60


## A DataFrame can be created by reading directly from a CSV or an Excel file

Refer to this article, which I wrote for O'Reilly Media's Medium publication, to learn about the various data sources that can be read directly into a Pandas DataFrame.

**[Read in the data in a Pandas DataFrame like an expert](https://medium.com/97-things/read-in-the-data-in-a-pandas-dataframe-like-an-expert-d03058edae98)**

In [32]:
df3 = pd.read_csv("wine.data.csv")

In [33]:
df3.head()

Unnamed: 0,Class,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735


In [37]:
df4 = pd.read_excel("Height_Weight.xlsx")

In [38]:
df4

        Name  Height  Weight       Hometown
0     Ashley     155     140      Palo Alto
1      Robin     145     122        Fremont
2   Priyanka     152     131    Santa Clara
3  Youngchul     167     148      Cupertino
4       Aziz     161     139  San Francisco
5       Zoey     181     190        Hayward
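The data files used above are not needed to try out `read_csv`. As a sketch, the same function can read CSV text held in memory through `io.StringIO`; the two rows below are copied from the table above, and `index_col` and `usecols` are standard `read_csv` parameters.

```python
import io
import pandas as pd

csv_text = """Name,Height,Weight
Ashley,155,140
Robin,145,122
"""

# Default behaviour: a fresh 0..n-1 index, every column read
people = pd.read_csv(io.StringIO(csv_text))
print(people.shape)   # (2, 3)

# Use one column as the index and read only selected columns
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Name",
                      usecols=["Name", "Height"])
print(by_name.loc["Robin", "Height"])   # 145
```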


## Quick checking DataFrames
* `.head()`
* `.tail()`
* `.sample()`
* `.info()`
* `.describe()`

In [39]:
df3.head()

Unnamed: 0,Class,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735


In [40]:
df3.head(3)

Unnamed: 0,Class,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185


In [41]:
df3.tail(7)

Unnamed: 0,Class,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline
171,3,12.77,2.39,2.28,19.5,86,1.39,0.51,0.48,0.64,9.899999,0.57,1.63,470
172,3,14.16,2.51,2.48,20.0,91,1.68,0.7,0.44,1.24,9.7,0.62,1.71,660
173,3,13.71,5.65,2.45,20.5,95,1.68,0.61,0.52,1.06,7.7,0.64,1.74,740
174,3,13.4,3.91,2.48,23.0,102,1.8,0.75,0.43,1.41,7.3,0.7,1.56,750
175,3,13.27,4.28,2.26,20.0,120,1.59,0.69,0.43,1.35,10.2,0.59,1.56,835
176,3,13.17,2.59,2.37,20.0,120,1.65,0.68,0.53,1.46,9.3,0.6,1.62,840
177,3,14.13,4.1,2.74,24.5,96,2.05,0.76,0.56,1.35,9.2,0.61,1.6,560


In [42]:
df3.sample(5)

Unnamed: 0,Class,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline
123,2,13.05,5.8,2.13,21.5,86,2.62,2.65,0.3,2.01,2.6,0.73,3.1,380
146,3,13.88,5.04,2.23,20.0,80,0.98,0.34,0.4,0.68,4.9,0.58,1.33,415
176,3,13.17,2.59,2.37,20.0,120,1.65,0.68,0.53,1.46,9.3,0.6,1.62,840
62,2,13.67,1.25,1.92,18.0,94,2.1,1.79,0.32,0.73,3.8,1.23,2.46,630
140,3,12.93,2.81,2.7,21.0,96,1.54,0.5,0.53,0.75,4.6,0.77,2.31,600


In [43]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Class                         178 non-null    int64  
 1   Alcohol                       178 non-null    float64
 2   Malic acid                    178 non-null    float64
 3   Ash                           178 non-null    float64
 4   Alcalinity of ash             178 non-null    float64
 5   Magnesium                     178 non-null    int64  
 6   Total phenols                 178 non-null    float64
 7   Flavanoids                    178 non-null    float64
 8   Nonflavanoid phenols          178 non-null    float64
 9   Proanthocyanins               178 non-null    float64
 10  Color intensity               178 non-null    float64
 11  Hue                           178 non-null    float64
 12  OD280/OD315 of diluted wines  178 non-null    float64
 13  Proli

In [44]:
df4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Name      6 non-null      object
 1   Height    6 non-null      int64 
 2   Weight    6 non-null      int64 
 3   Hometown  6 non-null      object
dtypes: int64(2), object(2)
memory usage: 320.0+ bytes


In [45]:
df3.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Class,178.0,1.938202,0.775035,1.0,1.0,2.0,3.0,3.0
Alcohol,178.0,13.000618,0.811827,11.03,12.3625,13.05,13.6775,14.83
Malic acid,178.0,2.336348,1.117146,0.74,1.6025,1.865,3.0825,5.8
Ash,178.0,2.366517,0.274344,1.36,2.21,2.36,2.5575,3.23
Alcalinity of ash,178.0,19.494944,3.339564,10.6,17.2,19.5,21.5,30.0
Magnesium,178.0,99.741573,14.282484,70.0,88.0,98.0,107.0,162.0
Total phenols,178.0,2.295112,0.625851,0.98,1.7425,2.355,2.8,3.88
Flavanoids,178.0,2.02927,0.998859,0.34,1.205,2.135,2.875,5.08
Nonflavanoid phenols,178.0,0.361854,0.124453,0.13,0.27,0.34,0.4375,0.66
Proanthocyanins,178.0,1.590899,0.572359,0.41,1.25,1.555,1.95,3.58


In [46]:
df4.describe()

           Height      Weight
count    6.000000    6.000000
mean   160.166667  145.000000
std     12.687264   23.748684
min    145.000000  122.000000
25%    152.750000  133.000000
50%    158.000000  139.500000
75%    165.500000  146.000000
max    181.000000  190.000000


## Basic descriptive statistics on a DataFrame
* `mean()`
* `std()`
* `var()`
* `min()` and `max()`

In [47]:
df3.mean()

Class                             1.938202
Alcohol                          13.000618
Malic acid                        2.336348
Ash                               2.366517
Alcalinity of ash                19.494944
Magnesium                        99.741573
Total phenols                     2.295112
Flavanoids                        2.029270
Nonflavanoid phenols              0.361854
Proanthocyanins                   1.590899
Color intensity                   5.058090
Hue                               0.957449
OD280/OD315 of diluted wines      2.611685
Proline                         746.893258
dtype: float64

In [48]:
df3.std()

Class                             0.775035
Alcohol                           0.811827
Malic acid                        1.117146
Ash                               0.274344
Alcalinity of ash                 3.339564
Magnesium                        14.282484
Total phenols                     0.625851
Flavanoids                        0.998859
Nonflavanoid phenols              0.124453
Proanthocyanins                   0.572359
Color intensity                   2.318286
Hue                               0.228572
OD280/OD315 of diluted wines      0.709990
Proline                         314.907474
dtype: float64

In [49]:
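# Variance is undefined for text columns such as 'Name', so restrict it to numeric columns
df4.var(numeric_only=True)

Height    160.966667
Weight    564.000000
dtype: float64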

In [None]:
df4.min()

Name           Ashley
Height            145
Weight            122
Hometown    Cupertino
dtype: object
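Statistics such as the mean and variance are only defined for numeric columns, so on a DataFrame that mixes text and numbers it can help to select the numeric columns first. The sketch below does this with `select_dtypes` and collects several statistics at once with `agg`; the small `people` frame is made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Ashley", "Robin", "Priyanka"],
    "Height": [155, 145, 152],
    "Weight": [140, 122, 131],
})

# Keep only the numeric columns
numeric = people.select_dtypes(include="number")
print(list(numeric.columns))   # ['Height', 'Weight']

# One row per statistic, one column per variable
summary = numeric.agg(["mean", "std", "var", "min", "max"])
print(summary.loc["var", "Weight"])   # 81.0
```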

## Indexing, slicing columns and rows of a DataFrame

In [50]:
print("\nThe 'Name' column\n",'-'*25, sep='')
print(df4['Name'])
print("\nType of the column: ", type(df4['Name']), sep='')
print("\nThe 'Name' and 'Weight' columns indexed by passing a list\n",'-'*55, sep='')
print(df4[['Name','Weight']])
print("\nType of the pair of columns: ", type(df4[['Name','Weight']]), sep='')


The 'Name' column
-------------------------
0       Ashley
1        Robin
2     Priyanka
3    Youngchul
4         Aziz
5         Zoey
Name: Name, dtype: object

Type of the column: <class 'pandas.core.series.Series'>

The 'Name' and 'Weight' columns indexed by passing a list
-------------------------------------------------------
        Name  Weight
0     Ashley     140
1      Robin     122
2   Priyanka     131
3  Youngchul     148
4       Aziz     139
5       Zoey     190

Type of the pair of columns: <class 'pandas.core.frame.DataFrame'>


In [51]:
print("\nLabel-based 'loc' method can be used for selecting row(s)\n",'-'*60, sep='')
print("\nSingle row\n")
print(df.loc['C'])
print("\nMultiple rows\n")
print(df.loc[['B','C']])
print("\nIndex position based 'iloc' method can be used for selecting row(s)\n",'-'*70, sep='')
print("\nSingle row\n")
print(df.iloc[2])
print("\nMultiple rows\n")
print(df.iloc[[1,2]])


Label-based 'loc' method can be used for selecting row(s)
------------------------------------------------------------

Single row

W     1
X     9
Y     1
Z    16
Name: C, dtype: int64

Multiple rows

   W   X  Y   Z
B  2  16  5  18
C  1   9  1  16

Index position based 'iloc' method can be used for selecting row(s)
----------------------------------------------------------------------

Single row

W     1
X     9
Y     1
Z    16
Name: C, dtype: int64

Multiple rows

   W   X  Y   Z
B  2  16  5  18
C  1   9  1  16
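Both `loc` and `iloc` also accept a column selection after the comma, and they treat slices differently: a label slice in `loc` includes its end point, while a position slice in `iloc` excludes it. A sketch on a fixed frame (the values are the first four rows of `df` as printed earlier):

```python
import pandas as pd

df = pd.DataFrame(
    [[2, 3, 16, 19], [2, 16, 5, 18], [1, 9, 1, 16], [4, 2, 3, 12]],
    index=['A', 'B', 'C', 'D'],
    columns=['W', 'X', 'Y', 'Z'],
)

# Rows 'B' through 'C' (inclusive), columns 'W' and 'Z'
by_label = df.loc['B':'C', ['W', 'Z']]
# Positions 1 and 2 (3 is excluded), columns 0 and 3
by_pos = df.iloc[1:3, [0, 3]]
print(by_label.equals(by_pos))   # True

# A single cell by row label and column label
cell = df.loc['C', 'X']
print(cell)   # 9
```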


## Conditional subsetting

In [52]:
df4['Height']>155

0    False
1    False
2    False
3     True
4     True
5     True
Name: Height, dtype: bool

In [53]:
df4[df4['Height']>155]

        Name  Height  Weight       Hometown
3  Youngchul     167     148      Cupertino
4       Aziz     161     139  San Francisco
5       Zoey     181     190        Hayward


Which students are **taller than 155 cm and weigh less than 140 lbs**?

In [54]:
df4[(df4['Height']>155) & (df4['Weight']<140)]

   Name  Height  Weight       Hometown
4  Aziz     161     139  San Francisco
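Conditions can be combined in other ways as well: `|` for "or", `~` for "not", `isin` for membership in a list, and `query` for writing the condition as a string. A sketch on a reduced, hand-typed version of the table (the name `people` is illustrative):

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Ashley", "Robin", "Aziz", "Zoey"],
    "Height": [155, 145, 161, 181],
    "Weight": [140, 122, 139, 190],
    "Hometown": ["Palo Alto", "Fremont", "San Francisco", "Hayward"],
})

# Taller than 170 cm OR lighter than 130 lbs
either = people[(people["Height"] > 170) | (people["Weight"] < 130)]
# NOT taller than 155 cm
not_tall = people[~(people["Height"] > 155)]
# Hometown is one of the listed cities
local = people[people["Hometown"].isin(["Fremont", "Hayward"])]
# The same "and" filter as above, written as a query string
queried = people.query("Height > 155 and Weight < 140")

print(either["Name"].tolist())    # ['Robin', 'Zoey']
print(not_tall["Name"].tolist())  # ['Ashley', 'Robin']
print(local["Name"].tolist())     # ['Robin', 'Zoey']
print(queried["Name"].tolist())   # ['Aziz']
```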


## Operations on specific columns/rows

In [55]:
df3.head()

Unnamed: 0,Class,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735


#### What is the standard deviation of Magnesium and Ash contents for the wine dataset?

In [56]:
df3[['Magnesium','Ash']].std()

Magnesium    14.282484
Ash           0.274344
dtype: float64

#### What is the range of alcohol content in the wine dataset?

In [57]:
range_alcohol=df3['Alcohol'].max()- df3['Alcohol'].min()
print("The range of alcohol content is: ", round(range_alcohol,3))

The range of alcohol content is:  3.8


#### Which wines are in the top 5 percent in terms of Flavanoids?

In [58]:
np.percentile(df3['Flavanoids'],95)

3.4975000000000005

In [59]:
df3[df3['Flavanoids']>=3.4975]

Unnamed: 0,Class,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline
13,1,14.75,1.73,2.39,11.4,91,3.1,3.69,0.43,2.81,5.4,1.25,2.73,1150
14,1,14.38,1.87,2.38,12.0,102,3.3,3.64,0.29,2.96,7.5,1.2,3.0,1547
18,1,14.19,1.59,2.48,16.5,108,3.3,3.93,0.32,1.86,8.7,1.23,2.82,1680
42,1,13.88,1.89,2.59,15.0,101,3.25,3.56,0.17,1.7,5.43,0.88,3.56,1095
49,1,13.94,1.73,2.27,17.4,108,2.88,3.54,0.32,2.08,8.9,1.12,3.1,1260
52,1,13.82,1.75,2.42,14.0,111,3.88,3.74,0.32,1.87,7.05,1.01,3.26,1190
58,1,13.72,1.43,2.5,16.7,108,3.4,3.67,0.19,2.04,6.8,0.89,2.87,1285
98,2,12.37,1.07,2.1,18.5,88,3.52,3.75,0.24,1.95,4.5,1.04,2.77,660
121,2,11.56,2.05,3.23,28.5,119,3.18,5.08,0.47,1.87,6.0,0.93,3.69,465


**Show the average alcohol, ash, and magnesium content of the wines that rank in the top 5 percent in terms of flavanoids**

In [60]:
df3[df3['Flavanoids']>=3.4975][['Ash','Alcohol','Magnesium']].mean()

Ash            2.484444
Alcohol       13.623333
Magnesium    104.000000
dtype: float64
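The threshold above was typed in by hand after reading the percentile output. As a sketch, it can instead be computed and reused in one go with `Series.quantile`, which by default uses the same linear interpolation as `np.percentile`. The `wines` frame below is made-up data, not the wine dataset.

```python
import numpy as np
import pandas as pd

wines = pd.DataFrame({
    "Flavanoids": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0],
    "Ash": [2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0],
})

# 95th percentile of the column, kept in a variable
threshold = wines["Flavanoids"].quantile(0.95)
# Rows at or above the threshold
top = wines[wines["Flavanoids"] >= threshold]
# Average of another column for those rows
top_mean_ash = top["Ash"].mean()

print(len(top), top_mean_ash)   # 1 3.0
```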

## Create a new column as a function of mathematical operations on existing columns

In [61]:
df4

        Name  Height  Weight       Hometown
0     Ashley     155     140      Palo Alto
1      Robin     145     122        Fremont
2   Priyanka     152     131    Santa Clara
3  Youngchul     167     148      Cupertino
4       Aziz     161     139  San Francisco
5       Zoey     181     190        Hayward


In [62]:
df4['BMI']=df4['Weight']*0.453592/(df4['Height']/100)**2
df4

        Name  Height  Weight       Hometown        BMI
0     Ashley     155     140      Palo Alto  26.432000
1      Robin     145     122        Fremont  26.320202
2   Priyanka     152     131    Santa Clara  25.718729
3  Youngchul     167     148      Cupertino  24.071001
4       Aziz     161     139  San Francisco  24.323633
5       Zoey     181     190        Hayward  26.306425
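The formula converts weight from pounds to kilograms (factor 0.453592) and height from centimetres to metres before computing BMI = kg / m². A sketch of the same calculation with `assign`, which returns a new DataFrame and leaves the original untouched; `LB_TO_KG`, `people`, and the `Weight_kg` column are illustrative names.

```python
import pandas as pd

LB_TO_KG = 0.453592  # pounds to kilograms

people = pd.DataFrame({
    "Name": ["Ashley", "Aziz"],
    "Height": [155, 161],   # cm
    "Weight": [140, 139],   # lbs
})

# Keyword arguments are evaluated in order, so BMI can use Weight_kg
with_bmi = people.assign(
    Weight_kg=people["Weight"] * LB_TO_KG,
    BMI=lambda d: d["Weight_kg"] / (d["Height"] / 100) ** 2,
)

print(with_bmi["BMI"].round(3).tolist())   # [26.432, 24.324]
print("BMI" in people.columns)             # False
```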


In [63]:
df4.sort_values(by='BMI')

        Name  Height  Weight       Hometown        BMI
3  Youngchul     167     148      Cupertino  24.071001
4       Aziz     161     139  San Francisco  24.323633
2   Priyanka     152     131    Santa Clara  25.718729
5       Zoey     181     190        Hayward  26.306425
1      Robin     145     122        Fremont  26.320202
0     Ashley     155     140      Palo Alto  26.432000


## Use `inplace=True` to apply the changes to the original DataFrame

In [64]:
df4

        Name  Height  Weight       Hometown        BMI
0     Ashley     155     140      Palo Alto  26.432000
1      Robin     145     122        Fremont  26.320202
2   Priyanka     152     131    Santa Clara  25.718729
3  Youngchul     167     148      Cupertino  24.071001
4       Aziz     161     139  San Francisco  24.323633
5       Zoey     181     190        Hayward  26.306425


In [65]:
df4.sort_values(by='BMI',inplace=True)

In [66]:
df4

        Name  Height  Weight       Hometown        BMI
3  Youngchul     167     148      Cupertino  24.071001
4       Aziz     161     139  San Francisco  24.323633
2   Priyanka     152     131    Santa Clara  25.718729
5       Zoey     181     190        Hayward  26.306425
1      Robin     145     122        Fremont  26.320202
0     Ashley     155     140      Palo Alto  26.432000
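An alternative to `inplace=True` is to assign the sorted result to a new name, which keeps the original order available. After sorting, the old row labels travel with the rows; `reset_index(drop=True)` renumbers them from 0. A sketch on a small made-up frame:

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Ashley", "Youngchul", "Aziz"],
    "BMI": [26.432, 24.071, 24.324],
})

# Sort from highest to lowest BMI and renumber the rows
ranked = people.sort_values(by="BMI", ascending=False).reset_index(drop=True)

print(ranked["Name"].tolist())   # ['Ashley', 'Aziz', 'Youngchul']
print(list(ranked.index))        # [0, 1, 2]
print(people["Name"].tolist())   # ['Ashley', 'Youngchul', 'Aziz']
```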
