In [1]:
import pandas as pd
import numpy as np

# Primer
* When selecting data in a DataFrame, square brackets are fairly straightforward, e.g. `data[start:stop]` for rows or `data[[columns]]` for columns
* But their functionality is limited: a single pair of square brackets selects either rows or columns, not both
* Often you need to select both rows and columns at once
    * we can do this with iloc and loc

## What do they do: loc 
  * The loc indexer selects rows and columns of the data based on labels.
  * First, specify the row labels on the left of the comma, then the column labels on the right.
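
As a minimal sketch of the rows-left, columns-right pattern, the frame below uses string row labels so that labels and positions cannot be confused. The frame `df` and its labels `a`, `b`, `c` are made up for illustration and are not part of the sample dataset.

```python
import pandas as pd

# Hypothetical frame with string row labels (illustration only)
df = pd.DataFrame(
    {"age": [10, 22, 13], "city": ["Gurgaon", "Delhi", "Mumbai"]},
    index=["a", "b", "c"],
)

# Row labels go to the left of the comma, column labels to the right
single_value = df.loc["b", "city"]      # one row label, one column label
subset = df.loc[["a", "c"], ["age"]]    # lists of labels return a DataFrame

print(single_value)
print(subset)
```

Passing lists on both sides returns a DataFrame, while passing two scalars returns the single cell value.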

  
  
## What do they do: iloc
  * The iloc indexer does the same thing, but based on the integer positions of the rows and columns.
  * To select all rows or all columns, type `:` on the rows side or the columns side.
  * To select specific rows but all columns, pass only the row positions (or only the row labels when using loc).
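
The `:` shorthand and the rows-only form can be sketched as follows. The small frame `df` is again a made-up example, not the sample dataset.

```python
import pandas as pd

# Hypothetical frame (illustration only)
df = pd.DataFrame(
    {"age": [10, 22, 13], "city": ["Gurgaon", "Delhi", "Mumbai"]},
    index=["a", "b", "c"],
)

all_rows_first_col = df.iloc[:, 0]   # every row, column at position 0 (a Series)
first_row_all_cols = df.iloc[0, :]   # row at position 0, every column (a Series)
rows_only = df.iloc[[0, 2]]          # rows at positions 0 and 2, all columns
```

Note that iloc ignores the string labels `a`, `b`, `c` entirely and only counts positions from zero.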

## Create sample dataset

In [2]:
data = pd.DataFrame({
    'age' :     [ 10, 22, 13, 21, 12, 11, 17],
    'section' : [ 'A', 'B', 'C', 'B', 'B', 'A', 'A'],
    'city' :    [ 'Gurgaon', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'gender' :  [ 'M', 'F', 'F', 'M', 'M', 'M', 'F'],
    'favourite_color' : [ 'red', np.nan, 'yellow', np.nan, 'black', 'green', 'red']
})

# view the data
data

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
1,22,B,Delhi,F,
2,13,C,Mumbai,F,yellow
3,21,B,Delhi,M,
4,12,B,Mumbai,M,black
5,11,A,Delhi,M,green
6,17,A,Mumbai,F,red


## Find all the rows based on any condition in a column


In [3]:
## find all rows where age is greater than or equal to 15

data.loc[data.age>=15]

Unnamed: 0,age,section,city,gender,favourite_color
1,22,B,Delhi,F,
3,21,B,Delhi,M,
6,17,A,Mumbai,F,red


In [5]:
## iloc even number positions
data.iloc[0::2]

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
2,13,C,Mumbai,F,yellow
4,12,B,Mumbai,M,black
6,17,A,Mumbai,F,red


## Find all the rows based on conditions in multiple columns

In [6]:
# find all rows where age >= 12 and gender is male
data.loc[(data.age >= 12) & (data.gender == 'M')]

Unnamed: 0,age,section,city,gender,favourite_color
3,21,B,Delhi,M,
4,12,B,Mumbai,M,black
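
Conditions are combined with the element-wise operators `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `>=` or `==`. A small sketch, assuming a frame `df` that copies only the age and gender columns of the sample data:

```python
import pandas as pd

# Copy of the age and gender columns from the sample data
df = pd.DataFrame({
    "age": [10, 22, 13, 21, 12, 11, 17],
    "gender": ["M", "F", "F", "M", "M", "M", "F"],
})

both = df.loc[(df.age >= 12) & (df.gender == "M")]     # AND: both must hold
either = df.loc[(df.age >= 20) | (df.gender == "F")]   # OR: at least one holds
negated = df.loc[~(df.gender == "M")]                  # NOT: invert the mask

print(list(both.index))
print(list(either.index))
print(list(negated.index))
```

Python's `and`, `or` and `not` keywords do not work here, since they try to reduce a whole Series to a single True or False.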


## Select a range of rows using loc


In [7]:
# index labels 1 to 3 (loc includes the end label)
data.loc[1:3]

Unnamed: 0,age,section,city,gender,favourite_color
1,22,B,Delhi,F,
2,13,C,Mumbai,F,yellow
3,21,B,Delhi,M,


## Select only required columns with a condition

In [8]:
# select few columns with a condition
data.loc[(data.age >= 12), ['city', 'gender']]

Unnamed: 0,city,gender
1,Delhi,F
2,Mumbai,F
3,Delhi,M
4,Mumbai,M
6,Mumbai,F


## Update the values of a particular column on selected rows
* For example, if the values in age are greater than or equal to 12, then we want to update the values of the column section to be “M”.
* You could do this with a for loop over the rows, but that is slow on large DataFrames; a single loc assignment is much faster!
* <b> What do we have to do? </b>
    * specify the condition
    * specify the target column
    * assign the value with which we want to update
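
As a rough sketch of the comparison, the row-by-row loop and the single loc assignment below produce the same result. The names `looped` and `vectorised` are illustrative, and the frames copy only the age and section columns of the sample data.

```python
import pandas as pd

ages = [10, 22, 13, 21, 12, 11, 17]
sections = ["A", "B", "C", "B", "B", "A", "A"]

# Row-by-row version: check each row and update it individually
looped = pd.DataFrame({"age": ages, "section": sections})
for label in looped.index:
    if looped.at[label, "age"] >= 12:
        looped.at[label, "section"] = "M"

# loc version: condition, target column and new value in one statement
vectorised = pd.DataFrame({"age": ages, "section": sections})
vectorised.loc[vectorised.age >= 12, "section"] = "M"

# Both approaches end with identical frames
print(looped.equals(vectorised))
```

The loc version evaluates the condition for the whole column at once instead of visiting rows one by one in Python, which is where the speed-up on larger frames comes from.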


In [9]:
# change the section to M if age >=12
data.loc[(data.age >= 12), ['section']] = 'M'
data

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
1,22,M,Delhi,F,
2,13,M,Mumbai,F,yellow
3,21,M,Delhi,M,
4,12,M,Mumbai,M,black
5,11,A,Delhi,M,green
6,17,M,Mumbai,F,red


## Update the values of multiple columns on selected rows
* If we want to update multiple columns with different values, we can use the syntax below.
* In this example, if the value in the column age is greater than or equal to 20, the loc assignment updates the values in the column section to “S” and the values in the column city to “Pune”:


In [10]:
# Update the section AND city if meet this condition
data.loc[(data.age>=20),['section','city']] = ['S','Pune']

In [11]:
data

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
1,22,S,Pune,F,
2,13,M,Mumbai,F,yellow
3,21,S,Pune,M,
4,12,M,Mumbai,M,black
5,11,A,Delhi,M,green
6,17,M,Mumbai,F,red


## Select rows with indices using iloc
* specify the rows by integer position

In [12]:
# e.g. select 1st and 3rd row
data.iloc[[0,2]]

Unnamed: 0,age,section,city,gender,favourite_color
0,10,A,Gurgaon,M,red
2,13,M,Mumbai,F,yellow


In [13]:
# a single value: row at position 0, column at position 2 (city)
data.iloc[0,2]

'Gurgaon'

## Select rows with particular indices and particular columns

* We can select a few columns just like with loc, but instead of providing the column names, we provide the integer positions of the columns

In [14]:
# rows 0 and 2, columns 1 and 3
data.iloc[[0,2],[1,3]]

Unnamed: 0,section,gender
0,A,M
2,M,F
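
When only the column names are known, their positions can be looked up first. The sketch below uses `Index.get_indexer` on `df.columns` for that; the frame `df` is a cut-down, made-up version of the sample data.

```python
import pandas as pd

# Hypothetical two-row frame with the same column order as the sample data
df = pd.DataFrame({
    "age": [10, 13],
    "section": ["A", "C"],
    "city": ["Gurgaon", "Mumbai"],
    "gender": ["M", "F"],
})

# Translate column names into integer positions, then select by position
positions = df.columns.get_indexer(["section", "gender"])
subset = df.iloc[[0, 1], positions]

print(list(positions))
print(subset)
```

If the names are already at hand, `df.loc[:, ["section", "gender"]]` is usually the more direct choice; the lookup is mainly useful when the rows have to be selected by position.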


## Select a range of rows using iloc
* we can slice with iloc too!
* just provide start_index and end_index+1: unlike loc, iloc excludes the end position.

In [15]:
data.iloc[1:3]

Unnamed: 0,age,section,city,gender,favourite_color
1,22,S,Pune,F,
2,13,M,Mumbai,F,yellow
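
The same slice `1:3` therefore returns a different number of rows with loc and with iloc. A minimal side-by-side check, using a made-up single-column frame `df` with the default integer index:

```python
import pandas as pd

# Hypothetical frame with the default 0..4 index (illustration only)
df = pd.DataFrame({"age": [10, 22, 13, 21, 12]})

by_label = df.loc[1:3]       # labels 1, 2 and 3: the end label is included
by_position = df.iloc[1:3]   # positions 1 and 2: the end position is excluded

print(list(by_label.index))
print(list(by_position.index))
```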


## Select a range of rows and columns using iloc
* Slice the DataFrame over both rows and columns. In the example below, we select the rows at positions 1-2 and the columns at positions 2-3 (the end of each slice is excluded).

In [16]:
data.iloc[1:3,2:4]

Unnamed: 0,city,gender
1,Pune,F
2,Mumbai,F
