# Pandas `loc` and `iloc` for selecting data

This is a notebook for the Medium article [How to use `loc` and `iloc` for selecting data in Pandas](https://bindichen.medium.com/how-to-use-loc-and-iloc-for-selecting-data-in-pandas-bd09cb4c3d79).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('data.csv', index_col=['Day'])
df

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Tue,Sunny,19.67,28,96
Wed,Sunny,17.51,16,20
Thu,Cloudy,14.44,11,22
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62
Sun,Sunny,17.5,20,10
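
If `data.csv` is not at hand, the same frame can be rebuilt inline. This is a minimal sketch: the values are copied from the output above, while the layout of the CSV file itself is an assumption.

```python
import pandas as pd

# Values copied from the table above; the original CSV layout is assumed.
data = {
    "Day": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "Weather": ["Sunny", "Sunny", "Sunny", "Cloudy", "Shower", "Shower", "Sunny"],
    "Temperature": [12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50],
    "Wind": [13, 28, 16, 11, 26, 27, 20],
    "Humidity": [30, 96, 20, 22, 79, 62, 10],
}

# Use the day names as row labels, as `index_col=['Day']` does when reading the CSV.
df = pd.DataFrame(data).set_index("Day")
```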


## 1. Differences between loc and iloc

The main distinction between `loc` and `iloc` is:
* `loc` is label-based, which means that you have to specify rows and columns based on their row and column labels. 
* `iloc` is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
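
The distinction is easiest to see when integer labels do not match positions. The sketch below uses a hypothetical Series `s` (not part of the notebook's data) whose labels are 10, 20 and 30.

```python
import pandas as pd

# Hypothetical Series: integer labels that differ from the 0-based positions.
s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

by_label = s.loc[10]     # looks up the row labelled 10
by_position = s.iloc[0]  # looks up the first row, whatever its label

# No row carries the label 0, so a label lookup for 0 fails.
try:
    s.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False
```

Both lookups return the same element here, but only because label 10 happens to sit at position 0.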

## 2. Selecting via a single value 

To get Friday's temperature:

In [4]:
# Pass label to `loc`
df.loc['Fri', 'Temperature']

10.51

In [5]:
# The equivalent `iloc` statement takes row position 4 and column position 1 (both 0-based)
df.iloc[4, 1]

10.51

Use `:` to select all rows or all columns:

In [6]:
# To get all rows
df.loc[:, 'Temperature']

Day
Mon    12.79
Tue    19.67
Wed    17.51
Thu    14.44
Fri    10.51
Sat    11.07
Sun    17.50
Name: Temperature, dtype: float64

In [7]:
# The equivalent `iloc` statement
df.iloc[:, 1]

Day
Mon    12.79
Tue    19.67
Wed    17.51
Thu    14.44
Fri    10.51
Sat    11.07
Sun    17.50
Name: Temperature, dtype: float64

In [8]:
# To get all columns
df.loc['Fri', :]

Weather        Shower
Temperature     10.51
Wind               26
Humidity           79
Name: Fri, dtype: object

In [9]:
# The equivalent `iloc` statement
df.iloc[4, :]

Weather        Shower
Temperature     10.51
Wind               26
Humidity           79
Name: Fri, dtype: object

## 3. Selecting via a list of values

In [10]:
# Multiple rows
df.loc[['Thu', 'Fri'], 'Temperature']

Day
Thu    14.44
Fri    10.51
Name: Temperature, dtype: float64

In [11]:
# Multiple columns
df.loc['Fri', ['Temperature', 'Wind']]

Temperature    10.51
Wind              26
Name: Fri, dtype: object

In [12]:
# Multiple rows using iloc
df.iloc[[3, 4], 1]

Day
Thu    14.44
Fri    10.51
Name: Temperature, dtype: float64

In [13]:
# Multiple columns using iloc
df.iloc[4, [1, 2]]

Temperature    10.51
Wind              26
Name: Fri, dtype: object

In [14]:
# Multiple rows and columns
rows = ['Thu', 'Fri']
cols = ['Temperature', 'Wind']

df.loc[rows, cols]

Day,Temperature,Wind
Thu,14.44,11
Fri,10.51,26


In [15]:
# The equivalent `iloc` statement
rows = [3, 4]
cols = [1, 2]
df.iloc[rows, cols]

Day,Temperature,Wind
Thu,14.44,11
Fri,10.51,26
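
One detail worth noting from the outputs above is that the shape of the result depends on whether a scalar or a list is passed. The sketch below uses a two-row subset of the data (built inline so the block runs on its own) to show that a list, even with a single element, keeps that axis in the result.

```python
import pandas as pd

# Two-row subset of the weather data, built inline for a self-contained example.
df = pd.DataFrame(
    {"Temperature": [14.44, 10.51], "Wind": [11, 26]},
    index=pd.Index(["Thu", "Fri"], name="Day"),
)

scalar = df.loc["Fri", "Temperature"]     # scalar row, scalar column -> single value
series = df.loc[["Fri"], "Temperature"]   # list of rows, scalar column -> Series
frame = df.loc[["Fri"], ["Temperature"]]  # list of rows, list of columns -> DataFrame
```

The same rule applies to `iloc` with integer positions.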


## 4. Selecting a range of data via slice

For `loc`, we can use the syntax `A:B` to select data from label `A` to label `B` (both `A` and `B` are included):

In [16]:
# Slicing column labels
rows = ['Thu', 'Fri']
df.loc[rows, 'Temperature':'Humidity']

Day,Temperature,Wind,Humidity
Thu,14.44,11,22
Fri,10.51,26,79


In [17]:
# Slicing row labels
cols = ['Temperature', 'Wind']
df.loc['Mon':'Thu', cols]

Day,Temperature,Wind
Mon,12.79,13
Tue,19.67,28
Wed,17.51,16
Thu,14.44,11


We can use the syntax `A:B:S` to select data from label `A` to label `B` with step size `S` (both `A` and `B` are included):

In [18]:
# Slicing with step
df.loc['Mon':'Fri':2, :]

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Wed,Sunny,17.51,16,20
Fri,Shower,10.51,26,79


With `iloc`, we can use the syntax `n:m` to select data from position `n` (included) to position `m` (excluded). A step size `s` can be added as `n:m:s`.

In [19]:
df.iloc[[1, 2], 0:3]

Day,Weather,Temperature,Wind
Tue,Sunny,19.67,28
Wed,Sunny,17.51,16


In [20]:
df.iloc[0:4:2, :]

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Wed,Sunny,17.51,16,20
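
The different endpoint rules are a common source of off-by-one mistakes, so a side-by-side comparison may help. This sketch rebuilds only the temperature column as a Series named `temps` (a name chosen here for illustration).

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
temps = pd.Series([12.79, 19.67, 17.51, 14.44, 10.51, 11.07, 17.50], index=days)

# Label slice: the end label 'Thu' is included, so four rows come back.
label_slice = temps.loc["Mon":"Thu"]

# Position slice: the end position 3 is excluded, so three rows come back.
position_slice = temps.iloc[0:3]
```

To cover the same rows as `'Mon':'Thu'` with `iloc`, the slice would be `0:4`.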


## 5. Selecting via conditions and callable

### 5.1 Conditions

In [21]:
(df.Humidity > 50).values

array([False,  True, False, False,  True,  True, False])

In [22]:
(df.Humidity>50).loc['Mon']

False

In [20]:
# One condition
df.loc[df.Humidity > 50, :]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62


In [8]:
# Multiple conditions
df.loc[
    (df.Humidity > 50) & (df.Weather == 'Shower'), 
    ['Temperature','Wind']
]

Day,Temperature,Wind
Fri,10.51,26
Sat,11.07,27


In [22]:
# Passing a boolean Series directly to `iloc` raises a ValueError
df.iloc[df.Humidity > 50, :]

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

In [23]:
# Single condition
df.iloc[list(df.Humidity > 50)]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62


In [24]:
# Multiple conditions
df.iloc[
    list((df.Humidity > 50) & (df.Weather == 'Shower')), 
    :,
]

Day,Weather,Temperature,Wind,Humidity
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62
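
The `iloc` cells above convert the boolean Series with `list(...)` because `iloc` rejects a mask that carries an index. Another option, shown here as a sketch rather than the article's method, is `Series.to_numpy()`, which strips the index and yields a plain boolean array. Only the `Humidity` column is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {"Humidity": [30, 96, 20, 22, 79, 62, 10]},
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="Day"),
)

mask = df.Humidity > 50  # boolean Series indexed by Day

# `iloc` accepts a plain boolean array, so drop the index before passing the mask.
humid_by_position = df.iloc[mask.to_numpy()]

# `loc` accepts the boolean Series as it is.
humid_by_label = df.loc[mask]
```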


### 5.2 Callable

In [25]:
# Selecting columns
df.loc[:, lambda df: ['Humidity', 'Wind']]

Day,Humidity,Wind
Mon,30,13
Tue,96,28
Wed,20,16
Thu,22,11
Fri,79,26
Sat,62,27
Sun,10,20


In [26]:
# With condition
df.loc[lambda df: df.Humidity > 50, :]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62


In [27]:
df.iloc[lambda df: [0,1], :]

Day,Weather,Temperature,Wind,Humidity
Mon,Sunny,12.79,13,30
Tue,Sunny,19.67,28,96


In [28]:
df.iloc[lambda df: list(df.Humidity > 50), :]

Day,Weather,Temperature,Wind,Humidity
Tue,Sunny,19.67,28,96
Fri,Shower,10.51,26,79
Sat,Shower,11.07,27,62
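
In the cells above the callable could be replaced by the condition itself, since `df` already exists. A callable becomes more useful in a method chain, where the intermediate frame has no name yet. The sketch below assumes a three-row subset of the data and a hypothetical derived column `TempDelta` that is not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame(
    {"Temperature": [12.79, 19.67, 10.51], "Humidity": [30, 96, 79]},
    index=pd.Index(["Mon", "Tue", "Fri"], name="Day"),
)

# `TempDelta` (hypothetical) only exists on the intermediate frame created by
# `assign`, so the filter has to refer to it through a callable.
result = (
    df.assign(TempDelta=lambda d: d.Temperature - d.Temperature.mean())
      .loc[lambda d: d.TempDelta > 0, ["TempDelta", "Humidity"]]
)
```

The callable receives the frame that `loc` is being applied to, which is why the lambda can see the new column.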


## 6. `loc` and `iloc` are interchangeable when labels are 0-based integers

In [29]:
df = pd.read_csv(
    'data/data.csv', 
    header=None, 
    skiprows=[0],
)
df

,0,1,2,3,4
0,Mon,Sunny,12.79,13,30
1,Tue,Sunny,19.67,28,96
2,Wed,Sunny,17.51,16,20
3,Thu,Cloudy,14.44,11,22
4,Fri,Shower,10.51,26,79
5,Sat,Shower,11.07,27,62
6,Sun,Sunny,17.5,20,10


Because the row and column labels are now 0-based integers, `loc`, a label-based data selector, can accept a single integer or a list of integer values.

In [30]:
df.loc[1, 2]

19.67

In [31]:
df.loc[1, [1, 2]]

1    Sunny
2    19.67
Name: 1, dtype: object

`loc` and `iloc` are interchangeable when selecting via a single value or a list of values.

In [32]:
df.loc[1, 2] == df.iloc[1, 2]

True

In [33]:
df.loc[1, [1, 2]] == df.iloc[1, [1, 2]]

1    True
2    True
Name: 1, dtype: bool
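
The interchangeability shown above is limited to single values and lists. Slices still follow different endpoint rules, and once rows are filtered the labels no longer line up with positions. The sketch below illustrates both cases on a small frame with a default 0-based index; the column names `day` and `temp` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Default RangeIndex, so labels 0..3 coincide with positions 0..3.
df = pd.DataFrame(
    {"day": ["Mon", "Tue", "Wed", "Thu"], "temp": [12.79, 19.67, 17.51, 14.44]}
)

# Slices differ: `loc` includes the end label, `iloc` excludes the end position.
by_label = df.loc[1:3]
by_position = df.iloc[1:3]

# Filtering keeps the original labels, so labels and positions drift apart.
filtered = df[df.temp > 15]            # keeps the rows labelled 1 and 2
first_by_position = filtered.iloc[0]   # first remaining row, labelled 1
# filtered.loc[0] would raise KeyError because label 0 was filtered out.
```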