<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Pandas-loc-and-iloc-for-selecting-data" data-toc-modified-id="Pandas-loc-and-iloc-for-selecting-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Pandas <code>loc</code> and <code>iloc</code> for selecting data</a></span><ul class="toc-item"><li><span><a href="#1.-Differences-between-loc-and-iloc" data-toc-modified-id="1.-Differences-between-loc-and-iloc-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>1. Differences between loc and iloc</a></span></li><li><span><a href="#2.-Selecting-via-a-single-value" data-toc-modified-id="2.-Selecting-via-a-single-value-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>2. Selecting via a single value</a></span></li><li><span><a href="#3.-Selecting-via-a-list-of-values" data-toc-modified-id="3.-Selecting-via-a-list-of-values-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>3. Selecting via a list of values</a></span></li><li><span><a href="#4.-Selecting-a-range-of-data-via-slice" data-toc-modified-id="4.-Selecting-a-range-of-data-via-slice-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>4. Selecting a range of data via slice</a></span></li><li><span><a href="#5.-Selecting-via-conditions-and-callable" data-toc-modified-id="5.-Selecting-via-conditions-and-callable-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>5. Selecting via conditions and callable</a></span><ul class="toc-item"><li><span><a href="#5.1-Conditions" data-toc-modified-id="5.1-Conditions-1.5.1"><span class="toc-item-num">1.5.1&nbsp;&nbsp;</span>5.1 Conditions</a></span></li><li><span><a href="#5.2-Callable" data-toc-modified-id="5.2-Callable-1.5.2"><span class="toc-item-num">1.5.2&nbsp;&nbsp;</span>5.2 Callable</a></span></li></ul></li><li><span><a href="#6.-loc-and-iloc-are-interchangeable-when-labels-are-0-based-integers" data-toc-modified-id="6.-loc-and-iloc-are-interchangeable-when-labels-are-0-based-integers-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>6. 
<code>loc</code> and <code>iloc</code> are interchangeable when labels are 0-based integers</a></span></li><li><span><a href="#Exercise" data-toc-modified-id="Exercise-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li></ul></div>

# Pandas `loc` and `iloc` for selecting data

This is a notebook for the Medium article [How to use `loc` and `iloc` for selecting data in Pandas](https://bindichen.medium.com/how-to-use-loc-and-iloc-for-selecting-data-in-pandas-bd09cb4c3d79).

Please check out the article for detailed explanations.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [1]:
import pandas as pd

In [2]:
data = {
    'Weather': ['Sunny','Sunny','Sunny','Cloudy','Shower','Shower','Sunny'], 
    'Temperature': [78,76,78,68,70,71,82],
    'Wind': [13,28,16,11,26,27,20],
    'Humidity': [30,96,20,22,79,62,10],
}
df = pd.DataFrame(data, index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
df

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Tue   Sunny           76    28        96
Wed   Sunny           78    16        20
Thu  Cloudy           68    11        22
Fri  Shower           70    26        79
Sat  Shower           71    27        62
Sun   Sunny           82    20        10


## 1. Differences between loc and iloc

The main distinction between `loc` and `iloc` is:
* `loc` is label-based, which means that you have to specify rows and columns based on their row and column labels. 
* `iloc` is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
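A minimal sketch of the distinction, using three rows of the weather data (values copied from `df` above so the block runs on its own). `df.index[...]` and `Index.get_loc` translate between a position and a label:

```python
import pandas as pd

# Three rows of the weather data used throughout this notebook
df = pd.DataFrame(
    {"Temperature": [78, 76, 78], "Wind": [13, 28, 16]},
    index=["Mon", "Tue", "Wed"],
)

# loc: look the cell up by row label and column label
by_label = df.loc["Tue", "Wind"]

# iloc: look the same cell up by 0-based position (row 1, column 1)
by_position = df.iloc[1, 1]

# Translating between the two
row_label = df.index[1]                    # position 1 -> label 'Tue'
col_position = df.columns.get_loc("Wind")  # label 'Wind' -> position 1

print(by_label, by_position, row_label, col_position)  # 28 28 Tue 1
```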

## 2. Selecting via a single value 

In [3]:
df

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Tue   Sunny           76    28        96
Wed   Sunny           78    16        20
Thu  Cloudy           68    11        22
Fri  Shower           70    26        79
Sat  Shower           71    27        62
Sun   Sunny           82    20        10


To get Friday's temperature:

In [4]:
# Pass label to `loc`
df.loc['Fri', 'Temperature']

70

In [5]:
# The equivalent `iloc` statement takes row position 4 and column position 1 (both 0-based)
df.iloc[4, 1]

70
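`iloc` also accepts negative positions, which count back from the end as with Python lists; with `loc`, `-1` would be treated as a label instead. A small sketch on two columns of the weather data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Sunny", "Cloudy", "Shower", "Shower", "Sunny"],
        "Temperature": [78, 76, 78, 68, 70, 71, 82],
    },
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
)

# Row position -1 is the last row (Sun); column position 1 is Temperature
last_temp = df.iloc[-1, 1]
# The same cell by label
same_by_label = df.loc["Sun", "Temperature"]
print(last_temp, same_by_label)  # 82 82
```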

Use `:` to select all rows or all columns:

In [6]:
# To get all rows
df.loc[:, 'Temperature']

Mon    78
Tue    76
Wed    78
Thu    68
Fri    70
Sat    71
Sun    82
Name: Temperature, dtype: int64

In [7]:
# The equivalent `iloc` statement
df.iloc[:, 1]

Mon    78
Tue    76
Wed    78
Thu    68
Fri    70
Sat    71
Sun    82
Name: Temperature, dtype: int64

In [8]:
# To get all columns
df.loc['Fri', :]

Weather        Shower
Temperature        70
Wind               26
Humidity           79
Name: Fri, dtype: object

In [9]:
# The equivalent `iloc` statement
df.iloc[4, :]

Weather        Shower
Temperature        70
Wind               26
Humidity           79
Name: Fri, dtype: object
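When every column is wanted, the column part can also be left out: `df.loc['Fri']` and `df.iloc[4]` return the same row as the `:` versions above. A quick check on two of the columns:

```python
import pandas as pd

df = pd.DataFrame(
    {"Temperature": [78, 76, 78, 68, 70, 71, 82], "Wind": [13, 28, 16, 11, 26, 27, 20]},
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
)

# A row indexer on its own selects all columns, same as adding ', :'
row_short = df.loc["Fri"]
row_full = df.loc["Fri", :]
print(row_short.equals(row_full), row_short.equals(df.iloc[4]))  # True True
```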

## 3. Selecting via a list of values

In [10]:
df

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Tue   Sunny           76    28        96
Wed   Sunny           78    16        20
Thu  Cloudy           68    11        22
Fri  Shower           70    26        79
Sat  Shower           71    27        62
Sun   Sunny           82    20        10


In [11]:
# Multiple rows
df.loc[['Thu', 'Fri'], 'Temperature']

Thu    68
Fri    70
Name: Temperature, dtype: int64

In [12]:
# Multiple columns
df.loc['Fri', ['Temperature', 'Wind']]

Temperature    70
Wind           26
Name: Fri, dtype: object

In [13]:
# Multiple rows using iloc
df.iloc[[3, 4], 1]

Thu    68
Fri    70
Name: Temperature, dtype: int64

In [14]:
# Multiple columns using iloc
df.iloc[4, [1, 2]]

Temperature    70
Wind           26
Name: Fri, dtype: object

In [15]:
# Multiple rows and columns
rows = ['Thu', 'Fri']
cols = ['Temperature', 'Wind']

df.loc[rows, cols]

     Temperature  Wind
Thu           68    11
Fri           70    26


In [16]:
# the equivalent iloc statement
rows = [3, 4]
cols = [1, 2]
df.iloc[rows, cols]

     Temperature  Wind
Thu           68    11
Fri           70    26
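The shape of the result follows the shape of the indexers: a single label drops that axis, while a list (even a one-element list) keeps it. A sketch with two rows of the weather data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Temperature": [68, 70], "Wind": [11, 26]},
    index=["Thu", "Fri"],
)

scalar = df.loc["Fri", "Temperature"]     # single label on both axes -> scalar
series = df.loc["Fri", ["Temperature"]]   # list for the columns -> Series
frame = df.loc[["Fri"], ["Temperature"]]  # lists on both axes -> 1x1 DataFrame

print(type(series).__name__, type(frame).__name__, frame.shape)
# Series DataFrame (1, 1)
```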


## 4. Selecting a range of data via slice

For `loc`, we can use the syntax `A:B` to select data from label `A` to label `B` (both `A` and `B` are included):

In [18]:
df

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Tue   Sunny           76    28        96
Wed   Sunny           78    16        20
Thu  Cloudy           68    11        22
Fri  Shower           70    26        79
Sat  Shower           71    27        62
Sun   Sunny           82    20        10


In [17]:
# Slicing column labels
rows = ['Thu', 'Fri']
df.loc[rows, 'Temperature':'Humidity']

     Temperature  Wind  Humidity
Thu           68    11        22
Fri           70    26        79


In [19]:
# Slicing row labels
cols = ['Temperature', 'Wind']
df.loc['Mon':'Thu', cols]

     Temperature  Wind
Mon           78    13
Tue           76    28
Wed           78    16
Thu           68    11
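Because a `loc` slice includes its end label while an `iloc` slice excludes its end position, Monday through Wednesday is `'Mon':'Wed'` with `loc` but `0:3` with `iloc`. A sketch on four rows of the weather data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Temperature": [78, 76, 78, 68], "Wind": [13, 28, 16, 11]},
    index=["Mon", "Tue", "Wed", "Thu"],
)

by_label = df.loc["Mon":"Wed"]  # end label 'Wed' is included
by_position = df.iloc[0:3]      # end position 3 ('Thu') is excluded
print(list(by_label.index), by_label.equals(by_position))
# ['Mon', 'Tue', 'Wed'] True
```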


We can use the syntax `A:B:S` to select data from label `A` to label `B` with step size `S`. `B` is included only if the step lands on it, as `Fri` does here:

In [20]:
# Slicing with step
df.loc['Mon':'Fri':2, :]

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Wed   Sunny           78    16        20
Fri  Shower           70    26        79
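When the step does not land on the end label, that label is left out. Starting at Monday with a step of 2 visits Mon, Wed, Fri, ..., so a slice ending at Thursday stops at Wednesday:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame({"Temperature": [78, 76, 78, 68, 70, 71, 82]}, index=days)

lands_on_end = df.loc["Mon":"Fri":2]  # Mon, Wed, Fri
skips_end = df.loc["Mon":"Thu":2]     # Mon, Wed (Thu is never reached)
print(list(lands_on_end.index), list(skips_end.index))
# ['Mon', 'Wed', 'Fri'] ['Mon', 'Wed']
```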


With `iloc`, we use the syntax `n:m` to select data from position `n` (included) to position `m` (excluded), optionally with a step as in `n:m:s`:

In [21]:
df.iloc[[1, 2], 0:3]

    Weather  Temperature  Wind
Tue   Sunny           76    28
Wed   Sunny           78    16


In [22]:
df.iloc[0:4:2, :]

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Wed   Sunny           78    16        20
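Since `iloc` follows Python's slicing rules, negative bounds and negative steps work too, for example to get the last three days or the whole week in reverse order:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame({"Temperature": [78, 76, 78, 68, 70, 71, 82]}, index=days)

last_three = df.iloc[-3:]      # positions 4, 5, 6
reversed_week = df.iloc[::-1]  # Sun back to Mon
print(list(last_three.index), reversed_week.index[0])
# ['Fri', 'Sat', 'Sun'] Sun
```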


## 5. Selecting via conditions and callable

### 5.1 Conditions

In [23]:
df

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Tue   Sunny           76    28        96
Wed   Sunny           78    16        20
Thu  Cloudy           68    11        22
Fri  Shower           70    26        79
Sat  Shower           71    27        62
Sun   Sunny           82    20        10


In [24]:
# One condition
df.loc[df.Humidity > 50, :]

    Weather  Temperature  Wind  Humidity
Tue   Sunny           76    28        96
Fri  Shower           70    26        79
Sat  Shower           71    27        62


In [25]:
# Multiple conditions
df.loc[
    (df.Humidity > 50) & (df.Weather == 'Shower'), 
    ['Temperature','Wind'],
]

     Temperature  Wind
Fri           70    26
Sat           71    27
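Each condition is wrapped in parentheses because `&` binds more tightly than comparison operators such as `>` in Python. The same pattern works with `|` (or) and `~` (not); a sketch on two columns of the weather data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Weather": ["Sunny", "Sunny", "Sunny", "Cloudy", "Shower", "Shower", "Sunny"],
        "Humidity": [30, 96, 20, 22, 79, 62, 10],
    },
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
)

# Either very humid OR cloudy
humid_or_cloudy = df.loc[(df.Humidity > 90) | (df.Weather == "Cloudy")]
# NOT sunny
not_sunny = df.loc[~(df.Weather == "Sunny")]
print(list(humid_or_cloudy.index), list(not_sunny.index))
# ['Tue', 'Thu'] ['Thu', 'Fri', 'Sat']
```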


In [None]:
# Raises ValueError: iloc cannot use a boolean Series (which carries an index) as a mask
# df.iloc[df.Humidity > 50, :]

In [28]:
# Single condition
df.iloc[list(df.Humidity > 50)]

    Weather  Temperature  Wind  Humidity
Tue   Sunny           76    28        96
Fri  Shower           70    26        79
Sat  Shower           71    27        62


In [29]:
# Multiple conditions
df.iloc[
    list((df.Humidity > 50) & (df.Weather == 'Shower')), 
    :,
]

    Weather  Temperature  Wind  Humidity
Fri  Shower           70    26        79
Sat  Shower           71    27        62
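Wrapping the mask in `list(...)` works because it strips the index that `iloc` objects to. `Series.to_numpy()` is another way to do that, keeping the mask as a NumPy boolean array:

```python
import pandas as pd

df = pd.DataFrame(
    {"Humidity": [30, 96, 20, 22, 79, 62, 10]},
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
)

mask = (df.Humidity > 50).to_numpy()  # plain bool array, no index attached
humid_days = df.iloc[mask]
print(list(humid_days.index))  # ['Tue', 'Fri', 'Sat']
```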


### 5.2 Callable

In [30]:
# Selecting columns
df.loc[:, lambda df: ['Humidity', 'Wind']]

     Humidity  Wind
Mon        30    13
Tue        96    28
Wed        20    16
Thu        22    11
Fri        79    26
Sat        62    27
Sun        10    20


In [31]:
# With condition
df.loc[lambda df: df.Humidity > 50, :]

    Weather  Temperature  Wind  Humidity
Tue   Sunny           76    28        96
Fri  Shower           70    26        79
Sat  Shower           71    27        62
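A callable is most useful in a method chain, where the intermediate DataFrame has no variable name: `loc` passes that intermediate frame to the callable. In this sketch, `Warm` is a hypothetical helper column created with `DataFrame.assign` only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"Temperature": [78, 76, 78, 68, 70, 71, 82], "Humidity": [30, 96, 20, 22, 79, 62, 10]},
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
)

# 'Warm' (hypothetical) exists only inside the chain; the lambda given to
# loc receives the frame returned by assign(), so it can refer to 'Warm'
result = (
    df.assign(Warm=lambda d: d.Temperature >= 75)
      .loc[lambda d: d.Warm & (d.Humidity > 50), ["Temperature", "Humidity"]]
)
print(result)
```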


In [None]:
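# Callable returning a list of row positions
df.iloc[lambda df: [0, 1], :]

In [None]:
# Callable returning a plain boolean list (a boolean Series would raise the ValueError shown earlier)
df.iloc[lambda df: list(df.Humidity > 50), :]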
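With `iloc`, the callable has to return something `iloc` accepts on its own, such as a list of positions. A sketch that picks every other day, computing the positions from the frame's length:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
df = pd.DataFrame({"Humidity": [30, 96, 20, 22, 79, 62, 10]}, index=days)

# The callable receives df and returns 0-based positions 0, 2, 4, 6
every_other = df.iloc[lambda d: list(range(0, len(d), 2))]
print(list(every_other.index))  # ['Mon', 'Wed', 'Fri', 'Sun']
```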

## 6. `loc` and `iloc` are interchangeable when labels are 0-based integers

In [37]:
data = {
    'Weather': ['Sunny','Sunny','Sunny','Cloudy','Shower','Shower','Sunny'], 
    'Temperature': [78,76,78,68,70,71,82],
    'Wind': [13,28,16,11,26,27,20],
    'Humidity': [30,96,20,22,79,62,10],
}
df1 = pd.DataFrame(data, index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
df1

    Weather  Temperature  Wind  Humidity
Mon   Sunny           78    13        30
Tue   Sunny           76    28        96
Wed   Sunny           78    16        20
Thu  Cloudy           68    11        22
Fri  Shower           70    26        79
Sat  Shower           71    27        62
Sun   Sunny           82    20        10


In [32]:
data = [
    ['Mon','Sunny',78,13,30],
    ['Tue','Sunny',76,28,96],
    ['Wed','Sunny',78,16,20],
    ['Thu','Cloudy',68,11,22],
    ['Fri','Shower',70,26,79],
    ['Sat','Shower',71,27,62],
    ['Sun','Sunny',82,20,10]]
df = pd.DataFrame(data)
df

     0       1   2   3   4
0  Mon   Sunny  78  13  30
1  Tue   Sunny  76  28  96
2  Wed   Sunny  78  16  20
3  Thu  Cloudy  68  11  22
4  Fri  Shower  70  26  79
5  Sat  Shower  71  27  62
6  Sun   Sunny  82  20  10


Because the row and column labels are now the integers 0, 1, 2, ..., the label-based `loc` accepts a single integer or a list of integers, treating them as labels.

In [33]:
df.loc[1, 2]

76

In [34]:
df.loc[1, [1, 2]]

1    Sunny
2       76
Name: 1, dtype: object

`loc` and `iloc` are interchangeable when selecting via a single value or a list of values.

In [35]:
df.loc[1, 2] == df.iloc[1, 2]

True

In [36]:
df.loc[1, [1, 2]] == df.iloc[1, [1, 2]]

1    True
2    True
Name: 1, dtype: bool
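Slicing is the exception: even with labels `0, 1, 2, ...`, a `loc` slice includes its end label while an `iloc` slice excludes its end position. The two also stop agreeing once the rows are reordered, for example after sorting, because labels no longer match positions. A sketch with two columns of the list-of-lists data:

```python
import pandas as pd

# Integer labels 0..6, like the DataFrame built from a list of lists above
df = pd.DataFrame([["Mon", 78], ["Tue", 76], ["Wed", 78], ["Thu", 68],
                   ["Fri", 70], ["Sat", 71], ["Sun", 82]])

# Slices: loc includes the end label, iloc excludes the end position
by_label = df.loc[0:2]      # labels 0, 1, 2 -> 3 rows
by_position = df.iloc[0:2]  # positions 0, 1 -> 2 rows

# After sorting by temperature (column 1), labels no longer match positions
sorted_df = df.sort_values(1)
first_by_label = sorted_df.loc[0, 0]      # the row labelled 0 -> 'Mon'
first_by_position = sorted_df.iloc[0, 0]  # the first row now -> 'Thu' (68)

print(len(by_label), len(by_position), first_by_label, first_by_position)
# 3 2 Mon Thu
```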

## Exercise

In [38]:
import numpy as np
import pandas as pd
from numpy.random import randn
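np.random.seed(1234)
# The result of this call is discarded, but it advances the global random state,
# which changes the values randn() produces below
np.random.randint(1, 100, 6)
df = pd.DataFrame(randn(4, 5), index=['IL', 'GA', 'MA', 'VT'],
                  columns=['Sent', 'Used', 'Expired', 'Lost', 'Destroyed'])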
df

        Sent      Used   Expired      Lost  Destroyed
IL  0.511316  0.384703 -0.370064 -0.428838  -0.398851
GA  0.368431 -2.073882 -0.161588 -0.085147  -0.659354
MA  0.824481 -0.877485 -0.256941 -0.925408   1.823615
VT -2.259700 -0.170905  1.453850  1.887133  -1.756462


In [None]:
# 1. Get all of MA's data
# 2. Get all of the Expired data
# 3. What are the Lost and Destroyed values for IL and GA?
# 4. For each state, what are the Sent, Expired and Destroyed values?
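One possible set of answers to the exercise, using only `loc`. The setup repeats the exercise cell, including the `randint` call, so the random values match the table above:

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
np.random.randint(1, 100, 6)  # kept only so the random state matches the exercise
df = pd.DataFrame(
    np.random.randn(4, 5),
    index=["IL", "GA", "MA", "VT"],
    columns=["Sent", "Used", "Expired", "Lost", "Destroyed"],
)

ma_row = df.loc["MA"]                                              # 1. all of MA's data
expired = df.loc[:, "Expired"]                                     # 2. all Expired data
il_ga_lost = df.loc[["IL", "GA"], ["Lost", "Destroyed"]]           # 3. IL/GA Lost and Destroyed
sent_exp_destroyed = df.loc[:, ["Sent", "Expired", "Destroyed"]]   # 4. every state, three columns
print(il_ga_lost.shape, sent_exp_destroyed.shape)  # (2, 2) (4, 3)
```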