# Selecting data from a DataFrame

Pandas DataFrames offer two indexers, `loc` and `iloc`, that let you select specific rows and columns of data.

`loc` allows you to select data by label. In other words, you refer to rows by their index labels (for example, strings such as last names) and to columns by name.

`iloc` is similar, except that it allows you to query for rows using integer-based indices. This is much like how you would select data from a list, using position-based indices.
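
To make the difference concrete, here is a minimal sketch on a hypothetical three-row frame (the name `people` and its contents are assumptions that stand in for the sample data loaded further down). It also shows one detail that is easy to miss: a label slice with `loc` includes its end label, while a positional slice with `iloc` excludes its end position.

```python
import pandas as pd

# Hypothetical stand-in for the sample data; labels are last names.
people = pd.DataFrame(
    {"id": [1, 2, 3], "first_name": ["Peter", "Janice", "Andrea"]},
    index=["Richardson", "Berry", "Hudson"],
)

# loc: row label first, column label second.
by_label = people.loc["Berry", "first_name"]

# iloc: row position first, column position second.
by_position = people.iloc[1, 1]

print(by_label, by_position)  # Janice Janice

# Label slices include the end label; positional slices exclude the end.
label_slice = people.loc["Richardson":"Berry"]   # 2 rows
position_slice = people.iloc[0:1]                # 1 row
print(len(label_slice), len(position_slice))     # 2 1
```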


#### Import Dependencies

In [10]:
import pandas as pd
import os

#### Set the path to your file

In [11]:
file = os.path.join("..", "Resources","sampleData.csv")

#### Read the csv

In [17]:
df_original = pd.read_csv(file)
df_original.head(10)

,id,first_name,last_name,Phone Number,Time zone
0,1,Peter,Richardson,7-(789)867-9023,Europe/Moscow
1,2,Janice,Berry,86-(614)973-1727,Asia/Harbin
2,3,Andrea,Hudson,86-(918)527-6371,Asia/Shanghai
3,4,Arthur,Mcdonald,420-(553)779-7783,Europe/Prague
4,5,Kathy,Morales,351-(720)541-2124,Europe/Lisbon
5,6,Juan,Reyes,507-(957)942-8540,America/Panama
6,7,Joseph,Kim,62-(764)534-1192,Asia/Jakarta
7,8,Frances,Hudson,57-(752)864-4744,America/Bogota
8,9,Judy,Day,7-(863)797-2311,Europe/Moscow
9,10,Robert,Ford,92-(784)853-3450,Asia/Karachi


#### Set new index to *last_name*

`set_index()` lets you use one of the existing columns in your DataFrame as the index. By default it returns a new DataFrame and leaves the original unchanged, so assign the result to a variable.

**NOTE:** Indices in a DataFrame do not have to be unique.
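
As a quick illustration of that note, the sketch below builds a hypothetical three-row frame (the names `sample` and `indexed` are assumptions, not part of the notebook) in which one last name appears twice. `Index.is_unique` reports whether the labels are unique, and selecting a duplicated label with `loc` returns every matching row.

```python
import pandas as pd

# Hypothetical rows mirroring the sample data; "Hudson" appears twice.
sample = pd.DataFrame({
    "last_name": ["Richardson", "Hudson", "Hudson"],
    "first_name": ["Peter", "Andrea", "Frances"],
})

# set_index returns a new frame; `sample` keeps its last_name column.
indexed = sample.set_index("last_name")

print(indexed.index.is_unique)   # False

# A duplicated label selects all of its rows.
hudsons = indexed.loc["Hudson"]
print(len(hudsons))              # 2
```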

In [18]:
df = df_original.set_index("last_name")

#### Retrieve the data contained within the *Berry* row and the *Phone Number* column (Janice Berry's Phone Number)

Much like the `Cells()` function in VBA, both indexers take **rows** first and **columns** second.

In [24]:
print(f'Number using loc:{df.loc["Berry", "Phone Number"]}')
print(f'Number using iloc:{df.iloc[1,2]}')

Number using loc:86-(614)973-1727
Number using iloc:86-(614)973-1727


#### Trying to select the first 5 rows of data with `loc`

Using labels to select the first 5 rows of data may not return exactly 5 rows. Pandas allows duplicate index labels, and `loc` returns every row that matches each label, so you can get more data than you anticipate.

In [25]:
df.loc[["Richardson", "Berry", "Hudson", "Mcdonald", "Morales"], ['id', 'first_name']]

last_name,id,first_name
Richardson,1,Peter
Richardson,25,Donald
Berry,2,Janice
Hudson,3,Andrea
Hudson,8,Frances
Hudson,90,Norma
Mcdonald,4,Arthur
Morales,5,Kathy


#### Selecting the first 5 rows of data with `iloc`

`iloc` selects by integer position, which is more precise because each position refers to exactly one row.

In [27]:
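# All columns for the first 5 rows:
# df.iloc[0:5]
# First 5 rows, columns at positions 2 and 3 (the end of a slice is excluded)
df.iloc[0:5,2:4]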

last_name,Phone Number,Time zone
Richardson,7-(789)867-9023,Europe/Moscow
Berry,86-(614)973-1727,Asia/Harbin
Hudson,86-(918)527-6371,Asia/Shanghai
Mcdonald,420-(553)779-7783,Europe/Prague
Morales,351-(720)541-2124,Europe/Lisbon


#### Select all rows for columns *first_name* and *Phone Number* using `iloc`

In [32]:
df.iloc[:,[1,2]]

last_name,first_name,Phone Number
Richardson,Peter,7-(789)867-9023
Berry,Janice,86-(614)973-1727
Hudson,Andrea,86-(918)527-6371
Mcdonald,Arthur,420-(553)779-7783
Morales,Kathy,351-(720)541-2124
...,...,...
Henderson,Arthur,81-(353)751-4060
Riley,Christina,93-(374)749-5085
Green,Nicholas,86-(750)462-3375
Peters,Debra,86-(879)987-9025


#### Select all rows for columns *first_name* and *Phone Number* using `loc`

In [29]:
# All data
# df.loc[:, :]

# First 20 rows (positional, so use iloc; the index labels here are strings)
# df.iloc[:20]

df.loc[:, ['first_name', 'Phone Number']]

last_name,first_name,Phone Number
Richardson,Peter,7-(789)867-9023
Berry,Janice,86-(614)973-1727
Hudson,Andrea,86-(918)527-6371
Mcdonald,Arthur,420-(553)779-7783
Morales,Kathy,351-(720)541-2124
...,...,...
Henderson,Arthur,81-(353)751-4060
Riley,Christina,93-(374)749-5085
Green,Nicholas,86-(750)462-3375
Peters,Debra,86-(879)987-9025


#### A comparison returns a Series of boolean values

In [36]:
# df['first_name']
df['first_name'] == "Billy"

last_name
Richardson    False
Berry         False
Hudson        False
Mcdonald      False
Morales       False
              ...  
Henderson     False
Riley         False
Green         False
Peters        False
Murray        False
Name: first_name, Length: 100, dtype: bool
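
Because the result is an ordinary boolean Series, it can be inspected before it is used as a filter. The sketch below uses a hypothetical four-name Series (`names` and `mask` are illustrative names, not part of the notebook) to count the matches and check whether any exist.

```python
import pandas as pd

# Hypothetical stand-in for the first_name column.
names = pd.Series(["Peter", "Billy", "Janice", "Billy"], name="first_name")

# Element-wise comparison: one True/False per row.
mask = names == "Billy"

print(mask.tolist())  # [False, True, False, True]
print(mask.sum())     # 2 (True counts as 1)
print(mask.any())     # True
```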

#### Using conditional statements allows for easy filtering

Using the same comparison as a mask, we get only the rows where *first_name* is Billy.

In [41]:
df[df['first_name'] == "Billy"]

last_name,id,first_name,Phone Number,Time zone
Clark,20,Billy,62-(213)345-2549,Asia/Makassar
Andrews,23,Billy,86-(859)746-5367,Asia/Chongqing
Price,59,Billy,86-(878)547-7739,Asia/Shanghai


#### With `loc` you can combine conditionals and column filters

In [42]:
df.loc[df['first_name'] == "Billy",['first_name', 'Phone Number']]

last_name,first_name,Phone Number
Clark,Billy,62-(213)345-2549
Andrews,Billy,86-(859)746-5367
Price,Billy,86-(878)547-7739


#### Multiple conditions can be set to narrow or widen the filter

This filter selects all entries where *first_name* equals "Billy" **OR** "Peter". Each condition is wrapped in parentheses because `|` binds more tightly than `==`.

In [44]:
df[(df['first_name'] == 'Billy') | (df['first_name']=='Peter')]

last_name,id,first_name,Phone Number,Time zone
Richardson,1,Peter,7-(789)867-9023,Europe/Moscow
Clark,20,Billy,62-(213)345-2549,Asia/Makassar
Andrews,23,Billy,86-(859)746-5367,Asia/Chongqing
Price,59,Billy,86-(878)547-7739,Asia/Shanghai
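
The OR filter above widens the selection. To narrow it instead, conditions can be combined with `&` (AND), and `~` negates a condition. The following is a minimal sketch on a hypothetical four-row subset of the sample data; the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical subset of the sample data, indexed by last name.
people = pd.DataFrame(
    {
        "first_name": ["Peter", "Billy", "Billy", "Billy"],
        "Time zone": ["Europe/Moscow", "Asia/Makassar",
                      "Asia/Chongqing", "Asia/Shanghai"],
    },
    index=pd.Index(["Richardson", "Clark", "Andrews", "Price"],
                   name="last_name"),
)

is_billy = people["first_name"] == "Billy"
in_shanghai = people["Time zone"] == "Asia/Shanghai"

# AND: both conditions must hold for a row to be kept.
both = people[is_billy & in_shanghai]
print(list(both.index))       # ['Price']

# NOT: keep the rows where the condition is False.
not_billy = people[~is_billy]
print(list(not_billy.index))  # ['Richardson']
```

Naming each mask before combining them keeps the filter readable and avoids the nested parentheses needed when the comparisons are written inline.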


#### Use `isin` to match any value in a list

In [45]:
name = ['Billy', 'Peter', 'Arthur']
df[df['first_name'].isin(name)]

last_name,id,first_name,Phone Number,Time zone
Richardson,1,Peter,7-(789)867-9023,Europe/Moscow
Mcdonald,4,Arthur,420-(553)779-7783,Europe/Prague
Clark,20,Billy,62-(213)345-2549,Asia/Makassar
Andrews,23,Billy,86-(859)746-5367,Asia/Chongqing
Price,59,Billy,86-(878)547-7739,Asia/Shanghai
Franklin,82,Arthur,86-(599)522-0287,Asia/Chongqing
Henderson,96,Arthur,81-(353)751-4060,Asia/Tokyo
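
`isin` can be read as shorthand for a chain of `==` comparisons joined with `|`. The sketch below checks that equivalence on a hypothetical four-name Series (`first_names` and `wanted` are illustrative names).

```python
import pandas as pd

# Hypothetical stand-in for the first_name column.
first_names = pd.Series(["Peter", "Arthur", "Billy", "Kathy"])
wanted = ["Billy", "Peter", "Arthur"]

# One call to isin...
with_isin = first_names.isin(wanted)

# ...produces the same mask as chaining comparisons with OR.
with_or = (
    (first_names == "Billy")
    | (first_names == "Peter")
    | (first_names == "Arthur")
)

print(with_isin.tolist())         # [True, True, True, False]
print(with_isin.equals(with_or))  # True
```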
