# Selecting data from a DataFrame

Pandas DataFrames offer two indexers, `loc` and `iloc`, that let you select specific rows and columns of data.

`loc` allows you to select data using label-based indices. In other words, you refer to rows by their index labels (for example, strings such as last names) and to columns by name.

`iloc` is similar, except that it selects rows and columns by integer position. This is much like how you would select items from a list, using position-based indices.
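
To make the difference concrete, here is a minimal sketch on a small made-up DataFrame (the `toy` data and its integer labels are assumptions for illustration, not part of the sample file). The labels deliberately differ from the positions, so `loc` and `iloc` need different arguments to reach the same cell.

```python
import pandas as pd

# Hypothetical toy data; the integer labels (10, 20, 30) are not the positions (0, 1, 2).
toy = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27]},
    index=[10, 20, 30],
)

by_label = toy.loc[20, "name"]   # row labelled 20, column named "name"
by_position = toy.iloc[1, 0]     # second row, first column

print(by_label, by_position)
```

Both lookups return the same value here, but `toy.loc[1, "name"]` would raise a `KeyError` because no row carries the label `1`.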


#### Import Dependencies

In [None]:
import pandas as pd
import os

#### Set the path to your file

In [None]:
file = os.path.join("..", "Resources","sampleData.csv")

#### Read the csv

In [None]:
df_original = pd.read_csv(file)
df_original.head(10)

#### Set new index to *last_name*

`set_index()` lets you use one of the existing columns in your DataFrame as the index. By default it returns a new DataFrame and leaves the original unchanged.

**NOTE:** Indices in a DataFrame do not have to be unique.

In [None]:
df = df_original.set_index("last_name")
df.head()
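
As a rough illustration of the note about non-unique indices, the sketch below builds a tiny made-up table (the `people` data is an assumption, not the contents of the sample file) and checks whether the new index is unique.

```python
import pandas as pd

# Hypothetical data with a repeated last name.
people = pd.DataFrame({
    "last_name": ["Berry", "Hudson", "Berry"],
    "first_name": ["Billy", "Peter", "Jane"],
})

# set_index returns a new DataFrame; `people` keeps its last_name column.
indexed = people.set_index("last_name")

print(indexed.index.is_unique)   # reports whether every label appears only once
print(indexed)
```

Checking `index.is_unique` before relying on `loc` is one way to find out whether a label can match more than one row.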

#### Retrieve the data contained within the *Berry* row and the *Phone Number* column

Much like the `Cells()` function in VBA, both indexers take **rows** first and **columns** second.

In [None]:
berry_phone = df.loc["Berry", "Phone Number"]
print("Using Loc: " + berry_phone)

also_berry_phone = df.iloc[1, 2]
print("Using Iloc: " + also_berry_phone)

#### Trying to select the first 5 rows of data with `loc`

Using label indices to select the first 5 rows of data may not give you exactly 5 rows. Pandas allows duplicate index labels, and `loc` returns every row that matches each label, so you may get more data than you anticipate.

In [None]:
richardson_to_morales = df.loc[["Richardson", "Berry", "Hudson", "Mcdonald", "Morales"], ["id", "first_name", "Phone Number"]]
richardson_to_morales
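
The effect of duplicate labels can be reproduced on a small scale. This is a sketch with made-up rows (the `toy` frame below is an assumption for illustration): two labels are requested, but three rows come back.

```python
import pandas as pd

# Hypothetical data: the label "Berry" appears twice in the index.
toy = pd.DataFrame(
    {"first_name": ["Billy", "Peter", "Jane", "Omar"]},
    index=["Berry", "Hudson", "Berry", "Morales"],
)

# Two labels requested, but every row matching each label is returned.
subset = toy.loc[["Berry", "Hudson"]]

print(len(subset))
print(subset)
```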

#### Selecting the first 5 rows of data with `iloc`

`iloc` selects by integer position, which is more precise because each position refers to exactly one row.

In [None]:
also_richardson_to_morales = df.iloc[0:5, 0:3]
also_richardson_to_morales
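
One detail worth keeping in mind when slicing: `iloc` slices exclude the end position, like Python lists, while `loc` label slices include the end label. The sketch below shows this on a made-up frame (the `toy` data and its single-letter labels are assumptions for illustration).

```python
import pandas as pd

# Hypothetical frame with six rows labelled "u" through "z".
toy = pd.DataFrame({"a": range(6), "b": range(6)}, index=list("uvwxyz"))

first_three = toy.iloc[0:3]   # positions 0, 1, 2 (position 3 is excluded)
u_to_w = toy.loc["u":"w"]     # labels "u", "v", "w" (end label is included)

print(first_three.index.tolist())
print(u_to_w.index.tolist())
```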

#### Select all rows for columns *first_name* and *Phone Number* using `iloc`

In [None]:
df.iloc[:, 1:3]

#### Select all rows for columns *first_name* and *Phone Number* using `loc`

In [None]:
df.loc[:, ["first_name", "Phone Number"]].head()

#### Comparing a column to a value returns a Series of boolean values

In [None]:
named_billy = df["first_name"] == "Billy"
named_billy.head()

#### Using conditional statements allows for easy filtering

Using the same logic as above, we get only the rows where *first_name* is "Billy".

In [None]:
only_billys = df[df['first_name'] == "Billy"]
only_billys
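
The two steps above, building a boolean Series and then using it as a filter, can be sketched on made-up data (the `toy` frame is an assumption for illustration, not the sample file).

```python
import pandas as pd

# Hypothetical data for illustration.
toy = pd.DataFrame({
    "first_name": ["Billy", "Peter", "Billy"],
    "id": [1, 2, 3],
})

# Step 1: the comparison yields one True/False value per row.
mask = toy["first_name"] == "Billy"

# Step 2: indexing with the mask keeps only the rows marked True.
billys = toy[mask]

print(mask.tolist())
print(billys["id"].tolist())
```

Because `True` counts as 1, `mask.sum()` gives the number of matching rows without building the filtered DataFrame.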

#### With `loc` you can combine conditionals and column filters

In [None]:
# Rows where first_name is "Billy", restricted to two columns
billys_name_and_phone = df.loc[df["first_name"] == "Billy", ["first_name", "Phone Number"]]
billys_name_and_phone

#### Multiple conditions can be set to narrow or widen the filter

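This filter selects all entries where *first_name* equals "Billy" **OR** "Peter". Each condition must be wrapped in parentheses, and the conditions are combined with `|` (or) or `&` (and) rather than the Python keywords `or` and `and`.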

In [None]:
only_billy_and_peter = df[(df["first_name"] == "Billy") | (df["first_name"] == "Peter")]
only_billy_and_peter
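
For comparison, here is a sketch of widening and narrowing a filter on made-up data (the `toy` frame and the `id > 1` condition are assumptions for illustration). It also shows `isin()`, which is one alternative to chaining several `|` comparisons on the same column.

```python
import pandas as pd

# Hypothetical data for illustration.
toy = pd.DataFrame({
    "first_name": ["Billy", "Peter", "Billy", "Jane"],
    "id": [1, 2, 3, 4],
})

# OR widens the filter: rows matching either condition are kept.
either = toy[(toy["first_name"] == "Billy") | (toy["first_name"] == "Peter")]

# isin() expresses the same OR over a list of allowed values.
same_with_isin = toy[toy["first_name"].isin(["Billy", "Peter"])]

# AND narrows the filter: rows must satisfy both conditions.
both = toy[(toy["first_name"] == "Billy") & (toy["id"] > 1)]

print(either["id"].tolist())
print(both["id"].tolist())
```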