# `loc` and `iloc`

`loc` and `iloc` are not methods but properties of a pandas `DataFrame` (and `Series`). Accessing one returns a specialized indexer object, which you then subscript with square brackets: `loc` selects by label and `iloc` selects by integer position. They feel like methods because you use them to perform an action (selecting or slicing data), but the selection itself is carried out by the indexer object that the property returns.

Because the indexer is an object with its own behavior, designed specifically for indexing operations, pandas can offer the concise `df.loc[rows, columns]` selection syntax it is known for.
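To make the property/indexer distinction concrete, here is a minimal sketch. It uses a hypothetical three-row frame (not the Titanic data loaded later), and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset used below.
df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

# On the class, `loc` is an ordinary Python property.
is_property = isinstance(pd.DataFrame.loc, property)

# On an instance, accessing the property returns an indexer object.
loc_indexer = df.loc
iloc_indexer = df.iloc
print(type(loc_indexer).__name__, type(iloc_indexer).__name__)

# The indexer is then subscripted with square brackets.
by_label = loc_indexer["b", "x"]    # row labelled "b", column "x"
by_position = iloc_indexer[1, 0]    # second row, first column
print(is_property, by_label, by_position)
```

Both lookups reach the same cell, once by label and once by position.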

___

```{admonition} When to use loc or iloc

1. **Positional Selection**: When the selection is purely positional (the first five rows, or columns 2 to 4), `iloc` is the most straightforward tool for the job.

2. **Inclusive Slicing**: `loc` slices by label and includes both endpoints, which makes row and column ranges easy to specify.

3. **Label-based Complex Indexing**: When you need to select rows and columns by their labels in a single step, `loc` is indispensable. It accepts a row selector (including a boolean mask) and a column selector together, which a boolean mask alone cannot do.

4. **Efficiency and Readability**: Chaining a boolean mask with a separate column selection, as in `df[mask][cols]`, performs two indexing operations. `df.loc[mask, cols]` expresses the same selection in one operation and is usually easier to read.

In summary, boolean masks are powerful for filtering rows based on conditions, while `loc` and `iloc` cover the remaining selection tasks: choosing specific columns, positional indexing, and combined row and column selection.
```
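The cases listed above can be sketched on a small frame. The data, the column names and the row labels `r1` to `r4` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame with string row labels.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],
        "age": [41, 29, 37, 52],
        "cls": [1, 3, 2, 1],
    },
    index=["r1", "r2", "r3", "r4"],
)

# 1. Positional selection: the first two rows, whatever their labels are.
first_two = df.iloc[:2]

# 2. Inclusive label slicing: both "r3" and "age" are part of the result.
label_slice = df.loc["r2":"r3", "name":"age"]

# 3./4. Boolean mask and column labels combined in one indexing step.
older = df.loc[df["age"] > 35, ["name", "cls"]]

print(first_two, label_slice, older, sep="\n\n")
```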

## Import Pandas & create a `DataFrame`

Importing pandas under the alias `pd` is a widely adopted convention that keeps the code short.\
After this statement, everything the pandas library provides is available through the `pd` prefix.

In [12]:
# This line imports the pandas library and aliases it as 'pd'.

import pandas as pd

### Create a `DataFrame` from csv

The `pd.read_csv()` function is used to read the data from the file 'data/titanic.csv'.\
The file is located in the 'data' directory, relative to the current working directory.\
The resulting `DataFrame` 'titanic' contains the dataset, ready for analysis and manipulation.

In [19]:
# Load the Titanic dataset from a CSV file into a DataFrame named 'titanic'.

titanic = pd.read_csv('data/titanic.csv')
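If the CSV file is not at hand, the same call can be tried on an in-memory string. This is only a sketch: the three rows below are invented and are not taken from `data/titanic.csv`. It also shows that, without an `index_col` argument, `pd.read_csv()` labels the rows 0, 1, 2, and so on.

```python
import io

import pandas as pd

# Invented three-row sample; the names and values are placeholders.
csv_text = """PassengerId,Pclass,Name,Age
1,3,"Doe, Mr. John",22
2,1,"Roe, Mrs. Jane",38
3,3,"Poe, Miss. Ada",26
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.index)  # default integer row labels starting at 0
```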

___

## Select rows and columns using `loc` and `iloc`

![How to select specific rows and columns from a DataFrame](images/03_subset_columns_rows.svg)

We have seen how we can filter rows in the `titanic` `DataFrame`.

The code below chains two indexing operations:

* a boolean mask filters the rows where 'Age' is greater than 35,
* double square brackets then select only the 'Name' and 'Pclass' columns for these rows.

In [21]:
titanic[titanic['Age'] > 35][['Name', 'Pclass']]

| | Name | Pclass |
|---|---|---|
| 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 1 |
| 6 | McCarthy, Mr. Timothy J | 1 |
| 11 | Bonnell, Miss. Elizabeth | 1 |
| 13 | Andersson, Mr. Anders Johan | 3 |
| 15 | Hewlett, Mrs. (Mary D Kingcome) | 2 |
| ... | ... | ... |
| 865 | Bystrom, Mrs. (Karolina) | 2 |
| 871 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | 1 |
| 873 | Vander Cruyssen, Mr. Victor | 3 |
| 879 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | 1 |


```{admonition} When using loc/iloc
:class: tip
The part *before the comma* is the *rows* you want

The part *after the comma* is the *columns* you want to select
```
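As a small illustration of the rule in the tip, the sketch below uses a hypothetical three-row frame. A bare `:` on either side of the comma means "everything" along that axis.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the tutorial.
df = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Pclass": [1, 2, 3], "Age": [40.0, 20.0, 36.0]}
)

# Before the comma: rows. After the comma: columns.
all_rows_one_col = df.loc[:, "Name"]   # every row, one column -> Series
one_row_all_cols = df.iloc[0, :]       # first row, every column -> Series
block = df.iloc[0:2, 1:3]              # rows 0-1, columns 1-2 -> DataFrame

print(block)
```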

We can perform the same filtering using label-based indexing: `loc`.

The following code provides a more explicit way of selecting rows and columns based on labels, where
* the first argument specifies the row selection condition ('Age' > 35),
* and the second argument specifies the column selection ('Name' and 'Pclass').

In [31]:
titanic.loc[titanic['Age'] > 35, ['Name', 'Pclass']]

| | Name | Pclass |
|---|---|---|
| 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 1 |
| 6 | McCarthy, Mr. Timothy J | 1 |
| 11 | Bonnell, Miss. Elizabeth | 1 |
| 13 | Andersson, Mr. Anders Johan | 3 |
| 15 | Hewlett, Mrs. (Mary D Kingcome) | 2 |
| ... | ... | ... |
| 865 | Bystrom, Mrs. (Karolina) | 2 |
| 871 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | 1 |
| 873 | Vander Cruyssen, Mr. Victor | 3 |
| 879 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | 1 |
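A quick way to convince yourself that the chained version and the `loc` version select the same data is to compare them on a small frame. The frame below is a made-up stand-in for `titanic`; it includes one missing age to show that such rows are excluded, because a comparison with a missing value evaluates to `False`.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data, including one missing age.
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Pclass": [1, 3, 2, 3],
        "Age": [38.0, 22.0, 54.0, None],
    }
)

mask = df["Age"] > 35  # boolean Series, False where Age is missing

chained = df[mask][["Name", "Pclass"]]     # two indexing operations
single = df.loc[mask, ["Name", "Pclass"]]  # one indexing operation

same = chained.equals(single)
print(same)
print(single)
```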


We can also select by integer position with `iloc`.

The code below selects a block of rows and columns from the `titanic` `DataFrame` by position.\
Positions start at 0 and the end of a slice is exclusive, so `9:25` covers positions 9 through 24 (the 10th to the 25th row) and `2:6` covers positions 2 through 5 (the 3rd to the 6th column).

In [35]:
titanic.iloc[9:25, 2:6]

| | Pclass | Name | Sex | Age |
|---|---|---|---|---|
| 9 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 |
| 10 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 |
| 11 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 |
| 12 | 3 | Saundercock, Mr. William Henry | male | 20.0 |
| 13 | 3 | Andersson, Mr. Anders Johan | male | 39.0 |
| 14 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 |
| 15 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 |
| 16 | 3 | Rice, Master. Eugene | male | 2.0 |
| 17 | 2 | Williams, Mr. Charles Eugene | male | NaN |
| 18 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 |
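The exclusive end of an `iloc` slice can be checked by counting what comes back. The sketch below assumes a hypothetical frame with 30 rows and 8 columns named `c0` to `c7`; only the shape matters here, not the values.

```python
import pandas as pd

# Hypothetical frame: 30 rows, 8 columns named c0..c7.
df = pd.DataFrame({f"c{j}": range(30) for j in range(8)})

subset = df.iloc[9:25, 2:6]

# 25 - 9 = 16 rows and 6 - 2 = 4 columns: the end positions are left out.
n_rows, n_cols = subset.shape
first_label, last_label = subset.index[0], subset.index[-1]

print(n_rows, n_cols, first_label, last_label, list(subset.columns))
```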


We can rewrite the statement above using `loc` instead.

**Note that with `loc` the slicing is inclusive on both ends!**

`['Pclass', 'Name', 'Sex', 'Age']` explicitly lists the column names found at integer positions 2 to 5 (inclusive).

In [7]:
titanic.loc[9:24, ['Pclass', 'Name', 'Sex', 'Age']]

| | Pclass | Name | Sex | Age |
|---|---|---|---|---|
| 9 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 |
| 10 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 |
| 11 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 |
| 12 | 3 | Saundercock, Mr. William Henry | male | 20.0 |
| 13 | 3 | Andersson, Mr. Anders Johan | male | 39.0 |
| 14 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 |
| 15 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 |
| 16 | 3 | Rice, Master. Eugene | male | 2.0 |
| 17 | 2 | Williams, Mr. Charles Eugene | male | NaN |
| 18 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 |


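`9:24` selects the rows with index labels 9 to 24 (inclusive). These labels match the integer positions only because `titanic` carries the default integer index that `pd.read_csv()` assigns when no index column is specified.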
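The equivalence between `iloc[9:25]` and `loc[9:24]` depends on labels and positions lining up. The following sketch, on a hypothetical 30-row frame, shows the two agreeing on a default index and diverging once some rows have been filtered out.

```python
import pandas as pd

# Hypothetical frame with the default labels 0..29.
df = pd.DataFrame({"value": range(100, 130)})

by_position = df.iloc[9:25]  # positions 9..24 (end exclusive)
by_label = df.loc[9:24]      # labels 9..24 (end inclusive)
same_on_default_index = by_position.equals(by_label)

# After filtering, labels and positions no longer line up.
evens = df[df.index % 2 == 0]  # keeps labels 0, 2, 4, ..., 28 (15 rows)

pos_labels = evens.iloc[9:25].index.tolist()  # positions 9..14 of 15 rows
lab_labels = evens.loc[9:24].index.tolist()   # labels between 9 and 24

print(same_on_default_index)
print(pos_labels)
print(lab_labels)
```

On the filtered frame the two slices return different rows, so it matters whether you mean "the 10th row" or "the row labelled 9".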


___

## Combining `loc` and `iloc`

We can use `iloc` to select
* rows from index position 9 to 25 (exclusive)
* and columns from index position 2 to 6 (exclusive)

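Then further filter these rows using `loc` based on the condition where
* 'Age' is greater than 35

The boolean mask `titanic['Age'] > 35` is computed on the full `DataFrame`; `loc` aligns it with the selected rows by index label.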

In [8]:
titanic.iloc[9:25, 2:6].loc[titanic['Age'] > 35]

| | Pclass | Name | Sex | Age |
|---|---|---|---|---|
| 11 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 |
| 13 | 3 | Andersson, Mr. Anders Johan | male | 39.0 |
| 15 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 |
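The chained call above passes a mask built from the full `DataFrame` to a smaller subset. The sketch below, on a hypothetical six-row frame, compares that with a mask built from the subset itself; the second form avoids relying on index alignment and may be easier to reason about.

```python
import pandas as pd

# Hypothetical toy data with one missing age.
df = pd.DataFrame(
    {"Name": list("ABCDEF"), "Age": [50.0, 20.0, 40.0, None, 36.0, 60.0]}
)

subset = df.iloc[1:5]  # positions 1..4, labels 1..4

# Mask from the full frame (6 entries); loc aligns it to the subset's labels.
via_full_mask = subset.loc[df["Age"] > 35]

# Mask from the subset itself (4 entries); no alignment needed.
via_own_mask = subset.loc[subset["Age"] > 35]

same = via_full_mask.equals(via_own_mask)
print(same)
print(via_own_mask)
```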


___

## More information about `loc` and `iloc`

* [A short video (you can turn up the speed)](https://www.youtube.com/watch?v=naRQyRZrXCE)

* [Official documentation loc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)

* [Official documentation iloc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html)


* [Difference between `loc` and `iloc` described](https://www.geeksforgeeks.org/difference-between-loc-and-iloc-in-pandas-dataframe/)