# What are `loc` and `iloc`?

`loc` and `iloc` are properties of a pandas `DataFrame` that each return a specialized indexer object. These indexer objects provide label-based (`loc`) and integer-position-based (`iloc`) indexing. Although you use them to perform an action (selecting or slicing data), they are not methods: accessing the property returns an indexer object, and the square brackets that follow are applied to that object.

When you access `loc` or `iloc` on a DataFrame, what you're getting is not a simple value but an object that has its own methods and behaviors, specifically designed to handle indexing operations. This design allows for the concise and powerful data selection syntax that pandas is known for.
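
To make the "property that returns an indexer object" idea concrete, here is a minimal sketch. The three-row DataFrame and the variable names are made up for illustration and are not part of the Titanic data.

```python
import pandas as pd

# Hypothetical three-row DataFrame, used only for illustration.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Age": [22, 38, 54]},
    index=["a", "b", "c"],
)

# 'loc' and 'iloc' are defined as properties on the DataFrame class.
loc_is_property = isinstance(pd.DataFrame.loc, property)
iloc_is_property = isinstance(pd.DataFrame.iloc, property)

# Accessing the property returns an indexer object;
# the square brackets are then applied to that object.
loc_indexer = df.loc
by_label = loc_indexer["b", "Age"]   # row labelled 'b', column 'Age'
by_position = df.iloc[1, 1]          # second row, second column

print(type(loc_indexer).__name__, type(df.iloc).__name__)
print(by_label, by_position)
```

Both selections pick the same cell here, once by label and once by position.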

## When to use `loc` or `iloc`

1. **Positional Selection**: When the selection is purely positional (selecting the first five rows, or columns 2 to 4), `iloc` is the most straightforward tool for the job.

2. **Inclusive Slicing**: `loc` slices by label and includes both the start label and the end label, which makes it easy to specify row and column ranges by name.

3. **Label-based Complex Indexing**: For complex indexing scenarios where you need to select rows and columns based on their labels, `loc` is indispensable. It allows for intuitive and concise syntax that boolean masks cannot provide alone.

4. **Efficiency and Readability**: When a boolean mask followed by a separate column selection becomes verbose, a single `loc` or `iloc` call that selects rows and columns together is often more readable and can avoid creating an intermediate result.

In summary, while boolean masks are powerful for filtering rows based on conditions, `loc` and `iloc` are essential for a wide range of other data selection tasks, particularly those involving specific column selection, positional indexing, and complex indexing scenarios that go beyond what boolean masks can achieve on their own.
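
The four cases listed above can be sketched on a small example. The data, the string row labels (`r0` to `r4`), and the variable names are assumptions chosen so that labels and positions are easy to tell apart.

```python
import pandas as pd

# Hypothetical data; the string index keeps labels and positions distinct.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
        "Pclass": [1, 3, 2, 1, 3],
        "Age": [22, 38, 54, 41, 19],
    },
    index=["r0", "r1", "r2", "r3", "r4"],
)

# 1. Positional selection: first two rows, columns at positions 1 and 2.
first_rows = df.iloc[:2, 1:3]

# 2. Inclusive label slicing: both end labels are part of the result.
label_slice = df.loc["r1":"r3", "Name":"Pclass"]

# 3. Label-based selection of specific rows and columns in one step.
picked = df.loc[["r0", "r4"], ["Name", "Age"]]

# 4. Boolean mask and column selection combined in a single loc call.
over_35 = df.loc[df["Age"] > 35, ["Name", "Pclass"]]

print(first_rows, label_slice, picked, over_35, sep="\n\n")
```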

# Import Pandas

In [None]:
# This line imports the pandas library and aliases it as 'pd'.
# Aliasing pandas as 'pd' is a widely adopted convention that simplifies the syntax for accessing its functionalities.
# After this statement, you can use 'pd' to access all the functionalities provided by the pandas library.

import pandas as pd

# Creating a `DataFrame` from a csv file

In [None]:
# Load the Titanic dataset from a CSV file into a DataFrame named 'titanic'.
# The 'pd.read_csv()' function is used to read the data from the file 'data/titanic.csv'.
# The file is located in the 'data' directory, relative to the current working directory.
# The resulting DataFrame 'titanic' contains the dataset, ready for analysis and manipulation.

titanic = pd.read_csv('data/titanic.csv')

In [None]:
# Display the DataFrame 'titanic'.
# Note: even though only the first and last five rows are displayed, the whole DataFrame has been read into the kernel's memory.
# Showing fewer rows with the 'head()' method shortens the output but does not reduce memory usage.
# Memory is only a concern with very large datasets, so don't worry too much about it for now.
# You can find out how much memory a DataFrame uses by using the 'memory_usage()' method:
# titanic.memory_usage(deep=True).sum()

titanic

# How to select specific rows and columns from a `DataFrame` using `loc` and `iloc`

![How to select specific rows and columns from a DataFrame](images/03_subset_columns_rows.svg)

### We have seen how we can filter rows in the titanic `DataFrame`

In [None]:
# Use boolean indexing to keep the rows of 'titanic' where 'Age' is greater than 35,
# then select only the 'Name' and 'Pclass' columns with double square brackets.
# These are two chained selections: first the rows, then the columns.

titanic[titanic['Age'] > 35][['Name', 'Pclass']]

## When using `loc`/`iloc`
* ### the part before the comma is the *rows* you want
* ### and the part after the comma is the *columns* you want to select
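
As a small illustration of the rows-before-comma, columns-after-comma rule, the sketch below uses a made-up three-row DataFrame. A bare `:` means "everything" on that side of the comma, and leaving out the comma selects rows only.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid"], "Pclass": [1, 3, 2], "Age": [22, 38, 54]}
)

# All rows (':' before the comma), two columns by label.
all_rows_two_cols = df.loc[:, ["Name", "Age"]]

# Two rows by position, all columns (':' after the comma).
two_rows_all_cols = df.iloc[[0, 2], :]

# Omitting the comma selects rows only and keeps every column.
rows_only = df.iloc[[0, 2]]

print(all_rows_two_cols, two_rows_all_cols, sep="\n\n")
```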

### We can perform the same filtering using label-based indexing: `loc`

In [None]:
# Use the 'loc' indexer to select rows where the age is greater than 35,
# and then select only the 'Name' and 'Pclass' columns for these rows.
# This command provides a more explicit way of selecting rows and columns based on labels,
# where the first argument specifies the row selection condition ('Age' > 35),
# and the second argument specifies the column selection ('Name' and 'Pclass').

titanic.loc[titanic['Age'] > 35, ['Name', 'Pclass']]

### We can also select by integer position: `iloc`

In [None]:
# Use iloc to select rows at positions 9 up to (but not including) 25
# and columns at positions 2 up to (but not including) 6.
# Positions start at 0, so this selects the 10th through 25th rows
# and the 3rd through 6th columns of the 'titanic' DataFrame.

titanic.iloc[9:25, 2:6]

### We can rewrite the statement above using `loc` instead.

In [None]:
# '9:24' specifies the rows from index label 9 to 24 (inclusive).
# Note that with 'loc', the slicing is inclusive on both ends!
# The labels match the positions here only because 'titanic' has the default integer index (0, 1, 2, ...).
# ['Pclass', 'Name', 'Sex', 'Age'] explicitly lists the column names corresponding to the integer positions 2 to 5 (inclusive).

titanic.loc[9:24, ['Pclass', 'Name', 'Sex', 'Age']]
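
The difference between the two slicing rules can be checked on a small stand-in DataFrame. The 30-row frame below is hypothetical and only mimics the default integer index of `titanic`. It also shows what happens once labels and positions no longer coincide, for example after filtering.

```python
import pandas as pd

# Hypothetical stand-in: 30 rows with the default index 0..29.
df = pd.DataFrame({"Pclass": [1, 2, 3] * 10, "Age": range(20, 50)})

by_position = df.iloc[9:25]   # positions 9..24, end excluded
by_label = df.loc[9:24]       # labels 9..24, end included
same_with_default_index = by_position.equals(by_label)

# After filtering, the remaining rows keep their original labels (10..29),
# so labels and positions no longer line up.
older = df[df["Age"] >= 30]
sliced_by_position = older.iloc[10:12]   # 11th and 12th remaining rows
sliced_by_label = older.loc[10:12]       # rows labelled 10, 11 and 12

print(same_with_default_index)
print(list(sliced_by_position.index), list(sliced_by_label.index))
```

With the default index the two slices agree. After filtering, the same numbers select different rows.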

## `iloc` and `loc` can be combined

In [None]:
# Use iloc to select rows at positions 9 up to (but not including) 25 and columns at positions 2 up to (but not including) 6,
# then further filter these rows using loc, keeping only those where 'Age' is greater than 35.
# The boolean mask is computed on the full DataFrame; loc matches it to the subset by index label.

titanic.iloc[9:25, 2:6].loc[titanic['Age'] > 35]
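
An alternative way to write this kind of combined selection is to store the positional subset first and build the mask from that subset, so the mask and the data always share the same index. The sketch below uses made-up data and hypothetical variable names (`subset`, `mask_from_subset`), not the Titanic file.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "Age": [22, 38, 54, 41, 19, 36],
        "Pclass": [1, 3, 2, 1, 3, 2],
    }
)

# Step 1: positional subset (rows at positions 1..4, columns at positions 0..1).
subset = df.iloc[1:5, 0:2]

# Step 2: build the mask from the subset itself, then filter with loc.
mask_from_subset = subset["Age"] > 35
result = subset.loc[mask_from_subset]

print(result)
```

Splitting the selection into two named steps makes it clear which rows the condition is evaluated on.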

# More information about `loc` and `iloc`

* A short video (you can turn up the speed): \
  [https://www.youtube.com/watch?v=naRQyRZrXCE](https://www.youtube.com/watch?v=naRQyRZrXCE)

* Official documentation: \
  [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) \
  And: \
  [https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html)

* Difference between loc and iloc described: \
  [https://www.geeksforgeeks.org/difference-between-loc-and-iloc-in-pandas-dataframe/](https://www.geeksforgeeks.org/difference-between-loc-and-iloc-in-pandas-dataframe/)