# `DataFrame` Indexing and Slicing

Accessing specific rows of a `DataFrame` by their location is referred to as *indexing*.

If you are accessing a sequence of contiguous rows, this action is sometimes called *slicing*.

The purpose of this tutorial is to survey various methods for indexing and slicing in `pandas`.

### Importing Packages

Let's begin by importing the packages that we will need.

In [None]:
##> import numpy as np
##> import pandas as pd
##> import pandas_datareader as pdr




### Reading-In Data

Next, let's grab some data from Yahoo Finance.  In particular, we'll grab `SPY` price data from July 2021.

In [None]:
##> df_spy = pdr.get_data_yahoo('SPY', start='2021-06-30', end='2021-07-31')
##> df_spy = df_spy.round(2)
##> df_spy.head()




The following code resets the index so that `date` is a regular column; it also puts the column names into snake-case.

In [None]:
##> df_spy.reset_index(inplace=True)
##> df_spy.columns = df_spy.columns.str.lower().str.replace(' ', '_')
##> df_spy.head()
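
As a small illustration of the column renaming step, here is a sketch that does not need a data download. The frame `df_demo` and its column labels are made up for this example; only the `.str.lower().str.replace(' ', '_')` chain comes from the cell above.

```python
import pandas as pd

# Hypothetical column labels, similar in style to those of a price table.
df_demo = pd.DataFrame(columns=['Date', 'High', 'Adj Close'])

# The .str accessor applies a string method to every label in the columns index.
df_demo.columns = df_demo.columns.str.lower().str.replace(' ', '_')

print(list(df_demo.columns))
```

Each label is lower-cased first, and then every space is replaced with an underscore.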




It is often useful to look at the data type of each of the columns of a new data set.  We can do so with the `DataFrame.dtypes` attribute.

In [None]:
##> df_spy.dtypes



### Row Slicing

The simplest way to slice a `DataFrame` is to use square brackets: `[]`.  The syntax `df[i:j]` will generate a `DataFrame` whose first row is the `i`th row of `df` and whose last row is the `(j-1)`th row of `df`.   Let's demonstrate this with some examples:

Starting from the 0th row, and ending with the 0th row:

In [None]:
##> df_spy[0:1]



Starting with the 3rd row, and ending with the 6th row:

In [None]:
##> df_spy[3:7]
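
The same rule can be checked on a tiny frame without downloading anything. This is only a sketch: `df_demo` and its values are made up, and the point is that `df[i:j]` returns `j - i` rows, namely rows `i` through `j - 1`.

```python
import pandas as pd

# Toy data (made-up values) with the default 0, 1, 2, ... row index.
df_demo = pd.DataFrame({'close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})

i, j = 1, 4
df_slice = df_demo[i:j]

print(len(df_slice))            # number of rows is j - i
print(list(df_slice['close']))  # values from rows i through j - 1
```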



**Code Challenge:** Retrieve the 15th, 16th, and 17th rows of `df_spy`.

Using the syntax `df[:n]` automatically starts the indexing at `0`.  For example, the following code retrieves all of `df_spy` (notice that `len(df_spy)` gives the number of rows of `df_spy`):

In [None]:
##> df_spy[:len(df_spy)]



**Code Challenge:** Retrieve the first five rows of `df_spy`.

A couple of row-slicing tricks that involve negative numbers are worth mentioning.

The syntax `df[-n:]` retrieves the last `n` rows of `df`.  The following code retrieves the last five rows of `df_spy`.

In [None]:
##> df_spy[-5:]



The syntax `df[:-n]` retrieves all but the last `n` rows of `df`.  The following code retrieves all but the last 10 rows of `df_spy`:

In [None]:
##> df_spy[:-10]
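
One way to sanity-check the two negative-number slices is to compare them with `DataFrame.tail()` and `DataFrame.head()`. The sketch below uses a made-up frame called `df_demo`; neither it nor its values come from the `SPY` data.

```python
import pandas as pd

# Toy data (made-up values).
df_demo = pd.DataFrame({'close': [10.0, 11.0, 12.0, 13.0, 14.0]})

n = 2
last_n = df_demo[-n:]          # the last n rows
all_but_last_n = df_demo[:-n]  # everything except the last n rows

# Both comparisons should print True.
print(last_n.equals(df_demo.tail(n)))
print(all_but_last_n.equals(df_demo.head(len(df_demo) - n)))
```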



**Code Challenge:** Retrieve the first row of `df_spy` with negative indexing.

**Code Challenge:** Use simple slicing to select the last three rows of `df_spy` without explicitly using row numbers.

### `DataFrame` Indexes

Under the hood, a `DataFrame` has several `indexes`:

`columns` - the set of column names is an (explicit) index.

`row` - whenever a `DataFrame` is created, an explicit row index is created along with it.  If one isn't specified, then a sequence of non-negative integers is used.

`implicit` - each row has an implicit row-number, and each column has an implicit column-number.

Let's take a look at the `columns` index of `df_spy`:

In [None]:
##> df_spy.columns



In [None]:
##> type(df_spy.columns)



Next, let's take a look at the explicit row `index` attribute of `df_spy`:

In [None]:
##> df_spy.index



In [None]:
##> type(df_spy.index)



Since we reset the index for `df_spy`, a `RangeIndex` object is used for the explicit row `index`.  You can think of a `RangeIndex` object as a glorified set of consecutive integers.
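
To see the default row index in isolation, here is a minimal sketch on a made-up frame (`df_demo` is hypothetical). When no index is supplied, `pandas` uses a `RangeIndex` that starts at `0`.

```python
import pandas as pd

# Toy data (made-up values); no index is passed, so pandas supplies one.
df_demo = pd.DataFrame({'close': [10.0, 11.0, 12.0]})

print(type(df_demo.index).__name__)  # the class of the explicit row index
print(list(df_demo.index))           # the row labels
print(list(df_demo.columns))         # the column labels
```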

For the most part, we won't be too concerned with `indexes`.  A lot of data analysis can be done without worrying about them.  However, it's good to be aware that `indexes` exist because they can come into play for more advanced topics, such as joining tables together; they also come up frequently in Stack Overflow examples.

For the purposes of this tutorial, our interest in `indexes` comes from how they are related to two built-in `DataFrame` *indexers*: `DataFrame.iloc` and `DataFrame.loc`.

### Indexing with `DataFrame.iloc`

The indexer `DataFrame.iloc` can be used to access rows and columns using their implicit row and column numbers.

Here is an example of `iloc` that retrieves the first two rows of `df_spy`:

In [None]:
##> df_spy.iloc[0:2,]



Notice that because we didn't specify any column numbers, the code above retrieves all columns.

The following code grabs the first three rows and the first three columns of `df_spy`:

In [None]:
##> df_spy.iloc[0:3, 0:3]



We can also supply `.iloc` with `lists` rather than slices to specify custom sets of columns and rows:

In [None]:
##> lst_row = [0, 2] # 0th and 2nd row
##> lst_col = [0, 6] # date and adj_close columns
##> df_spy.iloc[lst_row, lst_col]

Using `lists` as a means of indexing is sometimes referred to as *fancy indexing*.
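
A detail worth noticing about fancy indexing is that the result follows the order of the lists, not the order of the original table. The sketch below illustrates this on a made-up frame (`df_demo` and its values are hypothetical).

```python
import pandas as pd

# Toy data (made-up values).
df_demo = pd.DataFrame({
    'open': [1.0, 2.0, 3.0, 4.0],
    'close': [1.5, 2.5, 3.5, 4.5],
    'volume': [100, 200, 300, 400],
})

# Rows 3 and 0 (in that order), columns 2 and 0 (in that order).
df_pick = df_demo.iloc[[3, 0], [2, 0]]

print(df_pick)
```

The row labels of `df_pick` come out as `3` then `0`, and its columns as `volume` then `open`, mirroring the lists that were passed in.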

**Code Challenge:** Use fancy indexing to grab the 14th, 0th, and 5th rows of `df_spy` - in that order.

### Indexing with `DataFrame.loc`

Rather than using the implicit row or column numbers, it is often more useful to access data by using the explicit row or column indices.

Let's use the `DataFrame.set_index()` method to set the `date` column as our new index.  The dates make a more interesting explicit index than a sequence of consecutive integers.

In [None]:
##> df_spy.set_index('date', inplace = True)
##> df_spy.head()




To see the effect of the above code, we can have a look at the `index` of `df_spy`.

In [None]:
##> df_spy.index



And notice that `date` is no longer a column of `df_spy`:

In [None]:
##> df_spy.columns
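
The round trip between a regular column and the row index can be sketched on a small made-up frame. Here `df_demo` is hypothetical, and the calls are made without `inplace=True`, so each one returns a new `DataFrame` and leaves the original untouched.

```python
import pandas as pd

# Toy data (made-up values).
df_demo = pd.DataFrame({
    'date': pd.to_datetime(['2021-07-01', '2021-07-02']),
    'close': [10.0, 11.0],
})

df_indexed = df_demo.set_index('date')  # 'date' moves from the columns to the index
df_restored = df_indexed.reset_index()  # 'date' moves back to a regular column

print(list(df_indexed.columns))
print(df_indexed.index.name)
print(list(df_restored.columns))
```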



Now that we have successfully set the row `index` of `df_spy` to be the `date`, let's see how we can use this `index` to access the data via `.loc`.
        
Here is an example of how we can grab a slice of rows, associated with a date-range:

In [None]:
##> df_spy.loc['2021-07-23':'2021-07-31']



If we want to select only the `volume` and `adj_close` columns for these dates, we would type the following:

In [None]:
##> df_spy.loc['2021-07-23':'2021-07-31', ['volume', 'adj_close']]
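
One difference from the positional slices seen earlier is that a label slice with `.loc` includes both endpoints, while a positional slice with `.iloc` or `[]` excludes its stop position. The sketch below shows this on a made-up frame (`df_demo`, its dates, and its values are hypothetical).

```python
import pandas as pd

# Toy data (made-up values) indexed by a handful of dates.
dates = pd.to_datetime(['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07'])
df_demo = pd.DataFrame(
    {'close': [10.0, 11.0, 12.0, 13.0], 'volume': [100, 200, 300, 400]},
    index=dates,
)
df_demo.index.name = 'date'

# Label-based slice: both '2021-07-02' and '2021-07-06' are included.
by_label = df_demo.loc['2021-07-02':'2021-07-06', ['volume', 'close']]

# Position-based slice: the stop position 2 is excluded.
by_position = df_demo.iloc[1:2]

print(len(by_label))
print(len(by_position))
```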



**Code Challenge:** Use `.loc` to grab the `volume` and `close` columns from `df_spy`.  Since `date` is now the row index rather than a column, the dates will come along automatically.

## Related Reading

*PDSH* - 2.7 - Fancy Indexing

*PDSH* - 3.2 - Data Indexing and Selection 