# __Slicing and Indexing DataFrames__

# Outline
- [1 Explicit Indexes](#exp-ind)
- [&nbsp;&nbsp; 1.1 Setting and removing indexes](#set-rm-ind)
- [&nbsp;&nbsp; 1.2 Subsetting with .loc[]](#sub-loc)
- [&nbsp;&nbsp; 1.3 Setting multi-level indexes](#sub-multi-lvl-ind)
- [&nbsp;&nbsp; 1.4 Sorting by index values](#sortby-ind-val)
- [2 Slicing and subsetting with .loc and .iloc](#slice-sub-loc-iloc)
- [&nbsp;&nbsp; 2.1 Slicing index values](#slice-ind-vals)
- [&nbsp;&nbsp; 2.2 Slicing in both directions](#slice-directions)
- [&nbsp;&nbsp; 2.3 Slicing time series](#slice-time)
- [&nbsp;&nbsp; 2.4 Subsetting by row/column number](#sub-row-col)
- [3 Working with pivot tables](#pvt-tbl)
- [&nbsp;&nbsp; 3.1 Pivot temperatures by city and year](#pvt-temp)
- [&nbsp;&nbsp; 3.2 Subsetting pivot tables](#sub-pvt-tbl)
- [&nbsp;&nbsp; 3.3 Calculating on a pivot table](#calc-pvt-tbl)

<a id="exp-ind"></a>
# 1 Explicit Indexes
<a id="set-rm-ind"></a>
## 1.1 Setting and removing indexes
pandas allows you to designate columns as an index. This enables cleaner code when taking subsets (as well as providing more efficient lookup under some circumstances).

In this chapter, you'll be exploring temperatures, a DataFrame of average temperatures in cities around the world.

In [None]:
import pandas as pd

temperatures = pd.read_csv("./../../data/temperatures.csv", index_col=0)

Set the index of temperatures to "city", assigning to temperatures_ind.

In [None]:
temperatures_ind = temperatures.set_index("city")
temperatures_ind

Look at temperatures_ind. How is it different from temperatures?

Reset the index of temperatures_ind, keeping its contents.

In [None]:
temperatures_ind.reset_index()

Reset the index of temperatures_ind, dropping its contents.

In [None]:
temperatures_ind.reset_index(drop=True)

<a id="sub-loc"></a>
## 1.2 Subsetting with .loc[]
The killer feature for indexes is .loc[]: a label-based indexer that accepts index values. When you pass it a single argument, it takes a subset of rows.

The code for subsetting using .loc[] can be easier to read than standard square bracket subsetting, which can make your code less burdensome to maintain.

Create a list called cities that contains "Moscow" and "Saint Petersburg".

In [None]:
cities = ["Moscow", "Saint Petersburg"]

Use [] subsetting to filter temperatures for rows where the city column takes a value in the cities list.

In [None]:
temperatures[temperatures["city"].isin(cities)]

Use .loc[] subsetting to filter temperatures_ind for rows where the city is in the cities list.

In [None]:
temperatures_ind.loc[cities]

<a id="sub-multi-lvl-ind"></a>
## 1.3 Setting multi-level indexes
Indexes can also be made out of multiple columns, forming a multi-level index (sometimes called a hierarchical index). There is a trade-off to using these.

The benefit is that multi-level indexes make it more natural to reason about nested categorical variables. For example, a clinical trial might have control and treatment groups. Each test subject belongs to one group or the other, so we can say that a test subject is nested inside its group. Similarly, in the temperature dataset, each city is located in a country, so we can say that a city is nested inside its country.

The main downside is that the code for manipulating indexes is different from the code for manipulating columns, so you have to learn two syntaxes and keep track of how your data is represented.

Set the index of temperatures to the "country" and "city" columns, and assign this to temperatures_ind.

Specify two country/city pairs to keep: "Brazil"/"Rio De Janeiro" and "Pakistan"/"Lahore", assigning to rows_to_keep.

Subset temperatures_ind for rows_to_keep using .loc[] and print the result.
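
The three steps above can be sketched as follows. Because the CSV file is not reproduced here, this example builds a small stand-in DataFrame: only the "country" and "city" column names come from the text, while the avg_temp_c column and all of its values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for temperatures.csv.
# "country" and "city" are named in the text; avg_temp_c and its values are invented.
temperatures = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Pakistan", "Pakistan"],
    "city": ["Rio De Janeiro", "Salvador", "Lahore", "Karachi"],
    "avg_temp_c": [25.0, 26.5, 24.0, 27.0],
})

# Index by country (outer level) and city (inner level)
temperatures_ind = temperatures.set_index(["country", "city"])

# Each tuple is one (country, city) pair to keep
rows_to_keep = [("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]

# With a multi-level index, .loc[] accepts a list of tuples
subset = temperatures_ind.loc[rows_to_keep]
print(subset)
```

Each tuple must list its values in the same order as the index levels, here country first and city second.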

<a id="sortby-ind-val"></a>
## 1.4 Sorting by index values
Previously, you changed the order of the rows in a DataFrame by calling .sort_values(). It's also useful to be able to sort by elements in the index. For this, you need to use .sort_index().

Sort temperatures_ind by the index values.

Sort temperatures_ind by the index values at the "city" level.

Sort temperatures_ind by ascending country then descending city.
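
A minimal sketch of the three sorts above, again on a small stand-in DataFrame rather than the real CSV. The avg_temp_c column and its values are invented for illustration, and the result names (by_index, by_city, by_country_then_city) are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for temperatures.csv, deliberately unsorted.
# avg_temp_c and its values are invented.
temperatures = pd.DataFrame({
    "country": ["Pakistan", "Brazil", "Pakistan", "Brazil"],
    "city": ["Lahore", "Salvador", "Karachi", "Rio De Janeiro"],
    "avg_temp_c": [24.0, 26.5, 27.0, 25.0],
})
temperatures_ind = temperatures.set_index(["country", "city"])

# Sort by all index levels, outer to inner, in ascending order
by_index = temperatures_ind.sort_index()

# Sort by the "city" level first
by_city = temperatures_ind.sort_index(level="city")

# Sort by ascending country, then descending city within each country
by_country_then_city = temperatures_ind.sort_index(
    level=["country", "city"], ascending=[True, False]
)

print(by_index)
print(by_city)
print(by_country_then_city)
```

The level and ascending arguments are matched by position, so the first entry of ascending applies to "country" and the second to "city".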