## Slicing and subsetting with .loc and .iloc

In [28]:
import numpy as np
import pandas as pd
# Slicing is a technique for selecting consecutive elements from objects
dogs = pd.read_csv('dogs.csv')
dogs
# In the next cell, the dogs dataset is given a multi-level index
# of breed and color, and the index is then sorted with .sort_index()

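| | date_of_birth | name | breed | color | height_cm | weight_kg |
|---|---|---|---|---|---|---|
| 0 | 2013-07-01 | Bella | Labrador | Brown | 56 | 25 |
| 1 | 2016-09-16 | Charlie | Poodle | Black | 43 | 23 |
| 2 | 2014-08-25 | Lucy | Chow Chow | Brown | 46 | 22 |
| 3 | 2011-12-11 | Cooper | Schnauzer | Grey | 49 | 17 |
| 4 | 2017-01-20 | Max | Labrador | Black | 59 | 29 |
| 5 | 2015-04-20 | Stella | Chihuahua | Tan | 18 | 2 |
| 6 | 2018-02-27 | Bernie | St. Bernard | White | 77 | 74 |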
6,2018-02-27,Bernie,St. Bernard,White,77,74


### Sort the index before you slice

In [5]:
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()
dogs_srt


| breed | color | name | height_cm | weight_kg |
|---|---|---|---|---|
| Chihuahua | Tan | Stella | 18 | 2 |
| Chow Chow | Brown | Lucy | 46 | 22 |
| Labrador | Black | Max | 59 | 29 |
| Labrador | Brown | Bella | 56 | 25 |
| Poodle | Black | Charlie | 43 | 23 |
| Schnauzer | Grey | Cooper | 49 | 17 |
| St. Bernard | White | Bernie | 77 | 74 |
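
Sorting puts the index in order: breed first, then color within each breed. One way to confirm that an index is ready for slicing is its `is_monotonic_increasing` attribute. A minimal sketch, assuming a small hand-built subset of the dogs data in place of `dogs.csv`:

```python
import pandas as pd

# Small stand-in for dogs.csv (assumption: same columns, fewer rows, file order kept)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Max"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "color": ["Brown", "Black", "Brown", "Black"],
    "height_cm": [56, 43, 46, 59],
})

dogs_unsorted = dogs.set_index(["breed", "color"])
dogs_srt = dogs_unsorted.sort_index()

# Rows are still in file order, so the MultiIndex is not sorted yet
print(dogs_unsorted.index.is_monotonic_increasing)  # False

# After sort_index(), breed is sorted first, then color within each breed
print(dogs_srt.index.is_monotonic_increasing)       # True
print(dogs_srt.index.tolist())
```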


### Slicing the outer index level


In [7]:
# To slice rows at the outer level of an index, you call loc,
# passing the first and last values separated by a colon.
dogs_srt.loc["Chow Chow":"Poodle"]

| breed | color | name | height_cm | weight_kg |
|---|---|---|---|---|
| Chow Chow | Brown | Lucy | 46 | 22 |
| Labrador | Black | Max | 59 | 29 |
| Labrador | Brown | Bella | 56 | 25 |
| Poodle | Black | Charlie | 43 | 23 |


**Differences from slicing lists: rather than specifying row numbers, you specify index values, and the final value is included.**
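
Both differences show up when the two kinds of slicing sit side by side. A minimal sketch, assuming a hand-built Series of heights indexed by breed (one dog per breed, values from the table above) rather than the full dataset:

```python
import pandas as pd

breeds = ["Chihuahua", "Chow Chow", "Labrador", "Poodle", "Schnauzer"]

# List slicing uses positions, and the stop position is excluded
print(breeds[1:3])  # ['Chow Chow', 'Labrador']

# .loc slicing uses index labels, and the stop label is included
heights = pd.Series([18, 46, 56, 43, 49], index=breeds)
print(heights.loc["Chow Chow":"Poodle"].tolist())  # [46, 56, 43]
```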

### Slicing the inner index level badly
The same technique doesn't work on inner index levels.

In [9]:
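# For example, trying to slice the colors (inner level) this way:
dogs_srt.loc["Tan":"Grey"]  # returns an empty DataFrame
# Important: pandas does not raise an error here. It treats "Tan" and "Grey"
# as outer-level (breed) values, and since "Tan" sorts after "Grey",
# the range selects no rows.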


| breed | color | name | height_cm | weight_kg |
|---|---|---|---|---|
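
Because pandas raises no error, a slice that silently matches nothing can slip through unnoticed. A minimal sketch of the same mistake on a hand-built subset of the sorted dogs data, with an explicit `.empty` check as one way to catch it (the check and its message are illustrative, not a dedicated pandas feature):

```python
import pandas as pd

# Hand-built subset of dogs_srt (assumption: values copied from the table above)
dogs_srt = pd.DataFrame({
    "breed": ["Chihuahua", "Chow Chow", "Labrador", "Labrador", "Schnauzer"],
    "color": ["Tan", "Brown", "Black", "Brown", "Grey"],
    "name": ["Stella", "Lucy", "Max", "Bella", "Cooper"],
}).set_index(["breed", "color"]).sort_index()

# "Tan" and "Grey" are compared against the outer level (breed), not color.
# "Tan" sorts after every breed here and "Grey" sorts before "Labrador",
# so the start falls after the stop and the result is empty.
result = dogs_srt.loc["Tan":"Grey"]
print(result.empty)  # True

# No exception is raised, so an explicit check surfaces the mistake early
if result.empty:
    print("Slice matched no rows; check which index level the labels belong to")
```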


In [10]:
# The correct approach to slicing at inner index levels is to pass
# the first and last positions as tuples
dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey")]

| breed | color | name | height_cm | weight_kg |
|---|---|---|---|---|
| Labrador | Brown | Bella | 56 | 25 |
| Poodle | Black | Charlie | 43 | 23 |
| Schnauzer | Grey | Cooper | 49 | 17 |
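
Tuple endpoints select one contiguous run of the sorted index, so ("Labrador", "Brown"):("Schnauzer", "Grey") also picks up every row in between, whatever its color. If the goal is instead a range of colors within every breed, pandas provides `pd.IndexSlice`, a different technique from the tuple slicing above. A minimal sketch, assuming a hand-built copy of the sorted dogs index:

```python
import pandas as pd

# Hand-built copy of dogs_srt (assumption: names only, values from the table above)
dogs_srt = pd.DataFrame({
    "breed": ["Chihuahua", "Chow Chow", "Labrador", "Labrador", "Poodle",
              "Schnauzer", "St. Bernard"],
    "color": ["Tan", "Brown", "Black", "Brown", "Black", "Grey", "White"],
    "name": ["Stella", "Lucy", "Max", "Bella", "Charlie", "Cooper", "Bernie"],
}).set_index(["breed", "color"]).sort_index()

# Tuple endpoints: one contiguous run from one (breed, color) pair to another
by_tuple = dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey")]
print(by_tuple["name"].tolist())  # ['Bella', 'Charlie', 'Cooper']

# IndexSlice: every breed (:), and colors from "Black" to "Brown" inclusive
idx = pd.IndexSlice
by_color = dogs_srt.loc[idx[:, "Black":"Brown"], :]
print(by_color["name"].tolist())  # ['Lucy', 'Max', 'Bella', 'Charlie']
```

Like the tuple form, the IndexSlice form expects a sorted index.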


### Slicing columns

In [13]:
# Since DataFrames are two-dimensional objects, we can also slice columns
dogs_srt.loc[:, "name":"height_cm"]
# The syntax is [rows, columns]: here we subset columns but keep all rows,
# because a colon by itself means "keep everything"

| breed | color | name | height_cm |
|---|---|---|---|
| Chihuahua | Tan | Stella | 18 |
| Chow Chow | Brown | Lucy | 46 |
| Labrador | Black | Max | 59 |
| Labrador | Brown | Bella | 56 |
| Poodle | Black | Charlie | 43 |
| Schnauzer | Grey | Cooper | 49 |
| St. Bernard | White | Bernie | 77 |
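
Label slicing on columns follows the order the columns already have, not alphabetical order; when both endpoint labels exist and are unique, pandas looks up their positions. A minimal sketch, assuming a tiny hand-built frame with the same column order as dogs_srt:

```python
import pandas as pd

# Tiny stand-in with the same column order as dogs_srt (assumption)
dogs = pd.DataFrame({
    "name": ["Stella", "Lucy"],
    "height_cm": [18, 46],
    "weight_kg": [2, 22],
}, index=["Chihuahua", "Chow Chow"])

# "name" comes before "height_cm" in the column order, so this slice is valid
print(dogs.loc[:, "name":"height_cm"].columns.tolist())  # ['name', 'height_cm']

# Alphabetical order would put "height_cm" first, but the column order does not,
# so reversing the endpoints selects no columns
print(dogs.loc[:, "height_cm":"name"].columns.tolist())  # []
```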


### Slice Twice
We can slice rows and columns at the same time: simply pass the appropriate slice to each argument.

In [14]:
dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey"), "name":"height_cm"]

| breed | color | name | height_cm |
|---|---|---|---|
| Labrador | Brown | Bella | 56 |
| Poodle | Black | Charlie | 43 |
| Schnauzer | Grey | Cooper | 49 |
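
Passing both slices to one .loc call gives the same result as slicing the rows first and the columns second. The single call is the form the pandas documentation recommends over chained indexing, particularly when assigning values. A minimal sketch, assuming a hand-built subset of dogs_srt; rows and cols are illustrative names for explicit slice objects, which is what the first:last syntax inside [] creates:

```python
import pandas as pd

# Hand-built subset of dogs_srt (assumption: values copied from the table above)
dogs_srt = pd.DataFrame({
    "breed": ["Labrador", "Labrador", "Poodle", "Schnauzer"],
    "color": ["Black", "Brown", "Black", "Grey"],
    "name": ["Max", "Bella", "Charlie", "Cooper"],
    "height_cm": [59, 56, 43, 49],
    "weight_kg": [29, 25, 23, 17],
}).set_index(["breed", "color"]).sort_index()

# Same endpoints as the cell above, written as explicit slice objects
rows = slice(("Labrador", "Brown"), ("Schnauzer", "Grey"))
cols = slice("name", "height_cm")

one_step = dogs_srt.loc[rows, cols]        # rows and columns in one call
chained = dogs_srt.loc[rows].loc[:, cols]  # rows first, then columns

print(one_step.equals(chained))  # True
print(one_step.shape)            # (3, 2)
```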


### Dog days
An important use case of slicing is subsetting DataFrames by a range of dates.

In [29]:
dogs_dob = dogs.set_index("date_of_birth").sort_index()

In [30]:
dogs_dob
# Here we set the date_of_birth column as the index, and sort by this index

| date_of_birth | name | breed | color | height_cm | weight_kg |
|---|---|---|---|---|---|
| 2011-12-11 | Cooper | Schnauzer | Grey | 49 | 17 |
| 2013-07-01 | Bella | Labrador | Brown | 56 | 25 |
| 2014-08-25 | Lucy | Chow Chow | Brown | 46 | 22 |
| 2015-04-20 | Stella | Chihuahua | Tan | 18 | 2 |
| 2016-09-16 | Charlie | Poodle | Black | 43 | 23 |
| 2017-01-20 | Max | Labrador | Black | 59 | 29 |
| 2018-02-27 | Bernie | St. Bernard | White | 77 | 74 |


### Slicing by date
Dates can be sliced with the same syntax as other types: the first and last dates are passed as strings.

In [31]:
dogs_dob.loc["2014-08-25":"2016-09-16"]

| date_of_birth | name | breed | color | height_cm | weight_kg |
|---|---|---|---|---|---|
| 2014-08-25 | Lucy | Chow Chow | Brown | 46 | 22 |
| 2015-04-20 | Stella | Chihuahua | Tan | 18 | 2 |
| 2016-09-16 | Charlie | Poodle | Black | 43 | 23 |
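
The endpoints don't have to be dates that appear in the index: on a sorted index, pandas finds where each endpoint would fall and returns the rows in between. Because `pd.read_csv` was called without `parse_dates`, date_of_birth holds strings here; ISO `YYYY-MM-DD` strings sort in chronological order, so comparing them as text still gives a sensible date range. A minimal sketch, assuming a hand-built copy of dogs_dob:

```python
import pandas as pd

# Dates kept as ISO "YYYY-MM-DD" strings, as read_csv returns them by default
dogs_dob = pd.DataFrame({
    "date_of_birth": ["2011-12-11", "2013-07-01", "2014-08-25", "2015-04-20",
                      "2016-09-16", "2017-01-20", "2018-02-27"],
    "name": ["Cooper", "Bella", "Lucy", "Stella", "Charlie", "Max", "Bernie"],
}).set_index("date_of_birth").sort_index()

# Neither endpoint is in the index; the sorted index lets pandas locate
# where they would fall and return the rows in between
between = dogs_dob.loc["2014-01-01":"2016-12-31"]
print(between["name"].tolist())  # ['Lucy', 'Stella', 'Charlie']
```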


### Slicing by partial dates

In [32]:
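dogs_dob.loc["2014":"2016"]
# date_of_birth was read as strings (no parse_dates), so the labels compare
# as text: the slice starts at "2014" and stops at "2016", which sorts before
# every "2016-..." date. Only the 2014 and 2015 births are returned.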

| date_of_birth | name | breed | color | height_cm | weight_kg |
|---|---|---|---|---|---|
| 2014-08-25 | Lucy | Chow Chow | Brown | 46 | 22 |
| 2015-04-20 | Stella | Chihuahua | Tan | 18 | 2 |
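
The result of a partial-date slice depends on how the dates are stored. A minimal sketch comparing the same births held as strings and as a DatetimeIndex (converted with `pd.to_datetime`); with a DatetimeIndex, pandas reads "2016" as the whole year, so the slice runs through the end of 2016:

```python
import pandas as pd

births = ["2011-12-11", "2013-07-01", "2014-08-25", "2015-04-20",
          "2016-09-16", "2017-01-20", "2018-02-27"]
names = ["Cooper", "Bella", "Lucy", "Stella", "Charlie", "Max", "Bernie"]

# String index: labels compare as text, and "2016-09-16" sorts after "2016",
# so Charlie is left out
as_strings = pd.Series(names, index=births)
print(as_strings.loc["2014":"2016"].tolist())  # ['Lucy', 'Stella']

# DatetimeIndex: "2016" stands for the whole year 2016,
# so the slice runs to the end of 2016 and includes Charlie
as_dates = pd.Series(names, index=pd.to_datetime(births))
print(as_dates.loc["2014":"2016"].tolist())  # ['Lucy', 'Stella', 'Charlie']
```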


### Subsetting by row/column number
We can also slice DataFrames by row or column number using .iloc.

In [35]:
dogs.iloc[2:5, 1:4]
# This uses a similar syntax to slicing lists, except that there are
# two arguments: one for rows and one for columns
# Like list slicing but unlike loc, the final values aren't included
# in the slice

| | name | breed | color |
|---|---|---|---|
| 2 | Lucy | Chow Chow | Brown |
| 3 | Cooper | Schnauzer | Grey |
| 4 | Max | Labrador | Black |
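
The endpoint rule is easiest to see on the default integer index, where labels and positions coincide. A minimal sketch, assuming a hand-built frame holding only the name column in the original row order:

```python
import pandas as pd

# Default RangeIndex: labels 0, 1, ..., 6 match the positions
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
})

# .iloc slices by position and excludes the stop position
print(dogs.iloc[2:5]["name"].tolist())  # ['Lucy', 'Cooper', 'Max']

# .loc slices by label; the labels look like positions here,
# but the stop label 5 is included
print(dogs.loc[2:5]["name"].tolist())   # ['Lucy', 'Cooper', 'Max', 'Stella']
```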


### Slicing index values

Slicing lets you select consecutive elements of an object using first:last syntax. DataFrames can be sliced by index values or by row/column number; we'll start with the first case, which involves slicing inside .loc[].

Compared to slicing lists, there are a few things to remember.

- You can only slice an index if the index is sorted (using .sort_index()).
- To slice at the outer level, first and last can be strings.
- To slice at inner levels, first and last should be tuples.
- If you pass a single slice to .loc[], it will slice the rows.
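
The last rule, together with outer-level string endpoints, can be checked directly: a single slice passed to .loc[] selects rows and keeps every column, just as if `:` had been passed for the columns. A minimal sketch, assuming a hand-built subset of the sorted dogs data:

```python
import pandas as pd

# Hand-built subset of dogs_srt (assumption: values copied from the tables above)
dogs_srt = pd.DataFrame({
    "breed": ["Chihuahua", "Chow Chow", "Labrador", "Poodle"],
    "color": ["Tan", "Brown", "Black", "Black"],
    "name": ["Stella", "Lucy", "Max", "Charlie"],
}).set_index(["breed", "color"]).sort_index()

# A single slice with outer-level string endpoints is applied to the rows...
rows_only = dogs_srt.loc["Chow Chow":"Labrador"]
# ...which matches slicing the rows and keeping every column with ":"
rows_and_all_cols = dogs_srt.loc["Chow Chow":"Labrador", :]

print(rows_only.equals(rows_and_all_cols))  # True
print(rows_only["name"].tolist())           # ['Lucy', 'Max']
```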
