# Selecting Subsets of Data from DataFrames with `iloc`

## Getting started with `iloc`
The `iloc` indexer is very similar to `loc` but uses only **integer location** to make its selections. The name `iloc` stands for integer location, which should help you remember what it does.

### Simultaneous row and column subset selection with `iloc`
Selection with `iloc` looks like the following, with a comma separating the row and column selections.

```
df.iloc[rows, cols]
```

Let's read in some sample data and then begin making selections with integer location.

In [1]:
import pandas as pd
df = pd.read_csv('../data/sample_data.csv', index_col=0)
df

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2


### Use a list for both rows and columns

Let's select the rows with integer location 2 and 4 along with the first and last columns. Negative integers count back from the end, just as they do with Python lists.

In [2]:
rows = [2, 4]
cols = [0, -1]
df.iloc[rows, cols]

Unnamed: 0,state,score
Aaron,FL,9.0
Dean,AK,1.8
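
As a quick check on negative positions, the sketch below rebuilds a few rows and columns of the sample data inline (an assumption made so that it runs without the CSV file). It compares a selection that uses `-1` with one that uses the last positive position.

```python
import pandas as pd

# A small inline stand-in for the sample data (three of its columns)
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK"],
        "age": [30, 2, 12, 4, 32],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8],
    },
    index=["Jane", "Niko", "Aaron", "Penelope", "Dean"],
)

last = df.shape[1] - 1  # integer location of the final column

with_negative = df.iloc[[2, 4], [0, -1]]
with_positive = df.iloc[[2, 4], [0, last]]

# Both selections pick the state and score columns for Aaron and Dean
print(with_negative.equals(with_positive))
```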


### The possible types of selections for `iloc`
In the above example, we used a list of integers for both the row and column selection, but you are not limited to lists. Unlike `loc`, `iloc` does not accept a boolean Series (a plain boolean array or list does work). Each of the following objects is valid for both row and column selections with `iloc`.

* A single integer
* A list of integers
* A slice with integers
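
The three kinds of selection listed above can be tried side by side. This is a minimal sketch on a small inline DataFrame (an assumed stand-in for the sample data), applied to the rows only.

```python
import pandas as pd

df = pd.DataFrame(
    {"state": ["NY", "TX", "FL", "AL"], "age": [30, 2, 12, 4]},
    index=["Jane", "Niko", "Aaron", "Penelope"],
)

one_row = df.iloc[1]           # single integer -> Series
listed_rows = df.iloc[[3, 0]]  # list of integers -> DataFrame, in list order
sliced_rows = df.iloc[1:3]     # slice of integers -> DataFrame

print(type(one_row).__name__)
print(list(listed_rows.index))
print(list(sliced_rows.index))
```

The same three forms can be used after the comma for the column selection.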

### Slice the rows and use a list for the columns
Let's use slice notation to select rows with integer location 2 and 3 and a list to select columns with integer location 4 and 2. Notice that the stop integer location is **excluded** with `iloc`, which is exactly how slicing works with Python lists, tuples, and strings. Slicing with `loc` is **inclusive** of the stop label.

In [3]:
cols = [4, 2]
df.iloc[2:4, cols]

Unnamed: 0,height,food
Aaron,120,Mango
Penelope,80,Apple
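
To see the difference in slicing behavior directly, the following sketch runs a positional slice and a label slice over the same rows. The DataFrame is a reduced inline copy of the sample data (an assumption, so the example is self-contained).

```python
import pandas as pd

df = pd.DataFrame(
    {"food": ["Steak", "Lamb", "Mango", "Apple", "Cheese"]},
    index=["Jane", "Niko", "Aaron", "Penelope", "Dean"],
)

by_position = df.iloc[2:4]         # positions 2 and 3; position 4 is excluded
by_label = df.loc["Aaron":"Dean"]  # labels Aaron through Dean; Dean is included

print(list(by_position.index))
print(list(by_label.index))
```

Dean sits at integer location 4, so he appears in the `loc` result but not in the `iloc` result.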


### Use a list for the rows and a slice for the columns

In this example, we use a list for the row selection and slice notation for the columns.

In [4]:
rows = [5, 2, 4]
df.iloc[rows, 3:]

Unnamed: 0,age,height,score
Christina,33,172,9.5
Aaron,12,120,9.0
Dean,32,180,1.8


### Selecting some rows and all of the columns
If you leave the column selection empty, then all of the columns will be selected.

In [5]:
rows = [3, 2]
df.iloc[rows]

Unnamed: 0,state,color,food,age,height,score
Penelope,AL,white,Apple,4,80,3.3
Aaron,FL,red,Mango,12,120,9.0


It is possible to rewrite the above with both row and column selections by using a colon to represent a slice of all of the columns. Just as with `loc`, this can be instructive, as it reinforces the concept that `iloc` selects rows and columns simultaneously, with the row selection first.

In [6]:
df.iloc[rows, :]

Unnamed: 0,state,color,food,age,height,score
Penelope,AL,white,Apple,4,80,3.3
Aaron,FL,red,Mango,12,120,9.0


### Select all of the rows and some of the columns
Let's use a single colon as the slice that selects all of the rows, and a list to select two columns.

In [7]:
cols = [1, 5]
df.iloc[:, cols]

Unnamed: 0,color,score
Jane,blue,4.6
Niko,green,8.3
Aaron,red,9.0
Penelope,white,3.3
Dean,gray,1.8
Christina,black,9.5
Cornelia,red,2.2


### Cannot do this with *just the brackets*
Using *just the brackets* does select columns, but it understands only **labels**, not **integer location**. The following raises an error because pandas looks for columns whose names are the integers `1` and `5`.

In [8]:
cols = [1, 5]
df[cols]

KeyError: "None of [Int64Index([1, 5], dtype='int64')] are in the [columns]"
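
If you have integer locations but want to use just the brackets, one option is to translate the positions into labels first through `df.columns`. The sketch below assumes a two-row inline copy of the sample data and checks that the result matches the `iloc` selection.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "state": ["NY", "TX"],
        "color": ["blue", "green"],
        "food": ["Steak", "Lamb"],
        "age": [30, 2],
        "height": [165, 70],
        "score": [4.6, 8.3],
    },
    index=["Jane", "Niko"],
)

cols = [1, 5]
labels = df.columns[cols]    # the labels at positions 1 and 5: color and score
via_brackets = df[labels]    # just the brackets, now given labels
via_iloc = df.iloc[:, cols]  # the same columns chosen by position

print(list(labels))
print(via_brackets.equals(via_iloc))
```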

### Integer column names
pandas allows integers as column names, and in fact you can have a mix of strings and integers (along with other types). So, if a column name were the integer 1, you would select it by writing `df[1]`. I would avoid integer column names where possible, as they are not descriptive.
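
The distinction between an integer label and an integer location is easiest to see when the two disagree. The frame below is hypothetical and is built only for this illustration: its column labels are the integers `1` and `0`, in that order.

```python
import pandas as pd

# Hypothetical frame whose column labels are the integers 1 and 0, in that order
df_int = pd.DataFrame([[10, 20], [30, 40]], columns=[1, 0])

by_label = df_int[1]             # the column named 1, which is the first column
by_position = df_int.iloc[:, 1]  # the column at integer location 1, named 0

print(by_label.tolist())
print(by_position.tolist())
```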

### Select some rows and a single column
In this example, a list of integers is used for the rows along with a single integer for the columns. pandas returns a Series when a single integer is used for either the row or the column selection (but not both).

In [9]:
rows = [2, 3, 5]
cols = 4
df.iloc[rows, cols]

Aaron        120
Penelope      80
Christina    172
Name: height, dtype: int64

### Select a single row or column as a DataFrame and NOT a Series
You can select a single row (or column) and return a DataFrame and not a Series if you use a list to make the selection. Let's replicate the selection from the previous example, but use a one-item list for the column selection.

In [10]:
rows = [2, 3, 5]
cols = [4]
df.iloc[rows, cols]

Unnamed: 0,height
Aaron,120
Penelope,80
Christina,172
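
The two return types can be confirmed by inspecting the results. This is a small sketch on an inline DataFrame (an assumed stand-in for the sample data) that makes the same selection both ways.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 2, 12], "height": [165, 70, 120]},
    index=["Jane", "Niko", "Aaron"],
)

as_series = df.iloc[[0, 2], 1]   # single integer for the column
as_frame = df.iloc[[0, 2], [1]]  # one-item list for the column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The values are identical; only the container differs, which matters when later code expects DataFrame methods or two-dimensional output.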


### Select a single row as a Series with `iloc`
Passing a single integer to `iloc` selects one row as a Series.

In [11]:
df.iloc[2]

state        FL
color       red
food      Mango
age          12
height      120
score       9.0
Name: Aaron, dtype: object
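
Two related cases are worth trying alongside the single-row selection: a one-item list for the row, and single integers for both the row and the column. The sketch below uses a small inline DataFrame (an assumption, so it runs without the CSV file).

```python
import pandas as pd

df = pd.DataFrame(
    {"state": ["NY", "TX", "FL"], "height": [165, 70, 120]},
    index=["Jane", "Niko", "Aaron"],
)

row_series = df.iloc[2]   # single integer -> Series named after the row label
row_frame = df.iloc[[2]]  # one-item list -> DataFrame with one row
value = df.iloc[2, 1]     # single integers for both -> a single value

print(row_series.name)
print(row_frame.shape)
print(value)
```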

## Summary of `iloc`
The `iloc` indexer is analogous to `loc` but only uses **integer location** for selection. The official pandas documentation refers to this as selection by **position**.

* Uses only integer location
* Selects rows and columns simultaneously
* Selection can be a single integer, a list of integers, or a slice of integers
* A comma separates row and column selections

## Exercises
* Use the movie dataset for the following exercises

### Exercise 1
<span  style="color:green; font-size:16px">Select the rows with integer location 10, 5, and 1</span>

In [12]:
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.iloc[[10, 5, 1]]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Batman v Superman: Dawn of Justice,2016.0,Color,PG-13,183.0,Zack Snyder,0.0,Henry Cavill,15000.0,Lauren Cohan,4000.0,...,2000.0,330249062.0,Action|Adventure|Sci-Fi,673.0,371639,based on comic book|batman|sequel to a reboot|...,English,USA,250000000.0,6.9
John Carter,2012.0,Color,PG-13,132.0,Andrew Stanton,475.0,Daryl Sabara,640.0,Samantha Morton,632.0,...,530.0,73058679.0,Action|Adventure|Sci-Fi,462.0,212204,alien|american civil war|male nipple|mars|prin...,English,USA,263700000.0,6.6
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1


### Exercise 2
<span  style="color:green; font-size:16px">Select the columns with integer location 10, 5, and 1</span>

In [13]:
movie.iloc[:, [10, 5, 1]].head(3)

title,actor3,director_fb,color
Avatar,Wes Studi,0.0,Color
Pirates of the Caribbean: At World's End,Jack Davenport,563.0,Color
Spectre,Stephanie Sigman,0.0,Color


### Exercise 3
<span  style="color:green; font-size:16px">Select the rows with integer location 100 to 104 along with the column at integer location 5.</span>

In [14]:
movie.iloc[100:105, 5]

title
The Fast and the Furious                   357.0
The Curious Case of Benjamin Button      21000.0
X-Men: First Class                         905.0
The Hunger Games: Mockingjay - Part 2      508.0
The Sorcerer's Apprentice                  226.0
Name: director_fb, dtype: float64