# 6. Selecting Subsets of Data from DataFrames with `.iloc`

# Getting started with `.iloc`
The `.iloc` indexer is very similar to `.loc` but only uses **integer locations** to make its selections. The name `iloc` stands for *integer location*, which is a useful reminder of what it does.
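
As a rough illustration of the difference, the sketch below makes the same selection by label and by integer location. The frame `small` is a stand-in built inline (a few values copied from the sample data used in this chapter) so the snippet runs without the CSV file. The variable names are illustrative only.

```python
import pandas as pd

# Stand-in frame built inline; not read from '../data/sample_data.csv'
small = pd.DataFrame(
    {"state": ["NY", "TX", "FL"], "age": [30, 2, 12]},
    index=["Jane", "Niko", "Aaron"],
)

by_label = small.loc["Aaron", "age"]   # row label, column label
by_position = small.iloc[2, 1]         # third row, second column

# .iloc rejects labels: a string key raises an error instead of selecting
try:
    small.iloc["Aaron"]
    label_accepted = True
except TypeError:
    label_accepted = False
```

Both `by_label` and `by_position` refer to the same cell. Only the way it is addressed changes.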

## Simultaneous row and column subset selection with `.iloc`
Selection with `.iloc` takes the following form:

```
df.iloc[rows, cols]
```

In [None]:
import pandas as pd
df = pd.read_csv('../data/sample_data.csv', index_col=0)
df

### Use a list for both rows and columns

In [None]:
rows = [2, 4]
cols = [0, -1]

df.iloc[rows, cols]

## The possible types of selections for `.iloc`
Row or column selections can be any of the following:

* A single integer
* A list of integers
* A slice with integers
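
The three kinds of selection listed above can be sketched on the row axis alone. The frame below is rebuilt inline from the sample data shown later in this chapter, so the snippet does not depend on the CSV file. The result variable names are illustrative.

```python
import pandas as pd

# Sample data rebuilt inline instead of reading '../data/sample_data.csv'
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

one_row = df.iloc[1]        # single integer: the row at position 1, as a Series
last_row = df.iloc[-1]      # negative integers count back from the end
two_rows = df.iloc[[1, 3]]  # list of integers: a DataFrame with those rows
row_slice = df.iloc[1:4]    # slice: positions 1, 2 and 3 (the stop is excluded)
```

The same three forms work for the column selection after the comma.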

### Slice the rows and use a list for the columns

In [None]:
cols = [4, 2]
df.iloc[::2, cols]

### Use a list for the rows and a slice for the columns

In [None]:
rows = [5, 2, 4]
df.iloc[rows, 3:]

## Selecting some rows and all of the columns
If you leave the column selection empty, then all of the columns will be selected.

In [None]:
rows = [3, 2]
df.iloc[rows]
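
One way to check this is to compare the short form with the explicit form that uses `:` for the columns. This is a minimal sketch on a stand-in frame built inline from a few rows and columns of the sample data. The names `implicit`, `explicit`, and `same` are illustrative.

```python
import pandas as pd

# Stand-in frame built inline; a subset of the chapter's sample data
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL"],
        "age": [30, 2, 12, 4],
        "score": [4.6, 8.3, 9.0, 3.3],
    },
    index=["Jane", "Niko", "Aaron", "Penelope"],
)

rows = [3, 2]
implicit = df.iloc[rows]      # column selection left out
explicit = df.iloc[rows, :]   # column selection written as a full slice
same = implicit.equals(explicit)
```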

## Select all of the rows and some of the columns

In [None]:
cols = [1, 5]
df.iloc[:, cols]

## Cannot do this with *just the brackets*
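Selection with *just the brackets* does pick columns, but it only understands **labels**, not **integer locations**. The call below raises a `KeyError` because `1` and `5` are not column labels in this DataFrame.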

In [None]:
cols = [1, 5]
df[cols]
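
If you still want to use the brackets, one possible workaround is to translate the integer locations into labels first with `df.columns`. The sketch below rebuilds the sample data inline so it runs on its own. The names `bracket_ok`, `labels`, `via_brackets`, and `via_iloc` are illustrative.

```python
import pandas as pd

# Sample data rebuilt inline instead of reading '../data/sample_data.csv'
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

cols = [1, 5]

# The brackets look for columns *labeled* 1 and 5, which do not exist here
try:
    df[cols]
    bracket_ok = True
except KeyError:
    bracket_ok = False

labels = df.columns[cols]     # integer locations -> the labels 'color' and 'score'
via_brackets = df[labels]     # the brackets now receive labels
via_iloc = df.iloc[:, cols]   # the direct integer-location selection
```

In practice `df.iloc[:, cols]` is the more direct choice. The translation step is shown only to make clear what the brackets expect.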

## Select some rows and a single column
Note that a Series is returned whenever exactly one of the row or column selections is a single integer.

In [None]:
rows = [2, 3, 5]
cols = 4

df.iloc[rows, cols]

## A trick to select a single row or column as a DataFrame and NOT a Series
You can select a single row (or column) and get back a DataFrame instead of a Series if you put the single integer inside a list.

In [None]:
rows = [2, 3, 5]
cols = [4]

df.iloc[rows, cols]
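
The difference between the two return types can be checked directly. This is a small sketch using the sample data rebuilt inline. The names `as_series`, `as_frame`, and `row_frame` are illustrative.

```python
import pandas as pd

# Sample data rebuilt inline instead of reading '../data/sample_data.csv'
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

rows = [2, 3, 5]
as_series = df.iloc[rows, 4]     # integer column selection: a Series
as_frame = df.iloc[rows, [4]]    # one-item list: a DataFrame with one column
row_frame = df.iloc[[2], :]      # the same trick for a single row
```

Here `as_series` is one-dimensional, while `as_frame` and `row_frame` keep both axes.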

## Select a single row as a Series with `.iloc`
Passing a single integer to `.iloc` selects one row as a Series:

In [None]:
df.iloc[2]
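
It can help to inspect the pieces of the returned Series. In the sketch below (sample data rebuilt inline, variable name `row` illustrative), the row label becomes the Series name and the column labels become its index. Because the row mixes strings and numbers, its dtype is `object`.

```python
import pandas as pd

# Sample data rebuilt inline instead of reading '../data/sample_data.csv'
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"],
)

row = df.iloc[2]

row_label = row.name            # the index label of the selected row
row_fields = list(row.index)    # the DataFrame's column labels
row_dtype = row.dtype           # object, since the values have mixed types
```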

# Summary of `.iloc`
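`.iloc` works like `.loc` but uses **integer location** only for selection. One difference to remember: a slice with `.iloc` excludes the stop position, while a label slice with `.loc` includes the stop label. The official pandas documentation refers to integer-location selection as selection by **position**.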
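
One practical difference between the two indexers is how slices treat the stop value. The sketch below uses a stand-in frame built inline from a few rows of the sample data. The names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

# Stand-in frame built inline; a subset of the chapter's sample data
df = pd.DataFrame(
    {"state": ["NY", "TX", "FL", "AL"], "age": [30, 2, 12, 4]},
    index=["Jane", "Niko", "Aaron", "Penelope"],
)

by_label = df.loc["Niko":"Penelope"]   # label slice: the stop label is included
by_position = df.iloc[1:3]             # integer slice: position 3 is excluded
```

Although `"Penelope"` sits at position 3, only the `.loc` slice returns her row.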

# Exercises
* Use the movie dataset for the following exercises

### Problem 1
<span  style="color:green; font-size:16px">Select the rows with integer location 10, 5, and 1</span>

In [10]:
import pandas as pd
movie = pd.read_csv('../data/movie.csv')
rows = [10,5,1]
movie.iloc[rows]

Unnamed: 0,title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
10,Batman v Superman: Dawn of Justice,2016.0,Color,PG-13,183.0,Zack Snyder,0.0,Henry Cavill,15000.0,Lauren Cohan,...,2000.0,330249062.0,Action|Adventure|Sci-Fi,673.0,371639,based on comic book|batman|sequel to a reboot|...,English,USA,250000000.0,6.9
5,John Carter,2012.0,Color,PG-13,132.0,Andrew Stanton,475.0,Daryl Sabara,640.0,Samantha Morton,...,530.0,73058679.0,Action|Adventure|Sci-Fi,462.0,212204,alien|american civil war|male nipple|mars|prin...,English,USA,263700000.0,6.6
1,Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1


### Problem 2
<span  style="color:green; font-size:16px">Select the columns with integer location 10, 5, and 1</span>

In [11]:
cols = rows  # reuse [10, 5, 1] from Problem 1

In [13]:
movie.iloc[:, cols].head(3)

Unnamed: 0,actor2_fb,director_name,year
0,936.0,James Cameron,2009.0
1,5000.0,Gore Verbinski,2007.0
2,393.0,Sam Mendes,2015.0


### Problem 3
<span  style="color:green; font-size:16px">Select rows with integer location 100 to 104 along with the column integer location 5.</span>

In [18]:
# The slice stop is exclusive, so 100:104 returns integer locations 100 through 103
movie.iloc[100:104, 5]

100           Rob Cohen
101       David Fincher
102      Matthew Vaughn
103    Francis Lawrence
Name: director_name, dtype: object
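
If the problem is read as including integer location 104, the slice stop has to be 105. The sketch below shows the difference on a hypothetical stand-in frame (`stand_in`, with made-up column names and values), since the movie data is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for the movie data: 110 rows and 6 columns of integers
stand_in = pd.DataFrame({f"col{i}": range(110) for i in range(6)})

exclusive = stand_in.iloc[100:104, 5]   # integer locations 100, 101, 102, 103
inclusive = stand_in.iloc[100:105, 5]   # integer locations 100 through 104
```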

# Continue making selections with `.iloc` below

## DURING CLASS .iloc

* Select by integer location instead of labels


In [1]:
import pandas as pd
df = pd.read_csv('../data/sample_data.csv', index_col=0)
df

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2


In [3]:
rows = [3,1,4]
cols = [-2,0]

In [4]:
df.iloc[rows,cols]

Unnamed: 0,height,state
Penelope,80,AL
Niko,70,TX
Dean,180,AK


In [5]:
df.iloc[2,3]

12

In [6]:
df.iloc[2,3:]

age        12
height    120
score       9
Name: Aaron, dtype: object

In [7]:
df.iloc[:2,3:]

Unnamed: 0,age,height,score
Jane,30,165,4.6
Niko,2,70,8.3


In [8]:
df.iloc[:,3:]

Unnamed: 0,age,height,score
Jane,30,165,4.6
Niko,2,70,8.3
Aaron,12,120,9.0
Penelope,4,80,3.3
Dean,32,180,1.8
Christina,33,172,9.5
Cornelia,69,150,2.2
