
# Selecting Data

### Introduction

So far we have seen that we can select information from a pandas dataframe almost as we would from a list or a dictionary. In this lesson, we'll explore the different ways that we can select information.

### Using Integer Locations with iloc

Let's begin by loading up our data.

```python
import pandas as pd
url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv"
movies_df = pd.read_csv(url)
```

In pandas, we can use the `iloc` indexer (used with square brackets, not parentheses) to select data with the same positional techniques that we saw with numpy. Still, let's walk through some examples.

```python
movies_df.iloc[0]
```

```
year                       2013
imdb                  tt1711425
title                 21 & Over
test                     notalk
clean_test               notalk
binary                     FAIL
budget                 13000000
domgross            2.56824e+07
intgross            4.21958e+07
code                   2013FAIL
budget_2013$           13000000
domgross_2013$      2.56824e+07
intgross_2013$      4.21958e+07
period code                   1
decade code                   1
Name: 0, dtype: object
```

Above, we selected the first row by passing the position 0 to `iloc`.

> Using `iloc` means we are selecting by integer-based position.
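
Because `iloc` works purely by position, it ignores whatever labels the rows carry. The sketch below assumes a small hypothetical frame, `toy_df`, built by hand from three rows shown in this lesson and indexed by title instead of by number, so it runs without downloading the data. `iloc` still counts rows from 0.

```python
import pandas as pd

# Hypothetical frame built from rows shown in this lesson, indexed by
# title instead of the default 0, 1, 2, ... labels.
toy_df = pd.DataFrame(
    {
        "year": [2013, 2012, 2013],
        "binary": ["FAIL", "PASS", "FAIL"],
        "budget": [13000000, 45000000, 20000000],
    },
    index=["21 & Over", "Dredd 3D", "12 Years a Slave"],
)

# iloc ignores the labels and counts positions from 0.
first_row = toy_df.iloc[0]   # the "21 & Over" row
last_row = toy_df.iloc[-1]   # negative positions count from the end, as in a list
print(first_row.name, last_row.name)
```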

Let's select more than one row.

```python
movies_df.iloc[:2]
```

|  | year | imdb | title | test | clean_test | binary | budget | domgross | intgross | code | budget_2013$ | domgross_2013$ | intgross_2013$ | period code | decade code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | tt1711425 | 21 & Over | notalk | notalk | FAIL | 13000000 | 25682380.0 | 42195766.0 | 2013FAIL | 13000000 | 25682380.0 | 42195766.0 | 1.0 | 1.0 |
| 1 | 2012 | tt1343727 | Dredd 3D | ok-disagree | ok | PASS | 45000000 | 13414714.0 | 40868994.0 | 2012PASS | 45658735 | 13611086.0 | 41467257.0 | 1.0 | 1.0 |


We can also select rows by providing a list of indices.

```python
movies_df.iloc[[1, 2, 4]]
```

|  | year | imdb | title | test | clean_test | binary | budget | domgross | intgross | code | budget_2013$ | domgross_2013$ | intgross_2013$ | period code | decade code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2012 | tt1343727 | Dredd 3D | ok-disagree | ok | PASS | 45000000 | 13414714.0 | 40868994.0 | 2012PASS | 45658735 | 13611086.0 | 41467257.0 | 1.0 | 1.0 |
| 2 | 2013 | tt2024544 | 12 Years a Slave | notalk-disagree | notalk | FAIL | 20000000 | 53107035.0 | 158607035.0 | 2013FAIL | 20000000 | 53107035.0 | 158607035.0 | 1.0 | 1.0 |
| 4 | 2013 | tt0453562 | 42 | men | men | FAIL | 40000000 | 95020213.0 | 95020213.0 | 2013FAIL | 40000000 | 95020213.0 | 95020213.0 | 1.0 | 1.0 |


This also means we can select every other row, for example by passing a `range` of positions.

```python
movies_df.iloc[range(0, 10, 2)]
```

|  | year | imdb | title | test | clean_test | binary | budget | domgross | intgross | code | budget_2013$ | domgross_2013$ | intgross_2013$ | period code | decade code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | tt1711425 | 21 & Over | notalk | notalk | FAIL | 13000000 | 25682380.0 | 42195766.0 | 2013FAIL | 13000000 | 25682380.0 | 42195766.0 | 1.0 | 1.0 |
| 2 | 2013 | tt2024544 | 12 Years a Slave | notalk-disagree | notalk | FAIL | 20000000 | 53107035.0 | 158607035.0 | 2013FAIL | 20000000 | 53107035.0 | 158607035.0 | 1.0 | 1.0 |
| 4 | 2013 | tt0453562 | 42 | men | men | FAIL | 40000000 | 95020213.0 | 95020213.0 | 2013FAIL | 40000000 | 95020213.0 | 95020213.0 | 1.0 | 1.0 |
| 6 | 2013 | tt1606378 | A Good Day to Die Hard | notalk | notalk | FAIL | 92000000 | 67349198.0 | 304249198.0 | 2013FAIL | 92000000 | 67349198.0 | 304249198.0 | 1.0 | 1.0 |
| 8 | 2013 | tt1814621 | Admission | ok | ok | PASS | 13000000 | 18007317.0 | 18007317.0 | 2013PASS | 13000000 | 18007317.0 | 18007317.0 | 1.0 | 1.0 |
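
The `range` is just a list of positions, so a slice with a step should pick out the same rows. Here is a minimal sketch, assuming a hypothetical `toy_df` that holds only the first five titles shown in this lesson. It compares `iloc[range(0, 5, 2)]` with the step slice `iloc[0:5:2]`:

```python
import pandas as pd

# Hypothetical stand-in for movies_df: the first five titles from this lesson,
# with the default integer index 0..4.
toy_df = pd.DataFrame(
    {"title": ["21 & Over", "Dredd 3D", "12 Years a Slave", "2 Guns", "42"]}
)

# A slice with a step of 2 selects the same rows as a list built by range().
with_range = toy_df.iloc[range(0, 5, 2)]
with_step = toy_df.iloc[0:5:2]   # start:stop:step, and the stop position is excluded

print(with_step["title"].tolist())
```

On the full dataset, `movies_df.iloc[0:10:2]` should likewise return the same five rows as the `range(0, 10, 2)` cell above.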


### Selecting Columns

Beyond selecting rows, we can also use `iloc` to select columns.

```python
movies_df.iloc[:2, :3]
```

|  | year | imdb | title |
|---|---|---|---|
| 0 | 2013 | tt1711425 | 21 & Over |
| 1 | 2012 | tt1343727 | Dredd 3D |


So the column selection goes in the second position inside the brackets, after a comma. It works just like slicing a Python list.

So with `iloc` we think of our dataframe as a list of row lists, and we select elements based on their order. To select the value in the first row and first column, we pass position 0 for both:

```python
movies_df.iloc[0, 0]
```

```
2013
```
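
To see the parallel with numpy, here is a sketch that assumes a hypothetical two-row `toy_df` with the `year`, `imdb`, and `title` values shown above. The same `[row, column]` positions index both the numpy array returned by `to_numpy()` and the dataframe through `iloc`. A list of column positions selects columns that are not next to each other:

```python
import pandas as pd

# Hypothetical two-row frame using values shown in this lesson.
toy_df = pd.DataFrame(
    {
        "year": [2013, 2012],
        "imdb": ["tt1711425", "tt1343727"],
        "title": ["21 & Over", "Dredd 3D"],
    }
)

# The same [row, column] positions work on a numpy array and with iloc.
arr = toy_df.to_numpy()
print(arr[0, 0], toy_df.iloc[0, 0])      # first row, first column in both

# A list of column positions picks non-adjacent columns.
year_and_title = toy_df.iloc[:, [0, 2]]
print(year_and_title.columns.tolist())   # ['year', 'title']
```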

### Selecting with loc

We can also think of our dataframe as a set of key-value pairs, like a dictionary, and that is what the `loc` indexer is for.

When we use `loc`, we select rows by their labels in the index and columns by their names.

> Let's take another look at the first couple of rows of `movies_df`.

```python
movies_df[:2]
```

|  | year | imdb | title | test | clean_test | binary | budget | domgross | intgross | code | budget_2013$ | domgross_2013$ | intgross_2013$ | period code | decade code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013 | tt1711425 | 21 & Over | notalk | notalk | FAIL | 13000000 | 25682380.0 | 42195766.0 | 2013FAIL | 13000000 | 25682380.0 | 42195766.0 | 1.0 | 1.0 |
| 1 | 2012 | tt1343727 | Dredd 3D | ok-disagree | ok | PASS | 45000000 | 13414714.0 | 40868994.0 | 2012PASS | 45658735 | 13611086.0 | 41467257.0 | 1.0 | 1.0 |


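Now let's try `loc`, referencing the labels in the index and the names of the columns. Because `movies_df` uses the default index, its row labels happen to be the integers 0, 1, 2, and so on, so here the row labels match the row positions.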

```python
movies_df.loc[2:4, 'title':'budget']
```

|  | title | test | clean_test | binary | budget |
|---|---|---|---|---|---|
| 2 | 12 Years a Slave | notalk-disagree | notalk | FAIL | 20000000 |
| 3 | 2 Guns | notalk | notalk | FAIL | 61000000 |
| 4 | 42 | men | men | FAIL | 40000000 |
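
Because the labels and positions coincide here, the difference between `loc` and `iloc` is easy to miss. The sketch below assumes a hypothetical `toy_df` with the title, `binary`, and `budget` values shown in this lesson. It uses `set_index` to make the titles the row labels, so that `loc` looks rows up by name:

```python
import pandas as pd

# Hypothetical frame using the title, binary and budget values from this lesson.
toy_df = pd.DataFrame(
    {
        "title": ["21 & Over", "Dredd 3D", "12 Years a Slave", "2 Guns", "42"],
        "binary": ["FAIL", "PASS", "FAIL", "FAIL", "FAIL"],
        "budget": [13000000, 45000000, 20000000, 61000000, 40000000],
    }
)

# Make the titles the row labels, so loc now looks rows up by title.
by_title = toy_df.set_index("title")

dredd_budget = by_title.loc["Dredd 3D", "budget"]        # one row label, one column name
some_rows = by_title.loc[["2 Guns", "42"], ["budget"]]   # lists of labels return a DataFrame
print(dredd_budget)
print(some_rows)
```

With this index, `by_title.loc[2]` would raise a `KeyError`, because 2 is not a label, while `by_title.iloc[2]` still returns the third row (`12 Years a Slave`).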


> That seemed to have worked well.

Notice that, unlike slicing with `iloc` or in numpy, slicing with `loc` is inclusive: it selects up to and including the rows and columns that match the end labels.

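For example, to make the same selection with `iloc`, we have to add one to each stop position and count the columns by position (`title` is the third column, at position 2):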

```python
movies_df.iloc[2:5, 2:7]
```

|  | title | test | clean_test | binary | budget |
|---|---|---|---|---|---|
| 2 | 12 Years a Slave | notalk-disagree | notalk | FAIL | 20000000 |
| 3 | 2 Guns | notalk | notalk | FAIL | 61000000 |
| 4 | 42 | men | men | FAIL | 40000000 |
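
The off-by-one difference is easy to check in code. This sketch assumes a hypothetical `toy_df` that holds the `title` through `budget` columns of rows 0 to 4 shown in this lesson. Because `title` is its first column, the column positions run from 0 to 4 rather than from 2 to 6 as on the full `movies_df`:

```python
import pandas as pd

# Hypothetical stand-in: the title..budget columns of rows 0-4 from this lesson.
toy_df = pd.DataFrame(
    {
        "title": ["21 & Over", "Dredd 3D", "12 Years a Slave", "2 Guns", "42"],
        "test": ["notalk", "ok-disagree", "notalk-disagree", "notalk", "men"],
        "clean_test": ["notalk", "ok", "notalk", "notalk", "men"],
        "binary": ["FAIL", "PASS", "FAIL", "FAIL", "FAIL"],
        "budget": [13000000, 45000000, 20000000, 61000000, 40000000],
    }
)

# loc includes the end label; iloc excludes the stop position.
by_label = toy_df.loc[2:4, "title":"budget"]   # rows 2, 3, 4
by_position = toy_df.iloc[2:4, 0:5]            # rows 2, 3 only
print(len(by_label), len(by_position))         # 3 2

# Adding one to the iloc stop position recovers the loc selection.
print(by_label.equals(toy_df.iloc[2:5, 0:5]))  # True
```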


### Summary

In this lesson, we saw how to select data from our dataframe with both `iloc` and `loc`. `iloc` selects by integer position, the same way we select data in numpy.

With `loc`, we specify index labels to select rows and column names to select columns. Unlike slicing in numpy or Python, slicing with `loc` is inclusive: it selects up to and including the rows and columns that match the end labels.