## Indexing with Pandas

### Import Modules

In [6]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

### Load a CSV in Pandas

We use the `read_csv` function to read the Titanic training dataset. It stores the contents of `titanic-train.csv` in a variable called `df`. We call it `df` because the variable's type is `DataFrame` (more precisely, `pandas.core.frame.DataFrame`).

In [25]:
df = pd.read_csv('../data/titanic-train.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
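As a minimal sketch of what `read_csv` returns, the snippet below parses a small in-memory CSV instead of the real file. The three-row excerpt and the name `sample` are stand-ins chosen for illustration; they are not part of the notebook.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for titanic-train.csv
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22.0
2,1,1,female,38.0
3,1,3,female,26.0
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample))  # <class 'pandas.core.frame.DataFrame'>
print(sample.shape)  # (3, 5)
```

Because no index column is given, pandas assigns the default integer index 0, 1, 2, which is what the leftmost column of the `head()` output shows.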


## Selecting columns

### Select a column

In [10]:
df['Pclass'].head()

0    3
1    1
2    3
3    1
4    3
Name: Pclass, dtype: int64

### Select multiple columns

In [11]:
df[['Pclass', 'Sex']].head()

Unnamed: 0,Pclass,Sex
0,3,male
1,1,female
2,3,female
3,1,female
4,3,male


### Select columns by column location

In [21]:
# Select the first 2 columns
df.iloc[:,:2].head()

Unnamed: 0,PassengerId,Survived
0,1,0
1,2,1
2,3,1
3,4,1
4,5,0
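The three column selections above return different kinds of objects, which the following sketch makes explicit. It uses a small hand-built frame named `sample` (an assumption for illustration, mirroring a few Titanic columns) so that it runs without the CSV file.

```python
import pandas as pd

# Small stand-in frame; the values mirror the first rows shown above
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
})

one_column = sample["Pclass"]            # single label -> Series
two_columns = sample[["Pclass", "Sex"]]  # list of labels -> DataFrame
first_two = sample.iloc[:, :2]           # all rows, columns at positions 0 and 1

print(type(one_column).__name__)   # Series
print(type(two_columns).__name__)  # DataFrame
print(list(first_two.columns))     # ['PassengerId', 'Survived']
```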


## Selecting rows

### Select rows by row location

Pandas allows us to index our data in different ways. For example, we can retrieve a record by its ordinal position. Because counting starts from zero, the fourth record sits at position three, so we pass `3` to `iloc`, which stands for **'integer location'**. This gives us all the information about the fourth passenger in the table.

In [12]:
df.iloc[3]

PassengerId                                               4
Survived                                                  1
Pclass                                                    1
Name           Futrelle, Mrs. Jacques Heath (Lily May Peel)
Sex                                                  female
Age                                                      35
SibSp                                                     1
Parch                                                     0
Ticket                                               113803
Fare                                                   53.1
Cabin                                                  C123
Embarked                                                  S
Name: 3, dtype: object

In [14]:
# Select the first two rows (positions 0 and 1)
df.iloc[:2]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [15]:
# Select the second row only (the end of the slice is exclusive)
df.iloc[1:2]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [17]:
# Select every row from the third row onward
df.iloc[2:].head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
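The `iloc` slices above follow Python's half-open convention: the start position is included and the stop position is not. A minimal sketch on a five-row stand-in frame (the name `sample` and its values are assumptions for illustration):

```python
import pandas as pd

# Five-row stand-in with the default integer index 0..4
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Pclass": [3, 1, 3, 1, 3],
})

print(sample.iloc[3]["PassengerId"])   # 4 (fourth record, position 3)
print(len(sample.iloc[:2]))            # 2 (positions 0 and 1)
print(list(sample.iloc[1:2].index))    # [1] (second row only)
print(list(sample.iloc[1:3].index))    # [1, 2] (second and third rows)
print(list(sample.iloc[2:].index))     # [2, 3, 4] (third row onward)
```

To get the second and third rows together, the slice has to stop at position 3, as in `sample.iloc[1:3]`.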


### Select rows by index label

In [13]:
# Select all rows up to and including the index label 0
df.loc[:0]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
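Unlike `iloc`, a `loc` slice works on labels and includes its end label. The sketch below contrasts the two on a stand-in frame with the default integer index; `sample` is an assumed name used only for this illustration.

```python
import pandas as pd

# Stand-in frame with the default integer index 0..3
sample = pd.DataFrame({"Pclass": [3, 1, 3, 1]})

print(len(sample.loc[:0]))           # 1 (label 0 is included)
print(len(sample.iloc[:0]))          # 0 (position 0 is excluded)
print(list(sample.loc[1:2].index))   # [1, 2]
print(list(sample.iloc[1:2].index))  # [1]

# A list of labels selects exactly those rows and keeps the DataFrame shape
only_zero = sample.loc[[0]]
print(only_zero.shape)               # (1, 1)
```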


### Selecting rows combining labels and position using `.ix`

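`.ix` combines `.loc` and `.iloc`. Integers are first treated as labels; if no matching label is found, `.ix` falls back to positional indexing. Note that `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so the cells in this section only run on older versions; on current versions use `.loc` or `.iloc` instead.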

In [32]:
# Select the rows called 0 and 5
df.ix[[0, 5]]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q


In [34]:
# Select the 'Sex' column in the row named 5
df.ix[5, 'Sex']

'male'

In [35]:
# Select the fifth column (position 4, which is 'Sex') in the row named 5
df.ix[5, 4]

'male'
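Since `.ix` is no longer available in pandas 1.0 and later, the same selections can be sketched with `.loc` and `.iloc`. The frame below is a hand-built stand-in (surnames only, assumed for illustration) with `Sex` at column position 4, as in the Titanic data.

```python
import pandas as pd

# Six-row stand-in; column order matches the first five Titanic columns
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Allen", "Moran"],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

rows = sample.loc[[0, 5]]        # rows labelled 0 and 5
by_label = sample.loc[5, "Sex"]  # row label 5, column label 'Sex'
by_position = sample.iloc[5, 4]  # row position 5, column position 4

# Mixing a row label with a column position: look up the column name first
mixed = sample.loc[5, sample.columns[4]]

print(list(rows.index))               # [0, 5]
print(by_label, by_position, mixed)   # male male male
```

With the default integer index, row label 5 and row position 5 refer to the same row; that stops being true once the frame is filtered, sorted, or re-indexed, which is why the explicit indexers are preferred.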

Note that the index could be a string. In this case something like the following is correct:

In [38]:
# Select the 'deaths' cell in the row named Arizona
#df.ix['Arizona', 'deaths']
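To make the string-index case concrete, here is a minimal sketch using `.loc`. The `states` frame, its state names, and its numbers are invented placeholders for illustration only; they do not come from any dataset in this notebook.

```python
import pandas as pd

# Hypothetical frame indexed by state name; all values are placeholders
states = pd.DataFrame(
    {"year": [2012, 2013, 2014], "deaths": [12, 25, 31]},
    index=["Arizona", "California", "Texas"],
)

print(states.loc["Arizona", "deaths"])  # 12 (row and column by label)
print(states.iloc[0, 1])                # 12 (same cell by position)
```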

### Selecting columns and rows

We can also retrieve data with the `loc` indexer by passing the row label (or labels) and the column names. Here we retrieve the first five rows of just the `Ticket` column.

In [22]:
df.loc[0:4,'Ticket']

0           A/5 21171
1            PC 17599
2    STON/O2. 3101282
3              113803
4              373450
Name: Ticket, dtype: object

Notice that we can get the same thing by asking for the `head` of the `Ticket` column.

In [23]:
df['Ticket'].head()

0           A/5 21171
1            PC 17599
2    STON/O2. 3101282
3              113803
4              373450
Name: Ticket, dtype: object
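The equivalence can be checked directly with `Series.equals`. This is a sketch on a seven-row stand-in frame named `sample` (an assumption for illustration), using the ticket values shown earlier.

```python
import pandas as pd

# Seven-row stand-in with the default integer index 0..6
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282",
               "113803", "373450", "330877", "17463"],
})

via_loc = sample.loc[0:4, "Ticket"]   # labels 0 through 4, inclusive
via_head = sample["Ticket"].head()    # first five rows by position

print(len(via_loc))              # 5
print(via_loc.equals(via_head))  # True
```

The two agree here only because the default index makes labels 0 to 4 coincide with the first five positions; on a re-indexed or shuffled frame, `head()` still returns the first five rows while `loc[0:4]` follows the labels.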