# 2.4 Indexes

## Slicing Rows and Columns
Remember slicing NumPy arrays? NumPy slicing allowed us to select one or more rows and one or more columns at a time. Because Pandas is built on top of NumPy, we can slice a dataframe in much the same way. You can select an individual column or a chunk of columns. You can also select an individual row or a chunk of rows.

In a dataframe, you can access rows and columns either by their named index or by their integer index. In other words, you can access columns either by their name or by their position relative to the other columns. This will be shown in more detail later.

In [3]:
import pandas as pd
df = pd.read_csv("./data/titanic.csv")

In [4]:
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |


## Working with columns

### Get a single column

To get an individual column out of a pandas dataframe, put the column's name in square brackets after the name of the dataframe, as shown below.

In [5]:
df['Sex']

0        male
1      female
2      female
3      female
4        male
        ...  
886      male
887    female
888    female
889      male
890      male
Name: Sex, Length: 891, dtype: object

Notice that when we select a single column (as opposed to several columns or an entire dataframe), the formatting is different. This is because when we select a single column or row, we are actually getting back a Series object instead of an entire dataframe. Remember that **in the same way that a table is made of many columns, a dataframe is made of many series**.

### The Series object

The Series object acts a lot like a NumPy array in that functions can be applied directly to all of its values at once. However, Series objects also share some of the methods that dataframes have built in.
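As a small illustration of that array-like behaviour, here is a minimal sketch. The `ages` Series is made-up toy data, not the Titanic column, so the numbers are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series standing in for a dataframe column
ages = pd.Series([22.0, 38.0, 26.0], name="Age")

ages_in_months = ages * 12   # arithmetic is applied to every value at once
log_ages = np.log(ages)      # NumPy functions accept a Series and return a Series
mean_age = ages.mean()       # Series also has its own summary methods

print(ages_in_months.tolist())
print(type(log_ages).__name__)
```

No explicit loop is needed in any of these lines; the operation is broadcast across the whole Series.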

For example, the `value_counts()` method can be used on either a dataframe or a Series object. However, it usually makes more sense to use it on a single Series at a time.

In [6]:
# Value counts on a Series
df['Sex'].value_counts()

male      577
female    314
Name: Sex, dtype: int64

In [7]:
# Value counts on a dataframe -- counts distinct rows with no null values. Not very useful! (in this example)
# df.value_counts() # Uncomment me to see
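To make the difference concrete without loading the full dataset, the sketch below uses a tiny hypothetical dataframe (`toy`, with invented values). It assumes a pandas version that provides `DataFrame.value_counts()`.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Pclass": [3, 1, 3, 3, 1],
})

per_value = toy["Sex"].value_counts()  # one count per distinct value in the column
per_row = toy.value_counts()           # one count per distinct (Sex, Pclass) combination

print(per_value)
print(per_row)
```

With many columns, almost every row is its own combination, which is why the dataframe version is rarely informative on a wide table like the Titanic data.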

### Get multiple columns

To get back several columns in a dataframe, replace the single column name (shown above) with a list of column names. Notice that because we selected more than one column, a new dataframe is returned instead of a Series object.

In [8]:
df[['Age', 'Sex']]

| | Age | Sex |
|---|---|---|
| 0 | 22.0 | male |
| 1 | 38.0 | female |
| 2 | 26.0 | female |
| 3 | 35.0 | female |
| 4 | 35.0 | male |
| ... | ... | ... |
| 886 | 27.0 | male |
| 887 | 19.0 | female |
| 888 | | female |
| 889 | 26.0 | male |
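The difference between passing a name and passing a list of names can be checked directly. This is a minimal sketch on a hypothetical two-row dataframe (`toy`), not the Titanic data.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Age": [22.0, 38.0],
    "Sex": ["male", "female"],
    "Fare": [7.25, 71.28],
})

as_series = toy["Age"]           # a column name gives back a Series
as_frame = toy[["Age"]]          # a list with one name gives back a one-column dataframe
reordered = toy[["Sex", "Age"]]  # columns come back in the order they are listed

print(type(as_series).__name__, type(as_frame).__name__)
print(list(reordered.columns))
```

Note the double square brackets: the outer pair does the selecting and the inner pair builds the list.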


## Working with rows
### Get a single row
To get an individual row, use the `.loc[]` property and pass in an index label. In this case, the index labels are the numbers 0-890, although they can take other values.

Note that `.loc` is not a function, but rather a property. Thus, it is not called with parentheses but is instead passed an index inside square brackets `[]`.

In [9]:
df.loc[1]

PassengerId                                                    2
Survived                                                       1
Pclass                                                         1
Name           Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                       female
Age                                                         38.0
SibSp                                                          1
Parch                                                          0
Ticket                                                  PC 17599
Fare                                                     71.2833
Cabin                                                        C85
Embarked                                                       C
Name: 1, dtype: object

Notice again that the formatting of this row is different: this is another Series object! Pandas converted this single row of data into an array-like data structure whose index is the dataframe's column names.
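Index labels do not have to be numbers. As a sketch of what "other values" can look like, the hypothetical dataframe below uses shortened passenger surnames as row labels. The use of `set_index` here is an assumption made for illustration and is not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data with string row labels
toy = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen"],
    "Age": [22.0, 38.0, 26.0],
}).set_index("Name")       # use the Name column as the row labels

row = toy.loc["Cumings"]   # look the row up by its label

print(row)
print(row.name)            # the label of the selected row
```

The returned Series is named after the row label, just as `Name: 1` appears at the bottom of the output above.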

### Get multiple rows

However, if we get multiple rows by slicing...

In [10]:
df.loc[1:3]

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |


... we can see just a few rows returned as a new dataframe.
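One detail worth noticing in the output above is that `df.loc[1:3]` returned three rows, including the one labelled 3. Label-based slices include their end label, unlike ordinary Python and NumPy slices. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with the default labels 0-4
toy = pd.DataFrame({"Fare": [7.25, 71.28, 7.92, 53.10, 8.05]})

by_label = toy.loc[1:3]      # labels 1, 2 and 3: the end label is included
by_position = toy.iloc[1:3]  # positions 1 and 2: the end position is excluded

print(list(by_label.index))
print(list(by_position.index))
```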

You can also get several non-adjacent rows by passing a list of index labels to the `.loc` property.

In [12]:
df.loc[[3, 44, 610]]

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 44 | 45 | 1 | 3 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 7.8792 | | Q |
| 610 | 611 | 0 | 3 | Andersson, Mrs. Anders Johan (Alfrida Konstant... | female | 39.0 | 1 | 5 | 347082 | 31.275 | | S |


In addition to locating a row by its label, we can also use the `.iloc` property to get a row by its integer position. Thus, the first row of the dataframe can be obtained without knowing what its index label is.

In [13]:
df.iloc[0]

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object

In [14]:
df.iloc[[1, 44, 610]]

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 44 | 45 | 1 | 3 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 7.8792 | | Q |
| 610 | 611 | 0 | 3 | Andersson, Mrs. Anders Johan (Alfrida Konstant... | female | 39.0 | 1 | 5 | 347082 | 31.275 | | S |
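In the Titanic dataframe the labels happen to equal the positions, so `.loc` and `.iloc` return the same rows. The two only visibly differ when the labels are something else. The sketch below assumes a hypothetical dataframe whose labels are 10, 20 and 30.

```python
import pandas as pd

# Hypothetical toy data whose labels do not match the row positions
toy = pd.DataFrame({"Age": [22.0, 38.0, 26.0]}, index=[10, 20, 30])

first_by_position = toy.iloc[0]  # first row, whatever its label is
first_by_label = toy.loc[10]     # the row labelled 10 (the same row here)

# No row is labelled 0, so a label lookup for 0 fails
try:
    toy.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(first_by_position.name, label_zero_exists)
```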


## Individual Cells and Mixed rows/columns
### Get individual cell or mixed rows and columns
To get an individual cell, you can use either `.loc` or `.iloc` to pass in a row and a column that you want to get back. Additionally, you can pass in a list of rows, a list of columns, or a list of both rows and columns to return. Note that when you use `.iloc` with columns, you are searching for the columns by their position and not by their name as you would with `.loc`.

In [15]:
df.loc[50, 'Name']

'Panula, Master. Juha Niilo'

In [18]:
# Get the rows at positions 10 through 14 (the end position, 15, is excluded) and the columns at positions 3 and 7
df.iloc[10:15, [3, 7]]

| | Name | Parch |
|---|---|---|
| 10 | Sandstrom, Miss. Marguerite Rut | 1 |
| 11 | Bonnell, Miss. Elizabeth | 0 |
| 12 | Saundercock, Mr. William Henry | 0 |
| 13 | Andersson, Mr. Anders Johan | 5 |
| 14 | Vestrom, Miss. Hulda Amanda Adolfina | 0 |
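The same row-and-column pattern works with labels. As a minimal sketch on hypothetical toy data, the block below selects one cell by label, the same cell by position, and a small block of rows and columns using two lists.

```python
import pandas as pd

# Hypothetical toy data with shortened names
toy = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    "Sex": ["male", "female", "female", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

cell_by_label = toy.loc[2, "Name"]        # row label 2, column "Name"
cell_by_position = toy.iloc[2, 0]         # row position 2, column position 0
block = toy.loc[[0, 3], ["Name", "Age"]]  # a list of row labels and a list of column names

print(cell_by_label, cell_by_position)
print(block)
```

A single row and a single column give back a plain value, while lists on both sides give back a dataframe.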
