# 2.4 Indexes

## Slicing Rows and Columns
Remember slicing NumPy arrays? NumPy slicing allowed us to select one or more rows and one or more columns at a time. Because Pandas is built on top of NumPy, we can slice a dataframe in much the same way. You can select an individual column or a chunk of columns. You can also select an individual row or a chunk of rows.

In a dataframe, you can access rows and columns either by their named index or by their integer index. In other words, you can access columns either by their name or by their position relative to the other columns. This will be shown in more detail later.

### About the data

The data used in this notebook describes passengers on the *Titanic*, an ocean liner that set out from Southampton, U.K. to cross the Atlantic Ocean and tragically sank after colliding with an iceberg. The dataset records each passenger's class, name, sex, age, number of siblings/spouses aboard, number of parents/children aboard, ticket number, ticket fare, cabin number, and port of embarkation. It also records whether each passenger survived. This dataset is extremely popular among data scientists and will facilitate demonstrations of Pandas concepts.

To begin, we will import pandas and read the data into a dataframe.

In [None]:
import pandas as pd
df = pd.read_csv("./data/titanic.csv")

## Seeing the indexes in our dataframe

As stated previously, we can access rows and columns by either their **named** index or their **positional** index. We can see the named indexes of our dataframe by printing it out.

In [None]:
df.head()

The **bolded** column names are the *named indexes* for the columns. Likewise, the **bolded** row numbers are the *named indexes* for the rows. Index names can be changed, but you can always print out your dataframe to see what they currently are. They appear in bold when the dataframe is displayed in JupyterLab.

The positional index, by contrast, cannot be changed: it is simply the position of a row or column, counting from zero. The column `PassengerId`, in this case, has a positional index of `0`, since it is the first column. Likewise, the first row has a positional index of `0`, which happens to match the default *named index* of that row. If the named index for the rows were changed to something else (like letters of the alphabet), the positional index of the first row would still be `0`.
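
The difference between the two kinds of index can be sketched on a tiny table. The three-row dataframe `toy` below is invented for illustration and is not part of the Titanic data; the sketch relabels the rows with letters and shows that positions stay the same.

```python
import pandas as pd

# Hypothetical three-passenger table, invented for illustration only
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Ann", "Ben", "Cal"],
    "Age": [30, 41, 25],
})

# Named indexes (labels) for the rows and the columns
row_labels = list(toy.index)        # default row labels: 0, 1, 2
column_labels = list(toy.columns)   # the column names

# Replace the row labels with letters; positions are unaffected
toy.index = ["a", "b", "c"]

first_by_position = toy.iloc[0]     # still the first row
first_by_label = toy.loc["a"]       # the same row, reached through its new label

# Positional index of a column, looked up from its name
age_position = toy.columns.get_loc("Age")

print(row_labels, column_labels, age_position)
```

`.loc` and `.iloc`, used here only to fetch the first row, are explained in the sections that follow.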

## Working with columns

### Get a single column

To get an individual column out of a pandas dataframe, we can access the column by using its named index in square brackets after the name of the dataframe as exemplified below.

In [None]:
df['Sex']

Notice that when we select a single column (as opposed to several columns or an entire dataframe), the formatting is different. This is because when we select a single column or row, we are actually getting back a Series object instead of an entire dataframe. Remember that **in the same way that a table is made of many columns, a dataframe is made of many Series objects**.

### The Series object

The Series object acts a lot like a NumPy array in that functions can be applied directly to all the values at once. However, Series objects also share many of the methods that dataframes have built in, and each value carries an index label. In that sense, a Series is a NumPy array with labels and extra conveniences.

For example, the `.value_counts()` method can be used on either a dataframe or a Series object. It counts up each unique value and returns a new Series holding the number of times each value appeared. When used on a dataframe, it counts each distinct row (the combination of values across all columns) and still returns a Series, indexed by those combinations. `.value_counts()` isn't usually used on a dataframe with many columns, since nearly every row is unique and the counts are unlikely to reveal anything interesting.

We can, however, use `.value_counts()` on a single Series at a time to learn more about the data. In this case, we can use `.value_counts()` on the column `Sex` to learn about how many male and female passengers were on the *Titanic*.

In [None]:
# Value counts on a Series
df['Sex'].value_counts()

In [None]:
# Value counts on a dataframe -- counts distinct rows with no null values. Not very useful! (in this example)
# df.value_counts() # Uncomment me to see
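
As a small sketch of the two behaviors described above, the example below uses a five-row dataframe invented for illustration (the values are not taken from the Titanic file). It assumes a pandas version that provides `DataFrame.value_counts()` (1.1 or later).

```python
import pandas as pd

# Hypothetical data, invented for illustration only
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Pclass": [3, 1, 3, 3, 1],
})

# On a Series: one count per unique value
sex_counts = toy["Sex"].value_counts()

# On a dataframe: one count per distinct row, returned as a Series
# whose index holds the (Sex, Pclass) combinations
row_counts = toy.value_counts()

print(sex_counts)
print(row_counts)
```

With only two columns the row counts are still readable; with a dozen columns nearly every combination would appear once.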

#### Converting a Series to a list
At any time, you can convert the values in a Series (a column) into a list by using the `.tolist()` method. This method may be useful if you want to work with the values in a column using normal Python functions, such as using the `in` operator to check whether a value appears in the column. (Applied directly to a Series, `in` checks the index labels rather than the values.) This will be covered in more detail later on.

In [None]:
df['Name'].tolist()
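
The following sketch shows why converting to a list matters for membership tests. The Series `names` is a hypothetical stand-in for a column such as `df['Name']`.

```python
import pandas as pd

# Hypothetical column of names, invented for illustration only
names = pd.Series(["Ann", "Ben", "Cal"])

# `in` on a Series looks at the index labels (0, 1, 2), not the values
label_check = "Ben" in names      # False: "Ben" is not an index label
position_check = 1 in names       # True: 1 is an index label

# Converting to a plain Python list checks the values instead
value_check = "Ben" in names.tolist()

# A pandas-native alternative that also checks the values
isin_check = bool(names.isin(["Ben"]).any())

print(label_check, position_check, value_check, isin_check)
```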

### Get multiple columns

To get back several columns in a dataframe, replace the single column name (shown above) with a list of column names. Notice that because we selected more than one column, a new dataframe is returned instead of a Series object.

In [None]:
df[['Age', 'Sex']]
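
To make the Series-versus-dataframe distinction concrete, here is a minimal sketch on a two-row dataframe invented for illustration. It also shows that a list containing a single column name still returns a dataframe.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
toy = pd.DataFrame({
    "Age": [22.0, 38.0],
    "Sex": ["male", "female"],
    "Fare": [7.25, 71.28],
})

one_column = toy["Sex"]             # a column name       -> Series
two_columns = toy[["Age", "Sex"]]   # a list of names     -> DataFrame
one_column_frame = toy[["Sex"]]     # a one-element list  -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(one_column_frame.shape)
```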

## Working with rows
### Get a single row
To get an individual row, use the `.loc[]` property and pass in an index label. In this case, the index labels are the numbers 0-890, although they can take other values.

Note that `.loc` is not a function, but rather a property. Thus, it is not called with parentheses but is instead passed an index inside square brackets `[]`.

In [None]:
df.loc[1]

Notice again that the formatting of this row is different: this is another Series object! Pandas converted this single row of data into an array-like data structure whose index labels are the column names.
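
A short sketch of what a single-row Series looks like from the inside, using a two-row dataframe invented for illustration:

```python
import pandas as pd

# Hypothetical data, invented for illustration only
toy = pd.DataFrame({
    "Name": ["Ann", "Ben"],
    "Age": [30, 41],
})

row = toy.loc[1]                 # the row labelled 1, returned as a Series

row_index = list(row.index)      # the column names become the Series index
row_label = row.name             # the Series remembers which row it came from
ben_name = row["Name"]           # values are looked up by column name

print(row_index, row_label, ben_name)
```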

### Get multiple rows

However, if we get multiple rows by slicing...

In [None]:
df.loc[1:3]

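... we can see just a few rows returned as a new dataframe. Note that, unlike ordinary Python slicing, a `.loc` slice includes its end label, so `1:3` returns the rows labelled 1, 2, and 3.

You can also get many different rows by passing a list of index labels to the `.loc` property.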

In [None]:
df.loc[[3, 44, 610]]

In addition to locating a row by its named index, we can also use the `.iloc` property to get a row by its positional index. Thus, the first row of the dataframe can be obtained without knowing what its index label is.

In [None]:
df.iloc[0]

In [None]:
df.iloc[[1, 44, 610]]
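
In the Titanic dataframe the default labels happen to equal the positions, so `.loc` and `.iloc` look interchangeable for rows. The sketch below, on a four-row dataframe invented for illustration, shows them diverging once some rows are missing.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
toy = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cal", "Dee"],
    "Age": [30, 41, 25, 58],
})

# Keep the rows at positions 0, 1 and 3; they keep their labels 0, 1 and 3
subset = toy.iloc[[0, 1, 3]]

third_by_position = subset.iloc[2]["Name"]   # third row of the subset
same_by_label = subset.loc[3]["Name"]        # the row whose label is 3

# Label 2 no longer exists in the subset, so .loc raises a KeyError
try:
    subset.loc[2]
    label_2_found = True
except KeyError:
    label_2_found = False

print(third_by_position, same_by_label, label_2_found)
```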

## Individual Cells and Mixed rows/columns
### Get individual cell or mixed rows and columns
To get an individual cell, you can use either `.loc` or `.iloc` and pass in the row and the column that you want to get back. `.loc` is used with named indexes, and `.iloc` is used with positional indexes. Note that neither of these properties uses parentheses; both use square brackets to let you select individual items.

#### `.loc`

The `.loc` property allows you to access items from a dataframe by their named indexes. You can access either a single item, a single column and a slice of rows, a slice of columns and a single row, or a slice of columns and a slice of rows. You can also use `.loc` to get the entire dataframe. Note that you can get a slice of columns by placing a `:` between column names to get every column between them.

Additionally, you can pass in a list of rows, a list of columns, or a list of both rows and columns to return. Note that `.loc` looks columns up by their named index, whereas `.iloc` (covered in the next section) looks them up by their positional index.

In [None]:
# Get a single cell from the table
df.loc[50, 'Name']

In [None]:
# Get a slice of rows and a single column
df.loc[50:55, 'Name']

In [None]:
# Get a single row and a slice of columns
df.loc[50, 'Name':'SibSp']

In [None]:
# Get a slice of rows and a slice of columns
df.loc[50:55, 'Name':'SibSp']

In [None]:
# Get a slice of rows and all columns
df.loc[50:55, :]

In [None]:
# Get all rows and a slice of columns
df.loc[:, 'Name':'SibSp']
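
The kind of object returned depends on the selectors you pass in. The sketch below runs the same four selection patterns on a three-row dataframe invented for illustration and records what comes back.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
toy = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cal"],
    "Sex": ["female", "male", "male"],
    "Age": [30, 41, 25],
})

cell = toy.loc[0, "Name"]               # single row, single column -> one value
column_part = toy.loc[0:1, "Name"]      # slice of rows, one column  -> Series
row_part = toy.loc[0, "Name":"Age"]     # one row, slice of columns  -> Series
block = toy.loc[0:1, "Name":"Age"]      # slices on both axes        -> DataFrame

print(cell)
print(type(column_part).__name__, type(row_part).__name__)
print(block.shape)
```

A useful rule of thumb: each single label removes one dimension, while slices and lists keep it.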

#### `.iloc`
The `.iloc` property allows you to access items from a dataframe by their positional indexes. You can access either a single item, a single column and a slice of rows, a slice of columns and a single row, or a slice of columns and a slice of rows. You can also use `.iloc` to get the entire dataframe. Unlike `.loc`, `.iloc` slices follow the usual Python convention and exclude the end position, so `50:55` selects positions 50 through 54.

In [None]:
# Get the value in the 51st row (50) and the fourth column (3)
df.iloc[50, 3]

In [None]:
# Get a slice of rows and a single column
df.iloc[50:55, 3]

In [None]:
# Get a single row and a slice of columns
df.iloc[50, 3:6]

In [None]:
# Get a slice of rows and a slice of columns
df.iloc[50:55, 3:6]

In [None]:
# Get a slice of rows and all columns
df.iloc[50:55, :]

In [None]:
# Get all rows and a slice of columns
df.iloc[:, 3:6]
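
The same slice written with `.loc` and with `.iloc` does not return the same number of rows. This sketch uses a hypothetical 100-row dataframe with default labels so that labels and positions coincide and only the endpoint rule differs.

```python
import pandas as pd

# Hypothetical 100-row table with default labels 0..99
toy = pd.DataFrame({"value": range(100)})

by_label = toy.loc[50:55]       # label slice: the end label 55 is included
by_position = toy.iloc[50:55]   # position slice: the end position 55 is excluded

print(len(by_label), list(by_label.index))
print(len(by_position), list(by_position.index))
```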

While not used as frequently, you can also use `.loc` and `.iloc` to pick out specific rows and columns by passing in a list of rows and/or columns.

In [None]:
# Get specific rows and columns with .loc[]
df.loc[[3, 66, 549], ['Name', 'Ticket']]

In [None]:
# Get specific rows and columns with .iloc[]
df.iloc[[3, 66, 549], [3, 8]]
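
If you know column names but need positions for `.iloc`, one option is to look the positions up from `df.columns` instead of counting by hand. The sketch below uses a three-row dataframe invented for illustration and the `Index.get_indexer()` method; it is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
toy = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cal"],
    "Ticket": ["A1", "B2", "C3"],
    "Fare": [7.25, 71.28, 8.05],
})

# Translate column names into their positional indexes
positions = toy.columns.get_indexer(["Name", "Fare"])

by_name = toy.loc[[0, 2], ["Name", "Fare"]]   # rows and columns by label
by_position = toy.iloc[[0, 2], positions]     # the same cells by position

print(list(positions))
print(by_name.equals(by_position))
```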