# Problem Set 2.5: Indexes: `.loc[]` vs `.iloc[]`

Explore what "indexes" are in Series and DataFrames.

We'll also explore the difference between selecting rows in a DataFrame using
labels (`.loc[]`) and positions (`.iloc[]`). We'll use the same deck of cards as
last time:

In [None]:
import pandas as pd
deck_df = pd.read_csv('deck.csv')
deck_df

## Understanding Indexes

Indexes in pandas give each row in a DataFrame a label that you can use to look
it up. The index of `deck_df` holds the row numbers we've been seeing:

In [None]:
deck_df.index

In this case, it's just the numbers from 0 to 51. You can get the underlying
numpy array with `.values`:

In [None]:
deck_df.index.values
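Here is a minimal sketch of the same idea. It assumes a small made-up stand-in for `deck.csv`: the name `mini_deck` and its card values are hypothetical, and only the column names come from this notebook. A DataFrame created without an explicit index gets a default `RangeIndex`. The `.to_numpy()` method, which pandas' documentation suggests as an alternative to `.values`, returns the labels as a numpy array:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv: two suits, three cards each.
# Column names match the notebook; the values are made up.
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})

# No index was given, so pandas assigns the default RangeIndex 0..5.
print(type(mini_deck.index).__name__)  # RangeIndex
labels = mini_deck.index.to_numpy()
print(labels)                          # [0 1 2 3 4 5]
```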

But the index isn't always just the row numbers. Let's take a closer look.

First, let's pull out the spades and diamonds from the deck:

In [None]:
spades = deck_df[deck_df['suit'] == 'spades']
spades

In [None]:
diamonds = deck_df.loc[deck_df['suit'] == 'diamonds']
diamonds

### Upending our worldview

HOLD ON! Those aren't row numbers!

In [None]:
diamonds.index

Well, they are, but they're the row numbers *from when we read the CSV*. When
we select a subset of a DataFrame, the index of each row stays what it was
before.
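The following sketch shows the same behaviour on a tiny hypothetical `mini_deck` with made-up values. Filtering keeps each row's original label. If you ever want fresh 0..n-1 labels instead, `reset_index(drop=True)` is one way to get them. `drop=True` discards the old labels rather than keeping them as a column:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv (made-up values, the notebook's column names).
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})

# Boolean selection keeps each row's original label.
diamonds = mini_deck[mini_deck['suit'] == 'diamonds']
print(diamonds.index.tolist())    # [3, 4, 5]

# Renumber from 0, throwing the old labels away.
renumbered = diamonds.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```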

What we have been thinking of as row numbers so far are actually what pandas
calls row *labels*. When you select a single row, you get a Series, and that
Series's `name` is the row's label. An "index" is the collection of all the
labels.

Let's look:

In [None]:
diamonds.loc[13]

The whole time we've been using `.loc[]`, we've actually been selecting with
labels, not row numbers.
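To see that `.loc[]` really looks up labels, here is a sketch using a hypothetical `mini_deck` whose diamonds happen to carry labels 3, 4 and 5. The first diamond is found by its label, and the returned row's `name` is that label. Asking for label 0 fails because that label belongs to a spade:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv (made-up values, the notebook's column names).
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})
diamonds = mini_deck[mini_deck['suit'] == 'diamonds']

# Label 3 is the first diamond, even though it sits at position 0 in `diamonds`.
first_diamond = diamonds.loc[3]
print(first_diamond.name)    # 3

# Label 0 belongs to a spade, which was filtered out, so .loc raises KeyError.
try:
    diamonds.loc[0]
except KeyError:
    print('no row labelled 0 in diamonds')
```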

### Series have indexes too

Let's examine that card:

In [None]:
two_of_diamonds = diamonds.loc[13]
two_of_diamonds.index

The card (remember, this represents a row) is a Series, and it has an index
too. The valid labels are 'suit', 'symbol', 'rank', and 'value'.

And you can use `.loc[]` on those labels:

In [None]:
two_of_diamonds.loc['symbol']
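The same works on any row. In this sketch with a hypothetical `mini_deck` (made-up values), the row's index consists of the DataFrame's column names. `.loc[]` accepts either a single label or a list of labels:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv (made-up values, the notebook's column names).
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})

card = mini_deck.loc[4]            # the (made-up) 3 of diamonds
print(card.index.tolist())         # ['suit', 'symbol', 'rank', 'value']
print(card.loc['suit'])            # diamonds

# A list of labels returns a smaller Series holding just those entries.
print(card.loc[['symbol', 'rank']])
```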

### Indexes and Labels: Recap

So, what's going on?

- A Series has a `name`, an `index`, and `values`. The `index` contains the
  valid labels in that Series, and each label corresponds to one of the values.
  The values are just in a numpy array.
- A DataFrame has `columns`. It acts like a `dict` where if you select a column
  by name, you get a Series (whose name matches the column name).
- Every Series in a DataFrame has an index. In fact, they all have *the same*
  index, so each label corresponds to a value in every column.
- The index of the DataFrame is just that: the index all the columns share.
- This is really all DataFrames are. They are a collection of columns, in a
  `dict`-like structure where you can identify a column by its name. All the
  columns are "aligned" (which means they have the same index).
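The recap can be checked on a small example. This sketch again assumes a hypothetical `mini_deck` with made-up values. A column pulled out of the DataFrame is a Series with a `name`, an `index`, and values, and every column's index equals the DataFrame's own index:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv (made-up values, the notebook's column names).
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})
diamonds = mini_deck[mini_deck['suit'] == 'diamonds']

rank = diamonds['rank']            # selecting a column gives a Series
print(rank.name)                   # rank
print(rank.index.tolist())         # [3, 4, 5]
print(rank.to_numpy())             # [2 3 4]

# Every column shares the DataFrame's index, i.e. the columns are aligned.
aligned = all(diamonds[col].index.equals(diamonds.index) for col in diamonds.columns)
print(aligned)                     # True
```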

In [None]:
diamonds.index

In [None]:
diamonds['suit'].index

In [None]:
diamonds['rank'].index

## Selecting rows with `.loc[]` and `.iloc[]`

As we've seen, `.loc[]` selects an item (an element of a Series, or a row of a
DataFrame) by its label.

In [None]:
diamonds.loc[13:18]

In [None]:
diamonds['rank'].loc[13:18]

### Introducing `.iloc[]`

But how do we actually select things by row number? For that, we use `.iloc[]`
(for "integer location"):

In [None]:
diamonds.iloc[0]
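Here is a sketch of positional selection, again on a hypothetical `mini_deck` whose diamonds carry labels 3, 4 and 5. `.iloc[0]` returns the first row no matter what its label is. As with Python lists, negative positions count from the end:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv (made-up values, the notebook's column names).
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})
diamonds = mini_deck[mini_deck['suit'] == 'diamonds']

first = diamonds.iloc[0]    # position 0 = first row, whatever its label
print(first.name)           # 3

last = diamonds.iloc[-1]    # position -1 = last row
print(last.name)            # 5
```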

`.iloc[]` also supports slices, but they *exclude* the end point.

`.loc[]` slices, by contrast, include the end point: when we selected
`diamonds.loc[13:18]` above, the row with label 18 was included.

With `.iloc[]`, it acts more like a Python list slice: the end point is not
included. This makes sense for collections that are indexed by integers,
[according to
Dijkstra](https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF).
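Here are the two slice styles side by side, on the same kind of hypothetical `mini_deck` (diamonds labelled 3 to 5). A label slice that includes its end point and a position slice that excludes it can select exactly the same rows:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv (made-up values, the notebook's column names).
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})
diamonds = mini_deck[mini_deck['suit'] == 'diamonds']

by_label = diamonds.loc[3:5]        # labels 3 through 5, end point included
by_position = diamonds.iloc[0:3]    # positions 0, 1, 2, end point excluded
print(len(by_label), len(by_position))  # 3 3
print(by_label.equals(by_position))     # True
```

This mirrors the notebook, where `diamonds.loc[13:18]` and `diamonds.iloc[0:6]` each pick out six rows.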

In [None]:
diamonds.iloc[0:6]

#### Selecting along both axes with `.iloc[]`

Just as `.loc[]` can select along both axes, so can `.iloc[]`. But with
`.iloc[]`, columns are also selected by integer position, not by name:

In [None]:
diamonds.iloc[0:6, 1]

In [None]:
diamonds.iloc[0:6, 1:3]

Notice that the index labels (the row names) differ from the row numbers.
Indexes don't have to be numeric: a label can be a string or any other hashable
value. Unique labels are usually what you want, although pandas doesn't
strictly require them.
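To illustrate a non-numeric index, this sketch labels each card of a hypothetical `mini_deck` with a string such as `'3 of diamonds'`. The naming scheme is invented here, not taken from `deck.csv`. `.loc[]` then takes a row label and a column name, while `.iloc[]` takes a row position and a column position:

```python
import pandas as pd

# Hypothetical stand-in for deck.csv (made-up values, the notebook's column names).
mini_deck = pd.DataFrame({
    'suit':   ['spades'] * 3 + ['diamonds'] * 3,
    'symbol': ['2', '3', '4'] * 2,
    'rank':   [2, 3, 4] * 2,
    'value':  [2, 3, 4] * 2,
})

# Use an invented string label per card as the index; all four columns remain.
named = mini_deck.set_index(mini_deck['symbol'] + ' of ' + mini_deck['suit'])

# .loc: row label and column name.
print(named.loc['3 of diamonds', 'rank'])  # 3
# .iloc: row position 4 and column position 2 (the 'rank' column).
print(named.iloc[4, 2])                    # 3
```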

## Recap

In a DataFrame, row numbers are not the same as row labels. Row labels are
preserved when selecting subsets of rows.

You can select rows by label with `.loc[]` and by position with `.iloc[]`.

You can also select columns by their label (name) with `.loc[]` or by their
position with `.iloc[]`.

## Examples

Let's now practice selecting rows with both `.loc[]` and `.iloc[]`. Complete
each exercise below in its own code cell.

### Example 1

Select all cards from the suit 'hearts' using `.loc[]`.

### Example 2

Select the first and last cards from the suit 'clubs' using `.iloc[]`.

### Example 3

Select all cards from the suit 'hearts' with ranks between 7 and 10 (inclusive) using `.loc[]`.

### Example 4

Split the deck in two using `.iloc[]`: put the even-numbered cards (by row
number) into one DataFrame and the odd-numbered cards into another.

In [None]:
# Even cards


In [None]:
# Odd cards


### Example 5

Draw a hand of five random cards. Here are five random row numbers to use:

In [None]:
import numpy as np
# Five distinct row positions; replace=False ensures no card is drawn twice
rows = np.random.choice(len(deck_df), size=5, replace=False)
rows

In [None]:
# Select rows with .iloc[]
