# Data Access

## Accessing Series Elements

In the section above, you’ve created a Pandas Series based on a Python list and compared the two data structures. You’ve seen how a Series object is similar to lists and dictionaries in several ways. A further similarity is that you can use the **indexing operator** (`[]`) for Series as well.

You’ll also learn how to use two Pandas-specific access methods:

- `.loc`
- `.iloc`

You’ll see that these data access methods can be much more readable than the indexing operator.

### Using the Indexing Operator

Recall that a Series has two indices:

- A positional or implicit index, which always runs from `0` to `len(series) - 1`, just like a `RangeIndex`
- A label or explicit index, which can contain any hashable objects
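A quick way to see how the two indices relate is to compare a Series built without labels to one built with them. This is a minimal sketch that reuses the revenue figures from this section; the variable names `default` and `labeled` are only illustrative. When you don't pass `index`, pandas assigns a default `RangeIndex`. When you do pass labels, they replace it, but every element still has a position counted from `0`.

```python
import pandas as pd

# Without an explicit index, pandas assigns a default RangeIndex (0, 1, 2)
default = pd.Series([4200, 8000, 6500])
print(default.index)  # RangeIndex(start=0, stop=3, step=1)

# With an explicit index, the labels become the index,
# but each element still has an implicit position from 0 to len - 1
labeled = pd.Series([4200, 8000, 6500], index=["Amsterdam", "Toronto", "Tokyo"])
for position, label in enumerate(labeled.index):
    print(position, label)
```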

```python
import pandas as pd

city_revenues = pd.Series(
    [4200, 8000, 6500],
    index=["Amsterdam", "Toronto", "Tokyo"]
)
```

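You can access the values in a `Series` with both the label and the positional index. Be aware that recent pandas releases deprecate treating integer keys as positions when the labels aren’t integers, so `city_revenues[1]` may emit a `FutureWarning` that recommends `.iloc` instead: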

```python
city_revenues["Toronto"]
```

```text
8000
```

```python
city_revenues[1]
```

```text
8000
```

You can also use negative indices and slices, just like you would for a list. Slices work with labels too, and passing a list of labels selects several elements at once:

```python
city_revenues[-1]
```

```text
6500
```

```python
city_revenues[1:]
```

```text
Toronto    8000
Tokyo      6500
dtype: int64
```

```python
city_revenues['Toronto':]
```

```text
Toronto    8000
Tokyo      6500
dtype: int64
```

```python
city_revenues[['Toronto', 'Tokyo']]
```

```text
Toronto    8000
Tokyo      6500
dtype: int64
```
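Positional and label slices differ in one important way: a positional slice stops before its end, like a list slice, while a label slice includes its closing label. The sketch below, assuming the same `city_revenues` data as above, shows the contrast; the names `by_position` and `by_label` are illustrative.

```python
import pandas as pd

city_revenues = pd.Series(
    [4200, 8000, 6500],
    index=["Amsterdam", "Toronto", "Tokyo"]
)

# An integer slice is positional and excludes the stop position, like a list
by_position = city_revenues[0:1]
print(list(by_position.index))  # ['Amsterdam']

# A label slice includes the stop label
by_label = city_revenues["Amsterdam":"Toronto"]
print(list(by_label.index))  # ['Amsterdam', 'Toronto']
```

This inclusive behavior of label slices is the same rule you’ll see again with `.loc`.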

### Using `.loc` and `.iloc`

The indexing operator (`[]`) is convenient, but there’s a caveat. What if the labels are also numbers? Say you have to work with a `Series` object like this:

```python
colors = pd.Series(
    ["red", "purple", "blue", "green", "yellow"],
    index=[1, 2, 3, 5, 8]
)
colors
```

```text
1       red
2    purple
3      blue
5     green
8    yellow
dtype: object
```

What will `colors[1]` return? For a positional index, `colors[1]` is `"purple"`. However, if you go by the label index, then `colors[1]` is referring to `"red"`.

```python
colors[1]
```

```text
'red'
```
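Because `colors` has an integer index, the indexing operator looks up labels, not positions. A consequence worth checking is that a key which is a valid position but not a label fails instead of falling back to the position. A minimal sketch, using the same `colors` data:

```python
import pandas as pd

colors = pd.Series(
    ["red", "purple", "blue", "green", "yellow"],
    index=[1, 2, 3, 5, 8]
)

# With an integer index, [] treats the key as a label
print(colors[2])  # purple (the element labeled 2, not the one at position 2)

# 0 is a valid position, but it isn't one of the labels
print(0 in colors.index)  # False

# so [] raises KeyError instead of returning the first element
try:
    colors[0]
except KeyError as error:
    print("KeyError:", error)
```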

The good news is, you don’t have to figure it out! Instead, to avoid confusion, the Pandas Python library provides two data access methods:

- `.loc` refers to the **label index**.
- `.iloc` refers to the **positional index**.

These data access methods are much more readable:

```python
colors.loc[1]
```

```text
'red'
```

```python
colors.iloc[1]
```

```text
'purple'
```

`colors.loc[1]` returned `"red"`, the element with the label `1`. `colors.iloc[1]` returned `"purple"`, the element at position `1`.

The following figure shows which elements `.loc` and `.iloc` refer to:

<img src="../images/loc-iloc.png" alt="loc-iloc" width=400 align="left" />

Again, `.loc` points to the **label index** on the right-hand side of the image. Meanwhile, `.iloc` points to the **positional index** on the left-hand side of the picture.

It’s easier to keep in mind the distinction between `.loc` and `.iloc` than it is to figure out what the indexing operator will return. Even if you’re familiar with all the quirks of the indexing operator, it can be dangerous to assume that everybody who reads your code has internalized those rules as well!

> **Note:** In addition to being confusing for Series with numeric labels, the indexing operator has some **performance drawbacks**. It’s perfectly okay to use it in interactive sessions for ad hoc analysis, but for production code, the `.loc` and `.iloc` data access methods are preferable. For further details, check out the Pandas User Guide section on [indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html).

`.loc` and `.iloc` also support the features you would expect from indexing operators, like slicing. However, these data access methods have an important difference. While `.iloc` **excludes** the closing element, `.loc` **includes** it. Take a look at this code block:

```python
# Return the elements with the implicit index: 1, 2
colors.iloc[1:3]
```

```text
2    purple
3      blue
dtype: object
```

If you compare this code with the image above, then you can see that `colors.iloc[1:3]` returns the elements with the positional indices of 1 and 2. The closing item `"green"` with a positional index of `3` is excluded.

On the other hand, `.loc` includes the closing element:

```python
# Return the elements with the explicit index between 3 and 8
colors.loc[3:8]
```

```text
3      blue
5     green
8    yellow
dtype: object
```

This code block says to return all elements with a label index between 3 and 8. Here, the closing item `"yellow"` has a label index of `8` and is included in the output.
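To make the exclusive versus inclusive rule concrete, you can select the same two elements both ways. This sketch assumes the `colors` data from above: positions `1` and `2` hold the labels `2` and `3`, so `.iloc` needs a stop position of `3` (excluded), while `.loc` needs a stop label of `3` (included).

```python
import pandas as pd

colors = pd.Series(
    ["red", "purple", "blue", "green", "yellow"],
    index=[1, 2, 3, 5, 8]
)

# .iloc slice: positions 1 and 2; the stop position 3 is excluded
by_position = colors.iloc[1:3]

# .loc slice: labels 2 and 3; the stop label 3 is included
by_label = colors.loc[2:3]

print(list(by_position))  # ['purple', 'blue']
print(list(by_label))  # ['purple', 'blue']
print(by_position.equals(by_label))  # True
```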

You can also pass a negative positional index to `.iloc`:

```python
colors.iloc[-2]
```

```text
'green'
```
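Negative numbers only count from the end with `.iloc`. With `.loc`, `-2` is just another label to look up, and `colors` has no such label. A minimal sketch of the difference, assuming the same `colors` data:

```python
import pandas as pd

colors = pd.Series(
    ["red", "purple", "blue", "green", "yellow"],
    index=[1, 2, 3, 5, 8]
)

# .iloc counts from the end, like a list: -2 is the second-to-last element
print(colors.iloc[-2])  # green

# .loc treats -2 as a label; there is no label -2, so it raises KeyError
try:
    colors.loc[-2]
except KeyError:
    print("No element is labeled -2")
```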

> **Note:** There used to be an `.ix` indexer, which tried to guess whether it should apply positional or label indexing depending on the data type of the index. Because it caused a lot of confusion, it was deprecated in Pandas version 0.20.0 and removed in Pandas 1.0.
>
> If you come across `.ix` in older code, replace it: use `.loc` for label indexing and `.iloc` for positional indexing. For further details, check out the [Pandas User Guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated).

The code blocks above show how `.iloc` and `.loc` map onto familiar Python behavior:

- You can use `.iloc` on a Series similar to using `[]` on a list.
- You can use `.loc` on a Series similar to using `[]` on a dictionary.
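The list and dictionary analogy can be checked directly. This sketch builds `colors` from a plain list of values and a list of labels, then compares `.iloc` against list indexing and `.loc` against dictionary lookup; `as_list` and `as_dict` are illustrative names, not part of the pandas API.

```python
import pandas as pd

values = ["red", "purple", "blue", "green", "yellow"]
labels = [1, 2, 3, 5, 8]

colors = pd.Series(values, index=labels)
as_list = values  # positions 0 to 4
as_dict = dict(zip(labels, values))  # keys are the labels

# .iloc works like list indexing: positions, negative positions, exclusive slices
assert colors.iloc[1] == as_list[1]
assert colors.iloc[-1] == as_list[-1]
assert list(colors.iloc[1:3]) == as_list[1:3]

# .loc works like dictionary lookup: the key is a label, not a position
assert colors.loc[5] == as_dict[5]
assert colors.loc[8] == as_dict[8]
print("All comparisons hold")
```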

Be sure to keep these distinctions in mind as you access elements of your `Series` objects.

## Accessing DataFrame Elements

Since a `DataFrame` consists of `Series` objects, you can use the very same tools to access its elements. The crucial difference is the additional dimension of the DataFrame. You’ll use the indexing operator for the columns and the access methods `.loc` and `.iloc` on the rows.
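As a preview of that split, here is a small sketch. The DataFrame `city_data` and its columns `revenue` and `employee_count` are hypothetical, made up for illustration: the indexing operator picks a column, while `.loc` and `.iloc` pick a row by label or by position. Each selection comes back as a Series, so the Series access rules above still apply to it.

```python
import pandas as pd

# Hypothetical example data, for illustration only
city_data = pd.DataFrame(
    {"revenue": [4200, 8000, 6500], "employee_count": [5, 8, 2]},
    index=["Amsterdam", "Toronto", "Tokyo"],
)

# Indexing operator: select a column (returns a Series indexed by city)
revenue = city_data["revenue"]

# .loc selects a row by its label; .iloc selects a row by its position
toronto = city_data.loc["Toronto"]
tokyo = city_data.iloc[2]

print(revenue["Toronto"])  # 8000
print(toronto["employee_count"])  # 8
print(tokyo["revenue"])  # 6500
```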

### Using the Indexing Operator

If you think of a `DataFrame` as a dictionary whose values are Series, then it makes sense that you can access its columns with the indexing operator: