# Position and Label Based Indexing: ```df.iloc``` and ```df.loc```

You have seen some ways of selecting rows and columns from dataframes. Let's now look at two other ways of indexing dataframes that pandas recommends because they are more explicit (and less ambiguous).

There are two main ways of indexing dataframes:
1. Position based indexing using ```df.iloc```
2. Label based indexing using ```df.loc```

Using each method, we will perform the following indexing operations on a dataframe:
* Selecting single elements/cells
* Selecting single and multiple rows
* Selecting single and multiple columns
* Selecting multiple rows and columns
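To make the distinction concrete before turning to the iris data, here is a minimal sketch on a small hypothetical frame (`toy`, with made-up string row labels `'a'`, `'b'`, `'c'`) that reaches the same cell once by position and once by label:

```python
import pandas as pd

# Hypothetical frame whose row labels are strings, so labels and
# positions are clearly different things
toy = pd.DataFrame(
    {"sepal length": [5.1, 4.9, 4.7], "iris": ["Iris-setosa"] * 3},
    index=["a", "b", "c"],
)

# Position-based: second row (position 1), first column (position 0)
by_position = toy.iloc[1, 0]

# Label-based: row labelled "b", column labelled "sepal length"
by_label = toy.loc["b", "sepal length"]

print(by_position, by_label)  # same cell both ways: 4.9 4.9
```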

In [1]:
# loading libraries and reading the data
import numpy as np
import pandas as pd

df = pd.read_excel("iris.xls")
df.head()

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### Position (Integer) Based Indexing

Pandas provides the ```df.iloc``` functionality to index dataframes **using integer indices**. 


In [2]:
help(pd.DataFrame.iloc)

Help on property:

    Purely integer-location based indexing for selection by position.
    
    ``.iloc[]`` is primarily integer position based (from ``0`` to
    ``length-1`` of the axis), but may also be used with a boolean
    array.
    
    Allowed inputs are:
    
    - An integer, e.g. ``5``.
    - A list or array of integers, e.g. ``[4, 3, 0]``.
    - A slice object with ints, e.g. ``1:7``.
    - A boolean array.
    - A ``callable`` function with one argument (the calling Series or
      DataFrame) and that returns valid output for indexing (one of the above).
      This is useful in method chains, when you don't have a reference to the
      calling object, but would like to base your selection on some value.
    
    ``.iloc`` will raise ``IndexError`` if a requested indexer is
    out-of-bounds, except *slice* indexers which allow out-of-bounds
    indexing (this conforms with python/numpy *slice* semantics).
    
    See more at :ref:`Selection by Position <indexing.inte

As mentioned in the documentation, the inputs x, y to ```df.iloc[x, y]``` can be:
* An integer, e.g. ```3```
* A list or array of integers, e.g. ```[3, 7, 8]```
* A slice of integers, e.g. ```3:8``` (the stop position is excluded)
* A boolean array with one entry per row (for x) or per column (for y)
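The docstring also lists a callable, which the bullets above leave out. A minimal sketch exercising each input type on a hypothetical five-row frame `toy` with the default index 0 to 4:

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame with the default integer index 0..4
toy = pd.DataFrame({"x": [10, 20, 30, 40, 50], "y": list("abcde")})

single = toy.iloc[3, 0]                  # an integer per axis -> a scalar
rows_list = toy.iloc[[3, 0]]             # a list of integers -> rows in that order
rows_slice = toy.iloc[1:3]               # a slice -> positions 1 and 2 (3 excluded)
mask = np.array([True, False, True, False, True])
rows_bool = toy.iloc[mask]               # a boolean array, one entry per row
rows_callable = toy.iloc[lambda d: [0, len(d) - 1]]  # a callable -> first and last rows

print(single)                            # 40
print(rows_list["x"].tolist())           # [40, 10]
print(rows_slice["x"].tolist())          # [20, 30]
print(rows_bool["x"].tolist())           # [10, 30, 50]
print(rows_callable["x"].tolist())       # [10, 50]
```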

Let's see some examples.

In [2]:
# Selecting a single element
# Note that 2, 4 corresponds to the third row and fifth column (iris)
df.iloc[2, 4]

'Iris-setosa'

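Note that simply writing ```df[2, 4]``` raises a ```KeyError```: the plain ```[]``` operator looks up column labels, so pandas treats ```(2, 4)``` as a single column name and finds no such column. Plain ```[]``` also mixes conventions (```df['iris']``` selects a column, while a slice such as ```df[2:4]``` selects rows), which is exactly the ambiguity that ```iloc``` and ```loc``` remove.

On the other hand, ```df.iloc[2, 4]``` tells pandas explicitly to treat both numbers as **integer positions**: the first for the row, the second for the column.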
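A minimal sketch of the difference, using a hypothetical three-row frame `toy` with the iris column names so it runs without `iris.xls`:

```python
import pandas as pd

# Hypothetical stand-in with the same column names as the iris data
toy = pd.DataFrame({
    "sepal length": [5.1, 4.9, 4.7],
    "sepal width": [3.5, 3.0, 3.2],
    "petal length": [1.4, 1.4, 1.3],
    "petal width": [0.2, 0.2, 0.2],
    "iris": ["Iris-setosa"] * 3,
})

try:
    toy[2, 4]  # [] treats the tuple (2, 4) as one column label, which does not exist
    raised_key_error = False
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

print(toy.iloc[2, 4])  # explicit integer positions -> Iris-setosa
```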

In [4]:
# Selecting a single row, and all columns
# Select the sixth row: position 5, which with the default index also has label 5
df.iloc[5]

sepal length            5.4
sepal width             3.9
petal length            1.7
petal width             0.4
iris            Iris-setosa
Name: 5, dtype: object

In [5]:
# The above is equivalent to this
# The ":" indicates "all rows/columns"
df.iloc[5, :]

# equivalent to df.iloc[5, ]

sepal length            5.4
sepal width             3.9
petal length            1.7
petal width             0.4
iris            Iris-setosa
Name: 5, dtype: object
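The result above is a Series: the row label became its name (`Name: 5`), and mixing numeric and text columns forces an `object` dtype. A minimal sketch on a hypothetical frame `toy` shows that wrapping the position in a list keeps a one-row DataFrame instead:

```python
import pandas as pd

# Hypothetical 6-row frame with one numeric and one text column
toy = pd.DataFrame({"x": [10, 20, 30, 40, 50, 60], "y": list("abcdef")})

row_as_series = toy.iloc[5]     # a single integer -> the row as a Series
row_as_frame = toy.iloc[[5]]    # a one-element list -> a 1-row DataFrame

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
print(row_as_series.name)            # 5 (the row label becomes the Series name)
```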

In [6]:
# Select multiple rows using a list of indices
df.iloc[[3, 7, 8]]

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |


In [7]:
# Equivalently, you can use:
df.iloc[[3, 7, 8], :]

# same as df.iloc[[3, 7, 8], ]

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |


In [8]:
# Selecting rows using a range of integer indices
# Notice that 4 is included, 8 is not
df.iloc[4:8]

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |


In [13]:
# or equivalently
df.iloc[4:8, :]

# or df.iloc[4:8, ]

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
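The docstring says `iloc` slices follow Python/NumPy slice semantics, including tolerance of out-of-bounds stops. A minimal sketch comparing a hypothetical 10-row frame `toy` with a plain list:

```python
import pandas as pd

toy = pd.DataFrame({"x": list(range(10))})   # hypothetical 10-row frame
values = list(range(10))                     # a plain Python list for comparison

# Start position included, stop position excluded, exactly like list slicing
window = toy.iloc[4:8]["x"].tolist()
print(window, values[4:8])                   # [4, 5, 6, 7] [4, 5, 6, 7]

# A slice that runs past the end is clipped instead of raising an error
clipped = toy.iloc[8:100]["x"].tolist()
print(clipped)                               # [8, 9]

# A single out-of-bounds position, however, raises IndexError
try:
    toy.iloc[100]
    out_of_bounds_raised = False
except IndexError as err:
    out_of_bounds_raised = True
    print("IndexError:", err)
```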


In [10]:
# Selecting a single column
# Notice that the column index starts at 0, and 2 represents the third column 
df.iloc[:, 2]

0      1.4
1      1.4
2      1.3
3      1.5
4      1.4
      ... 
145    5.2
146    5.0
147    5.2
148    5.4
149    5.1
Name: petal length, Length: 150, dtype: float64

In [14]:
# Selecting multiple columns
df.iloc[:, 2:5]

|   | petal length | petal width | iris |
|---|---|---|---|
| 0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 1.4 | 0.2 | Iris-setosa |
| 2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 1.5 | 0.2 | Iris-setosa |
| 4 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... |
| 145 | 5.2 | 2.3 | Iris-virginica |
| 146 | 5.0 | 1.9 | Iris-virginica |
| 147 | 5.2 | 2.0 | Iris-virginica |
| 148 | 5.4 | 2.3 | Iris-virginica |
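A slice only selects a contiguous block of columns. As a sketch on a hypothetical two-row frame `toy` with the iris column names, a list of positions picks non-adjacent columns, and negative positions count from the end as in Python:

```python
import pandas as pd

toy = pd.DataFrame({
    "sepal length": [5.1, 4.9],
    "sepal width": [3.5, 3.0],
    "petal length": [1.4, 1.4],
    "petal width": [0.2, 0.2],
    "iris": ["Iris-setosa", "Iris-setosa"],
})

# A list of positions picks non-adjacent columns (positions 0 and 3)
picked = toy.iloc[:, [0, 3]]
print(list(picked.columns))      # ['sepal length', 'petal width']

# Negative positions count from the end: -1 is the last column
last_col = toy.iloc[:, -1]
print(last_col.name)             # iris
```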


In [15]:
# Selecting multiple rows and columns
df.iloc[2:5, 1:5]

|   | sepal width | petal length | petal width | iris |
|---|---|---|---|---|
| 2 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [18]:
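# Using booleans
# A boolean list selects the rows marked True, but it must have one entry per row
# Here the list has 7 entries while df has 150 rows, so pandas raises an IndexError
df.iloc[[True, True, False, True, True, False, True]]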

IndexError: Boolean index has wrong length: 7 instead of 150
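The error message states the fix: the boolean array needs one entry per row (150 for the iris data). A minimal sketch on a hypothetical 7-row frame `toy` (the petal widths of rows 0 to 6 shown earlier), where the same 7-element list is valid, also builds a mask from a condition. Passing a boolean Series straight to `iloc` raises an error, so the sketch converts it with `.to_numpy()`:

```python
import pandas as pd

# Hypothetical 7-row stand-in: petal widths of iris rows 0-6
toy = pd.DataFrame({"petal width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3]})

# The same 7-element list works here because it has one entry per row
mask = [True, True, False, True, True, False, True]
picked = toy.iloc[mask]
print(picked.index.tolist())             # [0, 1, 3, 4, 6]

# A mask built from a condition always has the right length;
# .to_numpy() turns the boolean Series into a plain array that iloc accepts
wide = (toy["petal width"] > 0.2).to_numpy()
print(toy.iloc[wide].index.tolist())     # [5, 6]
```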

To summarise, ```df.iloc[x, y]``` uses integer indices starting at 0.

The other common way of indexing is **label-based** indexing, which uses ```df.loc[]```.


### Label Based Indexing

Pandas provides the ```df.loc[]``` functionality to index dataframes **using labels**. 

In [19]:
help(pd.DataFrame.loc)

Help on property:

    Access a group of rows and columns by label(s) or a boolean array.
    
    ``.loc[]`` is primarily label based, but may also be used with a
    boolean array.
    
    Allowed inputs are:
    
    - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
      interpreted as a *label* of the index, and **never** as an
      integer position along the index).
    - A list or array of labels, e.g. ``['a', 'b', 'c']``.
    - A slice object with labels, e.g. ``'a':'f'``.
    
      .. warning:: Note that contrary to usual python slices, **both** the
          start and the stop are included
    
    - A boolean array of the same length as the axis being sliced,
      e.g. ``[True, False, True]``.
    - An alignable boolean Series. The index of the key will be aligned before
      masking.
    - An alignable Index. The Index of the returned selection will be the input.
    - A ``callable`` function with one argument (the calling Series or
      DataFrame) and that returns valid output for indexing (one of the above)
    
    See more at 

As mentioned in the documentation, the inputs x, y to ```df.loc[x, y]``` can be:
* A single label, e.g. ```3``` for a row of this dataframe or ```'iris'``` for a column
* A list or array of labels, e.g. ```[3, 7, 8]```
* A slice of labels, e.g. ```'row_x':'row_y'```, where **both** ```row_x``` and ```row_y``` are included
* A boolean array with one entry per row (for x) or per column (for y)

Let's see some examples.
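A minimal sketch of each `loc` input type, on a hypothetical frame `toy` with made-up string row labels `'r0'` to `'r3'` so that labels cannot be confused with positions:

```python
import pandas as pd

# Hypothetical frame with string row labels
toy = pd.DataFrame(
    {"width": [3.5, 3.0, 3.2, 3.1], "iris": ["Iris-setosa"] * 4},
    index=["r0", "r1", "r2", "r3"],
)

cell = toy.loc["r2", "width"]                          # single labels -> a scalar
by_list = toy.loc[["r3", "r0"], "width"].tolist()      # list of labels, in that order
by_slice = toy.loc["r1":"r3", "width"].tolist()        # label slice, both ends kept
by_bool = toy.loc[toy["width"] > 3.1, "width"].tolist()    # boolean Series
by_callable = toy.loc[lambda d: d["width"] < 3.2].index.tolist()  # callable

print(cell)          # 3.2
print(by_list)       # [3.1, 3.5]
print(by_slice)      # [3.0, 3.2, 3.1]
print(by_bool)       # [3.5, 3.2]
print(by_callable)   # ['r1', 'r3']
```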

In [20]:
# Selecting a single element
# Select row label = 2 and column label = 'petal width'
df.loc[2, 'petal width']

0.2

In [21]:
# Selecting a single row using a single label
# df.loc reads 5 as a row label, not as a position
df.loc[5]

sepal length            5.4
sepal width             3.9
petal length            1.7
petal width             0.4
iris            Iris-setosa
Name: 5, dtype: object
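With the default index, label 5 and position 5 are the same row, so `df.loc[5]` matches `df.iloc[5]` from earlier. A minimal sketch with a hypothetical frame `toy` whose rows are reversed (each row keeps its label but moves to a new position) shows the two diverging:

```python
import pandas as pd

# Hypothetical frame; reversing keeps each row's label but changes its position
toy = pd.DataFrame({"x": [10, 20, 30, 40, 50, 60]})
rev = toy.iloc[::-1]          # labels now run 5, 4, 3, 2, 1, 0

print(rev.loc[5, "x"])        # label 5 -> 60 (the row that was originally last)
print(rev.iloc[5, 0])         # position 5 -> 10 (label 0, now the last row)
```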

In [22]:
# or equivalently
df.loc[5, :]

# or df.loc[5, ]

sepal length            5.4
sepal width             3.9
petal length            1.7
petal width             0.4
iris            Iris-setosa
Name: 5, dtype: object

In [23]:
# Select multiple rows using a list of row labels
df.loc[[4, 6, 9]]

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |


In [24]:
# Selecting rows using a range of labels
# Notice that with df.loc, both 4 and 8 are included, unlike with df.iloc
# This is an important difference between iloc and loc
df.loc[4:8]

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
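A minimal sketch of the endpoint difference on a hypothetical 10-row frame `toy`, plus a hypothetical frame `letters` with string labels, where including the stop label is the natural rule (you name the last label you want):

```python
import pandas as pd

toy = pd.DataFrame({"x": range(10)})   # hypothetical frame, default index 0..9

print(len(toy.loc[4:8]))    # 5 rows: labels 4, 5, 6, 7, 8 (stop label included)
print(len(toy.iloc[4:8]))   # 4 rows: positions 4, 5, 6, 7 (stop position excluded)

# With string labels there is no "next label" to stop before,
# so including the stop label is the natural rule
letters = pd.DataFrame({"x": range(5)}, index=list("abcde"))
print(letters.loc["b":"d"].index.tolist())   # ['b', 'c', 'd']
```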


In [25]:
# Or equivalently
df.loc[4:8, ]

|   | sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|---|
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |


In [None]:
# Or equivalently
df.loc[4:8, :]

In [26]:
# Label-based indexing becomes clearer with a custom row index
# Let's use the 'sepal length' column as the row index
df.set_index('sepal length', inplace = True)
df.head()

| sepal length | sepal width | petal length | petal width | iris |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
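The cell above sets a custom row index but stops at `df.head()`. A minimal sketch of what `df.loc` does with such an index, using a hypothetical eight-row stand-in `df8` built from the first rows printed earlier (not the full 150-row file). Note that sepal length values repeat (5.0 appears at rows 4 and 7), so the index is not unique:

```python
import pandas as pd

# Hypothetical stand-in for the first eight iris rows shown earlier
df8 = pd.DataFrame({
    "sepal length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    "sepal width":  [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
    "petal width":  [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2],
}).set_index("sepal length")

# A label that occurs once returns a single value
print(df8.loc[5.4, "sepal width"])              # 3.9

# A repeated label returns every matching row
print(df8.loc[5.0, "sepal width"].tolist())     # [3.6, 3.4]

# A slice bound that is a repeated label (4.6) needs a sorted index,
# so sort first; both ends of the label slice are included
print(df8.sort_index().loc[4.6:4.9].index.tolist())   # [4.6, 4.6, 4.7, 4.9]
```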


To summarise, we discussed two **explicit ways of indexing dataframes** - ```df.iloc[]``` and ```df.loc[]```. Next, let's study how to slice and dice sections of dataframes. 