# .loc and .iloc

In [25]:
import pandas as pd

raw_data = {'Value1': [1,2,3,4,5,6],
            'Value2': ['A', 'B', 'C', 'D', 'E', 'F'],
            'Value3': [100, 200, 300, 400, 500, 600],
            'Value4': [1000, 2000, 3000, 4000, 5000, 6000]}

testData = pd.DataFrame(raw_data, columns = ['Value1','Value2', 'Value3', 'Value4'])

Both .loc and .iloc let us select parts of a DataFrame without being limited to complete columns or rows. The main difference is that .loc works with labels, while .iloc works with integer positions. By labels we mean the values in the DataFrame's index and columns properties.

## .loc

We will change the index several times to explain how .loc and .iloc work. Here we set the index to the column that holds the letters A to F. We then tell .loc to show the rows from index "A" to "C". **.loc is inclusive**, so row "C" will also be shown.

In [26]:
testData.set_index("Value2", inplace=True)
testData.loc['A':'C', :]

Value2,Value1,Value3,Value4
A,1,100,1000
B,2,200,2000
C,3,300,3000


You might find the following in code:
```python
testData.loc[0:3, :]
```
We said earlier that .loc works with labels, so this will not work with the DataFrame we are using now, whose index holds the strings A to F. The error given will be: *TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [0] of <class 'int'>*
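A minimal sketch of this failure, assuming a small stand-in frame named `df` (not part of the notebook) with string labels in its index. The exact wording of the error message can differ between pandas versions, so the example only checks the exception type.

```python
import pandas as pd

# Hypothetical stand-in for testData: string labels in the index.
df = pd.DataFrame(
    {"Value1": [1, 2, 3], "Value3": [100, 200, 300]},
    index=pd.Index(["A", "B", "C"], name="Value2"),
)

try:
    df.loc[0:3, :]  # 0 and 3 are not labels in this index
    raised = False
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

# The label-based and position-based ways to ask for the first two rows.
by_label = df.loc["A":"B", :]    # inclusive: rows A and B
by_position = df.iloc[0:2, :]    # end excluded: positions 0 and 1
```

Both working selections return the same two rows; only the way the rows are addressed differs.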

However, if the index contains numerical values, these are treated as labels and the slice will work. We will replace the index with the values 0, 3, 6, 9, 12, 15 and try to get the first 4 rows:

In [37]:
tmp = testData.copy()
tmp.index=[0,3,6,9,12,15]
tmp.loc[0:3, :]

,Value1,Value3,Value4
0,1,100,1000
3,2,200,2000


This shows only two rows, not the 4 we might have expected, because .loc[0:3, :] treats 0 and 3 as labels rather than positions. It does not return the first 4 rows; it returns the rows whose index label lies between 0 and 3 inclusive.
To get the first 4 rows here we would need to use tmp.loc[0:9, :].
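The difference can be checked side by side. This is a small sketch that rebuilds a frame like `tmp` from scratch (with only two of the value columns, to keep it short) and compares label slices with a positional slice.

```python
import pandas as pd

# Same index as tmp above: integers that act as labels.
tmp = pd.DataFrame(
    {"Value1": [1, 2, 3, 4, 5, 6], "Value3": [100, 200, 300, 400, 500, 600]},
    index=[0, 3, 6, 9, 12, 15],
)

two_rows = tmp.loc[0:3, :]     # labels 0 and 3
four_rows = tmp.loc[0:9, :]    # labels 0, 3, 6 and 9
in_between = tmp.loc[1:10, :]  # bounds need not exist in a sorted index: labels 3, 6, 9
first_four = tmp.iloc[0:4, :]  # positions 0 to 3, whatever the labels are

print(len(two_rows), len(four_rows), len(in_between), len(first_four))
```

Here `tmp.loc[0:9, :]` and `tmp.iloc[0:4, :]` select the same rows, but only the second keeps doing so if the index labels change.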

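Something to keep in mind when using a datetime column as the index: the index must actually hold datetime values, for example after converting it with pd.to_datetime. With a real DatetimeIndex, .loc accepts datetime values and also parses date strings passed to it. If the index still holds plain strings, they are matched as ordinary labels and no date parsing takes place.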
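A short sketch of the datetime case, assuming a hypothetical `sales` frame with an `Amount` column (neither appears in the notebook). The index starts out as plain strings and is converted with `pd.to_datetime` before selecting.

```python
import pandas as pd

# Hypothetical data: the dates start out as plain strings.
sales = pd.DataFrame(
    {"Amount": [10, 20, 30, 40]},
    index=["2021-01-01", "2021-01-02", "2021-01-03", "2021-02-01"],
)

# Convert the index so that it holds real datetime values.
sales.index = pd.to_datetime(sales.index)

# An explicit datetime value selects one row.
feb_first = sales.loc[pd.Timestamp("2021-02-01"), "Amount"]

# With a DatetimeIndex, date strings are parsed; slices include both ends.
two_days = sales.loc["2021-01-02":"2021-01-03", :]

# A less precise string, such as a month, selects every row inside it.
january = sales.loc["2021-01", :]

print(feb_first, len(two_days), len(january))
```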

.loc also allows *filtering inside its indexers*, which can be useful to save space if the selection criteria are not too complex.

In [41]:
testData.loc[testData.Value3 > 300, :]

Value2,Value1,Value3,Value4
D,4,400,4000
E,5,500,5000
F,6,600,6000
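The same idea extends to more than one condition and to a column selection in the second indexer. The sketch below rebuilds `testData` so that it runs on its own; the name `mask` is introduced here for illustration.

```python
import pandas as pd

testData = pd.DataFrame(
    {
        "Value1": [1, 2, 3, 4, 5, 6],
        "Value2": ["A", "B", "C", "D", "E", "F"],
        "Value3": [100, 200, 300, 400, 500, 600],
        "Value4": [1000, 2000, 3000, 4000, 5000, 6000],
    }
).set_index("Value2")

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (testData["Value3"] > 300) & (testData["Value1"] < 6)

# Rows where the mask is True, limited to two columns chosen by label.
subset = testData.loc[mask, ["Value1", "Value4"]]
print(subset)
```

Storing the condition in a variable first keeps the .loc call readable once the criteria grow.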


## .iloc

.iloc (Integer LOCation) uses integer positions to get its results. It does not accept strings as column or row indexers. Contrary to .loc, .iloc excludes the end of the ranges passed to it.

In [40]:
tmp.iloc[0:3,:]

,Value1,Value3,Value4
0,1,100,1000
3,2,200,2000
6,3,300,3000


Here we get the first three rows of the DataFrame, because the end of the range is excluded. .iloc no longer looks at the content of the index to determine which rows to include, so it is the better choice when you want a certain number of rows and columns.
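A few more positional selections, as a sketch on a frame rebuilt to match `tmp`. The variable names on the left are only illustrative.

```python
import pandas as pd

tmp = pd.DataFrame(
    {
        "Value1": [1, 2, 3, 4, 5, 6],
        "Value3": [100, 200, 300, 400, 500, 600],
        "Value4": [1000, 2000, 3000, 4000, 5000, 6000],
    },
    index=[0, 3, 6, 9, 12, 15],
)

corner = tmp.iloc[0:3, 0:2]     # first three rows, first two columns
last_two = tmp.iloc[-2:, :]     # negative positions count from the end
picked = tmp.iloc[[0, 2, 4], :] # a list of positions instead of a range
cell = tmp.iloc[1, 2]           # second row, third column: a single value

print(corner.shape, list(last_two.index), list(picked.index), cell)
```

None of these depend on the index labels, so they keep working after the index is replaced or reset.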


Next: [Plotting](09-Plotting.ipynb) | [Content](00-Content.ipynb)