In [30]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

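Let's review indexing of dataframes with `[ ]`, `loc`, and `iloc`. Indexing with `[ ]` can be confusing, so it is better to use `loc` (label-based, uses the explicit index) and `iloc` (position-based, uses the implicit integer position and accepts only integer positions). Both accept slices and so-called fancy indexing, i.e. a list of labels such as `['sepal length (cm)', 'petal width (cm)']` for `loc` or a list of positions such as `[0, 2]` for `iloc`. Note that a `loc` slice includes its end label, while an `iloc` slice excludes its end position.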

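It's important to understand that a dataframe can be viewed as a dictionary of series, keyed by column name. So it is better to use `[ ]` only for accessing columns. Dot notation (for example `df.target`) also works for columns, but it can be confusing as well: it fails for names that contain spaces or that clash with existing DataFrame attributes and methods.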
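The dictionary-of-series view and the dot-notation pitfalls can be checked on a small frame. This is a minimal sketch: the frame `toy` and its values are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy frame; column names are chosen to expose the pitfalls.
toy = pd.DataFrame({
    "area": [1, 2],
    "pop": [10, 20],
    "petal width (cm)": [0.2, 0.4],
})

# Dictionary-style access returns the column as a Series.
col = toy["pop"]
print(type(col).__name__)

# Iterating over a DataFrame yields its column names, like dict keys.
column_names = list(toy)
print(column_names)

# Dot notation works for simple names that clash with nothing.
same_column = toy.area.equals(toy["area"])
print(same_column)

# "pop" is also a DataFrame method, so the attribute resolves to the
# method rather than to the column.
attribute_is_method = callable(toy.pop)
print(attribute_is_method)

# A name with spaces cannot be written as an attribute at all,
# so [ ] is the only option here.
width = toy["petal width (cm)"]
```

Because `[ ]` with a column name always behaves the same way, it is the safer habit for column access.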

In [31]:
# Load dataset and convert to dataframe
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Add target column
df['target'] = iris.target

In [32]:
df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [33]:
df.shape

(150, 5)

In [34]:
df.iloc[0]

sepal length (cm)    5.1
sepal width (cm)     3.5
petal length (cm)    1.4
petal width (cm)     0.2
target               0.0
Name: 0, dtype: float64

In [35]:
df.iloc[0:5, 0:2]

Unnamed: 0,sepal length (cm),sepal width (cm)
0,5.1,3.5
1,4.9,3.0
2,4.7,3.2
3,4.6,3.1
4,5.0,3.6


In [36]:
df.iloc[0:5, [0, 2]]

Unnamed: 0,sepal length (cm),petal length (cm)
0,5.1,1.4
1,4.9,1.4
2,4.7,1.3
3,4.6,1.5
4,5.0,1.4


In [37]:
df.loc[0:5, ['sepal length (cm)', 'petal width (cm)']]

Unnamed: 0,sepal length (cm),petal width (cm)
0,5.1,0.2
1,4.9,0.2
2,4.7,0.2
3,4.6,0.2
4,5.0,0.2
5,5.4,0.4


In [38]:
df.loc[0:5, 'sepal length (cm)': 'petal length (cm)']

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm)
0,5.1,3.5,1.4
1,4.9,3.0,1.4
2,4.7,3.2,1.3
3,4.6,3.1,1.5
4,5.0,3.6,1.4
5,5.4,3.9,1.7
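The outputs above differ in row count because `loc` slices include the end label while `iloc` slices exclude the end position. The following sketch isolates that difference on a made-up frame (`toy` and `shifted` are illustrative names, not part of the notebook).

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..9.
toy = pd.DataFrame({"a": range(10), "b": range(10, 20), "c": range(20, 30)})

# Position-based: rows 0..4 and columns at positions 0 and 1 (end excluded).
by_position = toy.iloc[0:5, 0:2]
print(by_position.shape)

# Label-based: rows labelled 0..5 and columns "a" through "b" (end included).
by_label = toy.loc[0:5, "a":"b"]
print(by_label.shape)

# With a non-default index, labels and positions no longer coincide.
shifted = toy.copy()
shifted.index = range(100, 110)

# The first row sits at position 0 but carries the label 100.
first_matches = shifted.loc[100].equals(shifted.iloc[0])
print(first_matches)
```

With the default integer index, labels and positions happen to be the same numbers, which is why the two accessors are easy to mix up.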


Here's one example where indexing with `[ ]` can be confusing: `data['California']` raises a `KeyError`, because a single label inside `[ ]` is looked up among the columns, yet the slice `data['California':'Florida']` works and selects rows. With `loc` we can access a row unambiguously: `data.loc['California']` or `data.loc['California', :]`.

In [39]:
area = pd.Series({'California': 423967, 'Texas': 695662,
'Florida': 170312, 'New York': 141297,
'Pennsylvania': 119280})
pop = pd.Series({'California': 39538223, 'Texas': 29145505,
'Florida': 21538187, 'New York': 20201249,
'Pennsylvania': 13002700})
data = pd.DataFrame({'area':area, 'pop':pop})

In [40]:
data

Unnamed: 0,area,pop
California,423967,39538223
Texas,695662,29145505
Florida,170312,21538187
New York,141297,20201249
Pennsylvania,119280,13002700


In [41]:
# Not allowed: a single label in [ ] is looked up among the columns,
# so this raises KeyError
# data['California']

In [42]:
data['California':'Florida']

Unnamed: 0,area,pop
California,423967,39538223
Texas,695662,29145505
Florida,170312,21538187


In [43]:
data.loc['California', :]

area      423967
pop     39538223
Name: California, dtype: int64
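To see the asymmetry of `[ ]` in one place, here is a minimal, self-contained sketch. It rebuilds a three-state subset of the frame above and catches the `KeyError` rather than leaving the failing line commented out.

```python
import pandas as pd

# Three-state subset of the frame used above.
data = pd.DataFrame(
    {"area": [423967, 695662, 170312],
     "pop": [39538223, 29145505, 21538187]},
    index=["California", "Texas", "Florida"],
)

# A single label in [ ] is treated as a column name, so a row label fails.
raised = False
try:
    data["California"]
except KeyError:
    raised = True
print("KeyError raised:", raised)

# A slice in [ ] selects rows, and matches the equivalent loc slice.
sliced = data["California":"Texas"]
same_as_loc = sliced.equals(data.loc["California":"Texas"])
print(same_as_loc)

# loc makes the intent explicit: a row first, then optionally a column.
row = data.loc["California"]
cell = data.loc["California", "area"]
print(cell)
```

Using `loc` for rows and `[ ]` only for columns avoids having to remember which of the two behaviours applies.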