# Pandas - DataFrame Indexing and Slicing


---



In [1]:
import pandas as pd

## Indexing dataframes

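**Access a single value** of a dataframe by first selecting its series (column), then passing the explicit index (the label) or the implicit index (the zero-based position) in square brackets. If the explicit index is made of integers, an integer in square brackets is read as a label, not as a position.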

In [23]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe element at explicit index ['col_a'][2] is:", dataframe_a["col_a"][2])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The element at explicit index ['col_a'][2] is: 2


In [24]:
# String index
series_a = pd.Series([1, 2, 3], index=["1", "2", "3"])
series_b = pd.Series([4, 5, 6], index=["1", "2", "3"])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe element at explicit index ['col_a']['2'] is:", dataframe_a["col_a"]["2"])
print("The element at implicit index ['col_a'][1] is:", dataframe_a["col_a"][1])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The element at explicit index ['col_a']['2'] is: 2
The element at implicit index ['col_a'][1] is: 2
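
Because an integer in square brackets can mean either a label or a position, the two lookups above can also be written so that the intent is unambiguous. The following is a minimal sketch, assuming the same string-indexed dataframe. It uses `.iloc` (introduced later in this notebook) for the positional lookup, since recent pandas versions warn when an integer key is used as a position on a label-indexed series. The names `value_by_label` and `value_by_position` are illustrative only.

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=["1", "2", "3"])
series_b = pd.Series([4, 5, 6], index=["1", "2", "3"])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# Label-based lookup: "2" is matched against the explicit index.
value_by_label = dataframe_a["col_a"]["2"]

# Position-based lookup: 1 is the second row, whatever its label is.
value_by_position = dataframe_a["col_a"].iloc[1]

print("By label:", value_by_label)
print("By position:", value_by_position)
```

Both lookups reach the same cell here because the label "2" happens to sit at position 1.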


**Access multiple values** of a dataframe by first selecting their series, then passing a list of their (explicit or implicit) indexes in square brackets.

In [41]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe elements at explicit indexes ['col_a'][[2, 3]] are:")
print(dataframe_a["col_a"][[2, 3]])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The elements at explicit indexes ['col_a'][[2, 3]] are:
2    2
3    3
Name: col_a, dtype: int64
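
A list of indexes does not have to be sorted or contiguous. As a small sketch, assuming the same integer-indexed dataframe, the result follows the order of the list that was passed in (the name `selected` is illustrative only):

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# The integers in the list are read as labels of the explicit index,
# and the returned series keeps the order given in the list.
selected = dataframe_a["col_a"][[3, 1]]
print(selected)
```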


**Modify a single value** of a dataframe by first specifying its series, then its (explicit or implicit) index in square brackets.

In [27]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

print(dataframe_a)
print("\nThe original value at explicit index ['col_a'][1] is:", dataframe_a["col_a"][1])
print()

dataframe_a["col_a"][1] = 0
print(dataframe_a)
print("\nThe modified value at explicit index ['col_a'][1] is:", dataframe_a["col_a"][1])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The original value at explicit index ['col_a'][1] is: 1

   col_a  col_b
1      0      4
2      2      5
3      3      6

The modified value at explicit index ['col_a'][1] is: 0
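
The assignment above goes through two bracket operations in a row (chained indexing). Depending on the pandas version and settings, such an assignment may act on a temporary object, raise a `SettingWithCopyWarning`, or leave the dataframe unchanged. A hedged alternative, assuming the same dataframe, is a single `.loc` call (introduced later in this notebook) on the dataframe itself:

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# One indexing operation: row label 1, column label "col_a".
dataframe_a.loc[1, "col_a"] = 0
print(dataframe_a)
```

Since row and column are given in one call, pandas writes directly into `dataframe_a`.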


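Each series within a dataframe can contain only **one type of data** (a single dtype). In the pandas version used for this notebook, this means that a float inserted into an integer series is truncated, as the output below shows. Newer pandas versions may instead upcast the series to a float dtype or warn about the incompatible value, so do not rely on silent truncation.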

In [29]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

print(dataframe_a)
print("\nThe original value at explicit index ['col_a'][1] is:", dataframe_a["col_a"][1])
print()

dataframe_a["col_a"][1] = 0.99
print(dataframe_a)
print("\nThe modified value at explicit index ['col_a'][1] is:", dataframe_a["col_a"][1])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The original value at explicit index ['col_a'][1] is: 1

   col_a  col_b
1      0      4
2      2      5
3      3      6

The modified value at explicit index ['col_a'][1] is: 0
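
When the float must be kept as is, one option is to convert the series explicitly before assigning. The following is a minimal sketch, assuming the same dataframe; it uses `astype` to switch `col_a` to `float64`, so the result does not depend on how a given pandas version handles mixed types.

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# Convert the integer column to float first, then assign the float.
dataframe_a["col_a"] = dataframe_a["col_a"].astype("float64")
dataframe_a.loc[1, "col_a"] = 0.99

print(dataframe_a)
print(dataframe_a.dtypes)
```

Only `col_a` changes dtype; `col_b` remains an integer series.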


**Add a series** to a dataframe by assigning it to a new column name in square brackets.

In [35]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print("Original dataframe:")
print(dataframe_a)

series_c = pd.Series([7, 8, 9], index=[1, 2, 3])
dataframe_a["col_c"] = series_c
print("\nExtended dataframe:")
print(dataframe_a)

Original dataframe:
   col_a  col_b
1      1      4
2      2      5
3      3      6

Extended dataframe:
   col_a  col_b  col_c
1      1      4      7
2      2      5      8
3      3      6      9
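
The new series is matched to the dataframe by index label, not by position. As a sketch of what this implies, assuming a shorter series whose index covers only labels 2 and 3, the row without a matching label is filled with `NaN`:

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# series_c has no value for label 1, so that row becomes NaN.
series_c = pd.Series([7, 8], index=[2, 3])
dataframe_a["col_c"] = series_c

print(dataframe_a)
```

Because `NaN` is a float, `col_c` is stored as a float series even though the values passed in were integers.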


## Slicing dataframes

**Slice a dataframe** by first specifying a series, then optionally a range of (explicit or implicit) indexes in square brackets. Note that slicing with the explicit index includes the final index, while slicing with the implicit index excludes it.

In [40]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe slice at column ['col_a'] is:")
print(dataframe_a["col_a"])

print("\nThe slice at implicit indexes ['col_a'][1:3] is:")
print(dataframe_a["col_a"][1:3])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The slice at column ['col_a'] is:
1    1
2    2
3    3
Name: col_a, dtype: int64

The slice at implicit indexes ['col_a'][1:3] is:
2    2
3    3
Name: col_a, dtype: int64


In [44]:
# String index
series_a = pd.Series([1, 2, 3], index=["1", "2", "3"])
series_b = pd.Series([4, 5, 6], index=["1", "2", "3"])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe slice at column ['col_a'] is:")
print(dataframe_a["col_a"])

print("\nThe slice at explicit indexes ['col_a']['2':'3'] is:")
print(dataframe_a["col_a"]["2":"3"])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The slice at column ['col_a'] is:
1    1
2    2
3    3
Name: col_a, dtype: int64

The slice at explicit indexes ['col_a']['2':'3'] is:
2    2
3    3
Name: col_a, dtype: int64
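
A slice can also be applied to the dataframe directly, in which case it selects rows rather than a column. The following is a small sketch, assuming the same string-indexed dataframe; it contrasts a slice on the explicit index (final index included) with a slice on the implicit index (final index excluded). The names `rows_by_label` and `rows_by_position` are illustrative only.

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=["1", "2", "3"])
series_b = pd.Series([4, 5, 6], index=["1", "2", "3"])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# Explicit index: rows labelled "2" and "3", the end label is included.
rows_by_label = dataframe_a["2":"3"]

# Implicit index: rows at positions 0 and 1, the end position is excluded.
rows_by_position = dataframe_a[0:2]

print(rows_by_label)
print(rows_by_position)
```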


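**Apply boolean indexing** to a dataframe by passing a condition in square brackets. The condition evaluates to a series of `True`/`False` values, and only the rows marked `True` are kept.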

In [48]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe slice that satisfies the rule [>1] is:")
print(dataframe_a[dataframe_a["col_a"] > 1])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The slice that satisfies the rule [>1] is:
   col_a  col_b
2      2      5
3      3      6
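
Several conditions can be combined into one mask. As a minimal sketch, assuming the same dataframe, each condition is wrapped in parentheses and joined with `&` (and) or `|` (or), since Python's `and`/`or` keywords do not work element-wise on a series:

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# Keep rows where col_a is greater than 1 AND col_b is less than 6.
mask = (dataframe_a["col_a"] > 1) & (dataframe_a["col_b"] < 6)
filtered = dataframe_a[mask]

print(mask)
print(filtered)
```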


## Dataframe indexers

The indexing and slicing patterns explained so far can be a source of confusion. As the previous examples show, when the explicit index is made of integers, an indexing operation such as `["col_a"][2]` uses the explicit index, while a slicing operation such as `["col_a"][1:3]` uses the implicit index. To avoid this ambiguity, pandas provides dedicated indexer attributes.

**Index or slice a dataframe via explicit index** using `.loc`. Note that, since this indexer uses the explicit index, the final index is included.

In [70]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe element at explicit index .loc[2, 'col_a'] is:", dataframe_a.loc[2, "col_a"])
print("\nThe elements at explicit indexes .loc[[2, 3], 'col_a'] are:")
print(dataframe_a.loc[[2, 3], "col_a"])
print("\nThe slice at explicit indexes .loc[2:3, 'col_a'] is:")
print(dataframe_a.loc[2:3, "col_a"])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The element at explicit index .loc[2, 'col_a'] is: 2

The elements at explicit indexes .loc[[2, 3], 'col_a'] are:
2    2
3    3
Name: col_a, dtype: int64

The slice at explicit indexes .loc[2:3, 'col_a'] is:
2    2
3    3
Name: col_a, dtype: int64


**Index or slice a dataframe via implicit index** using `.iloc`. Note that, since this indexer uses the implicit index, the final index is excluded.

In [77]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})
print(dataframe_a)

print("\nThe element at implicit index .iloc[1, 0] is:", dataframe_a.iloc[1, 0])
print("\nThe elements at implicit indexes .iloc[[1, 2], 0] are:")
print(dataframe_a.iloc[[1, 2], 0])
print("\nThe slice at implicit indexes .iloc[1:3, 0] is:")
print(dataframe_a.iloc[1:3, 0])

   col_a  col_b
1      1      4
2      2      5
3      3      6

The element at implicit index .iloc[1, 0] is: 2

The elements at implicit indexes .iloc[[1, 2], 0] are:
2    2
3    3
Name: col_a, dtype: int64

The slice at implicit indexes .iloc[1:3, 0] is:
2    2
3    3
Name: col_a, dtype: int64
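
The difference between the two indexers is easiest to see when the same slice is passed to both. The following is a small sketch, assuming the same integer-indexed dataframe, where `1:2` means labels 1 through 2 for `.loc` but only position 1 for `.iloc`:

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# .loc reads 1:2 as labels and includes the final label.
by_label = dataframe_a.loc[1:2, "col_a"]

# .iloc reads 1:2 as positions and excludes the final position.
by_position = dataframe_a.iloc[1:2, 0]

print(by_label)
print(by_position)
```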


## Views and copies

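Selecting part of a dataframe can return a **view, not a copy**, of the underlying data. In that case, as in the example below, modifying the selection also changes the original dataframe. This behavior depends on the pandas version and on the kind of selection: with Copy-on-Write enabled (the default from pandas 3.0), modifying a selection never changes its parent dataframe.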

In [78]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

print("Original dataframe:")
print(dataframe_a)

dataframe_b = dataframe_a.iloc[:, 0]
dataframe_b.iloc[0] = 0
dataframe_b.iloc[1] = 0
dataframe_b.iloc[2] = 0

print("\nModified dataframe:")
print(dataframe_a)

Original dataframe:
   col_a  col_b
1      1      4
2      2      5
3      3      6

Modified dataframe:
   col_a  col_b
1      0      4
2      0      5
3      0      6
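
Since view behavior varies between pandas versions, a change that is meant to reach the original dataframe is more predictable when it is made through the dataframe itself. A minimal sketch, assuming the same dataframe and using `.loc` to overwrite the whole column in one step:

```python
import pandas as pd

series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

# Assign through the original dataframe instead of through a selection.
dataframe_a.loc[:, "col_a"] = 0

print(dataframe_a)
```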


**Create a copy** of a subdataframe using `.copy()`. Changes made to the copy do not affect the original dataframe.

In [80]:
# Numerical index
series_a = pd.Series([1, 2, 3], index=[1, 2, 3])
series_b = pd.Series([4, 5, 6], index=[1, 2, 3])
dataframe_a = pd.DataFrame({"col_a": series_a, "col_b": series_b})

print("Original dataframe:")
print(dataframe_a)

dataframe_b = dataframe_a.iloc[:, 0].copy()
dataframe_b.iloc[0] = 0
dataframe_b.iloc[1] = 0
dataframe_b.iloc[2] = 0

print("\nModified copy:")
print(dataframe_b)

print("\nOriginal dataframe (unchanged):")
print(dataframe_a)

Original dataframe:
   col_a  col_b
1      1      4
2      2      5
3      3      6

Modified copy:
1    0
2    0
3    0
Name: col_a, dtype: int64

Original dataframe (unchanged):
   col_a  col_b
1      1      4
2      2      5
3      3      6
