A MultiIndex in Pandas is a hierarchical indexing structure that allows us to represent and work with higher-dimensional data efficiently.

While a typical index is built from a single column, a MultiIndex is built from multiple levels. The levels are linked to one another through a parent/child relationship: each value in an outer level groups the inner-level values beneath it.
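
To make the idea of levels concrete, here is a minimal sketch that builds a small two-level index directly from tuples and inspects it. The three (continent, country) pairs are only illustrative; `MultiIndex.from_tuples`, `nlevels`, `names`, and `get_level_values` are standard pandas calls.

```python
import pandas as pd

# Illustrative two-level index: "Continent" is the outer (parent) level,
# "Country" is the inner (child) level.
index = pd.MultiIndex.from_tuples(
    [("Asia", "China"), ("Asia", "Japan"), ("Europe", "Germany")],
    names=["Continent", "Country"],
)

n_levels = index.nlevels                                  # how many levels the index has
level_names = list(index.names)                           # the name of each level
continents = index.get_level_values("Continent").tolist() # outer-level value for every row

print(n_levels)
print(level_names)
print(continents)
```

Each row is identified by the full tuple, while the outer level alone identifies a group of rows.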

In [1]:
import pandas as pd

In [8]:
data = {
    "Continent": ["North America", "Europe", "Asia", "North America", "Asia", "Europe", "North America", "Asia", "Europe", "Asia"],
    "Country": ["United States", "Germany", "China", "Canada", "Japan", "France", "Mexico", "India", "United Kingdom", "Nepal"],
    "Population": [331002651, 83783942, 1439323776, 37742154, 126476461, 65273511, 128932753, 1380004385, 67886011, 29136808]
}

df = pd.DataFrame(data)
print("Before MultiIndexing: \n",df)

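# Sort by continent so rows of the same continent sit together, then
# promote both columns to a two-level index (Continent is the outer level).
df.sort_values('Continent', inplace=True)
df.set_index(['Continent', 'Country'], inplace=True)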

print("After MultiIndexing: \n",df)


Before MultiIndexing: 
        Continent         Country  Population
0  North America   United States   331002651
1         Europe         Germany    83783942
2           Asia           China  1439323776
3  North America          Canada    37742154
4           Asia           Japan   126476461
5         Europe          France    65273511
6  North America          Mexico   128932753
7           Asia           India  1380004385
8         Europe  United Kingdom    67886011
9           Asia           Nepal    29136808
After MultiIndexing: 
                               Population
Continent     Country                   
Asia          China           1439323776
              Japan            126476461
              India           1380004385
              Nepal             29136808
Europe        Germany           83783942
              France            65273511
              United Kingdom    67886011
North America United States    331002651
              Canada            37742154
              Mexico           128932753

Access Rows With MultiIndex


In [10]:
# access all entries under Asia
print(df.loc['Asia'])

# access a single row (Canada) with a (Continent, Country) tuple
print(df.loc[('North America', 'Canada')])

         Population
Country            
China    1439323776
Japan     126476461
India    1380004385
Nepal      29136808
Population    37742154
Name: (North America, Canada), dtype: int64
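
The lookups above start from the outer level. As a complementary sketch, the snippet below selects by the inner level alone with `DataFrame.xs`, and reads a single value by passing a column label alongside the index tuple. It rebuilds a smaller four-row version of the table (an assumption made so the snippet runs on its own) and calls `sort_index()`, which is optional here but keeps the index fully sorted.

```python
import pandas as pd

# Smaller stand-in for the table above, so the snippet is self-contained.
df_small = pd.DataFrame({
    "Continent": ["Asia", "Asia", "North America", "North America"],
    "Country": ["China", "Japan", "United States", "Canada"],
    "Population": [1439323776, 126476461, 331002651, 37742154],
}).set_index(["Continent", "Country"]).sort_index()

# Select by the inner level only: every row whose Country is "Japan".
# The matched level is dropped, so the result is indexed by Continent.
japan = df_small.xs("Japan", level="Country")
print(japan)

# Read one scalar: the index tuple picks the row, the label picks the column.
canada_pop = df_small.loc[("North America", "Canada"), "Population"]
print(canada_pop)
```

`xs` is convenient when the outer-level value is not known in advance, since `df.loc['Japan']` would look for `'Japan'` in the outer level and fail.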
