The Pandas set_index() method sets one or more columns of a DataFrame as its index. A meaningful index makes label-based retrieval, alignment and index-based merging simpler. It is especially helpful when a column holds natural identifiers such as employee names, IDs or dates.

Let's see a basic example:

Here we use an Employee Dataset stored in employees.csv. Let's first load it to see how set_index() works.

In [1]:
import pandas as pd

data = pd.read_csv("employees.csv")
print("Employee Dataset:")
display(data.head(5))

Employee Dataset:


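|   | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team            |
|---|------------|--------|------------|-----------------|--------|---------|-------------------|-----------------|
| 0 | Douglas    | Male   | 8/6/1993   | 12:42 PM        | 97308  | 6.945   | True              | Marketing       |
| 1 | Thomas     | Male   | 3/31/1996  | 6:53 AM         | 61933  | 4.17    | True              | NaN             |
| 2 | Maria      | Female | 4/23/1993  | 11:17 AM        | 130590 | 11.858  | False             | Finance         |
| 3 | Jerry      | Male   | 3/4/2005   | 1:00 PM         | 138705 | 9.34    | True              | Finance         |
| 4 | Larry      | Male   | 1/24/1998  | 4:47 PM         | 101004 | 1.389   | True              | Client Services |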


In [2]:
data.set_index("First Name", inplace=True)
print("\nEmployee Dataset with 'First Name' as Index:")
display(data.head(5))


Employee Dataset with 'First Name' as Index:


| First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team            |
|------------|--------|------------|-----------------|--------|---------|-------------------|-----------------|
| Douglas    | Male   | 8/6/1993   | 12:42 PM        | 97308  | 6.945   | True              | Marketing       |
| Thomas     | Male   | 3/31/1996  | 6:53 AM         | 61933  | 4.17    | True              | NaN             |
| Maria      | Female | 4/23/1993  | 11:17 AM        | 130590 | 11.858  | False             | Finance         |
| Jerry      | Male   | 3/4/2005   | 1:00 PM         | 138705 | 9.34    | True              | Finance         |
| Larry      | Male   | 1/24/1998  | 4:47 PM         | 101004 | 1.389   | True              | Client Services |
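
Once the names are in the index, rows can be fetched by label with .loc. The following is a minimal sketch of that idea; it rebuilds a few of the rows shown above inline so that it does not depend on employees.csv, and the variable names `employees`, `by_name` and `maria` are illustrative only.

```python
import pandas as pd

# A small inline stand-in for the first rows of the employee data shown above.
employees = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria"],
    "Gender": ["Male", "Male", "Female"],
    "Salary": [97308, 61933, 130590],
})

# set_index() returns a new DataFrame; `employees` itself is left unchanged.
by_name = employees.set_index("First Name")

# Label-based lookup: select the row whose index label is "Maria".
maria = by_name.loc["Maria"]
print(maria["Salary"])  # 130590
```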


# Pandas `DataFrame.set_index()`

**Syntax:**  
`DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)`

---

## Parameters  

| Parameter          | Description                                                                 |
|--------------------|-----------------------------------------------------------------------------|
| `keys`             | A single column name or a list of column names to set as the index. Arrays such as a Series or Index of matching length are also accepted. |
| `drop`             | Boolean (default: `True`). If `True`, the specified column is removed from the DataFrame; if `False`, it is retained. |
| `append`           | Boolean (default: `False`). If `True`, the column is added to the existing index, creating a multi-level index. |
| `inplace`          | Boolean (default: `False`). If `True`, modifies the original DataFrame and returns `None` instead of a new DataFrame. |
| `verify_integrity` | Boolean (default: `False`). If `True`, checks the new index for duplicates and raises a `ValueError` if any are found. |
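
The effect of `verify_integrity` can be sketched with a small example. The data below is made up for illustration (it is not the employee dataset), and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data with a repeated name, used only for illustration.
df = pd.DataFrame({"Name": ["Ana", "Ben", "Ana"], "Score": [90, 85, 70]})

# Default behaviour: duplicates in the new index are allowed.
dup_ok = df.set_index("Name")
print(dup_ok.index.is_unique)  # False

# With verify_integrity=True, pandas raises ValueError for duplicate keys.
try:
    df.set_index("Name", verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```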

---

## Return  

Returns a **new DataFrame** with the specified index. If `inplace=True`, the original DataFrame is modified directly and the method returns `None`.
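
A short sketch of the two return behaviours, using a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Geek1", "Geek2"], "Age": [25, 30]})

# Without inplace, a new DataFrame is returned and df keeps its columns.
indexed = df.set_index("Name")
print(list(df.columns))       # ['Name', 'Age']
print(list(indexed.columns))  # ['Age']

# With inplace=True, df itself is changed and the call returns None.
result = df.set_index("Name", inplace=True)
print(result)                 # None
print(df.index.name)          # Name
```

A common slip is writing `df = df.set_index("Name", inplace=True)`, which leaves `df` bound to `None`.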

---



## 1. Setting Multiple Columns as Index (MultiIndex)
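In this example, we pass a list of columns, First Name and Gender, to set_index(). Because append=True, these columns are added to the existing integer index rather than replacing it, so the result has a three-level MultiIndex. Because drop=False, First Name and Gender also remain as regular columns. This is useful when we want to organize data by multiple columns.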

In [4]:
import pandas as pd
data = pd.read_csv("employees.csv")

data.set_index(["First Name", "Gender"], inplace=True, append=True, drop=False)
data.head()

|   | First Name | Gender | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team            |
|---|------------|--------|------------|--------|------------|-----------------|--------|---------|-------------------|-----------------|
| 0 | Douglas    | Male   | Douglas    | Male   | 8/6/1993   | 12:42 PM        | 97308  | 6.945   | True              | Marketing       |
| 1 | Thomas     | Male   | Thomas     | Male   | 3/31/1996  | 6:53 AM         | 61933  | 4.17    | True              | NaN             |
| 2 | Maria      | Female | Maria      | Female | 4/23/1993  | 11:17 AM        | 130590 | 11.858  | False             | Finance         |
| 3 | Jerry      | Male   | Jerry      | Male   | 3/4/2005   | 1:00 PM         | 138705 | 9.34    | True              | Finance         |
| 4 | Larry      | Male   | Larry      | Male   | 1/24/1998  | 4:47 PM         | 101004 | 1.389   | True              | Client Services |
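
To show what a multi-level index offers, here is a minimal sketch of selecting rows from one. It uses a small inline stand-in for the employee data, so the variable names and the reduced set of columns are assumptions made for illustration.

```python
import pandas as pd

# Inline stand-in for a few rows of the employee data (no CSV needed).
employees = pd.DataFrame({
    "First Name": ["Douglas", "Maria", "Jerry"],
    "Gender": ["Male", "Female", "Male"],
    "Team": ["Marketing", "Finance", "Finance"],
})

# Without append=True the old integer index is replaced,
# leaving a two-level index of (First Name, Gender).
indexed = employees.set_index(["First Name", "Gender"])
print(indexed.index.nlevels)  # 2

# With append=True the old integer index is kept as an extra level.
appended = employees.set_index(["First Name", "Gender"], append=True)
print(appended.index.nlevels)  # 3

# A tuple selects one row by both levels.
print(indexed.loc[("Maria", "Female"), "Team"])  # Finance

# xs() selects every row with a given value in one named level.
males = indexed.xs("Male", level="Gender")
print(list(males.index))  # ['Douglas', 'Jerry']
```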


## 2. Setting a Float Column as Index
In some cases, we may want to use a numeric or float column as the index, for example when a dataset has scores that should identify each row. Such a column works best as an index when its values are unique. Here, we set Agg_Marks (a float column) as the index for a DataFrame containing student data.

In [5]:
import pandas as pd

students = [['jack', 34, 'Sydney', 'Australia', 85.96],
            ['Riti', 30, 'Delhi', 'India', 95.20],
            ['Vansh', 31, 'Delhi', 'India', 85.25],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21],
            ['Maychan', 16, 'New York', 'US', 99.63],
            ['Mike', 17, 'Las Vegas', 'US', 47.28]]

df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks'])

df.set_index('Agg_Marks', inplace=True)
display(df)

| Agg_Marks | Name    | Age | City      | Country   |
|-----------|---------|-----|-----------|-----------|
| 85.96     | jack    | 34  | Sydney    | Australia |
| 95.2      | Riti    | 30  | Delhi     | India     |
| 85.25     | Vansh   | 31  | Delhi     | India     |
| 74.21     | Nanyu   | 32  | Tokyo     | Japan     |
| 99.63     | Maychan | 16  | New York  | US        |
| 47.28     | Mike    | 17  | Las Vegas | US        |
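
A float index can then be used for lookups and range selection. The sketch below uses a shortened copy of the student data; the uniqueness check and the slice bounds (80 to 100) are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Riti", "Vansh", "Mike"],
    "Agg_Marks": [95.20, 85.25, 47.28],
})
marks = df.set_index("Agg_Marks")

# An index should normally be unique if it is meant to identify rows.
print(marks.index.is_unique)  # True

# Exact label lookup works when the float matches the stored value exactly.
print(marks.loc[95.20, "Name"])  # Riti

# Sorting a numeric index enables range-style slicing by label.
ranked = marks.sort_index()
print(list(ranked.loc[80:100, "Name"]))  # ['Vansh', 'Riti']
```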


## 3. Setting Index of Specific Column (with drop=False)
By default, set_index() removes the column used as the index. However, if we want to keep the column after it’s set as the index, we can use the drop=False parameter.

In [6]:
import pandas as pd

data = pd.read_csv("employees.csv")

data.set_index("First Name", drop=False, inplace=True)

print(data.head())

           First Name  Gender Start Date Last Login Time  Salary  Bonus %  \
First Name                                                                  
Douglas       Douglas    Male   8/6/1993        12:42 PM   97308    6.945   
Thomas         Thomas    Male  3/31/1996         6:53 AM   61933    4.170   
Maria           Maria  Female  4/23/1993        11:17 AM  130590   11.858   
Jerry           Jerry    Male   3/4/2005         1:00 PM  138705    9.340   
Larry           Larry    Male  1/24/1998         4:47 PM  101004    1.389   

           Senior Management             Team  
First Name                                     
Douglas                 True        Marketing  
Thomas                  True              NaN  
Maria                  False          Finance  
Jerry                   True          Finance  
Larry                   True  Client Services  
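
The difference between the default and drop=False can be checked directly. This is a small sketch on made-up rows with hypothetical variable names:

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Douglas", "Thomas"], "Salary": [97308, 61933]})

dropped = df.set_index("First Name")           # default drop=True
kept = df.set_index("First Name", drop=False)  # column is retained

print("First Name" in dropped.columns)  # False
print("First Name" in kept.columns)     # True

# With drop=False the values are reachable both as labels and as a column.
print(kept.loc["Thomas", "Salary"])          # 61933
print(list(kept["First Name"].str.upper()))  # ['DOUGLAS', 'THOMAS']
```

One consequence worth knowing: calling reset_index() on `kept` without drop=True raises a ValueError, because a column named First Name already exists.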


## 4. Setting Index Using inplace=True
When we want to modify the original DataFrame directly rather than creating a new one, we can use inplace=True. In that case set_index() returns None, so its result should not be assigned back to a variable.

In [7]:
import pandas as pd

data = {'Name': ['Geek1', 'Geek2', 'Geek3'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)

df.set_index('Name', inplace=True)
display(df)

| Name  | Age | City          |
|-------|-----|---------------|
| Geek1 | 25  | New York      |
| Geek2 | 30  | San Francisco |
| Geek3 | 35  | Los Angeles   |


## Pandas DataFrame.reset_index()


The reset_index() method in Pandas resets the index of a DataFrame. It is useful after operations that change the index, such as filtering, grouping or setting a custom index. By default, reset_index() moves the current index into a regular column and replaces it with the default integer index (0, 1, 2, ...), which makes the DataFrame easier to work with.

Now let's see a basic example:

In [8]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Max'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

print("Original DataFrame:")
print(df)

df_reset = df.reset_index()

print("\nDataFrame after reset_index():")
print(df_reset)

Original DataFrame:
    Name  Age
a  Alice   25
b    Bob   30
c    Max   35

DataFrame after reset_index():
  index   Name  Age
0     a  Alice   25
1     b    Bob   30
2     c    Max   35


This code creates a DataFrame with 'Name' and 'Age' columns and a custom index ('a', 'b', 'c'). The reset_index() method moves these index labels into a new column named 'index' and replaces them with the default row numbers (0, 1, 2).
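
The column is called 'index' only because the index has no name. As a small sketch, naming the index first (here with the made-up name "key") changes the resulting column name:

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30]}, index=["a", "b"])

# An unnamed index becomes a column called "index".
print(list(df.reset_index().columns))  # ['index', 'Age']

# Naming the index first (here "key", a made-up name) controls the column name.
named = df.rename_axis("key").reset_index()
print(list(named.columns))  # ['key', 'Age']
```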

# Pandas `DataFrame.reset_index()`

**Syntax:**  
`DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')`

---

## Parameters  

| Parameter   | Description                                                                 |
|-------------|-----------------------------------------------------------------------------|
| `level`     | (Optional) Specifies the index level to reset (useful for multi-level indices). Can be an integer, string, or a list of levels. |
| `drop`      | Boolean (default: `False`). If `True`, removes the old index instead of adding it as a column. |
| `inplace`   | Boolean (default: `False`). If `True`, modifies the original DataFrame in place and returns `None`; otherwise returns a new DataFrame. |
| `col_level` | Integer or string (default: `0`). If the columns have multiple levels, selects the level into which the index labels are inserted. |
| `col_fill`  | String (default: `''`). If the DataFrame has multiple column levels, this determines how the other levels are named for the inserted column. |
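
`col_level` and `col_fill` only matter when the columns themselves are multi-level. The sketch below uses a hypothetical frame (the labels "score", "math", "art", "id" and "meta" are invented) to show where the index name ends up.

```python
import pandas as pd

# Hypothetical frame with two column levels and an index named "id".
cols = pd.MultiIndex.from_tuples([("score", "math"), ("score", "art")])
df = pd.DataFrame([[90, 80], [70, 60]],
                  index=pd.Index([1, 2], name="id"), columns=cols)

# Default: the index name goes in the top column level, the rest is filled with ''.
default_col = df.reset_index().columns[0]
print(default_col)  # ('id', '')

# col_level=1 places the name in the second level instead.
level1_col = df.reset_index(col_level=1).columns[0]
print(level1_col)   # ('', 'id')

# col_fill supplies the label used for the other level.
filled_col = df.reset_index(col_level=1, col_fill="meta").columns[0]
print(filled_col)   # ('meta', 'id')
```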

---

## Returns  

The `reset_index()` method returns a **new DataFrame** with the index reset, unless `inplace=True` is specified.  
In that case, the original DataFrame is modified directly and the method returns `None`.

---

## Example 1: Resetting Index of a Pandas DataFrame
In this example, we will set the "First Name" column as the index of the DataFrame and then reset the index using the reset_index() method. This process will move the custom index back to a regular column and restore the default integer-based index.

In [9]:
import pandas as pd

data = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Emily', 'Michael', 'Sara'],
    'Last Name': ['Doe', 'Smith', 'Jones', 'Brown', 'Davis'],
    'Age': [28, 34, 22, 45, 29],
    'Department': ['HR', 'Finance', 'IT', 'Marketing', 'Sales'],
    'Salary': [55000, 68000, 48000, 75000, 60000]
})

print("Original Employee Dataset:")
display(data.head())

data.set_index("First Name", inplace=True)
print("\nAfter Setting 'First Name' as Index:")
display(data.head())

data.reset_index(inplace=True)
print("\nAfter Resetting Index:")
display(data.head())

Original Employee Dataset:


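|   | First Name | Last Name | Age | Department | Salary |
|---|------------|-----------|-----|------------|--------|
| 0 | John       | Doe       | 28  | HR         | 55000  |
| 1 | Jane       | Smith     | 34  | Finance    | 68000  |
| 2 | Emily      | Jones     | 22  | IT         | 48000  |
| 3 | Michael    | Brown     | 45  | Marketing  | 75000  |
| 4 | Sara       | Davis     | 29  | Sales      | 60000  |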



After Setting 'First Name' as Index:


| First Name | Last Name | Age | Department | Salary |
|------------|-----------|-----|------------|--------|
| John       | Doe       | 28  | HR         | 55000  |
| Jane       | Smith     | 34  | Finance    | 68000  |
| Emily      | Jones     | 22  | IT         | 48000  |
| Michael    | Brown     | 45  | Marketing  | 75000  |
| Sara       | Davis     | 29  | Sales      | 60000  |



After Resetting Index:


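|   | First Name | Last Name | Age | Department | Salary |
|---|------------|-----------|-----|------------|--------|
| 0 | John       | Doe       | 28  | HR         | 55000  |
| 1 | Jane       | Smith     | 34  | Finance    | 68000  |
| 2 | Emily      | Jones     | 22  | IT         | 48000  |
| 3 | Michael    | Brown     | 45  | Marketing  | 75000  |
| 4 | Sara       | Davis     | 29  | Sales      | 60000  |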


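## Example 2: Resetting the Index After Filtering Data
When we filter a DataFrame, the surviving rows keep their original index labels, so the index is left with gaps (here 1 and 2 instead of 0 and 1). This can cause surprises in later label-based operations such as .loc lookups or alignment with other data. Using reset_index(drop=True) gives the result a clean, sequential index without adding the old labels as a column.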

In [10]:
import pandas as pd

data = {'ID': [101, 102, 103, 104, 105],
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Dept': ['HR', 'IT', 'IT', 'Finance', 'HR'],
        'Salary': [50000, 60000, 65000, 70000, 55000]}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

df_it = df[df['Dept'] == 'IT']
print("\nFiltered DataFrame:")
print(df_it)

df_it = df_it.reset_index(drop=True)
print("\nFiltered DataFrame after reset_index():")
print(df_it)

Original DataFrame:
    ID     Name     Dept  Salary
0  101    Alice       HR   50000
1  102      Bob       IT   60000
2  103  Charlie       IT   65000
3  104    David  Finance   70000
4  105     Emma       HR   55000

Filtered DataFrame:
    ID     Name Dept  Salary
1  102      Bob   IT   60000
2  103  Charlie   IT   65000

Filtered DataFrame after reset_index():
    ID     Name Dept  Salary
0  102      Bob   IT   60000
1  103  Charlie   IT   65000
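
One concrete way the retained labels can cause trouble is sketched below on a shortened version of the data. The `ranks` Series and the `Rank` column are hypothetical additions used only to show label alignment.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Dept": ["HR", "IT", "IT"]})
df_it = df[df["Dept"] == "IT"]  # keeps the original labels 1 and 2

# Label 0 no longer exists, so label-based access fails...
try:
    df_it.loc[0]
    missing_label = False
except KeyError:
    missing_label = True
print(missing_label)            # True

# ...while position-based access still works.
print(df_it.iloc[0]["Name"])    # Bob

# Assigning a Series aligns on labels, not positions, which can yield NaN.
ranks = pd.Series([1, 2])       # labels 0 and 1
misaligned = df_it.assign(Rank=ranks)
print(misaligned["Rank"].isna().tolist())  # [False, True]

# After reset_index(drop=True) the labels line up again.
aligned = df_it.reset_index(drop=True).assign(Rank=ranks)
print(aligned["Rank"].tolist())            # [1, 2]
```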


## Example 3: Resetting Index for Multi-Level DataFrames
If our DataFrame has a multi-level index, reset_index() can reset one or more of the index levels, turning them back into regular columns.

In [11]:
import pandas as pd

data = {'Region': ['North', 'North', 'South', 'South'],
        'City': ['New York', 'Los Angeles', 'Miami', 'Houston'],
        'Population': [8.4, 3.9, 0.4, 2.3]}
df = pd.DataFrame(data)

df.set_index(['Region', 'City'], inplace=True)

print("DataFrame with Multi-Level Index:")
print(df)

df_reset = df.reset_index(level='City')

print("\nDataFrame after resetting the 'City' level of the index:")
print(df_reset)

DataFrame with Multi-Level Index:
                    Population
Region City                   
North  New York            8.4
       Los Angeles         3.9
South  Miami               0.4
       Houston             2.3

DataFrame after resetting the 'City' level of the index:
               City  Population
Region                         
North      New York         8.4
North   Los Angeles         3.9
South         Miami         0.4
South       Houston         2.3
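
The `level` argument also accepts positions and lists, and it can be combined with `drop`. A brief sketch on a shortened version of the same data:

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South"],
    "City": ["New York", "Los Angeles", "Miami"],
    "Population": [8.4, 3.9, 0.4],
}).set_index(["Region", "City"])

# Levels can be given by position as well as by name.
by_position = df.reset_index(level=0)
print(by_position.index.name)         # City
print(list(by_position.columns))      # ['Region', 'Population']

# A list resets several levels; omitting level resets all of them.
flat = df.reset_index(level=["Region", "City"])
print(list(flat.columns))             # ['Region', 'City', 'Population']
print(flat.equals(df.reset_index()))  # True

# drop=True discards a level instead of turning it into a column.
no_region = df.reset_index(level="Region", drop=True)
print(list(no_region.columns))        # ['Population']
```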


# Key Use Cases of `reset_index()`

- **Simplifying Data Manipulation**: After operations like filtering or sorting, we may want to reset the index so that it is clean and sequential again.  
- **Handling Multi-Level Indexes**: When working with multi-level indexes, `reset_index()` can turn specific index levels back into columns while leaving the other levels in place.  
- **Restoring Default Index**: If we have set a custom index using the `set_index()` method, `reset_index()` restores the default integer-based index.  
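
The first use case also applies after grouping, which was mentioned earlier as an operation that changes the index. A minimal sketch on made-up salary data:

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": ["HR", "IT", "IT", "HR"],
    "Salary": [50000, 60000, 65000, 55000],
})

# groupby() puts the group keys in the index; reset_index() makes them a column.
mean_salary = df.groupby("Dept")["Salary"].mean().reset_index()
print(list(mean_salary.columns))  # ['Dept', 'Salary']

# After sorting, the old labels are out of order; drop=True renumbers from 0.
top = df.sort_values("Salary", ascending=False).reset_index(drop=True)
print(list(top.index))            # [0, 1, 2, 3]
print(top.loc[0, "Salary"])       # 65000
```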

By understanding and using `reset_index()`, we can efficiently manage and reorganize our DataFrame's index which makes our data easier to manipulate and analyze in various situations.
