# Python Pandas Tutorial - Indexes - How to Set, Reset, and Use Indexes

Indexing in Pandas:

Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame. It can mean selecting all of the rows and some of the columns, some of the rows and all of the columns, or some of the rows and some of the columns. Indexing is also known as subset selection.
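The sketch below shows the three kinds of subset selection just described. It uses the same small `people` data that is built later in this tutorial. The variable names (`all_rows_some_cols` and so on) are illustrative only.

```python
import pandas as pd

# Same small example data used later in this tutorial
people = {
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Done"],
    "email": ["MichaelAlabi@gmail.com", "JaneDoe1@hotmail.com", "JohnDone@yahoomail.com"],
}
df = pd.DataFrame(people)

# All of the rows, some of the columns
all_rows_some_cols = df[["first", "email"]]

# Some of the rows (positions 0 and 1), all of the columns
some_rows_all_cols = df.iloc[0:2]

# Some of the rows (labels 0 and 2) and some of the columns ('last')
some_of_each = df.loc[[0, 2], ["last"]]

print(all_rows_some_cols)
print(some_rows_all_cols)
print(some_of_each)
```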

See the pandas documentation at these links:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html


The dataset used in this pandas tutorial (the 2019 survey data, in CSV format) can be found at this link: https://insights.stackoverflow.com/survey


In [17]:
# This dictionary holds the data for a DataFrame of a few people:
# the keys become the column names, and each list holds that column's values, one per row
people = {
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Done"],
    "email": ["MichaelAlabi@gmail.com", "JaneDoe1@hotmail.com", "JohnDone@yahoomail.com"],
}

In [18]:
import pandas as pd

In [19]:
df = pd.DataFrame(people)

In [20]:
df

     first   last                   email
0  Michael  Alabi  MichaelAlabi@gmail.com
1     Jane    Doe    JaneDoe1@hotmail.com
2     John   Done  JohnDone@yahoomail.com


In [21]:
# The unnamed column on the far left (0, 1, 2) is the index; this is the default index.
# It is a range of integers that serves as an identifier for each row.
# It often makes sense to use a more meaningful identifier, such as one of the columns, as the label for each row.
# Index labels are usually unique, but pandas does not require them to be, and sometimes they are not.

# For this tutorial, 'email' makes a good index because each person's email address is usually unique.

df['email']

0    MichaelAlabi@gmail.com
1      JaneDoe1@hotmail.com
2    JohnDone@yahoomail.com
Name: email, dtype: object
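As noted in the comments above, pandas does not require index labels to be unique. The sketch below shows what happens when they are not. It uses hypothetical data in which two people share the last name "Doe" (not the tutorial's data). It also shows the `verify_integrity=True` option of `set_index()`, which rejects duplicate labels.

```python
import pandas as pd

# Hypothetical data: two people share the last name 'Doe'
df = pd.DataFrame({
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Doe"],
})

# pandas accepts a non-unique column as the index
by_last = df.set_index("last")
print(by_last.index.is_unique)  # False

# Selecting a duplicated label returns every matching row
does = by_last.loc["Doe"]
print(does)

# verify_integrity=True makes set_index() raise ValueError on duplicate labels
rejected = False
try:
    df.set_index("last", verify_integrity=True)
except ValueError as err:
    rejected = True
    print("Rejected:", err)
```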

In [22]:
# To set 'email' as the index of the DataFrame, use df.set_index()

df.set_index('email')  # uses the 'email' column in place of the default index

                          first   last
email
MichaelAlabi@gmail.com  Michael  Alabi
JaneDoe1@hotmail.com       Jane    Doe
JohnDone@yahoomail.com     John   Done


In [23]:
# However, if we display df again, it still has the default index: set_index() returns a new DataFrame and does not change df itself
df

     first   last                   email
0  Michael  Alabi  MichaelAlabi@gmail.com
1     Jane    Doe    JaneDoe1@hotmail.com
2     John   Done  JohnDone@yahoomail.com


In [24]:
# pandas does not modify the original DataFrame unless we explicitly tell it to.
# Passing inplace=True changes df itself, so the 'email' index carries over to later cells.

df.set_index('email', inplace=True)

In [25]:
df

                          first   last
email
MichaelAlabi@gmail.com  Michael  Alabi
JaneDoe1@hotmail.com       Jane    Doe
JohnDone@yahoomail.com     John   Done
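A common alternative to `inplace=True` is to assign the DataFrame returned by `set_index()` back to the same variable. The minimal sketch below uses the same `people` data. The names `indexed` and `df` are just illustrative variables.

```python
import pandas as pd

people = {
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Done"],
    "email": ["MichaelAlabi@gmail.com", "JaneDoe1@hotmail.com", "JohnDone@yahoomail.com"],
}
df = pd.DataFrame(people)

# set_index() returns a new DataFrame; df itself keeps its default index
indexed = df.set_index("email")
print(df.index)       # still the default RangeIndex
print(indexed.index)  # the 'email' Index

# Reassigning the result makes the change stick, much like inplace=True
df = df.set_index("email")
print(df.index.name)  # email
```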


In [26]:
# 'email' is now the index: it replaces the default integer index as the label that identifies each row

# Check the current index with df.index
df.index

Index(['MichaelAlabi@gmail.com', 'JaneDoe1@hotmail.com',
       'JohnDone@yahoomail.com'],
      dtype='object', name='email')

In [28]:
# Why use email as the index?
# Email as the index gives us a unique identifier for each row.

# In the previous tutorial, we accessed rows by their default index label, e.g. df.loc[0].
# Now we can use the email address as the label to access a row.
# This returns the details of the person with the email 'MichaelAlabi@gmail.com'.

df.loc['MichaelAlabi@gmail.com']

first    Michael
last       Alabi
Name: MichaelAlabi@gmail.com, dtype: object

In [29]:
# We can also pass a column label to get the value in a specific row and column
df.loc['MichaelAlabi@gmail.com', 'last']

'Alabi'
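`.loc` also accepts lists of row labels and lists of column labels. The sketch below rebuilds the email-indexed DataFrame from the same `people` data to show both forms.

```python
import pandas as pd

people = {
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Done"],
    "email": ["MichaelAlabi@gmail.com", "JaneDoe1@hotmail.com", "JohnDone@yahoomail.com"],
}
df = pd.DataFrame(people).set_index("email")

# A list of row labels plus a list of column labels returns a DataFrame
subset = df.loc[["MichaelAlabi@gmail.com", "JohnDone@yahoomail.com"], ["first"]]
print(subset)

# A single row label plus a list of column labels returns a Series
jane = df.loc["JaneDoe1@hotmail.com", ["first", "last"]]
print(jane)
```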

In [33]:
# Trying to access a row with the old default label, e.g. df.loc[0], now raises an error
# because 0 is no longer an index label.

# You can try it out to see the error.
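To see the failure without stopping the notebook, the lookup can be wrapped in `try`/`except`. In recent pandas versions this raises `KeyError`; some older versions raised `TypeError` instead, so the sketch catches both. It also shows that checking membership with `in df.index` avoids the exception entirely.

```python
import pandas as pd

people = {
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Done"],
    "email": ["MichaelAlabi@gmail.com", "JaneDoe1@hotmail.com", "JohnDone@yahoomail.com"],
}
df = pd.DataFrame(people).set_index("email")

# 0 is no longer a row label, so a label-based lookup fails
lookup_failed = False
try:
    df.loc[0]
except (KeyError, TypeError) as err:
    # Recent pandas raises KeyError here; some older versions raised TypeError
    lookup_failed = True
    print(type(err).__name__, err)

# Checking membership in the index first avoids the exception
print(0 in df.index)                        # False
print("JaneDoe1@hotmail.com" in df.index)   # True
```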


In [32]:
# .iloc still works with integer positions, because it selects rows by position rather than by label

df.iloc[0]

first    Michael
last       Alabi
Name: MichaelAlabi@gmail.com, dtype: object
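The sketch below contrasts position-based and label-based access on the same email-indexed DataFrame. It also uses `Index.get_loc()`, which converts a label into its integer position.

```python
import pandas as pd

people = {
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Done"],
    "email": ["MichaelAlabi@gmail.com", "JaneDoe1@hotmail.com", "JohnDone@yahoomail.com"],
}
df = pd.DataFrame(people).set_index("email")

# iloc selects by integer position, whatever the index labels are
first_row = df.iloc[0]
last_row = df.iloc[-1]

# Row position 0, column position 1 (the 'last' column)
print(df.iloc[0, 1])  # Alabi

# The same row fetched by position and by label
print(first_row.equals(df.loc["MichaelAlabi@gmail.com"]))  # True

# Index.get_loc() turns a label into its integer position
print(df.index.get_loc("JaneDoe1@hotmail.com"))  # 1
```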

Resetting the Index to the Default
# If you have set an index and want to go back to the default integer index, use the reset_index() method.
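Here is a minimal sketch of `reset_index()` on the email-indexed DataFrame, again built from the `people` data. The former index comes back as the first column, so the column order differs from the original DataFrame.

```python
import pandas as pd

people = {
    "first": ["Michael", "Jane", "John"],
    "last": ["Alabi", "Doe", "Done"],
    "email": ["MichaelAlabi@gmail.com", "JaneDoe1@hotmail.com", "JohnDone@yahoomail.com"],
}
df = pd.DataFrame(people).set_index("email")

# reset_index() moves 'email' back into a regular column
# and restores the default integer index (0, 1, 2)
restored = df.reset_index()
print(restored)

# Like set_index(), it returns a new DataFrame unless inplace=True is passed
print(df.index.name)  # email
```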

