# Indexes: How to Set, Reset and Use Indexes

In [1]:
import pandas as pd

In [2]:
people = {
    "first": ["Phil", "Jane", "Rob"],
    "last": ["Lembo", "Doe", "Roe"],
    "email": ["phil.lembo@gmail.com", "janedoe@email.com", "robroe@email.com"]
}

In [3]:
df = pd.DataFrame(people)

In [4]:
df

  first   last                 email
0  Phil  Lembo  phil.lembo@gmail.com
1  Jane    Doe     janedoe@email.com
2   Rob    Roe      robroe@email.com


In [5]:
df['email']

0    phil.lembo@gmail.com
1       janedoe@email.com
2        robroe@email.com
Name: email, dtype: object

In [6]:
df.set_index('email')

                     first   last
email
phil.lembo@gmail.com  Phil  Lembo
janedoe@email.com     Jane    Doe
robroe@email.com       Rob    Roe


By default, set_index returns a new DataFrame and leaves the original df unchanged.

In [7]:
df

  first   last                 email
0  Phil  Lembo  phil.lembo@gmail.com
1  Jane    Doe     janedoe@email.com
2   Rob    Roe      robroe@email.com
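
To confirm that set_index returned a separate object, you can compare the two frames directly. The following is a minimal sketch that rebuilds the same people data. The variable name indexed is only for illustration:

```python
import pandas as pd

people = {
    "first": ["Phil", "Jane", "Rob"],
    "last": ["Lembo", "Doe", "Roe"],
    "email": ["phil.lembo@gmail.com", "janedoe@email.com", "robroe@email.com"],
}
df = pd.DataFrame(people)

# set_index builds and returns a new DataFrame; df is not modified
indexed = df.set_index("email")

print(indexed is df)          # False
print(list(df.columns))       # ['first', 'last', 'email']
print(list(indexed.columns))  # ['first', 'last']
print(df.index.tolist())      # [0, 1, 2]
```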


To modify df itself instead of getting a copy, pass "inplace=True".

In [8]:
df.set_index('email', inplace=True)

In [9]:
df

                     first   last
email
phil.lembo@gmail.com  Phil  Lembo
janedoe@email.com     Jane    Doe
robroe@email.com       Rob    Roe
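
Reassigning the returned copy is another way to make the change stick. Many pandas users prefer this style because the method call stays an expression that can be chained. The minimal sketch below compares the two styles on fresh copies of the people data. The names df_a and df_b are illustrative only:

```python
import pandas as pd

people = {
    "first": ["Phil", "Jane", "Rob"],
    "last": ["Lembo", "Doe", "Roe"],
    "email": ["phil.lembo@gmail.com", "janedoe@email.com", "robroe@email.com"],
}

# Style 1: modify the frame in place; the method itself returns None
df_a = pd.DataFrame(people)
returned = df_a.set_index("email", inplace=True)

# Style 2: keep the returned copy by reassigning it
df_b = pd.DataFrame(people)
df_b = df_b.set_index("email")

print(returned)           # None
print(df_a.equals(df_b))  # True
```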


In [10]:
df.index

Index(['phil.lembo@gmail.com', 'janedoe@email.com', 'robroe@email.com'], dtype='object', name='email')

In [11]:
df.loc['phil.lembo@gmail.com']

first     Phil
last     Lembo
Name: phil.lembo@gmail.com, dtype: object

In [12]:
df.loc['phil.lembo@gmail.com', 'last']

'Lembo'
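
With email as the index, loc takes a row label and, optionally, a column label. The return type depends on what you pass:

- A single row label gives a Series.
- A row plus a column gives a single value.
- A list of row labels selects several rows at once.

Here is a short sketch on the same data:

```python
import pandas as pd

people = {
    "first": ["Phil", "Jane", "Rob"],
    "last": ["Lembo", "Doe", "Roe"],
    "email": ["phil.lembo@gmail.com", "janedoe@email.com", "robroe@email.com"],
}
df = pd.DataFrame(people).set_index("email")

row = df.loc["phil.lembo@gmail.com"]           # one row label -> Series
cell = df.loc["phil.lembo@gmail.com", "last"]  # row + column labels -> scalar
# A list of row labels plus one column label -> Series of that column
firsts = df.loc[["janedoe@email.com", "robroe@email.com"], "first"]

print(type(row).__name__)  # Series
print(cell)                # Lembo
print(firsts.tolist())     # ['Jane', 'Rob']
```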

Note that the integers 0, 1 and 2 are no longer the index labels, so a label-based lookup with an integer fails:

In [13]:
df.loc[0]

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [0] of <class 'int'>

The exact exception depends on your pandas version; recent releases raise KeyError: 0. To select rows by integer position, use iloc instead:

In [None]:
df.iloc[0]
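
The difference is that loc looks up labels while iloc looks up positions. As a result, iloc[0] always means "first row", whatever the index holds. The minimal sketch below catches both KeyError and TypeError, because the exception type differs between pandas versions:

```python
import pandas as pd

people = {
    "first": ["Phil", "Jane", "Rob"],
    "last": ["Lembo", "Doe", "Roe"],
    "email": ["phil.lembo@gmail.com", "janedoe@email.com", "robroe@email.com"],
}
df = pd.DataFrame(people).set_index("email")

# iloc is positional: 0 is always the first row
first_row = df.iloc[0]
print(first_row.name)  # phil.lembo@gmail.com

# loc is label-based: 0 is not one of the email labels
try:
    df.loc[0]
    error_type = None
except (KeyError, TypeError) as exc:
    error_type = type(exc).__name__
print(error_type)
```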

To move the index back into a regular column and restore the default integer index, use the reset_index method.

In [None]:
df.reset_index(inplace=True)

In [None]:
df
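
reset_index moves the current index back into the frame as a regular column and restores a 0-based integer index. The column is inserted at the front, so email now comes first. If you don't need the old labels, drop=True discards them instead. Here is a sketch of both:

```python
import pandas as pd

people = {
    "first": ["Phil", "Jane", "Rob"],
    "last": ["Lembo", "Doe", "Roe"],
    "email": ["phil.lembo@gmail.com", "janedoe@email.com", "robroe@email.com"],
}
indexed = pd.DataFrame(people).set_index("email")

restored = indexed.reset_index()          # 'email' becomes a column again
dropped = indexed.reset_index(drop=True)  # old index is thrown away

print(list(restored.columns))   # ['email', 'first', 'last']
print(restored.index.tolist())  # [0, 1, 2]
print(list(dropped.columns))    # ['first', 'last']
```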

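Now let's apply the same ideas to the survey data. We load both the results file and the schema file that describes each column.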

In [None]:
res_df = pd.read_csv('data/survey_results_public.csv')
schema_df = pd.read_csv('data/survey_results_schema.csv')

In [None]:
# Display frames with up to 85 columns and 85 rows in full instead of truncating them
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)

In [None]:
res_df

You can also set the index while loading the data by passing index_col to read_csv.

In [None]:
res_df = pd.read_csv('data/survey_results_public.csv', index_col='Respondent')

In [None]:
res_df
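
Passing index_col to read_csv gives the same result as reading the file and then calling set_index on that column, in one step. The survey file isn't reproduced here, so this sketch uses a tiny hypothetical CSV held in memory via io.StringIO. Its values are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for data/survey_results_public.csv
csv_text = """Respondent,Hobbyist
1,Yes
2,No
3,Yes
"""

# Set the index while loading
res_df = pd.read_csv(io.StringIO(csv_text), index_col="Respondent")

# Equivalent two-step version: load, then set the index
two_step = pd.read_csv(io.StringIO(csv_text)).set_index("Respondent")

print(res_df.index.name)        # Respondent
print(list(res_df.columns))     # ['Hobbyist']
print(res_df.equals(two_step))  # True
```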

Respondent IDs are now the row labels, so we can retrieve respondent number 1 with loc.

In [None]:
res_df.loc[1]
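
Because the Respondent IDs start at 1, loc[1] and iloc[1] point at different rows. loc matches the label 1, which is the first respondent. iloc counts positions from 0, so it returns the second row. Here is a small sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical respondents labelled 1, 2, 3, like the survey's Respondent column
res_df = pd.DataFrame(
    {"Hobbyist": ["Yes", "No", "Yes"]},
    index=pd.Index([1, 2, 3], name="Respondent"),
)

by_label = res_df.loc[1]      # row whose label is 1
by_position = res_df.iloc[1]  # second row (positions start at 0)

print(by_label.name)     # 1
print(by_position.name)  # 2
```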

In [None]:
schema_df

What if we want to look up a schema definition without scrolling through the whole frame? Set "Column" as the index!

In [None]:
schema_df = pd.read_csv('data/survey_results_schema.csv', index_col='Column')

In [None]:
schema_df

In [None]:
schema_df.loc['Hobbyist']

In [None]:
schema_df.loc['MgrIdiot']

By default, pandas truncates long text in its display output. You can raise that limit with the display.max_colwidth option. Alternatively, retrieve the full text by specifying both the index _and_ the column name (in this case "QuestionText").

In [None]:
schema_df.loc['MgrIdiot', 'QuestionText']
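
Selecting a single cell returns a plain Python string, which is never truncated. The "..." only appears in the DataFrame's display, where the display.max_colwidth option sets the cutoff. pd.option_context changes that option temporarily. The sketch below uses a hypothetical one-row schema, and the question text is a placeholder rather than the survey's wording:

```python
import pandas as pd

# Hypothetical placeholder text, long enough to exceed the default column width
long_text = (
    "Placeholder question text that is deliberately long so that the "
    "default display settings cut it off before the end of the sentence."
)
schema_df = pd.DataFrame(
    {"QuestionText": [long_text]},
    index=pd.Index(["ExampleQ"], name="Column"),
)

full_text = schema_df.loc["ExampleQ", "QuestionText"]  # plain str, complete
truncated_view = repr(schema_df)  # display honours display.max_colwidth

# Temporarily remove the width limit (None means no limit)
with pd.option_context("display.max_colwidth", None):
    full_view = repr(schema_df)

print(full_text == long_text)  # True
```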

We can also sort the index alphabetically to make the schema easier to browse!

In [None]:
schema_df.sort_index()

To sort in reverse (descending) order, pass "ascending=False".

In [None]:
schema_df.sort_index(ascending=False)
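
sort_index orders rows by their index labels. For string labels this means lexicographic order, ascending unless you pass ascending=False. The sketch below uses a few illustrative schema labels; only Hobbyist and MgrIdiot come from the survey above:

```python
import pandas as pd

# Hypothetical schema index in arbitrary order
schema_df = pd.DataFrame(
    {"QuestionText": ["q1", "q2", "q3"]},
    index=pd.Index(["MgrIdiot", "Age", "Hobbyist"], name="Column"),
)

ascending = schema_df.sort_index().index.tolist()
descending = schema_df.sort_index(ascending=False).index.tolist()

print(ascending)   # ['Age', 'Hobbyist', 'MgrIdiot']
print(descending)  # ['MgrIdiot', 'Hobbyist', 'Age']
```

String comparison is case-sensitive, so a label starting with a lowercase letter would sort after all of the capitalized ones.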

Like set_index, sort_index returns a new DataFrame. To make the change persistent, pass "inplace=True".

In [None]:
schema_df.sort_index(inplace=True)
schema_df
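
Without inplace=True (or reassignment), sort_index leaves the stored order untouched. You can check this with the index's is_monotonic_increasing attribute. Here is a minimal sketch with the same kind of illustrative labels:

```python
import pandas as pd

# Hypothetical schema index in arbitrary order
schema_df = pd.DataFrame(
    {"QuestionText": ["q1", "q2", "q3"]},
    index=pd.Index(["MgrIdiot", "Age", "Hobbyist"], name="Column"),
)

schema_df.sort_index()  # sorted copy is discarded; schema_df is unchanged
before = schema_df.index.is_monotonic_increasing
print(before)  # False

schema_df.sort_index(inplace=True)  # sorts schema_df itself
after = schema_df.index.is_monotonic_increasing
print(after)  # True
```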