# Indexing

In Pandas, an index labels each row of a DataFrame or each element of a Series. Index labels are usually unique, although Pandas does not require it, and they provide a way to access, modify, and align data efficiently.

With indexing, you can select all of the rows and some of the columns, some of the rows and all of the columns, or a subset of both rows and columns.

Understanding how indexes work in Pandas is fundamental to working effectively with data.

In [1]:
import pandas as pd

In [7]:
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

In [8]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


By default, when you create a DataFrame without specifying an index, Pandas assigns an implicit integer-based index starting from 0 and increasing sequentially. This index is often referred to as the "row number" or "position" index.
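As a minimal sketch of the default index, the snippet below builds a small hypothetical DataFrame (`people` is illustrative and not part of the Titanic data) and inspects the index that Pandas creates for it:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
people = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]})

# With no index specified, the default index is a RangeIndex: 0, 1, 2, ...
print(type(people.index).__name__)  # RangeIndex
print(list(people.index))           # [0, 1, 2]
```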

### Explicit Index

In [9]:
df_name = df.set_index('Name') 

In [11]:
df_name.head()

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
"Heikkinen, Miss. Laina",3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
"Futrelle, Mrs. Jacques Heath (Lily May Peel)",4,1,1,female,35.0,1,0,113803,53.1,C123,S
"Allen, Mr. William Henry",5,0,3,male,35.0,0,0,373450,8.05,,S


Here the 'Name' column is set as the index using the set_index() method, which returns a new DataFrame. You can also set multiple columns as the index by passing a list of column names to set_index().

Additionally, you can use the inplace=True argument to modify the DataFrame in place without returning a new DataFrame.
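The two options above can be sketched on a small hypothetical DataFrame (`tickets` and its values are made up for illustration). Passing a list of columns produces a multi-level index, while `inplace=True` changes the DataFrame itself and returns `None`:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
tickets = pd.DataFrame({
    "pclass": [1, 1, 3],
    "sex": ["female", "male", "male"],
    "fare": [71.28, 53.10, 7.25],
})

# A list of column names builds a MultiIndex; `tickets` itself is unchanged
by_class_sex = tickets.set_index(["pclass", "sex"])
print(list(by_class_sex.index.names))  # ['pclass', 'sex']

# inplace=True modifies `tickets` directly and returns None
result = tickets.set_index("pclass", inplace=True)
print(result)              # None
print(tickets.index.name)  # pclass
```

Because the in-place call returns `None`, assigning its result back to the same variable would discard the DataFrame.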

#### Specify index while creating DataFrame

In [16]:
data = {
    'col1': [1, 2, 3, 4],
    'col3': [5, 6, 7, 8],
    'col2': ['A', 'B', 'C', 'D'],
    'col4': ['X', 'Y', 'Z', 'D']
}
# Row labels are supplied explicitly through the index argument
df1 = pd.DataFrame(data, index=['R1', 'R2', 'R3', 'R4'])

In [17]:
df1

Unnamed: 0,col1,col3,col2,col4
R1,1,5,A,X
R2,2,6,B,Y
R3,3,7,C,Z
R4,4,8,D,D


In [21]:
df1.loc['R2']

col1    2
col3    6
col2    B
col4    Y
Name: R2, dtype: object

In [22]:
df1.iloc[1]

col1    2
col3    6
col2    B
col4    Y
Name: R2, dtype: object

Explicit indexes allow for more meaningful and context-specific row labels, which can be useful for data analysis and manipulation.

**Implicit Index Vs Explicit Index**
- Implicit Index: Automatically assigned by Pandas as integer-based row numbers.
- Explicit Index: Manually specified by the user, providing meaningful labels for rows.

Regardless of whether the index is implicit or explicit, it labels the rows of a DataFrame and supports data operations such as selection, slicing, and alignment.
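Alignment is the least visible of these operations, so here is a minimal sketch of it. The two Series (`stock` and `sold`) are hypothetical and share the same labels in a different order:

```python
import pandas as pd

# Two hypothetical Series with the same labels in different order
stock = pd.Series([10, 20, 30], index=["R1", "R2", "R3"])
sold = pd.Series([3, 2, 1], index=["R3", "R2", "R1"])

# Arithmetic matches values by index label, not by position
remaining = stock - sold
print(remaining["R1"])  # 9  (10 - 1)
print(remaining["R3"])  # 27 (30 - 3)
```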

#### Reset Index

In [26]:
df.set_index('Name', inplace=True) 

In [28]:
df.head()

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
"Heikkinen, Miss. Laina",3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
"Futrelle, Mrs. Jacques Heath (Lily May Peel)",4,1,1,female,35.0,1,0,113803,53.1,C123,S
"Allen, Mr. William Henry",5,0,3,male,35.0,0,0,373450,8.05,,S


In [31]:
df.reset_index(inplace=True)

In [33]:
df.head()

Unnamed: 0,Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
2,"Heikkinen, Miss. Laina",3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",4,1,1,female,35.0,1,0,113803,53.1,C123,S
4,"Allen, Mr. William Henry",5,0,3,male,35.0,0,0,373450,8.05,,S


The reset_index() method resets the index of a DataFrame. By default, it restores the default integer-based index starting from 0 and moves the current index into a new column.
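If the old index is not worth keeping, `reset_index()` also accepts `drop=True`. A minimal sketch on a hypothetical DataFrame (`scores` is illustrative only):

```python
import pandas as pd

# Hypothetical toy frame with a labelled index
scores = pd.DataFrame(
    {"score": [90, 75]},
    index=pd.Index(["Ann", "Bob"], name="name"),
)

# Default: the old index becomes a regular column
kept = scores.reset_index()
print(list(kept.columns))     # ['name', 'score']

# drop=True discards the old index instead of keeping it as a column
dropped = scores.reset_index(drop=True)
print(list(dropped.columns))  # ['score']
```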

### Selection and Slicing

You can use index labels or integer positions to select specific rows or slices of rows from the DataFrame.

In [35]:
df[0:10:2]

Unnamed: 0,Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
2,"Heikkinen, Miss. Laina",3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,"Allen, Mr. William Henry",5,0,3,male,35.0,0,0,373450,8.05,,S
6,"McCarthy, Mr. Timothy J",7,0,1,male,54.0,0,0,17463,51.8625,E46,S
8,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",9,1,3,female,27.0,0,2,347742,11.1333,,S


In [36]:
df.iloc[0:2]  # positions 0 and 1 (the end position, 2, is excluded)

Unnamed: 0,Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C


In [37]:
df.loc[0:2]  # labels 0, 1, and 2 (the end label is included)

Unnamed: 0,Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
2,"Heikkinen, Miss. Laina",3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S


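loc selects by index label, and a label slice includes its end point. iloc selects by integer position, and a position slice excludes its end point, as in ordinary Python slicing. With the default integer index the labels and positions happen to coincide, which is why the two calls above look so similar.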
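The difference becomes easier to see when the integer labels no longer match the row positions. The following is a minimal sketch on a hypothetical DataFrame (`df_demo` and its labels are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions
df_demo = pd.DataFrame({"value": ["a", "b", "c", "d"]}, index=[10, 20, 30, 40])

by_position = df_demo.iloc[0:2]  # positions 0 and 1; the end is excluded
by_label = df_demo.loc[10:30]    # labels 10 through 30; the end is included

print(list(by_position.index))   # [10, 20]
print(list(by_label.index))      # [10, 20, 30]
```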

In [39]:
df.iloc[0:2, 1:5] 

Unnamed: 0,PassengerId,Survived,Pclass,Sex
0,1,0,3,male
1,2,1,1,female


In [43]:
df.iloc[0:2, [0,3,4,5]] 

Unnamed: 0,Name,Pclass,Sex,Age
0,"Braund, Mr. Owen Harris",3,male,22.0
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",1,female,38.0


In [44]:
df.loc[0:2, ['Name', 'Age','Sex']] 

Unnamed: 0,Name,Age,Sex
0,"Braund, Mr. Owen Harris",22.0,male
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0,female
2,"Heikkinen, Miss. Laina",26.0,female


In [45]:
df['Name'][2:10] # Get names at positions 2 through 9 (10 is excluded)

2                               Heikkinen, Miss. Laina
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                             Allen, Mr. William Henry
5                                     Moran, Mr. James
6                              McCarthy, Mr. Timothy J
7                       Palsson, Master. Gosta Leonard
8    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9                  Nasser, Mrs. Nicholas (Adele Achem)
Name: Name, dtype: object
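The same selection can also be written as a single `iloc` or `loc` call, which makes it explicit whether the slice is positional or label-based. A minimal sketch on a hypothetical DataFrame with a default index (`frame` is illustrative, not the Titanic data):

```python
import pandas as pd

# Hypothetical frame with a default 0..11 index
frame = pd.DataFrame({"Name": list("abcdefghijkl"), "Age": range(12)})

chained = frame["Name"][2:10]       # column first, then a positional slice
by_position = frame["Name"].iloc[2:10]  # positions 2..9
by_label = frame.loc[2:9, "Name"]   # labels 2..9; the end label is included

print(len(chained), len(by_position), len(by_label))  # 8 8 8
```

All three return the same eight rows here only because the labels and positions coincide under the default index.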