Indexing and selecting data lets us efficiently retrieve specific rows, columns or subsets of data from a DataFrame. Whether we're filtering rows based on conditions, extracting particular columns or accessing data by labels or positions, mastering these techniques helps us work effectively with large datasets. In this article, we'll look at the main ways to index and select data in Pandas and at which part of a dataset each one gives access to.

### 1. Indexing Data using the [] Operator
The [] operator is the most basic and most frequently used way to index in Pandas. Given a column name or a list of column names it selects columns, and given a boolean condition it filters rows.
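The row-filtering use of [] is not shown in the cells that follow, so here is a minimal sketch of it. The small DataFrame is only a stand-in for nba.csv, built by hand from a few rows that appear later in this article, and the variable names `mask`, `older` and `guards_over_25` are illustrative.

```python
import pandas as pd

# Stand-in for nba.csv: five rows copied from the outputs shown in this article.
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 5,
        "Position": ["PG", "SF", "SG", "SG", "PF"],
        "Age": [25.0, 25.0, 27.0, 22.0, 29.0],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter", "Jonas Jerebko"],
        name="Name",
    ),
)

# A comparison on a column yields a boolean Series aligned with the row index.
mask = data["Age"] > 25

# Passing that boolean Series to [] keeps only the rows where it is True.
older = data[mask]
print(older)

# Combine conditions with & (and) or | (or); each condition needs parentheses.
guards_over_25 = data[(data["Age"] > 25) & (data["Position"] == "SG")]
print(guards_over_25)
```

Note that Python's `and` / `or` keywords do not work element-wise on a Series, which is why the sketch uses `&`.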

#### 1. Selecting a Single Column
To select a single column, we pass the column name inside square brackets. The result is a Series.

In [1]:
import pandas as pd

data = pd.read_csv("nba.csv", index_col="Name")
print("Dataset")
display(data.head(5))

first = data["Age"]
print("\nSingle Column selected from Dataset")
display(first.head(5))

Dataset


| Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|------|------|--------|----------|-----|--------|--------|---------|--------|
| Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
| R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 |



Single Column selected from Dataset


Name
Avery Bradley    25.0
Jae Crowder      25.0
John Holland     27.0
R.J. Hunter      22.0
Jonas Jerebko    29.0
Name: Age, dtype: float64

#### 2. Selecting Multiple Columns
To select multiple columns, pass a list of column names inside the [] operator:

In [2]:
first = data[["Age", "College", "Salary"]]
print("\nMultiple Columns selected from Dataset")
display(first.head(5))


Multiple Columns selected from Dataset


| Name | Age | College | Salary |
|------|-----|---------|--------|
| Avery Bradley | 25.0 | Texas | 7730337.0 |
| Jae Crowder | 25.0 | Marquette | 6796117.0 |
| John Holland | 27.0 | Boston University | NaN |
| R.J. Hunter | 22.0 | Georgia State | 1148640.0 |
| Jonas Jerebko | 29.0 | NaN | 5000000.0 |
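One detail worth checking when switching between the two forms above is the return type: a bare column name gives a Series, while a list (even a one-element list) gives a DataFrame. The sketch below illustrates this on a hand-built stand-in for the dataset; the names `as_series` and `as_frame` are illustrative.

```python
import pandas as pd

# Stand-in for nba.csv: three rows copied from the output above.
data = pd.DataFrame(
    {
        "Age": [25.0, 25.0, 27.0],
        "College": ["Texas", "Marquette", "Boston University"],
    },
    index=pd.Index(["Avery Bradley", "Jae Crowder", "John Holland"], name="Name"),
)

as_series = data["Age"]    # single label -> one-dimensional Series
as_frame = data[["Age"]]   # one-element list -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

This matters when the result is passed to code that expects a two-dimensional object, in which case the double-bracket form is the safer choice.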


### 2. Indexing with .loc[ ]
The .loc[] indexer is used for label-based indexing. It lets us access rows and columns by their labels. Unlike the [] operator, it can select a subset of rows and a subset of columns in a single call, which makes data retrieval more flexible.

#### 1. Selecting a Single Row by Label
We can select a single row by its label:

In [3]:
row = data.loc["Avery Bradley"]
print(row)

Team        Boston Celtics
Number                 0.0
Position                PG
Age                   25.0
Height                 6-2
Weight               180.0
College              Texas
Salary           7730337.0
Name: Avery Bradley, dtype: object


#### 2. Selecting Multiple Rows by Label
To select multiple rows, pass a list of labels:

In [5]:
rows = data.loc[["Avery Bradley", "R.J. Hunter"]]
rows

| Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|------|------|--------|----------|-----|--------|--------|---------|--------|
| Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |


#### 3. Selecting Specific Rows and Columns
We can select specific rows and columns by providing lists of row labels and column names:

DataFrame.loc[["row1", "row2"], ["column1", "column2", "column3"]]

In [6]:
selection = data.loc[["Avery Bradley", "R.J. Hunter"], ["Team", "Number", "Position"]]
print(selection)

                         Team  Number Position
Name                                          
Avery Bradley  Boston Celtics     0.0       PG
R.J. Hunter    Boston Celtics    28.0       SG


#### 4. Selecting All Rows and Specific Columns
We can select all rows and specific columns by using a colon (:) in the row position to mean "all rows", followed by the list of column names:

DataFrame.loc[:, ["column1", "column2", "column3"]]

In [8]:
all_rows_specific_columns = data.loc[:, ["Team", "Position", "Salary"]]
all_rows_specific_columns

| Name | Team | Position | Salary |
|------|------|----------|--------|
| Avery Bradley | Boston Celtics | PG | 7730337.0 |
| Jae Crowder | Boston Celtics | SF | 6796117.0 |
| John Holland | Boston Celtics | SG | NaN |
| R.J. Hunter | Boston Celtics | SG | 1148640.0 |
| Jonas Jerebko | Boston Celtics | PF | 5000000.0 |
| ... | ... | ... | ... |
| Shelvin Mack | Utah Jazz | PG | 2433333.0 |
| Raul Neto | Utah Jazz | PG | 900000.0 |
| Tibor Pleiss | Utah Jazz | C | 2900000.0 |
| Jeff Withey | Utah Jazz | C | 947276.0 |
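Besides lists of labels, .loc[] also accepts a single row/column label pair and a boolean Series in the row position. The following is a small sketch of both, run on a hand-built stand-in for the dataset; the names `hunter_age` and `under_26` are illustrative.

```python
import pandas as pd

# Stand-in for nba.csv: five rows copied from the outputs shown in this article.
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 5,
        "Position": ["PG", "SF", "SG", "SG", "PF"],
        "Age": [25.0, 25.0, 27.0, 22.0, 29.0],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter", "Jonas Jerebko"],
        name="Name",
    ),
)

# One row label plus one column label returns a single scalar value.
hunter_age = data.loc["R.J. Hunter", "Age"]
print(hunter_age)

# A boolean Series picks the rows; the list picks the columns, in one call.
under_26 = data.loc[data["Age"] < 26, ["Position", "Age"]]
print(under_26)
```

Doing the row filter and the column selection in one .loc[] call avoids chaining two separate [] operations.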


### 3. Indexing with .iloc[ ]
The .iloc[] indexer is used for position-based indexing. It lets us access rows and columns by their zero-based integer positions. It works like .loc[] but takes integer positions instead of labels to specify rows and columns.
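One practical difference between the two indexers shows up with slices: a position slice in .iloc[] excludes its stop position, as in ordinary Python, while a label slice in .loc[] includes both endpoint labels. The sketch below demonstrates this on a hand-built stand-in for the dataset; the variable names are illustrative.

```python
import pandas as pd

# Stand-in for nba.csv: five rows copied from the outputs shown in this article.
data = pd.DataFrame(
    {
        "Number": [0.0, 99.0, 30.0, 28.0, 8.0],
        "Position": ["PG", "SF", "SG", "SG", "PF"],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter", "Jonas Jerebko"],
        name="Name",
    ),
)

# Position slice: rows at positions 1 and 2; the stop position 3 is excluded.
by_position = data.iloc[1:3]

# Label slice: both "Jae Crowder" and "John Holland" are included.
by_label = data.loc["Jae Crowder":"John Holland"]

print(by_position.equals(by_label))  # True: both select the same two rows

# Negative positions count from the end, as with Python lists.
last_row = data.iloc[-1]
print(last_row.name)  # Jonas Jerebko
```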

#### 1. Selecting a Single Row by Position
To select a single row using .iloc[], provide the integer position of the row. Positions are zero-based, so 3 refers to the fourth row:

In [9]:
row = data.iloc[3]
print(row)

Team        Boston Celtics
Number                28.0
Position                SG
Age                   22.0
Height                 6-5
Weight               185.0
College      Georgia State
Salary           1148640.0
Name: R.J. Hunter, dtype: object


#### 2. Selecting Multiple Rows by Position
We can select multiple rows by passing a list of integer positions:

In [11]:
rows = data.iloc[[3, 5, 7]]
rows

| Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|------|------|--------|----------|-----|--------|--------|---------|--------|
| R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | NaN | 12000000.0 |
| Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0 |


#### 3. Selecting Specific Rows and Columns by Position
We can select specific rows and columns by providing integer positions for both rows and columns:

In [12]:
selection = data.iloc[[3, 4], [1, 2]]
print(selection)

               Number Position
Name                          
R.J. Hunter      28.0       SG
Jonas Jerebko     8.0       PF


#### 4. Selecting All Rows and Specific Columns by Position
To select all rows and specific columns, use a colon (:) for all rows and a list of column positions:

In [13]:
selection = data.iloc[:, [1, 2]]
print(selection)

               Number Position
Name                          
Avery Bradley     0.0       PG
Jae Crowder      99.0       SF
John Holland     30.0       SG
R.J. Hunter      28.0       SG
Jonas Jerebko     8.0       PF
...               ...      ...
Shelvin Mack      8.0       PG
Raul Neto        25.0       PG
Tibor Pleiss     21.0        C
Jeff Withey      24.0        C
NaN               NaN      NaN

[458 rows x 2 columns]


### 4. Other Useful Indexing Methods
Pandas also provides several other methods that we may find useful for indexing and manipulating DataFrames:

#### 1. .head(): Returns the first n rows of a DataFrame

In [14]:
print(data.head(5))

                         Team  Number Position   Age Height  Weight  \
Name                                                                  
Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
Jae Crowder    Boston Celtics    99.0       SF  25.0    6-6   235.0   
John Holland   Boston Celtics    30.0       SG  27.0    6-5   205.0   
R.J. Hunter    Boston Celtics    28.0       SG  22.0    6-5   185.0   
Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

                         College     Salary  
Name                                         
Avery Bradley              Texas  7730337.0  
Jae Crowder            Marquette  6796117.0  
John Holland   Boston University        NaN  
R.J. Hunter        Georgia State  1148640.0  
Jonas Jerebko                NaN  5000000.0  


#### 2. .tail(): Returns the last n rows of a DataFrame

In [15]:
print(data.tail(5))

                   Team  Number Position   Age Height  Weight College  \
Name                                                                    
Shelvin Mack  Utah Jazz     8.0       PG  26.0    6-3   203.0  Butler   
Raul Neto     Utah Jazz    25.0       PG  24.0    6-1   179.0     NaN   
Tibor Pleiss  Utah Jazz    21.0        C  26.0    7-3   256.0     NaN   
Jeff Withey   Utah Jazz    24.0        C  26.0    7-0   231.0  Kansas   
NaN                 NaN     NaN      NaN   NaN    NaN     NaN     NaN   

                 Salary  
Name                     
Shelvin Mack  2433333.0  
Raul Neto      900000.0  
Tibor Pleiss  2900000.0  
Jeff Withey    947276.0  
NaN                 NaN  
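Both .head() and .tail() take an optional n that defaults to 5, and they return the same rows as the corresponding .iloc[] slices. The sketch below checks that on a hand-built stand-in for the dataset; the variable names are illustrative.

```python
import pandas as pd

# Stand-in for nba.csv: five rows copied from the outputs shown in this article.
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 5,
        "Age": [25.0, 25.0, 27.0, 22.0, 29.0],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter", "Jonas Jerebko"],
        name="Name",
    ),
)

first_two = data.head(2)   # same rows as data.iloc[:2]
last_two = data.tail(2)    # same rows as data.iloc[-2:]

print(first_two.equals(data.iloc[:2]))
print(last_two.equals(data.iloc[-2:]))

# A negative n means "everything except": head(-1) drops the last row.
all_but_last = data.head(-1)
print(list(all_but_last.index))
```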


#### 3. .at[]: Access a single value for a row/column label pair

In [16]:
value = data.at["Avery Bradley", "Age"]
print(value)

25.0
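.at[] has a position-based counterpart, .iat[], and both can also be used to assign a single value. Here is a minimal sketch on a hand-built stand-in for the dataset. The new age of 26.0 is a made-up value used only to show assignment.

```python
import pandas as pd

# Stand-in for nba.csv: two rows copied from the outputs shown in this article.
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics", "Boston Celtics"],
        "Age": [25.0, 25.0],
    },
    index=pd.Index(["Avery Bradley", "Jae Crowder"], name="Name"),
)

# Label-based scalar access: row label, then column label.
age_by_label = data.at["Avery Bradley", "Age"]

# Position-based scalar access: row 0, column 1 ("Age" is the second column here).
age_by_position = data.iat[0, 1]
print(age_by_label, age_by_position)

# Both accessors also support assignment of a single cell.
data.at["Avery Bradley", "Age"] = 26.0  # hypothetical value, for illustration only
print(data.at["Avery Bradley", "Age"])
```

For reading or writing one cell, these accessors are a more direct choice than .loc[] or .iloc[], which are designed for general selections.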


#### 4. .query(): Query the DataFrame using a boolean expression

In [17]:
result = data.query("Age > 25 and College == 'Duke'")
print(result)

                                    Team  Number Position   Age Height  \
Name                                                                     
Lance Thomas             New York Knicks    42.0       SF  28.0    6-8   
Elton Brand           Philadelphia 76ers    42.0       PF  37.0    6-9   
JJ Redick           Los Angeles Clippers     4.0       SG  31.0    6-4   
Mike Dunleavy              Chicago Bulls    34.0       SG  35.0    6-9   
Dahntay Jones        Cleveland Cavaliers    30.0       SG  35.0    6-6   
Miles Plumlee            Milwaukee Bucks    18.0        C  27.0   6-11   
Luol Deng                     Miami Heat     9.0       SF  31.0    6-9   
Josh McRoberts                Miami Heat     4.0       PF  29.0   6-10   
Kyle Singler       Oklahoma City Thunder     5.0       SF  28.0    6-8   
Gerald Henderson  Portland Trail Blazers     9.0       SG  28.0    6-5   
Mason Plumlee     Portland Trail Blazers    24.0        C  26.0   6-11   

                  Weight College     
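A .query() expression selects the same rows as the equivalent boolean mask passed to []. The sketch below compares the two on a hand-built stand-in for the dataset. The Duke rows are taken from the query output above, with their College value inferred from the filter itself.

```python
import pandas as pd

# Stand-in for nba.csv: names and ages copied from outputs in this article.
data = pd.DataFrame(
    {
        "Age": [25.0, 25.0, 28.0, 37.0],
        "College": ["Texas", "Marquette", "Duke", "Duke"],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "Lance Thomas", "Elton Brand"],
        name="Name",
    ),
)

# String expression: column names are used directly, `and` is allowed.
result_query = data.query("Age > 25 and College == 'Duke'")

# Equivalent boolean mask: needs & and parentheses around each condition.
result_mask = data[(data["Age"] > 25) & (data["College"] == "Duke")]

print(result_query)
print(result_query.equals(result_mask))
```

Which form to use is largely a readability choice; the string form tends to be shorter when several conditions are combined.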

The table below lists a few more DataFrame methods that help with indexing and selection:

| Method | Description |
|--------|-------------|
| `DataFrame.iat[]` | Access a single value for a row/column pair by integer position. |
| `DataFrame.pop()` | Return a column and drop it from the DataFrame. |
| `DataFrame.xs()` | Return a cross-section (row(s) or column(s)) from the DataFrame. |
| `DataFrame.get()` | Get an item from the object for a given key (e.g., a DataFrame column), returning a default if the key is missing. |
| `DataFrame.isin()` | Return a boolean DataFrame showing whether each element is contained in the given values. |
| `DataFrame.where()` | Return an object of the same shape, keeping entries where cond is True and taking the rest from other. |
| `DataFrame.mask()` | Return an object of the same shape, keeping entries where cond is False and taking the rest from other. |
| `DataFrame.insert()` | Insert a column into the DataFrame at a specified location. |
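As a rough illustration of several methods from the table, the sketch below runs them on a hand-built stand-in for the dataset. It uses the Series versions of isin(), where() and mask() on single columns, which behave like the DataFrame versions element by element. The column name "AgeNextYear" and the fill value 0.0 are made up for the example.

```python
import pandas as pd

# Stand-in for nba.csv: five rows copied from the outputs shown in this article.
data = pd.DataFrame(
    {
        "Team": ["Boston Celtics"] * 5,
        "Position": ["PG", "SF", "SG", "SG", "PF"],
        "Age": [25.0, 25.0, 27.0, 22.0, 29.0],
    },
    index=pd.Index(
        ["Avery Bradley", "Jae Crowder", "John Holland", "R.J. Hunter", "Jonas Jerebko"],
        name="Name",
    ),
)

# isin(): True where the value is one of the listed items; used here as a row filter.
guards = data[data["Position"].isin(["PG", "SG"])]

# where(): keep values where the condition is True, replace the rest with `other`.
ages = data["Age"]
kept = ages.where(ages > 25, other=0.0)

# mask(): the inverse, replacing values where the condition is True.
hidden = ages.mask(ages > 25, other=0.0)

# get(): like data["Height"], but returns the default instead of raising KeyError.
missing = data.get("Height", default=None)

# xs(): cross-section; with a row label it returns that row as a Series.
row = data.xs("Jae Crowder")

# insert(): add a column at position 1, modifying the DataFrame in place.
data.insert(1, "AgeNextYear", data["Age"] + 1)  # hypothetical column

# pop(): remove a column in place and return it as a Series.
popped = data.pop("AgeNextYear")

print(guards)
print(kept.tolist(), hidden.tolist())
print(missing, row["Position"], popped.tolist())
```

Note that insert() and pop() modify the DataFrame in place, whereas the other calls in the sketch return new objects.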
