## Data Selection and Indexing
Selecting the right subset of data is essential for analysis
and data cleaning. Pandas provides powerful indexing tools
to access rows and columns efficiently.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Sample dataset
df = pd.read_csv("5.1_data.csv")
print(df.head())
print()
print(df.tail())

        name  gender  age  salary     city date of joining
0       Amit    Male   28   40769  Kolkata      30-10-2021
1       Riya  Female   41   99735     Pune      09-02-2018
2       John    Male   36   96101     Pune      02-06-2019
3       Neha  Female   32   42433  Kolkata      04-03-2020
4  Siddharth    Male   29   45311     Pune      15-10-2022

      name  gender  age  salary     city date of joining
15    Emma  Female   29   58942    Delhi      13-10-2020
16    Liam    Male   45   97001  Kolkata      16-09-2018
17  Olivia  Female   24   58431   Mumbai      06-04-2022
18     Raj    Male   43   42747  Chennai      14-05-2022
19  Simran  Female   42   98319    Delhi      20-01-2022


## Column Selection

### Single Column

In [3]:
df["name"].head()

0         Amit
1         Riya
2         John
3         Neha
4    Siddharth
Name: name, dtype: object

### Multiple Columns
Pass a list of column names inside the brackets to select several columns at once.

In [4]:
df[["name", "age", "salary"]].head() 

        name  age  salary
0       Amit   28   40769
1       Riya   41   99735
2       John   36   96101
3       Neha   32   42433
4  Siddharth   29   45311


**Note**
- `df['salary']` (single brackets) returns a Series.
- `df[['salary']]` (a list inside the brackets) returns a one-column DataFrame.
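
The difference in the note can be checked directly. The sketch below is a minimal illustration that assumes a small stand-in frame named `sample`, built from the first rows printed earlier instead of the full CSV:

```python
import pandas as pd

# Stand-in frame (an assumption for illustration, not the full dataset).
sample = pd.DataFrame({
    "name": ["Amit", "Riya", "John"],
    "salary": [40769, 99735, 96101],
})

as_series = sample["salary"]    # single brackets -> Series
as_frame = sample[["salary"]]   # list inside brackets -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

The one-column DataFrame keeps its column label and two-dimensional shape, which matters when the result is passed to code that expects a table and not a single column.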

## Row Selection with iloc (Position-based)
The `.iloc` property is used for integer-position based indexing, similar to how you index a standard Python list.

### Select the first row

In [5]:
df.iloc[0]

name                     Amit
gender                   Male
age                        28
salary                  40769
city                  Kolkata
date of joining    30-10-2021
Name: 0, dtype: object

### Select the first two rows and first two columns

In [6]:
df.iloc[0:2, 0:2]

   name  gender
0  Amit    Male
1  Riya  Female
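
Because `.iloc` follows list-style positional rules, negative positions and step slices work the same way they do on a Python list. The following sketch assumes a stand-in frame named `sample` built from the first five rows shown earlier; the variable names are illustrative only:

```python
import pandas as pd

# Stand-in frame (an assumption for illustration, not the full dataset).
sample = pd.DataFrame({
    "name": ["Amit", "Riya", "John", "Neha", "Siddharth"],
    "age": [28, 41, 36, 32, 29],
})

last_row = sample.iloc[-1]         # last row, like some_list[-1]
every_other = sample.iloc[::2]     # rows at positions 0, 2, 4, like some_list[::2]
picked = sample.iloc[[0, 3], [0]]  # explicit lists of row and column positions

print(last_row["name"])
print(every_other["name"].tolist())
print(picked.shape)
```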


## Row Selection with loc (Label-based)
The `.loc` property is used when you want to select data using the index labels (row names) and column names.

### Select a single row by label

In [7]:
df.loc[0] 

name                     Amit
gender                   Male
age                        28
salary                  40769
city                  Kolkata
date of joining    30-10-2021
Name: 0, dtype: object

### Select a specific value

In [8]:
df.loc[2, 'name'] # Returns 'John'

'John'

### Slice rows and specific columns

In [9]:
df.loc[0:2, ['name', 'city']]

   name     city
0  Amit  Kolkata
1  Riya     Pune
2  John     Pune
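
Note that the `.loc` slice `0:2` returned three rows: label slices include the end label, whereas `.iloc` slices exclude the end position. The sketch below contrasts the two on a stand-in frame named `sample`, then re-labels the rows with hypothetical index values (10 to 13) to show that labels and positions only coincide with the default index:

```python
import pandas as pd

# Stand-in frame (an assumption for illustration, not the full dataset).
sample = pd.DataFrame({
    "name": ["Amit", "Riya", "John", "Neha"],
    "city": ["Kolkata", "Pune", "Pune", "Kolkata"],
})

by_position = sample.iloc[0:2]  # end excluded: positions 0 and 1
by_label = sample.loc[0:2]      # end included: labels 0, 1 and 2
print(len(by_position), len(by_label))  # 2 3

# Hypothetical non-default labels: positions stay 0..3, labels become 10..13.
shifted = sample.copy()
shifted.index = [10, 11, 12, 13]

print(shifted.loc[10, "name"])  # row labelled 10
print(shifted.iloc[0, 0])       # row at position 0 (the same row)
```

On `shifted`, `shifted.loc[0]` would raise a `KeyError` because no row carries the label 0, while `shifted.iloc[0]` still works.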


## Fast Access: `.at` and `.iat`
`.at` and `.iat` are optimized for reading or writing a single value: `.at` takes a row label and a column name, `.iat` takes integer positions.

### Fast label-based access

In [10]:
df.at[0, "name"]

'Amit'

### Fast position-based access

In [11]:
df.iat[2, 0]

'John'
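
Both accessors also support assignment. The sketch below is a minimal illustration on a stand-in frame named `sample`; the new age values are made up purely to show the write path:

```python
import pandas as pd

# Stand-in frame (an assumption for illustration, not the full dataset).
sample = pd.DataFrame({
    "name": ["Amit", "Riya", "John"],
    "age": [28, 41, 36],
})

print(sample.at[1, "age"])  # label-based read
print(sample.iat[1, 1])     # position-based read of the same cell

sample.at[1, "age"] = 42    # label-based write (illustrative value)
sample.iat[2, 1] = 37       # position-based write (illustrative value)
print(sample["age"].tolist())
```

Unlike `.loc` and `.iloc`, these accessors accept only scalar row and column keys, so they cannot be used for slices or lists.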

## Summary
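- `df[col]` / `df[[cols]]` → column selection (Series / DataFrame)
- `.iloc` → position-based selection
- `.loc` → label-based selection
- `.at` / `.iat` → fast access to a single value, by label / by position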