# Indexing and Slicing in Pandas

This notebook explores **indexing and slicing** techniques in Pandas to select specific data from DataFrames and Series. You'll learn how to use label-based, integer-based, and condition-based methods to access and filter data efficiently.

## Core Concepts

- **Indexing**: Accessing specific rows, columns, or values in a DataFrame or Series using labels or positions.
- **Slicing**: Selecting a contiguous range of rows or columns by label or by position (for example, `'emp2':'emp4'` or `1:4`).
- **Key Techniques**:
  - Label-based selection (e.g., using column names or row labels).
  - Integer position-based selection (e.g., using row or column numbers).
  - Boolean indexing for filtering data based on conditions.
  - String-expression queries with `.query()` for concise, SQL-like filtering.

## Key Methods & Functions

Below are the essential methods for indexing and slicing in Pandas:

- **`.loc[]`**: Label-based selection (inclusive of start and end labels).
- **`.iloc[]`**: Integer position-based selection (zero-based, end-exclusive).
- **`[]`**: Column selection by name (a single name returns a Series, a list of names returns a DataFrame), or row filtering with a Boolean mask.
- **`.at[]`**: Fast access to a single value by label.
- **`.iat[]`**: Fast access to a single value by integer position.
- **`.query()`**: Filter rows using a string expression (SQL-like in spirit, Python-like in syntax).
- **`.isin()`**: Test whether each value appears in a given list or array, returning a Boolean mask.
- **Boolean Indexing**: Use conditions (e.g., `df['age'] > 30`) to filter rows.

## Learning Objectives

- Understand the difference between `.loc[]` and `.iloc[]`.
- Select single or multiple columns from a DataFrame.
- Select rows by index, position, or condition.
- Combine row and column selections for precise data extraction.
- Apply Boolean masking to filter data based on multiple conditions.

### 1. Setting Up a Sample Dataset

In [15]:
import pandas as pd
import numpy as np

# Creating a sample DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 22],
    'salary': [50000, 60000, 75000, 52000, 48000],
    'department': ['HR', 'IT', 'IT', 'Marketing', 'HR']
}
df = pd.DataFrame(data, index=['emp1', 'emp2', 'emp3', 'emp4', 'emp5'])
print("Sample DataFrame:")
print(df)

Sample DataFrame:
         name  age  salary department
emp1    Alice   25   50000         HR
emp2      Bob   30   60000         IT
emp3  Charlie   35   75000         IT
emp4    David   28   52000  Marketing
emp5      Eve   22   48000         HR


### 2. Selecting Columns with `[]`


In [16]:
# Selecting a single column (returns a Series)
name_col = df['name']
print("Single column (name):")
print(name_col)

# Selecting multiple columns (returns a DataFrame)
subset_df = df[['name', 'salary']]
print("\nMultiple columns (name, salary):")
print(subset_df)

Single column (name):
emp1      Alice
emp2        Bob
emp3    Charlie
emp4      David
emp5        Eve
Name: name, dtype: object

Multiple columns (name, salary):
         name  salary
emp1    Alice   50000
emp2      Bob   60000
emp3  Charlie   75000
emp4    David   52000
emp5      Eve   48000
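
The bracket style decides the return type, which is easy to miss in the printed output above. As a minimal sketch, assuming a smaller stand-in for the sample DataFrame (the variable names `as_series` and `as_frame` are illustrative only):

```python
import pandas as pd

# Smaller stand-in for the sample DataFrame used in this notebook
df = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Charlie'], 'salary': [50000, 60000, 75000]},
    index=['emp1', 'emp2', 'emp3'],
)

as_series = df['name']     # single brackets with one name -> Series
as_frame = df[['name']]    # a list with one name -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

Passing a one-element list is a common way to keep a result two-dimensional when later code expects a DataFrame.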


### 3. Label-Based Selection with `.loc[]`


In [17]:
# Selecting a single row by label
row_emp3 = df.loc['emp3']
print("Row with index 'emp3':")
print(row_emp3)

# Selecting specific rows and columns
subset_loc = df.loc[['emp1', 'emp3'], ['name', 'age']]
print("\nRows 'emp1', 'emp3' and columns 'name', 'age':")
print(subset_loc)

# Selecting a range of rows and columns by label (both endpoints inclusive)
range_loc = df.loc['emp2':'emp4', 'name':'salary']
print("\nRange of rows and columns:")
print(range_loc)

Row with index 'emp3':
name          Charlie
age                35
salary          75000
department         IT
Name: emp3, dtype: object

Rows 'emp1', 'emp3' and columns 'name', 'age':
         name  age
emp1    Alice   25
emp3  Charlie   35

Range of rows and columns:
         name  age  salary
emp2      Bob   30   60000
emp3  Charlie   35   75000
emp4    David   28   52000
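
`.loc[]` also accepts a Boolean mask in the row position, which lets a condition and a column selection share one call. The sketch below rebuilds the sample DataFrame so it runs on its own; the names `it_pay` and `raised` and the 1000 raise are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'age': [25, 30, 35, 28, 22],
        'salary': [50000, 60000, 75000, 52000, 48000],
        'department': ['HR', 'IT', 'IT', 'Marketing', 'HR'],
    },
    index=['emp1', 'emp2', 'emp3', 'emp4', 'emp5'],
)

# Row condition and column labels in a single .loc call
it_pay = df.loc[df['department'] == 'IT', ['name', 'salary']]
print(it_pay)

# Assignment through .loc on a copy, so the original df stays unchanged
raised = df.copy()
raised.loc[raised['department'] == 'HR', 'salary'] += 1000
print(raised.loc[['emp1', 'emp5'], 'salary'])
```

Assigning through a single `.loc[rows, columns]` call, rather than chaining two selections, is the form that reliably modifies the intended DataFrame.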


### 4. Integer-Based Selection with `.iloc[]`


In [18]:
# Selecting a single row by position
row_pos = df.iloc[2]
print("Row at position 2:")
print(row_pos)

# Selecting specific rows and columns by position
subset_iloc = df.iloc[[0, 2], [0, 1]]
print("\nRows 0, 2 and columns 0, 1:")
print(subset_iloc)

# Selecting a range of rows and columns (end-exclusive)
range_iloc = df.iloc[1:4, 0:3]
print("\nRange of rows and columns:")
print(range_iloc)

Row at position 2:
name          Charlie
age                35
salary          75000
department         IT
Name: emp3, dtype: object

Rows 0, 2 and columns 0, 1:
         name  age
emp1    Alice   25
emp3  Charlie   35

Range of rows and columns:
         name  age  salary
emp2      Bob   30   60000
emp3  Charlie   35   75000
emp4    David   28   52000
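
The `.loc[]` and `.iloc[]` examples above return the same rows because the labels `emp2` to `emp4` happen to sit at positions 1 to 3. The difference becomes clearer when the labels are themselves integers. A minimal sketch, assuming a hypothetical Series `s` that is not part of the notebook's dataset:

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from its positions
s = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

by_label = s.loc[10:30]      # labels 10 through 30, end label included
by_position = s.iloc[0:2]    # positions 0 and 1, end position excluded

print(by_label.tolist())     # ['a', 'b', 'c']
print(by_position.tolist())  # ['a', 'b']
```

With an integer index, `s.loc[0]` would raise a `KeyError` here because no row carries the label `0`, while `s.iloc[0]` returns the first element.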


### 5. Single Value Selection with `.at[]` and `.iat[]`


In [19]:
# Accessing a single value by label
age_emp2 = df.at['emp2', 'age']
print("Age of emp2 (using .at[]):", age_emp2)

# Accessing a single value by position
salary_pos2 = df.iat[2, 2]
print("Salary at row 2, column 2 (using .iat[]):", salary_pos2)

Age of emp2 (using .at[]): 30
Salary at row 2, column 2 (using .iat[]): 75000
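
`.at[]` and `.iat[]` can write a single value as well as read one. The following sketch works on a copy of a reduced sample frame; the name `updated` and the new age of 31 are assumptions made purely for the example:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35]},
    index=['emp1', 'emp2', 'emp3'],
)

updated = df.copy()
updated.at['emp2', 'age'] = 31      # write one cell by row and column label

print(updated.at['emp2', 'age'])    # 31
print(updated.iat[1, 1])            # 31 (same cell, addressed by position)
print(df.at['emp2', 'age'])         # 30 (the original frame is untouched)
```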


### 6. Boolean Indexing and Filtering


In [20]:
# Filtering rows where age > 25
age_filter = df[df['age'] > 25]
print("Employees with age > 25:")
print(age_filter)

# Combining multiple conditions
complex_filter = df[(df['age'] > 25) & (df['department'] == 'IT')]
print("\nEmployees with age > 25 and in IT department:")
print(complex_filter)

# Using .isin() to filter rows
hr_filter = df[df['department'].isin(['HR'])]
print("\nEmployees in HR department:")
print(hr_filter)

Employees with age > 25:
         name  age  salary department
emp2      Bob   30   60000         IT
emp3  Charlie   35   75000         IT
emp4    David   28   52000  Marketing

Employees with age > 25 and in IT department:
         name  age  salary department
emp2      Bob   30   60000         IT
emp3  Charlie   35   75000         IT

Employees in HR department:
       name  age  salary department
emp1  Alice   25   50000         HR
emp5    Eve   22   48000         HR
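
The cell above combines conditions with `&` only. The `|` (or) and `~` (not) operators follow the same pattern, as this sketch on the rebuilt sample DataFrame suggests; the threshold of 51000 and the variable names are illustrative choices, not values from the notebook:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'age': [25, 30, 35, 28, 22],
        'salary': [50000, 60000, 75000, 52000, 48000],
        'department': ['HR', 'IT', 'IT', 'Marketing', 'HR'],
    },
    index=['emp1', 'emp2', 'emp3', 'emp4', 'emp5'],
)

# OR: in IT, or earning more than 51000
it_or_high = df[(df['department'] == 'IT') | (df['salary'] > 51000)]
print(it_or_high.index.tolist())   # ['emp2', 'emp3', 'emp4']

# NOT: everyone outside IT
not_it = df[~df['department'].isin(['IT'])]
print(not_it.index.tolist())       # ['emp1', 'emp4', 'emp5']
```

Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `==` in Python.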


### 7. SQL-Like Filtering with `.query()`

In [21]:
# Filtering using query
query_result = df.query("age > 25 and department == 'IT'")
print("Employees with age > 25 and in IT (using .query()):")
print(query_result)

Employees with age > 25 and in IT (using .query()):
         name  age  salary department
emp2      Bob   30   60000         IT
emp3  Charlie   35   75000         IT
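
`.query()` can also read local Python variables by prefixing them with `@`, which avoids building the expression string by hand. A minimal sketch, assuming hypothetical variables `min_age` and `depts`, with the equivalent Boolean mask shown for comparison:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'age': [25, 30, 35, 28, 22],
        'salary': [50000, 60000, 75000, 52000, 48000],
        'department': ['HR', 'IT', 'IT', 'Marketing', 'HR'],
    },
    index=['emp1', 'emp2', 'emp3', 'emp4', 'emp5'],
)

min_age = 25                   # hypothetical threshold
depts = ['IT', 'Marketing']    # hypothetical department list

# @name refers to a variable in the calling scope
query_result = df.query("age > @min_age and department in @depts")
print(query_result.index.tolist())   # ['emp2', 'emp3', 'emp4']

# The same filter written as a Boolean mask
mask_result = df[(df['age'] > min_age) & df['department'].isin(depts)]
print(query_result.equals(mask_result))   # True
```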


## Key Takeaways

- **`.loc[]` vs `.iloc[]`**: Use `.loc[]` for label-based access (inclusive) and `.iloc[]` for integer-based access (end-exclusive).
- **Column Selection**: Use `[]` for quick column selection or basic filtering.
- **Single Value Access**: `.at[]` and `.iat[]` read or write exactly one cell and are typically faster than `.loc[]` and `.iloc[]` for that purpose.
- **Boolean Indexing**: Combine conditions with `&`, `|`, and `~` for powerful filtering.
- **`.query()`**: Offers a concise, SQL-like syntax for filtering rows.
- **Flexibility**: Combine row and column selections to extract exactly the data you need.