## Data Selection and Indexing

### 1. DataFrame Indexing and Slicing
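- You can select rows of a DataFrame by index label (via `loc`) or by integer position (via `iloc`). With the default integer index used here, label `0` and position `0` refer to the same row, so `loc[0]` and `iloc[0]` return the same result. A plain slice such as `data[1:3]` selects rows by position.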



In [1]:
import pandas as pd

In [2]:
# Creating a DataFrame
data = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# Indexing using loc (label-based): the row whose index label is 0
print(data.loc[0])

# Indexing using iloc (position-based): the first row
print(data.iloc[0])

# Slicing rows by position (the end position, 3, is excluded)
print(data[1:3])

Name        John
Age           28
City    New York
Name: 0, dtype: object
Name        John
Age           28
City    New York
Name: 0, dtype: object
    Name  Age    City
1   Anna   24   Paris
2  Peter   35  Berlin
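
The difference between `loc` and `iloc` only becomes visible when index labels and positions differ. The sketch below assumes a string index (`'a'` to `'d'`), which is not part of the example above, to show that `loc` slices include the end label while `iloc` slices exclude the end position.

```python
import pandas as pd

# Hypothetical variant of the example data with string labels as the index,
# so that labels and positions no longer coincide.
people = pd.DataFrame(
    {
        'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
    },
    index=['a', 'b', 'c', 'd'],
)

# loc slices by label and includes the end label: rows 'b' and 'c'
by_label = people.loc['b':'c']

# iloc slices by position and excludes the end position: row 'b' only
by_position = people.iloc[1:2]

print(by_label)
print(by_position)
```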


### 2. Boolean Indexing
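Boolean indexing filters rows with a boolean mask: a comparison such as `data['Age'] > 30` produces a Series of `True`/`False` values, and passing it to `data[...]` keeps only the rows where the mask is `True`.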

In [3]:
# Boolean indexing: select rows where Age is greater than 30
print(data[data['Age'] > 30])

    Name  Age    City
2  Peter   35  Berlin
3  Linda   32  London
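
Masks can also be combined. The following is a minimal sketch using the same example data: conditions are joined with `&` (and) or `|` (or), each wrapped in its own parentheses, and `isin` tests membership in a list of values.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# Combine two conditions with &; the parentheses are required because
# & binds more tightly than the comparison operators.
over_30_not_berlin = data[(data['Age'] > 30) & (data['City'] != 'Berlin')]

# Keep rows whose City is one of several values
in_paris_or_london = data[data['City'].isin(['Paris', 'London'])]

print(over_30_not_berlin)
print(in_paris_or_london)
```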


### 3. Setting and Resetting Index
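You can set one or more columns as the DataFrame's index with `set_index`, and restore the default integer index with `reset_index`. Both methods return a new DataFrame rather than modifying the original, which is why the result is assigned back to `data`.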

In [4]:
# Setting 'Name' as the index
data = data.set_index('Name')
print(data)

# Resetting the index back to default
data = data.reset_index()
print(data)

       Age      City
Name                
John    28  New York
Anna    24     Paris
Peter   35    Berlin
Linda   32    London
    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
3  Linda   32    London
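
One reason to set an index is to look rows up by a meaningful label. The sketch below, which reuses the example data, reads a single value with `loc` and shows `reset_index(drop=True)`, which discards the index instead of restoring it as a column.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# set_index returns a new DataFrame; `data` itself is left unchanged
indexed = data.set_index('Name')

# Look up one value by row label and column label
anna_age = indexed.loc['Anna', 'Age']
print(anna_age)

# drop=True throws the 'Name' index away rather than turning it back into a column
without_names = indexed.reset_index(drop=True)
print(without_names)
```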


### 4. Conditional Selections (Querying a DataFrame)
The `query` method filters rows using a condition written as a string, in which column names can be used directly as variables.

In [5]:
# Using query to filter rows where Age is greater than 30
print(data.query('Age > 30'))

    Name  Age    City
2  Peter   35  Berlin
3  Linda   32  London
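
Query strings can combine conditions with `and`/`or` and can refer to Python variables by prefixing them with `@`. The sketch below uses the same example data; the variable name `min_age` is chosen for illustration only.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# A local variable referenced inside the query string with @
min_age = 30

# Two conditions combined with `and`; string values are quoted inside the query
result = data.query("Age > @min_age and City == 'London'")
print(result)
```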


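### 5. MultiIndex (Hierarchical Indexing)
A MultiIndex gives each row a label with several levels. Passing a label from the outer level to `loc` returns all rows under it, indexed by the remaining level.

In [6]: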
# Creating a DataFrame with a MultiIndex
index = pd.MultiIndex.from_tuples([('2024', 'Q1'), ('2024', 'Q2'), ('2025', 'Q1'), ('2025', 'Q2')], names=['Year', 'Quarter'])
multi_index_df = pd.DataFrame({'Revenue': [200, 300, 400, 500]}, index=index)

# Accessing data in a MultiIndex DataFrame
print(multi_index_df)
print(multi_index_df.loc['2024'])

              Revenue
Year Quarter         
2024 Q1           200
     Q2           300
2025 Q1           400
     Q2           500
         Revenue
Quarter         
Q1           200
Q2           300
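
Beyond selecting on the outer level, a single row can be addressed with a tuple of labels, and `xs` can select on an inner level. The sketch below rebuilds the same MultiIndex DataFrame under the name `revenue`, chosen here for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('2024', 'Q1'), ('2024', 'Q2'), ('2025', 'Q1'), ('2025', 'Q2')],
    names=['Year', 'Quarter'],
)
revenue = pd.DataFrame({'Revenue': [200, 300, 400, 500]}, index=index)

# A tuple of labels, one per level, identifies a single row
q2_2025 = revenue.loc[('2025', 'Q2'), 'Revenue']
print(q2_2025)

# xs takes a cross-section on an inner level: Q1 of every year
all_q1 = revenue.xs('Q1', level='Quarter')
print(all_q1)
```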


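### 6. Sorting
`sort_index` sorts rows by their index labels, and `sort_values` sorts them by the values in one or more columns. Because `data` already has a sorted default index, `sort_index` leaves the row order unchanged here.

In [7]: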
# Sorting by index
sorted_by_index = data.sort_index()
print(sorted_by_index)

# Sorting by column (e.g., Age)
sorted_by_age = data.sort_values(by='Age')
print(sorted_by_age)

    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
3  Linda   32    London
    Name  Age      City
1   Anna   24     Paris
0   John   28  New York
3  Linda   32    London
2  Peter   35    Berlin
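
Sorting keeps the original index labels attached to each row, as the output above shows. The sketch below, on the same example data, sorts in descending order and uses `ignore_index=True` (available in pandas 1.0 and later) to renumber the rows.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# Descending sort; the original index labels travel with the rows
oldest_first = data.sort_values(by='Age', ascending=False)
print(oldest_first)

# Same sort, but with a fresh 0..n-1 index
renumbered = data.sort_values(by='Age', ascending=False, ignore_index=True)
print(renumbered)
```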


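### 7. Selecting Multiple Columns
Pass a list of column names inside the brackets to select several columns at once; the result is a new DataFrame containing only those columns.

In [8]: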
# Selecting multiple columns
print(data[['Name', 'Age']])

    Name  Age
0   John   28
1   Anna   24
2  Peter   35
3  Linda   32
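
The number of brackets matters: a single column name returns a Series, while a list of names returns a DataFrame. The sketch below, on the same example data, also shows `loc` selecting rows and columns in one step.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# A single column name returns a Series
ages = data['Age']

# A one-element list returns a DataFrame with one column
ages_frame = data[['Age']]

# loc takes a row selector and a column selector together
names_and_cities = data.loc[data['Age'] > 30, ['Name', 'City']]

print(type(ages).__name__, type(ages_frame).__name__)
print(names_and_cities)
```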


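### 8. Handling Missing Data
Missing values appear as `NaN`. Use `isna` to detect them, `dropna` to remove rows that contain them, and `fillna` to replace them. Note that the `Age` column becomes floating point because it contains missing values.

In [9]: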
# Creating a DataFrame with missing values
data_with_nan = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, None],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# Detecting missing values
print(data_with_nan.isna())

# Dropping rows with missing values
print(data_with_nan.dropna())

# Filling missing values
print(data_with_nan.fillna(0))

    Name    Age   City
0  False  False  False
1  False   True  False
2  False  False  False
3  False   True  False
    Name   Age      City
0   John  28.0  New York
2  Peter  35.0    Berlin
    Name   Age      City
0   John  28.0  New York
1   Anna   0.0     Paris
2  Peter  35.0    Berlin
3  Linda   0.0    London
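
Filling with `0` is rarely meaningful for a column such as `Age`. As one possible alternative (an illustrative choice, not a recommendation from the original example), the sketch below fills missing ages with the column mean, counts missing values per column, and drops rows based on a single column only.

```python
import pandas as pd

data_with_nan = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, None],
    'City': ['New York', 'Paris', 'Berlin', 'London']
})

# Count missing values in each column
missing_per_column = data_with_nan.isna().sum()
print(missing_per_column)

# mean() skips NaN by default, so this averages 28 and 35
mean_age = data_with_nan['Age'].mean()

# A dict limits the fill to the named column
filled = data_with_nan.fillna({'Age': mean_age})
print(filled)

# Drop rows only when 'Age' is missing, ignoring other columns
has_age = data_with_nan.dropna(subset=['Age'])
print(has_age)
```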
