# Advanced Indexing in Pandas

## Overview

**Advanced Indexing** = Powerful techniques for complex data selection and manipulation

### What We'll Learn

**1. MultiIndex (Hierarchical Indexing)** 🎯
- Multiple levels of row/column indices
- Perfect for multidimensional data
- Example: (Year, Quarter, Month) → Sales

**2. Stack & Unstack** 🔄
- Convert between wide and long format
- Pivot dimensions
- Reshape data

**3. Pivot Tables** 📊
- Excel-like pivot tables
- Aggregate and summarize
- Multi-dimensional analysis

**4. Advanced Selection** 🎪
- Cross-sections (xs)
- Index slicing
- Boolean indexing
- Query method

**5. Index Operations** ⚙️
- Swap levels
- Reorder levels
- Sort by index
- Reset and set index

### Why Advanced Indexing?

```
✅ Handle multidimensional data elegantly
✅ Efficient data selection and filtering
✅ Reshape data for analysis
✅ Group and aggregate complex structures
✅ Time series with multiple dimensions
```

### Real-World Use Cases

**Business**
- Sales by (Region, Product, Date)
- Financial data (Company, Metric, Period)
- Multi-location store analytics

**Research**
- Experimental data (Subject, Trial, Condition)
- Medical records (Patient, Visit, Measurement)
- Survey responses (Respondent, Question, Wave)

**Time Series**
- Stock prices (Ticker, Date)
- Weather data (Station, Date, Metric)
- Sensor readings (Device, Timestamp, Sensor)

### What You'll Master

1. ✅ Creating and using MultiIndex
2. ✅ Selecting data from hierarchical structures
3. ✅ Stacking and unstacking (reshaping)
4. ✅ Creating pivot tables
5. ✅ Cross-sections and advanced slicing
6. ✅ Index manipulation (swap, sort, reorder)
7. ✅ Boolean indexing and query()
8. ✅ Real-world applications
9. ✅ Performance optimization
10. ✅ Best practices

In [1]:
import pandas as pd
import numpy as np

# Display settings
pd.set_option('display.max_rows', 20)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.precision', 2)

print("✅ Libraries imported")
print(f"Pandas version: {pd.__version__}")

✅ Libraries imported
Pandas version: 2.2.3


## 1. MultiIndex (Hierarchical Indexing)

### What is MultiIndex?

**MultiIndex** = Multiple levels of indices (like nested dictionaries)

```
Simple Index:          MultiIndex:
Index  Value           Region  City    Value
  0     100            East    NYC      100
  1     200                    Boston   150
  2     150            West    LA       200
  3     300                    Seattle  180
```

### Creating MultiIndex

**Method 1: From tuples**
```python
index = pd.MultiIndex.from_tuples([
    ('USA', 'New York'),
    ('USA', 'California'),
    ('UK', 'London')
], names=['Country', 'City'])
```

**Method 2: From product (Cartesian product)**
```python
index = pd.MultiIndex.from_product([
    ['2023', '2024'],           # Years
    ['Q1', 'Q2', 'Q3', 'Q4']    # Quarters
], names=['Year', 'Quarter'])
```

**Method 3: From arrays**
```python
index = pd.MultiIndex.from_arrays([
    ['A', 'A', 'B', 'B'],  # Level 0
    [1, 2, 1, 2]           # Level 1
], names=['Letter', 'Number'])
```

**Method 4: set_index with multiple columns**
```python
df.set_index(['col1', 'col2'])
```
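
The four construction methods can describe the same index. The sketch below uses small illustrative data (the `Letter`/`Number` labels and the `frame` variable are assumptions made for this example) and checks that each constructor produces an equal `MultiIndex`.

```python
import pandas as pd

# Illustrative labels, chosen only for this sketch
tuples = [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
names = ['Letter', 'Number']

from_tuples = pd.MultiIndex.from_tuples(tuples, names=names)
from_arrays = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2]], names=names)
from_product = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=names)

# set_index builds the same index from ordinary columns
frame = pd.DataFrame({'Letter': ['A', 'A', 'B', 'B'],
                      'Number': [1, 2, 1, 2],
                      'Value': [10, 20, 30, 40]})
from_columns = frame.set_index(names).index

same = (from_tuples.equals(from_arrays)
        and from_tuples.equals(from_product)
        and from_tuples.equals(from_columns))
print(same)
```

`from_product` is usually the shortest option when every combination of labels exists; `from_tuples` or `set_index` fit data with gaps.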

### MultiIndex Structure

```
Two-level index
Level 0: ['A', 'A', 'B', 'B']
Level 1: ['X', 'Y', 'X', 'Y']

Result:
A  X    value1
   Y    value2
B  X    value3
   Y    value4
```

### Benefits

```
✅ Represent N-dimensional data in 2D
✅ Efficient storage and access
✅ Natural hierarchical grouping
✅ Powerful selection and slicing
✅ Easy aggregation across levels
```

### Common Operations

```python
# Access properties
df.index.names            # Names of levels
df.index.levels           # Values at each level
df.index.nlevels          # Number of levels

# Selection
df.loc['A']              # All rows where level 0 is 'A'
df.loc[('A', 'X')]       # Row(s) matching a specific combination
df.xs('X', level=1)      # Cross-section: all rows where level 1 is 'X'
```
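
The snippet above assumes an existing `df`. A self-contained sketch of the same operations, using a small made-up frame (the `Letter`/`Symbol` level names and the `value` column are assumptions for illustration), might look like this:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['X', 'Y', 'X', 'Y']],
    names=['Letter', 'Symbol'])
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

# Index properties
print(list(df.index.names))                    # level names
print(df.index.nlevels)                        # number of levels
print([list(lv) for lv in df.index.levels])    # unique labels per level

# Selection
under_a = df.loc['A']                  # DataFrame indexed by Symbol only
single_row = df.loc[('A', 'X')]        # Series holding one row
all_x = df.xs('X', level='Symbol')     # DataFrame indexed by Letter only
print(under_a, single_row, all_x, sep='\n\n')
```

Note that selecting on the outer level drops that level from the result, so `under_a` is indexed by `Symbol` alone.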

In [2]:
print("=== MULTIINDEX BASICS ===\n")

# Example 1: Create from tuples
print("Example 1: MultiIndex from tuples\n")
index = pd.MultiIndex.from_tuples([
    ('USA', 'New York'),
    ('USA', 'Los Angeles'),
    ('UK', 'London'),
    ('UK', 'Manchester')
], names=['Country', 'City'])

sales = pd.Series([100, 150, 80, 60], index=index, name='Sales')
print(sales)
print(f"\nIndex names: {sales.index.names}")
print(f"Number of levels: {sales.index.nlevels}")
print()

# Example 2: Create from product
print("="*70)
print("Example 2: MultiIndex from product (Cartesian)\n")
index = pd.MultiIndex.from_product([
    ['2023', '2024'],
    ['Q1', 'Q2', 'Q3', 'Q4']
], names=['Year', 'Quarter'])

np.random.seed(42)
revenue = pd.Series(np.random.randint(100, 200, 8), index=index, name='Revenue')
print(revenue)
print()

# Example 3: Create from arrays
print("="*70)
print("Example 3: MultiIndex from arrays\n")
regions = ['East', 'East', 'West', 'West']
cities = ['NYC', 'Boston', 'LA', 'Seattle']
index = pd.MultiIndex.from_arrays([regions, cities], names=['Region', 'City'])

data = pd.DataFrame({
    'Population': [8.3, 0.7, 3.9, 0.7],
    'Area': [302, 48, 469, 84]
}, index=index)
print(data)
print()

# Example 4: Create with set_index
print("="*70)
print("Example 4: Create from DataFrame columns\n")
df = pd.DataFrame({
    'Country': ['USA', 'USA', 'UK', 'UK'],
    'State': ['NY', 'CA', 'England', 'Scotland'],
    'City': ['NYC', 'LA', 'London', 'Edinburgh'],
    'Sales': [100, 150, 80, 60]
})

print("Original DataFrame:")
print(df)

df_multi = df.set_index(['Country', 'State', 'City'])
print("\nWith MultiIndex:")
print(df_multi)
print()

# Example 5: DataFrame with MultiIndex columns
print("="*70)
print("Example 5: MultiIndex on columns\n")
columns = pd.MultiIndex.from_product([
    ['Sales', 'Profit'],
    ['Q1', 'Q2']
], names=['Metric', 'Quarter'])

df_cols = pd.DataFrame(
    np.random.randint(50, 200, (3, 4)),
    index=['Product A', 'Product B', 'Product C'],
    columns=columns
)
print(df_cols)
print()

# Example 6: Both rows and columns MultiIndex
print("="*70)
print("Example 6: MultiIndex on both rows and columns\n")
row_index = pd.MultiIndex.from_product([
    ['North', 'South'],
    ['Product A', 'Product B']
], names=['Region', 'Product'])

col_index = pd.MultiIndex.from_product([
    ['2023', '2024'],
    ['Q1', 'Q2']
], names=['Year', 'Quarter'])

df_full = pd.DataFrame(
    np.random.randint(100, 200, (4, 4)),
    index=row_index,
    columns=col_index
)
print(df_full)

=== MULTIINDEX BASICS ===

Example 1: MultiIndex from tuples

Country  City       
USA      New York       100
         Los Angeles    150
UK       London          80
         Manchester      60
Name: Sales, dtype: int64

Index names: ['Country', 'City']
Number of levels: 2

Example 2: MultiIndex from product (Cartesian)

Year  Quarter
2023  Q1         151
      Q2         192
      Q3         114
      Q4         171
2024  Q1         160
      Q2         120
      Q3         182
      Q4         186
Name: Revenue, dtype: int64

Example 3: MultiIndex from arrays

                Population  Area
Region City                     
East   NYC             8.3   302
       Boston          0.7    48
West   LA              3.9   469
       Seattle         0.7    84

Example 4: Create from DataFrame columns

Original DataFrame:
  Country     State       City  Sales
0     USA        NY        NYC    100
1     USA        CA         LA    150
2      UK   England     London     80
3      UK  Scotla

## 2. Selecting Data with MultiIndex

### Basic Selection

```python
# Level 0 selection
df.loc['A']              # All rows where level 0 = 'A'

# Specific combination
df.loc[('A', 'X')]       # level 0='A' AND level 1='X'

# Slice notation
df.loc[('A', 'X'):('B', 'Y')]  # Range
```

### Cross-Section (xs)

```python
# Select from specific level
df.xs('X', level=1)           # All rows where level 1 = 'X'
df.xs('A', level='Letter')    # By level name

# Key spanning several levels
df.xs(('A', 'X'))             # Specific combination
```
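
By default `xs` removes the level it selected on. A short sketch of the `drop_level` parameter, on an illustrative frame (names and values are assumptions for this example):

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['A', 'B'], ['X', 'Y']], names=['Letter', 'Symbol'])
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

dropped = df.xs('X', level='Symbol')                  # 'Symbol' level removed
kept = df.xs('X', level='Symbol', drop_level=False)   # both levels retained

print(list(dropped.index.names))
print(list(kept.index.names))
```

Keeping the level is handy when the result will be concatenated back with other slices of the same frame.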

### IndexSlice

```python
idx = pd.IndexSlice

# Slice on specific level
df.loc[idx['A', :], :]        # All level 1 where level 0='A'
df.loc[idx[:, 'X'], :]        # All level 0 where level 1='X'
df.loc[idx['A':'B', 'X'], :]  # Range on level 0, specific level 1
```
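
As a runnable illustration of the three `IndexSlice` forms above, the following sketch builds a small sorted frame (the labels and the `value` column are assumptions for this example) and applies each selection:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['X', 'Y']], names=['Letter', 'Symbol'])
df = pd.DataFrame({'value': range(6)}, index=index)  # from_product yields a sorted index here

idx = pd.IndexSlice
only_a = df.loc[idx['A', :], :]          # every Symbol under Letter 'A'
only_x = df.loc[idx[:, 'X'], :]          # every Letter with Symbol 'X'
a_to_b_x = df.loc[idx['A':'B', 'X'], :]  # Letters 'A' through 'B' (inclusive), Symbol 'X'

print(only_a, only_x, a_to_b_x, sep='\n\n')
```

Label slices include both endpoints, unlike positional slices.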

### Boolean Indexing

```python
# Filter by index level
df[df.index.get_level_values(0) == 'A']
df[df.index.get_level_values('City') == 'NYC']
```

### Selection Patterns

**Pattern 1: Single level**
```python
df.loc['USA']              # All cities in USA
```

**Pattern 2: Multiple levels**
```python
df.loc[('USA', 'NYC')]     # Specific city
```

**Pattern 3: Slice within level**
```python
idx = pd.IndexSlice
df.loc[idx['USA', 'LA':'NYC'], :]  # 'LA' through 'NYC' in USA (label slices follow sorted order)
```

**Pattern 4: All values at level**
```python
df.xs('NYC', level='City')  # All NYC entries
```

### Important Notes

```
⚠️ MultiIndex must be sorted for slicing!
   Use: df.sort_index()

⚠️ Tuple vs single value:
   df.loc['A']       → All level 1 where level 0='A'
   df.loc[('A',)]    → Same as above (explicit tuple)
   df.loc[('A','X')] → Specific combination

✅ Use xs() for cleaner syntax
✅ Use IndexSlice for complex selections
```
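
The sorting requirement can be seen directly. In this sketch (the data is made up for illustration), slicing an unsorted MultiIndex fails with `pandas.errors.UnsortedIndexError`, which is a `KeyError` subclass, and succeeds after `sort_index()`:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('B', 'Y'), ('A', 'X'), ('B', 'X'), ('A', 'Y')],
    names=['Letter', 'Symbol'])
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

print(df.index.is_monotonic_increasing)   # quick check before slicing

try:
    df.loc[('A', 'X'):('B', 'X')]
    slice_failed = False
except KeyError:                          # UnsortedIndexError derives from KeyError
    slice_failed = True

df_sorted = df.sort_index()
sliced = df_sorted.loc[('A', 'X'):('B', 'X')]
print(sliced)
```

Checking `is_monotonic_increasing` first avoids relying on the exception in production code.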

In [3]:
print("=== MULTIINDEX SELECTION ===\n")

# Create sample data
index = pd.MultiIndex.from_product([
    ['USA', 'UK'],
    ['2023', '2024'],
    ['Q1', 'Q2', 'Q3', 'Q4']
], names=['Country', 'Year', 'Quarter'])

np.random.seed(42)
df = pd.DataFrame({
    'Sales': np.random.randint(100, 200, 16),
    'Profit': np.random.randint(20, 50, 16)
}, index=index)

# Example 1: Select by first level
print("Example 1: Select all data for USA\n")
print(df.loc['USA'])
print()

# Example 2: Select specific combination
print("="*70)
print("Example 2: Select USA, 2023, Q1\n")
print(df.loc[('USA', '2023', 'Q1')])
print()

# Example 3: Select with partial tuple
print("="*70)
print("Example 3: All quarters for USA in 2023\n")
print(df.loc[('USA', '2023')])
print()

# Example 4: Cross-section (xs)
print("="*70)
print("Example 4: All Q1 data across all countries and years\n")
print(df.xs('Q1', level='Quarter'))
print()

# Example 5: xs with multiple levels
print("="*70)
print("Example 5: All 2024 data for USA\n")
print(df.xs(('USA', '2024'), level=['Country', 'Year']))
print()

# Example 6: IndexSlice
print("="*70)
print("Example 6: Using IndexSlice for complex selection\n")
idx = pd.IndexSlice

# All 2023 data
print("All 2023 data:")
print(df.loc[idx[:, '2023', :], :])
print()

# Example 7: Boolean indexing on index
print("="*70)
print("Example 7: Filter by index level value\n")
# Get all Q1 and Q2 data
quarters = df.index.get_level_values('Quarter').isin(['Q1', 'Q2'])
print(df[quarters])
print()

# Example 8: Slice within MultiIndex
print("="*70)
print("Example 8: Slicing (requires sorted index)\n")
df_sorted = df.sort_index()
print("USA 2023 Q2 to 2024 Q1:")
print(df_sorted.loc[('USA', '2023', 'Q2'):('USA', '2024', 'Q1')])

=== MULTIINDEX SELECTION ===

Example 1: Select all data for USA

              Sales  Profit
Year Quarter               
2023 Q1         151      21
     Q2         192      43
     Q3         114      31
     Q4         171      49
2024 Q1         160      25
     Q2         120      21
     Q3         182      47
     Q4         186      40

Example 2: Select USA, 2023, Q1

Sales     151
Profit     21
Name: (USA, 2023, Q1), dtype: int64

Example 3: All quarters for USA in 2023

         Sales  Profit
Quarter               
Q1         151      21
Q2         192      43
Q3         114      31
Q4         171      49

Example 4: All Q1 data across all countries and years

              Sales  Profit
Country Year               
USA     2023    151      21
        2024    160      25
UK      2023    174      20
        2024    123      48

Example 5: All 2024 data for USA

         Sales  Profit
Quarter               
Q1         160      25
Q2         120      21
Q3         182      47
Q4 



## 3. Stack and Unstack - Reshaping Data

### What is Stack/Unstack?

**Stack** = Pivot columns → rows (wide → long)
**Unstack** = Pivot rows → columns (long → wide)

### Visual Example

```
ORIGINAL (Wide):
        Q1   Q2   Q3
A       10   20   30
B       40   50   60

STACK (Long):
A  Q1    10
   Q2    20
   Q3    30
B  Q1    40
   Q2    50
   Q3    60

UNSTACK (Back to Wide):
        Q1   Q2   Q3
A       10   20   30
B       40   50   60
```

### Stack()

```python
# Stack innermost column level
df.stack()

# Stack specific level
df.stack(level=0)
df.stack(level='Quarter')

# Handle missing values
df.stack(dropna=False)  # Keep NaN (with future_stack=True, dropna must be left unspecified)
```

### Unstack()

```python
# Unstack innermost row level
df.unstack()

# Unstack specific level
df.unstack(level=0)
df.unstack(level='City')

# Fill missing values
df.unstack(fill_value=0)
```
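
A minimal round-trip sketch, assuming a tiny wide table invented for this example, shows that `unstack()` undoes `stack()` when no values are missing:

```python
import pandas as pd

wide = pd.DataFrame({'Q1': [10, 40], 'Q2': [20, 50]}, index=['A', 'B'])

long = wide.stack()      # Series with a two-level (row label, column label) index
back = long.unstack()    # innermost index level moves back to the columns

print(long)
print(back.equals(wide))
```

With missing combinations the round trip is not exact, because `unstack()` has to fill the gaps with NaN (or `fill_value`).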

### Common Patterns

**Pattern 1: Wide to Long**
```python
# Wide format
#      Q1  Q2  Q3
# A    10  20  30

df_long = df.stack()
# A  Q1    10
#    Q2    20
#    Q3    30
```

**Pattern 2: Long to Wide**
```python
# Long format
# A  Q1    10
#    Q2    20

df_wide = df.unstack()
#      Q1  Q2
# A    10  20
```

**Pattern 3: Swap Dimensions**
```python
# Rows: Country, City
# Cols: Metric

df.unstack(level='City')  # City becomes columns
df.stack().unstack(level='Country')  # Reorganize
```

### Stack vs Unstack vs Pivot

```python
stack()      # Columns → Index (makes long)
unstack()    # Index → Columns (makes wide)
pivot()      # Reshape by specifying index, columns, values
pivot_table() # Pivot with aggregation
```
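
To relate these methods to each other: for data with unique (index, column) pairs, `pivot()` gives the same table as `set_index()` followed by `unstack()`. The sketch below uses made-up sales figures to check this.

```python
import pandas as pd

long = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Sales': [100, 110, 150, 160],
})

via_pivot = long.pivot(index='Region', columns='Quarter', values='Sales')
via_unstack = long.set_index(['Region', 'Quarter'])['Sales'].unstack()

print(via_pivot)
print(via_pivot.equals(via_unstack))
```

`pivot()` tends to read better when starting from plain columns, while `unstack()` is convenient once the data already carries a MultiIndex.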

In [4]:
print("=== STACK AND UNSTACK ===\n")

# Example 1: Basic stack
print("Example 1: Stack - Wide to Long\n")
df_wide = pd.DataFrame({
    'Q1': [100, 150, 120],
    'Q2': [110, 160, 125],
    'Q3': [120, 170, 130]
}, index=['Product A', 'Product B', 'Product C'])

print("Original (Wide):")
print(df_wide)

df_stacked = df_wide.stack()
print("\nStacked (Long):")
print(df_stacked)
print()

# Example 2: Unstack - Long to Wide
print("="*70)
print("Example 2: Unstack - Long to Wide\n")
print("Unstacked (back to wide):")
print(df_stacked.unstack())
print()

# Example 3: MultiIndex DataFrame
print("="*70)
print("Example 3: Stack/Unstack with MultiIndex\n")
index = pd.MultiIndex.from_product([
    ['North', 'South'],
    ['Product A', 'Product B']
], names=['Region', 'Product'])

df_multi = pd.DataFrame({
    'Q1': [100, 150, 120, 140],
    'Q2': [110, 160, 125, 145]
}, index=index)

print("Original:")
print(df_multi)

# Unstack Product level
print("\nUnstack Product (Products become columns):")
print(df_multi.unstack(level='Product'))
print()

# Example 4: Unstack multiple levels
print("="*70)
print("Example 4: Unstack to flatten completely\n")
print("Unstack both levels:")
print(df_multi.unstack(level=['Region', 'Product']))
print()

# Example 5: Handle missing values
print("="*70)
print("Example 5: Unstack with missing values\n")
df_sparse = pd.DataFrame({
    'value': [10, 20, 30]
}, index=pd.MultiIndex.from_tuples([
    ('A', 'X'),
    ('A', 'Y'),
    ('B', 'X')
], names=['Letter', 'Symbol']))

print("Original (missing B-Y):")
print(df_sparse)

print("\nUnstacked (NaN for missing):")
print(df_sparse.unstack())

print("\nUnstacked with fill_value=0:")
print(df_sparse.unstack(fill_value=0))
print()

# Example 6: Practical - Sales data transformation
print("="*70)
print("Example 6: Transform sales data format\n")
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Sales': [100, 110, 150, 160]
})

print("Original (long format):")
print(sales)

# Convert to wide format
sales_multi = sales.set_index(['Region', 'Quarter'])
sales_wide = sales_multi.unstack()

print("\nWide format (quarters as columns):")
print(sales_wide)
print()

# Example 7: Swap dimensions
print("="*70)
print("Example 7: Swap row and column dimensions\n")
print("Transpose dimensions (Region and Quarter swapped):")
print(sales_wide.T)

=== STACK AND UNSTACK ===

Example 1: Stack - Wide to Long

Original (Wide):
            Q1   Q2   Q3
Product A  100  110  120
Product B  150  160  170
Product C  120  125  130

Stacked (Long):
Product A  Q1    100
           Q2    110
           Q3    120
Product B  Q1    150
           Q2    160
           Q3    170
Product C  Q1    120
           Q2    125
           Q3    130
dtype: int64

Example 2: Unstack - Long to Wide

Unstacked (back to wide):
            Q1   Q2   Q3
Product A  100  110  120
Product B  150  160  170
Product C  120  125  130

Example 3: Stack/Unstack with MultiIndex

Original:
                   Q1   Q2
Region Product            
North  Product A  100  110
       Product B  150  160
South  Product A  120  125
       Product B  140  145

Unstack Product (Products become columns):
               Q1                  Q2          
Product Product A Product B Product A Product B
Region                                         
North         100       150       110  

## 4. Pivot Tables

### What is a Pivot Table?

**Pivot Table** = Reshape and aggregate data (like Excel pivot tables)

### pivot() vs pivot_table()

**pivot()** - No aggregation (requires unique index-column pairs)
```python
df.pivot(
    index='row_col',     # What becomes rows
    columns='col_col',   # What becomes columns
    values='value_col'   # What fills the table
)
```

**pivot_table()** - With aggregation (handles duplicates)
```python
df.pivot_table(
    values='value_col',
    index='row_col',
    columns='col_col',
    aggfunc='mean',      # sum, mean, count, etc.
    fill_value=0,        # Replace NaN
    margins=True         # Add totals
)
```
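
The difference shows up as soon as an (index, column) pair repeats. In this sketch (the `dupes` frame is illustrative), `pivot()` raises `ValueError` while `pivot_table()` aggregates the duplicates:

```python
import pandas as pd

dupes = pd.DataFrame({
    'Region': ['East', 'East', 'West'],
    'Quarter': ['Q1', 'Q1', 'Q1'],   # (East, Q1) appears twice
    'Sales': [100, 50, 150],
})

try:
    dupes.pivot(index='Region', columns='Quarter', values='Sales')
    pivot_failed = False
except ValueError:
    pivot_failed = True   # duplicate (index, column) pairs cannot be reshaped

totals = dupes.pivot_table(values='Sales', index='Region',
                           columns='Quarter', aggfunc='sum')
print(pivot_failed)
print(totals)
```

When duplicates are unexpected, the `pivot()` error is useful as a data-quality check rather than something to work around.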

### Common Aggregation Functions

```python
'sum'      # Total
'mean'     # Average
'count'    # Count
'min'      # Minimum
'max'      # Maximum
'std'      # Standard deviation
'first'    # First value
'last'     # Last value
```

### Multiple Aggregations

```python
# Multiple functions
df.pivot_table(..., aggfunc=['sum', 'mean', 'count'])

# Different functions for different columns
df.pivot_table(..., aggfunc={'Sales': 'sum', 'Profit': 'mean'})
```

### Margins (Totals)

```python
df.pivot_table(..., margins=True, margins_name='Total')
```
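
A small worked example of margins, using invented numbers so the totals are easy to verify by hand:

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Sales': [100, 110, 150, 160],
})

table = sales.pivot_table(values='Sales', index='Region', columns='Quarter',
                          aggfunc='sum', margins=True, margins_name='Total')
print(table)
# East row total:  100 + 110 = 210
# Q1 column total: 100 + 150 = 250
# Grand total:     100 + 110 + 150 + 160 = 520
```

The margin cells apply the same `aggfunc` to the underlying rows, so with `aggfunc='mean'` the "Total" entries are overall means, not sums.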

### Common Patterns

**Pattern 1: Sales by Region and Quarter**
```python
df.pivot_table(
    values='sales',
    index='region',
    columns='quarter',
    aggfunc='sum'
)
```

**Pattern 2: Multiple Metrics**
```python
df.pivot_table(
    values=['sales', 'profit'],
    index='product',
    columns='quarter',
    aggfunc='sum'
)
```

**Pattern 3: Multi-level Index**
```python
df.pivot_table(
    values='sales',
    index=['region', 'city'],  # Two-level index
    columns='quarter',
    aggfunc='sum'
)
```

**Pattern 4: With Totals**
```python
df.pivot_table(
    values='sales',
    index='region',
    columns='quarter',
    aggfunc='sum',
    margins=True,
    margins_name='Total'
)
```

In [5]:
print("=== PIVOT TABLES ===\n")

# Create sample sales data
np.random.seed(42)
sales_data = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=12, freq='ME'),  # 'ME' = month end; 'M' is deprecated in pandas 2.2+
    'Region': ['East', 'West', 'North'] * 4,
    'Product': ['A', 'B'] * 6,
    'Sales': np.random.randint(100, 300, 12),
    'Profit': np.random.randint(20, 80, 12)
})
sales_data['Quarter'] = sales_data['Date'].dt.quarter.map(lambda x: f'Q{x}')

# Example 1: Basic pivot
print("Example 1: Simple pivot (no aggregation)\n")
simple_data = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West'],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q2'],
    'Sales': [100, 150, 110, 160]
})

print("Original:")
print(simple_data)

pivot = simple_data.pivot(index='Region', columns='Quarter', values='Sales')
print("\nPivoted:")
print(pivot)
print()

# Example 2: pivot_table with aggregation
print("="*70)
print("Example 2: Pivot table with aggregation\n")
print("Original data (first 6 rows):")
print(sales_data.head(6))

pivot_agg = sales_data.pivot_table(
    values='Sales',
    index='Region',
    columns='Quarter',
    aggfunc='sum'
)
print("\nPivot: Sales by Region and Quarter:")
print(pivot_agg)
print()

# Example 3: Multiple values
print("="*70)
print("Example 3: Multiple metrics\n")
pivot_multi = sales_data.pivot_table(
    values=['Sales', 'Profit'],
    index='Region',
    columns='Quarter',
    aggfunc='sum'
)
print(pivot_multi)
print()

# Example 4: Multi-level index
print("="*70)
print("Example 4: Multi-level index (Region and Product)\n")
pivot_multi_idx = sales_data.pivot_table(
    values='Sales',
    index=['Region', 'Product'],
    columns='Quarter',
    aggfunc='sum',
    fill_value=0
)
print(pivot_multi_idx)
print()

# Example 5: Multiple aggregation functions
print("="*70)
print("Example 5: Multiple aggregation functions\n")
pivot_multi_agg = sales_data.pivot_table(
    values='Sales',
    index='Region',
    columns='Quarter',
    aggfunc=['sum', 'mean', 'count']
)
print(pivot_multi_agg)
print()

# Example 6: With margins (totals)
print("="*70)
print("Example 6: Add row and column totals\n")
pivot_totals = sales_data.pivot_table(
    values='Sales',
    index='Region',
    columns='Quarter',
    aggfunc='sum',
    margins=True,
    margins_name='Total'
)
print(pivot_totals)
print()

# Example 7: Different aggfunc for different columns
print("="*70)
print("Example 7: Different aggregations per metric\n")
pivot_diff_agg = sales_data.pivot_table(
    values=['Sales', 'Profit'],
    index='Region',
    columns='Product',
    aggfunc={'Sales': 'sum', 'Profit': 'mean'}
)
print(pivot_diff_agg.round(0))
print()

# Example 8: Real-world - Sales and profit summary by region
print("="*70)
print("Example 8: Sales and profit summary by region\n")
monthly_report = sales_data.pivot_table(
    values=['Sales', 'Profit'],
    index='Region',
    aggfunc={
        'Sales': ['sum', 'mean'],
        'Profit': ['sum', 'mean']
    }
).round(1)
print(monthly_report)
print()

# Example 9: Count occurrences
print("="*70)
print("Example 9: Count of transactions\n")
count_pivot = sales_data.pivot_table(
    values='Sales',
    index='Product',
    columns='Region',
    aggfunc='count',
    fill_value=0
)
print(count_pivot)

=== PIVOT TABLES ===

Example 1: Simple pivot (no aggregation)

Original:
  Region Quarter  Sales
0   East      Q1    100
1   West      Q1    150
2   East      Q2    110
3   West      Q2    160

Pivoted:
Quarter   Q1   Q2
Region           
East     100  110
West     150  160

Example 2: Pivot table with aggregation

Original data (first 6 rows):
        Date Region Product  Sales  Profit Quarter
0 2024-01-31   East       A    202      72      Q1
1 2024-02-29   West       B    279      55      Q1
2 2024-03-31  North       A    192      59      Q1
3 2024-04-30   East       B    114      43      Q2
4 2024-05-31   West       A    206      22      Q2
5 2024-06-30  North       B    171      41      Q2

Pivot: Sales by Region and Quarter:
Quarter   Q1   Q2   Q3   Q4
Region                     
East     202  114  288  221
North    192  171  202  187
West     279  206  120  174

Example 3: Multiple metrics

        Profit             Sales               
Quarter     Q1  Q2  Q3  Q4    Q1   Q2   



## 5. Index Operations

### Swap Levels

```python
# Swap two levels
df.swaplevel(0, 1)        # By position
df.swaplevel('A', 'B')    # By name
```

### Reorder Levels

```python
# Reorder to new sequence
df.reorder_levels(['Level2', 'Level1', 'Level0'])
```
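
Swapping or reordering levels changes only the level order, not the row order. The sketch below (with illustrative data) shows that the result is usually no longer sorted, which is why `sort_index()` typically follows:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['UK', 'USA'], ['2023', '2024']], names=['Country', 'Year'])
df = pd.DataFrame({'Sales': [1, 2, 3, 4]}, index=index)

swapped = df.swaplevel('Country', 'Year')
print(list(swapped.index.names))                 # level order changed
print(swapped.index.is_monotonic_increasing)     # rows were not re-sorted

resorted = swapped.sort_index()                  # now grouped by Year first
print(resorted)
```

Sorting after the swap also restores the ability to slice by label on the new outer level.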

### Sort Index

```python
# Sort all levels
df.sort_index()

# Sort specific level
df.sort_index(level=0)
df.sort_index(level='City')

# Sort descending
df.sort_index(ascending=False)
```

### Reset Index

```python
# Convert index to columns
df.reset_index()

# Reset specific level
df.reset_index(level=1)

# Discard the index instead of converting it to columns
df.reset_index(drop=True)
```

### Set Index

```python
# Single column
df.set_index('col')

# Multiple columns (MultiIndex)
df.set_index(['col1', 'col2'])

# Append to existing index
df.set_index('col', append=True)
```

### Index Names

```python
# Get names
df.index.names

# Set names
df.index.names = ['Level1', 'Level2']

# Rename levels
df.rename_axis(['New1', 'New2'])
```

### Get Level Values

```python
# Get values at specific level
df.index.get_level_values(0)
df.index.get_level_values('City')

# Unique values at level
df.index.get_level_values(0).unique()
```

### Common Patterns

**Pattern 1: Reorganize hierarchy**
```python
# Change order of importance
df.swaplevel(0, 1).sort_index()
```

**Pattern 2: Flatten MultiIndex**
```python
# Convert to regular DataFrame
df.reset_index()
```

**Pattern 3: Add level to existing index**
```python
# Add year as an extra innermost level (append=True keeps the existing index)
df.set_index('year', append=True)
```
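
With `append=True`, the new level is placed after the existing ones. If it should be the outermost level instead, `reorder_levels` can move it. A sketch with hypothetical `city`/`year`/`sales` columns:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NYC', 'LA'],
    'year': [2023, 2024],
    'sales': [100, 150],
}).set_index('city')

appended = df.set_index('year', append=True)              # index: (city, year)
year_first = appended.reorder_levels(['year', 'city'])    # index: (year, city)

print(list(appended.index.names))
print(list(year_first.index.names))
```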

In [6]:
print("=== INDEX OPERATIONS ===\n")

# Create sample MultiIndex DataFrame
index = pd.MultiIndex.from_product([
    ['USA', 'UK'],
    ['NYC', 'LA', 'London'],
    ['2023', '2024']
], names=['Country', 'City', 'Year'])

np.random.seed(42)
df = pd.DataFrame({
    'Sales': np.random.randint(100, 200, 12)
}, index=index)

# Example 1: Swap levels
print("Example 1: Swap index levels\n")
print("Original order: Country > City > Year")
print(df.head(4))

df_swapped = df.swaplevel('Country', 'City')
print("\nSwapped: City > Country > Year")
print(df_swapped.head(4))
print()

# Example 2: Reorder levels
print("="*70)
print("Example 2: Reorder all levels\n")
df_reordered = df.reorder_levels(['Year', 'Country', 'City'])
print("New order: Year > Country > City")
print(df_reordered.head(4))
print()

# Example 3: Sort index
print("="*70)
print("Example 3: Sort by index\n")
df_unsorted = df.sample(frac=1)  # Shuffle
print("Unsorted (shuffled):")
print(df_unsorted.head(4))

df_sorted = df_unsorted.sort_index()
print("\nSorted:")
print(df_sorted.head(4))
print()

# Example 4: Sort specific level
print("="*70)
print("Example 4: Sort by specific level\n")
df_sort_year = df.sort_index(level='Year', ascending=False)
print("Sorted by Year (descending):")
print(df_sort_year.head(4))
print()

# Example 5: Reset index
print("="*70)
print("Example 5: Reset index (flatten to columns)\n")
df_reset = df.reset_index()
print(df_reset.head())
print(f"\nShape changed from {df.shape} to {df_reset.shape}")
print()

# Example 6: Reset specific level
print("="*70)
print("Example 6: Reset only one level\n")
df_reset_city = df.reset_index(level='City')
print(df_reset_city.head())
print("\nCity is now a column, Country and Year still in index")
print()

# Example 7: Set index from columns
print("="*70)
print("Example 7: Create MultiIndex from columns\n")
df_flat = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B'],
    'Sales': [100, 150, 120, 160]
})

print("Original:")
print(df_flat)

df_indexed = df_flat.set_index(['Region', 'Product'])
print("\nWith MultiIndex:")
print(df_indexed)
print()

# Example 8: Rename index levels
print("="*70)
print("Example 8: Rename index levels\n")
df_renamed = df.rename_axis(['Nation', 'Location', 'Period'])
print(df_renamed.head(3))
print()

# Example 9: Get level values
print("="*70)
print("Example 9: Get values at specific level\n")
print(f"Unique countries: {df.index.get_level_values('Country').unique().tolist()}")
print(f"Unique cities: {df.index.get_level_values('City').unique().tolist()}")
print(f"Unique years: {df.index.get_level_values('Year').unique().tolist()}")
print()

# Example 10: Combine operations
print("="*70)
print("Example 10: Chain operations\n")
result = (df
    .swaplevel('Country', 'Year')     # Swap
    .sort_index(level=0)              # Sort by new level 0
    .reset_index(level='City')        # Move City to column
)
print(result.head())

=== INDEX OPERATIONS ===

Example 1: Swap index levels

Original order: Country > City > Year
                   Sales
Country City Year       
USA     NYC  2023    151
             2024    192
        LA   2023    114
             2024    171

Swapped: City > Country > Year
                   Sales
City Country Year       
NYC  USA     2023    151
             2024    192
LA   USA     2023    114
             2024    171

Example 2: Reorder all levels

New order: Year > Country > City
                   Sales
Year Country City       
2023 USA     NYC     151
2024 USA     NYC     192
2023 USA     LA      114
2024 USA     LA      171

Example 3: Sort by index

Unsorted (shuffled):
                     Sales
Country City   Year       
UK      London 2023    187
        LA     2024    174
USA     NYC    2023    151
UK      LA     2023    174

Sorted:
                     Sales
Country City   Year       
UK      LA     2023    174
               2024    174
        London 2023    187
     

## 6. Advanced Selection Techniques

### Query Method

```python
# SQL-like filtering
df.query('Sales > 100')
df.query('Sales > 100 and Region == "East"')
df.query('Product in ["A", "B"]')

# With variables
threshold = 100
df.query('Sales > @threshold')

# Index querying
df.query('index > 5')  # For numeric index
```
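
For comparison, here is a sketch showing that a `query()` expression selects the same rows as the equivalent boolean mask, and that a named index can be referenced in the expression like a column. The frame and the `threshold` variable are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West'],
    'Sales': [90, 120, 150, 80],
})

threshold = 100
by_query = df.query('Sales > @threshold and Region == "East"')
by_mask = df[(df['Sales'] > threshold) & (df['Region'] == 'East')]
print(by_query.equals(by_mask))

# A named index (or MultiIndex level) can be used by name inside query()
indexed = df.set_index('Region')
east = indexed.query('Region == "East"')
print(east)
```

If an index name clashes with a column name, the column takes precedence inside the expression.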

### Boolean Indexing

```python
# Single condition
df[df['Sales'] > 100]

# Multiple conditions
df[(df['Sales'] > 100) & (df['Region'] == 'East')]
df[(df['Sales'] > 150) | (df['Profit'] > 50)]

# NOT condition
df[~(df['Region'] == 'East')]

# Using isin()
df[df['Region'].isin(['East', 'West'])]
```

### Index-Based Boolean

```python
# Filter by index values
df[df.index.get_level_values(0) == 'USA']
df[df.index.get_level_values('City').str.startswith('N')]
```

### String Methods

```python
# String operations
df[df['City'].str.contains('New')]
df[df['City'].str.startswith('N')]
df[df['City'].str.endswith('York')]
df[df['City'].str.len() > 5]
```

### Between and isin

```python
# Range
df[df['Sales'].between(100, 200)]

# Membership
df[df['Region'].isin(['East', 'West'])]
df[~df['Region'].isin(['North'])]  # NOT in
```

### Null Checks

```python
# Missing values
df[df['Sales'].isna()]
df[df['Sales'].notna()]

# Null in index
df[df.index.get_level_values(0).notna()]
```

### Where and Mask

```python
# Where: Keep values where condition is True
df.where(df['Sales'] > 100, other=0)

# Mask: Replace values where condition is True
df.mask(df['Sales'] < 100, other=0)
```
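
`where` and `mask` are mirror images of each other. A minimal sketch on a small Series (values chosen for illustration):

```python
import pandas as pd

s = pd.Series([50, 150, 250])

kept = s.where(s > 100, other=0)     # values failing the condition become 0
masked = s.mask(s > 100, other=0)    # values meeting the condition become 0

print(kept.tolist())
print(masked.tolist())
```

Unlike boolean indexing, both methods preserve the original shape, which keeps the result aligned with the source data.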

### Query Performance

```
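query() can be faster than boolean indexing on large DataFrames,
but it is usually slower on small ones (the expression is parsed first).

df.query('A > 5')           # May be faster on large frames
df[df['A'] > 5]             # Traditional

Gains are most likely with:
- Large DataFrames (roughly 100,000+ rows)
- Complex conditions
- Multiple column references
- The optional numexpr package installed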
```
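
Performance claims like this are best checked on your own data. The following is a rough timing sketch, not a rigorous benchmark: the frame size, the columns `A` and `B`, and the repeat count are arbitrary assumptions, and the outcome depends on hardware and on whether `numexpr` is installed.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
big = pd.DataFrame({'A': rng.integers(0, 10, 1_000_000),
                    'B': rng.integers(0, 10, 1_000_000)})

# Time the same filter expressed both ways
t_query = timeit.timeit(lambda: big.query('A > 5 and B < 3'), number=5)
t_mask = timeit.timeit(lambda: big[(big['A'] > 5) & (big['B'] < 3)], number=5)

print(f"query(): {t_query:.3f}s   boolean mask: {t_mask:.3f}s")
```

Whichever form is faster, both return the same rows, so the choice can also be made on readability.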

In [7]:
print("=== ADVANCED SELECTION ===\n")

# Create sample data
np.random.seed(42)
df = pd.DataFrame({
    'Region': ['East', 'West', 'North', 'South'] * 3,
    'Product': ['A', 'B', 'C'] * 4,
    'Sales': np.random.randint(80, 200, 12),
    'Profit': np.random.randint(10, 60, 12),
    'City': ['New York', 'Los Angeles', 'Chicago', 'Miami',
             'Boston', 'San Francisco', 'Seattle', 'Atlanta',
             'NYC', 'LA', 'Phoenix', 'Dallas']
})

# Example 1: query() method
print("Example 1: Query method (SQL-like)\n")
print("All data:")
print(df.head())

print("\nQuery: Sales > 150")
print(df.query('Sales > 150'))
print()

# Example 2: Multiple conditions in query
print("="*70)
print("Example 2: Complex query\n")
print("Query: Sales > 120 and Region == 'East'")
print(df.query('Sales > 120 and Region == "East"'))
print()

# Example 3: Query with variables
print("="*70)
print("Example 3: Query with external variables\n")
min_sales = 130
target_region = 'West'
print(f"Query: Sales > {min_sales} and Region == '{target_region}'")
print(df.query('Sales > @min_sales and Region == @target_region'))
print()

# Example 4: Query with isin
print("="*70)
print("Example 4: Query with membership\n")
print("Query: Product in ['A', 'B']")
print(df.query('Product in ["A", "B"]'))
print()

# Example 5: Boolean indexing (traditional)
print("="*70)
print("Example 5: Boolean indexing\n")
print("Condition: (Sales > 150) & (Profit > 30)")
result = df[(df['Sales'] > 150) & (df['Profit'] > 30)]
print(result)
print()

# Example 6: String methods
print("="*70)
print("Example 6: String filtering\n")
print("Cities containing 'New':")
print(df[df['City'].str.contains('New')])
print()

# Example 7: between()
print("="*70)
print("Example 7: Range filtering\n")
print("Sales between 100 and 150:")
print(df[df['Sales'].between(100, 150)])
print()

# Example 8: isin() for multiple values
print("="*70)
print("Example 8: Filter multiple values\n")
print("Regions: East or West")
print(df[df['Region'].isin(['East', 'West'])])
print()

# Example 9: Negation
print("="*70)
print("Example 9: NOT condition\n")
print("Regions NOT East:")
print(df[~(df['Region'] == 'East')])
print()

# Example 10: where() and mask()
print("="*70)
print("Example 10: where() - conditional replacement\n")
print("Rows with Sales < 150 become NaN:")
print(df[['Region', 'Sales']].where(df['Sales'] >= 150))
print()

# Example 11: MultiIndex filtering
print("="*70)
print("Example 11: Filter MultiIndex by level\n")
df_multi = df.set_index(['Region', 'Product'])
print("Filter: Region level == 'East'")
east_mask = df_multi.index.get_level_values('Region') == 'East'
print(df_multi[east_mask])
print()

# Example 12: Complex combinations
print("="*70)
print("Example 12: Complex filtering\n")
print("Condition: (Sales > 120 OR Profit > 40) AND City starts with 'N'")
complex_filter = ((df['Sales'] > 120) | (df['Profit'] > 40)) & df['City'].str.startswith('N')
print(df[complex_filter])

=== ADVANCED SELECTION ===

Example 1: Query method (SQL-like)

All data:
  Region Product  Sales  Profit         City
0   East       A    182      20     New York
1   West       B    131      33  Los Angeles
2  North       C    172      45      Chicago
3  South       A     94      49        Miami
4   East       B    186      33       Boston

Query: Sales > 150
   Region Product  Sales  Profit           City
0    East       A    182      20       New York
2   North       C    172      45        Chicago
4    East       B    186      33         Boston
5    West       C    151      12  San Francisco
8    East       C    182      33            NYC
9    West       A    162      53             LA
10  North       B    166      39        Phoenix
11  South       C    154      47         Dallas

Example 2: Complex query

Query: Sales > 120 and Region == 'East'
  Region Product  Sales  Profit      City
0   East       A    182      20  New York
4   East       B    186      33    Boston
8   East   

## 7. Real-World Applications

### Application 1: Multi-Region Sales Analysis

```python
# Data: (Region, City, Product) → Sales
df = df.set_index(['Region', 'City', 'Product'])  # set_index returns a new frame

# Analysis
df.groupby(level=['Region', 'Product']).sum()
df.xs('NYC', level='City')
df.unstack(level='Product')  # Products as columns
```

### Application 2: Time Series with Multiple Metrics

```python
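# Data: (Date, Metric) → Value
df = df.set_index(['Date', 'Metric'])  # set_index returns a new frame

# Analysis
df.unstack(level='Metric')  # Metrics as columns
df.xs('Sales', level='Metric')  # Just sales
# Monthly aggregate per metric ('ME' = month end; pandas < 2.2 uses 'M')
df.groupby([pd.Grouper(level='Date', freq='ME'), 'Metric']).sum()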
```
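
When monthly totals are wanted per metric, one option is to unstack the metric level first and then resample the resulting plain `DatetimeIndex`. The sketch below uses hypothetical daily data (`ts`) and the `'MS'` (month start) frequency alias; both are choices made for this example.

```python
import pandas as pd

# Hypothetical daily observations for two metrics over 60 days
dates = pd.date_range('2024-01-01', periods=60, freq='D')
ts = pd.DataFrame({
    'Date': dates.repeat(2),
    'Metric': ['Sales', 'Orders'] * 60,
    'Value': [100, 5] * 60,
}).set_index(['Date', 'Metric'])

# Wide format: one column per metric, indexed by date only
wide = ts['Value'].unstack(level='Metric')

# Resample the DatetimeIndex to monthly totals ('MS' = month start)
monthly = wide.resample('MS').sum()
print(monthly)
```

Unstacking first keeps the metrics separate; resampling on the `Date` level of the long frame would add all metrics together.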

### Application 3: Survey Data

```python
# Data: (Respondent, Question) → Answer
df = df.set_index(['Respondent', 'Question'])  # set_index returns a new frame

# Analysis
df.unstack(level='Question')  # Wide format
df.groupby(level='Question')['Answer'].value_counts()  # Per question
```

### Application 4: Financial Data

```python
# Data: (Company, Metric, Period) → Value
df = df.set_index(['Company', 'Metric', 'Period'])  # set_index returns a new frame

# Analysis
df.xs('Revenue', level='Metric')
df.unstack(level='Period')  # Periods as columns
# Flatten the index so pivot_table() can refer to the fields as columns
df.reset_index().pivot_table(index='Company', columns='Period',
                             values='Value', aggfunc='sum')
```

### Application 5: Experimental Data

```python
# Data: (Subject, Trial, Condition) → Measurement
df = df.set_index(['Subject', 'Trial', 'Condition'])  # set_index returns a new frame

# Analysis
df.groupby(level=['Subject', 'Condition']).mean()
df.xs('Treatment', level='Condition')
df.unstack(level='Condition')  # Compare conditions
```
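
The experimental-data pattern can be sketched end to end as follows. The frame `exp` and its values are hypothetical; the point is the sequence of averaging over trials, putting conditions side by side, and computing a per-subject difference.

```python
import pandas as pd

# Hypothetical measurements: 2 subjects x 2 trials x 2 conditions
exp = pd.DataFrame({
    'Subject': ['S1'] * 4 + ['S2'] * 4,
    'Trial': [1, 1, 2, 2] * 2,
    'Condition': ['Control', 'Treatment'] * 4,
    'Measurement': [10.0, 12.0, 11.0, 13.0, 9.0, 12.0, 10.0, 14.0],
})
exp_multi = exp.set_index(['Subject', 'Trial', 'Condition']).sort_index()

# Mean per subject and condition (averages over trials)
means = exp_multi.groupby(level=['Subject', 'Condition']).mean()

# Conditions as columns, then the treatment effect per subject
side_by_side = means['Measurement'].unstack(level='Condition')
side_by_side['Effect'] = side_by_side['Treatment'] - side_by_side['Control']
print(side_by_side)
```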

In [8]:
print("=== REAL-WORLD APPLICATIONS ===\n")

# Application 1: Multi-region sales
print("Application 1: Regional Sales Analysis\n")
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'] * 2,
    'City': ['NYC', 'Boston', 'LA', 'Seattle'] * 2,
    'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Sales': [100, 80, 120, 90, 110, 85, 130, 95]
})

sales_multi = sales.set_index(['Region', 'City', 'Product'])
print("Data structure:")
print(sales_multi)

print("\nTotal by Region and Product:")
print(sales_multi.groupby(level=['Region', 'Product']).sum())

print("\nProducts as columns:")
print(sales_multi.unstack(level='Product'))
print()

# Application 2: Time series with metrics
print("="*70)
print("Application 2: Multi-Metric Time Series\n")
dates = pd.date_range('2024-01', periods=3, freq='ME')  # 'ME' = month end; pandas < 2.2 uses 'M'
metrics = ['Sales', 'Profit', 'Orders']
ts_data = pd.DataFrame({
    'Date': np.repeat(dates, 3),
    'Metric': metrics * 3,
    'Value': [1000, 200, 50, 1100, 220, 55, 1200, 240, 60]
})

ts_multi = ts_data.set_index(['Date', 'Metric'])
print("Long format:")
print(ts_multi)

print("\nWide format (metrics as columns):")
print(ts_multi.unstack(level='Metric'))
print()

# Application 3: Survey responses
print("="*70)
print("Application 3: Survey Data Analysis\n")
survey = pd.DataFrame({
    'Respondent': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Question': ['Q1', 'Q2', 'Q3'] * 3,
    'Answer': [5, 4, 5, 3, 4, 3, 5, 5, 4]
})

survey_multi = survey.set_index(['Respondent', 'Question'])
print("Long format:")
print(survey_multi)

print("\nWide format (questions as columns):")
print(survey_multi.unstack(level='Question'))

print("\nAverage score per question:")
print(survey_multi.groupby(level='Question').mean())
print()

# Application 4: Company financials
print("="*70)
print("Application 4: Financial Data (Multiple Companies)\n")
financials = pd.DataFrame({
    'Company': ['AAPL', 'AAPL', 'GOOGL', 'GOOGL'] * 2,
    'Metric': ['Revenue', 'Profit'] * 4,
    'Period': ['Q1', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q2'],
    'Value': [90, 20, 75, 18, 95, 22, 80, 19]
})

# Create pivot table
pivot = financials.pivot_table(
    values='Value',
    index=['Company', 'Metric'],
    columns='Period'
)
print(pivot)

print("\nRevenue only:")
print(pivot.xs('Revenue', level='Metric'))
print()

# Application 5: A/B testing results
print("="*70)
print("Application 5: A/B Test Analysis\n")
ab_test = pd.DataFrame({
    'Group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Metric': ['CTR', 'CTR', 'Revenue', 'Revenue', 'Users', 'Users'],
    'Value': [0.05, 0.06, 1000, 1100, 500, 510]
})

ab_pivot = ab_test.pivot(index='Metric', columns='Group', values='Value')
print("A vs B comparison:")
print(ab_pivot)

ab_pivot['Lift'] = ((ab_pivot['B'] - ab_pivot['A']) / ab_pivot['A'] * 100).round(2)
print("\nWith lift % (B vs A):")
print(ab_pivot)

=== REAL-WORLD APPLICATIONS ===

Application 1: Regional Sales Analysis

Data structure:
                        Sales
Region City    Product       
East   NYC     A          100
       Boston  A           80
West   LA      A          120
       Seattle A           90
East   NYC     B          110
       Boston  B           85
West   LA      B          130
       Seattle B           95

Total by Region and Product:
                Sales
Region Product       
East   A          180
       B          195
West   A          210
       B          225

Products as columns:
               Sales     
Product            A    B
Region City              
East   Boston     80   85
       NYC       100  110
West   LA        120  130
       Seattle    90   95

Application 2: Multi-Metric Time Series

Long format:
                   Value
Date       Metric       
2024-01-31 Sales    1000
           Profit    200
           Orders     50
2024-02-29 Sales    1100
           Profit    220
           Orde


## 8. Best Practices & Performance

### When to Use MultiIndex ✅

```
✅ Good Use Cases:
- Hierarchical data (Region > City > Store)
- Time series with multiple dimensions
- Panel data (repeated measurements)
- Grouped aggregations
- Pivot tables and cross-tabulations

❌ Avoid When:
- The data is simple and flat
- You call reset_index() frequently
- You mostly select by position (iloc)
- Most operations work on a single dimension
```

### Performance Tips 🚀

**1. Sort Index for Slicing**
```python
# ✅ Sort once, slice many times (fast)
df = df.sort_index()
df.loc[('A', 'X'):('B', 'Y')]  # Fast

# ❌ Unsorted index (slow/error)
df.loc[('A', 'X'):('B', 'Y')]  # UnsortedIndexError
```
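
A small reproduction of the sorted versus unsorted behaviour, using a hypothetical four-row frame (`unsorted`) whose index is deliberately out of order. `UnsortedIndexError` is a subclass of `KeyError`, so the sketch catches `KeyError`.

```python
import pandas as pd

# Hypothetical index, deliberately not in lexicographic order
idx = pd.MultiIndex.from_tuples(
    [('B', 'Y'), ('A', 'X'), ('B', 'X'), ('A', 'Y')],
    names=['Level1', 'Level2'],
)
unsorted = pd.DataFrame({'Value': [4, 1, 3, 2]}, index=idx)

# Check sortedness before attempting a tuple slice
was_sorted = unsorted.index.is_monotonic_increasing

try:
    unsorted.loc[('A', 'Y'):('B', 'X')]
except KeyError as exc:  # UnsortedIndexError subclasses KeyError
    print(f"Slice on unsorted index failed: {exc}")

# Sort once, then slicing by tuples works
sorted_df = unsorted.sort_index()
subset = sorted_df.loc[('A', 'Y'):('B', 'X')]
print(subset)
```

Checking `index.is_monotonic_increasing` is a cheap way to decide whether a `sort_index()` call is needed.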

**2. Use xs() for Cross-Sections**
```python
# ✅ Cleaner and often faster (drops the 'City' level by default)
df.xs('NYC', level='City')

# ❌ More verbose (keeps the 'City' level)
df[df.index.get_level_values('City') == 'NYC']
```
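
The two forms are not exact substitutes: `xs()` removes the selected level from the result unless `drop_level=False` is passed, while the boolean mask keeps it. A minimal sketch on a hypothetical `sales` frame:

```python
import pandas as pd

# Hypothetical data for illustration
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West'],
    'City': ['NYC', 'Boston', 'LA'],
    'Sales': [100, 80, 120],
}).set_index(['Region', 'City'])

# xs() drops the 'City' level by default
via_xs = sales.xs('NYC', level='City')

# drop_level=False keeps the full MultiIndex
via_xs_keep = sales.xs('NYC', level='City', drop_level=False)

# The boolean mask always keeps every level
via_mask = sales[sales.index.get_level_values('City') == 'NYC']

print(list(via_xs.index.names))
print(list(via_xs_keep.index.names))
print(list(via_mask.index.names))
```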

**3. Query for Complex Filters**
```python
# ✅ Readable, and can be faster on large DataFrames
df.query('Sales > 100 and Region == "East"')

# ⚠️ Equivalent boolean mask: more verbose, but often quicker on small data
df[(df['Sales'] > 100) & (df['Region'] == 'East')]
```

**4. Unstack Sparingly**
```python
# ⚠️ Can create many NaN values
# ⚠️ Memory intensive
df.unstack()  # Creates wide format
```
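
The NaN values appear wherever a combination of index labels is missing from the long data. The sketch below uses a hypothetical `sparse` Series with one missing (City, Product) pair and shows `fill_value` as one way to handle it.

```python
import pandas as pd

# Hypothetical data: there is no ('LA', 'B') row
sparse = pd.DataFrame({
    'City': ['NYC', 'NYC', 'LA'],
    'Product': ['A', 'B', 'A'],
    'Sales': [100, 110, 120],
}).set_index(['City', 'Product'])['Sales']

# The missing combination becomes NaN in wide format
wide_nan = sparse.unstack(level='Product')

# fill_value substitutes a default for missing combinations
wide_zero = sparse.unstack(level='Product', fill_value=0)

print(wide_nan)
print(wide_zero)
```

Whether 0 is an appropriate fill depends on the data; for sales counts it may be, for measurements it usually is not.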

### Common Pitfalls ❌

**1. Unsorted Index**
```python
# ❌ Error
df.loc['A':'B']  # UnsortedIndexError

# ✅ Sort first
df = df.sort_index()
df.loc['A':'B']  # Works
```

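**2. Ambiguous Tuple Format**
```python
# ⚠️ Ambiguous: these two lines are the same Python expression,
# so pandas must guess whether 'X' is a second-level label or a column
df.loc['A', 'X']
df.loc[('A', 'X')]

# ✅ Unambiguous: row tuple plus an explicit column selector
df.loc[('A', 'X'), :]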
```
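
One way to sidestep any ambiguity between row levels and columns is to always spell out both axes. The sketch below uses a hypothetical `df_demo` frame with a two-level index and two columns.

```python
import pandas as pd

# Hypothetical frame with a two-level row index
idx = pd.MultiIndex.from_product(
    [['A', 'B'], ['X', 'Y']], names=['Level1', 'Level2']
)
df_demo = pd.DataFrame(
    {'Sales': [1, 2, 3, 4], 'Profit': [5, 6, 7, 8]}, index=idx
)

# Row selected by a full index tuple
row = df_demo.loc[('A', 'X')]

# Same row with both axes stated explicitly
row_explicit = df_demo.loc[('A', 'X'), :]

# Single cell: row tuple plus column label
cell = df_demo.loc[('A', 'X'), 'Sales']

print(row_explicit)
print(cell)
```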

**3. Forgetting Level Names**
```python
# ❌ Levels are unnamed
index = pd.MultiIndex.from_tuples([('A', 'X')])

# ✅ Name the levels
index = pd.MultiIndex.from_tuples(
    [('A', 'X')],
    names=['Level1', 'Level2']
)
```

**4. Stack/Unstack Confusion**
```python
df.stack()    # Columns → Rows (makes taller)
df.unstack()  # Rows → Columns (makes wider)

# Remember: stack builds UP (rows), unstack spreads OUT (columns)
```
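
A round trip makes the direction of each operation easier to remember. The sketch below starts from a hypothetical wide frame (`wide`), stacks it into a long Series, and unstacks it back.

```python
import pandas as pd

# Hypothetical wide table: one row per region, one column per quarter
wide = pd.DataFrame(
    {'Q1': [100, 120], 'Q2': [110, 130]},
    index=pd.Index(['East', 'West'], name='Region'),
)
wide.columns.name = 'Quarter'

# stack(): the Quarter columns move into the row index (taller)
long = wide.stack()

# unstack(): the innermost row level moves back to the columns (wider)
back = long.unstack()

print(long.shape, back.shape)
```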

### Index Hygiene

```python
# ✅ Always name your levels
df.index.names = ['Region', 'City']

# ✅ Sort after creation
df = df.sort_index()

# ✅ Remove unused levels after filtering
df = df[df.index.get_level_values(0) == 'USA']
df.index = df.index.remove_unused_levels()

# ✅ Check for duplicates
df.index.is_unique  # Should be True usually
```
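
The reason for `remove_unused_levels()` is that filtering rows does not shrink the stored level values. A minimal sketch with a hypothetical `stores` frame:

```python
import pandas as pd

# Hypothetical data with two countries
idx = pd.MultiIndex.from_tuples(
    [('USA', 'NYC'), ('USA', 'LA'), ('Canada', 'Toronto')],
    names=['Country', 'City'],
)
stores = pd.DataFrame({'Sales': [100, 120, 90]}, index=idx)

# After filtering, the level values still include the dropped country
usa = stores[stores.index.get_level_values('Country') == 'USA']
levels_before = list(usa.index.levels[0])

# remove_unused_levels() rebuilds the levels from the rows that remain
usa.index = usa.index.remove_unused_levels()
levels_after = list(usa.index.levels[0])

print(levels_before)
print(levels_after)
```

Stale levels are mostly harmless, but they can surprise code that iterates over `index.levels` expecting only labels that are present.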

### Workflow Pattern

```python
# 1. Load data
df = pd.read_csv('data.csv')

# 2. Create MultiIndex
df = df.set_index(['Region', 'City', 'Date'])

# 3. Name levels (if not named)
df.index.names = ['Region', 'City', 'Date']

# 4. Sort for efficient slicing
df = df.sort_index()

# 5. Analysis
# - Use xs() for cross-sections
# - Use query() for filtering
# - Use unstack() to pivot
# - Use groupby() with level parameter

# 6. Reset for output if needed
df_output = df.reset_index()
```
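
The same workflow, made runnable with an in-memory CSV so that no file is required. The CSV text and the name `df_flow` are hypothetical stand-ins for `data.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """Region,City,Date,Sales
East,NYC,2024-01-01,100
West,LA,2024-01-01,120
East,NYC,2024-01-02,110
West,LA,2024-01-02,130
"""

# 1-4. Load, build the MultiIndex from named columns, and sort
df_flow = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
df_flow = df_flow.set_index(['Region', 'City', 'Date']).sort_index()

# 5. Analysis: cross-section and level-based aggregation
east = df_flow.xs('East', level='Region')
by_region = df_flow.groupby(level='Region').sum()

# 6. Flatten again for export
flat = df_flow.reset_index()

print(by_region)
print(flat)
```

Because `set_index()` takes the level names from the column names, the separate naming step is only needed when the index is built some other way.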

## 9. Practice Exercises

### Beginner Level (1-5)

1. **Create MultiIndex**
   - From tuples: (Country, City) pairs
   - Add names to levels

2. **Basic selection**
   - Select all rows for specific level 0 value
   - Select specific combination

3. **Stack and unstack**
   - Convert wide DataFrame to long
   - Convert back to wide

4. **Simple pivot**
   - Create pivot table with one index and one column
   - Use sum aggregation

5. **Reset index**
   - Convert MultiIndex to columns
   - Set index from columns

### Intermediate Level (6-10)

6. **Cross-sections**
   - Use xs() to select from specific level
   - Select multiple levels

7. **IndexSlice**
   - Use pd.IndexSlice for complex slicing
   - Slice on multiple levels

8. **Swap and reorder**
   - Swap two index levels
   - Reorder to new sequence

9. **Pivot with multiple values**
   - Pivot table with 2+ value columns
   - Multiple aggregation functions

10. **Query method**
    - Filter using query()
    - Use external variables with @

### Advanced Level (11-15)

11. **Multi-level pivot**
    - Create pivot with MultiIndex rows and columns
    - Add margins (totals)

12. **Stack with level**
    - Stack specific level from multi-level columns
    - Unstack multiple levels

13. **GroupBy with level**
    - Group by specific index level
    - Aggregate across multiple levels

14. **Boolean indexing on MultiIndex**
    - Filter using get_level_values()
    - Combine multiple level conditions

15. **Reshape time series**
    - Create (Date, Metric) MultiIndex
    - Unstack metrics to columns
    - Resample by date level

### Challenge Problems (16-20)

16. **Complete sales analysis**
    - MultiIndex: (Region, City, Product)
    - Pivot by different dimensions
    - Calculate totals and subtotals
    - Reshape for reporting

17. **Survey data processing**
    - (Respondent, Question, Wave) → Answer
    - Pivot to wide format
    - Calculate response rates
    - Compare waves

18. **Financial statement**
    - (Company, Metric, Period) structure
    - Create income statement format
    - Calculate period-over-period growth
    - Handle missing data

19. **Experimental results**
    - (Subject, Treatment, Measurement) → Value
    - Compare treatments
    - Calculate statistics per group
    - Reshape for visualization

20. **Complex hierarchy**
    - 4-level index: (Country, Region, City, Store)
    - Aggregate at different levels
    - Swap and reorder for different views
    - Export flattened version

## Quick Reference Card

### Creating MultiIndex

```python
# From tuples
pd.MultiIndex.from_tuples([
    ('A', 'X'),
    ('A', 'Y')
], names=['Level1', 'Level2'])

# From product (Cartesian)
pd.MultiIndex.from_product([
    ['A', 'B'],
    ['X', 'Y']
], names=['Level1', 'Level2'])

# From arrays
pd.MultiIndex.from_arrays([
    ['A', 'A', 'B'],
    ['X', 'Y', 'X']
], names=['Level1', 'Level2'])

# From DataFrame
df.set_index(['col1', 'col2'])
```

### Selection

```python
# By level
df.loc['A']                    # All level 1 where level 0='A'
df.loc[('A', 'X')]            # Specific combination

# Cross-section
df.xs('X', level=1)           # All rows where level 1='X'
df.xs(('A', 'X'))             # Specific combination

# IndexSlice
idx = pd.IndexSlice
df.loc[idx['A', :], :]        # All level 1 where level 0='A'
df.loc[idx[:, 'X'], :]        # All level 0 where level 1='X'

# Slicing (requires sorted index)
df.loc[('A', 'X'):('B', 'Y')]
```
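
As a runnable illustration of `pd.IndexSlice`, the sketch below combines a label range on the first level with an exact label on the second. The frame `grid` is hypothetical, and it is sorted first because range slicing needs a sorted index.

```python
import pandas as pd

# Hypothetical 3 x 2 grid of index labels
idx_demo = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['X', 'Y']], names=['Level1', 'Level2']
)
grid = pd.DataFrame({'Value': range(6)}, index=idx_demo).sort_index()

idx = pd.IndexSlice

# Level1 from 'A' through 'B' (inclusive), Level2 equal to 'X'
a_to_b_x = grid.loc[idx['A':'B', 'X'], :]
print(a_to_b_x)
```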

### Stack & Unstack

```python
# Stack (columns → rows)
df.stack()                    # Stack innermost level
df.stack(level=0)             # Stack specific level
df.stack(dropna=False)        # Keep NaN (omit dropna when using future_stack=True, pandas >= 2.1)

# Unstack (rows → columns)
df.unstack()                  # Unstack innermost level
df.unstack(level='City')      # Unstack specific level
df.unstack(fill_value=0)      # Fill missing
```

### Pivot Tables

```python
# Simple pivot (no aggregation)
df.pivot(
    index='row_col',
    columns='col_col',
    values='value_col'
)

# Pivot table (with aggregation)
df.pivot_table(
    values='value_col',
    index='row_col',
    columns='col_col',
    aggfunc='sum',
    fill_value=0,
    margins=True
)

# Multiple aggregations
df.pivot_table(
    values='sales',
    index='region',
    columns='quarter',
    aggfunc=['sum', 'mean', 'count']
)
```
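
A small runnable version of the aggregating pivot, using a hypothetical `orders` frame. With `margins=True`, pandas adds a row and a column labelled `All` that hold the totals.

```python
import pandas as pd

# Hypothetical order lines; East/Q1 appears twice and is summed
orders = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West', 'East'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2', 'Q1'],
    'sales': [100, 110, 120, 130, 50],
})

report = orders.pivot_table(
    values='sales',
    index='region',
    columns='quarter',
    aggfunc='sum',
    fill_value=0,
    margins=True,   # adds the 'All' row and column
)
print(report)
```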

### Index Operations

```python
# Swap levels
df.swaplevel(0, 1)
df.swaplevel('A', 'B')

# Reorder levels
df.reorder_levels(['L2', 'L1', 'L0'])

# Sort index
df.sort_index()
df.sort_index(level=1)
df.sort_index(ascending=False)

# Reset index
df.reset_index()              # All levels to columns
df.reset_index(level=1)       # One level to column
df.reset_index(drop=True)     # Drop index

# Set index
df.set_index('col')
df.set_index(['col1', 'col2'])
```

### Query & Filtering

```python
# Query method
df.query('Sales > 100')
df.query('Sales > 100 and Region == "East"')
df.query('Product in ["A", "B"]')

# With variables
threshold = 100
df.query('Sales > @threshold')

# Boolean indexing
df[df['Sales'] > 100]
df[(df['Sales'] > 100) & (df['Region'] == 'East')]
df[df['Region'].isin(['East', 'West'])]
df[df['Sales'].between(100, 200)]

# On index
df[df.index.get_level_values(0) == 'USA']
df[df.index.get_level_values('City') == 'NYC']
```

### Common Patterns

```python
# Pattern 1: Create and sort MultiIndex
df = df.set_index(['Region', 'City']).sort_index()

# Pattern 2: Aggregate by levels
df.groupby(level=['Region', 'Product']).sum()

# Pattern 3: Reshape for analysis
df.unstack(level='Product')  # Products as columns

# Pattern 4: Pivot and add totals
df.pivot_table(
    values='sales',
    index='region',
    columns='quarter',
    aggfunc='sum',
    margins=True
)

# Pattern 5: Filter and select
df.xs('East', level='Region').query('Sales > 100')
```

## Summary

### Key Concepts Mastered ✅

**1. MultiIndex (Hierarchical Indexing)**
- Multiple levels of indices
- Represents N-dimensional data in 2D
- Efficient storage and access
- Natural grouping structure

**2. Creating MultiIndex**
- **from_tuples()**: List of tuples
- **from_product()**: Cartesian product
- **from_arrays()**: Separate arrays per level
- **set_index()**: From DataFrame columns

**3. Selection Methods**
- **loc[]**: Label-based (use tuples)
- **xs()**: Cross-section (cleaner syntax)
- **IndexSlice**: Complex multi-level slicing
- **get_level_values()**: Filter by level

**4. Reshaping**
- **stack()**: Columns → Rows (long format)
- **unstack()**: Rows → Columns (wide format)
- **pivot()**: Reshape without aggregation
- **pivot_table()**: Reshape with aggregation

**5. Index Operations**
- **swaplevel()**: Swap two levels
- **reorder_levels()**: Change level order
- **sort_index()**: Sort by index
- **reset_index()**: Flatten to columns

**6. Advanced Filtering**
- **query()**: SQL-like syntax
- Boolean indexing with multiple conditions
- String methods on index/columns
- **between()** for ranges and **isin()** for membership

---

### Method Comparison

| Method | Purpose | Best For |
|--------|---------|----------|
| **loc[]** | Label selection | Direct index access |
| **xs()** | Cross-section | Selecting from one level |
| **IndexSlice** | Complex slicing | Multi-level ranges |
| **query()** | SQL-like filtering | Complex conditions |
| **stack()** | Wide → Long | Normalize data |
| **unstack()** | Long → Wide | Comparison tables |
| **pivot_table()** | Aggregate & reshape | Summary reports |

---

### Decision Trees

**Should I use MultiIndex?**
```
Do you have hierarchical data?
├─ Yes → Does it have 2+ natural grouping levels?
│   ├─ Yes → ✅ Use MultiIndex
│   └─ No  → Regular index is fine
└─ No  → ❌ Don't use MultiIndex
```

**How to select data?**
```
What do you need?
├─ Specific combination → df.loc[('A', 'X')]
├─ All of one level → df.xs('X', level=1)
├─ Complex slice → idx = pd.IndexSlice; df.loc[idx[:, 'X'], :]
└─ Filtered data → df.query('Sales > 100')
```

**Stack or Unstack?**
```
What do you want?
├─ Make data taller (long) → stack()
├─ Make data wider (pivot) → unstack()
└─ Aggregate + reshape → pivot_table()
```

---

### Real-World Applications

**Business Analytics**
- Regional sales (Region, City, Product)
- Multi-period comparisons
- Hierarchical reporting structures
- KPI dashboards

**Finance**
- Company fundamentals (Company, Metric, Period)
- Portfolio analysis (Asset, Date, Price Type)
- Multi-currency data

**Research**
- Experimental data (Subject, Treatment, Measurement)
- Longitudinal studies (Patient, Visit, Metric)
- Survey responses (Respondent, Question, Wave)

**Operations**
- Inventory (Warehouse, Product, Date)
- Sensor data (Device, Location, Timestamp)
- Multi-location metrics

---

### Remember

- 📊 **MultiIndex** = Hierarchical structure for complex data
- 🔍 **Sort index** before slicing (`df.sort_index()`)
- 🎯 **xs()** is cleaner than complex loc[] for cross-sections
- 🔄 **Stack** makes tall, **Unstack** makes wide
- 📈 **pivot_table()** for aggregated summaries
- ⚡ **query()** can be faster than boolean indexing on large DataFrames
- 🏷️ **Name your levels** for clarity

---

### Next Steps

After mastering advanced indexing:
1. **Performance** - Optimize large MultiIndex operations
2. **Visualization** - Plot hierarchical data
3. **Database** - SQL-like operations with MultiIndex
4. **Advanced reshaping** - Complex transformations
5. **Real projects** - Apply to business problems

---

**Happy Advanced Indexing! 🐼🎯📊**