# Pivot Tables and Reshaping in Pandas

## Overview

**Reshaping** = Transforming data between different formats (wide ↔ long)

### Why Reshape Data?

Different analyses need different formats:
- **Plotting**: Often needs wide format
- **Statistical analysis**: Often needs long format
- **Reporting**: Needs pivoted summaries
- **Machine Learning**: Needs specific structures

### Visual Example

```
LONG FORMAT (tall, tidy):        WIDE FORMAT (spreadsheet-like):
Person | Month | Sales           Person | Jan | Feb | Mar
Alice  | Jan   | 100             Alice  | 100 | 120 | 110
Alice  | Feb   | 120             Bob    | 90  | 95  | 100
Alice  | Mar   | 110
Bob    | Jan   | 90
Bob    | Feb   | 95
Bob    | Mar   | 100

        ↕ RESHAPE ↕
```

### Key Operations

| Operation | Purpose | Direction |
|-----------|---------|----------|
| **pivot()** | Reshape without aggregation | Long → Wide |
| **pivot_table()** | Reshape with aggregation | Long → Wide |
| **melt()** | Unpivot data | Wide → Long |
| **stack()** | Pivot columns to rows | Wide → Long |
| **unstack()** | Pivot rows to columns | Long → Wide |
| **crosstab()** | Frequency table | Data → Contingency |

### Common Patterns

```python
# Wide to Long (for analysis)
df.melt(id_vars=['id'], value_vars=['Jan', 'Feb', 'Mar'])

# Long to Wide (for reporting)
df.pivot(index='person', columns='month', values='sales')

# Aggregate while pivoting
df.pivot_table(index='product', columns='region', 
               values='sales', aggfunc='sum')
```
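
The patterns above assume an existing `df` with matching column names. As a minimal runnable sketch, the same three calls can be tried on a small made-up table (the `id` column and the values are illustrative assumptions, not part of the sales dataset used later):

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per month
wide = pd.DataFrame({
    'id': [1, 2],
    'Jan': [100, 90],
    'Feb': [120, 95],
    'Mar': [110, 100],
})

# Wide to long: each (id, month) pair becomes its own row
long_df = wide.melt(id_vars=['id'], value_vars=['Jan', 'Feb', 'Mar'],
                    var_name='month', value_name='sales')
print(long_df.shape)  # (6, 3)

# Long to wide again: one row per id, one column per month
back = long_df.pivot(index='id', columns='month', values='sales')
print(back.loc[1, 'Feb'])  # 120

# Aggregate while reshaping: total sales per month across all ids
totals = long_df.pivot_table(index='month', values='sales', aggfunc='sum')
print(totals.loc['Jan', 'sales'])  # 190
```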

### What We'll Learn
1. ✅ pivot() - Basic reshaping
2. ✅ pivot_table() - With aggregation
3. ✅ melt() - Wide to long
4. ✅ stack() and unstack()
5. ✅ crosstab() - Frequency tables
6. ✅ Real-world reporting scenarios
7. ✅ Best practices
8. ✅ Common pitfalls

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.precision', 2)

print("✅ Libraries imported")
print(f"Pandas version: {pd.__version__}")

✅ Libraries imported
Pandas version: 2.2.3


## Sample Dataset: Regional Sales Data

We'll use realistic sales data to demonstrate reshaping operations.

### Dataset Structure

**Long Format** (database/analysis friendly):
```
salesperson | region | month | product | sales_amount
Alice       | North  | Jan   | Laptop  | 5000
Alice       | North  | Feb   | Laptop  | 5500
...
```

**Wide Format** (reporting/visualization friendly):
```
salesperson | Jan  | Feb  | Mar  | Apr
Alice       | 5000 | 5500 | 5200 | 5800
Bob         | 4500 | 4800 | 4600 | 5000
```

We'll practice converting between these formats!

In [2]:
print("Creating sample sales dataset...\n")

# Create long format data (typical database structure)
np.random.seed(42)

salespeople = ['Alice', 'Bob', 'Charlie', 'Diana']
regions = ['North', 'South', 'East', 'West']
months = ['Jan', 'Feb', 'Mar', 'Apr']
products = ['Laptop', 'Phone', 'Tablet']

# Assign each salesperson one random region, then generate every
# month x product combination for that person
data = []
for person in salespeople:
    region = np.random.choice(regions)
    for month in months:
        for product in products:
            sales = np.random.randint(3000, 8000)
            data.append({
                'salesperson': person,
                'region': region,
                'month': month,
                'product': product,
                'sales_amount': sales,
                'units_sold': np.random.randint(5, 20)
            })

sales_long = pd.DataFrame(data)

print("LONG FORMAT (Database/Analysis friendly):")
print(sales_long.head(15))
print(f"\nShape: {sales_long.shape}")
print(f"Columns: {sales_long.columns.tolist()}")
print("\nThis is 'tidy' format - one observation per row")
print()

# Also create a simple wide format example
print("="*70)
print("\nWIDE FORMAT (Spreadsheet/Reporting friendly):")
sales_wide = sales_long.groupby(['salesperson', 'month'])['sales_amount'].sum().unstack()
print(sales_wide)
print(f"\nShape: {sales_wide.shape}")
print("\nEach month becomes a column - easier to read!")

Creating sample sales dataset...

LONG FORMAT (Database/Analysis friendly):
   salesperson region month product  sales_amount  units_sold
0        Alice   East   Jan  Laptop          3860          19
1        Alice   East   Jan   Phone          6772           9
2        Alice   East   Jan  Tablet          3466          11
3        Alice   East   Feb  Laptop          7426          15
4        Alice   East   Feb   Phone          6444           8
5        Alice   East   Feb  Tablet          5919          12
6        Alice   East   Mar  Laptop          3130          10
7        Alice   East   Mar   Phone          3769          12
8        Alice   East   Mar  Tablet          5433          16
9        Alice   East   Apr  Laptop          4184          16
10       Alice   East   Apr   Phone          6385          10
11       Alice   East   Apr  Tablet          7843          13
12         Bob  North   Jan  Laptop          3474          15
13         Bob  North   Jan   Phone          5558       

## 1. pivot() - Basic Reshaping

### What is pivot()?

**pivot()** = Reshape data from long to wide format **without aggregation**

### Requirements
- ⚠️ **No duplicate** index/column combinations
- Each combination must be unique
- Use `pivot_table()` if duplicates exist
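
Whether a table meets these requirements can be checked before calling `pivot()`. The sketch below is one way to do it, assuming a small made-up frame with `person`, `month`, and `sales` columns:

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Alice', 'Alice', 'Bob'],
    'month': ['Jan', 'Jan', 'Jan'],   # Alice-Jan appears twice
    'sales': [100, 150, 90],
})

# keep=False flags every row that shares its (person, month) pair with another row
dupes = df[df.duplicated(subset=['person', 'month'], keep=False)]
print(len(dupes))  # 2

# Alternative view: how many rows each key combination has
counts = df.groupby(['person', 'month']).size()
print(counts[counts > 1])
```

An empty `dupes` frame means `pivot()` is safe for that index/columns pair.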

### Syntax

```python
df.pivot(index='row_labels',      # What becomes row index
         columns='col_labels',    # What becomes column names
         values='data_values')    # What fills the cells
```

### Visual Transformation

```
BEFORE (Long):               AFTER (Wide):
Person | Month | Sales       Month    Jan  Feb  Mar
Alice  | Jan   | 100         Person
Alice  | Feb   | 120         Alice    100  120  110
Alice  | Mar   | 110         Bob      90   95   100
Bob    | Jan   | 90
Bob    | Feb   | 95
Bob    | Mar   | 100

Code: df.pivot(index='Person', columns='Month', values='Sales')
```
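
The diagram shows the months in calendar order. With plain string labels, pandas returns the new columns sorted alphabetically instead. A sketch of two ways to get calendar order back, assuming the same toy data as the diagram:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'Month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
    'Sales': [100, 120, 110, 90, 95, 100],
})
month_order = ['Jan', 'Feb', 'Mar']

# Plain strings: column labels come back in alphabetical order
wide_alpha = df.pivot(index='Person', columns='Month', values='Sales')
print(list(wide_alpha.columns))  # ['Feb', 'Jan', 'Mar']

# Option 1: reorder the columns after pivoting
wide_reindexed = wide_alpha.reindex(columns=month_order)
print(list(wide_reindexed.columns))  # ['Jan', 'Feb', 'Mar']

# Option 2: declare the order up front with an ordered categorical
df['Month'] = pd.Categorical(df['Month'], categories=month_order, ordered=True)
wide_cal = df.pivot(index='Person', columns='Month', values='Sales')
print(list(wide_cal.columns))  # ['Jan', 'Feb', 'Mar']
```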

### Parameters

| Parameter | Description | Example |
|-----------|-------------|----------|
| **index** | Column(s) for row labels | 'salesperson' |
| **columns** | Column(s) for column labels | 'month' |
| **values** | Column(s) for cell values | 'sales_amount' |
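
Each parameter also accepts a list. As a short sketch (the four-row frame is made up for illustration), passing two columns to `values` produces two-level column labels:

```python
import pandas as pd

df = pd.DataFrame({
    'salesperson': ['Alice', 'Alice', 'Bob', 'Bob'],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'sales_amount': [100, 120, 90, 95],
    'units_sold': [5, 6, 4, 4],
})

# A list for `values` yields two-level columns: (value column, month)
wide = df.pivot(index='salesperson', columns='month',
                values=['sales_amount', 'units_sold'])
print(wide.columns.nlevels)                       # 2
print(wide.loc['Alice', ('units_sold', 'Feb')])   # 6

# Selecting the top level returns an ordinary salesperson x month table
print(wide['sales_amount'].loc['Bob', 'Jan'])     # 90
```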

### When to Use
- ✅ Data is already aggregated
- ✅ No duplicate combinations
- ✅ Simple reshaping needed
- ❌ Use pivot_table() if duplicates exist

In [3]:
print("=== PIVOT() EXAMPLES ===\n")

# Create simple data without duplicates
simple_sales = pd.DataFrame({
    'salesperson': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
    'sales': [100, 120, 110, 90, 95, 100]
})

# Example 1: Basic pivot
print("Example 1: Basic pivot - salesperson × month")
print("\nBEFORE (Long format):")
print(simple_sales)

pivoted = simple_sales.pivot(index='salesperson', columns='month', values='sales')
print("\nAFTER (Wide format):")
print(pivoted)
print(f"\nShape changed from {simple_sales.shape} to {pivoted.shape}")
print()

# Example 2: Different index/columns
print("Example 2: Swap index and columns")
pivoted_swapped = simple_sales.pivot(index='month', columns='salesperson', values='sales')
print(pivoted_swapped)
print()

# Example 3: Access pivoted data
print("Example 3: Access pivoted data")
print("Alice's Feb sales:", pivoted.loc['Alice', 'Feb'])
print("All Jan sales:", pivoted['Jan'].values)
print()

# Example 4: Reset index to make it a regular DataFrame
print("Example 4: Reset index for cleaner DataFrame")
pivoted_reset = pivoted.reset_index()
print(pivoted_reset)
print()

# Example 5: What happens with duplicates?
print("Example 5: Pivot fails with duplicate combinations")
duplicate_data = pd.DataFrame({
    'person': ['Alice', 'Alice', 'Bob'],
    'month': ['Jan', 'Jan', 'Jan'],  # Alice-Jan appears twice!
    'sales': [100, 150, 90]
})
print("\nData with duplicates:")
print(duplicate_data)

try:
    duplicate_data.pivot(index='person', columns='month', values='sales')
except ValueError as e:
    print(f"\n❌ Error: {str(e)[:80]}...")
    print("\n💡 Use pivot_table() for data with duplicates!")

=== PIVOT() EXAMPLES ===

Example 1: Basic pivot - salesperson × month

BEFORE (Long format):
  salesperson month  sales
0       Alice   Jan    100
1       Alice   Feb    120
2       Alice   Mar    110
3         Bob   Jan     90
4         Bob   Feb     95
5         Bob   Mar    100

AFTER (Wide format):
month        Feb  Jan  Mar
salesperson               
Alice        120  100  110
Bob           95   90  100

Shape changed from (6, 3) to (2, 3)

Example 2: Swap index and columns
salesperson  Alice  Bob
month                  
Feb            120   95
Jan            100   90
Mar            110  100

Example 3: Access pivoted data
Alice's Feb sales: 120
All Jan sales: [100  90]

Example 4: Reset index for cleaner DataFrame
month salesperson  Feb  Jan  Mar
0           Alice  120  100  110
1             Bob   95   90  100

Example 5: Pivot fails with duplicate combinations

Data with duplicates:
  person month  sales
0  Alice   Jan    100
1  Alice   Jan    150
2    Bob   Jan     90

❌ E

## 2. pivot_table() - Pivot with Aggregation

### pivot_table() vs pivot()

| Feature | pivot() | pivot_table() |
|---------|---------|---------------|
| **Handles duplicates** | ❌ No | ✅ Yes |
| **Aggregation** | ❌ No | ✅ Yes |
| **Multiple values** | Limited | ✅ Yes |
| **Margins (totals)** | ❌ No | ✅ Yes |
| **Use when** | Pre-aggregated | Raw data |

### Syntax

```python
df.pivot_table(
    index='row_labels',        # What becomes rows
    columns='col_labels',      # What becomes columns
    values='data_values',      # What to aggregate
    aggfunc='mean',            # How to aggregate (mean, sum, count, etc.)
    fill_value=0,              # Replace NaN
    margins=True               # Add an 'All' row/column (aggfunc applied to all rows)
)
```

### Aggregation Functions

| Function | Description | Example Use |
|----------|-------------|-------------|
| `'mean'` | Average | Average sales |
| `'sum'` | Total | Total revenue |
| `'count'` | Count | Number of orders |
| `'min'` | Minimum | Lowest price |
| `'max'` | Maximum | Highest sale |
| `'median'` | Middle value | Median income |
| `'std'` | Standard deviation | Volatility |
| `np.mean` | NumPy function | Same result as 'mean'; the string form is preferred in pandas 2.x |
| `[list]` | Multiple | ['sum', 'mean'] |
| `{dict}` | Different per column | {'col1': 'sum', 'col2': 'mean'} |
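
The dict form is the least obvious entry in the table. A minimal sketch, assuming a made-up frame with `sales_amount` and `units_sold` columns, that sums one column and takes the maximum of the other:

```python
import pandas as pd

df = pd.DataFrame({
    'salesperson': ['Alice', 'Alice', 'Bob', 'Bob'],
    'product': ['Laptop', 'Laptop', 'Laptop', 'Phone'],
    'sales_amount': [100, 120, 110, 85],
    'units_sold': [2, 4, 3, 1],
})

# Dict form: total revenue, but the largest single unit count
report = df.pivot_table(
    index='salesperson',
    values=['sales_amount', 'units_sold'],
    aggfunc={'sales_amount': 'sum', 'units_sold': 'max'},
)
print(report.loc['Alice', 'sales_amount'])  # 220
print(report.loc['Alice', 'units_sold'])    # 4
```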

### Visual Example

```
Raw Data (with duplicates):     Pivot Table (aggregated):
Person | Product | Sales        Product  Laptop  Phone
Alice  | Laptop  | 100          Person
Alice  | Laptop  | 120          Alice    110     95
Alice  | Phone   | 90           Bob      105     87.5
Alice  | Phone   | 100
Bob    | Laptop  | 110
Bob    | Laptop  | 100
Bob    | Phone   | 85
Bob    | Phone   | 90

Code: df.pivot_table(index='Person', columns='Product', 
                     values='Sales', aggfunc='mean')
```
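
The raw data in the diagram can be typed in directly to check the averages. This sketch rebuilds it as a DataFrame; Bob's phone average comes out as 87.5, the mean of 85 and 90:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['Alice'] * 4 + ['Bob'] * 4,
    'Product': ['Laptop', 'Laptop', 'Phone', 'Phone'] * 2,
    'Sales': [100, 120, 90, 100, 110, 100, 85, 90],
})

# Duplicate (Person, Product) pairs are collapsed by averaging
table = df.pivot_table(index='Person', columns='Product',
                       values='Sales', aggfunc='mean')
print(table)
```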

In [4]:
print("=== PIVOT_TABLE() EXAMPLES ===\n")

# Example 1: Basic pivot_table with aggregation
print("Example 1: Average sales by salesperson × month")
avg_sales = sales_long.pivot_table(
    index='salesperson',
    columns='month',
    values='sales_amount',
    aggfunc='mean'
)
print(avg_sales.round(0))
print()

# Example 2: Total sales (sum)
print("Example 2: Total sales by region × product")
total_sales = sales_long.pivot_table(
    index='region',
    columns='product',
    values='sales_amount',
    aggfunc='sum'
)
print(total_sales)
print()

# Example 3: Count of transactions
print("Example 3: Number of sales by salesperson × product")
count_sales = sales_long.pivot_table(
    index='salesperson',
    columns='product',
    values='sales_amount',
    aggfunc='count'
)
print(count_sales)
print()

# Example 4: Multiple aggregation functions
print("Example 4: Multiple aggregations (sum and mean)")
multi_agg = sales_long.pivot_table(
    index='salesperson',
    columns='product',
    values='sales_amount',
    aggfunc=['sum', 'mean']
)
print(multi_agg.round(0))
print()

# Example 5: Add margins (totals)
print("Example 5: Pivot table with row and column totals")
with_totals = sales_long.pivot_table(
    index='salesperson',
    columns='product',
    values='sales_amount',
    aggfunc='sum',
    margins=True,
    margins_name='TOTAL'
)
print(with_totals)
print()

# Example 6: Fill missing values
print("Example 6: Fill NaN with zeros")
filled = sales_long.pivot_table(
    index='region',
    columns='month',
    values='sales_amount',
    aggfunc='sum',
    fill_value=0
)
print(filled)
print()

# Example 7: Multiple values columns
print("Example 7: Pivot multiple value columns")
multi_val = sales_long.pivot_table(
    index='salesperson',
    columns='month',
    values=['sales_amount', 'units_sold'],
    aggfunc='sum'
)
print(multi_val)
print("\nNote: Creates MultiIndex columns")

=== PIVOT_TABLE() EXAMPLES ===

Example 1: Average sales by salesperson × month
month           Apr     Feb     Jan     Mar
salesperson                                
Alice        6137.0  6596.0  4699.0  4111.0
Bob          6071.0  4890.0  4693.0  6042.0
Charlie      6459.0  4799.0  5238.0  5917.0
Diana        6077.0  7420.0  5313.0  3810.0

Example 2: Total sales by region × product
product  Laptop  Phone  Tablet
region                        
East      18600  23370   22661
North     50487  40810   41652
West      22511  22132   22596

Example 3: Number of sales by salesperson × product
product      Laptop  Phone  Tablet
salesperson                       
Alice             4      4       4
Bob               4      4       4
Charlie           4      4       4
Diana             4      4       4

Example 4: Multiple aggregations (sum and mean)
               sum                  mean                
product     Laptop  Phone Tablet  Laptop   Phone  Tablet
salesperson                 

## 3. melt() - Wide to Long (Unpivot)

### What is melt()?

**melt()** = Transform from wide to long format (opposite of pivot)

Also called **unpivot** or **gather**

### Syntax

```python
df.melt(
    id_vars=['cols_to_keep'],      # Identifier columns (stay as is)
    value_vars=['cols_to_melt'],   # Columns to unpivot
    var_name='variable_column',    # Name for variable column
    value_name='value_column'      # Name for value column
)
```

### Visual Transformation

```
BEFORE (Wide):                AFTER (Long):
Person | Jan | Feb | Mar      Person | Month | Sales
Alice  | 100 | 120 | 110      Alice  | Jan   | 100
Bob    | 90  | 95  | 100      Alice  | Feb   | 120
                              Alice  | Mar   | 110
                              Bob    | Jan   | 90
                              Bob    | Feb   | 95
                              Bob    | Mar   | 100

Code: df.melt(id_vars=['Person'], 
              value_vars=['Jan', 'Feb', 'Mar'],
              var_name='Month', 
              value_name='Sales')
```

### Parameters

| Parameter | Description | Required? |
|-----------|-------------|----------|
| **id_vars** | Columns to keep as identifiers | Optional |
| **value_vars** | Columns to melt (default: all columns not in `id_vars`) | Optional |
| **var_name** | Name for variable column | Optional |
| **value_name** | Name for value column | Optional |

### When to Use
- Convert spreadsheet data to database format
- Prepare data for analysis/modeling
- Create tidy data
- Before groupby operations

In [5]:
print("=== MELT() EXAMPLES ===\n")

# Create wide format data
sales_wide_ex = pd.DataFrame({
    'salesperson': ['Alice', 'Bob', 'Charlie'],
    'Jan': [100, 90, 85],
    'Feb': [120, 95, 88],
    'Mar': [110, 100, 92]
})

# Example 1: Basic melt
print("Example 1: Basic melt - wide to long")
print("\nBEFORE (Wide):")
print(sales_wide_ex)

melted = sales_wide_ex.melt(
    id_vars=['salesperson'],
    value_vars=['Jan', 'Feb', 'Mar'],
    var_name='month',
    value_name='sales'
)

print("\nAFTER (Long):")
print(melted)
print(f"\nShape changed from {sales_wide_ex.shape} to {melted.shape}")
print()

# Example 2: Melt all except id columns
print("Example 2: Melt without specifying value_vars")
melted_auto = sales_wide_ex.melt(
    id_vars=['salesperson'],
    var_name='month',
    value_name='sales'
)
print(melted_auto)
print("\nAll non-id columns are melted automatically")
print()

# Example 3: Multiple id variables
print("Example 3: Multiple identifier columns")
multi_id_data = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'region': ['North', 'South'],
    'Q1': [100, 90],
    'Q2': [110, 95],
    'Q3': [120, 100]
})
print("\nWide format:")
print(multi_id_data)

melted_multi = multi_id_data.melt(
    id_vars=['name', 'region'],
    var_name='quarter',
    value_name='revenue'
)
print("\nLong format:")
print(melted_multi)
print()

# Example 4: Melt for analysis
print("Example 4: Melt then analyze")
# Now we can easily calculate total per person
total_by_person = melted.groupby('salesperson')['sales'].sum()
print("Total sales per person:")
print(total_by_person)
print()

# Example 5: Round trip (pivot back)
print("Example 5: Melt then pivot back (round trip)")
back_to_wide = melted.pivot(index='salesperson', columns='month', values='sales')
print(back_to_wide)
print("\nSuccessfully converted back to wide format!")

=== MELT() EXAMPLES ===

Example 1: Basic melt - wide to long

BEFORE (Wide):
  salesperson  Jan  Feb  Mar
0       Alice  100  120  110
1         Bob   90   95  100
2     Charlie   85   88   92

AFTER (Long):
  salesperson month  sales
0       Alice   Jan    100
1         Bob   Jan     90
2     Charlie   Jan     85
3       Alice   Feb    120
4         Bob   Feb     95
5     Charlie   Feb     88
6       Alice   Mar    110
7         Bob   Mar    100
8     Charlie   Mar     92

Shape changed from (3, 4) to (9, 3)

Example 2: Melt without specifying value_vars
  salesperson month  sales
0       Alice   Jan    100
1         Bob   Jan     90
2     Charlie   Jan     85
3       Alice   Feb    120
4         Bob   Feb     95
5     Charlie   Feb     88
6       Alice   Mar    110
7         Bob   Mar    100
8     Charlie   Mar     92

All non-id columns are melted automatically

Example 3: Multiple identifier columns

Wide format:
    name region   Q1   Q2   Q3
0  Alice  North  100  110  120
1    B

## 4. stack() and unstack()

### What are stack/unstack?

**stack()** and **unstack()** work with **index levels** (especially MultiIndex)

### Difference from pivot/melt

| Operation | Works on | Use Case |
|-----------|----------|----------|
| **pivot/melt** | Columns | Simple reshaping |
| **stack/unstack** | Index levels | MultiIndex manipulation |

### Operations

```
UNSTACK: Move row index level to columns (wide)
         Long → Wide

STACK:   Move column level to row index (long)
         Wide → Long
```

### Visual Example

```
Original (with MultiIndex):    unstack() →    Unstacked:
Person Month | Sales                         Month    Jan  Feb  Mar
Alice  Jan   | 100                           Person
       Feb   | 120                           Alice    100  120  110
       Mar   | 110                           Bob      90   95   100
Bob    Jan   | 90
       Feb   | 95
       Mar   | 100

                              ← stack()
```

### Syntax

```python
# Unstack: index to columns
df.unstack()           # Unstack innermost level
df.unstack(level=0)    # Unstack specific level
df.unstack(level='month')  # Unstack by name

# Stack: columns to index
df.stack()             # Stack all columns
df.stack(level=0)      # Stack specific level
```
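
The `level` argument decides which index level moves. A minimal sketch on a made-up two-level Series (the level names `person` and `month` are assumptions for illustration):

```python
import pandas as pd

s = pd.Series(
    [100, 120, 90, 95],
    index=pd.MultiIndex.from_product(
        [['Alice', 'Bob'], ['Jan', 'Feb']], names=['person', 'month']
    ),
)

by_month = s.unstack()           # innermost level ('month') becomes columns
by_person = s.unstack(level=0)   # outermost level ('person') becomes columns

print(by_month.shape, by_person.shape)  # (2, 2) (2, 2)
print(by_month.loc['Bob', 'Feb'])       # 95
print(by_person.loc['Feb', 'Bob'])      # 95

# stack() moves the columns back into the index
round_trip = by_month.stack()
print(round_trip.loc[('Alice', 'Jan')])  # 100
```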

### When to Use
- Working with MultiIndex DataFrames
- After groupby with multiple columns
- Time series with multiple dimensions
- Quick reshape without column specification

In [6]:
print("=== STACK() AND UNSTACK() EXAMPLES ===\n")

# Create MultiIndex data
mi_data = pd.DataFrame({
    'salesperson': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob', 'Bob'],
    'month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
    'sales': [100, 120, 110, 90, 95, 100]
}).set_index(['salesperson', 'month'])

# Example 1: Unstack (long to wide)
print("Example 1: unstack() - move index level to columns")
print("\nBEFORE (MultiIndex):")
print(mi_data)

unstacked = mi_data.unstack()
print("\nAFTER unstack():")
print(unstacked)
print("\nMonth moved from index to columns")
print()

# Example 2: Stack (wide to long)
print("Example 2: stack() - move columns to index")
print("\nBEFORE (Wide):")
wide_df = pd.DataFrame({
    'Jan': [100, 90],
    'Feb': [120, 95],
    'Mar': [110, 100]
}, index=['Alice', 'Bob'])
print(wide_df)

stacked = wide_df.stack()
print("\nAFTER stack():")
print(stacked)
print("\nColumns moved to index level")
print()

# Example 3: Round trip
print("Example 3: stack() then unstack() (round trip)")
original = wide_df.copy()
stacked = original.stack()
back_to_original = stacked.unstack()
print("Original:")
print(original)
print("\nAfter stack() then unstack():")
print(back_to_original)
print("\nSuccessfully returned to original format!")
print()

# Example 4: Unstack specific level
print("Example 4: Unstack specific level")
multi_level = sales_long.groupby(['salesperson', 'product', 'month'])['sales_amount'].sum()
print("3-level MultiIndex:")
print(multi_level.head(12))

print("\nUnstack month (innermost level):")
unstacked_month = multi_level.unstack(level='month')
print(unstacked_month.head())
print()

# Example 5: Unstack with fill_value
print("Example 5: Unstack with fill_value")
unstacked_filled = mi_data.unstack(fill_value=0)
print(unstacked_filled)
print()

# Example 6: After groupby
print("Example 6: Common pattern - groupby then unstack")
grouped = sales_long.groupby(['salesperson', 'month'])['sales_amount'].mean()
print("Grouped (MultiIndex Series):")
print(grouped.head())

print("\nUnstacked to wide format:")
wide_from_group = grouped.unstack()
print(wide_from_group.round(0))
print("\n💡 This is equivalent to pivot_table!")

=== STACK() AND UNSTACK() EXAMPLES ===

Example 1: unstack() - move index level to columns

BEFORE (MultiIndex):
                   sales
salesperson month       
Alice       Jan      100
            Feb      120
            Mar      110
Bob         Jan       90
            Feb       95
            Mar      100

AFTER unstack():
            sales          
month         Feb  Jan  Mar
salesperson                
Alice         120  100  110
Bob            95   90  100

Month moved from index to columns

Example 2: stack() - move columns to index

BEFORE (Wide):
       Jan  Feb  Mar
Alice  100  120  110
Bob     90   95  100

AFTER stack():
Alice  Jan    100
       Feb    120
       Mar    110
Bob    Jan     90
       Feb     95
       Mar    100
dtype: int64

Columns moved to index level

Example 3: stack() then unstack() (round trip)
Original:
       Jan  Feb  Mar
Alice  100  120  110
Bob     90   95  100

After stack() then unstack():
       Jan  Feb  Mar
Alice  100  120  110
Bob     90

## 5. crosstab() - Frequency Tables

### What is crosstab()?

**crosstab()** = Create a cross-tabulation (frequency table) of two or more factors

Similar to pivot_table, but designed for **counting** and **frequencies**

### Syntax

```python
pd.crosstab(
    index=df['row_var'],        # Row variable
    columns=df['col_var'],      # Column variable
    values=df['value_col'],     # Optional: values to aggregate
    aggfunc='mean',             # How to aggregate values
    normalize=False,            # True, 'index', or 'columns' to get proportions
    margins=True                # Add row/column totals
)
```

### crosstab() vs pivot_table()

| Feature | crosstab() | pivot_table() |
|---------|------------|---------------|
| **Default operation** | Count | Aggregate |
| **Input** | Arrays/Series | DataFrame columns |
| **Primary use** | Frequencies | Summarize values |
| **Normalize** | ✅ Easy | ❌ Manual |
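
To make the "default operation is count" row concrete, the sketch below compares `crosstab()` with a hand-written count on a made-up frame. Combinations that never occur are filled with 0 rather than NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Alice', 'Alice', 'Alice', 'Bob', 'Bob'],
    'product': ['Laptop', 'Phone', 'Laptop', 'Laptop', 'Tablet'],
})

# crosstab counts co-occurrences of the two Series
ct = pd.crosstab(df['person'], df['product'])

# Same counts by hand: size() counts rows, unstack() spreads products
gs = df.groupby(['person', 'product']).size().unstack(fill_value=0)

print(ct.loc['Alice', 'Laptop'])       # 2
print(ct.loc['Bob', 'Phone'])          # 0
print((ct.values == gs.values).all())  # True
```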

### Normalize Options

```python
normalize=False   # Counts (default)
normalize=True    # All cells sum to 1
normalize='index' # Each row sums to 1
normalize='columns'  # Each column sums to 1
```
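
A quick way to confirm what each option does is to check which totals come out as 1. A sketch on a made-up four-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Alice', 'Alice', 'Alice', 'Bob'],
    'product': ['Laptop', 'Phone', 'Laptop', 'Laptop'],
})

overall = pd.crosstab(df['person'], df['product'], normalize=True)
by_row = pd.crosstab(df['person'], df['product'], normalize='index')
by_col = pd.crosstab(df['person'], df['product'], normalize='columns')

print(overall.values.sum())   # whole table sums to 1
print(by_row.sum(axis=1))     # each row sums to 1
print(by_col.sum(axis=0))     # each column sums to 1

# Two of Alice's three rows are laptops
print(round(by_row.loc['Alice', 'Laptop'], 3))  # 0.667
```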

### Visual Example

```
Data:                    Crosstab:
Person   Product         Product  Laptop  Phone  Tablet
Alice    Laptop          Person
Alice    Phone           Alice       2      3       1
Alice    Laptop          Bob         1      2       2
Alice    Phone           Charlie     3      1       2
Bob      Laptop
...

Shows: How many times each person sold each product
```

### Common Use Cases
- Customer segmentation counts
- Product × Region sales distribution
- Survey response analysis
- A/B test results
- Demographic breakdowns

In [7]:
print("=== CROSSTAB() EXAMPLES ===\n")

# Example 1: Basic frequency table
print("Example 1: Frequency of salesperson × product combinations")
freq_table = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['product']
)
print(freq_table)
print("\nShows: Number of transactions for each combination")
print()

# Example 2: With margins (totals)
print("Example 2: Frequency table with row and column totals")
freq_with_total = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['product'],
    margins=True,
    margins_name='Total'
)
print(freq_with_total)
print()

# Example 3: Normalize to percentages
print("Example 3: Percentage distribution (all cells sum to 100%)")
pct_all = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['product'],
    normalize=True
) * 100
print(pct_all.round(1))
print()

# Example 4: Row percentages
print("Example 4: Row percentages (each row sums to 100%)")
pct_row = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['product'],
    normalize='index'
) * 100
print(pct_row.round(1))
print("\nShows: Product mix for each salesperson")
print()

# Example 5: Column percentages
print("Example 5: Column percentages (each column sums to 100%)")
pct_col = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['product'],
    normalize='columns'
) * 100
print(pct_col.round(1))
print("\nShows: Market share per product")
print()

# Example 6: Crosstab with values (like pivot_table)
print("Example 6: Crosstab with aggregation (total sales)")
sales_crosstab = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['product'],
    values=sales_long['sales_amount'],
    aggfunc='sum'
)
print(sales_crosstab)
print()

# Example 7: Multiple row/column variables
print("Example 7: Multi-dimensional crosstab")
multi_cross = pd.crosstab(
    index=[sales_long['region'], sales_long['salesperson']],
    columns=sales_long['product'],
    margins=True
)
print(multi_cross)
print()

# Example 8: Practical - customer segmentation
print("Example 8: Customer segmentation analysis")
# Create age groups
customers = pd.DataFrame({
    'customer': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8'],
    'age_group': ['18-25', '18-25', '26-35', '26-35', '36-50', '36-50', '50+', '50+'],
    'product': ['Phone', 'Phone', 'Laptop', 'Tablet', 'Laptop', 'Laptop', 'Tablet', 'Phone']
})

segment_table = pd.crosstab(
    index=customers['age_group'],
    columns=customers['product'],
    normalize='index'
) * 100

print("\nProduct preferences by age group:")
print(segment_table.round(1))
print("\nInsight: 18-25 prefers phones, 36-50 prefers laptops")

=== CROSSTAB() EXAMPLES ===

Example 1: Frequency of salesperson × product combinations
product      Laptop  Phone  Tablet
salesperson                       
Alice             4      4       4
Bob               4      4       4
Charlie           4      4       4
Diana             4      4       4

Shows: Number of transactions for each combination

Example 2: Frequency table with row and column totals
product      Laptop  Phone  Tablet  Total
salesperson                              
Alice             4      4       4     12
Bob               4      4       4     12
Charlie           4      4       4     12
Diana             4      4       4     12
Total            16     16      16     48

Example 3: Percentage distribution (all cells sum to 100%)
product      Laptop  Phone  Tablet
salesperson                       
Alice           8.3    8.3     8.3
Bob             8.3    8.3     8.3
Charlie         8.3    8.3     8.3
Diana           8.3    8.3     8.3

Example 4: Row percentages (e

## 6. Method Comparison & Selection Guide

### Decision Tree

```
What do you need to do?
│
├─ Wide → Long (unpivot)
│   └─ Use melt()
│
├─ Long → Wide (pivot)
│   ├─ No duplicates?
│   │   └─ Use pivot()
│   └─ With duplicates (need aggregation)?
│       └─ Use pivot_table()
│
├─ Frequency/count table
│   └─ Use crosstab()
│
└─ MultiIndex manipulation
    ├─ Index → Columns: unstack()
    └─ Columns → Index: stack()
```
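
The Long → Wide branch of the tree can be written as a small function. `to_wide` below is a hypothetical helper, not a pandas API, and its default of `'sum'` for the duplicate case is an arbitrary assumption:

```python
import pandas as pd

def to_wide(df, index, columns, values, aggfunc='sum'):
    """Hypothetical helper: pivot() when keys are unique, else pivot_table()."""
    has_duplicates = df.duplicated(subset=[index, columns]).any()
    if has_duplicates:
        # Duplicates must be collapsed, so an aggregation is required
        return df.pivot_table(index=index, columns=columns,
                              values=values, aggfunc=aggfunc)
    return df.pivot(index=index, columns=columns, values=values)

unique = pd.DataFrame({'person': ['Alice', 'Bob'],
                       'month': ['Jan', 'Jan'],
                       'sales': [100, 90]})
repeated = pd.DataFrame({'person': ['Alice', 'Alice'],
                         'month': ['Jan', 'Jan'],
                         'sales': [100, 150]})

print(to_wide(unique, 'person', 'month', 'sales').loc['Bob', 'Jan'])      # 90
print(to_wide(repeated, 'person', 'month', 'sales').loc['Alice', 'Jan'])  # 250
```

In real code the aggregation should be chosen deliberately, since silently summing duplicates can hide data-quality problems.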

### Method Comparison Table

| Method | Direction | Input | Aggregation | Use Case |
|--------|-----------|-------|-------------|----------|
| **pivot()** | Long → Wide | DataFrame | ❌ No | Simple reshape, unique keys |
| **pivot_table()** | Long → Wide | DataFrame | ✅ Yes | Summarize with aggregation |
| **melt()** | Wide → Long | DataFrame | ❌ No | Unpivot, create tidy data |
| **stack()** | Wide → Long | DataFrame | ❌ No | Index/column manipulation |
| **unstack()** | Long → Wide | MultiIndex | ❌ No | Flatten MultiIndex |
| **crosstab()** | Data → Table | Series/Arrays | ✅ Yes | Frequency tables, counts |

### Quick Selection Guide

**Q: I have months as columns, want them as rows?**
→ Use `melt()`

**Q: I want to create a report with products as columns and regions as rows?**
→ Use `pivot_table()`

**Q: I need to count how many times each category × subcategory appears?**
→ Use `crosstab()`

**Q: After groupby, I want to spread one column into multiple columns?**
→ Use `unstack()`

**Q: I have unique data and just need to reshape?**
→ Use `pivot()`

### Equivalent Operations

```python
# These produce similar results:

# Method 1: pivot_table
df.pivot_table(index='A', columns='B', values='C', aggfunc='sum')

# Method 2: groupby + unstack
df.groupby(['A', 'B'])['C'].sum().unstack()

# Method 3: crosstab with values
pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='sum')
```
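
One place the three forms differ is how a missing combination gets filled: each leaves NaN by default, and each has its own way to substitute a value. A sketch on a made-up frame where the `(y, q)` pair never occurs:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': ['p', 'q', 'p'],   # the (y, q) combination never occurs
    'C': [1, 2, 3],
})

# pivot_table: fill_value argument
m1 = df.pivot_table(index='A', columns='B', values='C',
                    aggfunc='sum', fill_value=0)

# groupby + unstack: fill_value on unstack()
m2 = df.groupby(['A', 'B'])['C'].sum().unstack(fill_value=0)

# crosstab with values: fill afterwards with fillna()
m3 = pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='sum').fillna(0)

print(m1.loc['y', 'q'], m2.loc['y', 'q'], m3.loc['y', 'q'])
print((m1.values == m2.values).all() and (m2.values == m3.values).all())  # True
```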

In [8]:
print("=== METHOD COMPARISON EXAMPLES ===\n")

# Example 1: Same result, different methods
print("Example 1: Three ways to get salesperson × month totals\n")

print("Method 1: pivot_table()")
method1 = sales_long.pivot_table(
    index='salesperson',
    columns='month',
    values='sales_amount',
    aggfunc='sum'
)
print(method1.head())
print()

print("Method 2: groupby() + unstack()")
method2 = sales_long.groupby(['salesperson', 'month'])['sales_amount'].sum().unstack()
print(method2.head())
print()

print("Method 3: crosstab() with values")
method3 = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['month'],
    values=sales_long['sales_amount'],
    aggfunc='sum'
)
print(method3.head())
print()
print("All three methods produce the same result!\n")

# Example 2: When to use which
print("="*70)
print("Example 2: Method selection based on task\n")

print("Task: Spreadsheet → Database format")
print("Method: melt()")
wide_data = pd.DataFrame({
    'id': [1, 2],
    'Q1': [100, 90],
    'Q2': [110, 95]
})
long_data = wide_data.melt(id_vars=['id'], var_name='quarter', value_name='sales')
print(long_data)
print()

print("Task: Create summary report")
print("Method: pivot_table()")
summary = sales_long.pivot_table(
    index='product',
    columns='region',
    values='sales_amount',
    aggfunc='sum',
    fill_value=0
)
print(summary)
print()

print("Task: Count occurrences")
print("Method: crosstab()")
counts = pd.crosstab(
    index=sales_long['product'],
    columns=sales_long['region']
)
print(counts)
print()

print("Task: Flatten MultiIndex from groupby")
print("Method: unstack()")
grouped = sales_long.groupby(['salesperson', 'product'])['sales_amount'].mean()
flattened = grouped.unstack().round(0)
print(flattened.head())

=== METHOD COMPARISON EXAMPLES ===

Example 1: Three ways to get salesperson × month totals

Method 1: pivot_table()
month          Apr    Feb    Jan    Mar
salesperson                            
Alice        18412  19789  14098  12332
Bob          18213  14670  14079  18127
Charlie      19377  14397  15714  17751
Diana        18231  22260  15940  11429

Method 2: groupby() + unstack()
month          Apr    Feb    Jan    Mar
salesperson                            
Alice        18412  19789  14098  12332
Bob          18213  14670  14079  18127
Charlie      19377  14397  15714  17751
Diana        18231  22260  15940  11429

Method 3: crosstab() with values
month          Apr    Feb    Jan    Mar
salesperson                            
Alice        18412  19789  14098  12332
Bob          18213  14670  14079  18127
Charlie      19377  14397  15714  17751
Diana        18231  22260  15940  11429

All three methods produce the same result!

Example 2: Method selection based on task

Task: S

## 7. Real-World Reshaping Scenarios

### Scenario 1: Excel Report → Analysis Format

**Problem**: Excel data has months as columns
**Solution**: Use `melt()` to convert to long format

### Scenario 2: Database → Executive Dashboard

**Problem**: Need to create comparison table
**Solution**: Use `pivot_table()` with aggregation

### Scenario 3: Time Series Forecasting

**Problem**: Each product in separate column
**Solution**: Use `melt()` to stack, then model

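One way Scenario 3 might look in practice is sketched below. The frame, its column names, and the one-month lag feature are all assumptions for illustration; the lag only stands in for whatever model input is actually needed.

```python
import pandas as pd

# Hypothetical wide time series: one column per product
wide = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=4, freq='MS'),
    'Laptop': [100, 110, 120, 130],
    'Phone': [200, 190, 210, 220],
})

# Stack the product columns into a single 'sales' column
long = wide.melt(id_vars='date', var_name='product', value_name='sales')
long = long.sort_values(['product', 'date']).reset_index(drop=True)

# Per-product lag feature: the previous month's sales (NaN for the first month)
long['sales_lag1'] = long.groupby('product')['sales'].shift(1)
print(long)
```
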
### Scenario 4: A/B Test Results

**Problem**: Count conversions by variant × segment
**Solution**: Use `crosstab()` with normalization

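A possible shape for Scenario 4, using a made-up A/B test log (the column names `variant`, `segment`, `converted` and all values are assumptions). Passing two row keys to `crosstab()` and normalizing by row turns the conversion counts into rates.

```python
import pandas as pd

# Hypothetical A/B test log: one row per user, converted is a 0/1 flag
ab = pd.DataFrame({
    'variant':   ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'segment':   ['new', 'new', 'returning', 'returning',
                  'new', 'new', 'returning', 'returning'],
    'converted': [1, 0, 1, 1, 1, 1, 0, 1],
})

rows = [ab['variant'], ab['segment']]

# Raw counts of non-converted (0) and converted (1) users per variant x segment
counts = pd.crosstab(rows, ab['converted'])

# Row-normalized: column 1 is the conversion rate in percent
rates = pd.crosstab(rows, ab['converted'], normalize='index') * 100
print(rates.round(1))
```
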
### Scenario 5: Multi-Level Reporting

**Problem**: Group by region, product, month
**Solution**: Use `pivot_table()` with multiple index/columns

### Common Business Reports

```python
# 1. Product Performance Matrix
df.pivot_table(
    index='product',
    columns='quarter',
    values='revenue',
    aggfunc='sum',
    margins=True
)

# 2. Regional Comparison
df.pivot_table(
    index='region',
    columns='product_category',
    values=['revenue', 'units'],
    aggfunc='sum'
)

# 3. Customer Segmentation
pd.crosstab(
    index=df['age_group'],
    columns=df['purchase_category'],
    normalize='index'
) * 100
```

In [9]:
print("=== REAL-WORLD RESHAPING SCENARIOS ===\n")

# Scenario 1: Excel report to analysis format
print("Scenario 1: Convert quarterly Excel report to analysis format\n")
excel_report = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Q1_2024': [50000, 80000, 30000],
    'Q2_2024': [55000, 85000, 32000],
    'Q3_2024': [60000, 90000, 35000],
    'Q4_2024': [65000, 95000, 38000]
})

print("Original Excel format (wide):")
print(excel_report)

analysis_format = excel_report.melt(
    id_vars=['Product'],
    var_name='Quarter',
    value_name='Revenue'
)
print("\nConverted to analysis format (long):")
print(analysis_format)

print("\nNow we can easily:")
print(f"- Calculate total per product: {analysis_format.groupby('Product')['Revenue'].sum().values}")
print(f"- Find best quarter: {analysis_format.groupby('Quarter')['Revenue'].sum().idxmax()}")
print()

# Scenario 2: Executive dashboard
print("="*70)
print("Scenario 2: Create executive summary table\n")

dashboard = sales_long.pivot_table(
    index='product',
    columns='region',
    values='sales_amount',
    aggfunc=['sum', 'mean'],
    margins=True,
    margins_name='TOTAL/AVG'
)
print("Product × Region Performance Dashboard:")
print(dashboard.round(0))
print()

# Scenario 3: Market share analysis
print("="*70)
print("Scenario 3: Market share by product and region\n")

market_share = pd.crosstab(
    index=sales_long['product'],
    columns=sales_long['region'],
    values=sales_long['sales_amount'],
    aggfunc='sum',
    normalize='columns'
) * 100

print("Market Share % (by region):")
print(market_share.round(1))
print("\nEach column sums to 100%")
print()

# Scenario 4: Time-based comparison
print("="*70)
print("Scenario 4: Month-over-month comparison\n")

monthly = sales_long.pivot_table(
    index='product',
    columns='month',
    values='sales_amount',
    aggfunc='sum'
)

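# pivot_table sorts month labels alphabetically (Apr, Feb, Jan, Mar);
# restore calendar order so changes are computed between consecutive months
monthly = monthly.reindex(columns=['Jan', 'Feb', 'Mar', 'Apr'])

print("Monthly sales by product:")
print(monthly)

# Calculate month-over-month change
print("\nMonth-over-month change:")
mom_change = monthly.pct_change(axis=1) * 100
print(mom_change.round(1))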
print()

# Scenario 5: Multi-dimensional analysis
print("="*70)
print("Scenario 5: Multi-dimensional performance report\n")

multi_dim = sales_long.pivot_table(
    index=['region', 'product'],
    columns='month',
    values='sales_amount',
    aggfunc='sum',
    fill_value=0
)

print("Region × Product × Month analysis:")
print(multi_dim)
print("\nHierarchical view of performance across all dimensions")

=== REAL-WORLD RESHAPING SCENARIOS ===

Scenario 1: Convert quarterly Excel report to analysis format

Original Excel format (wide):
  Product  Q1_2024  Q2_2024  Q3_2024  Q4_2024
0  Laptop    50000    55000    60000    65000
1   Phone    80000    85000    90000    95000
2  Tablet    30000    32000    35000    38000

Converted to analysis format (long):
   Product  Quarter  Revenue
0   Laptop  Q1_2024    50000
1    Phone  Q1_2024    80000
2   Tablet  Q1_2024    30000
3   Laptop  Q2_2024    55000
4    Phone  Q2_2024    85000
5   Tablet  Q2_2024    32000
6   Laptop  Q3_2024    60000
7    Phone  Q3_2024    90000
8   Tablet  Q3_2024    35000
9   Laptop  Q4_2024    65000
10   Phone  Q4_2024    95000
11  Tablet  Q4_2024    38000

Now we can easily:
- Calculate total per product: [230000 350000 135000]
- Find best quarter: Q4_2024

Scenario 2: Create executive summary table

Product × Region Performance Dashboard:
             sum                             mean                          
reg

## 8. Best Practices & Common Pitfalls

### Best Practices ✅

**1. Choose the Right Method**
```python
# ✅ Simple reshape, unique keys
df.pivot(index='A', columns='B', values='C')

# ✅ Need aggregation, duplicates exist
df.pivot_table(index='A', columns='B', values='C', aggfunc='sum')

# ✅ Frequency counts
pd.crosstab(df['A'], df['B'])
```

**2. Use Meaningful Names**
```python
# ✅ Clear variable names
df.melt(id_vars=['customer_id'],
        var_name='month',
        value_name='revenue')

# ❌ Default names
df.melt(id_vars=['customer_id'])
```

**3. Handle Missing Values**
```python
# ✅ Fill NaN explicitly
df.pivot_table(..., fill_value=0)

# ✅ Or handle after
pivoted.fillna(0)
```

**4. Reset Index When Needed**
```python
# ✅ Clean DataFrame for further analysis
result = df.pivot_table(...).reset_index()

# ❌ Index makes access harder
result = df.pivot_table(...)
```
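
A small sketch of what the reset looks like on concrete data. The frame and its column names are hypothetical, and removing the leftover column-axis label is an optional extra step rather than something the tip requires.

```python
import pandas as pd

# Hypothetical long-format sales data
df = pd.DataFrame({
    'product': ['Laptop', 'Laptop', 'Phone', 'Phone'],
    'region':  ['East', 'West', 'East', 'West'],
    'sales':   [10, 20, 30, 40],
})

pivoted = df.pivot_table(index='product', columns='region',
                         values='sales', aggfunc='sum')

# reset_index() turns 'product' back into an ordinary column;
# rename_axis(None, axis=1) drops the leftover 'region' label on the column axis
flat = pivoted.reset_index().rename_axis(None, axis=1)
print(list(flat.columns))  # ['product', 'East', 'West']
```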

**5. Add Totals for Reports**
```python
# ✅ Include margins for complete view
df.pivot_table(..., margins=True, margins_name='Total')
```

### Common Pitfalls ❌

**1. Using pivot() with Duplicates**
```python
# ❌ Will raise ValueError
df.pivot(index='A', columns='B', values='C')  # If A-B duplicates exist

# ✅ Use pivot_table instead
df.pivot_table(index='A', columns='B', values='C', aggfunc='sum')
```
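
To see the failure concretely, here is a minimal sketch with a made-up frame in which the pair `('x', 'p')` occurs twice. Checking for duplicates with `duplicated()` first is one possible guard, not the only one.

```python
import pandas as pd

# Hypothetical data: the (A, B) pair ('x', 'p') appears twice
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': ['p', 'p', 'q'],
    'C': [1, 2, 3],
})

# Detect duplicate (A, B) pairs before choosing a method
has_duplicates = df.duplicated(subset=['A', 'B']).any()
print(f"Duplicate keys present: {has_duplicates}")

try:
    df.pivot(index='A', columns='B', values='C')
except ValueError as exc:
    print(f"pivot() failed: {exc}")

# pivot_table() resolves the duplicates by aggregating them
summed = df.pivot_table(index='A', columns='B', values='C', aggfunc='sum')
print(summed)
```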

**2. Forgetting aggfunc in pivot_table**
```python
# ❌ Default is 'mean', might not be what you want
df.pivot_table(index='A', columns='B', values='C')

# ✅ Explicit aggregation
df.pivot_table(index='A', columns='B', values='C', aggfunc='sum')
```
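
The difference is easy to miss because both calls succeed. A short sketch with assumed data shows the two results side by side.

```python
import pandas as pd

# Hypothetical data: two rows share the key ('x', 'p')
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': ['p', 'p', 'p'],
    'C': [10, 30, 5],
})

default_agg = df.pivot_table(index='A', columns='B', values='C')  # aggfunc='mean'
explicit_sum = df.pivot_table(index='A', columns='B', values='C', aggfunc='sum')

print(default_agg.loc['x', 'p'])   # 20.0 (mean of 10 and 30)
print(explicit_sum.loc['x', 'p'])  # 40   (sum of 10 and 30)
```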

**3. Not Specifying value_vars in melt**
```python
# ❌ Melts ALL columns except id_vars
df.melt(id_vars=['id'])

# ✅ Specify which columns to melt
df.melt(id_vars=['id'], value_vars=['Jan', 'Feb', 'Mar'])
```
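
The following sketch illustrates the pitfall on a made-up frame that carries an extra descriptive column, `name`. Without `value_vars`, that column is melted together with the months.

```python
import pandas as pd

# Hypothetical wide table with a text column next to the month columns
wide = pd.DataFrame({
    'id': [1, 2],
    'name': ['a', 'b'],
    'Jan': [10, 20],
    'Feb': [30, 40],
})

# 'name' is melted too, so text and numbers end up in one 'value' column
everything = wide.melt(id_vars=['id'])

# Only the month columns are unpivoted; 'name' is kept as an identifier
months_only = wide.melt(
    id_vars=['id', 'name'],
    value_vars=['Jan', 'Feb'],
    var_name='month',
    value_name='sales',
)
print(len(everything), len(months_only))  # 6 4
```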

**4. Ignoring MultiIndex Columns**
```python
# ❌ Complicated MultiIndex columns
result = df.pivot_table(index='A', columns='B', values=['X', 'Y'])
# Accessing: result[('X', 'B1')] is confusing

# ✅ Flatten if needed
result.columns = ['_'.join(col) for col in result.columns]
```
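
A runnable version of the flattening step, on assumed data. The `map(str, ...)` call is an addition not present in the snippet above: it keeps the join from failing when a column level holds non-string labels such as years.

```python
import pandas as pd

# Hypothetical data with two value columns
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a2'],
    'B': ['b1', 'b2', 'b1', 'b2'],
    'X': [1, 2, 3, 4],
    'Y': [10, 20, 30, 40],
})

result = df.pivot_table(index='A', columns='B', values=['X', 'Y'], aggfunc='sum')

# Join the two column levels into single names such as 'X_b1'
result.columns = ['_'.join(map(str, col)) for col in result.columns]
print(list(result.columns))  # ['X_b1', 'X_b2', 'Y_b1', 'Y_b2']
```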

**5. Not Handling NaN After Reshape**
```python
# ❌ NaN values remain
pivoted = df.pivot_table(...)

# ✅ Handle NaN appropriately
pivoted = df.pivot_table(..., fill_value=0)
# or
pivoted = pivoted.ffill()  # Forward fill (fillna(method='ffill') is deprecated)
```

### Performance Tips 🚀

**1. Use pivot over pivot_table when possible**
```python
# Faster if data is already aggregated
df.pivot(...)  # vs df.pivot_table(...)
```

**2. Filter Before Reshaping**
```python
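# ✅ Filter first
df_filtered = df[df['year'] == 2024]
df_filtered.pivot_table(...)

# ❌ Reshape then filter (slower): every year is aggregated before one is kept.
# The filter must also use the pivoted index (e.g. 'year' as an index level),
# not a boolean mask built from the original df
df.pivot_table(...).loc[2024]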
```
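
Both orders can give the same table, which the sketch below demonstrates on a tiny made-up frame. Any speed difference would only show up on much larger data and is not measured here.

```python
import pandas as pd

# Hypothetical multi-year sales data
df = pd.DataFrame({
    'year':    [2023, 2023, 2024, 2024],
    'product': ['Laptop', 'Phone', 'Laptop', 'Phone'],
    'region':  ['East', 'East', 'East', 'West'],
    'sales':   [10, 20, 30, 40],
})

# Filter first: only 2024 rows are aggregated
report_2024 = df[df['year'] == 2024].pivot_table(
    index='product', columns='region', values='sales',
    aggfunc='sum', fill_value=0,
)

# Reshape first: all years are aggregated, then one year is selected
full = df.pivot_table(
    index=['year', 'product'], columns='region', values='sales',
    aggfunc='sum', fill_value=0,
)
same_report = full.loc[2024]

print(report_2024)
```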

**3. Use Categorical Data Types**
```python
# Faster for repeated values
df['category'] = df['category'].astype('category')
df.pivot_table(...)
```
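
A rough sketch of the conversion on synthetic data (column names and sizes are assumptions). It measures memory rather than speed, since timing depends on the data and the pandas version. `observed=True` is passed so that only category combinations present in the data appear in the result.

```python
import pandas as pd

# Hypothetical data with a few heavily repeated labels
df = pd.DataFrame({
    'category': ['books', 'toys', 'games'] * 1000,
    'region':   ['East', 'West'] * 1500,
    'sales':    range(3000),
})

before = df['category'].memory_usage(deep=True)
df['category'] = df['category'].astype('category')
after = df['category'].memory_usage(deep=True)
print(f"bytes before: {before}, after: {after}")

table = df.pivot_table(index='category', columns='region', values='sales',
                       aggfunc='sum', observed=True)
print(table)
```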

## 9. Practice Exercises

Use the sales_long dataset for these exercises.

### Beginner Level (1-5)

1. **Basic pivot**
   - Create salesperson × month sales table using pivot_table

2. **Simple melt**
   - Convert a wide format table to long format

3. **Frequency table**
   - Count occurrences of product × region using crosstab

4. **Add totals**
   - Create pivot table with row and column totals

5. **Fill missing values**
   - Pivot table with fill_value=0

### Intermediate Level (6-10)

6. **Multiple aggregations**
   - Pivot table with both sum and mean

7. **Percentage distribution**
   - Use crosstab with normalize='index'

8. **Unstack after groupby**
   - Group by two columns, then unstack

9. **Multi-level pivot**
   - Pivot with multiple index columns

10. **Round trip**
    - Pivot data, then melt it back

### Advanced Level (11-15)

11. **Complex melt**
    - Melt with multiple id_vars and custom names

12. **Market share calculation**
    - Pivot table with column-wise percentages

13. **Time series pivot**
    - Create month-over-month comparison table

14. **Multi-dimensional report**
    - Pivot with 3+ dimensions using MultiIndex

15. **Custom aggregation**
    - Pivot table with custom aggregation function

### Challenge Problems (16-20)

16. **Executive dashboard**
    - Create comprehensive report with multiple metrics

17. **Growth rate analysis**
    - Pivot by time period, calculate growth rates

18. **Customer cohort analysis**
    - Reshape and analyze customer behavior over time

19. **Product portfolio matrix**
    - Create BCG-style matrix (sales vs growth)

20. **Automated report generator**
    - Function that takes parameters and generates formatted pivot table

In [10]:
print("=== PRACTICE EXERCISE SOLUTIONS ===\n")
print("Try solving exercises first, then check solutions!\n")

# Solution 1
print("Solution 1: Basic pivot - salesperson × month")
sol1 = sales_long.pivot_table(
    index='salesperson',
    columns='month',
    values='sales_amount',
    aggfunc='sum'
)
print(sol1)
print()

# Solution 3
print("Solution 3: Frequency table - product × region")
sol3 = pd.crosstab(
    index=sales_long['product'],
    columns=sales_long['region']
)
print(sol3)
print()

# Solution 4
print("Solution 4: Pivot table with totals")
sol4 = sales_long.pivot_table(
    index='product',
    columns='region',
    values='sales_amount',
    aggfunc='sum',
    margins=True,
    margins_name='TOTAL'
)
print(sol4)
print()

# Solution 6
print("Solution 6: Multiple aggregations (sum and mean)")
sol6 = sales_long.pivot_table(
    index='salesperson',
    columns='product',
    values='sales_amount',
    aggfunc=['sum', 'mean']
)
print(sol6.round(0))
print()

# Solution 7
print("Solution 7: Percentage distribution (row percentages)")
sol7 = pd.crosstab(
    index=sales_long['salesperson'],
    columns=sales_long['product'],
    normalize='index'
) * 100
print(sol7.round(1))
print()

# Solution 12
print("Solution 12: Market share calculation")
sol12 = pd.crosstab(
    index=sales_long['product'],
    columns=sales_long['region'],
    values=sales_long['sales_amount'],
    aggfunc='sum',
    normalize='columns'
) * 100
print("Market share % by region:")
print(sol12.round(1))
print()

# Solution 15
print("Solution 15: Custom aggregation function")
def revenue_range(x):
    return x.max() - x.min()

sol15 = sales_long.pivot_table(
    index='product',
    columns='region',
    values='sales_amount',
    aggfunc=revenue_range
)
print("Revenue range by product × region:")
print(sol15)
print()

# Solution 20
print("Solution 20: Automated report generator")
def generate_sales_report(df, row_var, col_var, metric='sales_amount', agg='sum'):
    """Generate formatted sales report"""
    report = df.pivot_table(
        index=row_var,
        columns=col_var,
        values=metric,
        aggfunc=agg,
        margins=True,
        margins_name='TOTAL'
    )
    return report.round(0)

print("Product × Month report:")
print(generate_sales_report(sales_long, 'product', 'month'))

print("\n" + "="*80)
print("Try solving the remaining exercises on your own!")
print("="*80)

=== PRACTICE EXERCISE SOLUTIONS ===

Try solving exercises first, then check solutions!

Solution 1: Basic pivot - salesperson × month
month          Apr    Feb    Jan    Mar
salesperson                            
Alice        18412  19789  14098  12332
Bob          18213  14670  14079  18127
Charlie      19377  14397  15714  17751
Diana        18231  22260  15940  11429

Solution 3: Frequency table - product × region
region   East  North  West
product                   
Laptop      4      8     4
Phone       4      8     4
Tablet      4      8     4

Solution 4: Pivot table with totals
region    East   North   West   TOTAL
product                              
Laptop   18600   50487  22511   91598
Phone    23370   40810  22132   86312
Tablet   22661   41652  22596   86909
TOTAL    64631  132949  67239  264819

Solution 6: Multiple aggregations (sum and mean)
               sum                  mean                
product     Laptop  Phone Tablet  Laptop   Phone  Tablet
salesperson

## Quick Reference Card

### pivot() - Simple Reshape

```python
# Basic pivot (each index/columns pair must be unique)
df.pivot(index='row_labels', 
         columns='col_labels', 
         values='data')
```

### pivot_table() - With Aggregation

```python
# Full-featured pivoting
df.pivot_table(
    index='rows',              # Row labels
    columns='cols',            # Column labels
    values='data',             # Values to aggregate
    aggfunc='sum',             # mean, sum, count, etc.
    fill_value=0,              # Replace NaN
    margins=True,              # Add totals
    margins_name='Total'       # Name for totals
)
```

### melt() - Wide to Long

```python
# Unpivot data
df.melt(
    id_vars=['id_cols'],       # Columns to keep
    value_vars=['cols_to_melt'], # Columns to unpivot
    var_name='variable',       # Name for variable column
    value_name='value'         # Name for value column
)
```

### stack() / unstack()

```python
# Move index level to columns
df.unstack()                   # Unstack innermost level
df.unstack(level='month')      # Unstack specific level

# Move columns to index
df.stack()                     # Stack all columns
```
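
A short illustration of the two calls on a Series with a two-level index. The names and numbers are taken from the long-format example at the start of the notebook, trimmed to two months.

```python
import pandas as pd

# Long data as a Series indexed by (person, month)
long = pd.DataFrame({
    'person': ['Alice', 'Alice', 'Bob', 'Bob'],
    'month':  ['Jan', 'Feb', 'Jan', 'Feb'],
    'sales':  [100, 120, 90, 95],
}).set_index(['person', 'month'])['sales']

wide = long.unstack(level='month')   # the 'month' level moves to the columns
back = wide.stack()                  # the columns move back into the index

print(wide.loc['Alice', 'Feb'])      # 120
print(back.loc[('Bob', 'Jan')])      # 90
```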

### crosstab() - Frequency Tables

```python
# Count occurrences
pd.crosstab(df['row'], df['col'])

# With aggregation
pd.crosstab(
    index=df['row'],
    columns=df['col'],
    values=df['data'],
    aggfunc='sum',
    normalize=False,           # 'index', 'columns', True, or False
    margins=True
)
```

### Common Patterns

```python
# Pattern 1: Excel to analysis
df.melt(id_vars=['id'], var_name='period', value_name='amount')

# Pattern 2: Create summary report
df.pivot_table(index='product', columns='region', 
               values='sales', aggfunc='sum', margins=True)

# Pattern 3: Market share
pd.crosstab(df['product'], df['region'], 
            values=df['sales'], aggfunc='sum', 
            normalize='columns') * 100

# Pattern 4: Groupby + unstack
df.groupby(['A', 'B'])['value'].sum().unstack()

# Pattern 5: Flatten MultiIndex columns
df.columns = ['_'.join(col).strip() for col in df.columns.values]
```

### Quick Decision Guide

```
Wide → Long?
  → Use melt()

Long → Wide?
  ├─ No duplicates? → pivot()
  └─ With duplicates? → pivot_table()

Need frequencies?
  → Use crosstab()

MultiIndex manipulation?
  ├─ Index → Columns: unstack()
  └─ Columns → Index: stack()
```
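
The Long → Wide branch of the guide could be wrapped in a small helper. `long_to_wide` is a hypothetical function written for this sketch, not part of pandas, and the default `aggfunc='sum'` is an assumption that should be chosen per use case.

```python
import pandas as pd

def long_to_wide(df, index, columns, values, aggfunc='sum'):
    """Hypothetical helper: pivot() if keys are unique, else pivot_table()."""
    if df.duplicated(subset=[index, columns]).any():
        return df.pivot_table(index=index, columns=columns,
                              values=values, aggfunc=aggfunc)
    return df.pivot(index=index, columns=columns, values=values)

# Made-up inputs: one with unique (A, B) pairs, one with a repeated pair
unique_keys = pd.DataFrame({'A': ['x', 'y'], 'B': ['p', 'p'], 'C': [1, 2]})
dup_keys = pd.DataFrame({'A': ['x', 'x'], 'B': ['p', 'p'], 'C': [1, 2]})

print(long_to_wide(unique_keys, 'A', 'B', 'C'))  # takes the pivot() branch
print(long_to_wide(dup_keys, 'A', 'B', 'C'))     # takes the pivot_table() branch
```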

## Summary

### Key Concepts Mastered ✅

**1. Reshaping Fundamentals**
- **Wide format**: Columns for variables (reporting)
- **Long format**: Rows for observations (analysis)
- When to use each format
- Converting between formats

**2. Core Methods**
- **pivot()**: Simple reshape (unique keys required)
- **pivot_table()**: Reshape with aggregation (handles duplicates)
- **melt()**: Wide → Long (unpivoting)
- **stack/unstack()**: Index level manipulation
- **crosstab()**: Frequency tables and contingency analysis

**3. Advanced Techniques**
- Multiple aggregation functions
- Multi-level pivoting
- Adding row/column totals
- Percentage distributions
- Handling missing values
- MultiIndex flattening

---

### Method Comparison

| Method | Direction | Aggregation | Use Case |
|--------|-----------|-------------|----------|
| **pivot()** | Long → Wide | ❌ | Pre-aggregated data |
| **pivot_table()** | Long → Wide | ✅ | Raw data summarization |
| **melt()** | Wide → Long | ❌ | Tidy data creation |
| **unstack()** | Long → Wide | ❌ | MultiIndex to columns |
| **stack()** | Wide → Long | ❌ | Columns to MultiIndex |
| **crosstab()** | Data → Table | ✅ | Frequency analysis |

---

### Real-World Applications

**Business Reporting**
- Executive dashboards
- Product performance matrices
- Regional comparisons
- Time series summaries

**Data Analysis**
- Time series forecasting
- Customer segmentation
- Market share analysis
- A/B test results

**Data Preparation**
- Excel → Database format
- Feature engineering
- Tidy data creation
- ML model input formatting

---

### Common Workflows

**Workflow 1: Excel Report to Analysis**
```
1. Read wide Excel file
2. melt() to long format
3. Clean and analyze
4. pivot_table() for summary
5. Export report
```

**Workflow 2: Database to Dashboard**
```
1. Query long-format database
2. pivot_table() with aggregation
3. Add margins for totals
4. Format and visualize
```

**Workflow 3: Time Series Analysis**
```
1. Wide time series data
2. melt() to long format
3. Apply forecasting models
4. pivot() results back
5. Calculate period-over-period changes
```
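
Workflow 3 might be sketched as follows. The data is made up, and the "forecast" is a deliberately naive placeholder (it repeats the previous month's value) that only marks where a real model would go.

```python
import pandas as pd

# Step 1: hypothetical wide time series, one column per product
wide = pd.DataFrame({
    'month': ['2024-01', '2024-02', '2024-03'],
    'Laptop': [100.0, 110.0, 121.0],
    'Phone': [200.0, 180.0, 198.0],
})

# Step 2: wide -> long
long = wide.melt(id_vars='month', var_name='product', value_name='sales')

# Step 3: placeholder for a forecasting model (previous month's value)
long = long.sort_values(['product', 'month'])
long['forecast'] = long.groupby('product')['sales'].shift(1)

# Step 4: long -> wide again; keys are unique, so pivot() is enough
actuals = long.pivot(index='month', columns='product', values='sales')

# Step 5: period-over-period change down the rows, in percent
growth = actuals.pct_change() * 100
print(growth.round(1))
```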

---

### Remember

- 📊 **pivot_table** is the most versatile (use when in doubt)
- 🔄 **melt** is the opposite of **pivot**
- 📈 **crosstab** is best for frequencies and percentages
- 🎯 **Fill NaN** explicitly to avoid confusion
- 🏷️ **Use meaningful names** for var_name and value_name
- ✅ **Add margins** for complete business reports

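The statement that melt undoes pivot can be illustrated with a round trip. This is a sketch on made-up data; the sort and index reset are needed only because melt returns rows in a different order than the original frame.

```python
import pandas as pd

# Hypothetical long data, already sorted by person and month label
long = pd.DataFrame({
    'person': ['Alice', 'Alice', 'Bob', 'Bob'],
    'month':  ['Feb', 'Jan', 'Feb', 'Jan'],
    'sales':  [120, 100, 95, 90],
})

wide = long.pivot(index='person', columns='month', values='sales')

restored = (
    wide.reset_index()
        .melt(id_vars='person', var_name='month', value_name='sales')
        .sort_values(['person', 'month'])
        .reset_index(drop=True)
)

# Raises AssertionError if the round trip changed labels or values
pd.testing.assert_frame_equal(restored, long, check_dtype=False)
print(restored)
```
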
---

### Next Steps

After mastering pivot and reshape:
1. **Time Series** - Date-based reshaping and resampling
2. **MultiIndex** - Advanced hierarchical indexing
3. **Visualization** - Plot reshaped data
4. **SQL Integration** - Pivot in database queries
5. **Advanced Aggregation** - Custom functions and window operations

---

**Happy Reshaping! 🐼🔄**