# 🌳 Pandas Mastery Series - Advanced

Welcome to the Pandas Mastery Series - Advanced! In this notebook, we will explore advanced pandas topics to sharpen your data manipulation and analysis skills, with a practical example for each complex operation. Let's dive into the advanced techniques!

## Table of Contents

### 1. **MultiIndex**
- Creating MultiIndex
- Accessing MultiIndex
- Advanced Indexing with MultiIndex

### 2. **Advanced GroupBy**
- Grouping by Multiple Columns
- Custom Aggregations
- Grouping with Functions

### 3. **Reshaping**
- Pivot and Pivot Tables
- Stack and Unstack
- Melting DataFrames

### 4. **Time Series**
- Date Range Generation
- Time Zone Handling
- Time Series Resampling

### 5. **Merging, Joining, and Concatenating**
- Concatenating DataFrames
- Merging on Index
- Advanced Joining Techniques

### 6. **Window Functions**
- Rolling Windows
- Expanding Windows
- Applying Custom Functions

### 7. **Text Data**
- String Methods
- Regular Expressions
- Extracting Information from Text

### 8. **Performance and Optimization**
- Efficient Data Loading
- Memory Usage Reduction
- Parallel Processing

### 9. **Visualization**
- Plotting with Pandas
- Advanced Plotting Techniques
- Integrating with Other Libraries

### 10. **Fun Challenges**
- Challenge 1: The MultiIndex Mystery
- Challenge 2: The GroupBy Gauntlet
- Challenge 3: The Reshaping Riddle
- Challenge 4: The Time Series Trial
- Challenge 5: The Optimization Obstacle

### Ready for the Ultimate Challenge?

Once you've completed all the notebooks in the Pandas Mastery Series, you'll be ready to tackle the final challenge: [Pandas Mastery Series - Ultimate Challenge](https://www.kaggle.com/code/matinmahmoudi/pandas-mastery-series-ultimate-challenge). This ultimate challenge will put your pandas skills to the test and ensure you're truly a pandas master.

Let's get started on the path to advanced pandas mastery!


# 1. MultiIndex

MultiIndex (also known as hierarchical indexing) adds multiple levels of labels to the index (or columns) of a pandas Series or DataFrame. This lets you represent higher-dimensional data in a one-dimensional Series or a two-dimensional DataFrame and select it by any combination of levels.

### Creating MultiIndex
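A MultiIndex can be built from parallel arrays (`pd.MultiIndex.from_arrays`), a list of tuples (`pd.MultiIndex.from_tuples`), or the Cartesian product of several iterables (`pd.MultiIndex.from_product`). You can also turn existing columns of a DataFrame into a MultiIndex with `DataFrame.set_index`.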
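A minimal sketch of three other ways to build a MultiIndex besides `from_arrays`: `from_tuples`, `from_product`, and `set_index` on existing columns. It reuses a few characters from this notebook; the `'Location'` level and its values in the `from_product` call are hypothetical, chosen only to show that every combination is generated.

```python
import pandas as pd

# From a list of (level-0, level-1) tuples
idx_tuples = pd.MultiIndex.from_tuples(
    [('Hobbit', 'Frodo'), ('Hobbit', 'Sam'), ('Wizard', 'Gandalf')],
    names=['Race', 'Character'],
)

# From the Cartesian product of iterables (hypothetical 'Location' level)
idx_product = pd.MultiIndex.from_product(
    [['Hobbit', 'Elf'], ['Shire', 'Rivendell']],
    names=['Race', 'Location'],
)

# From existing columns: pass a list of column names to set_index
df = pd.DataFrame({
    'Race': ['Hobbit', 'Hobbit', 'Wizard'],
    'Character': ['Frodo', 'Sam', 'Gandalf'],
    'Age': [50, 38, 2019],
})
df_indexed = df.set_index(['Race', 'Character'])

print(idx_tuples.nlevels)   # 2
print(len(idx_product))     # 4 (2 races x 2 locations)
print(df_indexed)
```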

### Accessing MultiIndex
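You select data by passing labels for one or more levels: a single outer-level label to `.loc` returns every row under it, a full tuple such as `('Wizard', 'Gandalf')` selects a single row, and `.xs()` takes a cross-section at any level.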
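A short sketch of three access patterns on a trimmed copy of this notebook's data (ages only): a full tuple key plus a column label with `.loc`, a cross-section on the inner `Character` level with `.xs`, and `get_level_values` to read the labels of one level.

```python
import pandas as pd

arrays = [
    ['Hobbit', 'Hobbit', 'Wizard', 'Human', 'Elf'],
    ['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas'],
]
index = pd.MultiIndex.from_arrays(arrays, names=('Race', 'Character'))
df = pd.DataFrame({'Age': [50, 38, 2019, 87, 2931]}, index=index)

# Full (Race, Character) key plus a column label returns a single scalar
gandalf_age = df.loc[('Wizard', 'Gandalf'), 'Age']

# Cross-section on the inner level: select 'Sam' whatever his race;
# the 'Character' level is dropped from the result
sam_row = df.xs('Sam', level='Character')

# Labels of one level, aligned row by row with the DataFrame
races = df.index.get_level_values('Race')

print(gandalf_age)   # 2019
print(sam_row)
print(list(races))
```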

### Advanced Indexing with MultiIndex
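MultiIndex objects also support slicing on one or more levels at once, for example with `pd.IndexSlice`. Label slicing expects a sorted index (call `sort_index()` first); on an unsorted MultiIndex, pandas can raise an `UnsortedIndexError`.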
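A minimal sketch of level-aware slicing with `pd.IndexSlice`. It sorts the index first, since label slices on an unsorted MultiIndex can fail; the slice bounds are simply existing labels picked for illustration.

```python
import pandas as pd

arrays = [
    ['Hobbit', 'Hobbit', 'Wizard', 'Human', 'Elf'],
    ['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas'],
]
index = pd.MultiIndex.from_arrays(arrays, names=('Race', 'Character'))
df = pd.DataFrame({'Age': [50, 38, 2019, 87, 2931]}, index=index)

# Sort so that both levels are lexsorted before slicing by label
df_sorted = df.sort_index()
idx = pd.IndexSlice

# Range on the outer level (inclusive): races 'Elf' through 'Human'
elf_to_human = df_sorted.loc[idx['Elf':'Human', :], :]

# Any race, characters 'Aragorn' through 'Gandalf' on the inner level
aragorn_to_gandalf = df_sorted.loc[idx[:, 'Aragorn':'Gandalf'], :]

print(elf_to_human)
print(aragorn_to_gandalf)
```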


```python
# Import pandas library
import pandas as pd

# Creating a MultiIndex from two parallel arrays (one per level)
arrays = [
    ['Hobbit', 'Hobbit', 'Wizard', 'Human', 'Elf'],
    ['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas']
]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('Race', 'Character'))
print("MultiIndex from arrays:\n", multi_index)

# Creating a DataFrame with MultiIndex
data = {
    'Age': [50, 38, 2019, 87, 2931],
    'Role': ['Ring-bearer', 'Gardener', 'Wizard', 'King', 'Archer']
}
df_multi = pd.DataFrame(data, index=multi_index)
print("\nDataFrame with MultiIndex:\n", df_multi)

# Accessing MultiIndex
# A single outer-level label returns all rows for the 'Hobbit' race
hobbit_data = df_multi.loc['Hobbit']
print("\nData for 'Hobbit' race:\n", hobbit_data)

# A full (Race, Character) tuple selects exactly one row, returned as a Series
gandalf_data = df_multi.loc[('Wizard', 'Gandalf')]
print("\nData for 'Gandalf' in 'Wizard' race:\n", gandalf_data)

# Advanced Indexing with MultiIndex
# Selecting several outer-level labels at once with a list
hobbit_wizard_data = df_multi.loc[['Hobbit', 'Wizard']]
print("\nData for 'Hobbit' and 'Wizard' races:\n", hobbit_wizard_data)
```


```text
MultiIndex from arrays:
 MultiIndex([('Hobbit',   'Frodo'),
            ('Hobbit',     'Sam'),
            ('Wizard', 'Gandalf'),
            ( 'Human', 'Aragorn'),
            (   'Elf', 'Legolas')],
           names=['Race', 'Character'])

DataFrame with MultiIndex:
                    Age         Role
Race   Character                   
Hobbit Frodo        50  Ring-bearer
       Sam          38     Gardener
Wizard Gandalf    2019       Wizard
Human  Aragorn      87         King
Elf    Legolas    2931       Archer

Data for 'Hobbit' race:
            Age         Role
Character                  
Frodo       50  Ring-bearer
Sam         38     Gardener

Data for 'Gandalf' in 'Wizard' race:
 Age       2019
Role    Wizard
Name: (Wizard, Gandalf), dtype: object

Data for 'Hobbit' and 'Wizard' races:
                    Age         Role
Race   Character                   
Hobbit Frodo        50  Ring-bearer
       Sam          38     Gardener
Wizard Gandalf    2019       Wizard
```


# 2. Advanced GroupBy

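GroupBy follows the split-apply-combine pattern: pandas splits the rows into groups by one or more keys, applies a function to each group, and combines the results into a new object. Advanced GroupBy techniques include grouping by multiple columns, custom aggregations, and grouping with functions.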

### Grouping by Multiple Columns
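Passing a list of column names to `groupby` creates one group per unique combination of values, which allows more granular analysis. The aggregated result is indexed by a MultiIndex with one level per key.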
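A sketch of grouping by two keys with this notebook's characters. The second key, `AgeGroup`, is derived with the same under-100 cutoff as the Grouping with Functions code; `size()` returns a Series indexed by a two-level MultiIndex, and `unstack` spreads the inner level into columns for a compact count table.

```python
import pandas as pd

df = pd.DataFrame({
    'Character': ['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas',
                  'Boromir', 'Gimli', 'Pippin', 'Merry'],
    'Race': ['Hobbit', 'Hobbit', 'Wizard', 'Human', 'Elf',
             'Human', 'Dwarf', 'Hobbit', 'Hobbit'],
    'Age': [50, 38, 2019, 87, 2931, 41, 139, 29, 37],
})
# Second grouping key: 'Young' below 100 years, 'Old' otherwise
df['AgeGroup'] = ['Young' if age < 100 else 'Old' for age in df['Age']]

# One row per (Race, AgeGroup) combination that actually occurs
counts = df.groupby(['Race', 'AgeGroup']).size()

# Move the inner 'AgeGroup' level into columns; missing combinations become 0
table = counts.unstack(fill_value=0)

print(counts)
print(table)
```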

### Custom Aggregations
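Besides built-in names such as `'mean'` or `'size'`, `agg` accepts your own functions that reduce each group to a single value. Named aggregation (`new_column=(source_column, func)`) lets you label each output column.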
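A sketch of mixing built-in and custom reducers in one `agg` call, using only the Hobbit and Human rows from this notebook. The function name `spread` is our own choice; when `agg` receives a list, pandas labels each result column with the string name or the callable's `__name__`.

```python
import pandas as pd

df = pd.DataFrame({
    'Race': ['Hobbit', 'Hobbit', 'Hobbit', 'Hobbit', 'Human', 'Human'],
    'Age': [50, 38, 29, 37, 87, 41],
})

def spread(series):
    """Custom reducer: age difference between oldest and youngest member."""
    return series.max() - series.min()

# Built-in aggregations by name, plus the custom callable
summary = df.groupby('Race')['Age'].agg(['mean', 'median', spread])
print(summary)
```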

### Grouping with Functions
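Instead of a column name, `groupby` also accepts a function: pandas calls it on each index label and uses the return values as group keys. Alternatively, apply a function to a column (with `map` or `apply`) and group by the result, either directly or after storing it in a new column.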
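A sketch of both function-based approaches, assuming the character names are stored in the index. Grouping by `len` (name length) has no deeper meaning and only illustrates passing a callable; the second call groups by a Series computed with `map`, so no helper column is added to the DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [50, 38, 2019, 87, 2931, 41, 139, 29, 37]},
    index=['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas',
           'Boromir', 'Gimli', 'Pippin', 'Merry'],
)

# 1) A callable: len() is applied to each index label (a name)
by_length = df.groupby(len)['Age'].count()

# 2) A function mapped over a column; groupby aligns the result on the index
def age_group(age):
    return 'Young' if age < 100 else 'Old'

by_age_group = df.groupby(df['Age'].map(age_group))['Age'].mean()

print(by_length)
print(by_age_group)
```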


```python
# Import pandas library
import pandas as pd

# Creating a DataFrame for GroupBy operations
data = {
    'Character': ['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas', 'Boromir', 'Gimli', 'Pippin', 'Merry'],
    'Race': ['Hobbit', 'Hobbit', 'Wizard', 'Human', 'Elf', 'Human', 'Dwarf', 'Hobbit', 'Hobbit'],
    'Age': [50, 38, 2019, 87, 2931, 41, 139, 29, 37]
}
df = pd.DataFrame(data)

# Grouping by multiple columns (result is indexed by a Race/Character MultiIndex)
grouped_multi = df.groupby(['Race', 'Character'])['Age'].mean()
print("Grouped by Race and Character:\n", grouped_multi)

# Custom Aggregations
# Define a custom aggregation function: age range (max minus min) within a group
def range_agg(series):
    return series.max() - series.min()

# Named aggregation: output_column=(input_column, function)
custom_agg = df.groupby('Race').agg(
    count=('Age', 'size'),
    mean_age=('Age', 'mean'),
    age_range=('Age', range_agg)
)
print("\nCustom Aggregations:\n", custom_agg)

# Grouping with Functions
# Define a function to group characters into 'Young' and 'Old'
def age_group(age):
    return 'Young' if age < 100 else 'Old'

# Apply the function to the 'Age' column to create a new column 'AgeGroup'
df['AgeGroup'] = df['Age'].apply(age_group)

# Group by 'AgeGroup' and calculate the mean for numeric columns
grouped_func = df.groupby('AgeGroup').mean(numeric_only=True)
print("\nGrouped by Age Group:\n", grouped_func)
```



```text
Grouped by Race and Character:
 Race    Character
Dwarf   Gimli         139.0
Elf     Legolas      2931.0
Hobbit  Frodo          50.0
        Merry          37.0
        Pippin         29.0
        Sam            38.0
Human   Aragorn        87.0
        Boromir        41.0
Wizard  Gandalf      2019.0
Name: Age, dtype: float64

Custom Aggregations:
         count  mean_age  age_range
Race                              
Dwarf       1     139.0          0
Elf         1    2931.0          0
Hobbit      4      38.5         21
Human       2      64.0         46
Wizard      1    2019.0          0

Grouped by Age Group:
                   Age
AgeGroup             
Old       1696.333333
Young       47.000000
```


# 3. Reshaping

Reshaping data in pandas involves changing the layout of a DataFrame or Series. This can include pivoting, stacking, unstacking, and melting data.

### Pivot and Pivot Tables
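`pivot` reshapes data by turning the unique values of one column into the new index and those of another into the new columns. It does not aggregate, so each index/column pair must be unique. `pivot_table` handles duplicate pairs by aggregating them with `aggfunc` (the mean by default), which makes it the tool for summary tables.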
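A sketch contrasting the two, reusing this notebook's characters with a derived `AgeGroup` column (the same under-100 cutoff used in the GroupBy section). Several rows share the pair (`Hobbit`, `Young`), so `pivot` refuses to reshape while `pivot_table` averages them.

```python
import pandas as pd

df = pd.DataFrame({
    'Character': ['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas',
                  'Boromir', 'Gimli', 'Pippin', 'Merry'],
    'Race': ['Hobbit', 'Hobbit', 'Wizard', 'Human', 'Elf',
             'Human', 'Dwarf', 'Hobbit', 'Hobbit'],
    'Age': [50, 38, 2019, 87, 2931, 41, 139, 29, 37],
})
df['AgeGroup'] = ['Young' if age < 100 else 'Old' for age in df['Age']]

# pivot is a pure reshape: duplicate (Race, AgeGroup) pairs raise ValueError
pivot_error = None
try:
    df.pivot(index='Race', columns='AgeGroup', values='Age')
except ValueError as err:
    pivot_error = err

# pivot_table aggregates the duplicates (here explicitly with the mean)
summary = df.pivot_table(values='Age', index='Race', columns='AgeGroup', aggfunc='mean')

print(pivot_error)
print(summary)
```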

### Stack and Unstack
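`stack` moves the innermost level of the column labels into the row index, making the data longer and narrower (a DataFrame with a single column level becomes a Series). `unstack` does the reverse: it moves a level of the row index (the innermost by default) into the columns.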
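A toy round trip on a 2x2 frame with arbitrary labels, using default arguments. Recent pandas versions (2.1 and later) may print a `FutureWarning` about a new `stack` implementation; it does not affect this example's result.

```python
import pandas as pd

wide = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.Index(['row1', 'row2'], name='row'),
    columns=pd.Index(['A', 'B'], name='col'),
)

# stack: column labels become the inner level of a (row, col) MultiIndex
long_series = wide.stack()

# unstack: the innermost index level ('col') goes back to the columns
back = long_series.unstack()

# unstack a different level: 'row' becomes the columns (a transpose here)
flipped = long_series.unstack(level='row')

print(long_series)
print(flipped)
```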

### Melting DataFrames
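Melting converts a DataFrame from wide format to long format: the `id_vars` columns are kept as identifiers, and each column listed in `value_vars` is unpivoted into rows of (variable, value) pairs.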
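A sketch that melts a two-row slice of this notebook's data and then pivots it back. Because each `Character`/`Attribute` pair is unique after melting, plain `pivot` can undo the melt; the `Value` column has object dtype since it mixes numbers and strings.

```python
import pandas as pd

wide = pd.DataFrame({
    'Character': ['Frodo', 'Sam'],
    'Age': [50, 38],
    'Role': ['Ring-bearer', 'Gardener'],
})

# Wide -> long: one (Attribute, Value) row per character per value column
long_df = wide.melt(id_vars='Character', value_vars=['Age', 'Role'],
                    var_name='Attribute', value_name='Value')

# Long -> wide again: pivot works because each pair is unique
restored = long_df.pivot(index='Character', columns='Attribute', values='Value')

print(long_df)
print(restored)
```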


```python
# Import pandas library
import pandas as pd

data = {
    'Character': ['Frodo', 'Sam', 'Gandalf', 'Aragorn', 'Legolas'],
    'Race': ['Hobbit', 'Hobbit', 'Wizard', 'Human', 'Elf'],
    'Age': [50, 38, 2019, 87, 2931],
    'Role': ['Ring-bearer', 'Gardener', 'Wizard', 'King', 'Archer']
}
df = pd.DataFrame(data)

# Pivot Table: mean Age for each Race/Role pair (NaN where no character matches)
pivot_table = df.pivot_table(values='Age', index='Race', columns='Role', aggfunc='mean')
print("Pivot Table:\n", pivot_table)

# Stack: move the column labels (Age, Role) into the innermost row-index level
stacked = df.set_index(['Race', 'Character']).stack()
print("\nStacked DataFrame:\n", stacked)

# Unstack: move the innermost row-index level back into the columns
unstacked = stacked.unstack()
print("\nUnstacked DataFrame:\n", unstacked)

# Melting: unpivot Race, Age and Role into Attribute/Value rows for each character
melted = pd.melt(df, id_vars=['Character'], value_vars=['Race', 'Age', 'Role'], var_name='Attribute', value_name='Value')
print("\nMelted DataFrame:\n", melted)
```


Pivot Table:
 Role    Archer  Gardener  King  Ring-bearer  Wizard
Race                                               
Elf     2931.0       NaN   NaN          NaN     NaN
Hobbit     NaN      38.0   NaN         50.0     NaN
Human      NaN       NaN  87.0          NaN     NaN
Wizard     NaN       NaN   NaN          NaN  2019.0

Stacked DataFrame:
 Race    Character      
Hobbit  Frodo      Age              50
                   Role    Ring-bearer
        Sam        Age              38
                   Role       Gardener
Wizard  Gandalf    Age            2019
                   Role         Wizard
Human   Aragorn    Age              87
                   Role           King
Elf     Legolas    Age            2931
                   Role         Archer
dtype: object

Unstacked DataFrame:
                    Age         Role
Race   Character                   
Elf    Legolas    2931       Archer
Hobbit Frodo        50  Ring-bearer
       Sam          38     Gardener
Human  Aragorn      87