# Getting Started with GPU-Accelerated Pandas 🚀

This notebook demonstrates how to accelerate pandas on NVIDIA GPUs using the `cudf.pandas` accelerator mode of NVIDIA's cuDF library. You'll learn:

1. How to enable GPU acceleration for pandas
2. Basic operations that get automatically accelerated
3. Performance comparisons between CPU and GPU
4. Best practices and tips

No CUDA expertise required: add one line of code, and the pandas operations cuDF supports run on the GPU, while everything else falls back to regular CPU pandas automatically.

## 1. Setup and Installation

First, make sure you have the required packages installed. The `-cu11` suffix targets CUDA 11; use the `-cu12` packages instead if your system runs CUDA 12:
```bash
pip install cudf-cu11 dask-cudf-cu11 --extra-index-url=https://pypi.nvidia.com
```

Then, we'll enable GPU acceleration and import the necessary libraries. The `%load_ext cudf.pandas` line must run before pandas is imported:

In [None]:
# Enable GPU acceleration for pandas operations
%load_ext cudf.pandas

# Import required libraries
import numpy as np
import pandas as pd
import time

# For visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style (the 'seaborn' Matplotlib style name was removed in Matplotlib 3.8)
sns.set_theme()
%matplotlib inline
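
Outside Jupyter, the `%load_ext` magic is not available. A minimal sketch for a plain Python script, assuming cuDF may or may not be installed: it calls `cudf.pandas.install()` before pandas is imported and falls back to ordinary CPU pandas if the import fails. `GPU_ENABLED` is only an illustrative flag name. Running an unmodified script with `python -m cudf.pandas your_script.py` is another option.

```python
# Enable cudf.pandas in a plain Python script instead of using %load_ext.
# install() has to run before pandas is imported anywhere in the process.
try:
    import cudf.pandas
    cudf.pandas.install()
    GPU_ENABLED = True  # illustrative flag, not part of cuDF
except ImportError:
    # cuDF is not installed: keep going with regular CPU pandas
    GPU_ENABLED = False

import pandas as pd

print(f"cudf.pandas enabled: {GPU_ENABLED}")

# Existing pandas code runs unchanged either way
df = pd.DataFrame({'x': [1, 2, 3]})
print(df['x'].sum())
```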

## 2. Generate Sample Data

Let's create a sample dataset to demonstrate the performance benefits of GPU acceleration. We'll generate:
- 10 million rows of transaction data
- Columns for timestamps, customer IDs, product IDs, and transaction amounts
- Random but realistic-looking values (log-normally distributed amounts)

This will help us compare CPU vs GPU performance on common operations.
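
At 10 million rows the dataset is large enough that GPU memory is worth budgeting for. A back-of-the-envelope estimate, assuming the four columns built in the next cell end up as `datetime64[ns]`, `int64`, `int64`, and `float64` (on some platforms, older NumPy versions default `randint` to 32-bit integers, which would shrink the ID columns):

```python
import numpy as np
import pandas as pd

n_rows = 10_000_000

# Assumed dtypes of the four columns built in the next cell
column_dtypes = {
    'timestamp': 'datetime64[ns]',
    'customer_id': 'int64',
    'product_id': 'int64',
    'amount': 'float64',
}

# Each fixed-width column costs itemsize bytes per row
bytes_per_row = sum(np.dtype(dt).itemsize for dt in column_dtypes.values())
estimated_mb = n_rows * bytes_per_row / 1e6
print(f"{bytes_per_row} bytes/row x {n_rows:,} rows = ~{estimated_mb:,.0f} MB")

# Cross-check with pandas' own accounting on a small frame with the same dtypes
small = pd.DataFrame({name: np.zeros(1_000, dtype=dt) for name, dt in column_dtypes.items()})
print(small.memory_usage(index=False).sum() / len(small), "bytes/row measured")
```

Intermediate results, such as filtered copies or sorted frames, need memory on top of this estimate.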

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate sample transaction data
n_rows = 10_000_000
n_customers = 100_000
n_products = 1_000

# Generate customer IDs and product IDs
customer_ids = np.random.randint(1, n_customers + 1, n_rows)
product_ids = np.random.randint(1, n_products + 1, n_rows)

# Generate evenly spaced timestamps over a one-year window
end_date = pd.Timestamp('2025-08-04')
start_date = end_date - pd.Timedelta(days=365)
timestamps = pd.date_range(start=start_date, end=end_date, periods=n_rows)

# Generate transaction amounts (log-normal distribution)
amounts = np.random.lognormal(mean=3, sigma=1, size=n_rows)

# Create the DataFrame
df = pd.DataFrame({
    'timestamp': timestamps,
    'customer_id': customer_ids,
    'product_id': product_ids,
    'amount': amounts,
})

print(f"Generated DataFrame with shape: {df.shape}")
print("\nSample data:")
print(df.head())

## 3. Basic Operations with GPU Acceleration

Now let's perform some common pandas operations and compare their performance between CPU and GPU:

1. Grouping and aggregation
2. Sorting and filtering
3. Time-based operations

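For each operation, we'll measure the execution time. Because `%load_ext cudf.pandas` routes every pandas call in the session through the GPU accelerator, the CPU baseline has to come from a separate run: restart the kernel, comment out the `%load_ext cudf.pandas` line, and rerun the notebook. The ratio of CPU time to GPU time is the speedup.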
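
The sort-and-filter step keeps only transactions above the mean amount, so the sort runs on a subset of the data. For the log-normal amounts generated earlier (`mean=3, sigma=1`), the size of that subset can be worked out in advance: if $X = e^{\mu + \sigma Z}$ with $Z \sim N(0, 1)$, then $E[X] = e^{\mu + \sigma^2/2}$, and $X > E[X]$ exactly when $Z > \sigma/2$. A short check comparing the closed form with a simulated sample (the sample size and seed are arbitrary choices):

```python
import math

import numpy as np

mu, sigma = 3.0, 1.0  # same parameters as np.random.lognormal in the data-generation cell

# Closed form: E[X] = exp(mu + sigma**2 / 2), and X > E[X]  <=>  Z > sigma / 2,
# so the kept fraction is 1 - Phi(sigma / 2) = 0.5 * erfc(sigma / (2 * sqrt(2)))
mean_amount = math.exp(mu + sigma**2 / 2)
frac_above_mean = 0.5 * math.erfc(sigma / (2 * math.sqrt(2)))

# Empirical check on a simulated sample (1M draws instead of 10M to keep it quick)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)
empirical_frac = float((sample > sample.mean()).mean())

print(f"E[amount]                  = {mean_amount:.2f}")
print(f"Fraction above mean, exact = {frac_above_mean:.4f}")
print(f"Fraction above mean, sim   = {empirical_frac:.4f}")
print(f"Rows kept out of 10M       ~ {frac_above_mean * 10_000_000:,.0f}")
```

With $\sigma = 1$ this comes to roughly 31% of rows, about 3.1 million of the 10 million.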

In [None]:
# Helper function for timing one operation in the current session
# (GPU if cudf.pandas is loaded, CPU otherwise)
def time_operation(operation_name, func, n_repeats=3):
    # Untimed warm-up run: excludes one-time costs such as GPU initialization
    func()

    times = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)

    best = min(times)
    print(f"\n{operation_name}: best of {n_repeats} runs = {best:.3f} seconds")
    return best

# 1. Grouping and Aggregation
print("Testing groupby and aggregation performance...")

def groupby_agg():
    return df.groupby('customer_id').agg({
        'amount': ['count', 'mean', 'sum'],
        'product_id': 'nunique'
    })

groupby_time = time_operation("Customer Analysis", groupby_agg)

In [None]:
# 2. Sorting and Filtering
print("\nTesting sorting and filtering performance...")

def sort_filter():
    # Find transactions above the mean amount, largest first
    return df[df['amount'] > df['amount'].mean()].sort_values('amount', ascending=False)

sort_filter_time = time_operation("High-value Transaction Analysis", sort_filter)

# 3. Time-based Analysis
print("\nTesting time-based operations performance...")

def time_analysis():
    return df.set_index('timestamp').resample('D').agg({
        'amount': 'sum',
        'customer_id': 'nunique',
        'product_id': 'count'
    })

time_analysis_time = time_operation("Daily Transaction Summary", time_analysis)
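
Speed only matters if the answers agree. GPU reductions can add floating-point values in a different order than pandas does, so results may differ in the last few digits. One way to check is to save a result from a CPU-only session and compare it, with a tolerance, to the same result computed with `cudf.pandas`. A minimal sketch: `check_against_baseline` is a hypothetical helper, and it assumes a pickled DataFrame can be read back in both sessions (convert Series results with `.to_frame()` first).

```python
import os
import tempfile

import pandas as pd


def check_against_baseline(result, path, rtol=1e-5):
    """Save `result` the first time; on later calls, compare against the saved copy.

    Hypothetical helper: call it once in the CPU-only session (saves the file)
    and once in the cudf.pandas session (compares).
    """
    if not os.path.exists(path):
        result.to_pickle(path)
        return 'saved baseline'
    baseline = pd.read_pickle(path)
    # check_exact=False with a relative tolerance allows tiny floating-point
    # differences caused by a different summation order
    pd.testing.assert_frame_equal(result, baseline, check_exact=False, rtol=rtol)
    return 'matches baseline'


# Self-contained demo with a tiny frame and a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'baseline.pkl')
    demo = pd.DataFrame({'amount': [1.0, 2.0, 3.0]})
    print(check_against_baseline(demo, demo_path))         # first call saves
    print(check_against_baseline(demo + 1e-9, demo_path))  # later call compares
```

A mismatch beyond the tolerance raises an `AssertionError` describing the differing values.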

## 4. Performance Visualization

Let's visualize the results with a bar chart comparing CPU and GPU times for each operation, annotated with the speedup factor (CPU time divided by GPU time).

In [None]:
# Create performance comparison visualization
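# Placeholder times for illustration only: replace them with the
# CPU and GPU times measured in Section 3
operations = ['Groupby & Agg', 'Sort & Filter', 'Time Analysis']
cpu_times = [2.5, 1.8, 3.2]  # seconds, placeholder
gpu_times = [0.3, 0.2, 0.4]  # seconds, placeholder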

# Create a DataFrame for plotting
performance_df = pd.DataFrame({
    'Operation': operations,
    'CPU Time': cpu_times,
    'GPU Time': gpu_times
})

# Calculate speedup
performance_df['Speedup'] = performance_df['CPU Time'] / performance_df['GPU Time']

# Create comparison plot
plt.figure(figsize=(12, 6))
x = np.arange(len(operations))
width = 0.35

plt.bar(x - width/2, cpu_times, width, label='CPU', alpha=0.8)
plt.bar(x + width/2, gpu_times, width, label='GPU', alpha=0.8)

plt.xlabel('Operation')
plt.ylabel('Time (seconds)')
plt.title('Performance Comparison: CPU vs GPU')
plt.xticks(x, operations)
plt.legend()

# Add speedup annotations
for i, (cpu_time, gpu_time) in enumerate(zip(cpu_times, gpu_times)):
    speedup = cpu_time / gpu_time
    plt.text(i, max(cpu_time, gpu_time), 
             f'{speedup:.1f}x speedup',
             ha='center', va='bottom')

plt.tight_layout()
plt.show()

# Print summary
print("\nPerformance Summary:")
print(performance_df.round(3))
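
The chart compares a single data size. Because GPU work carries fixed overhead (kernel launches, data movement), small inputs may not speed up at all, as the pitfalls in Section 5 note. A minimal sketch for measuring how one operation scales with row count; run it once in the CPU session and once in the GPU session and compare the curves. `best_time` and `scaling_benchmark` are hypothetical helpers, and the synthetic columns only mimic the Section 2 data.

```python
import time

import numpy as np
import pandas as pd


def best_time(func, n_repeats=3):
    """Return the fastest of n_repeats timed calls, after one untimed warm-up."""
    func()  # warm-up
    times = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return min(times)


def scaling_benchmark(sizes, n_customers=100_000, seed=0):
    """Time a per-customer mean on synthetic frames of increasing size."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in sizes:
        frame = pd.DataFrame({
            'customer_id': rng.integers(1, n_customers + 1, n),
            'amount': rng.lognormal(mean=3, sigma=1, size=n),
        })
        # The lambda is called within this iteration, so it sees the current frame
        elapsed = best_time(lambda: frame.groupby('customer_id')['amount'].mean())
        rows.append({'n_rows': n, 'seconds': elapsed})
    return pd.DataFrame(rows)


results = scaling_benchmark([10_000, 100_000, 1_000_000])
print(results)
```

Plotting `seconds` against `n_rows` on log-log axes, for example with `results.plot(x='n_rows', y='seconds', loglog=True, marker='o')`, makes it easier to spot the size at which the GPU starts to pay off.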

## 5. Best Practices and Tips

Here are some tips for getting the best performance with GPU-accelerated pandas:

1. **Data Transfer**
   - Minimize data transfers between CPU and GPU
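   - Load data directly to the GPU when possible: read files with pandas I/O functions such as `pd.read_parquet` or `pd.read_csv`, which `cudf.pandas` can accelerate
   - Keep intermediate results on the GPU: avoid converting to NumPy arrays or Python objects (`.to_numpy()`, `.tolist()`, row iteration) mid-pipeline

2. **Memory Management**
   - Monitor GPU memory usage (for example with `nvidia-smi`)
   - Process datasets larger than GPU memory in chunks
   - Delete large intermediate objects you no longer need (`del`) so their GPU memory can be released

3. **Operation Optimization**
   - Batch work into fewer, larger operations, e.g. one `groupby().agg()` with several aggregations instead of several separate `groupby` calls
   - Prefer built-in pandas methods over custom Python functions; built-ins are far more likely to have GPU implementations
   - Consider Dask-cuDF for multi-GPU or out-of-core workloads

4. **Common Pitfalls**
   - Not all pandas operations are GPU-accelerated; unsupported ones transparently fall back to CPU, and the `%%cudf.pandas.profile` cell magic shows which operations ran where
   - Small datasets may run faster on CPU, because fixed GPU overhead (kernel launches, data transfers) dominates
   - Memory-intensive operations, such as large merges or pivots, can exceed GPU memory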
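
For the chunking tip, a minimal sketch that aggregates a CSV in pieces with `pd.read_csv(..., chunksize=...)`. The in-memory CSV stands in for a file too large to load at once, and the chunk size is arbitrary. The key design choice is to keep per-chunk sums and counts, which combine exactly across chunks, rather than per-chunk means, which cannot simply be averaged. Under `cudf.pandas`, any call it cannot accelerate falls back to CPU pandas, so the pattern works in either session.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for a CSV file too large to load in one go (synthetic data)
rng = np.random.default_rng(0)
source = pd.DataFrame({
    'customer_id': rng.integers(1, 51, 10_000),
    'amount': rng.lognormal(mean=3, sigma=1, size=10_000),
})
csv_text = source.to_csv(index=False)

# Aggregate each chunk to per-customer sums and counts
partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2_500):
    partials.append(chunk.groupby('customer_id')['amount'].agg(['sum', 'count']))

# Sums and counts add up across chunks; the mean is derived only at the end
combined = pd.concat(partials).groupby(level=0).sum()
combined['mean'] = combined['sum'] / combined['count']
print(combined.head())
```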

Remember: Always profile your specific use case to determine if GPU acceleration provides meaningful benefits.
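
As a concrete case of the data-transfer tips above: row-by-row Python loops such as `iterrows()` or `itertuples()` run in the Python interpreter, so under `cudf.pandas` they typically fall back to CPU pandas and pull the data into host memory, while whole-column expressions can stay on the GPU. A small sketch showing that the two forms compute the same thing; the `orders` frame and its `quantity` column are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
orders = pd.DataFrame({
    'amount': rng.lognormal(mean=3, sigma=1, size=1_000),
    'quantity': rng.integers(1, 5, 1_000),  # hypothetical column for the example
})

# Row-by-row: a Python loop over every row (avoid in GPU pipelines)
loop_totals = [row.amount * row.quantity for row in orders.itertuples(index=False)]

# Vectorized: one whole-column operation
vector_totals = orders['amount'] * orders['quantity']

print(np.allclose(loop_totals, vector_totals))
```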