# Performance Profiling and Debugging with cuml.accel

This notebook demonstrates how to use the profiling capabilities in `cuml.accel` to understand which operations are being accelerated on GPU and which are falling back to CPU execution. This can be particularly useful for debugging performance issues or understanding why certain operations might not be accelerated.

`cuml.accel` provides two types of profilers:

1. **Function Profiler**: Shows statistics about potentially accelerated function and method calls
2. **Line Profiler**: Shows per-line statistics on your script with GPU utilization percentages

Let's explore both profilers with practical examples.


## Setup

First, load the `cuml.accel` extension and import the libraries used in the examples.


In [None]:
# Load the cuml.accel extension
%load_ext cuml.accel


In [None]:
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

## Function Profiler

The function profiler gathers statistics about potentially accelerated function and method calls. It can show:

- Which method calls `cuml.accel` had the potential to accelerate
- Which methods were accelerated on GPU, and their total runtime
- Which methods required a CPU fallback, their total runtime, and why a fallback was needed
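
To make these statistics concrete, the sketch below tallies calls and runtime per execution path in plain Python. It only illustrates the kind of bookkeeping involved and is not how `cuml.accel` is implemented; `ToyProfiler`, `record`, and the fallback reason string are hypothetical names, and the timed workload is a stand-in for a real `fit` call.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ToyProfiler:
    """Hypothetical bookkeeping for per-method GPU/CPU statistics."""

    def __init__(self):
        # method name -> call counts and cumulative seconds per execution path
        self.stats = defaultdict(
            lambda: {"gpu_calls": 0, "gpu_time": 0.0,
                     "cpu_calls": 0, "cpu_time": 0.0}
        )
        # method name -> set of reasons a CPU fallback was needed
        self.reasons = defaultdict(set)

    @contextmanager
    def record(self, method, on_gpu, reason=None):
        """Time the enclosed block and attribute it to the GPU or CPU path."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            key = "gpu" if on_gpu else "cpu"
            self.stats[method][f"{key}_calls"] += 1
            self.stats[method][f"{key}_time"] += elapsed
            if not on_gpu and reason:
                self.reasons[method].add(reason)


profiler = ToyProfiler()

# Pretend the first call was accelerated and the second fell back to CPU.
with profiler.record("Ridge.fit", on_gpu=True):
    sum(range(10_000))
with profiler.record("Ridge.fit", on_gpu=False, reason="positive=True unsupported"):
    sum(range(10_000))

for method, s in profiler.stats.items():
    print(method, s, sorted(profiler.reasons[method]))
```

The real profiler reports the same categories (GPU calls, GPU time, CPU calls, CPU time, fallback reasons) without any changes to your code.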

### Example 1: Ridge Regression with Mixed GPU/CPU Execution

Let's start with a simple example that demonstrates both GPU acceleration and CPU fallback using Ridge regression.


In [None]:
# Generate sample data
X, y = make_regression(n_samples=1000, n_features=100, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
%%cuml.accel.profile

# Fit and predict on GPU (supported parameters)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
predictions_gpu = ridge.predict(X_test)

# Retry, using a hyperparameter that isn't supported on GPU
ridge_cpu = Ridge(positive=True)  # positive=True is not supported on GPU
ridge_cpu.fit(X_train, y_train)
predictions_cpu = ridge_cpu.predict(X_test)


The function profiler output above shows:

- **GPU calls**: Methods that ran successfully on GPU
- **GPU time**: Total time spent on GPU operations
- **CPU calls**: Methods that fell back to CPU execution
- **CPU time**: Total time spent on CPU operations
- **Fallback reasons**: Why certain operations couldn't run on GPU

### Example 2: Random Forest Classification

Let's try a more complex example with Random Forest classification.


In [None]:
# Generate classification data
from sklearn.datasets import make_classification
X_class, y_class = make_classification(
    n_samples=2000, n_features=20, n_informative=15,
    n_redundant=5, n_classes=3, random_state=42)
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(
    X_class, y_class, test_size=0.2, random_state=42)


In [None]:
%%cuml.accel.profile

# Random Forest with supported parameters
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train_class, y_train_class)
rf_predictions = rf.predict(X_test_class)
rf_probabilities = rf.predict_proba(X_test_class)


## Line Profiler

The line profiler collects per-line statistics on your script. It can show:

- Which lines took the most cumulative time
- Which lines (if any) were able to benefit from acceleration
- The percentage of each line's runtime that was spent on GPU through `cuml.accel`

⚠️ **Warning**: The line profiler can add non-negligible overhead. It's useful for seeing which parts of your code were accelerated, but don't compare runtimes measured with the line profiler enabled against runs made without it.
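
For runtime comparisons, a separate timing pass with the profiler disabled is one reasonable approach. The following is a minimal sketch using only the standard library; `time_call` is a hypothetical helper, and `workload` is a stand-in for something like `ridge.fit(X, y)`.

```python
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def workload(n):
    # Stand-in for a model fit; swap in e.g. `lambda: ridge.fit(X, y)`.
    return sum(i * i for i in range(n))


elapsed = time_call(workload, 100_000)
print(f"best of 5 runs: {elapsed:.6f} s")
```

Taking the best of several runs reduces noise from one-time costs such as warm-up, which can dominate a single measurement.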

### Example 3: Line Profiling with Ridge Regression

Let's use the line profiler to see detailed per-line statistics.


In [None]:
%%cuml.accel.line_profile

# Generate data
X, y = make_regression(n_samples=1000, n_features=100, noise=0.1, random_state=42)

# Fit and predict on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
predictions = ridge.predict(X)

# Retry, using a hyperparameter that isn't supported on GPU
ridge_cpu = Ridge(positive=True)
ridge_cpu.fit(X, y)
predictions_cpu = ridge_cpu.predict(X)


The line profiler output shows:

- **#**: Line number
- **N**: Number of times the line was executed
- **Time**: Total time spent on that line
- **GPU %**: Percentage of time spent on GPU for that line
- **Source**: The actual code line

At the bottom, you'll see the total runtime and the percentage of time spent on GPU.
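
The **GPU %** column and the summary line are ratios of GPU time to total time. The sketch below reproduces that arithmetic on made-up numbers. The timings are invented for illustration and are not real profiler output, and the assumption that the overall figure is total GPU time divided by total runtime is mine rather than something the profiler documents here.

```python
# Hypothetical per-line measurements: (source, total seconds, seconds on GPU).
lines = [
    ("ridge.fit(X, y)", 0.080, 0.076),
    ("ridge.predict(X)", 0.010, 0.009),
    ("ridge_cpu.fit(X, y)", 0.040, 0.000),
]

# Per-line GPU percentage: share of that line's time spent on GPU.
for source, total, gpu in lines:
    gpu_pct = 100.0 * gpu / total if total else 0.0
    print(f"{source:<22} {total:.3f}s  {gpu_pct:5.1f}%")

# Overall GPU percentage: total GPU time over total runtime.
total_time = sum(total for _, total, _ in lines)
gpu_time = sum(gpu for _, _, gpu in lines)
overall_pct = 100.0 * gpu_time / total_time
print(f"Ran in {total_time:.3f}s, {overall_pct:.1f}% on GPU")
```

Note how a single CPU fallback line pulls the overall percentage down even when the accelerated lines run almost entirely on GPU.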

### Example 4: Line Profiling with Multiple Algorithms

Let's try a more comprehensive example with multiple machine learning algorithms.


In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans


In [None]:
%%cuml.accel.line_profile

# Generate data for multiple tasks
X_reg, y_reg = make_regression(n_samples=500, n_features=50, noise=0.1, random_state=42)
X_class, y_class = make_classification(n_samples=500, n_features=20, n_classes=2, random_state=42)

# Regression task
ridge = Ridge(alpha=1.0)
ridge.fit(X_reg, y_reg)
ridge_pred = ridge.predict(X_reg)

# Classification task
logreg = LogisticRegression(random_state=42)
logreg.fit(X_class, y_class)
logreg_pred = logreg.predict(X_class)

# Clustering task
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_class)
kmeans_pred = kmeans.predict(X_class)


## Programmatic Profiling

You can also use the profilers programmatically with context managers. This is useful when you want to profile specific sections of code rather than entire cells.


In [None]:
# Generate data
X, y = make_regression(n_samples=1000, n_features=100, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
# Using function profiler programmatically
# Note: this requires importing cuml, which is typically not needed for zero-code-change acceleration
import cuml

with cuml.accel.profile():
    # This section will be profiled
    ridge = Ridge(alpha=1.0)
    ridge.fit(X_train, y_train)
    predictions = ridge.predict(X_test)
    
    # This will fall back to CPU
    ridge_cpu = Ridge(positive=True)
    ridge_cpu.fit(X_train, y_train)
    predictions_cpu = ridge_cpu.predict(X_test)

# This section will NOT be profiled
print("Profiling complete!")


## Logging

In addition to profiling, `cuml.accel` also provides logging capabilities. You can enable different levels of logging to see what's happening behind the scenes.

### Setting Log Levels

You can set the logging level when installing `cuml.accel` programmatically:


In [None]:
# Note: This needs to be done before loading the extension
# Uncomment and restart kernel to try different log levels

# import cuml
# cuml.accel.install(log_level="debug")  # Most verbose
# cuml.accel.install(log_level="info")   # Shows GPU/CPU dispatch info
# cuml.accel.install(log_level="warn")   # Default - warnings only


### Example with Info Logging

Let's demonstrate what info-level logging looks like. First, let's reinstall cuml.accel with info logging:


In [None]:
# Reinstall with info logging
import cuml  # already imported in an earlier cell; repeated so this cell runs on its own

cuml.accel.install(log_level="info")


In [None]:
# Now let's run some code and see the logging output
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

# This should run on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)

# This should fall back to CPU
ridge_cpu = Ridge(positive=True)
ridge_cpu.fit(X, y)
ridge_cpu.predict(X)


## Key Takeaways

1. **Function Profiler** (`%%cuml.accel.profile`): Best for understanding which methods were accelerated and why some fell back to CPU

2. **Line Profiler** (`%%cuml.accel.line_profile`): Best for understanding which specific lines of code benefited from acceleration and the overall GPU utilization percentage

3. **Logging**: Useful for real-time feedback on what's happening during execution

4. **Performance Insights**:
   - A high GPU % means most of the measured runtime was spent in accelerated code
   - CPU fallbacks are reported together with the reason for the fallback
   - On small datasets, host-to-device transfer overhead can outweigh the speedup from running on GPU
   - Larger datasets typically benefit more from GPU acceleration

5. **Debugging**: Use these tools to identify why certain operations aren't being accelerated and optimize your code accordingly.

The profiling tools in `cuml.accel` are essential for understanding and optimizing your GPU-accelerated machine learning workflows!
