# CUDA Convolution Accelerator - Speed Demo

This notebook demonstrates the speedup achieved by CUDA-accelerated convolution compared to a CPU baseline.

## Setup

First, let's check that we have GPU access and install the dependencies.

In [None]:
# Check GPU availability
!nvidia-smi
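
The same check can be done from plain Python, which is handy when a later cell should branch on whether a GPU is present. The following is a minimal sketch, not part of this project's code: `gpu_summary` is a hypothetical helper name, and it assumes only that `nvidia-smi` is on the `PATH` when an NVIDIA driver is installed.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one description line per visible NVIDIA GPU, or [] if none are found."""
    # nvidia-smi is normally absent on machines without an NVIDIA driver
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    if result.returncode != 0:
        return []
    # Each non-empty output line describes one GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No NVIDIA GPU detected; the GPU cells below will fail")
```

An empty list means the GPU cells later in the notebook cannot run on this machine.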

In [None]:
# Install dependencies (if running in Colab)
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    # cupy-cuda11x targets CUDA 11.x; use the wheel that matches the runtime's CUDA version
    !pip install cupy-cuda11x scipy matplotlib pillow tqdm -q
    print("✓ Dependencies installed")
else:
    print("Running locally - ensure dependencies are installed")

## Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import cupy as cp
from tqdm import tqdm
import time

# Import our modules
import sys
sys.path.insert(0, '..')

from src.api import convolve, convolve_cpu
from src.presets import get_kernel, list_kernels
from src.timing import benchmark_all, benchmark_kernel_only, print_results

print(f"✓ CuPy version: {cp.__version__}")
print(f"✓ GPU: {cp.cuda.runtime.getDeviceProperties(0)['name'].decode()}")
print(f"✓ Available kernels: {', '.join(list_kernels())}")

## Load Sample Images and Run Benchmark

See notebook for full analysis!
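
For readers who want to see roughly how a speedup figure is measured, here is a minimal timing sketch using only the standard library. It is an illustration under stated assumptions, not the implementation in `src.timing`: `time_call` and `speedup` are hypothetical names, and the two workloads are pure-Python stand-ins for the CPU and GPU convolution calls, whose exact signatures are not shown here.

```python
import statistics
import time


def time_call(fn, repeats=5, warmup=1):
    """Return the median wall-clock time, in seconds, of calling fn()."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as kernel compilation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


# Stand-in workloads (assumptions): replace with the CPU and GPU convolution calls
def slow_workload():
    return sum(i * i for i in range(200_000))


def fast_workload():
    return sum(i * i for i in range(2_000))


cpu_time = time_call(slow_workload)
gpu_time = time_call(fast_workload)
print(f"Speedup: {speedup(cpu_time, gpu_time):.1f}x")
```

When timing real GPU code, the timed callable should wait for the device to finish before returning, for example by calling `cp.cuda.Stream.null.synchronize()` at the end. CuPy launches kernels asynchronously, so without that wait the measurement can cover only the launch and not the computation.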