# CUDA Convolution Accelerator - Colab Setup

This notebook sets up the CUDA convolution project in Google Colab.

## Prerequisites

**IMPORTANT:** Set the runtime type to GPU before running any cell.

- Go to **Runtime → Change runtime type → Hardware accelerator → GPU**
- Recommended: **T4 or better**


In [1]:
# Check NVIDIA GPU
!nvidia-smi

import torch
if torch.cuda.is_available():
    print(f"\n✓ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"✓ CUDA Version: {torch.version.cuda}")
else:
    print("\n✗ No GPU detected. Please change runtime type to GPU.")


Tue Oct 28 18:36:05 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   76C    P8             12W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
# Install supporting dependencies (CuPy is not installed here; it must already be present in the runtime)
!pip install scipy matplotlib pillow tqdm pytest -q

print("✓ Dependencies installed")

✓ Dependencies installed
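
The install command above does not include CuPy, which a later cell imports, so it may help to confirm that every package is importable before going further. The following is a minimal sketch using only the standard library. The helper name `missing_packages` and the `required` list are illustrative and not part of the project.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be located for import."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Top-level import names used across this notebook's cells (assumed list).
# Note that Pillow is imported as 'PIL', not 'pillow'.
required = ["cupy", "numpy", "scipy", "matplotlib", "PIL", "tqdm", "pytest", "torch"]

missing = missing_packages(required)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are importable")
```

`find_spec` only locates a package without importing it, so this check stays fast and does not initialize the GPU.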


### Remove any previous clone of the repo

In [3]:
# Delete any earlier clone so the next cell can clone into a clean directory
!rm -rf /content/cuda-conv

In [4]:
# Clone the project from GitHub (replace the URL with your own fork if needed)
!git clone https://github.com/elcruzo/cuda-conv.git
%cd cuda-conv

Cloning into 'cuda-conv'...
remote: Enumerating objects: 84, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 84 (delta 29), reused 71 (delta 19), pack-reused 0 (from 0)
Receiving objects: 100% (84/84), 87.40 KiB | 745.00 KiB/s, done.
Resolving deltas: 100% (29/29), done.
/content/cuda-conv
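
After cloning, a quick check that the working directory contains the paths this notebook relies on can catch a wrong URL or a failed `%cd` early. This is a sketch only. The helper `missing_paths` is hypothetical, and the `expected` list is inferred from the paths referenced in other cells rather than taken from the repository's documentation.

```python
from pathlib import Path


def missing_paths(root, relative_paths):
    """Return the entries of relative_paths that do not exist under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).exists()]


# Paths referenced elsewhere in this notebook (assumed to be the minimum needed).
expected = ["src", "tests/test_correctness.py", "scripts/generate_sample_images.py"]

absent = missing_paths(Path.cwd(), expected)
print("Missing:", absent if absent else "none")
```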


In [5]:
# Test imports
import cupy as cp
from src.api import convolve
from src.presets import get_kernel, list_kernels

print(f"✓ CuPy version: {cp.__version__}")
print(f"✓ GPU: {cp.cuda.runtime.getDeviceProperties(0)['name'].decode()}")
print(f"✓ Available kernels: {', '.join(list_kernels())}")

# Quick test
import numpy as np
test_img = np.random.rand(32, 32).astype(np.float32)
test_kernel = get_kernel('box_blur')
result = convolve(test_img, test_kernel)

print(f"\n✓ Test convolution successful! Result shape: {result.shape}")

✓ CuPy version: 13.3.0
✓ GPU: Tesla T4
✓ Available kernels: sobel_x, sobel_y, gaussian, box_blur, sharpen, edge_detect, emboss, gaussian_5x5, box_blur_5x5

✓ Test convolution successful! Result shape: (32, 32)
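
For intuition about what the quick test computes, here is a dependency-free reference for a direct 2D convolution that returns an output the same size as the input, as the `(32, 32)` result above suggests. It is a sketch under stated assumptions: zero padding at the borders, odd kernel dimensions, and a flipped kernel (true convolution rather than correlation). The project's `convolve` may handle borders or kernel orientation differently, and the 1/9 box kernel below is an assumed stand-in for the `box_blur` preset.

```python
def convolve2d_reference(image, kernel):
    """Direct same-size 2D convolution with zero padding (illustrative only)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # kernel half-sizes; assumes odd kernel dimensions
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # True convolution flips the kernel along both axes.
                    yy = y + ph - i
                    xx = x + pw - j
                    # Samples outside the image count as zero (zero padding).
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


# Assumed 3x3 box blur: every weight is 1/9.
box = [[1.0 / 9.0] * 3 for _ in range(3)]
flat = [[1.0] * 4 for _ in range(4)]
blurred = convolve2d_reference(flat, box)
print(len(blurred), len(blurred[0]))
```

On a constant image, interior pixels keep their value while border pixels are darkened, because zero padding contributes nothing to the sum. Comparing such a reference against the GPU result on a tiny input is one way to see which border mode an implementation uses.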


In [6]:
# Generate sample images
!python3 scripts/generate_sample_images.py

print("✓ Sample images generated")

Generating sample images...
  return Image.fromarray(Z, mode='L')
✓ Saved /content/cuda-conv/data/lena.png
  return Image.fromarray(img, mode='L')
✓ Saved /content/cuda-conv/data/checker.png
  return Image.fromarray(Z, mode='L')
✓ Saved /content/cuda-conv/data/gradient.png
  return Image.fromarray(img, mode='L')
✓ Saved /content/cuda-conv/data/edges.png

Done! Sample images generated in data/ directory.
✓ Sample images generated


## Step 6: Run Tests


In [7]:
# Run test suite
!pytest tests/ -v

print("\n✓ All tests completed")

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/cuda-conv
plugins: typeguard-4.4.4, anyio-4.11.0, langsmith-0.4.38
collected 28 items

tests/test_correctness.py::TestConvolutionCorrectness::test_simple_3x3_identity PASSED [  3%]
tests/test_correctness.py::TestConvolutionCorrectness::test_simple_3x3_box_blur PASSED [  7%]
tests/test_correctness.py::TestConvolutionCorrectness::test_sobel_edge_detection PASSED [ 10%]
tests/test_correctness.py::TestConvolutionCorrectness::test_gaussian_blur PASSED [ 14%]
tests/test_correctness.py::TestConvolutionCorrectness::test_5x5_kernel PASSED [ 17%]
tests/test_correctness.py::TestConvolutionCorrectness::test_integer_kernel PASSED [ 21%]
tests/test_correctness.py::TestConvolutionCorrectness::test_rgb_image [