Learning GPU programming with Mojo through parallel scan.
This project demonstrates how to wrap Mojo kernels and expose them to PyTorch through custom operations. It implements parallel prefix sum (scan) algorithms as an educational example.
Run tests: pixi run test-all-wrappers (this also installs dependencies if needed)
Features:

- Single-block and multi-block parallel prefix sum implementations in Mojo
- PyTorch wrapper functions using MAX's CustomOpLibrary
- CUDA and ROCm support through Pixi environments
- Test suite comparing results against NumPy reference implementation
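Since the tests compare the GPU kernels against a reference prefix sum, it helps to see the algorithm those kernels typically implement. Below is a hedged, pure-Python sketch of the Blelloch (work-efficient) exclusive scan, the classic up-sweep/down-sweep scheme used in GPU scan kernels; it illustrates the algorithm only and is not the project's Mojo code (function name and power-of-two restriction are assumptions for this sketch).

```python
def blelloch_exclusive_scan(data):
    """Exclusive prefix sum via up-sweep/down-sweep.

    Illustrative sketch: input length must be a power of two,
    mirroring the per-block restriction common in GPU scan kernels.
    """
    n = len(data)
    assert n and (n & (n - 1)) == 0, "length must be a power of two"
    tree = list(data)

    # Up-sweep (reduce): build partial sums in place, doubling the stride.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            tree[i] += tree[i - step]
        step *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    tree[n - 1] = 0
    step = n // 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left = tree[i - step]
            tree[i - step] = tree[i]
            tree[i] += left
        step //= 2
    return tree

print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# → [0, 3, 4, 11, 11, 15, 16, 22]
```

On a GPU, each `for` loop above becomes one parallel step over threads, which is why the algorithm does O(n) total work in O(log n) steps; a multi-block variant scans each block independently and then adds a scan of the per-block sums.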
Requirements:

- Python 3.12
- Mojo
- CUDA 12.x or ROCm 6.3
- PyTorch 2.7.1
Project structure:

- op/ : Mojo kernel implementations
- wrappers.py : PyTorch wrapper functions for Mojo kernels
- test_wrappers.py : Test suite for kernel implementations