Thork: A Tracing DSL for NVIDIA GPUs

Thork is a DSL for writing NVIDIA GPU kernels in Python; you can think of it as a Python wrapper on top of CUDA C++. Keeping kernels in the same language as most ML code makes development more convenient and makes it easy to verify correctness against NumPy.

Unlike Triton, Thork is a tracing DSL rather than a parsing-based one: a Thork program is an ordinary Python program that, when run, generates a CUDA program. As a result, Thork kernels can use any Python library for metaprogramming.
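To make the tracing idea concrete, here is a toy tracer in plain Python. It is a minimal sketch under stated assumptions, not Thork's implementation: the classes `Expr` and `Pointer` and the function `trace` are hypothetical names, every pointer is declared as `float *`, and only `+` and `*` are supported. Calling the kernel function once with proxy objects records each load and store as a line of CUDA C++.

```python
# Toy tracer (hypothetical, NOT Thork's internals): running ordinary Python
# on proxy objects records the operations and emits CUDA C++ source text.


class Expr:
    """A traced scalar expression; arithmetic builds CUDA source strings."""

    def __init__(self, code):
        self.code = code

    def _bin(self, op, other):
        rhs = other.code if isinstance(other, Expr) else repr(other)
        return Expr(f"({self.code} {op} {rhs})")

    def __add__(self, other):
        return self._bin("+", other)

    def __mul__(self, other):
        return self._bin("*", other)


class Pointer:
    """A traced device pointer; indexing emits loads, assignment emits stores."""

    def __init__(self, name, body, written):
        self.name, self.body, self.written = name, body, written

    def __getitem__(self, idx):
        return Expr(f"{self.name}[{idx.code}]")

    def __setitem__(self, idx, value):
        self.written.add(self.name)  # remember which buffers were stored to
        self.body.append(f"    {self.name}[{idx.code}] = {value.code};")


def trace(fn, pointers, builtins):
    """Run fn once with proxies; return (CUDA source, set of written pointers)."""
    body, written = [], set()
    args = [Pointer(p, body, written) for p in pointers]
    args += [Expr(b) for b in builtins]
    fn(*args)
    params = ", ".join(f"float *{p}" for p in pointers)  # assumes float32 only
    header = f'extern "C" __global__ void {fn.__name__}({params})'
    return "\n".join([header, "{", *body, "}"]), written


# Usage: the same body as the Thork example later in this README.
def matrix_add(out, A, B, bid, tid, bdm):
    i = bid * bdm + tid
    out[i] = A[i] + B[i]


src, written = trace(
    matrix_add, ["out", "A", "B"], ["blockIdx.x", "threadIdx.x", "blockDim.x"]
)
print(src)
```

Because `matrix_add` is ordinary Python here, a `for` loop inside it would simply unroll into repeated statements in the emitted source, which is the kind of metaprogramming a tracing design gets for free. The `written` set also hints at how a tracer can know which arrays need copying back after a launch.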

Thork is the sister project to Spork (the same idea, but for Apple GPUs and Metal).

Installation

Thork runs on machines with an NVIDIA GPU and the CUDA toolkit (nvcc and nvrtc) installed. Install with:

uv add thork

or with pip:

pip install thork

Then, in Python:

import thork as tk
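If the install succeeds but kernels fail to compile, a first thing to check is whether the toolkit pieces the requirements mention (nvcc and nvrtc) are visible from Python. The sketch below uses only the standard library and does not touch Thork; `cuda_toolchain_status` is a hypothetical helper name. Library file names vary across platforms and CUDA versions (on Windows the nvrtc DLL carries a version suffix), so a missing entry is a hint rather than proof.

```python
# Hedged sanity check for the CUDA toolkit pieces named in the requirements.
import ctypes.util
import shutil


def cuda_toolchain_status():
    """Return where nvcc and nvrtc were found, or None for each that was not."""
    return {
        # nvcc is found only if it is on PATH
        "nvcc": shutil.which("nvcc"),
        # searches the platform's usual library locations for libnvrtc
        "nvrtc": ctypes.util.find_library("nvrtc"),
    }


if __name__ == "__main__":
    for tool, location in cuda_toolchain_status().items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```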

Example: Matrix Addition

In NumPy, you'd write:

import numpy as np

shape = (1024, 1024)
A = np.random.randn(*shape).astype(np.float32)
B = np.random.randn(*shape).astype(np.float32)
out = A + B

A CUDA kernel to do the same:

extern "C" __global__ void matrix_add(float *out, const float *A, const float *B)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = A[i] + B[i];
}
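The index expression `blockIdx.x * blockDim.x + threadIdx.x` gives each thread exactly one element. The NumPy sketch below (no GPU needed; `covered_indices` is a hypothetical helper) enumerates those indices for the 1024 × 1024 launch used in this README, and shows why the kernel can omit a bounds check only because the grid covers exactly the number of elements.

```python
# Pure-Python model of the CUDA global index; no GPU or Thork involved.
import numpy as np


def covered_indices(n_blocks, block_size):
    """i = blockIdx.x * blockDim.x + threadIdx.x for every launched thread."""
    bid = np.arange(n_blocks)[:, None]    # one row per block
    tid = np.arange(block_size)[None, :]  # one column per thread in a block
    return (bid * block_size + tid).ravel()


n, block = 1024 * 1024, 128
i = covered_indices(n // block, block)
# Every element 0..n-1 is handled by exactly one thread.
assert np.array_equal(np.sort(i), np.arange(n))

# With n not a multiple of the block size, floor division skips the tail,
# while ceiling division launches extra threads that would need `if (i < n)`.
n_odd = 1000
short = covered_indices(n_odd // block, block)
over = covered_indices(-(-n_odd // block), block)
print(short.max() + 1, over.max() + 1)  # 896 1024
```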

The equivalent Thork kernel:

@tk.jit
def matrix_add(
    out : tk.DevicePointer[tk.dt.float32],
    A   : tk.DevicePointer[tk.dt.float32],
    B   : tk.DevicePointer[tk.dt.float32],
    bid : tk.Uint[tk.BlockIdx],
    tid : tk.Uint[tk.ThreadIdx],
    bdm : tk.Uint[tk.BlockDim],
):
    i = bid * bdm + tid
    out[i] = A[i] + B[i]

The attribute parameters (tk.Uint[tk.BlockIdx] and so on) are not passed at launch. Instead, they expand at the top of the generated kernel body into the matching CUDA built-ins (blockIdx.x, threadIdx.x, blockDim.x). The vector versions (for example tk.Uint3[tk.BlockIdx]) give you a value with .x, .y, and .z fields.
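One way to picture this expansion is as a small code generator that turns each annotated parameter into a declaration at the top of the kernel. The sketch below is hypothetical: the `prologue` function and its `(name, builtin, is_vector)` tuples are illustrative and not Thork's API, and for the vector case it assumes the value is built with CUDA's `make_uint3` helper.

```python
# Hypothetical sketch of expanding attribute parameters into CUDA declarations.
AXES = ("x", "y", "z")


def prologue(params):
    """params: list of (name, builtin, is_vector) -> CUDA declaration lines."""
    lines = []
    for name, builtin, is_vector in params:
        if is_vector:
            # tk.Uint3[...] style: keep all three components (.x, .y, .z)
            comps = ", ".join(f"{builtin}.{axis}" for axis in AXES)
            lines.append(f"    uint3 {name} = make_uint3({comps});")
        else:
            # tk.Uint[...] style: a scalar bound to the .x component
            lines.append(f"    unsigned int {name} = {builtin}.x;")
    return "\n".join(lines)


print(prologue([
    ("bid", "blockIdx", False),
    ("tid", "threadIdx", False),
    ("bdm", "blockDim", False),
]))
```

For the matrix_add parameters this prints three `unsigned int` declarations (`bid = blockIdx.x`, `tid = threadIdx.x`, `bdm = blockDim.x`), which the traced body can then refer to by name.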

To launch:

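C = np.zeros_like(A)  # output buffer; Thork copies the result back into it

matrix_add[
    (int(np.prod(shape)) // 128, 1, 1),  # grid: 8192 blocks
    (128, 1, 1),                          # block: 128 threads
](
    C,
    A,
    B,
)

np.testing.assert_allclose(C, A + B)  # check against NumPy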

The first bracketed tuple is the grid; the second is the block. In the parentheses you pass the NumPy arrays: Thork allocates device memory, copies the inputs over, launches the kernel, and copies back every array the kernel wrote to.
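Because the launch takes plain NumPy arrays, a slow CPU emulation of the same grid/block loop is a handy reference when checking a kernel. This is a hypothetical sketch, not part of Thork: `EmulatedKernel` only mimics the `kernel[grid, block](*arrays)` call shape, handles the x dimension only, and assumes C-contiguous arrays so that `reshape(-1)` returns a writable view. Unlike Thork, it writes results in place rather than copying them back from device memory.

```python
# CPU emulation of kernel[grid, block](*arrays); hypothetical, not Thork.
import numpy as np


class EmulatedKernel:
    def __init__(self, fn):
        self.fn = fn

    def __getitem__(self, config):
        grid, block = config  # kernel[grid, block] passes a 2-tuple

        def launch(*arrays):
            # Kernels index flat pointers; reshape(-1) is a view for
            # C-contiguous arrays, so writes land in the caller's arrays.
            flat = [a.reshape(-1) for a in arrays]
            for b in range(grid[0]):       # x dimension only
                for t in range(block[0]):  # one call per emulated thread
                    self.fn(*flat, b, t, block[0])

        return launch


@EmulatedKernel
def matrix_add(out, A, B, bid, tid, bdm):
    i = bid * bdm + tid
    out[i] = A[i] + B[i]


shape = (64, 64)  # kept small: the emulator runs one Python call per thread
A = np.random.randn(*shape).astype(np.float32)
B = np.random.randn(*shape).astype(np.float32)
C = np.zeros_like(A)
matrix_add[(C.size // 128, 1, 1), (128, 1, 1)](C, A, B)
np.testing.assert_allclose(C, A + B)
```

Running the same body through both the emulator and the real launch makes it easy to tell whether a mismatch comes from the kernel logic or from the GPU side.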
