EDSL for GPGPU programming
Haskell C
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.



  • Embedded DSL for implementation of GPU Kernels
  • Generates CUDA/OpenCL kernels
  • Generates Sequential C code for testing purposes

News and current status

  • Bunch of small bug fixes Conditional expressions (?:) should be printed correctly now. Added precedence levels for operations.
  • Added &&*, ||* and /=*
  • OpenCL and C code generation is currently broken!! This is because of addition of new intermediate representation and common subexpression elimination experiments. Been working on getting the CUDA part to work first.
  • Global Arrays Experimenting with a new kind of Kernel. These kernels have types like: GlobalArray Pull (Exp a) -> Kernel (GlobalArray Push (Exp a)) This also lets the Obsidian programmer express some aspects of what we usually refer to as Kernel coordination. Two functions (block and unblock) chop global arrays up into small arrays suitable for processing by what we have so far called a Kernel. There are some restrictions (maybe totally necessary): There is no way to, within a kernel, go from (GlobalArray Push a) to (GlobalArray Pull a).
  • Push Arrays: Explanation to come


  • (Solved, it works, see below) OpenCL: The kind of pointer arithmetic + casting, that has worked so well in CUDA, seems to not work in OpenCL on the local memory. Will need to scan the specification/documentation to find exactly what is allowed and not on pointers to local memory in an OpenCL kernel. (Solved, see below)
  • The address space qualifiers are important! (__local int*)shared vs (int*)shared. (int*)shared tried to cast local memory array called shared into a global array, so illegal!
  • Obsidian Int are “64”bit on 64bit architectures. Is it also true that a CUDA int is 64bit on 64bit architectures?

and to look into:

  • Break out SPMDC into its own library and try to abstract this from any OpenCL, CUDA Specifics.
  • Update the Cabal file. Or fix whatever the problem with that is.
  • PRIO => Detect if output is computed directly from input. If that happens store result fist in a variable and then to output array from there.
  • Many-in -> Many-out operations
  • PRIO => Coordination
  • Improve upon the things in Sync.hs Syncing needs to be improved. (work on more kinds of arrays, needs type class magic)
  • Fix SyncAnalysis
  • IN PROGRESS: Perform CSE
  • IN PROGRESS: Kernel Coordination layer
  • DONE: Generate the vSwap/iSwap kind of kernels. That is Kernels that operate on an array based on its global tid instead of local tid.
  • DONE: Generate Kernels that take scalar parameters
  • PRIO => Filters and unsafe operations that lead to nondeterminism
  • 2D and 3D Arrays. Add in some form.
  • clean up
  • Sequential arrays