News and current status
- Embedded DSL for implementation of GPU Kernels
- Generates CUDA/OpenCL kernels
- Generates Sequential C code for testing purposes
- A number of small bug fixes:
Conditional expressions (?:) should now be printed correctly.
Added precedence levels for operations.
- Added &&*, ||* and /=*
- OpenCL and C code generation is currently broken!
This is due to the addition of a new intermediate representation and
experiments with common subexpression elimination. Work has focused on
getting the CUDA part to work first.
- Global Arrays
Experimenting with a new kind of Kernel. These kernels
have types like: GlobalArray Pull (Exp a) -> Kernel (GlobalArray Push (Exp a))
This also lets the Obsidian programmer express some aspects of what we
usually refer to as Kernel coordination.
Two functions (block and unblock) convert between global arrays and the small
arrays suitable for processing by what we have so far called a Kernel.
There is one restriction (which may well be necessary): within a kernel there is
no way to go from (GlobalArray Push a) to (GlobalArray Pull a).
- Push Arrays: Explanation to come
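
The relationship between pull arrays, push arrays, block and unblock described
under Global Arrays can be sketched in plain Haskell. This is a toy model under
stated assumptions, not Obsidian's implementation: the types Pull, Push and
Write, the list-of-writes representation, and the helpers push, toList and
reverseInc are all hypothetical, and block and unblock only borrow their names
from the text above (unblock is assumed to reassemble per-block results).

```haskell
import Data.List (sortOn)

-- Hypothetical model types; none of these are Obsidian's real definitions.

-- A pull array: a length and a function from index to element.
data Pull a = Pull Int (Int -> a)

-- One write of a value to an output index.
type Write a = (Int, a)

-- A push array: a length and a producer that feeds every (index, value)
-- pair to a writer function chosen by the consumer.
data Push a = Push Int ((Int -> a -> [Write a]) -> [Write a])

-- Pull -> Push is cheap: walk the indices and push each element.
push :: Pull a -> Push a
push (Pull n f) = Push n (\w -> concat [w i (f i) | i <- [0 .. n - 1]])

-- Chop a global pull array into blocks of at most bs elements (bs > 0).
block :: Int -> Pull a -> [Pull a]
block bs (Pull n f) =
  [ Pull (min bs (n - start)) (\i -> f (start + i))
  | start <- [0, bs .. n - 1] ]

-- Reassemble per-block push arrays by offsetting each block's writes.
unblock :: [Push a] -> Push a
unblock ps = Push (sum sizes) go
  where
    sizes   = [n | Push n _ <- ps]
    offsets = scanl (+) 0 sizes
    go w    = concat [k (\i a -> w (off + i) a) | (off, Push _ k) <- zip offsets ps]

-- Push -> Pull needs every write to land in memory first. In this model
-- that is the materialisation step below; it is the step the text says
-- is not available inside a kernel.
toList :: Push a -> [a]
toList (Push _ k) = map snd (sortOn fst (k (\i a -> [(i, a)])))

-- A stand-in "kernel": reverse a block and add one to each element.
reverseInc :: Pull Int -> Push Int
reverseInc (Pull n f) = push (Pull n (\i -> f (n - 1 - i) + 1))
```

In GHCi, toList (unblock (map reverseInc (block 4 (Pull 8 id)))) evaluates to
[4,3,2,1,8,7,6,5]: each block of four is processed independently and the
results are written back at the block's offset.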
and to look into:
- OpenCL (solved, it works, see the next item): The kind of pointer arithmetic
and casting that has worked so well in CUDA seemed not to work on local memory
in OpenCL. The plan was to scan the specification/documentation to find exactly
what is and is not allowed on pointers to local memory in an OpenCL kernel.
- The address space qualifiers are important! (__local int*)shared vs (int*)shared.
(int*)shared casts a pointer into the local memory array called shared to a
pointer in a different address space, which is illegal!
- Obsidian Ints are 64-bit on 64-bit architectures. Is it also true that
a CUDA int is 64-bit on 64-bit architectures?
- Break out SPMDC into its own library and try to abstract it
from any OpenCL- or CUDA-specific details.
- Update the Cabal file. Or fix whatever the problem with that is.
- PRIO => Detect if output is computed directly from input.
If that happens, store the result first in a variable and
then write it to the output array from there.
- Many-in -> Many-out operations
- PRIO => Coordination
- Improve upon the things in Sync.hs
Syncing needs to be improved. (work on more kinds of arrays, needs type class magic)
- Fix SyncAnalysis
- IN PROGRESS: Perform CSE
- IN PROGRESS: Kernel Coordination layer
- DONE: Generate the vSwap/iSwap kind of kernels. That is, kernels that operate
on an array based on its global tid instead of its local tid.
- DONE: Generate Kernels that take scalar parameters
- PRIO => Filters and unsafe operations that lead to nondeterminism
- 2D and 3D Arrays. Add in some form.
- clean up
- Sequential arrays
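
On the Int width question in the list above: a CUDA int is 32 bits wide, also
on 64-bit hosts, whereas GHC's Int follows the machine word size. The following
is a minimal sketch of how the mismatch could be checked and guarded against on
the Haskell side. The names hostIntBits, deviceIntBits and toDeviceInt are
hypothetical and not part of Obsidian; using Int32 as the stand-in for a CUDA
int is the assumption being illustrated.

```haskell
import Data.Bits (finiteBitSize)
import Data.Int (Int32)

-- Width of the host-side Haskell Int (64 with GHC on a 64-bit platform).
hostIntBits :: Int
hostIntBits = finiteBitSize (0 :: Int)

-- Width of the type assumed here to correspond to a CUDA int.
deviceIntBits :: Int
deviceIntBits = finiteBitSize (0 :: Int32)

-- Narrow a host Int to the assumed device width, rejecting values that
-- would not survive the conversion instead of silently wrapping.
toDeviceInt :: Int -> Maybe Int32
toDeviceInt x
  | x >= fromIntegral (minBound :: Int32)
  , x <= fromIntegral (maxBound :: Int32) = Just (fromIntegral x)
  | otherwise                             = Nothing
```

One possible design choice that follows from this is to use Int32 explicitly in
the embedded expression type, so that generated code and host-side evaluation
agree on overflow behaviour.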