jCUDA

experiments with providing L3 BLAS and CUDA for J for array math speedup

Step completed: dgemm, sgemm

Org emacs list of nice things to put in here to make it useful;

** TODO jCUDA [40%]

Try dgemm BLAS port [100%]
- Does it work?
- simple test script
- Make it work with vectors
Build wrapper for sgemm [100%]
- Wrapper done
- Make it work with raw float arrays
- Add to test script
Port sgemm wrapper to CUDA [0%]
- CMalloc/Free
- CMmemcpy, host2dev,dev2host,dev2dev
- sgemm
Other CUDA functions [0%]
- "log", "log1p", "exp","cos", "sin", "sqrt","ceil", "floor","abs"
- "acos", "cosh","tan", "atan", "asin", "sinh","tanh",
- random number generators
stuff looted from R package gputools [0%]
- distance
- qrdecomp
- mi
- hcluster
- kendall
- sort
- lsfit (same as qrdecomp?)

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
test		test
BLAS.ijs		BLAS.ijs
CUDA.ijs		CUDA.ijs
CUDAAPI.ijs		CUDAAPI.ijs
README.md		README.md

Provide feedback