locklin/jCUDA

jCUDA

Experiments with providing Level 3 BLAS and CUDA to J for faster array math.

Steps completed: dgemm, sgemm
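BLAS `dgemm`/`sgemm` expect column-major (Fortran-order) matrices, while J lays arrays out row-major. One standard way around this without copying is to ask BLAS for C^T = B^T A^T, because a row-major buffer read as column-major is the transpose of the matrix it holds. A minimal sketch in C, assuming a wrapper that J calls with three row-major double arrays; `jgemm_rowmajor` and `ref_dgemm_nn` are hypothetical names (not this repository's code), and `ref_dgemm_nn` is a naive stand-in for the real `dgemm_` (no-transpose case only, arguments by value) so the example runs without a BLAS library.

```c
#include <assert.h>

/* Hypothetical naive stand-in for BLAS dgemm_ (no-transpose case only):
   C = alpha*A*B + beta*C, all matrices column-major.
   A is m x k with leading dimension lda, B is k x n (ldb), C is m x n (ldc). */
static void ref_dgemm_nn(int m, int n, int k, double alpha,
                         const double *a, int lda,
                         const double *b, int ldb,
                         double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += a[i + p * lda] * b[p + j * ldb];
            /* Like reference BLAS, do not read C when beta == 0. */
            c[i + j * ldc] = (beta == 0.0) ? alpha * sum
                                           : alpha * sum + beta * c[i + j * ldc];
        }
    }
}

/* Hypothetical J-facing wrapper: row-major C (m x n) = A (m x k) * B (k x n).
   Read column-major, the row-major buffers hold A^T (k x m, ld k),
   B^T (n x k, ld n) and C^T (n x m, ld n), so compute C^T = B^T * A^T. */
void jgemm_rowmajor(int m, int n, int k,
                    const double *a, const double *b, double *c)
{
    assert(m > 0 && n > 0 && k > 0);
    ref_dgemm_nn(n, m, k, 1.0, b, n, a, k, 0.0, c, n);
}
```

For example, `jgemm_rowmajor(2, 2, 3, a, b, c)` with `a = {1,2,3,4,5,6}` (2×3) and `b = {7,8,9,10,11,12}` (3×2) fills `c` with `58 64 139 154`. With a Fortran BLAS linked in, the inner call would become, under the usual convention of passing every argument by pointer, `dgemm_("N", "N", &n, &m, &k, &one, b, &n, a, &k, &zero, c, &n)`; an `sgemm_` version is the same with `float` in place of `double`.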

Emacs Org-mode list of things to add to make it useful:

** TODO jCUDA [40%]

  • Try dgemm BLAS port [100%]
    • Does it work?
    • simple test script
    • Make it work with vectors
  • Build wrapper for sgemm [100%]
    • Wrapper done
    • Make it work with raw float arrays
    • Add to test script
  • Port sgemm wrapper to CUDA [0%]
    • CMalloc/Free
    • CMmemcpy: host2dev, dev2host, dev2dev
    • sgemm
  • Other CUDA functions [0%]
    • "log", "log1p", "exp", "cos", "sin", "sqrt", "ceil", "floor", "abs"
    • "acos", "cosh", "tan", "atan", "asin", "sinh", "tanh"
    • random number generators
  • Stuff looted from the R package gputools [0%]
    • distance
    • qrdecomp
    • mi
    • hcluster
    • kendall
    • sort
    • lsfit (same as qrdecomp?)
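Each of the gputools items in the list would need a GPU kernel, and a plain CPU version is a useful baseline to check the GPU output against. A minimal sketch for the distance item in C, assuming the row-wise semantics of R's `dist()` with `method = "manhattan"` but returning the full symmetric n×n matrix rather than R's lower-triangle `dist` object; `manhattan_dist` is a hypothetical name. Manhattan is chosen here only because it needs no library calls; Euclidean would sum squared differences and take `sqrt` at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Pairwise Manhattan (L1) distances between the rows of x.
   x is n x p, row-major; d receives the full symmetric n x n matrix,
   row-major, with d[i*n + j] = sum over k of |x[i][k] - x[j][k]|. */
void manhattan_dist(size_t n, size_t p, const double *x, double *d)
{
    assert(x != NULL && d != NULL);
    for (size_t i = 0; i < n; i++) {
        d[i * n + i] = 0.0;                 /* distance to itself */
        for (size_t j = i + 1; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < p; k++) {
                double diff = x[i * p + k] - x[j * p + k];
                s += (diff < 0.0) ? -diff : diff;
            }
            d[i * n + j] = s;               /* fill both halves */
            d[j * n + i] = s;
        }
    }
}
```

For the three points (0,0), (1,2) and (3,-1), the off-diagonal distances are 3, 4 and 5.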
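For the kendall item, a minimal CPU sketch in C of Kendall's rank correlation between two paired vectors, assuming the simple tau-a form (pairs tied in either vector count as neither concordant nor discordant, with no tie correction); `kendall_tau_a` and `cmp_sign` are hypothetical names. The O(n²) loop over all pairs is the naive algorithm; its independent pair comparisons are the part a CUDA port could spread across threads before a final sum.

```c
#include <assert.h>
#include <stddef.h>

/* Sign of a - b as -1, 0 or +1. */
static int cmp_sign(double a, double b)
{
    return (a > b) - (a < b);
}

/* Kendall's tau-a for paired samples x, y of length n (n >= 2):
   (concordant pairs - discordant pairs) / (n(n-1)/2).
   Pairs tied in x or y contribute 0; no tie correction is applied. */
double kendall_tau_a(size_t n, const double *x, const double *y)
{
    assert(n >= 2);
    long long score = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            score += cmp_sign(x[i], x[j]) * cmp_sign(y[i], y[j]);
    double pairs = (double)n * (double)(n - 1) / 2.0;
    return (double)score / pairs;
}
```

For example, x = 1 2 3 and y = 1 3 2 give two concordant pairs and one discordant pair, so tau-a = 1/3.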
