# Array operations on the GPU with ArrayFire


* Julia wrapper: https://github.com/JuliaComputing/ArrayFire.jl
* Source & build instructions: https://github.com/arrayfire/arrayfire
* Documentation: http://arrayfire.org/docs/
* CUDA: https://developer.nvidia.com/cuda-downloads

### Quick install guide

1. Install the CUDA SDK on your machine (needs root).
2. Download the ArrayFire binaries (registration required) OR build from source.

When building from source:

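```
# Fetch the ArrayFire sources and configure an out-of-tree build
git clone https://github.com/arrayfire/arrayfire.git
cd arrayfire
mkdir build && cd build
# CUDA backend only; adjust CUDA_TOOLKIT_ROOT_DIR and the install prefix to your system
cmake .. -DBUILD_OPENCL=OFF -DBUILD_CUDA=ON -DBUILD_CPU=OFF -DCUDA_TOOLKIT_ROOT_DIR=/Developer/NVIDIA/CUDA-7.5 -DCMAKE_INSTALL_PREFIX=/installdir
make install
```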

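The dynamic linker needs to find the ArrayFire library (`libaf.dylib` on macOS, `libaf.so` on Linux), which `make install` typically places in `/installdir/lib`:
* Add `export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/installdir/lib"` to `~/.bashrc` (on macOS the variable is `DYLD_LIBRARY_PATH`)

OR

* Add `ENV["LD_LIBRARY_PATH"] = get(ENV, "LD_LIBRARY_PATH", "") * ":/installdir/lib"` to `~/.juliarc.jl`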

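A third option is to extend Julia's own library search path instead of relying on environment variables. The sketch below assumes Julia 0.7 or later (where `Libdl` must be loaded explicitly) and reuses the placeholder directory `/installdir/lib`. Whether ArrayFire.jl picks the library up this way depends on how the wrapper loads it, so treat this as a diagnostic aid rather than the documented procedure.

```julia
using Libdl

# Placeholder install location; replace with the directory that holds libaf.
af_libdir = "/installdir/lib"

# Julia consults DL_LOAD_PATH when resolving libraries by name.
af_libdir in Libdl.DL_LOAD_PATH || push!(Libdl.DL_LOAD_PATH, af_libdir)

# find_library returns the name it could load, or "" when nothing was found.
found = Libdl.find_library(["libaf"], [af_libdir])
println(isempty(found) ? "libaf could not be loaded" : "found $found")
```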
---

In [1]:
using ArrayFire

In [2]:
getActiveBackend()

CUDA Backend


__Construct array on the host__

In [3]:
a = rand(4096,4096);

__Transfer it to the GPU__

In [4]:
af = AFArray(a);

__or directly on the GPU__

In [16]:
bf = rand(AFArray{Float64},512,512);

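Because a host array has to be copied into GPU memory, it helps to know how large it is. A small sketch in plain Julia that computes the size of the dense arrays used in this notebook; the helper name `footprint_mib` is made up for this illustration:

```julia
# Size in MiB of a dense array with element type T and the given dimensions
footprint_mib(T, dims...) = sizeof(T) * prod(dims) / 2^20

println(footprint_mib(Float64, 4096, 4096))  # 128.0
println(footprint_mib(Float32, 4096, 4096))  # 64.0
println(footprint_mib(Float64, 512, 512))    # 2.0
```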
### Are LinAlg operations really faster...?

In [7]:
@elapsed a*a

0.892678401

In [9]:
@elapsed begin
    af*af
    sync()
end

1.472777204

Well, that's disappointing: the GPU needed about 1.47 s against 0.89 s on the CPU.

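A single `@elapsed` call can be noisy, and the first call of any Julia function also pays for JIT compilation. A minimal sketch of a more robust timer, assuming Julia 0.7 or later; `best_time` is a hypothetical helper, not part of ArrayFire.jl, and the small matrix is only there so the sketch runs quickly:

```julia
# Run `f` once untimed (warm-up), then return the fastest of `runs` timed calls.
function best_time(f; runs = 3)
    f()
    best = Inf
    for _ in 1:runs
        t = @elapsed f()
        best = min(best, t)
    end
    return best
end

a_small = rand(256, 256)
t_cpu = best_time(() -> a_small * a_small)
println("CPU matmul: ", t_cpu, " s")

# With ArrayFire loaded, the GPU timing would wrap the same pattern as the cell above:
# t_gpu = best_time(() -> (af * af; sync()))
```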
### Consumer GPUs are (usually) not designed for double-precision floats

In [10]:
a = rand(Float32, 4096,4096);

In [11]:
af = AFArray(a);

In [12]:
@elapsed a*a

0.679704067

In [14]:
@elapsed begin
    af*af
    sync()
end

0.271361098

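To put the four matrix-multiplication timings on a common scale, they can be converted to throughput. The sketch below assumes the usual estimate of roughly 2n³ floating-point operations for a dense n×n product and simply plugs in the times reported above, so the resulting figures are only as reliable as those single measurements.

```julia
n = 4096
flops = 2 * n^3   # approximate operation count for an n×n matrix product

# Timings in seconds, copied from the cells above
timings = [
    ("CPU Float64", 0.892678401),
    ("GPU Float64", 1.472777204),
    ("CPU Float32", 0.679704067),
    ("GPU Float32", 0.271361098),
]

gflops = [(label, flops / t / 1e9) for (label, t) in timings]
for (label, g) in gflops
    println(rpad(label, 12), round(g, digits = 1), " GFLOP/s")
end
```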
__Some operations are faster...__

In [18]:
@elapsed fft(a)

3.212796643

In [19]:
@elapsed begin 
    fft(af)
    sync()
end

0.108558854

In [21]:
@elapsed a'

0.051237157

In [23]:
@elapsed begin
    af'
    sync()
end

0.02108886

__...than others__

In [25]:
@elapsed det(a)

0.290905408

In [28]:
@elapsed det(af)

0.438934104
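
### Summary of the measured speedups

The timings above can be condensed into CPU/GPU ratios. This sketch only rearranges the numbers reported in this notebook and makes no new measurements; values above 1 mean the GPU was faster.

```julia
# (operation, CPU seconds, GPU seconds), copied from the cells above
results = [
    ("matmul Float64", 0.892678401, 1.472777204),
    ("matmul Float32", 0.679704067, 0.271361098),
    ("fft Float32", 3.212796643, 0.108558854),
    ("transpose Float32", 0.051237157, 0.02108886),
    ("det Float32", 0.290905408, 0.438934104),
]

speedups = [(op, cpu / gpu) for (op, cpu, gpu) in results]
for (op, s) in speedups
    println(rpad(op, 20), round(s, digits = 2), "x")
end
```

With these numbers the FFT gains the most (close to 30x), Float32 multiplication and transposition gain about 2.5x, and Float64 multiplication and `det` are slower on the GPU than on the CPU.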