In [1]:
# Set up a custom stylesheet in IJulia
styl = read("./../style.css", String) # Read the .css file located in the parent folder of this notebook
HTML(styl) # Output as HTML

## CUDA.jl (based on the CUDA.jl docs)

### Memory management

Memory management is a crucial aspect of GPU programming. The `CuArray` type is the primary interface for it: creating a `CuArray` allocates memory on the GPU, copying elements into it uploads the data, and converting it back to an `Array` downloads the data to the CPU:

In [17]:
import CUDA, Test
# generate cpu array
cpu = rand(Float32, 1024)

# allocate memory on the GPU
gpu = CUDA.CuArray{Float32}(undef, 1024)

# copy from the CPU to the GPU: copyto!(destination, source)
copyto!(gpu, cpu)

# download and verify 
Test.@test cpu == Array(gpu)

# the same allocation and upload can be done in a single step
gpu = CUDA.CuArray(cpu)

# verify 
Test.@test cpu == Array(gpu)

Test Passed
  Expression: cpu == Array(gpu)
   Evaluated: Float32[0.9769538, 0.29255325, 0.9496431, 0.9284388, 0.59763557, 0.8097115, 0.55664915, 0.5397078, 0.3149128, 0.7165329  …  0.8114963, 0.3918023, 0.5987678, 0.43801403, 0.38594592, 0.030004144, 0.6159654, 0.2991097, 0.8174435, 0.37383997] == Float32[0.9769538, 0.29255325, 0.9496431, 0.9284388, 0.59763557, 0.8097115, 0.55664915, 0.5397078, 0.3149128, 0.7165329  …  0.8114963, 0.3918023, 0.5987678, 0.43801403, 0.38594592, 0.030004144, 0.6159654, 0.2991097, 0.8174435, 0.37383997]

In many cases, you might not want to convert your input data to a dense CuArray. For example, with array wrappers you will want to preserve that wrapper type on the GPU and only upload the contained data. The Adapt.jl package does exactly that, and contains a list of rules on how to unpack and reconstruct types like array wrappers so that we can preserve the type when, e.g., uploading data to the GPU:

In [28]:
import LinearAlgebra
# define a cpu diagonal object
cpu_diag = LinearAlgebra.Diagonal([1,2])

# adapt the wrapper so that its underlying data is stored in a CuArray
import Adapt
gpu_diag = Adapt.adapt(CUDA.CuArray, cpu_diag)

# alternatively, use the `cu` convenience function:
CUDA.cu(cpu_diag)

2×2 Diagonal{Int64, CUDA.CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}}:
 1  ⋅
 ⋅  2
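
The same mechanism can be extended to user-defined wrappers. The following is a minimal sketch, assuming a hypothetical `Scaled` struct that is not part of CUDA.jl or Adapt.jl: defining an `Adapt.adapt_structure` method tells Adapt.jl how to rebuild the wrapper around the adapted field. The upload is guarded by `CUDA.functional()` so the cell also runs on a machine without a usable GPU.

```julia
import Adapt, CUDA

# Hypothetical wrapper type, used only for illustration
struct Scaled{T<:AbstractVector}
    data::T
    factor::Float32
end

# Rule: rebuild the wrapper around the adapted `data` field
Adapt.adapt_structure(to, s::Scaled) = Scaled(Adapt.adapt(to, s.data), s.factor)

cpu_scaled = Scaled(rand(Float32, 4), 2f0)

if CUDA.functional()  # only attempt the upload when a usable GPU is present
    gpu_scaled = Adapt.adapt(CUDA.CuArray, cpu_scaled)
    println(typeof(gpu_scaled.data))                     # the storage is now a CuArray type
    println(Array(gpu_scaled.data) == cpu_scaled.data)   # the contents are unchanged
end
```

Only the `data` field is uploaded; the scalar `factor` is carried over unchanged, which is the point of adapting the structure instead of converting the whole object to a dense array.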

### Garbage collection

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, that is, memory occupied by objects that are no longer in use by the program. `CuArray` objects are managed by the Julia garbage collector. This means that they will be collected once they are unreachable, and the memory they hold will be reused.
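
As a rough illustration, the sketch below drops the only reference to a small array and then requests a collection. It assumes a working GPU (checked with `CUDA.functional()`); the name `buf` is arbitrary, and Julia does not strictly guarantee when a finalizer runs, so the reported numbers may vary between runs.

```julia
import CUDA

if CUDA.functional()  # skip on machines without a usable GPU
    buf = CUDA.CuArray{Float32}(undef, 1024)  # 1024 * 4 bytes = 4 KiB on the GPU
    buf = nothing         # drop the only reference; the array is now unreachable
    GC.gc()               # request a full collection so the array can be finalized
    CUDA.memory_status()  # print the current GPU and pool usage
end
```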

### Memory pool
Behind the scenes, a memory pool will hold on to your objects and cache the underlying memory to speed up future allocations. As a result, your GPU might seem to be running out of memory while it isn't. When memory pressure is high, the pool will automatically free cached objects:

Initial state:

In [31]:
CUDA.memory_status() 

Effective GPU memory usage: 2.45% (145.500 MiB/5.807 GiB)
Memory pool usage: 28.062 KiB (32.000 MiB reserved)

Allocate memory:

In [42]:
var = CUDA.CuArray{Int}(undef, 10_000_000)

println("kb_size of var is $(10_000_000*8/1000)")

kb_size of var is 80000.0


New state:

In [43]:
CUDA.memory_status()

Effective GPU memory usage: 3.52% (209.500 MiB/5.807 GiB)
Memory pool usage: 76.321 MiB (96.000 MiB reserved)
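
The reported pool usage can be checked by hand. The short calculation below assumes that `Int` is 64 bits wide (8 bytes per element) and takes the initial pool usage of 28.062 KiB from the first `memory_status()` output; the variable names are chosen here for illustration.

```julia
n = 10_000_000                          # number of elements in the array allocated above
bytes = n * sizeof(Int64)               # 8 bytes per element
array_mib = bytes / 2^20                # size of the array in MiB
initial_pool_mib = 28.062 / 1024        # pool usage before the allocation, in MiB
expected_pool_mib = array_mib + initial_pool_mib

println(bytes)                                  # 80000000
println(round(expected_pool_mib; digits = 3))   # 76.321
```

The result agrees with the 76.321 MiB shown by `memory_status()`, so the growth of the pool is accounted for by this single array.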

We can ask the pool to release its cached memory using `CUDA.reclaim()`. In this run the reported usage does not change:

In [44]:
CUDA.reclaim()
CUDA.memory_status()

Effective GPU memory usage: 3.52% (209.500 MiB/5.807 GiB)
Memory pool usage: 76.321 MiB (96.000 MiB reserved)

We can also free the memory held by an array explicitly using `CUDA.unsafe_free!`:

In [46]:
CUDA.unsafe_free!(var)
CUDA.memory_status()

Effective GPU memory usage: 3.52% (209.500 MiB/5.807 GiB)
Memory pool usage: 28.062 KiB (96.000 MiB reserved)
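
One possible pattern for temporary buffers is to free them eagerly as soon as the work is done. The sketch below assumes a hypothetical helper `sum_on_gpu` (not part of CUDA.jl) and uses `try`/`finally` so that the buffer is released even if the computation throws. After `unsafe_free!` the array must not be used again.

```julia
import CUDA

# Hypothetical helper: upload `x`, reduce it on the GPU, then free the buffer eagerly
function sum_on_gpu(x::AbstractArray)
    d = CUDA.CuArray(x)          # upload to the GPU
    try
        return sum(d)            # reduction runs on the GPU and returns a scalar
    finally
        CUDA.unsafe_free!(d)     # release the buffer without waiting for the GC
    end
end

if CUDA.functional()  # skip on machines without a usable GPU
    println(sum_on_gpu(ones(Float32, 8)))
end
```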

What if we do not have enough memory?