# AMDGPU Code

This notebook was written for Julia 1.11+.

In [1]:
versioninfo()

Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 16 × Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 4 default, 0 interactive, 2 GC (on 16 virtual cores)
Environment:
  JULIA_NUM_THREADS = 4


In [2]:
# For installing
using Pkg
#Pkg.add("AMDGPU")
Pkg.update("AMDGPU")
#println("AMDGPU Installed")

Pkg.add("BenchmarkTools")

[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.6/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.6/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.6/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.6/Manifest.toml`


In [4]:
using LinearAlgebra
using AMDGPU
using Test
using BenchmarkTools

AMDGPU.versioninfo()

┌───────────┬──────────────────┬─────────┬───────────────────────────────────────────────────────────────────────────────────────────┐
│ Available │ Name             │ Version │ Path                                                                                      │
├───────────┼──────────────────┼─────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│     +     │ LLD              │ -       │ /Applications/Julia-1.11.app/Contents/Resources/julia/libexec/julia/lld                   │
│     +     │ Device Libraries │ -       │ /Users/josephlee/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode │
│     -     │ HIP              │ -       │ -                                                                                         │
│     -     │ rocBLAS          │ -       │ -                                                                                         │
│     -     │ rocSOLVER

[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mAMDGPU versioninfo


First, a note on coding conventions:
- By convention, function names that end with an exclamation mark (`!`) modify one or more of their arguments.
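
As a small illustration of this convention (the names `scaled` and `scale!` are made up for this sketch), compare a function that returns a new array with one that overwrites its argument:

```julia
# Non-mutating: returns a new array and leaves x untouched
scaled(x, a) = a .* x

# Mutating: overwrites x in place, hence the trailing `!`
function scale!(x, a)
    x .*= a
    return x
end

x = [1.0, 2.0, 3.0]
y = scaled(x, 2)   # x is unchanged, y holds the doubled values
scale!(x, 2)       # x itself is now doubled
```

The `!` is only a naming convention; Julia does not enforce it, so it is up to the author to keep the name and the behavior in sync.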


# Matrix Multiplication

We will start with the same example that I did in CUDA.


In [5]:
# Fast matrix multiplication using BLAS
function fast_matmul!(C, A, B)
    # C must already have the correct size, (size(A, 1), size(B, 2)), before calling this function
    # mul! comes from LinearAlgebra; for dense floating-point matrices it calls BLAS.
    # As an aside, A * B generally uses the same BLAS routine but allocates a new result matrix.
    mul!(C, A, B)
end

fast_matmul! (generic function with 1 method)
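
To see what the in-place version buys us, the sketch below (small sizes and variable names chosen only for illustration) checks that `mul!` gives the same result as `A * B` while writing into a preallocated `C`:

```julia
using LinearAlgebra

A = rand(4, 3)
B = rand(3, 5)
C = zeros(4, 5)        # preallocated output of size (size(A, 1), size(B, 2))

mul!(C, A, B)          # writes the product into C
D = A * B              # allocates a new matrix for the product

same = C ≈ D           # equal up to floating-point rounding
```

Reusing `C` across repeated calls avoids allocating a fresh result each time.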

In [None]:
testMat = rand(100, 500)
println("here")
testMat_d = ROCArray(testMat)
size(testMat_d)

In [53]:
A = rand(1000, 500)  # Generate a random 1000x500 matrix
B = rand(500, 2000)  # Generate a random 500x2000 matrix
C = zeros(1000, 2000)
println(sum(C))
# Perform matrix multiplication on the CPU (A, B, and C are ordinary CPU arrays)
fast_matmul!(C, A, B)
C

0.0


1000×2000 Matrix{Float64}:
 122.085  124.367  117.955  120.673  …  127.311  120.176  121.221  122.627
 131.479  134.291  125.347  127.901     137.384  128.11   130.163  135.986
 126.807  132.284  128.902  128.195     135.742  128.342  127.904  134.684
 117.003  124.505  120.232  119.494     125.956  118.821  121.383  121.224
 124.39   128.686  126.833  129.858     132.027  126.384  126.725  126.692
 123.08   130.885  127.633  125.086  …  131.579  123.411  127.739  130.459
 126.205  132.3    124.283  126.624     131.724  122.261  125.053  128.263
 118.929  128.073  122.211  122.838     126.123  117.958  121.699  122.888
 127.475  130.564  126.832  126.9       134.415  126.276  127.594  130.587
 124.114  128.297  126.935  127.506     134.641  124.337  126.549  130.273
 123.511  126.602  124.143  123.982  …  131.309  121.829  126.413  123.99
 124.013  133.116  124.904  126.222     135.304  129.072  127.561  129.111
 123.573  129.263  124.519  125.532     133.499  123.283  125.155  126.69


In [21]:
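# Kernel for the AMD GPU: each work-item computes one element C[i, j].
# A has dimensions M x K, and B has dimensions K x N, so C is M x N.
function kernel_Matmul_AMDGPU!(C, A, B, M, N, K)
    # Global (1-based) row and column index of this work-item
    i = workitemIdx().x + (workgroupIdx().x - 1) * workgroupDim().x
    j = workitemIdx().y + (workgroupIdx().y - 1) * workgroupDim().y

    # Work-items that fall outside the matrix do nothing
    if i <= M && j <= N
        value = zero(eltype(C))  # accumulate in C's element type (e.g. Float32)
        for k in 1:K
            value += A[i, k] * B[k, j]
        end
        C[i, j] = value
    end
    return
end


# Wrapper function for the GPU code
function Matmul_AMDGPU!(C_d, A_d, B_d)
    # Check that the inner dimensions are compatible:
    # the 2nd dimension of A must equal the 1st dimension of B
    if size(A_d, 2) != size(B_d, 1)
        throw(ArgumentError("Inner dimensions must match for multiplication."))
    end

    M, K = size(A_d)
    N = size(B_d, 2)

    # 16 x 16 = 256 work-items per workgroup
    groupsize = (16, 16)
    # Number of workgroups per dimension, rounded up to cover all of C.
    # This assumes a recent AMDGPU.jl, where gridsize counts workgroups rather than work-items.
    gridsize = (cld(M, groupsize[1]), cld(N, groupsize[2]))
    @roc groupsize=groupsize gridsize=gridsize kernel_Matmul_AMDGPU!(C_d, A_d, B_d, M, N, K)
    # Kernel launches are asynchronous, so wait for the kernel to finish
    AMDGPU.synchronize()
    return
end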

    




Matmul_AMDGPU! (generic function with 1 method)

In [20]:
cld(1024, 256)

4
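
The `cld` call above is the usual way to pick a launch configuration: it rounds up so that enough workgroups are launched to cover every element. The CPU-only sketch below (no GPU needed; `coverage` is a hypothetical helper) mimics the standard global-index formula, assuming `gridsize` counts workgroups, and shows why a kernel needs a bounds check when the problem size is not a multiple of the group size:

```julia
# Simulate a 1D launch on the CPU and report how the work-items map onto 1:n
function coverage(n, groupsize)
    gridsize = cld(n, groupsize)          # ceiling division: number of workgroups
    covered = falses(n)
    extra = 0                             # work-items that fall outside 1:n
    for group in 1:gridsize, item in 1:groupsize
        idx = item + (group - 1) * groupsize   # global 1-based index
        if idx <= n
            covered[idx] = true
        else
            extra += 1                    # a real kernel would simply skip these
        end
    end
    return gridsize, all(covered), extra
end

result = coverage(1000, 256)   # (4, true, 24)
```

With 1000 elements and groups of 256, four workgroups are launched, every element is covered, and 24 work-items have nothing to do.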

In [23]:
# Example usage
M, K, N = 2048, 2048, 2048
A = rand(Float32, M, K);
B = rand(Float32, K, N);
C = zeros(Float32, M, N);
println("The benchmarking (fast mul): ");
# Interpolate globals with $ so that @btime measures only the call itself
@btime fast_matmul!($C, $A, $B);
println()

C2 = zeros(Float32, M, N);

# Copy the data to the GPU. ROCArray is the GPU array type;
# AMDGPU.Array would give an ordinary CPU Array instead.
A_d = ROCArray(A)
B_d = ROCArray(B)
C2_d = ROCArray(C2)
# Perform matrix multiplication on the GPU
println("Benchmarking GPU code")
@btime Matmul_AMDGPU!($C2_d, $A_d, $B_d)
println("Uncomment the next line to see the resulting matrix")
# Array(C2_d)

The benchmarking (fast mul): 
  25.495 ms (0 allocations: 0 bytes)

Benchmarking GPU code


LoadError: could not load library ""
dlopen(.dylib, 0x0001): tried: '/Applications/Julia-1.11.app/Contents/Resources/julia/lib/julia/.dylib' (no such file), '/Applications/Julia-1.11.app/Contents/Resources/julia/lib/julia/../.dylib' (no such file), '/Applications/Julia-1.11.app/Contents/Resources/julia/bin/../lib/.dylib' (no such file), '.dylib' (no such file), '/usr/local/lib/.dylib' (no such file), '/usr/lib/.dylib' (no such file), '/Users/josephlee/Julia Code/Github-Repos/algorithms/.dylib' (no such file)
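
This error appears consistent with the `AMDGPU.versioninfo()` table above, which lists HIP as unavailable on this machine. One way to avoid hitting it in the middle of a benchmark is to check for a usable GPU stack first. The sketch below assumes `AMDGPU.functional()` is available in the installed AMDGPU.jl version; `gpu_ok` is just an illustrative variable name:

```julia
using AMDGPU

# true only when AMDGPU.jl found a working GPU stack on this machine
gpu_ok = AMDGPU.functional()

if gpu_ok
    println("AMD GPU stack is usable; the GPU cells can be run.")
else
    println("No usable AMD GPU stack found; skip the GPU cells.")
end
```

Guarding the GPU cells with a flag like this lets the CPU parts of the notebook still run on machines without a supported AMD GPU.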