
saromleang/dir-benchmark-dgemm-C


dir-benchmark-dgemm-C

Benchmark K20 dgemm.c

Copy the Makefile for your system (Makefile.bolt or Makefile.cyence) to Makefile, then build with one of the following:

bolt: make [options] BLAS=(atlas-sandybridge-gnu|atlas-haswell-gnu|atlas-dev-sandybridge-gnu|atlas-dev-haswell-gnu|openBLAS-sandybridge-gnu|openBLAS-sandybridge-intel|openBLAS-haswell-gnu|openBLAS-haswell-intel|mkl|mkl-with-cuda)

cyence: make [options] BLAS=(atlas-sandybridge-gnu|atlas-dev-sandybridge-gnu|openBLAS-sandybridge-gnu|openBLAS-sandybridge-intel|mkl|mkl-with-cuda)

LOCAL(sarom): make [options] BLAS=(atlas|openBLAS|mkl)

options:

all: default

Compiler optimization level: -O3
Builds the benchmark without pinned host memory, without the naive CPU DGEMM calculation, and without verification of results.
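A DGEMM benchmark usually reports throughput in GFLOP/s. Multiplying an m-by-k matrix by a k-by-n matrix takes about 2mnk floating-point operations, since each inner-product term needs one multiply and one add. The sketch below is a minimal example under stated assumptions. It times work with the POSIX `clock_gettime(CLOCK_MONOTONIC, ...)`. The helper names `wall_seconds` and `dgemm_gflops` are hypothetical. This README does not describe the benchmark's own timing method or output format.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: wall-clock seconds from a monotonic POSIX clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: GFLOP/s for C = alpha*A*B + beta*C with A m-by-k and
 * B k-by-n. Counts 2*m*n*k flops (one multiply and one add per term) and
 * ignores the lower-order alpha/beta scaling work. */
double dgemm_gflops(size_t m, size_t n, size_t k, double seconds)
{
    double flops = 2.0 * (double)m * (double)n * (double)k;
    return flops / seconds / 1e9;
}
```

A typical pattern is to take `t0 = wall_seconds()` before the DGEMM call and `t1 = wall_seconds()` after it. Pass `t1 - t0` to `dgemm_gflops`. For accelerator calls, the device must be synchronized before reading `t1`.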

debug:

Compiler optimization level: not set
Builds the default with verbose printing.

cpu-dgemm:

Compiler optimization level: -O3
Also runs a naive CPU DGEMM implementation.
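A naive DGEMM is the textbook triple loop that computes C = alpha*A*B + beta*C. The sketch below is a minimal version under stated assumptions. It assumes column-major storage with leading dimensions and no transposes, following the BLAS convention. The name `naive_dgemm` and its signature are hypothetical, not the repository's actual routine.

```c
#include <stddef.h>

/* Hypothetical naive reference DGEMM: C = alpha*A*B + beta*C.
 * Assumes column-major storage with no transposes (BLAS convention):
 * A is m-by-k (lda >= m), B is k-by-n (ldb >= k), C is m-by-n (ldc >= m). */
void naive_dgemm(size_t m, size_t n, size_t k, double alpha,
                 const double *A, size_t lda,
                 const double *B, size_t ldb,
                 double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {          /* column of C */
        for (size_t i = 0; i < m; ++i) {      /* row of C */
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)    /* inner product of row i of A and column j of B */
                sum += A[i + p * lda] * B[p + j * ldb];
            /* As in BLAS, C is not read when beta == 0, so stale NaNs in C do not propagate. */
            double c_old = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * sum + c_old;
        }
    }
}
```

This loop does O(mnk) work and makes no attempt to block for cache. For that reason, it serves as a correctness reference rather than a performance target.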

verify:

Compiler optimization level: -O3
The DGEMM product computed by the accelerator BLAS and the DGEMM product computed by the CPU BLAS are each compared, element by element, against a DGEMM product computed by the naive CPU implementation. The accumulated unsigned error is printed.
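One way to read "accumulated unsigned error" is the sum of absolute element-wise differences between a result matrix and the naive reference. The sketch below assumes that reading and column-major storage. The helper name `accumulated_abs_error` is hypothetical, and the benchmark's exact formula is not stated here.

```c
#include <stddef.h>

/* Hypothetical helper: sum of |ref - test| over all m*n entries of two
 * column-major matrices sharing leading dimension ld (ld >= m).
 * Interpreting "accumulated unsigned error" this way is an assumption. */
double accumulated_abs_error(size_t m, size_t n,
                             const double *ref, const double *test, size_t ld)
{
    double err = 0.0;
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double d = ref[i + j * ld] - test[i + j * ld];
            err += (d < 0.0) ? -d : d;   /* unsigned (absolute) difference */
        }
    }
    return err;
}
```

To use it, call the helper once with the accelerator result and once with the CPU BLAS result, passing the naive product as `ref` both times. Because floating-point summation order differs between implementations, small nonzero totals are expected even when both results are correct.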

debug-verify:

Compiler optimization level: not set
Adds verbose printing on top of a verify build.

pinned:

Compiler optimization level: -O3
Builds the benchmark with pinned host memory.

debug-pinned:

Compiler optimization level: not set
Adds verbose printing on top of a pinned build.

cpu-dgemm-pinned:

Compiler optimization level: -O3
Builds the benchmark with pinned host memory and also runs a naive CPU DGEMM implementation.

verify-pinned:

Compiler optimization level: -O3
Builds the benchmark with pinned host memory. The DGEMM product computed by the accelerator BLAS and the DGEMM product computed by the CPU BLAS are each compared, element by element, against a DGEMM product computed by the naive CPU implementation. The accumulated unsigned error is printed.

debug-verify-pinned:

Compiler optimization level: not set
Adds verbose printing on top of a verify-pinned build.

no-gpu:

Compiler optimization level: -O3
Pure CPU code: random-number generation and the DGEMM operations are both performed on the CPU.
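On the CPU, the input matrices can be filled with pseudo-random values before the DGEMM runs. The sketch below is a minimal example that uses the C standard library `rand()`/`srand()` and scales values to [0, 1]. The generator, the value range, and the helper name `fill_random` are all assumptions, not the benchmark's documented behavior.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fill an m-by-n column-major matrix (ld >= m) with
 * pseudo-random values in [0, 1] using the C library rand().
 * The generator and range are illustrative assumptions. */
void fill_random(size_t m, size_t n, double *M, size_t ld, unsigned int seed)
{
    srand(seed);   /* fixed seed gives reproducible inputs across runs */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            M[i + j * ld] = (double)rand() / (double)RAND_MAX;
}
```

A fixed seed makes the runs repeatable, which helps when comparing error totals between builds. `rand()` is neither thread-safe nor high quality, but that is acceptable for generating benchmark inputs.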
