Nomic Vulkan backend #4456

Merged 155 commits on Jan 29, 2024.

Commits
ba15dfd
Nomic vulkan backend licensed under the Software for Open Models Lice…
niansa Jun 22, 2023
48a45ea
Remove warning which fails on windows.
manyoso Aug 30, 2023
8563fa0
remove dynamic deps from kompute build
apage43 Sep 5, 2023
45c8778
Switch to a dynamic dispatch table instead of linking hard against li…
manyoso Sep 12, 2023
b7e2e69
Completely revamp how we do object management with the vulkan backend…
manyoso Sep 12, 2023
beee572
Make kompute actually include external SDK headers when requested
apage43 Sep 12, 2023
68cf1df
Throw an exception when allocation fails for vulkan.
manyoso Sep 13, 2023
8bea719
vulkan: disambiguate gpus with the same name
apage43 Sep 13, 2023
bd5f639
Don't try and install kompute artifacts.
manyoso Sep 13, 2023
4ed25b2
Sync from device back to host at begin of new prompt.
manyoso Sep 14, 2023
68aca6b
Only use vulkan with known quant that work.
manyoso Sep 14, 2023
addac25
Set the singleton to nullptr here.
manyoso Sep 14, 2023
2c24d67
Don't crash on available devices if we can't even create an instance.
manyoso Sep 16, 2023
1b1416d
Support for gguf.
manyoso Sep 21, 2023
6b6c73a
kompute : don't fail build because of -Warray-bounds
cebtenzzre Sep 26, 2023
9e4f8b4
Upload immediately to device.
manyoso Sep 26, 2023
77135a3
Add a common boilerplate code via include and elim copy pasta
manyoso Sep 21, 2023
93306f1
Consolidate code for mat x vec kernels and use subgroups more extensi…
manyoso Sep 29, 2023
601905e
Move the subgroups and printf into common.
manyoso Oct 2, 2023
5509f74
Minor cleanup.
manyoso Oct 2, 2023
4b223ec
Refactor getrows to use common code and get ready for q6_k.
manyoso Oct 2, 2023
f1c9bc1
Add q6_k getrows and mul*vec kernel.
manyoso Oct 2, 2023
06d4b21
Fix offset into the qh and now we have working vulkan accelerated for…
manyoso Oct 2, 2023
32289aa
Fixes for norm.
manyoso Oct 3, 2023
6ac3975
Fixup the upstream CMakelists.txt so we can build just llama.cpp with…
manyoso Oct 3, 2023
de589ce
Change this back to be in agreement with metal and our previous softm…
manyoso Oct 3, 2023
bc4b5ed
Fixes for subgroup size to bring AMD and NVIDIA inline with eachother…
manyoso Oct 4, 2023
24a4a59
kompute : only try to use Vulkan for LLaMA itself
cebtenzzre Oct 4, 2023
3d850db
kompute : remove Q6_K from list of supported quant types
cebtenzzre Oct 4, 2023
9db90cb
f16 mv broadcasting fix (gqa fix)
apage43 Oct 5, 2023
ff4212d
q8 mat*vec
apage43 Oct 5, 2023
020b174
vulkan: implement neox mode for rope
apage43 Oct 5, 2023
8564f79
falcon h2d + reenable vulkan
apage43 Oct 5, 2023
09d83f0
Delete TODO now that we have q8_0.
manyoso Oct 5, 2023
f0cd38b
add mat*mat ops
apage43 Oct 11, 2023
46385ee
misc vulkan cleanup
apage43 Oct 11, 2023
3327d84
perf: use bigger threadgroups in mm
apage43 Oct 11, 2023
d5741c0
use op param epsilon for norms
apage43 Oct 12, 2023
b78a94b
q6k mm works
apage43 Oct 12, 2023
4809890
rm commented dbg print
apage43 Oct 12, 2023
cd0257e
q4_1 mat*mat
apage43 Oct 12, 2023
8dc79ac
clean up vulkan/cpu switch
apage43 Oct 12, 2023
9bc52eb
attempted speedups
apage43 Oct 13, 2023
c1fd645
attempted speedups 2
apage43 Oct 13, 2023
cc05a60
use mat*vec shaders for mat*mat
apage43 Oct 16, 2023
21841d3
kompute : enable kp_logger and make it static (#8)
cebtenzzre Oct 16, 2023
cbc0d1a
kompute : make scripts executable
cebtenzzre Oct 23, 2023
8400015
Don't try an allocation on a heap that is smaller than the size we re…
manyoso Oct 26, 2023
752f7eb
Remove unused push constant that was giving validation errors.
manyoso Oct 26, 2023
8d9efbf
Lower the workgroup count for some shaders by providing a loop that p…
manyoso Oct 26, 2023
74ddf0f
Fix synchronization problem for AMD Radeon with amdvlk driver or windows
manyoso Oct 27, 2023
1c17010
vulkan : fix missing break in matmul selection (#9)
cebtenzzre Oct 23, 2023
89b7127
llama : decide to disable Vulkan before loading tensors (#7)
cebtenzzre Oct 27, 2023
e006d37
Scale the workgroup count down to allow correct generation for falcon…
manyoso Oct 27, 2023
a5eb001
Revert the prompt processing on gpu for now.
manyoso Oct 27, 2023
ffd0624
Remove this debug code.
manyoso Oct 30, 2023
f88b198
llama : fix Vulkan whitelist (#11)
cebtenzzre Nov 1, 2023
a8cac53
kompute : fix issues with debug layers
cebtenzzre Nov 6, 2023
c438c16
fix build with external fmtlib (v10)
cebtenzzre Nov 7, 2023
af00cca
Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD
cebtenzzre Nov 8, 2023
71565eb
vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)
cebtenzzre Nov 23, 2023
84f7fc4
vulkan : rope n_past is now KQ_pos, f16 rope kernel
cebtenzzre Nov 23, 2023
39abedd
vulkan : optimize workgroup sizes
cebtenzzre Nov 23, 2023
f194e1b
Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vu…
cebtenzzre Nov 23, 2023
a934b2c
vulkan : assert various kernel requirements
cebtenzzre Nov 14, 2023
2a41ba7
Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vu…
cebtenzzre Nov 14, 2023
6474fc8
vulkan : handle ggml_scale for n%8 != 0
cebtenzzre Nov 14, 2023
fe26e6a
Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vu…
cebtenzzre Nov 14, 2023
9c4dfd0
mention skipped change
cebtenzzre Nov 15, 2023
02c3309
merge fixup (e16b9fa4baa8a09c6619b116159830e898050942)
cebtenzzre Nov 14, 2023
1829f1d
Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-v…
cebtenzzre Nov 23, 2023
208cd52
vulkan : implement YaRN RoPE scaling (#2268)
cebtenzzre Nov 15, 2023
23f6d51
Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vu…
cebtenzzre Nov 23, 2023
a4bb9c5
vulkan : sync with "migrate to dynamic graphs"
cebtenzzre Nov 23, 2023
9ae88ba
Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo
cebtenzzre Nov 23, 2023
56430c3
relicense Vulkan backend as MIT
cebtenzzre Dec 13, 2023
3e09e12
rename ggml-vulkan -> ggml-kompute
cebtenzzre Dec 13, 2023
27631db
separate shaders from kompute itself
cebtenzzre Dec 13, 2023
747e1ea
Merge commit '81bc9214a389362010f7a57f4cbc30e5f83a2d28' into nomic-vu…
cebtenzzre Dec 13, 2023
b906e12
kompute : fix compile warnings
cebtenzzre Dec 13, 2023
9af7f58
move kompute to a submodule
cebtenzzre Dec 13, 2023
f7cb0a6
remove script with unclear purpose
cebtenzzre Dec 13, 2023
c8fd4ba
ggml : restore 'static' specifiers
cebtenzzre Dec 14, 2023
f58f581
refactor llama.cpp modifications
cebtenzzre Dec 15, 2023
2d2c76a
vulkan : fix free of stack addr in llama_buffer
cebtenzzre Nov 29, 2023
8072706
kompute : always destroy Manager via the destructor
cebtenzzre Dec 15, 2023
44b1a97
kompute : fix -Wunused-private-field warnings from clang
cebtenzzre Dec 11, 2023
8b65f4c
Merge commit 'bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56' into ceb/nomi…
cebtenzzre Jan 8, 2024
3959283
Merge commit '31f27758faf4a4bd08101a57c7ec3a473f771f86' into ceb/nomi…
cebtenzzre Jan 8, 2024
904c563
sync xxd commands with GPT4All llama.cpp.cmake
cebtenzzre Jan 10, 2024
ae6d682
Merge commit 'd232aca5a73b290e218a2e48b91023d5e994203f' into ceb/nomi…
cebtenzzre Jan 9, 2024
3773e1a
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Jan 9, 2024
7c527eb
Merge commit 'e7e4df031b9e29d4b55a4e0b0295187f6b213db1' into HEAD
cebtenzzre Jan 24, 2024
298d6ee
kompute : initial attempt at ggml-backend v2 support
cebtenzzre Jan 9, 2024
5f660da
fix assertion failure
cebtenzzre Jan 10, 2024
070919d
attempt to get test-backend-ops working
cebtenzzre Jan 10, 2024
cad72e1
add sanity check and fix kompute teardown order
cebtenzzre Jan 17, 2024
76474a7
kompute : ignore exceptions in ggml_vk_available_devices (#12)
cebtenzzre Jan 17, 2024
d6bd471
kompute : fix rope_f32 and scale ops (#5008)
ggerganov Jan 18, 2024
9431026
clean up old backend code
cebtenzzre Jan 18, 2024
e9d5223
actually fix this assertion
cebtenzzre Jan 18, 2024
729e1a4
sync op_rope_f16 with recent op_rope_f32 changes
cebtenzzre Jan 18, 2024
0753073
never try to evaluate an empty command buffer
cebtenzzre Jan 18, 2024
2f6a279
fix supported ops for kompute backend
cebtenzzre Jan 18, 2024
33e8d6a
kompute : fix ggml_add kernel (#5027)
ggerganov Jan 18, 2024
cb9ceff
minor cleanup
cebtenzzre Jan 19, 2024
0899adf
kompute : fix get_rows dispatch -> 4 less failures
cebtenzzre Jan 22, 2024
08e23fd
kompute : fix op_mul kernel -> 13 less test failures
cebtenzzre Jan 22, 2024
2755ae3
kompute : fix more dispatch ambiguity -> 12 less failures
cebtenzzre Jan 22, 2024
0787b80
kompute : remove broken mulrow kernel -> 1 less test failure
cebtenzzre Jan 22, 2024
1a14099
fix q4_0/q4_1 mmv, 65 -> 49 failures
cebtenzzre Jan 24, 2024
2b0f642
fix f16 mmv, 49 -> 41 failures
cebtenzzre Jan 24, 2024
2852902
test-backend-ops : add llama test
cebtenzzre Jan 24, 2024
1450966
test-backend-ops : test scale parameter of ggml_soft_max_ext
cebtenzzre Jan 24, 2024
308f279
kompute : support scale parameter of softmax
cebtenzzre Jan 24, 2024
8bd38fe
test-backend-ops : test mask parameter of ggml_soft_max_ext
cebtenzzre Jan 24, 2024
df687b1
kompute : support mask parameter of softmax
cebtenzzre Jan 24, 2024
ebb5f7e
test-backend-ops : test llama with different batch sizes
cebtenzzre Jan 24, 2024
ec68a96
test-backend-ops : increase max_nmse_err so Llama passes
cebtenzzre Jan 24, 2024
987335e
kompute : fix algorithm names
cebtenzzre Jan 25, 2024
f5ac635
kompute : fix q8_0 mmv, 41 -> 28 failures
cebtenzzre Jan 25, 2024
1849b85
test-backend-ops : add Falcon test
cebtenzzre Jan 25, 2024
6fc99a6
test-backend-ops : test larger GELU range
cebtenzzre Jan 25, 2024
38d1f0c
kompute : fix op_gelu -> Falcon is working on AMDVLK
cebtenzzre Jan 25, 2024
11b3050
test-backend-ops : restore softmax tests
cebtenzzre Jan 25, 2024
de9fba0
kompute : fix basic f16 get_rows, 28 -> 26 failures
cebtenzzre Jan 25, 2024
445a373
kompute : fix basic Q6_K get_rows, 26 -> 24 failures
cebtenzzre Jan 25, 2024
3fbf052
kompute : mark last few failing ops as unsupported
cebtenzzre Jan 25, 2024
3915194
test-backend-ops : make Falcon test faster with a smaller model
cebtenzzre Jan 25, 2024
bc28704
kompute : remove unused immintrin.h #include
cebtenzzre Jan 25, 2024
91654ff
kompute : fix a -Wstrict-aliasing warning
cebtenzzre Jan 25, 2024
61a5cf8
kompute : remove unnecessary use_mmap=false
cebtenzzre Jan 26, 2024
e6ce5f2
llama : revert unintended whitespace change
cebtenzzre Jan 26, 2024
aea8498
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Jan 26, 2024
2512799
test-backend-ops : comment out Llama and Falcon tests
cebtenzzre Jan 26, 2024
8ca33de
test-backend-ops : check all the ops in the test for support in the b…
slaren Jan 26, 2024
6af02b1
kompute : init device automatically and remove an unnecessary free
cebtenzzre Jan 26, 2024
2ff2d16
ggml-kompute.h : remove anything that doesn't need to be public
cebtenzzre Jan 26, 2024
cdab404
kompute : fix #includes
cebtenzzre Jan 26, 2024
454baeb
op_mul_mat_mat_f32.comp : fix missing final newline
cebtenzzre Jan 26, 2024
297fde5
editorconfig-checker : exclude .gitmodules
cebtenzzre Jan 26, 2024
9132485
ci : initial attempt at testing Kompute backend
cebtenzzre Jan 26, 2024
57cecad
main : remove ggml-kompute.h #include
cebtenzzre Jan 26, 2024
4b0c96a
kompute : adapt ggml-kompute API to be compatible with C
cebtenzzre Jan 26, 2024
e6edd44
ci : attempt to fix Vulkan installer path
cebtenzzre Jan 27, 2024
050d450
ci : do not run tests for Kompute (no GPU)
cebtenzzre Jan 27, 2024
5304625
kompute : use llama_backend_init/llama_backend_free to manage device
cebtenzzre Jan 27, 2024
be7c055
kompute : better device management
cebtenzzre Jan 29, 2024
da1dc66
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Jan 29, 2024
dc08e51
kompute : fix merge issues
cebtenzzre Jan 29, 2024
7e11fe0
kompute : remove llama_load_model_from_file_internal
cebtenzzre Jan 29, 2024
b932cd7
vulkan : correctly fix use-after-free in ggml_vk_current_device
cebtenzzre Nov 30, 2023
48db724
minor fixup
cebtenzzre Jan 29, 2024
1f98dff
fix trailing whitespace
cebtenzzre Jan 29, 2024
2998211
fix incorrect memcpy
cebtenzzre Jan 29, 2024
Files changed
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "kompute"]
path = kompute
url = https://github.com/nomic-ai/kompute.git
166 changes: 161 additions & 5 deletions CMakeLists.txt
@@ -96,6 +96,7 @@ option(LLAMA_CLBLAST "llama: use CLBlast" OFF)
option(LLAMA_METAL "llama: use Metal" ${LLAMA_METAL_DEFAULT})
option(LLAMA_METAL_NDEBUG "llama: disable Metal debugging" OFF)
option(LLAMA_METAL_SHADER_DEBUG "llama: compile Metal with -fno-fast-math" OFF)
option(LLAMA_KOMPUTE "llama: use Kompute" OFF)
option(LLAMA_MPI "llama: use MPI" OFF)
option(LLAMA_QKK_64 "llama: use super-block size of 64 for k-quants" OFF)

@@ -442,6 +443,160 @@ if (LLAMA_HIPBLAS)
endif()
endif()

if (LLAMA_KOMPUTE)
add_compile_definitions(VULKAN_HPP_DISPATCH_LOADER_DYNAMIC=1)
find_package(Vulkan COMPONENTS glslc REQUIRED)
find_program(glslc_executable NAMES glslc HINTS Vulkan::glslc)
if (NOT glslc_executable)
message(FATAL_ERROR "glslc not found")
endif()

function(compile_shader)
set(options)
set(oneValueArgs)
set(multiValueArgs SOURCES)
cmake_parse_arguments(compile_shader "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
foreach(source ${compile_shader_SOURCES})
get_filename_component(filename ${source} NAME)
set(spv_file ${filename}.spv)
add_custom_command(
OUTPUT ${spv_file}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${source}
${CMAKE_CURRENT_SOURCE_DIR}/kompute-shaders/common.comp
${CMAKE_CURRENT_SOURCE_DIR}/kompute-shaders/op_getrows.comp
${CMAKE_CURRENT_SOURCE_DIR}/kompute-shaders/op_mul_mv_q_n_pre.comp
${CMAKE_CURRENT_SOURCE_DIR}/kompute-shaders/op_mul_mv_q_n.comp
COMMAND ${glslc_executable} --target-env=vulkan1.2 -o ${spv_file} ${CMAKE_CURRENT_SOURCE_DIR}/${source}
COMMENT "Compiling ${source} to ${spv_file}"
)

get_filename_component(RAW_FILE_NAME ${spv_file} NAME)
set(FILE_NAME "shader${RAW_FILE_NAME}")
string(REPLACE ".comp.spv" ".h" HEADER_FILE ${FILE_NAME})
string(TOUPPER ${HEADER_FILE} HEADER_FILE_DEFINE)
string(REPLACE "." "_" HEADER_FILE_DEFINE "${HEADER_FILE_DEFINE}")
set(OUTPUT_HEADER_FILE "${HEADER_FILE}")
message(STATUS "${HEADER_FILE} generating ${HEADER_FILE_DEFINE}")
if(CMAKE_GENERATOR MATCHES "Visual Studio")
add_custom_command(
OUTPUT ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "/*THIS FILE HAS BEEN AUTOMATICALLY GENERATED - DO NOT EDIT*/" > ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#ifndef ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#define ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "namespace kp {" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "namespace shader_data {" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_BINARY_DIR}/bin/$<CONFIG>/xxd -i ${RAW_FILE_NAME} >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "}}" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#endif // define ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
DEPENDS ${spv_file} xxd
COMMENT "Converting to hpp: ${FILE_NAME} ${CMAKE_BINARY_DIR}/bin/$<CONFIG>/xxd"
)
else()
add_custom_command(
OUTPUT ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "/*THIS FILE HAS BEEN AUTOMATICALLY GENERATED - DO NOT EDIT*/" > ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#ifndef ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#define ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "namespace kp {" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "namespace shader_data {" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_BINARY_DIR}/bin/xxd -i ${RAW_FILE_NAME} >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "}}" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#endif // define ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
DEPENDS ${spv_file} xxd
COMMENT "Converting to hpp: ${FILE_NAME} ${CMAKE_BINARY_DIR}/bin/xxd"
)
endif()
endforeach()
endfunction()
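
For orientation, here is a hedged sketch of what one generated header plausibly looks like for op_scale.comp.spv, reconstructed from the echo commands above and the standard output format of xxd -i; the byte payload is truncated to the four-byte SPIR-V magic number, so the array contents and length are illustrative only:

/*THIS FILE HAS BEEN AUTOMATICALLY GENERATED - DO NOT EDIT*/
#ifndef SHADEROP_SCALE_H
#define SHADEROP_SCALE_H
namespace kp {
namespace shader_data {
// xxd -i names the array after the input file, here op_scale.comp.spv
unsigned char op_scale_comp_spv[] = {
    0x03, 0x02, 0x23, 0x07  // SPIR-V magic number (little-endian); a real header carries the full binary
};
unsigned int op_scale_comp_spv_len = 4;  // in practice, the full .spv size in bytes
}}
#endif // define SHADEROP_SCALE_H

This is what ggml-kompute.cpp can #include to get each shader's SPIR-V bytes embedded in the binary.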

if (EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/kompute/CMakeLists.txt")
message(STATUS "Kompute found")
set(KOMPUTE_OPT_LOG_LEVEL Error CACHE STRING "Kompute log level")
add_subdirectory(kompute)

# Compile our shaders
compile_shader(SOURCES
kompute-shaders/op_scale.comp
kompute-shaders/op_scale_8.comp
kompute-shaders/op_add.comp
kompute-shaders/op_addrow.comp
kompute-shaders/op_mul.comp
kompute-shaders/op_silu.comp
kompute-shaders/op_relu.comp
kompute-shaders/op_gelu.comp
kompute-shaders/op_softmax.comp
kompute-shaders/op_norm.comp
kompute-shaders/op_rmsnorm.comp
kompute-shaders/op_diagmask.comp
kompute-shaders/op_mul_mat_mat_f32.comp
kompute-shaders/op_mul_mat_f16.comp
kompute-shaders/op_mul_mat_q8_0.comp
kompute-shaders/op_mul_mat_q4_0.comp
kompute-shaders/op_mul_mat_q4_1.comp
kompute-shaders/op_mul_mat_q6_k.comp
kompute-shaders/op_getrows_f16.comp
kompute-shaders/op_getrows_q4_0.comp
kompute-shaders/op_getrows_q4_1.comp
kompute-shaders/op_getrows_q6_k.comp
kompute-shaders/op_rope_f16.comp
kompute-shaders/op_rope_f32.comp
kompute-shaders/op_cpy_f16_f16.comp
kompute-shaders/op_cpy_f16_f32.comp
kompute-shaders/op_cpy_f32_f16.comp
kompute-shaders/op_cpy_f32_f32.comp
)

# Create a custom target for our generated shaders
add_custom_target(generated_shaders DEPENDS
shaderop_scale.h
shaderop_scale_8.h
shaderop_add.h
shaderop_addrow.h
shaderop_mul.h
shaderop_silu.h
shaderop_relu.h
shaderop_gelu.h
shaderop_softmax.h
shaderop_norm.h
shaderop_rmsnorm.h
shaderop_diagmask.h
shaderop_mul_mat_mat_f32.h
shaderop_mul_mat_f16.h
shaderop_mul_mat_q8_0.h
shaderop_mul_mat_q4_0.h
shaderop_mul_mat_q4_1.h
shaderop_mul_mat_q6_k.h
shaderop_getrows_f16.h
shaderop_getrows_q4_0.h
shaderop_getrows_q4_1.h
shaderop_getrows_q6_k.h
shaderop_rope_f16.h
shaderop_rope_f32.h
shaderop_cpy_f16_f16.h
shaderop_cpy_f16_f32.h
shaderop_cpy_f32_f16.h
shaderop_cpy_f32_f32.h
)

# Create a custom command that depends on the generated_shaders
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/ggml-kompute.stamp
COMMAND ${CMAKE_COMMAND} -E touch ${CMAKE_CURRENT_BINARY_DIR}/ggml-kompute.stamp
DEPENDS generated_shaders
COMMENT "Ensuring shaders are generated before compiling ggml-kompute.cpp"
)

# Add the stamp to the main sources to ensure dependency tracking
set(GGML_SOURCES_KOMPUTE ggml-kompute.cpp ${CMAKE_CURRENT_BINARY_DIR}/ggml-kompute.stamp)
set(GGML_HEADERS_KOMPUTE ggml-kompute.h ${CMAKE_CURRENT_BINARY_DIR}/ggml-kompute.stamp)
add_compile_definitions(GGML_USE_KOMPUTE)
set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} kompute)
set(LLAMA_EXTRA_INCLUDES ${LLAMA_EXTRA_INCLUDES} ${CMAKE_BINARY_DIR})
else()
message(WARNING "Kompute not found")
endif()
endif()
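
Note on enabling this path: the backend is opt-in via -DLLAMA_KOMPUTE=ON at configure time (the option defaults to OFF above), and it expects the kompute sources to be checked out at the submodule path declared in .gitmodules, e.g. with git submodule update --init kompute; otherwise CMake prints the "Kompute not found" warning and the build proceeds without the backend.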

function(get_flags CCID CCVER)
set(C_FLAGS "")
set(CXX_FLAGS "")
@@ -758,11 +913,12 @@ add_library(ggml OBJECT
ggml-backend.h
ggml-quants.c
ggml-quants.h
${GGML_SOURCES_CUDA} ${GGML_HEADERS_CUDA}
${GGML_SOURCES_OPENCL} ${GGML_HEADERS_OPENCL}
${GGML_SOURCES_METAL} ${GGML_HEADERS_METAL}
${GGML_SOURCES_MPI} ${GGML_HEADERS_MPI}
${GGML_SOURCES_EXTRA} ${GGML_HEADERS_EXTRA}
${GGML_SOURCES_KOMPUTE} ${GGML_HEADERS_KOMPUTE}
)

target_include_directories(ggml PUBLIC . ${LLAMA_EXTRA_INCLUDES})
8 changes: 8 additions & 0 deletions examples/main/main.cpp
@@ -31,6 +31,10 @@
#pragma warning(disable: 4244 4267) // possible loss of data
#endif

#if defined(GGML_USE_KOMPUTE)
#include "ggml-kompute.h"
#endif

static llama_context ** g_ctx;
static llama_model ** g_model;
static gpt_params * g_params;
@@ -182,6 +186,10 @@ int main(int argc, char ** argv) {
g_model = &model;
g_ctx = &ctx;

#if defined(GGML_USE_KOMPUTE)
ggml_vk_init_device(0, "gpu");
#endif

// load the model and apply lora adapter, if any
LOG("%s: load the model and apply lora adapter, if any\n", __func__);
std::tie(model, ctx) = llama_init_from_gpt_params(params);
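
Per the commit history above, the device is initialized explicitly before llama_init_from_gpt_params so the loader can decide whether Vulkan is usable before any tensors are loaded (cf. "llama : decide to disable Vulkan before loading tensors (#7)"); later commits in this PR move this management into llama_backend_init/llama_backend_free ("kompute : use llama_backend_init/llama_backend_free to manage device").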
6 changes: 6 additions & 0 deletions ggml-backend.c
@@ -342,6 +342,12 @@ static void ggml_backend_registry_init(void) {
extern ggml_backend_buffer_type_t ggml_backend_metal_buffer_type(void);
ggml_backend_register("Metal", ggml_backend_reg_metal_init, ggml_backend_metal_buffer_type(), NULL);
#endif

#ifdef GGML_USE_KOMPUTE
extern ggml_backend_t ggml_backend_reg_kompute_init(const char * params, void * user_data);
extern ggml_backend_buffer_type_t ggml_backend_kompute_buffer_type(void);
ggml_backend_register("Kompute", ggml_backend_reg_kompute_init, ggml_backend_kompute_buffer_type(), NULL);
#endif
}

void ggml_backend_register(const char * name, ggml_backend_init_fn init_fn, ggml_backend_buffer_type_t default_buffer_type, void * user_data) {
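
To make the registration path concrete, here is a hedged sketch of how an application could enumerate the registry and instantiate the Kompute backend. It assumes the ggml_backend_reg_* accessors (get_count, get_name, find_by_name, init_backend) that ggml-backend.h exposed around the time of this PR; it is a sketch, not code from the PR itself.

#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // The first registry call runs ggml_backend_registry_init(), which
    // registers "Kompute" when GGML_USE_KOMPUTE is defined (see the hunk above).
    size_t n = ggml_backend_reg_get_count();
    for (size_t i = 0; i < n; i++) {
        printf("backend %zu: %s\n", i, ggml_backend_reg_get_name(i));
    }
    size_t idx = ggml_backend_reg_find_by_name("Kompute");
    if (idx < n) {  // find_by_name is assumed to return an out-of-range index when missing
        ggml_backend_t backend = ggml_backend_reg_init_backend(idx, /*params =*/ NULL);
        // ... build a ggml graph and compute on this backend ...
        ggml_backend_free(backend);
    }
    return 0;
}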