This repo has:
- UI app (Android): Java + native (C/C++) using OpenCL.
- CLI (Android): native binary built with the Android NDK.
- CLI (Desktop): same native code, desktop build to debug outside the phone.
Tested hardware: UI verified on an ARM64 device with a Mali GPU. The code also builds for ARMv7 and x86_64, but those weren’t runtime‑tested here.
- UI (APK): `arm64-v8a`, `armeabi-v7a`, `x86_64` (tested only on `arm64-v8a`).
- CLI (Android): selectable via `ARCH` (see Makefile).
- CLI (Desktop): host machine.
- Android SDK + NDK (r27 in Makefile defaults) and CMake.
- Device/vendor OpenCL drivers (ICD) on the phone/tablet.
- For desktop CLI: a desktop OpenCL runtime/ICD.
- Android SDK location (`local.properties`): Gradle expects a `local.properties` file at the repo root with `sdk.dir=/absolute/path/to/Android/Sdk`. This file is typically untracked and machine‑specific. Ensure `sdk.dir` (or the environment variable `ANDROID_HOME`/`ANDROID_SDK_ROOT`) points to a valid SDK. Without it, Gradle builds (e.g., `make ui`) will fail.
- Makefile toolchain paths: The `Makefile` provides fallbacks for `CMAKE_PATH` and `TOOLCHAIN_FILE` under `$(HOME)/Android/Sdk/...` (or the Windows equivalent). If your SDK/NDK versions or install paths differ, override via environment variables when invoking make, for example: `CMAKE_PATH=/path/to/cmake CMAKE_TOOLCHAIN_FILE=/path/to/android.toolchain.cmake make cli`
  - Other useful overrides: `ARCH=arm64-v8a|armeabi-v7a|x86_64`, `API=24`, `BUILD_TYPE=Debug|Release`.
- CMake OpenCL headers and libraries: The native build (`app/src/main/cpp/CMakeLists.txt`) includes headers from `app/src/main/cpp/OpenCL` and imports an OpenCL library depending on the ABI:
  - Headers: ensure `OpenCL/CL/*.h` (e.g., `CL/cl.h`) are present or point includes to your SDK.
  - Libraries: the imported `.so`/`.dll` files are referenced as `OpenCL/libOpenCL64.so`, `OpenCL/libOpenCL32.so`, `OpenCL/libOpenCLx86_64.so` (Android ABIs) and `OpenCL/OpenCL.dll` + `OpenCL/OpenCL.lib` (Windows desktop). You may need to replace these with your vendor ICD/loader or adjust the paths to match your setup.
  - For desktop builds, you can alternatively switch to a system installation via `find_package(OpenCL REQUIRED)` and link the discovered targets instead of the imported ones.
- ABI/runtime match: Ensure the OpenCL library you link matches the target ABI (e.g., `arm64-v8a` on device, `x86_64` on desktop). Mismatches will lead to load/link errors at runtime.
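The ABI-to-library pairing described above can be sketched as a small lookup. This is only an illustration: `bundledOpenClLibrary` is a hypothetical helper, and the pairing of `arm64-v8a` with `libOpenCL64.so` and `armeabi-v7a` with `libOpenCL32.so` is inferred from the file names, not confirmed against the CMake file.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the bundled OpenCL library path for an Android ABI.
// ASSUMPTION: the ABI-to-file pairing is guessed from the file names in OpenCL/.
std::optional<std::string> bundledOpenClLibrary(std::string_view abi) {
    if (abi == "arm64-v8a")   return "OpenCL/libOpenCL64.so";
    if (abi == "armeabi-v7a") return "OpenCL/libOpenCL32.so";
    if (abi == "x86_64")      return "OpenCL/libOpenCLx86_64.so";
    return std::nullopt;  // unknown ABI: the caller should report an error
}
```

Returning an empty optional for an unknown ABI lets a build or startup check fail early with a clear message instead of a loader error later.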
Using the provided Makefile (see configurable vars inside):
# UI APK (Debug/Release)
make ui BUILD_TYPE=Debug # or Release
# CLI (Android). Choose ABI and API level as needed
make cli ARCH=arm64-v8a API=24 BUILD_TYPE=Release
# ARCH options typically: arm64-v8a | armeabi-v7a | x86_64
# CLI (Desktop) to debug native code on your machine
make cli_desktop BUILD_TYPE=Debug
Outputs:
- UI APKs: `app/build/outputs/apk/<variant>/`.
- Android CLI: `build_cli/` (binary and libs).
- Desktop CLI: `build_cli_desktop/`.
- UI: install the APK (`adb install ...`) and run on device.
- Android CLI: `adb push` the binary to the device and run it from a shell. Make sure the device has working OpenCL drivers.
- Desktop CLI: run the built binary locally; ensure your OpenCL ICD is installed.
- This is a PoC; expect rough edges. Check the `Makefile` for build switches and paths.
- The desktop CLI exists to validate the native OpenCL path when mobile debugging is painful.
- UI flow: `MainActivity` collects two matrices (or generates random values), validates shapes, then calls into JNI for `add`, `sub`, `matmul`, or `transpose`. Results (or errors) are shown in `ResultActivity`.
- JNI bridge: The app loads the native library `openclandroidexample` and exposes native calls like `processCommand(...)`, `dumpOpenCLDevices()`, and `getGpuName()`.
- Native/OpenCL: The native code executes the chosen operation on the GPU (when available) using OpenCL kernels, then returns a printable matrix string back to the UI.
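As a rough illustration of the native side of this flow, the sketch below maps an operation name to an enum and renders a result matrix as text. The README does not show the real signatures, so `Op`, `Matrix`, `parseOp`, and `toPrintable` are hypothetical names, and the output layout (one row per line, space-separated values) is an assumption.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types; the repo's actual JNI/native signatures may differ.
enum class Op { Add, Sub, MatMul, Transpose };

struct Matrix {
    int rows = 0;
    int cols = 0;
    std::vector<float> data;  // row-major, rows * cols elements
};

// Map the operation name sent by the UI to an enum; unknown names yield nullopt.
std::optional<Op> parseOp(const std::string& name) {
    if (name == "add") return Op::Add;
    if (name == "sub") return Op::Sub;
    if (name == "matmul") return Op::MatMul;
    if (name == "transpose") return Op::Transpose;
    return std::nullopt;
}

// Render a matrix as text: one row per line, values separated by single spaces.
std::string toPrintable(const Matrix& m) {
    std::ostringstream out;
    for (int r = 0; r < m.rows; ++r) {
        for (int c = 0; c < m.cols; ++c) {
            if (c > 0) out << ' ';
            out << m.data[r * m.cols + c];
        }
        out << '\n';
    }
    return out.str();
}
```

Rejecting unknown operation names before any OpenCL work starts keeps error reporting on the UI side simple.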
Tip: Use the "GPU Info" button to dump platforms/devices detected by the device's OpenCL ICD.
- Enter rows/cols for Matrix A and Matrix B. Leave values blank to auto‑generate random matrices.
- Tap one of the operations (Add/Sub/MatMul/Transpose). For `transpose`, only Matrix A is used.
- View the result or error details on the next screen.
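The shape rules implied by these steps can be written down as a small check. This is a sketch under the usual linear-algebra conventions (element-wise ops need equal shapes, `matmul` needs A's column count to equal B's row count); `Shape`, `ShapeCheck`, and `checkShapes` are hypothetical names, and `Op` is defined locally so the snippet stands alone.

```cpp
#include <optional>
#include <string>

enum class Op { Add, Sub, MatMul, Transpose };  // hypothetical, for illustration

struct Shape { int rows; int cols; };

// Either a result shape, or an error message explaining the mismatch.
struct ShapeCheck {
    std::optional<Shape> result;
    std::string error;
};

ShapeCheck checkShapes(Op op, Shape a, Shape b) {
    if (a.rows <= 0 || a.cols <= 0)
        return {std::nullopt, "Matrix A must have positive dimensions"};
    if (op == Op::Transpose)  // only Matrix A is used
        return {Shape{a.cols, a.rows}, ""};
    if (b.rows <= 0 || b.cols <= 0)
        return {std::nullopt, "Matrix B must have positive dimensions"};
    if (op == Op::MatMul) {
        if (a.cols != b.rows)
            return {std::nullopt, "matmul needs A.cols == B.rows (got " +
                                  std::to_string(a.cols) + " and " +
                                  std::to_string(b.rows) + ")"};
        return {Shape{a.rows, b.cols}, ""};
    }
    // Add and Sub are element-wise and need identical shapes.
    if (a.rows != b.rows || a.cols != b.cols)
        return {std::nullopt, "add/sub need matrices of identical shape"};
    return {Shape{a.rows, a.cols}, ""};
}
```

For example, `checkShapes(Op::MatMul, {2, 3}, {3, 4})` yields a 2x4 result shape, while `checkShapes(Op::Add, {2, 3}, {3, 2})` returns an error message.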
- If the device has no OpenCL ICD or the vendor library is missing, the app may fall back or fail during device/context creation.
- Ensure the target ABI matches your device (`arm64-v8a` for most modern phones).
- Some vendors restrict OpenCL features; try smaller matrices first and verify with the desktop CLI.
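When checking small matrices, a plain CPU reference can serve as the expected value for the GPU result. The sketch below is a naive triple loop, not code from this repo; `matmulReference`, `nearlyEqual`, and the default tolerances are assumptions chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive row-major matmul on the CPU: C (m x n) = A (m x k) * B (k x n).
std::vector<float> matmulReference(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   int m, int k, int n) {
    std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
    for (int i = 0; i < m; ++i) {
        for (int p = 0; p < k; ++p) {
            const float aip = a[i * k + p];
            for (int j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    return c;
}

// True when sizes match and every element differs by at most
// absTol + relTol * |expected|. The tolerances are illustrative defaults.
bool nearlyEqual(const std::vector<float>& expected,
                 const std::vector<float>& actual,
                 float absTol = 1e-4f, float relTol = 1e-4f) {
    if (expected.size() != actual.size()) return false;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const float limit = absTol + relTol * std::fabs(expected[i]);
        if (std::fabs(expected[i] - actual[i]) > limit) return false;
    }
    return true;
}
```

A tolerance-based comparison is used because GPU and CPU floating-point results can differ slightly in rounding and accumulation order.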
- Keep a single context, device, and command queue for the app lifetime instead of reinitializing them for every operation.
- Create and cache kernels/programs once (or lazily on first use), then reuse.
- Precompile kernels (where supported) and cache program binaries with `clGetProgramInfo(CL_PROGRAM_BINARIES)`; reload with `clCreateProgramWithBinary` to avoid rebuilds.
- Maintain a simple kernel registry keyed by op and matrix shape characteristics.
- Reuse device buffers (memory pools) to cut allocations and reduce GC/driver overhead.
- Experiment with local work‑group sizes and tiling; choose per‑device defaults via `clGetDeviceInfo`.
- Use event profiling (`CL_QUEUE_PROFILING_ENABLE`) to measure enqueue, transfer, and kernel times.
- Add shape checks and clearer error messages at the JNI boundary.
- Provide sample presets and a quick benchmark mode in the UI.
- Possibly integrate a BLAS library for CPU matmul as a baseline to compare against the OpenCL path.
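The kernel registry and lazy-creation ideas in this list could look roughly like the following. It is a generic sketch with no OpenCL calls: `KernelKey` and `KernelRegistry` are hypothetical names, `Handle` stands in for whatever the build step produces (in real code, something like a `cl_kernel`), and using a tile size as the "shape characteristic" is an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical key: operation name plus a coarse shape trait (here, a tile size).
struct KernelKey {
    std::string op;
    int tile;
    bool operator<(const KernelKey& o) const {
        return std::tie(op, tile) < std::tie(o.op, o.tile);
    }
};

// Lazy cache: the builder runs once per distinct key, later lookups reuse the
// stored handle. Not thread-safe; guard with a mutex if shared across threads.
template <typename Handle>
class KernelRegistry {
public:
    using Builder = std::function<Handle(const KernelKey&)>;

    explicit KernelRegistry(Builder builder) : builder_(std::move(builder)) {}

    const Handle& get(const KernelKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, builder_(key)).first;  // build on first use
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Builder builder_;
    std::map<KernelKey, Handle> cache_;
};
```

In an OpenCL build, the builder callback would compile or load the program and create the kernel; the registry then ensures that work happens at most once per key.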