-
Notifications
You must be signed in to change notification settings - Fork 13.9k
RVV1.0 Supported tests for RISC-V #16682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@alitariq4589 Any progress on this? Also, it looks like the runner has gone offline... |
|
@CISC There was a minor internet outage. It is fixed now. Can you check and let me know if the board still shows offline? If that is the case, I will restart the container.
I had ordered and integrated 9 RISC-V boards with RVV1.0 so all tests can run smoothly, I am currently testing the ccache by running multiple tests if it works. Here is the build. As soon as I am done testing, I will open the PR for review. Additionally I will also need a single runner token for adding all those boards in llama.cpp repository once I open the PR for review. It would be great if we have some faster means of communication other than issues and emails. Do you use some other messaging platform (discord, mastodon etc.?). If it is okay, you can also join this discord server. |
I cancelled all the old jobs, but there are currently 2 new ones queued and not picked up yet.
Great, ping Georgi when you do.
Sorry, email only. |
|
@CISC I have restarted the runner and it is picking up the jobs again |
Corrections included: 1. Changed the test names from debian to ubuntu as it is more stable than Debian Trixie 2. Added explicit compiler in cmake command as GCC compiler below version 14 have been recorded to throw errors with rvv1.0 and some other extensions 3. Added dependencies which are not installed by default in the RISC-V Ubuntu 24.04 4. Separate ccache directory for all jobs as all the ccache results are not the same and may cause ccache to not work
|
@CISC This PR is ready for review. I have excluded The result of the added builds can be seen here in my fork. There are multiple attempts, so you can check each of them. As the number of boards integrated is greater than the number of tests ported for RISC-V (because there may be some builds in the queue to, these boards will offload some time), the ccache effect is not immediated visible in 4 attempts, but I have tested individually and so far, according to the stats, the ccache seems to be working (check this and this job builds too for ccache results in which I executed each job multiple times for checking ccache results). Following RISC-V boards will be integrated once I get the token.
For checking the resource utilization, I have set up grafana to track usage. This tracking site tracks the usage of the host machine and not the containers in which the builds will be running. Use the following links to view resource usage.
NOTE: Since our network engineer is out of the office for the next couple of days, @ggerganov, please share a github runner token when you can, and I will use that to register all these boards for builds. The token will be valid for one hour after generation, so let me know as soon as you generate it. You can send it to me at my email. Let me know if anyone has any questions 🙂 |
ggerganov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alitariq4589 Sending you the token in a min
ci/run.sh
Outdated
| <<<<<<< HEAD | ||
| CMAKE_EXTRA="-DLLAMA_FATAL_WARNINGS=${LLAMA_FATAL_WARNINGS:-ON} -DLLAMA_CURL=ON" | ||
| ======= | ||
| CMAKE_EXTRA="-DLLAMA_FATAL_WARNINGS=ON -DLLAMA_CURL=ON -DGGML_SCHED_NO_REALLOC=ON" | ||
| >>>>>>> master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check.
BTW, the background of this change is that, since there is a warning (in the very first comment), tests were failing so I had to turn the "warnings as errors" off for CI to pass. I think we also need to create an issue and ping the contributor for this change. Maybe the RVV1.0 intrinsics change somewhere which is causing this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xctan Could you take a look at this compile warning?
In file included from ../../../ggml/src/ggml-cpu/arch/riscv/quants.c:6:
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp16_to_fp32':
../../../ggml/src/ggml-cpu/simd-mappings.h:101:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
101 | _Float16 hf;
| ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp32_to_fp16':
../../../ggml/src/ggml-cpu/simd-mappings.h:108:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
108 | _Float16 hf = (_Float16)f;
| ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h:108:24: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
108 | _Float16 hf = (_Float16)f;
| ^~~~~~~~
cc1: all warnings being treated as errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggerganov strangely I dont see conflicts after merging. I dont see conflicts here on the github ui too. Are you testing this on older commit? I just merged master branch to my branch and it seems okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have committed the merge conflict text, it just needs cleaning up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _Float16 type seems to be the only way to get the compiler's built-in code generation to work for Zfh and Zvfh extensions. The catch is, this type is only available starting with the ISO C C23 standard. Otherwise, we'd have to resort to inline assembly, which isn't ideal for register usage. Plus, vector intrinsic functions also need this type, and float16_t is exclusively defined in C++ headers (since C++23). So, given all that, maybe we can just disable this diagnostic when _Float16 types are being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CISC My bad. I have cleaned it up now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, given all that, maybe we can just disable this diagnostic when _Float16 types are being used?
Sounds good
|
@CISC the @ggerganov Thanks for sharing the token. Can you please confirm the following added runners in the repository settings as online? jupiter-16G-1 |
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
You tell me. :) If it is of no further value to you it can be removed. |
Since it was putting extra load on the runners, I have removed it. Everything is already covered in the newly added tests. |
|
Lots of whitespace errors: |
I had added those for verbosity, but I have now removed them. Can you please cancel other CI runs which passed in previous commits? Will save us a lot of time |
It's not the linebreaks that's the issue, it's the indentation spaces. CIs will autocancel when you push a new commit, however we have a general congestion atm... |
0c9f0a0 to
a2a24c6
Compare
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
I see. I have removed the trailing spaces now. I also ran the editor-config locally to check and seems good to go. |
Will merge once EditorConfig and your CIs finish. |
|
Hmmm, it seems no tests are actually run? Edit: Ah, you disable building tests. :) |
|
I think |
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
That is expected right now because there are a total of 9 boards, and currently, there is no ccache recorded. For ccache to start hitting, every job should have been run at least once on each board. There are currently 6 jobs. If i am not wrong, the probability that the job in the next run will pick the same runner is 1/9. We need around (approx.) 50 runs to start seeing the ccache results because job picks up runner randomly out of 9 (pardon my probability knowledge if I'm mistaken somewhere 🥲) |
|
@CISC You can see the llguidance took 10 minutes this time |
|
Looks like git-lfs is not inited. Also, |
|
Looks good now, will merge once the rest are done, |

This PR adds tests supported for RISC-V
Tests which are added for execution with RISC-V
debian-cpu-cmake-rv64-nativedebian-trixie-cmake-sanitizer-riscv64-native* 3debian-trixie-llguidance-riscv64-nativedebian-trixie-cmake-rpc-riscv64-nativeggml-ci-riscv64-native-cpu-low-perfDependencies which are not yet supported for RISC-V:
rocblas-devhipblas-devvulkan-sdkmthreads/musa:rc4.3.0-devel-ubuntu22.04-amd64intel-oneapi-compiler-dpcpp-cppintel-oneapi-mkl-develcudamacOS*windows*torchTests which are not added for RISC-V due to above unmet dependencies:
macOS-latest-cmake-arm64macOS-latest-cmake-x64macOS-latest-cmake-arm64-webgpuubuntu-24-cmake-vulkanubuntu-22-cmake-webgpuubuntu-22-cmake-hipubuntu-22-cmake-musaubuntu-22-cmake-syclubuntu-22-cmake-sycl-fp16macOS-latest-cmake-iosmacOS-latest-cmake-tvosmacOS-latest-cmake-visionosmacOS-latest-swiftwindows-msys2windows-latest-cmakeubuntu-latest-cmake-cudawindows-2022-cmake-cudawindows-latest-cmake-syclwindows-latest-cmake-hipios-xcode-buildandroid-buildopenEuler-latest-cmake-cannggml-ci-x64-nvidia-cudaggml-ci-x64-nvidia-vulkan-cmggml-ci-x64-nvidia-vulkan-cm2ggml-ci-x64-cpu-amxggml-ci-mac-metalggml-ci-mac-vulkanggml-ci-arm64-cpu-high-perf-sveggml-ci-riscv64-native-cpu-high-perfAdditional Notes
Note 1
Due to a warning (treated as error) related to RISC-V simd mappings
-DLLAMA_FATAL_WARNINGS=ONhas to be turned off for all the tests otherwise CI fails(This can be created a separated issue. I can track the contributor and ping him if you want)
Note 2
One RISC-V board may not be optimal for running all these tests so by the end of RISC-V summit North America, we are expecting more boards to arrive (around mid of November). So this PR can be treated as draft for reviews till then.