[Do Not Merge] model : LFM2.5-Audio-1.5B #18641
Conversation
Force-pushed from c275436 to e1a8fd1
If the string
Or that. We just have to remember to remove them all from the merge message. :)
… tarek/feat/os-lfm2.5-audio-1.5b-upstream [no ci]
Change is decoupled from #18641. [LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) needs a streaming ISTFT for generating output audio.
* add a streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace the global audio cache with a per-instance cache; the model requires two independent caches, one for preprocessing (audio input) and one for the ISTFT (audio output)
* unified templated FFT/IFFT implementation supporting both forward and inverse transforms
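For readers unfamiliar with the overlap-add step, below is a minimal, hypothetical sketch of the idea in C++. It is not the PR's `mtmd_audio_streaming_istft` class; it assumes each inverse-FFT frame has already been multiplied by the synthesis window, and it omits window-sum normalization for brevity.

```cpp
// Minimal overlap-add sketch for a streaming ISTFT (illustrative only).
#include <algorithm>
#include <cstddef>
#include <vector>

struct streaming_overlap_add {
    size_t n_fft;              // frame length produced by the inverse FFT
    size_t hop;                // hop size between consecutive frames
    std::vector<float> tail;   // overlap carried over between frames

    streaming_overlap_add(size_t n_fft, size_t hop)
        : n_fft(n_fft), hop(hop), tail(n_fft, 0.0f) {}

    // Feed one windowed time-domain frame; emit `hop` finished samples.
    std::vector<float> push(const std::vector<float> & frame) {
        for (size_t i = 0; i < n_fft; ++i) {
            tail[i] += frame[i];
        }
        std::vector<float> out(tail.begin(), tail.begin() + hop);
        // shift the buffer left by `hop` and zero the freed region
        std::move(tail.begin() + hop, tail.end(), tail.begin());
        std::fill(tail.end() - hop, tail.end(), 0.0f);
        return out;
    }
};
```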
Hello Tarek, I am trying to build your WIP PR. With the last commit, 'Read n_layer from gguf', using LTO, the build fails at the very end: llama-server and llama-liquid-audio-server are successfully built, but the CLI fails. If there is anything I can do to help with testing, let me know. Thank you so much.
@elfarolab , the mentioned commit didn't change anything related to compilation or LTO; could it be that there are stale object files somewhere? Tested that the clean build in
UPD: it's related to the miniaudio CLI implementation defines here https://github.com/ggml-org/llama.cpp/pull/18641/changes#diff-73f13371b37801825dc2cdbfacadf9af40aef9dca4770d9dacbbe3534c7a7dacR13 ; another implementation is defined in the mtmd audio code. Try commenting out this line.
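For context, single-header libraries like miniaudio expect their implementation macro to be defined in exactly one translation unit; defining it in two places yields duplicate symbols, which LTO surfaces at link time. A hypothetical illustration (not the PR's actual file layout):

```cpp
// --- miniaudio_impl.cpp: the one and only implementation TU (hypothetical file name) ---
#define MINIAUDIO_IMPLEMENTATION
#include "miniaudio.h"

// --- every other .cpp that needs the API includes only the header ---
// #include "miniaudio.h"   // no MINIAUDIO_IMPLEMENTATION here
```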
Before building I delete the build destination directory every time. I always build llama.cpp the same way with the options above and never get failures.
@elfarolab , it should work now; there were two implementations of miniaudio.
rebuilding
Built successfully. That `CUDA error: out of memory` is weird, I am not running anything. I will reboot just in case.
With a fresh system and nothing GPU- or CPU-intensive running: free -h, loading tensors, then free -h again; same error:
The audio file is < 30 sec in duration. I know 24 kHz looks strange, but it is because it is later used with libopus.
@elfarolab can you try with a smaller audio file (e.g., 4 seconds)? Add
TEST Q4_0 with a freshly rebooted system, nothing GPU- or CPU-intensive running (tegrastats). Same error: it looks like the model tries to expand into the whole available memory and beyond. Full Log
@elfarolab , I was able to reproduce the issue on Orin Nano; I will look into it.
@elfarolab a temporary solution will be to build without VMM.
ok, rebuilding now with `-DGGML_CUDA_NO_VMM=ON`
The alternative is to build with VMM, but reduce the VMM pool size to 16 GB with:
```diff
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -418,7 +418,7 @@ struct ggml_cuda_pool_leg : public ggml_cuda_pool {
 // pool with virtual memory
 #if defined(GGML_USE_VMM)
 struct ggml_cuda_pool_vmm : public ggml_cuda_pool {
-    static const size_t CUDA_POOL_VMM_MAX_SIZE = 1ull << 35; // 32 GB
+    static const size_t CUDA_POOL_VMM_MAX_SIZE = 1ull << 34; // 16 GB
     int device;
     CUdeviceptr pool_addr = 0;
```
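As a sanity check of the sizes in the comments (an illustrative snippet, not part of the patch): `1ull << 35` bytes is 32 GiB and `1ull << 34` bytes is 16 GiB.

```cpp
// Illustrative check of the shift arithmetic used above (not part of the patch).
static_assert((1ull << 34) == 16ull * 1024 * 1024 * 1024, "1ull << 34 is 16 GiB");
static_assert((1ull << 35) == 32ull * 1024 * 1024 * 1024, "1ull << 35 is 32 GiB");
```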
I found this discussion about VMM on Orin:
On Orin, and on the whole Jetson series and DGX Spark, memory is unified. Thank you!
Orin Nano reports that lfm2.5-audio creates 4 CUDA contexts and 4 CUDA pools; with `CUDA_POOL_VMM_MAX_SIZE` = 16 GB they fit, with `CUDA_POOL_VMM_MAX_SIZE` = 32 GB they don't.
For 4070
For Orin
It works with `-DGGML_CUDA_NO_VMM=ON`.
Do you suggest using `-DGGML_CUDA_NO_VMM=ON`, or rather reducing the VMM memory size with `CUDA_POOL_VMM_MAX_SIZE` = 16 GB?
I suggest trying both and picking the option that works faster.
I can confirm that ASR with llama-liquid-audio-server also works as expected, in streaming mode only. Do you plan to integrate your changes into llama-server once everything is working? Thank you so much.
@elfarolab , thanks for testing!
Note that ASR works with the existing llama-cli from the main branch.
We are working on it.
Liquid AI released LFM2.5-Audio-1.5B.
This PR is intended to provide a functional implementation in llama.cpp until the necessary infrastructure is implemented. The plan is to split it up and merge it into upstream in smaller chunks, while keeping and tracking the functional implementation here. It will be rebased from time to time.
GGUFs, precompiled runners, and instructions live in https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF.
Merge plan:
* n_embd_out
* model : add LFM2-ColBert-350M #18607

Demo of capabilities (watch with audio on)
demo.mp4
Thank you, @ngxson for the help!