Inspiration from llama.cpp: Implementing a 4.3B Tokens/s Stateless Data Engine #22586
Anh-Khoa-PC started this conversation in General
Replies: 0 comments
Hi Georgi (@ggerganov),
I'm Khoa, a 15-year-old developer from Vietnam. Inspired by your work on making AI accessible on consumer hardware, I've developed Vantage V-AI, a stateless data engine that achieves 4.3 billion tokens/s on a standard 12-thread CPU with a minimal RAM footprint (< 2 MB).
Like your projects, Vantage focuses on CPU optimization and memory efficiency using LZ4-stream and zero-copy batching. I’d love to get your thoughts on the architecture or any potential integration with high-performance C++ workflows.
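For anyone unfamiliar with the term, here is a minimal sketch of what zero-copy batching can look like in C++17. It is only an illustration under assumptions: the helper name `make_batches` and the byte-based batch size are hypothetical and not taken from the Vantage repo, and batches are cut by bytes rather than by tokens.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the Vantage repo): split `buffer` into
// batches of at most `batch_size` bytes without copying any data.
// Each std::string_view only points into `buffer`, so `buffer` must
// outlive the returned views.
std::vector<std::string_view> make_batches(std::string_view buffer,
                                           std::size_t batch_size) {
    std::vector<std::string_view> batches;
    if (batch_size == 0) {
        return batches;  // a zero batch size would never advance
    }
    batches.reserve((buffer.size() + batch_size - 1) / batch_size);
    for (std::size_t off = 0; off < buffer.size(); off += batch_size) {
        // substr on a string_view is O(1) and clamps the length at the end.
        batches.push_back(buffer.substr(off, batch_size));
    }
    return batches;
}
```

Calling `make_batches(data, 4096)` on an input buffer yields views that alias the original memory, so the only allocation is the small vector of (pointer, length) pairs.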
Repo: https://github.com/Anh-Khoa-PC/VANTAGE-V-AI
Keep up the amazing work with llama.cpp!