Hi MLCommons Tiny folks,
I wanted to share a small but unusual MCU language-runtime experiment and ask whether systems like this suggest a benchmark gap in the current Tiny landscape.
We built a public demo line called Engram and deployed it on a commodity ESP32-C3.
Current public numbers:
- Host-side benchmark capability
  - LogiQA = 0.392523
  - IFEval = 0.780037
- Published board proof
  - LogiQA 642: 249 / 642 = 0.3878504672897196
  - host_full_match = 642 / 642
  - runtime artifact size = 1,380,771 bytes
Important scope note:
This is not presented as unrestricted, open-input, native LLM generation on an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
- packed token weights
- hashed lookup structures
- fixed compiled probe batches
- streaming fold / checksum style execution over precompiled structures
So this is not a standard vision/KWS/anomaly micro model. It is closer to a task-specialized language runtime whose behavior has been pushed into a very compact executable form.
Repo:
https://github.com/Alpha-Guardian/Engram
What I’m genuinely curious about is whether systems like this point to a missing benchmark category in the TinyML / MCU benchmark ecosystem.
Would something like the following make sense as a future benchmark direction?
- constrained language-task execution
- auditable board-measured language behavior
- fixed-memory / fixed-artifact board deployment
- explicit separation between host benchmark capability and board execution mode
If people here think this is out of scope for MLCommons Tiny, that would also be useful to know.