Middle Ground between dual and dual - turbo? #33

JusefPol · 2026-05-02T19:51:53Z

Just noticed that dual uses plain fp8 and dual-turbo uses TurboQuant 3-bit. Have you already tested, or do you plan to try, a version with turboquant_k8v4? It might give just enough room without any real penalty on tool calling and quality.

noonghunna · 2026-05-02T20:19:42Z

Maintainer

Good question and the intuition is sound — k8v4 (8-bit K, 4-bit V, ~6 bits avg/token) genuinely sits in the middle ground you're describing. We have data on it from earlier rounds, and an explicit decision history worth surfacing.

What we measured

From BENCHMARKS.md (TP=2 dual-card, full 262K, mem-util 0.85, max-num-seqs=4):

| Variant | KV pool tokens | Concurrency at 262K | Narr TPS | Code TPS | VRAM/card |
|---|---|---|---|---|---|
| `dual.yml` (fp8_e5m2) | 168,000 | 2.36× | 71 | 89 | 22.4 GB |
| `dual-turbo.yml` k8v4 (was) | 764,544 | 2.59× | 60 | 77 | 23.67 GB |
| `dual-turbo.yml` TQ3 (now) | 1,498,464 | 4.59× | 58 | 69 | 24.09 GB |

So vs fp8: k8v4 buys you about 4.5× the KV pool and +10% concurrency at the cost of roughly −15% narrative and −13% code per-stream TPS. vs TQ3: k8v4 keeps +3% narr / +12% code per-stream but has about half the KV pool and ~44% less concurrency. Quality-wise, 8-bit K is close to fp16; 3-bit nc has measurable cascade-risk on long-context decoding (which is why we bias toward fp8 for IDE-agent workloads even though TQ3 has more pool capacity).
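
For anyone who wants to re-derive these deltas, here is a small shell sketch that recomputes them from the table figures. The helper names `pct_change` and `ratio` are made up for illustration and are not part of the repo's scripts.

```shell
# pct_change BASE NEW: signed percentage change of NEW relative to BASE.
pct_change() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}

# ratio BASE NEW: NEW divided by BASE, two decimals.
ratio() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2fx\n", new / base }'
}

# k8v4 relative to fp8 (figures copied from the table)
echo "KV pool:      $(ratio 168000 764544)"
echo "concurrency:  $(pct_change 2.36 2.59)"
echo "narr TPS:     $(pct_change 71 60)"
echo "code TPS:     $(pct_change 89 77)"

# k8v4 relative to TQ3
echo "KV pool:      $(ratio 1498464 764544)"
echo "concurrency:  $(pct_change 4.59 2.59)"
echo "narr TPS:     $(pct_change 58 60)"
echo "code TPS:     $(pct_change 69 77)"
```

Plugging in re-benched numbers later only means changing the arguments, so the same comparison can be repeated on the current substrate.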

Why we swapped k8v4 → TQ3 on 2026-04-28

dual-turbo.yml shipped k8v4 in the predecessor qwen36-dual-3090 repo, then we aligned it to TQ3 when consolidating into club-3090. The reasoning at the time: we wanted dual-turbo to be unambiguously the "max concurrency at full ctx" variant, and TQ3 wins that workload by ~2×. Conclusion in BENCHMARKS.md: "k8v4 doesn't have a clear niche between fp8 and 3bit_nc."

That conclusion was framed around raw TPS vs concurrency. Your question reframes it around quality preservation — which is a legitimately different axis we didn't optimize for explicitly.

What's changed since that decision

Two things, neither of them re-tested on dual-card yet:

Sandermage 2026-04-29 unlock: TurboQuant k8v4 was unlocked on hybrid GDN via P4 + P98 — measured +1.9% on Sander's A5000. We noted "we'll bench on dual" but never followed through.
Genesis v7.69 substrate (2026-05-02 PM): PN30 part3 + P103 worker self-install + PN32 chunked-prefill. None of these target k8v4 specifically but the broader stability fixes apply.

So the 60/77 numbers above are from the v7.51-stable era. A re-bench against current substrate would likely move them — possibly meaningfully.

What I'd propose

Ship a third dual variant: docker-compose.dual-k8v4.yml. Same shape as dual-turbo.yml (Genesis env-vars active, MTP n=3, vision on) but with --kv-cache-dtype turboquant_k8v4. Pick a target between TQ3's 4-stream concurrency and fp8's 2-stream — probably 3 streams at full 262K, which fits comfortably in the 764K KV pool budget.

Concrete plan:

Author the compose file off dual-turbo.yml's pattern
Re-bench narrative + code TPS, AL, recall ladder on current substrate
Run a tool-call quality A/B vs fp8 (canonical bench prompts + a couple of multi-turn IDE-agent shapes)
If the numbers justify it (within ~5% per-stream TPS of fp8 + ~1.5× concurrency), ship it as the "balanced quality + concurrency" middle option
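
The ship criterion in the last step could be checked mechanically. This is a minimal sketch, assuming the "~1.5× concurrency" target is measured against fp8's concurrency (the plan does not say so explicitly). `ship_gate` is a hypothetical helper, not an existing script.

```shell
# ship_gate FP8_TPS CAND_TPS FP8_CONC CAND_CONC
# Prints "ship" when the candidate's per-stream TPS is within 5% of fp8
# AND its concurrency is at least 1.5x fp8's (assumed baseline); else "hold".
ship_gate() {
  awk -v ft="$1" -v ct="$2" -v fc="$3" -v cc="$4" 'BEGIN {
    tps_ok  = (ct >= ft * 0.95)
    conc_ok = (cc >= fc * 1.5)
    verdict = (tps_ok && conc_ok) ? "ship" : "hold"
    print verdict
  }'
}

# v7.51-era k8v4 narrative figures from the table: fp8 71 TPS / 2.36x, k8v4 60 TPS / 2.59x
ship_gate 71 60 2.36 2.59
```

With those older figures the gate prints `hold`, because 60 TPS is more than 5% below fp8's 71. Whether a re-bench clears it is exactly what the plan is meant to find out.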

Cross-rig opportunity

Saw your PR #31 for the NVLink variant — once we have a dual-k8v4.yml to test against, would you want to bench it on your NVLink rig? Cross-rig data would clarify whether the NVLink path changes the per-stream TPS calculus enough to flip the decision differently than PCIe.

I'll get the compose drafted + benched against current substrate over the next day or two and update this thread with numbers. If the middle-ground holds up under measurement, it ships.

JusefPol · 2026-05-03T12:44:12Z

Author

Yes, I'd love to, as I mentioned in the other discussion. I am really interested in pushing parallelism, so as soon as you release a working variant with almost no quality degradation, I will test it in both default and high-concurrency modes.

noonghunna · 2026-05-03T12:51:49Z

Maintainer

Logical next variant. Sequence makes sense:

Now — your existing 4 streams × 142K + NVLink config (TQ3 or fp8) is the documented baseline. Numbers in #29.
Next — vllm/dual-nvlink-k8v4 as a sibling: same NVLink env (P2P_LEVEL=NVL, custom-AR enabled, max_split_size_mb cap), but with --kv-cache-dtype turboquant_k8v4 and the matching Genesis env. ~0.375 KB/token vs TQ3's ~0.25 — gives you ~25% less pool than TQ3 but K stays at 8-bit which is the bit that matters for tool-call attention adherence.
Stretch — push concurrency to 6 streams at 96-100K with TQ3 + NVLink, see if 6 × 100 tok/sec aggregate beats 4 × 130 tok/sec on real multi-agent traffic.

Math + operating-point discussion is in the Discussion #29 reply just now (link) so I won't duplicate it here. Short version: the k8v4 sibling is the cleanest next thing to ship; the 6-stream stretch needs ctx trimming to fit the pool.

I won't build ahead of test signal — your 4×142K + NVLink config is one cross-rig data point, and shipping a k8v4 sibling without anyone running it would be premature commitment. When you've had a chance to run dual-nvlink in real opencode/hermes traffic for a week or two and have stability data, that's the right moment to spec the k8v4 variant — your real workload tells us where the useful operating point actually is, vs where the math says it could theoretically be.

JusefPol · 2026-05-04T17:31:40Z

Author

Here are my first results with the turboquant_k8v4 variant, using a context window of 131072 (128K) and a concurrency of 5. I had to raise GPU memory utilization from the dual-turbo variant's 0.85 to 0.9.

These results were taken with the NVLink modifications active.
One thing I noticed but can't interpret: during some of the tests (I don't know which ones), the logs showed this warning:

[17:20:34] /project/cpp/grammar_matcher.cc:497: Warning: The matcher has terminated after accepting the stop token, but is trying to accept new token with id 198.

One of the tests failed, but nothing in the docker compose logs indicates what happened.

my-rig-tqk8v4.md

It's too soon to open a PR for this. Let me know if you see anything odd in the results; meanwhile I will run it for a while on real use-case data.

noonghunna · 2026-05-04T17:39:00Z

Maintainer

@JusefPol — interesting middle-ground attempt. Quick read on the two anomalies you flagged, plus what I'd want to see before this becomes a shippable variant.

On the `grammar_matcher.cc:497` warning

That specific warning is benign, ignore it:

The matcher has terminated after accepting the stop token, but is trying to accept new token with id 198.

xgrammar's matcher tracks state across token emissions; once the stop token is accepted, the matcher transitions to "terminated" state. Token 198 is \n in the Qwen3 tokenizer — what's happening is the engine is emitting a final newline (or the spec-decode draft batch contained one) right after the model emitted its stop token. The matcher logs the warning because in principle no more tokens should be coming; in practice this is harmless — the engine drops the trailing token and finishes the request normally.

This is NOT the same class as the silent-empty grammar reject we're tracking in #43 / #47 / #50 — that one fires WARNING: Unexpected: grammar rejected tokens [60, 248046] (different log line, different mechanism: every candidate gets masked and the model emits zero tokens). Yours is the post-stop "extra token arrived after termination" path.

On the unspecified test failure

That's the more concerning signal and I can't diagnose without more data. The [my-rig-tqk8v4.md] attachment didn't include a docker-logs snippet narrow enough to pin which probe failed. Could be:

| Failure class | Symptom | What to check |
|---|---|---|
| Silent-empty (#43, #47, #50 class) | HTTP 200 + 0 completion tokens, model "thought" then output nothing | `docker logs ... 2>&1 \| grep -iE "grammar.*reject"` should be empty if not this |
| OOM during prefill at 131K ctx | HTTP 500 with allocator trace | `grep -iE "out of memory\|cudaMalloc"` |
| Cliff 2-class GDN forward OOM | HTTP 500, traces through `chunk_gated_delta_rule_fwd` | Probe 7 of verify-stress at 60K-90K |
| Concurrency 5 contention | One stream's HTTP error during multi-stream test | `grep "Engine.*0 reqs"` near the failure window |

If you can post the specific verify-stress probe number (1/7 through 7/7) or the failing test's name + the docker logs vllm-qwen36-27b-dual-turbo 2>&1 | tail -100 around the moment of failure, I can disambiguate. The "no specific log" report you got might just be that the failure is a soft-empty (HTTP 200 + 0 tokens) which doesn't surface as an error in the engine logs — exactly the gap commit f32d8a6 closes for soak-test specifically. Could you re-run with git pull to get the silent-empty detection?
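
The log checks from the failure-class table can be bundled into one pass over a saved log. This is a rough sketch: `classify_failure` and its class labels are illustrative names, not an existing script. It checks the GDN trace before the generic OOM pattern on the assumption that a GDN forward OOM would match both. The concurrency-contention pattern is left out because it is likely to match routine heartbeat lines and needs a manual look at the failure window.

```shell
# classify_failure: read engine log text on stdin, print one failure class.
# Patterns come from the "What to check" column; labels are illustrative.
classify_failure() {
  log="$(cat)"
  if printf '%s\n' "$log" | grep -iE 'grammar.*reject' >/dev/null; then
    echo "silent-empty"
  elif printf '%s\n' "$log" | grep 'chunk_gated_delta_rule_fwd' >/dev/null; then
    echo "cliff2-gdn-oom"
  elif printf '%s\n' "$log" | grep -iE 'out of memory|cudaMalloc' >/dev/null; then
    echo "oom-prefill"
  else
    echo "unclassified"
  fi
}
```

A possible invocation is `docker logs vllm-qwen36-27b-dual-turbo 2>&1 | tail -100 | classify_failure`. An `unclassified` result does not mean the run was healthy; a hang leaves none of these traces.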

On the K8V4 + 131K + concurrency=5 + NVLink config itself

Conceptually this is a sensible middle ground:

| Variant | KV format | Per-token KV bytes (after TP=2) | Concurrency at 262K | Activation budget |
|---|---|---|---|---|
| `dual.yml` | fp8_e5m2 | 1.0 | 2.47× | comfortable |
| Yours: K8V4 + 131K | k8v4 (8-bit K, 4-bit V) | 0.625 | ~5× at 131K | tighter |
| `dual-turbo.yml` | turboquant_3bit_nc | 0.375 | 4.67× at 262K | tightest |

K8V4 is the asymmetric pick from the lucebox-hub family (PR #56 lineage) — 8 bits on K because K is more sensitive to quantization noise than V, 4 bits on V because V tolerates aggressive quant well. Should give you better accuracy on long-ctx tasks than TQ3 with similar KV-pool capacity. So if it survives stress tests cleanly, it's worth shipping as a third variant.

What I'd want to see before merging it into the variant matrix:

verify-stress.sh 7/7 PASS at your 131K config — including 60K + 90K Cliff 2 needles. That's the gate other dual-* variants pass.
SOAK_MODE=continuous PASS at concurrency=5 — multi-turn at high concurrency is where Cliff 2b shows up (still pending the upstream issue we're tracking; even passing soak today is meaningful).
bench.sh numbers showing TPS isn't a regression vs dual-turbo.yml (NVLink active should help here, since PCIe-only dual-turbo bench is your reference).
The unspecified test failure resolved — whatever it was, we need to know what you're trading.

If you're willing to put together a full bash scripts/report.sh --full and post it as a numbers-from-your-rig issue, that'd give us the canonical data shape we use for cross-rig validation. Then we can discuss merging this into a dual-balanced.yml or similar third-rung variant.

The NVLink win you reported on the original dual.yml (the +57% TPS finding from your earlier numbers) means K8V4 + NVLink + concurrency=5 is a configuration we don't have a published number for anywhere on this stack — meaningful contribution if it pans out.

TL;DR

The grammar_matcher warning: benign, ignore.
The unspecified test failure: need more data — specifically which probe failed and docker logs around the failure window. Try git pull first; soak-helper now flags silent-empty automatically (f32d8a6).
The K8V4 + 131K + concurrency=5 + NVLink concept is sensible. Worth bringing to a numbers-from-your-rig issue with a full verify-stress + soak + bench. If it clears those gates, it's a real third variant the matrix doesn't cover today.

2 replies

JusefPol May 4, 2026
Author

You'll have to guide me on tracking down the stress-test failure, as it is not appearing anywhere. I ran bash scripts/verify-stress.sh specifically, since that is where the error appeared. It appeared again:

Running STRESS / boundary test against http://localhost:8011 (model=qwen3.6-27b-autoround, container=vllm-qwen36-27b)
This script does the heavy stuff (longctx needle ladder + ~25K-token tool prefill).
For the fast functional smoke (~2 min), use verify-full.sh instead.

[1/7] Long-context needle small rungs (10K / 30K) ...
✓ 9819 tokens: recalled 'amber iguana 54' (got: The hidden phrase is 'amber iguana 54'. )
✓ 29320 tokens: recalled 'emerald chinchilla 74' (got: emerald chinchilla 74 )
✓ all long-ctx depths recalled secret correctly
[2/7] Tool response prefill OOM (~25K-token mock tool response) ...
✓ tool prefill OK — model emitted 1 tool_call(s) (finish=tool_calls, prefill survived)
[3/7] IDE-agent one-shot prompt (sys + tool schemas + user request) ...
✓ IDE-agent one-shot OK — 109 completion tokens (195 chars), finish=stop
[4/7] Multi-turn agent prompt (sys + tools + 4-turn history) ...
✓ multi-turn agent OK
[5/7] LCB-coding shape (LeetCode-style problem + structured plan) ...
✓ LCB-coding shape OK
[6/7] Reasoning-heavy (math problem + max_tokens=8192) ...
✓ reasoning-heavy OK — 8192 completion tokens
[7/7] Long-context needle large rungs (60K / 90K — Cliff 2 territory) ...
✓ 58570 tokens: recalled 'crimson narwhal 67' (got: crimson narwhal 67 )
✗ scale=1400: HTTP 000 (request failed)
✗ partial recall — some in-budget depths failed
→ Attention quality degrades at longer contexts on this config OR the deployment crashed mid-test. Check docker logs.

The log trace is attached here:
failure-stress.md

Additional info: the log line at 05-04 17:34:34 is the FIRST test of this last stress run. The last two POST requests in the trace correspond to the long-context needle large-rungs test.
The first of the two happened at 05-04 17:36:34, took a bit less than 3 minutes, and was successful (the crimson narwhal 67).

The second test started at INFO 05-04 17:39:14. After 5-6 minutes no more lines appear in the log, and the test ends with the error. I'm not sure where else to look for more information about what happened.

noonghunna May 4, 2026
Maintainer

@JusefPol — diagnosed from your attached log. Two important observations:

What actually happened

It's not a crash, it's an engine hang during 90K prefill. The engine never crashed — it stayed alive with KV cache usage at 8.5% the whole time, but the request froze and never completed. The verify-stress harness eventually returned HTTP 000 because curl timed out client-side. That's why "no error in the docker log" — there was no error to log; the engine just stopped making progress.

The smoking gun is at 17:39:14 in your log:

INFO 17:39:14 prompt throughput: 4589.4 tok/s   ← prefill ramping
INFO 17:39:14 SpecDecoding: AL 4.00, Accepted: 6, Drafted: 6,
              Per-position rate: 1.000, 1.000, 1.000   ← degenerate state
INFO 17:39:24 prompt 0.0 / gen 0.0   ← frozen
... silence for 5-6 minutes ...
[verify-stress.sh] HTTP 000 (curl timeout)

Per-position rate: 1.000, 1.000, 1.000 is suspicious — every draft accepted, no rejection diversity. Combined with the engine going idle (0 prompt, 0 gen) but holding the request open with stable KV usage, this is a deadlock rather than a slow-but-progressing prefill.

Two candidates:

| Hypothesis | Mechanism |
|---|---|
| NCCL all-reduce deadlock on TP=2 during chunked prefill | At 90K with 8192 chunks, ~12 sequential allreduce rounds. NCCL has known deadlock paths on PCIe/NVLink topology mismatches when one rank's compute lags. NVLink-bonded rigs sometimes hit this on long-prefill + spec-decode combos because MTP draft adds an extra K+1 verify round. |
| DeltaNet GDN forward kernel hangs on K8V4 KV at ctx ~90K | K8V4 dequant (8-bit K + 4-bit V separate kernels) has different intermediate tensor shapes than uniform fp8 or TQ3. At 90K the GDN forward state buffer may exceed a memory-fence assumption and stall waiting on a copy that never completes. CLIFFS.md says 60K is the single-card wall, but K8V4-specific behavior at 90K on TP=2 hasn't been characterized anywhere. |

How to narrow it on your next run

You'll need to capture data at the moment of hang. Three probes, each ~5 min, pick whichever you want to start with:

Probe A — capture GPU state during the hang

When the 90K probe stalls (you'll see prompt 0.0 / gen 0.0 for 30+ seconds in docker logs), run from a separate terminal:

# Snapshot GPU state on both cards
nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu,temperature.gpu --format=csv,noheader

# List the vLLM worker processes to find a worker PID
# ("[V]LLM" keeps grep from matching its own command line)
docker exec vllm-qwen36-27b-dual-turbo bash -c 'ps aux | grep "[V]LLM"' | head -5

# Then dump the Python stack of one worker (substitute its PID for <WORKER_PID>).
# py-spy may need the container to have the SYS_PTRACE capability.
docker exec vllm-qwen36-27b-dual-turbo bash -c \
  'pip install py-spy >/dev/null 2>&1; py-spy dump --pid <WORKER_PID>' 2>&1 | head -60

The key signals:

Both GPUs at 100% util, near-full VRAM → all-reduce deadlock (cards spinning on a barrier)
One GPU at 100%, other at 0% → one rank stuck mid-kernel
Both at 0% util → CPU-side hang (less likely)
py-spy stack showing nccl_* calls or all_reduce → NCCL deadlock confirmed
py-spy showing chunk_gated_delta_rule_fwd → DeltaNet GDN hang on K8V4
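
The utilization signals above can be read straight off the `nvidia-smi` snapshot. This is a hedged sketch: `classify_gpu_state` is a hypothetical helper, and the 90% / 10% cutoffs are assumptions standing in for "100%" and "0%". It expects the CSV rows produced by the Probe A query (index, memory.used, memory.total, utilization.gpu, temperature.gpu).

```shell
# classify_gpu_state: read `nvidia-smi --format=csv,noheader` rows on stdin
# and map per-GPU utilization (4th column, e.g. "100 %") to a hang hypothesis.
classify_gpu_state() {
  awk -F', *' '
    BEGIN { busy = 0; idle = 0; n = 0 }
    {
      util = $4 + 0                 # "100 %" -> 100
      if (util >= 90) busy++        # assumed cutoff for "at 100%"
      else if (util <= 10) idle++   # assumed cutoff for "at 0%"
      n++
    }
    END {
      if (n < 2)                        print "need-two-gpu-rows"
      else if (busy == n)               print "allreduce-deadlock"
      else if (idle == n)               print "cpu-side-hang"
      else if (busy >= 1 && idle >= 1)  print "rank-stuck"
      else                              print "inconclusive"
    }'
}
```

Piping the Probe A `nvidia-smi` command into `classify_gpu_state` during the stall gives a first guess only; the py-spy stack is still what confirms NCCL versus GDN.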

Probe B — drop spec-decode, retry 90K probe

To isolate spec-decode-vs-prefill mechanism, run verify-stress.sh with MTP off:

In your override compose, replace:

- --speculative-config
- '{"method":"mtp","num_speculative_tokens":3}'

with nothing (just remove those two lines). Re-bench probe 7. If 90K passes without MTP, the bug is in MTP × K8V4 × long-ctx. If it still hangs, MTP is innocent and the issue is K8V4 × long-ctx itself.

Probe C — drop to fp8 KV, retry 90K probe

To isolate K8V4-vs-uniform-fp8 mechanism:

- --kv-cache-dtype
- fp8_e5m2

(replace turboquant_k8v4). Re-bench probe 7. If 90K passes on fp8, the bug is K8V4-specific at long ctx. If it still hangs, it's a generic 90K-on-TP=2-NVLink issue regardless of KV format.
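
The two compose edits for Probes B and C can also be applied with a filter instead of by hand. This is a sketch under the assumption that the compose `command:` list keeps each flag and its value on separate `- ` lines, as in the snippets above. `drop_mtp` and `use_fp8_kv` are hypothetical helper names.

```shell
# drop_mtp: remove the `- --speculative-config` item and the value line
# that follows it (Probe B). Reads a compose file on stdin.
drop_mtp() {
  awk '
    skip { skip = 0; next }                                   # drop the value line
    /^[ \t]*-[ \t]+--speculative-config[ \t]*$/ { skip = 1; next }
    { print }
  '
}

# use_fp8_kv: rewrite the value line after `- --kv-cache-dtype`
# to fp8_e5m2, keeping the indentation (Probe C).
use_fp8_kv() {
  awk '
    swap { sub(/-[ \t]+.*/, "- fp8_e5m2"); swap = 0 }
    /^[ \t]*-[ \t]+--kv-cache-dtype[ \t]*$/ { swap = 1 }
    { print }
  '
}
```

For example, `drop_mtp < my-override.yml > my-override.no-mtp.yml` (file names are placeholders) produces the Probe B variant without touching the original, which keeps the A/B comparison easy to undo.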

Why "no specific log" is correct here

verify-stress.sh's "Check docker logs" hint is generic — it assumes the failure leaves a trace. Hangs don't leave traces because the engine is still alive, it's just not making progress. The [loggers.py:271] heartbeat at 17:39:24 with prompt 0.0 / gen 0.0 / Running: 1 reqs IS the failure signal — but the script can't read engine state, only HTTP responses. We could improve verify-stress to detect "engine alive but not progressing" via the /metrics endpoint (KV cache usage + running requests), but that's a separate enhancement.

For now: a docker logs heartbeat showing prompt 0.0 / gen 0.0 with Running: 1 reqs for 30+ seconds = stuck. That's the rule of thumb.
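
That rule of thumb can be scripted against saved log output. This is a minimal sketch, assuming the full heartbeat line reads like `Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs` and that heartbeats arrive roughly every 10 seconds, so three in a row is about 30 seconds. Both are assumptions about the engine's log format, and `count_stalled` / `is_stuck` are hypothetical helpers.

```shell
# count_stalled: read engine log lines on stdin and print the longest run of
# consecutive heartbeats with zero prompt and generation throughput while at
# least one request is still running.
count_stalled() {
  awk '
    /prompt throughput:/ {
      stalled = ($0 ~ /prompt throughput: 0\.0 / &&
                 $0 ~ /generation throughput: 0\.0 / &&
                 $0 ~ /Running: [1-9][0-9]* reqs/)
      run = stalled ? run + 1 : 0
      if (run > best) best = run
    }
    END { print best + 0 }
  '
}

# is_stuck [N]: succeed when at least N (default 3) stalled heartbeats
# occur back to back in the log on stdin.
is_stuck() {
  [ "$(count_stalled)" -ge "${1:-3}" ]
}
```

Something like `docker logs --since 2m vllm-qwen36-27b-dual-turbo 2>&1 | is_stuck && echo stuck` would then flag the hang from a second terminal. An idle engine with `Running: 0 reqs` is deliberately not counted.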

Implications for shipping K8V4 as a third variant

Even if A/B/C narrows the cause, the bottom line is K8V4 at 90K on this rig hangs at least once in the verify-stress probe ladder. That doesn't disqualify K8V4 outright (it's already validated cross-rig at 60K with @efschu — the TQ3→fp8 swap rescue on 20 GB Ampere used K8V4-equivalent reasoning), but it does mean:

K8V4 may be the right pick for shorter-ctx + higher-concurrency workloads where 90K probes never fire
TQ3 stays the best choice for long-ctx single-stream because it's been validated through 91K needles cross-rig (Whamp's 4× 3090 verify-stress 7/7 at 91K, @efschu's 91K passing on 20 GB Ampere via fp8 swap, my own dual-turbo bench)

A dual-balanced.yml config might still be a useful third option once we understand whether the hang is NCCL/MTP/K8V4 — but it'd want a constraint like "max_model_len 65536" baked in so the failing 90K class is structurally avoided.

Soak update — soak-helper now flags silent-empty turns

Commit f32d8a6 tightened soak-helper to detect HTTP 200 + 0 completion turns (the silent-empty class from #43, #47, #50). Pull that and re-run SOAK_MODE=continuous bash scripts/soak-test.sh on your K8V4 config; the soak verdict block now reports silent_empty N/M (X.X%) directly. That is useful for catching the workload-level failure mode adjacent to your 90K hang.

Whichever probe you pick, please post the data — even one of A/B/C narrows this. And don't worry about disk cost on Probe C; that's just a config flag swap, no image pull needed.

JusefPol · 2026-05-04T18:08:03Z

Author

Cool, I will run the tests tomorrow. Do you want me to continue here, or open an actual GitHub issue for everything?

1 reply

noonghunna May 4, 2026
Maintainer

A formal numbers-from-your-rig issue would be cleaner — that template captures the full rig context (driver, kernel, topology, NVLink state, exact compose) in one place and lets us add it as a tracked BENCHMARKS row if the K8V4 + 131K + concurrency=5 + NVLink config pans out cleanly.

Template link: https://github.com/noonghunna/club-3090/issues/new?template=numbers-from-your-rig.yml

Suggest bundling everything into one issue:

The passing data — what works on your rig (K8V4 + 131K + concurrency=5 + NVLink, with bash scripts/report.sh --bench numbers from the working state)
The failing probe — probe 7 large-rung at scale=1400 (the 90K-tokens hang). Include the full bash scripts/report.sh --verify --stress output + a recipe for how to repro the hang
Whichever of probes A/B/C from my reply above narrows the cause (nvidia-smi during hang, or MTP-off, or fp8-instead-of-K8V4)

That gives us a complete data point — both the working envelope and where it breaks. We can cross-reference this discussion thread from the issue body so the conversation history is preserved.

Discussion is fine if you'd rather keep iterating informally first, but for the eventual BENCHMARKS row + tracked status the issue is the right home.

noonghunna · 2026-05-05T14:35:21Z

Maintainer

@JusefPol — continue here for now, please.

The signal-to-noise on a research-mode exploration like "TQ k8v4 middle-ground for 2-card NVLink rigs" is best in a discussion thread where the back-and-forth on tuning constants doesn't pollute the issue tracker's "real bugs to fix" channel. Issues work better when you have either (a) a reproducible failure mode with a fix proposal, or (b) a shippable artifact that needs review.

If your continued testing lands on:

A clean PR-able variant (compose YAML + bench numbers + soak-clean): then a PR with the new compose + a BENCHMARKS row is the right move. Reference this discussion as the design history.
A reproducible failure mode (something specific that breaks on k8v4 + NVLink + 2-card that doesn't break elsewhere): file as a separate issue with the failure shape + reproducer.
Just exploration data (which is what you're doing now): keep it here. Useful for the next person who tries the same path.

Useful context for your next round: Sandermage just shipped v7.72.2 on 2026-05-05 (we've absorbed it into club-3090 PR #59, branch v7.72.2-uplift). That release includes:

PN59 streaming-GDN orchestrator (designed to fix Cliff 2b — though genesis#22 has the cross-rig finding that it doesn't engage on chunked-prefill on Ampere consumer; awaiting Sander's Step 2 fix)
v7.72.1 P68 xgrammar-incompat skip (closes lex's #57, "[bug] P68 (auto-force tool_choice=required) makes #45 xgrammar/patternProperties bug fire on long-prompt agentic-IDE requests that would otherwise succeed"; same family as the silent-empty grammar-reject we're tracking)
PN35 native (vllm#35975 backport, ~128 MiB headroom recovery on text-only configs)

If you re-run k8v4 testing on the v7.72.2-uplift branch you'll be on a richer set of patches than your earlier run. Worth a re-bench against your own prior numbers — especially the grammar-reject side, which v7.72.1 should have improved on multi-tool catalogs.

When you do file the eventual PR (assuming the variant survives soak-continuous which is the load-bearing gate per docs/CLIFFS.md on Cliff 2b), the Numbers from your rig issue template is a good first step before opening the PR — it captures rig context cleanly and lands the BENCHMARKS row in a structured way.
