Welcome — club-3090 is open to all CUDA hardware (and why we kept the name) #17

noonghunna (Maintainer) · 2026-04-30T22:16:56Z

👋 Welcome to club-3090 — a place to share, debug, and squeeze every last token-per-second out of whatever GPU you happen to have.

What "club-3090" actually means

We've had several people ask in the last 48 hours whether they belong here on a 4090 (#27), a 5090 (#30), 2× modded 3080 20GB (#26), 2× 3090 with NVLink (#29), or 4× 3090 (#25). Worth saying directly:

The name is a tribute, not a hardware filter.

The 3090 holds a unique place in 2026: it's the last GPU class still affordable for someone getting started with serious AI inference (~$700–900 used vs $2–3K for a 4090, $3–4K for a 5090, $30K+ for an H100). Naming the repo after it honors that. It's the card you can buy to learn this stack on without taking a loan.

But this repo is about methodology — TQ3 KV / MTP K=3 / Genesis patches / Cliff 1+2 closures / verify-stress as ground truth — and that methodology applies on every CUDA card from SM 8.0 (A100) through SM 12.0 (5090) and Blackwell datacenter. We just ship default configs tuned for what most people own. You belong here on whatever you have.

What we're building

A reproducible single-shot install + bench harness for serving Qwen3.6-27B (today; more models welcome) on consumer-and-prosumer NVIDIA hardware. Highlights:

vLLM v0.20 + Genesis patches + TurboQuant 3-bit KV + MTP K=3 spec-decode — see docs/SINGLE_CARD.md and docs/DUAL_CARD.md for current TPS / VRAM / context numbers per compose
Cliff 1 (TQ tool-prefill OOM) + Cliff 1 mech B (inductor compile-path FFN leak) — closed across all 4 TQ3 composes via cross-rig Genesis patches; see docs/CLIFFS.md
PN26b sparse-V Triton kernel — first publicly-shipped sparse-V kernel for SM86 (Ampere consumer)
Structured-CoT with grammar-bounded <think> blocks — ~30× compression with iso-or-better accuracy on coding benchmarks; see docs/STRUCTURED_COT.md
DFlash spec-decode alternative engine for narrative-heavy workloads (dual-dflash.yml)
Per-config bench numbers, AL, VRAM, gotchas, and rationale documented in each compose YAML header.

Hardware diversity is the point — please volunteer

@Sandermage (author of genesis-vllm-patches, the single biggest reason this stack performs the way it does) explained openly in #27 that he only has A5000 cards — GPU prices have made wider hardware unaffordable. That's why our cross-rig contribution loop matters: patches written against A5000 PROD get validated on whatever YOU have, bugs surface that pure-A5000 testing never would, and the stack becomes honest across the SM family.

Hardware classes we'd love to characterize (open a discussion + post your verify-stress.sh results):

SM86 (Ampere — Genesis is most mature here):

1× / 2× / 4× 3090 (PCIe OR NVLink — both wanted)
1× / 2× A5000 / A6000 (cross-validation from non-Sander rigs)
1× / 2× A4000 / A4500
3090 Ti / 3080 / 3080 Ti
Modded 3080 20 GB

SM89 (Ada Lovelace — sister tree we want to grow):

4090 (Cliff 2 likely closes on 24 GB + better mem bandwidth)
4080 Super / 4070 Ti Super
L40 / L4 (datacenter Ada)
6000 Ada (workstation Ada)

SM90+ (Hopper / Blackwell; niche but open to it):

H100 / H200 (FA3/FA4 paths we don't currently exercise)
5090 / 5080 (native FP8, FA3; the 5090's 32 GB probably makes most cliffs disappear)

If you have something not on this list, post anyway — we'll figure out which class it falls in.
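If you're not sure which class your card falls in, the compute capability reported by the driver usually settles it. A minimal sketch, assuming a driver recent enough that `nvidia-smi` supports the `compute_cap` query field; `sm_class` is a hypothetical helper for illustration, not one of the repo's scripts:

```shell
#!/usr/bin/env bash
# sm_class is a hypothetical helper (not part of club-3090): it maps a CUDA
# compute capability such as "8.6" to the hardware classes listed above.
sm_class() {
  case "$1" in
    8.0)       echo "SM80 (Ampere datacenter)" ;;
    8.6)       echo "SM86 (Ampere)" ;;
    8.9)       echo "SM89 (Ada Lovelace)" ;;
    9.0)       echo "SM90+ (Hopper)" ;;
    10.*|12.*) echo "SM90+ (Blackwell)" ;;
    *)         echo "unlisted (post anyway)" ;;
  esac
}

# Print one line per GPU: "<name> | <compute cap> | <class>".
# Skipped silently on machines without nvidia-smi.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
      cap="${cap# }"   # csv output puts a space after the comma
      echo "$name | $cap | $(sm_class "$cap")"
    done
fi
```

Including that one line in your introduction saves a round of "which card is that, exactly?" for modded or rebadged hardware.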

Sharing rig data — `scripts/report.sh` captures everything in one pass

Before posting hardware questions, benchmark numbers, or bug reports, run the diagnostic helper:

bash scripts/report.sh > my-rig.md              # rig state (~2 sec)
bash scripts/report.sh --verify > my-rig.md     # + verify-full output (~2 min)
bash scripts/report.sh --bench > my-rig.md      # + canonical TPS bench (~5 min)
bash scripts/report.sh --full > my-rig.md       # all of the above

Single command captures everything we'd otherwise ask for individually:

GPU details: model, VRAM, driver, CUDA, power limit (current cap, default, max) and draw, with a ⚠ flag if user-capped, NVLink topology
OS + system: distro, kernel, WSL/VM detection, CPU, RAM, swap, disk + filesystem types
Container runtime: Docker version, NVIDIA Container Toolkit, compose
Stack version: club-3090 commit + branch + dirty-tree warning, GENESIS_PIN default + env override, cached vLLM image SHAs
Active container: name, status, Genesis Results banner, local sidecar status, KV pool sizing, full engine config (CLI flags + speculative_config), warnings/errors, full boot log

Paths under your home, hostname, username, and HF tokens are redacted by default. Use --no-redact only for internal sharing.
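Redaction is on by default, but a quick local check before posting costs nothing. The sketch below is an assumption-laden double-check, not part of `report.sh`: `check_report` is a hypothetical helper that greps a report for strings shaped like Hugging Face tokens (which start with `hf_`) and for a list of literal strings you pass in. With no extra arguments it falls back to your home path, hostname, and username.

```shell
#!/usr/bin/env bash
# check_report is a hypothetical helper (not part of club-3090).
# Usage: check_report FILE [NEEDLE...]
# Returns 0 if nothing suspicious is found, 1 otherwise.
check_report() {
  local file="$1"; shift
  local found=0 needle
  # Default needles: this machine's home path, hostname, and username.
  if [ "$#" -eq 0 ]; then
    set -- "${HOME:-}" "$(hostname 2>/dev/null)" "$(id -un 2>/dev/null)"
  fi
  # Hugging Face access tokens start with "hf_".
  if grep -Eq 'hf_[A-Za-z0-9]{20,}' "$file"; then
    echo "possible HF token in $file" >&2
    found=1
  fi
  # Fixed-string search for each non-empty needle.
  for needle in "$@"; do
    [ -n "$needle" ] || continue
    if grep -Fq -- "$needle" "$file"; then
      echo "found '$needle' in $file" >&2
      found=1
    fi
  done
  return "$found"
}
```

Run it as `check_report my-rig.md` before pasting. Short usernames such as `root` can match unrelated text, so treat a hit as a prompt to look, not as proof of a leak.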

Paste the resulting markdown into your issue (for bug reports + benchmark data) or discussion (for design / hardware questions / introductions). Readers click through `<details>` collapsibles for the noisy bits, so the surface info stays scannable. This standardizes cross-rig data so benchmark rows compose cleanly into BENCHMARKS.md.

Three tiers of how to contribute

🟢 Low-effort (~15 min): Run the canonical install + diagnostic dump:

git clone https://github.com/noonghunna/club-3090
cd club-3090
bash scripts/setup.sh qwen3.6-27b
docker compose -f models/qwen3.6-27b/vllm/compose/docker-compose.yml up -d
bash scripts/report.sh --verify > my-rig.md   # captures rig state + verify-full

File my-rig.md as a "Numbers from your rig" issue. High-signal contributions become BENCHMARKS.md rows with credit.
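The steps above call `git`, `bash`, `docker`, and (via the report) `nvidia-smi`, so it can help to confirm all four are on your PATH before starting. A minimal sketch; `missing_tools` is a hypothetical helper for illustration, not one of the repo's scripts:

```shell
#!/usr/bin/env bash
# missing_tools is a hypothetical helper (not part of club-3090): it prints
# every named command that is not on PATH, one per line, and returns
# non-zero if at least one is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Tools the low-effort steps invoke directly.
if missing_tools git bash docker nvidia-smi >/dev/null; then
  echo "preflight ok"
else
  echo "install these first: $(missing_tools git bash docker nvidia-smi | tr '\n' ' ')"
fi
```

This only checks that the binaries exist. It does not confirm that the compose plugin or the NVIDIA Container Toolkit is installed, which the report itself captures.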

🟡 Medium-effort: Bench a non-default compose variant (dual-turbo.yml, dual-dflash.yml, bounded-thinking.yml) on your rig:

bash scripts/switch.sh vllm/dual-turbo                       # or whichever variant
bash scripts/report.sh --bench > my-rig-bench.md             # rig state + canonical TPS

File my-rig-bench.md via the same template; we co-author the per-config tuning row from your numbers.
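If you want to bench more than one variant in a sitting, the two commands above can be wrapped in a loop. This is only a sketch under stated assumptions: `bench_variants` is a hypothetical wrapper, it assumes every variant is addressable as `vllm/<name>` the way `dual-turbo` is, and it assumes the stack is ready to bench once `switch.sh` returns. It defaults to a dry run that prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# bench_variants is a hypothetical wrapper (not part of club-3090).
# Assumption: each variant can be selected as vllm/<name>, as dual-turbo is.
# DRY_RUN=1 (the default here) prints the commands; DRY_RUN=0 runs them.
bench_variants() {
  local variant
  for variant in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "bash scripts/switch.sh vllm/$variant"
      echo "bash scripts/report.sh --bench > my-rig-bench-$variant.md"
    else
      # Stop at the first failure so a broken variant is easy to spot.
      bash scripts/switch.sh "vllm/$variant" || return 1
      bash scripts/report.sh --bench > "my-rig-bench-$variant.md" || return 1
    fi
  done
}

bench_variants dual-turbo dual-dflash
```

Writing one file per variant keeps each report intact, since redirecting every run to the same filename with `>` would overwrite the previous one.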

🔴 High-effort: Open a PR with a tested compose variant for your topology — e.g. dual-nvlink.yml, quad-3090.yml, 4090.yml, 5090.yml. Include the report.sh --full output in the PR description. Credit goes in the file header.

How this place works — issues vs discussions

Two channels with different shapes; picking the right one keeps the right people seeing the right thing:

🐛 Issues are for:

Bug reports with logs / tracebacks / report.sh dumps (use the bug template)
Cross-rig bench data (use the "Numbers from your rig" template)
Reproducible failures, regressions, anything with an open/closed state to track

Issues have a proper search surface, labels, assignees, and milestones. Future readers hitting the same symptom can find them.

💬 Discussions are for:

Introductions ("first run, here's my rig")
Hardware questions ("does X work on club-3090?")
Design proposals ("could we add Y?")
"Is this expected?" before you have logs / a clear reproducer
Open-ended community Q&A

Please don't paste log dumps / tracebacks / report.sh output into discussions. It buries the bug under non-bug content (and the reverse: a bug-shaped problem inside a "great success!" discussion thread doesn't reach the issues triage queue). If a discussion accumulates a log dump or a reproducer, we may ask you to fork the bug-shaped piece into a fresh issue with a cross-link back — not banishment, just keeping each channel doing what it's good at.

🔧 PRs are for: tested compose variants, doc improvements, scripts. See CONTRIBUTING.md.

❤️ Be open-minded and welcoming. This is a community we build together, and the diversity of hardware we cover is directly what makes the stack honest.

Welcome aboard. Drop a comment with what you're running and what you're working on.

— @noonghunna
