Welcome — club-3090 is open to all CUDA hardware (and why we kept the name) #17
Locked
noonghunna announced in Announcements
👋 Welcome to club-3090 — a place to share, debug, and squeeze every last token-per-second out of whatever GPU you happen to have.
What "club-3090" actually means
We've had several people ask in the last 48 hours whether they belong here on a 4090 (#27), a 5090 (#30), 2× modded 3080 20GB (#26), 2× 3090 with NVLink (#29), or 4× 3090 (#25). Worth saying directly:
The 3090 holds a unique place in 2026: it's the last GPU class still affordable for someone getting started with serious AI inference (~$700–900 used vs $2–3K for a 4090, $3–4K for a 5090, $30K+ for an H100). Naming the repo after it honors that. It's the card you can buy to learn this stack on without taking a loan.
But this repo is about methodology — TQ3 KV / MTP K=3 / Genesis patches / Cliff 1+2 closures / verify-stress as ground truth — and that methodology applies on every CUDA card from SM 8.0 (A100) through SM 12.0 (5090) and Blackwell datacenter. We just ship default configs tuned for what most people own. You belong here on whatever you have.
What we're building
A reproducible single-shot install + bench harness for serving Qwen3.6-27B (today; more models welcome) on consumer-and-prosumer NVIDIA hardware. Highlights:
<think> blocks: ~30× compression with iso-or-better accuracy on coding benchmarks; see docs/STRUCTURED_COT.md
dual-dflash.yml)
Hardware diversity is the point: please volunteer
@Sandermage (author of genesis-vllm-patches, the single biggest reason this stack performs the way it does) explained openly in #27 that he only has A5000 cards — GPU prices have made wider hardware unaffordable. That's why our cross-rig contribution loop matters: patches written against A5000 PROD get validated on whatever YOU have, bugs surface that pure-A5000 testing never would, and the stack becomes honest across the SM family.
Hardware classes we'd love to characterize (open a discussion + post your verify-stress.sh results):
SM86 (Ampere; Genesis is most mature here):
SM89 (Ada Lovelace — sister tree we want to grow):
SM90+ (Hopper / Blackwell — niche but open to it):
If you have something not on this list, post anyway — we'll figure out which class it falls in.
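If you are unsure which class your card falls in, the grouping above can be sketched as a small lookup on CUDA compute capability. This is only an illustration, not part of the repo's scripts: the function names and the "unlisted" label are hypothetical, and the mapping simply mirrors the three classes named in this post plus the SM 8.0 to SM 12.0 range mentioned earlier.

```python
from typing import Tuple

# Range this post says the methodology covers: SM 8.0 (A100) through SM 12.0 (5090).
MIN_CC = (8, 0)
MAX_CC = (12, 0)


def parse_compute_cap(text: str) -> Tuple[int, int]:
    """Parse a compute capability string such as '8.6' into (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def hardware_class(cc: Tuple[int, int]) -> str:
    """Map a compute capability to one of the classes listed in the post.

    Anything that is not SM86, SM89, or SM90+ is reported as 'unlisted'
    (a hypothetical label): post anyway and ask.
    """
    if cc == (8, 6):
        return "SM86"
    if cc == (8, 9):
        return "SM89"
    if cc >= (9, 0):
        return "SM90+"
    return "unlisted"


def in_stated_range(cc: Tuple[int, int]) -> bool:
    """True if the capability lies within SM 8.0 .. SM 12.0 inclusive."""
    return MIN_CC <= cc <= MAX_CC


if __name__ == "__main__":
    # With PyTorch installed, torch.cuda.get_device_capability() returns
    # the same (major, minor) tuple for the current device.
    for raw in ["8.6", "8.9", "12.0", "8.0", "7.5"]:
        cc = parse_compute_cap(raw)
        print(raw, hardware_class(cc), in_stated_range(cc))
```

Tuples compare element by element, so `(12, 0) >= (9, 0)` behaves as expected without any version-string parsing. Include the class alongside your verify-stress.sh results when you post.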
Sharing rig data: scripts/report.sh captures everything in one pass
Before posting hardware questions, benchmark numbers, or bug reports, run the diagnostic helper:
Single command captures everything we'd otherwise ask for individually:
Paths under your home, hostname, username, and HF tokens are redacted by default. Use --no-redact only for internal sharing.
Paste the resulting markdown into your issue (for bug reports + benchmark data) or discussion (for design / hardware questions / introductions). Readers click through <details> collapsibles for the noisy bits, while the surface info stays scannable. This standardizes cross-rig data so benchmark rows compose cleanly into BENCHMARKS.md.
Three tiers of how to contribute
🟢 Low-effort (~15 min): Run the canonical install + diagnostic dump:
File my-rig.md as a "Numbers from your rig" issue. High-signal contributions become BENCHMARKS.md rows with credit.
🟡 Medium-effort: Bench a non-default compose variant (dual-turbo.yml, dual-dflash.yml, bounded-thinking.yml) on your rig:
File my-rig-bench.md via the same template; we co-author the per-config tuning row from your numbers.
🔴 High-effort: Open a PR with a tested compose variant for your topology, e.g. dual-nvlink.yml, quad-3090.yml, 4090.yml, 5090.yml. Include the report.sh --full output in the PR description. Credit goes in the file header.
How this place works: issues vs discussions
Two channels with different shapes; picking the right one keeps the right people seeing the right thing:
🐛 Issues are for:
report.sh dumps (use the bug template)
Issues have proper search surface, labels, assignees, milestones. Future readers hitting the same symptom find them.
💬 Discussions are for:
Please don't paste log dumps / tracebacks / report.sh output into discussions. It buries the bug under non-bug content (and the reverse: a bug-shaped problem inside a "great success!" discussion thread doesn't reach the issues triage queue). If a discussion accumulates a log dump or a reproducer, we may ask you to fork the bug-shaped piece into a fresh issue with a cross-link back. That is not banishment, just keeping each channel doing what it's good at.
🔧 PRs are for: tested compose variants, doc improvements, scripts. See CONTRIBUTING.md.
❤️ Be open-minded and welcoming. This is a community we build together, and the diversity of hardware we cover is directly what makes the stack honest.
Welcome aboard. Drop a comment with what you're running and what you're working on.
— @noonghunna