@Regen-dev: fair, the name is a bit of a 3090-supremacy flex. But for the practical question (will this work on a 4090?), the answer is almost certainly yes, with one caveat.

What carries over to 4090 directly

The one caveat: kernels tuned for SM86

The 4090 is Ada Lovelace (SM 8.9); the 3090 is consumer Ampere (SM 8.6). Our shipped configs include:

Practical answer

It should boot and work. Performance is probably within ±5% of the 3090 numbers. If anyone with a 4090 wants to run

Alternative we've considered: a

For now, give it a shot:

git clone https://github.com/noonghunna/club-3090
cd club-3090
bash scripts/setup.sh qwen3.6-27b
bash scripts/launch.sh # interactive wizard

If you hit anything that doesn't transfer cleanly, open an issue. 4090-specific data points we don't have are exactly the kind of thing that justifies expanding this beyond the 3090 family.
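If you are not sure which of the families mentioned in this thread your card belongs to, a small check like the one below may help before you file a report. It is a minimal sketch, not part of the club-3090 scripts: the `sm_family` helper and its labels are hypothetical names made up for illustration, and the `compute_cap` query field assumes a reasonably recent NVIDIA driver.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the club-3090 repo): map a CUDA compute
# capability string such as "8.6" to the family names used in this thread.
sm_family() {
  case "$1" in
    8.6) echo "SM86 (consumer/workstation Ampere)" ;;
    8.9) echo "SM89 (Ada Lovelace)" ;;
    9.*|1[0-9].*) echo "SM90+ (Hopper / Blackwell)" ;;
    *) echo "uncharacterized (compute capability $1)" ;;
  esac
}

# Print one line per visible GPU. Skipped quietly when nvidia-smi is absent.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
  while IFS=, read -r name cap; do
    cap="${cap// /}"   # drop the space that follows the CSV comma
    echo "$name: $(sm_family "$cap")"
  done
fi
```

Including that one line of output in an issue tells the maintainers at a glance whether a report comes from the SM86 family the kernels were tuned for or from a class they have not characterized yet.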
-
The reality is much more straightforward for me... I currently only have A5000 cards available, which is why I built my project around them. Since GPU prices have skyrocketed, I have neither the desire nor the budget to buy them at 5x their actual value. That's why my focus has been entirely on the Ampere architecture: to squeeze the absolute maximum out of it. Adding patches and optimizations for Ada and Blackwell isn't a problem in itself. The main issue is simply that I don't have that hardware. Once I actually get my hands on those cards, I'll be able to move beyond theory. Through practical testing and debugging, I'll be able to identify the inaccuracies and bugs specific to those generations and make the patch much more universal.
-
@Sandermage: thank you. That framing is exactly the reality of small-team open-source kernel work, and worth saying out loud. A5000-only, no budget to chase 5×-overpriced cards, full focus on squeezing the absolute maximum out of Ampere: that's not a limitation, it's the discipline that makes this stack actually exist. The SM86 (Ampere consumer + workstation) class is the largest still-affordable, still-getting-cards-into-hands tier in 2026. PN26b sparse-V is the first sparse-V kernel for SM86 in any public tree. That happened because you focused.

The cross-rig data we got just this week, on the back of that focus

In the last 48 hours alone, on the back of the v7.66 announcement:

Without owning any of that hardware, you've shipped patches that all 4 of these users are validating within days of pulling. That's the cross-rig contribution loop working as intended.

The ask: more volunteers, more hardware classes

For anyone reading this thread who has hardware we haven't characterized, we want your data:

SM86 family (where Genesis is most mature):

SM89 family (Ada Lovelace), where we want a sister tree:

SM90+ (Hopper / Blackwell), niche but valuable:

What "volunteering" actually means

Low-effort: just run

Higher-effort: bench a

Highest-effort: open a PR with a tested compose variant for your topology (

The whole reason this stack ships with the breadth it does (TQ3 KV, MTP K=3, sparse-V SM86, structured-CoT, DFlash, vLLM v0.20 + v0.21 dual-pin, Cliff 1 + Cliff 2 closures) is that Sander writes patches against his A5000 PROD, we cross-validate on 3090, and bugs surface that pure-A5000 testing would never catch (TP=1 worker-spawn registration, single-card prefill cliffs at 50K, NVLink vs PCIe collective paths). More hardware classes = more bugs surfaced = better stack for everyone, including the eventual SM89/SM90 users who haven't shown up yet.

If you have a card we haven't characterized, even if it's "just a 3090 with a different motherboard", please open a discussion. The cross-rig contribution loop is the model that's making this stack honest, and it scales directly with how many of you opt in.
-
Why don't we have a 4090 club?