Replies: 6 comments 2 replies
-
@Whamp, yes, definitely interested. 4× 3090 unlocks things 1×/2× can't, and we have a wishlist of bench data points we can't generate ourselves:
What 4× 3090 can do that 2× can't
What we'd love your help with
In rough priority order:
Best path forward:
```bash
git clone https://github.com/noonghunna/club-3090
cd club-3090
bash scripts/setup.sh qwen3.6-27b
# Then either: edit dual.yml to TP=4, or open a PR adding a dual4.yml variant
```
Happy to merge any working compose variant + bench data as a PR. The hardware-mention docs (
Welcome aboard. The cross-rig data is exactly what makes this stack honest.
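For the "edit dual.yml to TP=4" step, here is a minimal sketch of how a `dual4.yml` could be derived. It assumes that `dual.yml` passes the tensor-parallel size as a vLLM-style `--tensor-parallel-size 2` (or `=2`) flag. That is an assumption, not something stated in this thread, and `make_tp4_variant` is a hypothetical helper. If the flag isn't found, the script refuses to write anything. GPU device lists and other settings may also need changes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive dual4.yml from dual.yml by changing TP from 2 to 4.
# ASSUMPTION: the TP size appears as "--tensor-parallel-size 2" or "--tensor-parallel-size=2".
# Check the real file first; device lists and memory settings may also need editing.

# $1 = source compose file (default dual.yml), $2 = output file (default dual4.yml)
make_tp4_variant() {
  local src=${1:-dual.yml} dst=${2:-dual4.yml}
  # Bail out rather than silently copying the file unchanged.
  if ! grep -Eq -- '--tensor-parallel-size[ =]2([^0-9]|$)' "$src"; then
    echo "no '--tensor-parallel-size 2' in $src; edit it by hand" >&2
    return 1
  fi
  # Rewrite only a standalone "2" (so a value like 20 is left alone); \1 keeps the flag and separator.
  sed -E 's/(--tensor-parallel-size[ =])2([^0-9]|$)/\14\2/g' "$src" > "$dst" || return 1
  echo "wrote $dst"
}
```

Usage: `make_tp4_variant dual.yml dual4.yml && diff dual.yml dual4.yml`. Reviewing the diff by hand before benching is the point: the sketch only touches the one flag it knows about.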
-
PR opened: #44. Validated the TP=4 fp8/MTP baseline from this thread on my 4× RTX 3090 PCIe rig:
Interpretation: TP=4 gives the expected Cliff 2 margin and full-context concurrency headroom, but single-stream TPS is lower than with TP=2 when the allreduce has to go over PCIe only. I framed
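One way to see why TP=4 can decode a single stream more slowly than TP=2 over PCIe is the textbook ring-allreduce cost model. It is an assumption here; NCCL and vLLM may pick other algorithms. With N GPUs, each allreduce takes 2(N-1) sequential steps, and each GPU sends 2(N-1)/N of the buffer. The 5120-dim bf16 hidden state below is a hypothetical size, not this model's config.

```bash
#!/usr/bin/env bash
# Textbook ring-allreduce cost model (ASSUMPTION: real libraries may choose other algorithms).
# N GPUs: N-1 reduce-scatter steps + N-1 all-gather steps, each GPU sending 2*(N-1)/N of the buffer.

# $1 = TP size (GPUs in the ring)
ring_allreduce_steps() { echo $(( 2 * ($1 - 1) )); }

# $1 = TP size, $2 = buffer size in bytes (integer division rounds down)
ring_allreduce_bytes_per_gpu() { echo $(( 2 * ($1 - 1) * $2 / $1 )); }

# HYPOTHETICAL single-stream decode buffer: one token's hidden state, 5120 dims x 2 bytes (bf16).
buf=$(( 5120 * 2 ))
for tp in 2 4; do
  echo "TP=$tp: $(ring_allreduce_steps "$tp") steps, $(ring_allreduce_bytes_per_gpu "$tp" "$buf") bytes sent per GPU"
done
```

With these inputs, TP=2 needs 2 steps and TP=4 needs 6, while per-GPU traffic only grows from 10240 to 15360 bytes. At roughly 10 KB per message, each allreduce is likely to be dominated by per-step latency over PCIe rather than bandwidth. Megatron-style TP typically runs two allreduces per transformer layer, so tripling the step count can outweigh the smaller per-GPU compute share.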
-
Follow-up: I added the TP=4 DFlash experiment to the same PR rather than splitting it out: #44
This looks shippable as a 4-card full-262K code-heavy variant: slower than the 2-card DFlash variants for raw single-stream TPS, but much faster than
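The "full-262K" and "full-context concurrency headroom" points come down to KV-cache memory. Weights are sharded across the TP group, so the group pays for them once, and every extra card adds its VRAM to the KV budget. This back-of-envelope sketch uses only illustrative placeholder inputs: 16 KV-cached layers, 4 KV heads, head dim 256, fp8 KV, 26 GiB of weights, and 3 GiB of per-GPU overhead. None of these are measurements or this model's real config. Take the real values from the model's `config.json` and the server's startup log.

```bash
#!/usr/bin/env bash
# Back-of-envelope KV-cache capacity per TP size. All inputs are HYPOTHETICAL placeholders.
# Assumes KV heads shard evenly across GPUs (kv_heads >= TP), so the total budget is what matters.
GIB=$(( 1024 * 1024 * 1024 ))

# KV bytes per token = 2 (K and V) * layers with a KV cache * KV heads * head dim * bytes/elem
# $1 layers, $2 kv_heads, $3 head_dim, $4 bytes per element (1 for an fp8 KV cache)
kv_bytes_per_token() { echo $(( 2 * $1 * $2 * $3 * $4 )); }

# Group-wide KV budget = TP * (VRAM per GPU - per-GPU overhead) - total weights
# $1 TP size, $2 VRAM per GPU (GiB), $3 total weight size (GiB), $4 per-GPU overhead (GiB)
kv_budget_bytes() { echo $(( ($1 * ($2 - $4) - $3) * GIB )); }

# Full-length sequences that fit, rounded down
# $1 budget bytes, $2 KV bytes per token, $3 context length in tokens
full_context_seqs() { echo $(( $1 / ($2 * $3) )); }

per_tok=$(kv_bytes_per_token 16 4 256 1)   # illustrative inputs only
for tp in 2 4; do
  budget=$(kv_budget_bytes "$tp" 24 26 3)
  echo "TP=$tp: $(full_context_seqs "$budget" "$per_tok" 262144) full 262K-token sequences"
done
```

With these placeholder numbers, each 262K-token sequence needs 8 GiB of KV. TP=2 leaves room for 2 such sequences and TP=4 for 7, because the 26 GiB of weights is paid once while the usable VRAM doubles.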
-
Quick update for anyone landing here from search: @Whamp's PR (#44) is in review and now adds TWO 4×3090 variants:
Both pass
The PR is not yet merged; it's waiting on the canonical rig report (
Variant picks (current best understanding):
Massive thanks, @Whamp: cross-rig data of this rigour is what makes club-3090 honest.
-
GitHub makes it hard to follow these conversations once they get going.
-
@clort81, fair point. If you have GitHub Copilot enabled on your account, the discussions view has a "summarize this discussion" feature designed for exactly this: the spark icon at the top of the discussion (or
If you don't have Copilot, here's the TL;DR for this thread:
The canonical state lives in
-
I love the work you're doing here. I run a 4× 3090 rig, though, and would love to build on some of the best ideas being developed here and contribute some of my own. I'm also happy to use my rig to test things and provide answers for others. Just wanted to check whether this is something you'd be interested in adding to club-3090.