Hardware mention for dual modded 3080s #25
Replies: 13 comments 7 replies
-
|
@troymroberts — yes, dual modded 3080 20GB pairs are squarely in scope and you're right that this method should fully apply. They share the relevant hardware properties:
If you do try the install: bash scripts/setup.sh qwen3.6-27b
# For the dual-3080 case, recommend starting at:
bash scripts/switch.sh vllm/dual # 262K target — will fail engine pre-check, will report estimated maxThe engine pre-check will print the real max your config supports. Drop Happy to add a hardware mention in 2× modded 3080 20GB is a meaningful coverage point for the SM86 / 40 GB combined memory class — would be the first published data we have outside the 3090 family. |
Beta Was this translation helpful? Give feedback.
-
|
Hey there. yes I fully intend to unlock the power scaling. I have them in a dell precision 9820 and i didn't get the correct dell GPU power splitter cables that I ordered, so each gpu only has 1x 8 pin power connector installed for now. When i get the correct cables i will bench at full power. But thanks for the repo it was very helpful. I'm excited to see how far these cards can be pushed. i.e. if they ever get around to implementing Pflash into vllm that could meaningfully reduce wall clocks even more. |
Beta Was this translation helpful? Give feedback.
-
|
running 2x3080 @ 320W PL ========== NARRATIVE (prompt=65 chars, max_tokens=1000) ========== === measured (5) === === summary [narrative] (n=5) === ========== CODE (prompt=78 chars, max_tokens=800) ========== === measured (5) === === summary [code] (n=5) === === GPU state === K8V4 |
Beta Was this translation helpful? Give feedback.
-
|
I'm sorry, it was late yesterday and I started the wrong docker container to bench. I started "normal dual" one with fp8_e5m2 instead of turbo. Here the numbers for dual-turbo.yml with gpu-memory-utilization "0.82" and turboquant_3bit_nc This run I used dual-turbo.yml with gpu-memory-utilization "0.82" and kv-cache-dtype turboquant_k8v4: HEAD 95b2c3b |
Beta Was this translation helpful? Give feedback.
-
|
No worries @efschu — easy to mix up at 11pm. Thanks for the correction. Walking back my earlier reply that framed your first post as K8V4 evidence — that was on me; I should have looked at the actual config you ran rather than the bottom-of-output annotation. Reframing with the corrected attribution:
Two interesting findings from the corrected data: 1. fp8 ≈ TQ3 on your 320W 3080 rig. 75.4 vs 76.4 narr — essentially identical within run-to-run variance. On our 230W 3090 baseline, fp8 (69) vs TQ3 (54) is a 27% gap. Why the difference? Hypotheses worth considering:
If we get a third data point at a different power level (e.g. 230W on your 3080 rig), it'd disambiguate. No pressure — your 320W run is already valuable cross-rig data. 2. The K8V4 question we discussed in #33 is still untested. I'd thought your data was evidence; it wasn't. We don't ship a K8V4 dual variant; my proposal there was to author one ( Quick docs-action: getting your data into HARDWARE.mdBoth your runs are valuable. To turn them into a proper credited row in git pull origin master # gets the new scripts/report.sh diagnostic helper
bash scripts/report.sh --bench > my-rig.mdThat captures hardware (incl. PCIe lane width per card — relevant on multi-card boards), full env, AND runs the canonical bench in one pass. Paste the contents into a comment here and I'll add a 2× 3080 @ 320W row to HARDWARE.md crediting you. |
Beta Was this translation helpful? Give feedback.
-
|
pulled, now container not starging anymore: |
Beta Was this translation helpful? Give feedback.
-
|
@efschu — appreciate the immediate flag. Three things going on; need a bit more log to disambiguate. What I see in your outputThree Genesis dispatcher v2 validator errors: These are real config conflicts on our side. Our
What I don't know yetWhether the validator errors are blocking boot on your rig, OR whether they're informational warnings and the actual failure happens downstream. Reason I'm uncertain: I booted Could you share the boot log after the second apply-matrix print? Specifically the lines between: ...and whatever comes next (Application startup, EngineCore failed, ValueError, etc.). Easiest: docker logs vllm-qwen36-27b-dual-turbo 2>&1 | tail -100In the meantime — quick workaroundIf you want to unblock locally and try booting again, edit - GENESIS_ENABLE_P65_TURBOQUANT_SPEC_CG_DOWNGRADE=1
- GENESIS_ENABLE_P85=1That keeps If boot still fails after that edit, the validator errors weren't the cause — and your On our sideIndependent of your reproduction — our shipped composes have outdated env-var bundles vs Genesis v7.69's dispatcher v2. Filing a fix to drop P65 + P85 from our composes (P67 is strictly better; P85 needs P84 which we don't enable). Will land regardless of whether the validators are the blocker on your specific case. Thanks for catching this immediately — these env-var conflicts would have bitten every dual-turbo / long-* user once they pulled latest. Your bench data tonight unblocks a real fix on our side. |
Beta Was this translation helpful? Give feedback.
-
|
still crashing went back to 95b2c3b, upgraded report.sh and bench.sh to current version |
Beta Was this translation helpful? Give feedback.
-
|
@efschu — your visible log ends mid-dispatcher-matrix, which means the container died WHILE Genesis was still printing the apply table. We need the lines below the matrix to see the actual crash. Two diagnosis paths, do whichever is convenient: Path A — full log (most informative)docker logs vllm-qwen36-27b-dual-turbo > /tmp/dual-turbo.log 2>&1
wc -l /tmp/dual-turbo.logIf it's <1000 lines, paste the whole thing. If larger, grep the tail past the matrix: sed -n '/Genesis Results:/,$p' /tmp/dual-turbo.log
# OR if no 'Genesis Results' line yet (matrix is the last thing printed):
sed -n '/apply matrix:/,$p' /tmp/dual-turbo.logThat reveals whether (a) Genesis crashed mid-apply on a specific patch, (b) Genesis finished but vLLM aborted on the model load / CUDA init / cudagraph capture, or (c) something else entirely. Path B — verify your stack is in sync firstThe dispatcher v2 conflicts I diagnosed earlier (P65/P67/P85) shipped fixes on master 2026-05-03 (commit Today we also shipped bash scripts/update.sh
# → bails if you have local edits (commit/stash them first)
# → git pull --ff-only origin master
# → re-runs setup.sh qwen3.6-27b (re-pins Genesis tree to declared GENESIS_PIN)This handles the most common failure mode: "pulled but didn't re-run setup, Genesis tree is stale, env vars expect new dispatcher behavior, container crashes." Worth running before further investigation — eliminates a class of false positive. If Also helpful
For boot UX: also today shipped a Sorry for the regression. The dispatcher v2 strict-validation in v7.69 made our compose env-var combos more fragile, and the cleanup commits landed late. |
Beta Was this translation helpful? Give feedback.
-
|
Heads-up @efschu — separate from your dual-turbo boot crash above (still need that full log to diagnose), one finding worth flagging in case it intersects: we just reproduced @GuiPerPT's club-3090#41 Cliff 2 OOM on single-card traffic. Both shipping single-card variants fail at ~21-26K accumulated context. Detailed writeup: #41 comment. For your dual-3080 setup running dual-turbo (TP=2): activation should be split across both cards, so this single-card ceiling probably doesn't affect you directly. If you can get past the boot crash and run Codex investigation queued for the kernel-level fix. Updates here as we learn more. |
Beta Was this translation helpful? Give feedback.
-
|
Solved by reading this line: Edit: === measured (5) === === summary [narrative] (n=5) === ========== CODE (prompt=78 chars, max_tokens=800) ========== === measured (5) === === summary [code] (n=5) === === GPU state === is working again |
Beta Was this translation helpful? Give feedback.
-
|
but last check failes: |
Beta Was this translation helpful? Give feedback.
-
|
@efschu — that probe 7 fail at 90K matches your full bug report at #47 (same rig, same finding). Just answered the mechanism + recommended workarounds there in detail. TL;DR for this discussion thread: 90K is at or past the Cliff 2 boundary on 2× 20 GB Ampere even on TP=2 with What you've found defines the working envelope precisely: passes 60K, fails 90K on Continuing on #47 for the bug-tracking thread. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
I have dual modded 3080's (20gb each for 40gb vram total), so they should also fully support this method (going to try now actually)
So i feel like they should get a hardware mention.
Beta Was this translation helpful? Give feedback.
All reactions