Releases: joe2gaan/localaiservers
v0.2.0 — ROCm7.2 Dense/MoE GFX906 Active Contracts
v0.2.0-gfx906-rocm72-dense-moe - ROCm7.2 Dense/MoE GFX906 Active Contracts
Release Boundary
This is the published v0.2.0 release boundary for the ROCm7.2 Dense/MoE
GFX906 active-contract milestone.
v0.1.0 remains the earlier Qwen3.6/GFX906/MI50 TP4 published release
baseline. GitHub Releases are canonical for published release claim
boundaries. Docker Hub remains an evergreen artifact distribution channel and
should not be treated as the latest benchmark announcement.
qwen36-gfx906/README.md remains the canonical
deployment and reproducibility package for this release.
These are benchmark and reproducibility artifacts, not general workload
guarantees. The same ROCm7.2 image covers the active dense and MoE contracts
with model-specific environment settings and overlays.
Runtime Image
Image tag:
joe2gaan/localaiservers:qwen36-gfx906-rocm72-dense-moe-runtime-archive-0a2dbd6b7f0b
Docker Hub manifest digest:
sha256:8c380e9ca48943d8617de5a2e2eaf32a26dcc2c341e4b4f4f8c45294a72b8f1e
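Because a tag can be re-pointed while a manifest digest cannot, referencing the image by digest is the stricter way to pin this release. The sketch below builds a digest-pinned reference from the values above. The helper name `pinned_reference` is hypothetical and is not part of the release tooling.

```python
import re

REPOSITORY = "joe2gaan/localaiservers"
MANIFEST_DIGEST = (
    "sha256:8c380e9ca48943d8617de5a2e2eaf32a26dcc2c341e4b4f4f8c45294a72b8f1e"
)

# A sha256 digest is the literal prefix followed by 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_reference(repository: str, digest: str) -> str:
    """Return a repository@digest reference, rejecting malformed digests."""
    if not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repository}@{digest}"


if __name__ == "__main__":
    print(pinned_reference(REPOSITORY, MANIFEST_DIGEST))
```

The resulting `repository@sha256:...` string is the form `docker pull` accepts for digest-pinned pulls.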
Active Contracts
The results below are verified portable performance figures at
MAX_MODEL_LEN=131072, each measured after eight pre-measurement warmups.
Dense 27B TP8 Full-BAR/P2P-On
- Profile: dense27b_tp8_fullbar_p2pon
- Host: .20
- Strict backend TPS: 69.514
- c1_2000 backend TPS: 70.347
- c1_10000 backend TPS: 66.069
- Strict gate: valid
- ai-info 10K gate: cleared for the recorded active-contract lane
Qwen3.6 35B-A3B MoE TP8 Full-BAR/P2P-On
- Profile: moe35b_tp8_fullbar_p2pon
- Host: .30
- Strict backend TPS: 94.907
- c1_2000 backend TPS: 97.028
- c1_10000 backend TPS: 91.290
- Strict gate: valid
- Publication status: strict-valid MoE publication bar
Qwen3.6 35B-A3B MoE TP4 Full-BAR/P2P-On
- Profile: moe35b_tp4_fullbar_p2pon
- Host: .30
- Strict prompt: invalid/runaway
- c1_2000 backend TPS: 116.146
- c1_10000 backend TPS: 109.283
- Publication status: capped fixed-token result only, not a strict-valid
  publication claim
The TP4 strict prompt did not stop after more than 60K tokens, so TP4 remains
capped-only until the strict runaway behavior is resolved.
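One way to read the three lanes side by side is to compare each profile's c1_10000 figure with its c1_2000 figure while keeping the strict-gate state next to the numbers. The sketch below does that with the values listed above. The `ContractResult` record, the `retention` ratio, and the status labels are illustrative assumptions, not part of the release tooling or its gate definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContractResult:
    profile: str
    strict_tps: Optional[float]  # None when the strict prompt was invalid/runaway
    c1_2000_tps: float
    c1_10000_tps: float

    @property
    def strict_valid(self) -> bool:
        return self.strict_tps is not None

    def retention(self) -> float:
        """Fraction of c1_2000 backend TPS retained at c1_10000."""
        return self.c1_10000_tps / self.c1_2000_tps


# Values copied from the active-contract lists above.
RESULTS = [
    ContractResult("dense27b_tp8_fullbar_p2pon", 69.514, 70.347, 66.069),
    ContractResult("moe35b_tp8_fullbar_p2pon", 94.907, 97.028, 91.290),
    ContractResult("moe35b_tp4_fullbar_p2pon", None, 116.146, 109.283),
]

for r in RESULTS:
    status = "strict-valid" if r.strict_valid else "capped-only"
    print(f"{r.profile}: {r.retention():.3f} retained, {status}")
```

Under this reading, each lane keeps roughly 94% of its c1_2000 backend TPS at c1_10000, and only the two TP8 lanes carry a strict-valid result.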
Platform Remediation
The full-BAR/P2P-on lane required platform remediation:
- Official AMD VBIOS standardization was required.
- No modified BIOS images were used.
- amdgpu source patching was required for the full-BAR/P2P-on lane.
- This is not a user instruction to flash cards.
- No BIOS or VBIOS binaries are redistributed.
- No warranty, certification, procurement support, resale support, or hardware
recommendation is implied.
QC Supporting Proof
The MI50/GFX906 VRAM QC field-check tool remains public under
tools/gfx906-mi50-vram-qc/.
Sanitized supporting reports:
- The .20 positive-control report passed 1 GiB smoke, 30 GiB single-device,
  and 30 GiB all-device checks.
- The 16GB GFX906-class negative-control report passed the 1 GiB smoke check
  and failed the 30 GiB check as expected with out-of-memory.
This QC material is educational hardware-verification methodology only, not
certification or warranty.
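The reasoning behind the positive and negative controls can be sketched as a capacity comparison: a request that fits in device memory should pass, and one that exceeds it should fail with out-of-memory. This is an illustrative model only. The `expected_outcome` helper and the nominal capacities are assumptions, and the code does not reflect how the tool in tools/gfx906-mi50-vram-qc/ is implemented.

```python
def expected_outcome(device_gib: float, request_gib: float) -> str:
    """Return 'pass' if the request fits in device memory, else 'oom'."""
    return "pass" if request_gib <= device_gib else "oom"


# Nominal capacities are assumptions for illustration; usable VRAM on real
# hardware is somewhat lower than the nominal figure.
CONTROLS = {
    "positive control (32 GiB class)": 32.0,
    "negative control (16 GiB class)": 16.0,
}

for label, capacity in CONTROLS.items():
    smoke = expected_outcome(capacity, 1.0)
    full = expected_outcome(capacity, 30.0)
    print(f"{label}: 1 GiB -> {smoke}, 30 GiB -> {full}")
```

A 30 GiB request is useful as a discriminator because it sits between the two capacity classes, so the expected results differ while the 1 GiB smoke check passes on both.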
Links
- Canonical deployment and reproducibility package: qwen36-gfx906/README.md
- ROCm7.2 active-contract notes: docs/rocm72-dense-moe-active-contracts-20260620.md
- Public proof map: docs/funder-proof-index.md
- MI50/GFX906 VRAM QC field-check tool: tools/gfx906-mi50-vram-qc/
v0.1.0-gfx906-qwen36-mi50 — Reproducible GFX906 Qwen3.6 MI50 Runtime
Summary
This release summarizes the first public LocalAIServers GFX906 Qwen3.6 MI50 TP4
reproducibility artifact. It is the published v0.1.0 release record for the 90+ TPS
sustained 10K publication baseline.
Public-Benefit Purpose
LocalAIServers is a 501(c)(3) public charity providing public education and open-source
infrastructure for locally hosted AI systems. This artifact helps readers inspect a
reproducible local AI runtime path on affordable AI research hardware.
Hardware Target
- 4x AMD Instinct MI50 32GB.
- GFX906.
- Tensor parallel size: 4.
Runtime Target
- Model: Qwen/Qwen3.6-35B-A3B.
- Runtime: vLLM on ROCm/GFX906.
- Canonical technical deployment package: qwen36-gfx906/README.md.
Docker/Runtime Artifact
joe2gaan/localaiservers:qwen36-gfx906-c1-topk8-runtime-archive-aa34cb675f83
Docker Hub digest:
sha256:f5e69ee127b766960e386e0e4eda8e26c399bd02f57c494847cb9a92ce04d8ac
Source runtime archive:
aa34cb675f83ff6cade31cbbb357b1c31d793bee18da491f501d7c39fda3612a
This archive SHA is the strict byte-for-byte source-reproduction target for the
published runtime. Source rebuilds should be cited as release-reproduction evidence
only when the exported Docker archive matches this SHA. The live main deploy script
defaults to BYTE_FOR_BYTE_VALIDATION_MODE=auto; set
EXPECTED_REPRO_DOCKER_ARCHIVE_SHA256=aa34cb675f83ff6cade31cbbb357b1c31d793bee18da491f501d7c39fda3612a
when source-reproducing this release so a mismatch fails the source-build path. Use
BYTE_FOR_BYTE_VALIDATION_MODE=0 only for non-canonical local deploys where no
release-reproduction claim is being made. The prebuilt Docker Hub image path is
validated separately by the manifest digest above.
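The byte-for-byte check described above amounts to hashing the exported Docker archive and comparing the result with the pinned value. A minimal sketch follows, assuming the archive has already been exported to a local file. The function names and any archive path passed in are hypothetical, and this is not the deploy script's own validation code.

```python
import hashlib
from pathlib import Path

# Source runtime archive SHA-256 for the v0.1.0 release, as listed above.
EXPECTED_ARCHIVE_SHA256 = (
    "aa34cb675f83ff6cade31cbbb357b1c31d793bee18da491f501d7c39fda3612a"
)


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path: Path, expected: str = EXPECTED_ARCHIVE_SHA256) -> bool:
    """Return True only when the file's SHA-256 equals the expected value."""
    return sha256_of_file(path) == expected.lower()
```

A rebuild would count as release-reproduction evidence in this sketch only when `archive_matches` returns True for the exported archive; any other result means the source build differs from the published runtime.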
Reference Results
Benchmark artifact:
benchmarks/qwen36-gfx906-mi50-tp4/README.md.
c1_10000: 90+ TPS sustained backend decode publication baseline
This release should be cited as the 90+ TPS sustained 10K publication artifact. It is
not a latest-result announcement. LocalAIServers has reached 95+ TPS on Qwen3.6 10K
decode, but that version is outside this v0.1.0 release and should be cited only after
it is published separately. Use the canonical technical deployment package for
reproduction commands and the exact run context for this release lane.
Source-level GFX906 preservation artifacts
- GFX906 source kernel inventory
- GFX906 key learnings
- Technical progress summary
- Experimental methodology
- Current research roadmap
These artifacts document source-level kernel/runtime preservation work, including MoE
fastpath work, dense RowParallel/RCCL collective-boundary research, rejected paths,
diagnostic lanes, and active source directions.
Reproducibility Contract
The canonical deployment README preserves Docker image identity, Docker Hub digest,
archive hashes, source pins, deploy commands, run commands, benchmark commands, hardware
requirements, and limitations.
Limitations
- Scoped to the documented Qwen3.6 GFX906 MI50 TP4 runtime lane.
- Not a claim about all prompts, workloads, GFX906 systems, or officially supported
  performance.
- Not a public cloud service or direct public machine-access program.
- Dense model results should not be inferred from this MoE benchmark.
Next Milestones
- Publish more benchmark artifacts.
- Expand QC methodology and hardware verification records.
- Continue source-level GFX906 preservation work.