
update b200 peak memory bandwidth #4002

Merged

danielvegamyhre merged 1 commit into main from specmarch5 on Mar 5, 2026

Conversation

Contributor

@danielvegamyhre danielvegamyhre commented Mar 5, 2026

This Twitter post claimed that the B200's actual peak memory bandwidth is 7680 Gbps, not 8192 Gbps, because the memory bus width originally reported by the CUDA drivers has been updated/corrected from 8192 bits to 7680 bits.

I confirmed this claim by using Claude to write a simple CUDA C++ program that queries the device driver:

=== Device 0: NVIDIA B200 ===
Memory Bus Width: 7680 bits
Total Global Memory: 178.35 GB
L2 Cache Size: 126.50 MB
Compute Capability: 10.0
Number of SMs: 148
Warp Size: 32
Max Threads per Block: 1024
Max Threads per SM: 2048
Shared Memory per Block: 48.00 KB
Shared Memory per SM: 228.00 KB
Registers per Block: 65536
Registers per SM: 65536
ECC Enabled: Yes

And nvidia-smi to get the memory clock frequency:

nvidia-smi --query-gpu=clocks.mem,clocks.max.mem --format=csv

clocks.current.memory [MHz], clocks.max.memory [MHz]
3996 MHz, 3996 MHz

Bandwidth = (7680 bits / 8 bits per byte) * (3996 MHz * 1e6) memory clock * 2 (DDR) / 1e12 = 7.672 TB/s
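The arithmetic above can be double-checked with a short Python snippet; the bus width and memory clock values are taken directly from the device query and nvidia-smi output above:

```python
# Peak memory bandwidth from bus width and DDR memory clock.
bus_width_bits = 7680                    # memoryBusWidth reported by the CUDA driver
bytes_per_transfer = bus_width_bits / 8  # 960 bytes moved per transfer
mem_clock_hz = 3996e6                    # 3996 MHz from nvidia-smi
ddr_factor = 2                           # double data rate: two transfers per clock

peak_bw_bytes_sec = bytes_per_transfer * mem_clock_hz * ddr_factor
print(f"{peak_bw_bytes_sec / 1e12:.3f} TB/s")  # 7.672 TB/s
```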

This PR updates our benchmark/roofline utils accordingly.

@pytorch-bot

pytorch-bot bot commented Mar 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4002

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 9 Pending

As of commit 1bfdd0c with merge base d6d423e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 5, 2026
@danielvegamyhre danielvegamyhre requested review from drisspg and vkuzo March 5, 2026 18:12
@danielvegamyhre danielvegamyhre added topic: bug fix Use this tag for PRs that fix bugs topic: for developers Use this tag if this PR is mainly developer facing labels Mar 5, 2026
# https://resources.nvidia.com/en-us-blackwell-architecture, page 20
# 8.0 TB per second
"peak_mem_bw_bytes_sec": 8.0e12,
# (7680 memory bus bit width / 8 bits per byte) * (3996 MHz memory clock) * 2 DDR
Contributor

@vkuzo vkuzo Mar 5, 2026


I think this should definitely keep the source from nvidia, and also cite the additional source you are mentioning in the PR summary

Contributor Author


Updated

@vkuzo
Contributor

vkuzo commented Mar 5, 2026

lg once source is updated!

@danielvegamyhre danielvegamyhre added the module: core changes affecting multiple modules, e.g. base config/tensor, observers, quant ops label Mar 5, 2026
@danielvegamyhre danielvegamyhre merged commit df68b82 into main Mar 5, 2026
22 of 23 checks passed