Conversation

@rpsilva-aws (Collaborator)

Currently, we recompute the hash of the underlying computation on every hash lookup, solely for logging in two places. For small models where tracing time is not negligible, this has a measurable impact, particularly because we deserialize the protobuf deterministically (which requires ordering the otherwise unordered dictionary/map entries). The logging itself is unchanged, but the underlying deserialization logic is relatively slow in order to guarantee deterministic hashes for user computations. Since C++ evaluates the operands of stream operators eagerly, the cost is incurred regardless of the logging level. We recently measured a ~10% throughput impact for small BERT/Llama models.

Note that this hash is only used to provide a unique string that a hash key maps to. The actual hash of the protobuf is only meaningful for UserComputation computations, where it is factored into the hash key. In all other cases it is unnecessary and serves merely as a unique (debug) identifier, and the user can still verify the mapping for any given graph hash key by enabling post_compilation_analysis.

We see this during hash lookup, which is evaluated every time. We also see it in Compile, though there it occurs only for the very first computation (across all instances). The user can still access the computation proto hash by enabling PT_XLA_DEBUG.

E.g., for BERT HF pretraining (20 steps), with 48 metrics of 27 samples each, the collective tracing of each hash-computation metric is as follows:

- Average Rate: ~1.98 operations/second
- Most rates fall between 1.4-2.5 ops/second with a few outliers
- Highest Rate: 7.26772 ops/second (outlier)
- Lowest Rate: ~1.42 ops/second

- Typical p50 (median) latency per op: ~8-9 microseconds
- Typical p95 latency per op: ~450-500 microseconds
- Typical p99 latency per op: ~500-600 microseconds

@rpsilva-aws rpsilva-aws marked this pull request as ready for review April 15, 2025 20:17
@jeffhataws jeffhataws requested review from alanwaketan, jeffhataws, miladm and tengyifei and removed request for tengyifei April 16, 2025 04:04
@tengyifei tengyifei requested a review from zpcore April 16, 2025 04:34
@tengyifei (Collaborator)

@zpcore fyi

@zpcore zpcore merged commit 65fc779 into pytorch:r2.7 Apr 16, 2025
22 of 23 checks passed
