Conversation

@jijunyan (Contributor) commented Nov 5, 2025

Summary: The current delta update assumes that the non-lowered weights share the same tensor dtype as their lowered counterparts. This is not true by design. When the dtypes mismatch, data loading reads the data as an unexpected dtype, which introduces undefined behavior. This diff closes the gap by always loading each tensor in its original dtype first and then casting it to the desired dtype.
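The load-then-cast idea can be sketched as follows. This is a minimal NumPy illustration, not the actual diff (the real change lives in the model-runner weight-loading path), and the function names here are hypothetical:

```python
import numpy as np

def load_delta_tensor(raw: bytes, original_dtype, target_dtype, shape):
    """Load serialized weights safely across a dtype mismatch.

    Step 1: interpret the buffer with the dtype it was serialized in.
    Step 2: numerically convert to the dtype the lowered model expects.
    """
    t = np.frombuffer(raw, dtype=original_dtype).reshape(shape)
    return t.astype(target_dtype)

def load_delta_tensor_reinterpreted(raw: bytes, target_dtype):
    """The old (buggy) behavior: read the bytes directly as the target dtype.

    When the dtypes differ, this misreads the bit patterns (and even the
    element count), producing garbage values such as NaN.
    """
    return np.frombuffer(raw, dtype=target_dtype)
```

For example, six float32 values occupy 24 bytes; reinterpreting that buffer as float16 yields twelve wrong elements, whereas loading it as float32 and then casting yields the six expected values at float16 precision.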

Test Plan:
No more NaN values!

{P2022339213}

Reviewed By: kqfu

Differential Revision: D86181685

pytorch-bot bot commented Nov 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167039

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0c641c1 with merge base 9eebda9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync bot commented Nov 5, 2025

@jijunyan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86181685.

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Nov 5, 2025
jijunyan added a commit to jijunyan/pytorch that referenced this pull request Nov 5, 2025
… value first before casting to target dtype (pytorch#167039)


Reviewed By: henryoier, kqfu

Differential Revision: D86181685
pytorch-bot bot pushed a commit that referenced this pull request Nov 6, 2025
… value first before casting to target dtype (#167039)


Test Plan:
No more NaN values!

{P2025779492}

Unit tests:

buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.Basic_AOTInductorCuda_SwapWeights' 2>&1 | tee re_local_benchmark_log.txt


buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.Basic_InterpreterCuda_SwapWeights'  2>&1 | tee re_local_benchmark_log.txt


buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.TestLoadDeltaUpdates'  2>&1 | tee re_local_benchmark_log.txt

Reviewed By: henryoier, kqfu

Differential Revision: D86181685
@jijunyan (Contributor, Author) commented Nov 6, 2025

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the topic: not user facing label Nov 6, 2025
jijunyan added a commit to jijunyan/pytorch that referenced this pull request Nov 6, 2025
… value first before casting to target dtype (pytorch#167039)

@facebook-github-bot (Contributor) commented
@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

yarongmu-google pushed a commit to yarongmu-google/pytorch that referenced this pull request Nov 7, 2025
… value first before casting to target dtype (pytorch#167039)


Reviewed By: kqfu

Differential Revision: D86181685

Pull Request resolved: pytorch#167039
Approved by: https://github.com/henryoier

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
fb-exported
Merged
meta-exported
topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants