Conversation

@jijunyan (Contributor) commented Nov 5, 2025

Summary: The current delta update assumes that the non-lowered weights share the same tensor dtype as their lowered counterparts. This is not true by design. When the dtypes mismatch, data loading reads the data as an unexpected dtype, which introduces undefined behavior. This diff closes the gap by always loading each tensor in its original dtype first and then casting it to the desired dtype.
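The load-then-cast idea can be sketched as follows. This is a minimal NumPy illustration, not the actual diff (the real change lives in the model-runner weight-loading path), and the function names here are hypothetical:

```python
import numpy as np

def load_delta_tensor(raw: bytes, original_dtype, target_dtype, shape):
    """Load serialized weights safely across a dtype mismatch.

    Step 1: interpret the buffer with the dtype it was serialized in.
    Step 2: numerically convert to the dtype the lowered model expects.
    """
    t = np.frombuffer(raw, dtype=original_dtype).reshape(shape)
    return t.astype(target_dtype)

def load_delta_tensor_reinterpreted(raw: bytes, target_dtype):
    """The old (buggy) behavior: read the bytes directly as the target dtype.

    When the dtypes differ, this misreads the bit patterns (and even the
    element count), producing garbage values such as NaN.
    """
    return np.frombuffer(raw, dtype=target_dtype)
```

For example, six float32 values occupy 24 bytes; reinterpreting that buffer as float16 yields twelve wrong elements, whereas loading it as float32 and then casting yields the six expected values at float16 precision.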

Test Plan:
No more NaN values!

{P2022339213}

Reviewed By: kqfu

Differential Revision: D86181685

pytorch-bot bot commented Nov 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167039

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0c641c1 with merge base 9eebda9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync bot commented Nov 5, 2025

@jijunyan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86181685.

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Nov 5, 2025
jijunyan added a commit to jijunyan/pytorch that referenced this pull request Nov 5, 2025
… value first before casting to target dtype (pytorch#167039)


Reviewed By: henryoier, kqfu

Differential Revision: D86181685
pytorch-bot bot pushed a commit that referenced this pull request Nov 6, 2025
… value first before casting to target dtype (#167039)


Test Plan:
No more NaN values!

{P2025779492}

Unit tests:

buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.Basic_AOTInductorCuda_SwapWeights' 2>&1 | tee re_local_benchmark_log.txt


buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.Basic_InterpreterCuda_SwapWeights'  2>&1 | tee re_local_benchmark_log.txt


buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.TestLoadDeltaUpdates'  2>&1 | tee re_local_benchmark_log.txt

Reviewed By: henryoier, kqfu

Differential Revision: D86181685
@jijunyan (Contributor, Author) commented Nov 6, 2025

@pytorchbot label "topic: not user facing"

pytorch-bot bot added the topic: not user facing label Nov 6, 2025
jijunyan added a commit to jijunyan/pytorch that referenced this pull request Nov 6, 2025
… value first before casting to target dtype (pytorch#167039)

@facebook-github-bot (Contributor) commented
@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

yarongmu-google pushed a commit to yarongmu-google/pytorch that referenced this pull request Nov 7, 2025
… value first before casting to target dtype (pytorch#167039)


Reviewed By: kqfu

Differential Revision: D86181685

Pull Request resolved: pytorch#167039
Approved by: https://github.com/henryoier

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
fb-exported
Merged
meta-exported
topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants