[Sigmoid][Delta Update][2/N] update delta update api to load original value first before casting to target dtype #167039
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167039
Note: links to docs will display an error until the doc builds have completed. ✅ No failures as of commit 0c641c1 with merge base 9eebda9. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Force-pushed 5aeacd5 to 5e531c4
… value first before casting to target dtype (pytorch#167039) Summary: The current delta update assumes that the non-lowered weights share the same tensor dtype as the lowered version. This is not true by design. When the dtypes mismatch, the data is loaded into an unexpected dtype, which introduces undefined behavior. This diff closes the gap by always loading the tensor in its original dtype first and then casting to the desired dtype. Test Plan: No more NaN values! {P2022339213} Reviewed By: henryoier, kqfu Differential Revision: D86181685
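The failure mode described in the summary can be sketched without any framework code: decoding serialized bytes under the wrong dtype mangles the payload, while decoding under the original dtype preserves it. This is a minimal illustration using Python's `struct` module; the actual loader operates on tensor buffers, not struct tuples.

```python
import struct

# A float32 value as serialized by the non-lowered model.
raw = struct.pack("<f", 1.0)

# Correct path: decode with the original dtype, then convert as needed.
value = struct.unpack("<f", raw)[0]   # 1.0, as expected

# Buggy path: reinterpret the same 4 bytes as two float16 values --
# the bit layout no longer lines up, so the payload comes out as garbage.
mangled = struct.unpack("<2e", raw)   # (0.0, 1.875), not 1.0
```

Reinterpreting raw bytes under a mismatched floating-point layout is exactly the kind of corruption that surfaces downstream as NaN or nonsense weight values.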
Force-pushed 5e531c4 to 2222124
… value first before casting to target dtype (#167039) Summary: The current delta update assumes that the non-lowered weights share the same tensor dtype as the lowered version. This is not true by design. When the dtypes mismatch, the data is loaded into an unexpected dtype, which introduces undefined behavior. This diff closes the gap by always loading the tensor in its original dtype first and then casting to the desired dtype. Test Plan: No more NaN values! {P2025779492} Unit tests:
buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.Basic_AOTInductorCuda_SwapWeights' 2>&1 | tee re_local_benchmark_log.txt
buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.Basic_InterpreterCuda_SwapWeights' 2>&1 | tee re_local_benchmark_log.txt
buck test fbcode//mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 fbcode//sigmoid/inference/test_gpu:model_runner_test -- --exact 'sigmoid/inference/test_gpu:model_runner_test - ModelRunnerTest.TestLoadDeltaUpdates' 2>&1 | tee re_local_benchmark_log.txt
Reviewed By: henryoier, kqfu Differential Revision: D86181685
@pytorchbot label "topic: not user facing"
Force-pushed 2222124 to 864f9c4
Force-pushed 864f9c4 to 0c641c1
@pytorchbot merge (initiating merge automatically since the Phabricator diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#167039 Approved by: https://github.com/henryoier
Summary: The current delta update assumes that the non-lowered weights share the same tensor dtype as the lowered version. This is not true by design. When the dtypes mismatch, the data is loaded into an unexpected dtype, which introduces undefined behavior. This diff closes the gap by always loading the tensor in its original dtype first and then casting to the desired dtype.
Test Plan:
No more NaN values!
{P2022339213}
Reviewed By: kqfu
Differential Revision: D86181685
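The corrected load path described in the summary can be sketched roughly as follows. The helper name, the dtype table, and the decode/encode mechanics here are all illustrative assumptions; the real implementation lives in the sigmoid inference runtime and operates on tensor buffers rather than struct tuples. The key ordering is the point: decode the buffer with its recorded source dtype first, and only then cast to the dtype the lowered graph expects.

```python
import struct

# Hypothetical sketch of the corrected delta-update load path. The
# DTYPE_FMT table and load_then_cast name are illustrative only.
DTYPE_FMT = {"float32": "f", "float16": "e"}

def load_then_cast(buf: bytes, src_dtype: str, dst_dtype: str) -> list:
    # Step 1: interpret the raw bytes in the tensor's *original* dtype.
    src = DTYPE_FMT[src_dtype]
    n = len(buf) // struct.calcsize(src)
    values = struct.unpack(f"<{n}{src}", buf)
    # Step 2: cast the decoded values to the target dtype (round-trip
    # through the target encoding to model any precision loss).
    dst = DTYPE_FMT[dst_dtype]
    return list(struct.unpack(f"<{n}{dst}", struct.pack(f"<{n}{dst}", *values)))

# float32 weights delivered to a float16-lowered module survive the cast:
# load_then_cast(struct.pack("<2f", 1.0, 2.0), "float32", "float16") -> [1.0, 2.0]
```

Decoding with the source dtype first is what the buggy path skipped: it read the bytes directly in the target dtype, which is well defined only when the two dtypes happen to match.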