[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor
#133871
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133871
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 1c82cac with merge base 68b1a09.
BROKEN TRUNK: the following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D61415304
[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor (pytorch#133871)

Summary: Pull Request resolved: pytorch#133871

This diff adds functionality to the debug printer tool that saves the value of each tensor arg before/after a kernel launch into an individual `.pt` file ({F1783443253}), named in the format:

`<before/after_launch>_<kernel_name>_<tensor_arg_name>_<device_type>.pt`

The generated `.pt` file can then be loaded back in Python for debugging. Sample program (`debug_dump_pt_value.py`) to load an output `.pt`:

```
import torch

# Load the tensor from the .pt file
tensor = torch.jit.load('/data/users/guorachel/pytorch/test/inductor/tmp/aoti_torch/after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt')

# Print the information of the tensors inside the module
for param in tensor.parameters():
    print(param)

# output: tensor([ 0.6331,  1.6358, -0.3459,  1.0196, -0.4122,  1.4279], device='cuda:0')
```

Test Plan: CI

One concern: `torch/csrc/api/src/serialize/input-archive.cpp` and `torch/csrc/api/src/serialize/output-archive.cpp`, the source files newly added for Inductor, may introduce a duplicate-symbol issue when compiling the `liblibtorch` and `libtorch-cpu-cpp` static libs.

Differential Revision: D61415304
[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor (pytorch#133871)

Summary: Pull Request resolved: pytorch#133871

- This diff adds functionality to the debug printer tool that saves the value of each tensor arg before/after a kernel launch into an individual `.pt` file ({F1817615926}), named in the format:

  `<before/after_launch>_<kernel_name>_<tensor_arg_name>_<device_type>.pt`

  The generated `.pt` file can then be loaded back in Python for debugging. Sample program (`debug_dump_pt_value.py`) to load an output `.pt`:

```
import torch

# Load the tensor from the .pt file
tensor = torch.jit.load('/data/users/guorachel/pytorch/test/inductor/tmp/aoti_torch/after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt')

# Print the information of the tensors inside the module
for param in tensor.parameters():
    print(param)

# output: tensor([ 0.6331,  1.6358, -0.3459,  1.0196, -0.4122,  1.4279], device='cuda:0')
```

- Also consolidates the debug-printer-related configs:

```
class IntermediateValueDebuggingLevel(Enum):
    # OFF: No intermediate tensor value debug info will be printed or saved.
    OFF = "0"
    # LEVEL 1: Save all intermediate tensor values to individual `.pt` files. No debug printing will be displayed.
    SAVE_ONLY = "1"
    # LEVEL 2: Print all intermediate tensor values to the console by default (`save` functionality is also available).
    DEFAULT_PRINT = "2"
```

Test Plan: CI

Usage commands:

```
AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT="aoti_torch_cuda_addmm_out" AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+graph, inductor, +schedule, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_addmm_abi_compatible_cuda
```

```
AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+graph, inductor, +schedule, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_aoti_debug_printer_codegen_abi_compatible_cuda
```

Differential Revision: D61415304
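As a side note (not part of this PR), the naming scheme above makes the dump files easy to group programmatically. A minimal sketch of a filename parser, assuming the phase prefix is `before_launch`/`after_launch` and that the tensor arg name and device token (e.g. `buf0`, `cuda:0`) contain no underscores; `parse_dump_filename` is a hypothetical helper, not part of the PR's API:

```python
import re

def parse_dump_filename(name):
    # Pattern: <phase>_<kernel_name>_<tensor_arg_name>_<device_type>.pt
    # The kernel name itself may contain underscores, so we match it
    # greedily and peel the last two underscore-free tokens off the end.
    m = re.match(
        r"(before_launch|after_launch)_(.+)_([^_]+)_([^_]+)\.pt$", name
    )
    if not m:
        return None
    phase, kernel, arg, device = m.groups()
    return {"phase": phase, "kernel": kernel, "arg": arg, "device": device}

parsed = parse_dump_filename(
    "after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt"
)
print(parsed["kernel"])  # -> aoti_torch_cuda_addmm_out
```

This lets a post-processing script bucket the dumps by kernel and pair up each arg's before/after snapshots.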
```cpp
auto bytes = torch::jit::pickle_save(c10::IValue(*t));
std::ofstream fout(tensor_filepath_to_save, std::ios::out | std::ios::binary);
fout.write(bytes.data(), bytes.size());
fout.close();
```
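The C++ snippet above serializes the tensor to a byte buffer and writes the buffer out in binary mode. The same save-then-reload round trip can be sketched in plain Python, with the standard `pickle` module standing in for `torch::jit::pickle_save` (torch uses its own pickler; `save_value`/`load_value` are illustrative names, not the PR's API):

```python
import os
import pickle
import tempfile

def save_value(value, path):
    # Serialize to bytes, then write them in binary mode,
    # mirroring the ofstream write in the C++ snippet.
    data = pickle.dumps(value)
    with open(path, "wb") as fout:
        fout.write(data)

def load_value(path):
    # Inverse operation: read the raw bytes back and unpickle.
    with open(path, "rb") as fin:
        return pickle.loads(fin.read())

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "after_launch_example_kernel_buf0_cpu.pt")
    save_value([0.6331, 1.6358, -0.3459], path)
    print(load_value(path))  # -> [0.6331, 1.6358, -0.3459]
```

Writing the serialized bytes directly (rather than going through a higher-level archive) is what keeps the per-tensor dump cheap enough to run on every kernel launch.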
Nice! As a follow-up PR, let's print the location where the tensor is saved?
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since the Phabricator diff has merged; using force because this PR might not pass merge_rules.json but landed internally)

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor (#133871)

Differential Revision: D61415304
Pull Request resolved: #133871
Approved by: https://github.com/ColinPeppler
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang