[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor
#133871
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133871
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 1c82cac with merge base 68b1a09.
BROKEN TRUNK: the following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D61415304
[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor (pytorch#133871)

Summary: Pull Request resolved: pytorch#133871

This diff adds functionality to the debug printer tool that saves the value of each tensor arg before/after a kernel launch into an individual `.pt` file ({F1783443253}), named in the format:

`<before/after_launch>_<kernel_name>_<tensor_arg_name>_<device_type>.pt`

The generated `.pt` file can then be loaded back in Python for debugging. Sample program (`debug_dump_pt_value.py`) to load an output `.pt`:

```
import torch

# Load the tensor from the .pt file
tensor = torch.jit.load('/data/users/guorachel/pytorch/test/inductor/tmp/aoti_torch/after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt')

# Print the information of the tensors inside the module
for param in tensor.parameters():
    print(param)

# output: tensor([ 0.6331,  1.6358, -0.3459,  1.0196, -0.4122,  1.4279], device='cuda:0')
```

Test Plan: CI

One concern: `torch/csrc/api/src/serialize/input-archive.cpp` and `torch/csrc/api/src/serialize/output-archive.cpp`, the source files newly added for Inductor, may introduce a duplicate-symbol issue when compiling the `liblibtorch` and `libtorch-cpu-cpp` static libs.

Differential Revision: D61415304
[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor (pytorch#133871)

Summary: Pull Request resolved: pytorch#133871

- This diff adds functionality to the debug printer tool that saves the value of each tensor arg before/after a kernel launch into an individual `.pt` file ({F1817615926}), named in the format:

  `<before/after_launch>_<kernel_name>_<tensor_arg_name>_<device_type>.pt`

  The generated `.pt` file can then be loaded back in Python for debugging. Sample program (`debug_dump_pt_value.py`) to load an output `.pt`:

```
import torch

# Load the tensor from the .pt file
tensor = torch.jit.load('/data/users/guorachel/pytorch/test/inductor/tmp/aoti_torch/after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt')

# Print the information of the tensors inside the module
for param in tensor.parameters():
    print(param)

# output: tensor([ 0.6331,  1.6358, -0.3459,  1.0196, -0.4122,  1.4279], device='cuda:0')
```

- Also consolidates the debug-printer-related configs:

```
class IntermediateValueDebuggingLevel(Enum):
    # OFF: No intermediate tensor value debug info will be printed or saved.
    OFF = "0"
    # LEVEL 1: Save all intermediate tensor values to individual `.pt` files. No debug printing will be displayed.
    SAVE_ONLY = "1"
    # LEVEL 2: Print all intermediate tensor values to the console by default (`save` functionality is also available).
    DEFAULT_PRINT = "2"
```

Test Plan: CI

Usage commands:

```
AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT="aoti_torch_cuda_addmm_out" AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+graph, inductor, +schedule, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_addmm_abi_compatible_cuda
```

```
AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+graph, inductor, +schedule, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_aoti_debug_printer_codegen_abi_compatible_cuda
```

Differential Revision: D61415304
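As a side note (not part of this PR), the naming scheme above makes the dump files easy to group programmatically. A minimal sketch of a filename parser, assuming the phase prefix is `before_launch`/`after_launch` and that the tensor arg name and device token (e.g. `buf0`, `cuda:0`) contain no underscores; `parse_dump_filename` is a hypothetical helper, not part of the PR's API:

```python
import re

def parse_dump_filename(name):
    # Pattern: <phase>_<kernel_name>_<tensor_arg_name>_<device_type>.pt
    # The kernel name itself may contain underscores, so we match it
    # greedily and peel the last two underscore-free tokens off the end.
    m = re.match(
        r"(before_launch|after_launch)_(.+)_([^_]+)_([^_]+)\.pt$", name
    )
    if not m:
        return None
    phase, kernel, arg, device = m.groups()
    return {"phase": phase, "kernel": kernel, "arg": arg, "device": device}

parsed = parse_dump_filename(
    "after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt"
)
print(parsed["kernel"])  # -> aoti_torch_cuda_addmm_out
```

This lets a post-processing script bucket the dumps by kernel and pair up each arg's before/after snapshots.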
```cpp
auto bytes = torch::jit::pickle_save(c10::IValue(*t));
std::ofstream fout(tensor_filepath_to_save, std::ios::out | std::ios::binary);
fout.write(bytes.data(), bytes.size());
fout.close();
```
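The C++ snippet above serializes the tensor to a byte buffer and writes the buffer out in binary mode. The same save-then-reload round trip can be sketched in plain Python, with the standard `pickle` module standing in for `torch::jit::pickle_save` (torch uses its own pickler; `save_value`/`load_value` are illustrative names, not the PR's API):

```python
import os
import pickle
import tempfile

def save_value(value, path):
    # Serialize to bytes, then write them in binary mode,
    # mirroring the ofstream write in the C++ snippet.
    data = pickle.dumps(value)
    with open(path, "wb") as fout:
        fout.write(data)

def load_value(path):
    # Inverse operation: read the raw bytes back and unpickle.
    with open(path, "rb") as fin:
        return pickle.loads(fin.read())

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "after_launch_example_kernel_buf0_cpu.pt")
    save_value([0.6331, 1.6358, -0.3459], path)
    print(load_value(path))  # -> [0.6331, 1.6358, -0.3459]
```

Writing the serialized bytes directly (rather than going through a higher-level archive) is what keeps the per-tensor dump cheap enough to run on every kernel launch.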
Nice! As a follow-up PR, let's print the location where the tensor is saved?
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since the Phabricator diff has merged; using force because this PR might not pass merge_rules.json but landed internally)

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[AOTI][Tooling][4/n] Add torch.save() for individual intermediate tensor (#133871)

Differential Revision: D61415304
Pull Request resolved: #133871
Approved by: https://github.com/ColinPeppler
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang