
Conversation

@YUNQIUGUO (Contributor) commented Aug 19, 2024

@pytorch-bot bot commented Aug 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133871

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1c82cac with merge base 68b1a09 (image):

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D61415304


YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request Aug 19, 2024
…ensor (pytorch#133871)

Summary: Pull Request resolved: pytorch#133871

Differential Revision: D61415304
YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request Aug 20, 2024
…ensor (pytorch#133871)

Summary: Pull Request resolved: pytorch#133871

Differential Revision: D61415304

YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request Aug 20, 2024
…ensor (pytorch#133871)

Summary:
Pull Request resolved: pytorch#133871

This diff adds functionality to the debug printer tool that saves the value of each kernel arg, before and after the kernel launch, into an individual `.pt` file:
 {F1783443253}
named in the format:
`<before/after_launch>_<kernel_name>_<tensor_arg_name>_<device_type>.pt`

The generated `.pt` file can then be loaded back with `torch.jit.load()` for debugging.

Sample program to load a saved `.pt` file:

```
# debug_dump_pt_value.py
import torch

# Load the saved tensor (stored as a TorchScript module) from the .pt file
module = torch.jit.load('/data/users/guorachel/pytorch/test/inductor/tmp/aoti_torch/after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt')

# Print the tensors stored inside the module
for param in module.parameters():
    print(param)

# Output:
# tensor([ 0.6331,  1.6358, -0.3459,  1.0196, -0.4122,  1.4279], device='cuda:0')
```

Test Plan:
CI

One concern: the newly added source files for inductor,
    "torch/csrc/api/src/serialize/input-archive.cpp",
    "torch/csrc/api/src/serialize/output-archive.cpp",
may introduce a duplicate-symbol issue when compiling, between the `liblibtorch` and `libtorch-cpu-cpp` static libs.

Differential Revision: D61415304

YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request Aug 21, 2024
…ensor (pytorch#133871)

Summary: Pull Request resolved: pytorch#133871

Differential Revision: D61415304

YUNQIUGUO added a commit to YUNQIUGUO/pytorch that referenced this pull request Aug 27, 2024
…diate tensor (pytorch#133871)

Summary:
Pull Request resolved: pytorch#133871

- This diff adds functionality to the debug printer tool that saves the value of each kernel arg, before and after the kernel launch, into an individual `.pt` file:
 {F1817615926}
named in the format:
`<before/after_launch>_<kernel_name>_<tensor_arg_name>_<device_type>.pt`

The generated `.pt` file can then be loaded back with `torch.jit.load()` for debugging.

Sample program to load a saved `.pt` file:

```
# debug_dump_pt_value.py
import torch

# Load the saved tensor (stored as a TorchScript module) from the .pt file
module = torch.jit.load('/data/users/guorachel/pytorch/test/inductor/tmp/aoti_torch/after_launch_aoti_torch_cuda_addmm_out_buf0_cuda:0.pt')

# Print the tensors stored inside the module
for param in module.parameters():
    print(param)

# Output:
# tensor([ 0.6331,  1.6358, -0.3459,  1.0196, -0.4122,  1.4279], device='cuda:0')
```
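When scripting over many dumped files, the naming scheme above can be split back into its components. This is a hypothetical helper, not part of the PR; the function name is an assumption, and it relies on the tensor-arg and device fields containing no underscores of their own (as in the example filename above):

```python
import os

def parse_debug_dump_name(path):
    """Split '<before/after_launch>_<kernel_name>_<tensor_arg_name>_<device_type>.pt'
    into its parts. Kernel names may contain underscores, so the trailing
    tensor-arg and device fields are split off from the right."""
    stem, _ = os.path.splitext(os.path.basename(path))  # drop '.pt'
    for phase in ("before_launch", "after_launch"):
        if stem.startswith(phase + "_"):
            rest = stem[len(phase) + 1:]
            kernel_name, tensor_arg, device = rest.rsplit("_", 2)
            return phase, kernel_name, tensor_arg, device
    raise ValueError(f"unrecognized dump file name: {path}")
```

For the sample file above, this yields `("after_launch", "aoti_torch_cuda_addmm_out", "buf0", "cuda:0")`.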

- Also consolidates the debug-printer-related configs; see below:

```
class IntermediateValueDebuggingLevel(Enum):
    # OFF: No intermediate tensor value debug info will be printed or saved.
    OFF = "0"
    # LEVEL 1: Save all intermediate tensor values to individual `.pt` files. No debug printing will be displayed.
    SAVE_ONLY = "1"
    # LEVEL 2: Print all intermediate tensor values by default to the console. (`save` functionality is available)
    DEFAULT_PRINT = "2"
```
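The level is driven by the `AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER` environment variable that appears in the usage commands. A rough sketch of the selection logic, assuming the enum above (the `level_from_env` helper name and the fall-back-to-OFF behavior are assumptions, not the PR's actual implementation):

```python
import os
from enum import Enum

class IntermediateValueDebuggingLevel(Enum):
    OFF = "0"            # no intermediate tensor debug info printed or saved
    SAVE_ONLY = "1"      # save tensors to individual .pt files, no printing
    DEFAULT_PRINT = "2"  # print intermediate tensor values to the console

def level_from_env():
    # Hypothetical helper: map the env var used in the usage commands to a
    # debugging level, defaulting to OFF when unset or unrecognized.
    raw = os.environ.get("AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER", "0")
    try:
        return IntermediateValueDebuggingLevel(raw)
    except ValueError:
        return IntermediateValueDebuggingLevel.OFF
```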

Test Plan:
CI

Usage commands:

```
AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT="aoti_torch_cuda_addmm_out" AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1  TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+graph, inductor, +schedule, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_addmm_abi_compatible_cuda
```

```
AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2 TORCHINDUCTOR_FORCE_DISABLE_CACHES=1  TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+graph, inductor, +schedule, output_code" buck2 run -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=h100 @//mode/opt fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_aoti_debug_printer_codegen_abi_compatible_cuda
```

Differential Revision: D61415304

Comment on lines +1021 to +1024
auto bytes = torch::jit::pickle_save(c10::IValue(*t));
std::ofstream fout(tensor_filepath_to_save, std::ios::out | std::ios::binary);
fout.write(bytes.data(), bytes.size());
fout.close();
Contributor

Nice! As a follow-up PR, let's print the location where the tensor is saved?

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 27, 2024
@facebook-github-bot
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorch-bot bot pushed a commit that referenced this pull request Sep 13, 2024
…ensor (#133871)

Differential Revision: D61415304

Pull Request resolved: #133871
Approved by: https://github.com/ColinPeppler
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024