Merged
8 changes: 8 additions & 0 deletions torch_xla/csrc/debug_util.cpp
@@ -16,6 +16,7 @@
#include "torch_xla/csrc/ir.h"
#include "torch_xla/csrc/ir_dump_util.h"
#include "torch_xla/csrc/ir_util.h"
#include "torch_xla/csrc/xla_graph_executor.h"

namespace torch_xla {
namespace {
@@ -120,6 +121,13 @@ void DebugUtil::SaveTensorsGraphInfo(const char* name,
"XLA_SAVE_TENSORS_FILE", "", GetCurrentDevice().ordinal());
if (!save_file.empty()) {
static std::mutex lock;
if (format == DebugUtil::GraphFormat::kHlo && indices->size() > 0) {
// Dumping the HLO might access the placeholder data created during
// the previous execution. We need to wait for the last execution to finish
// before proceeding.
Collaborator

I wonder what happens if we don't wait. For example, what error would you get when you run the Python script?

Collaborator Author

PJRT will throw a hasValue error when it accesses the placeholder.

torch::lazy::BackendDevice device = tensors[(*indices)[0]]->GetDevice();
XLAGraphExecutor::Get()->WaitDeviceOps({device.toString()});
}
std::string info = GetTensorsGraphInfo(tensors, indices, format);
std::lock_guard<std::mutex> guard(lock);
std::ofstream graph_file(save_file, std::ios_base::app);
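
The failure mode discussed in the review thread (reading a placeholder before the execution that fills it has finished) can be reproduced in isolation. The sketch below is not torch_xla or PJRT code: `Placeholder`, `ExecuteAsync`, and `DumpPlaceholder` are hypothetical names, and the thrown `std::runtime_error` only stands in for the hasValue error mentioned above. It assumes the placeholder behaves like an optional value that an asynchronous task fills in later.

```cpp
#include <future>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for device data that starts out empty (a
// "placeholder") and is filled in once an asynchronous execution completes.
class Placeholder {
 public:
  void Set(std::string value) {
    std::lock_guard<std::mutex> guard(mu_);
    data_ = std::move(value);
  }

  // Throws if the placeholder has not been filled yet. This models the
  // error described in the thread; it is not the actual PJRT check.
  std::string Get() const {
    std::lock_guard<std::mutex> guard(mu_);
    if (!data_.has_value()) {
      throw std::runtime_error("placeholder has no value");
    }
    return *data_;
  }

 private:
  mutable std::mutex mu_;
  std::optional<std::string> data_;
};

// Hypothetical asynchronous execution. The task blocks on `gate` so that a
// caller can control exactly when the placeholder gets filled.
std::future<void> ExecuteAsync(Placeholder& placeholder,
                               std::shared_future<void> gate) {
  return std::async(std::launch::async, [&placeholder, gate] {
    gate.wait();
    placeholder.Set("result");
  });
}

// Reads the placeholder. With `wait` set, it first blocks until the pending
// execution has finished, which mirrors the intent of the wait in the diff.
std::string DumpPlaceholder(const Placeholder& placeholder,
                            std::future<void>& pending, bool wait) {
  if (wait) {
    pending.wait();
  }
  return placeholder.Get();
}
```

Calling `DumpPlaceholder(p, pending, false)` while the gate is still closed throws, whereas releasing the gate and calling it with `wait` set to `true` returns the filled value. The diff takes the second route by waiting on the device before calling `GetTensorsGraphInfo`.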