Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[runtime][hip] Add tracing in graph execution mode #16894

Merged
merged 2 commits into from Apr 15, 2024

Conversation

sogartar
Copy link
Contributor

There are some outstanding concurrency issues where some events don't get recorded during collection.
If we collect only when the device gets destroyed with a 1s delay, the concurrency issues seem to not manifest.

@sogartar
Copy link
Contributor Author

Here is the change that collects events only on device destruction https://github.com/sogartar/SHARK-Runtime/tree/hip-graph-tracing-collect-only-on-device-destruction
There I don't see concurrency issues with respect to tracing.
I think we need to resolve these concurrency issues in separate PR.
Although, I am fairly positive they don't come form the tracing logic introduced here, I urge the reviewers to be alert since synchronization logic is easy to get wrong and hard to debug.

@nithinsubbiah
Copy link
Collaborator

I saw internally that you're testing the latest branch to make sure the concurrency problem isn't with the HIP API. Please update your findings and we can go forward with this patch.

@sogartar
Copy link
Contributor Author

@nithinsubbiah here is an issue #16903.

@nithinsubbiah
Copy link
Collaborator

Thanks Boian, I have seen some failures on the cts earlier. I'm looking into them.

@@ -54,6 +54,9 @@ IREE_HAL_HIP_REQUIRED_PFN_DECL(hipGraphExecDestroy, hipGraphExec_t)
IREE_HAL_HIP_REQUIRED_PFN_DECL(hipGraphInstantiate, hipGraphExec_t *,
hipGraph_t, hipGraphNode_t *, char *, size_t)
IREE_HAL_HIP_REQUIRED_PFN_DECL(hipGraphLaunch, hipGraphExec_t, hipStream_t)
IREE_HAL_HIP_REQUIRED_PFN_DECL(hipGraphAddEventRecordNode, hipGraphNode_t *,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort these APIs alphabetically

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -77,12 +76,22 @@ void iree_hal_hip_tracing_zone_begin_external_impl(
const char* file_name, size_t file_name_length, uint32_t line,
const char* function_name, size_t function_name_length, const char* name,
size_t name_length);
void iree_hal_hip_graph_tracing_zone_begin_external_impl(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can up update the existing tracing APIs to include "stream" inside to better differentiate?

@@ -249,7 +275,10 @@ static uint16_t iree_hal_hip_tracing_context_insert_query(
void iree_hal_hip_tracing_zone_begin_impl(
iree_hal_hip_tracing_context_t* context, hipStream_t stream,
const iree_tracing_location_t* src_loc) {
if (!context) return;
if (!context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IREE_ASSERT_ARGUMENT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -260,14 +289,31 @@ void iree_hal_hip_tracing_zone_begin_external_impl(
const char* file_name, size_t file_name_length, uint32_t line,
const char* function_name, size_t function_name_length, const char* name,
size_t name_length) {
if (!context) return;
if (!context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


void iree_hal_hip_tracing_zone_end_impl(iree_hal_hip_tracing_context_t* context,
hipStream_t stream);
void iree_hal_hip_graph_tracing_zone_end_impl(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -11,11 +11,13 @@
#include <stdint.h>

#include "iree/base/api.h"
#include "iree/base/attributes.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is included as part of base/api.h so no need to pull in specifically

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -593,6 +593,10 @@ static iree_status_t iree_hal_hip_pending_queue_actions_issue_execution(
}

// Then launch all command buffers to the dispatch stream.
IREE_TRACE_ZONE_BEGIN(zone_launch_command_buffers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to prefix the zone name with zone_. Maybe just call it "dispatch_command_buffers".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

iree_hal_hip_graph_command_buffer_cast(base_command_buffer);
IREE_TRACE_ZONE_BEGIN(z0);

IREE_RETURN_AND_END_ZONE_IF_ERROR(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do a normal iree_status_t status = and return status later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

&command_buffer->hip_graph_nodes[command_buffer->graph_node_count++];
size_t dependency_count = command_buffer->hip_barrier_node ? 1 : 0;
IREE_ASSERT(dependency_count > 0 &&
"Ending a zone should at least depend on the beginning.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IREE_ASSERT_GT and for the error message avoid captialize first char and avoid the end .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@antiagainst antiagainst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one final nit. Feel free to land after fixing it.

IREE_ASSERT(dependency_count > 0 &&
"Ending a zone should at least depend on the beginning.");
IREE_ASSERT_GT(dependency_count, 0,
"Ending a zone should at least depend on the beginning.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no captialization and no ending period for assert messages: "end .. beginning".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sogartar
Copy link
Contributor Author

@antiagainst would you like to also mirror this in the CUDA driver in this PR?

There are some outstanding concurrency issues where some events don't get
recorded during collection.
If we collect only when the device gets destroyed with a 1s delay,
the concurrency issues seem to not manifest.
@sogartar
Copy link
Contributor Author

I rebased and mirrored the change into the CUDA driver.

@stellaraccident
Copy link
Collaborator

This is great work, Boian. Thank you.

@antiagainst
Copy link
Contributor

I rebased and mirrored the change into the CUDA driver.

Nice, thanks! LGTM. Please feel free to land. Make sure you update the commit message title and body accordingly given now this also covers cuda.

@sogartar sogartar merged commit 1c49d6a into iree-org:main Apr 15, 2024
53 checks passed
sogartar added a commit to sogartar/SHARK-Runtime that referenced this pull request Apr 15, 2024
)

In the HIP and CUDA driver adds tracing with Tracy in graph mode.

There are some outstanding concurrency issues where some events don't
get recorded during collection.
If we collect only when the device gets destroyed with a 1s delay, the
concurrency issues seem to not manifest.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants