Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the NVTX3_FUNC_RANGE_IF_IN and NVTX3_FUNC_RANGE_IF macros #28

Closed
AntoineFroger opened this issue Mar 18, 2021 · 8 comments
Closed

Comments

@AntoineFroger
Copy link

The NVTX++ interface currently have the NVTX3_FUNC_RANGE and NVTX3_FUNC_RANGE_IN macros which allow the generation of an NVTX range from the lifetime of the block.

There are scenarios where we only want to conditionally generate NVTX annotations. For example, some developers might want to annotate their libraries but have some kind of verbosity control. In this case, they might want to control whether an annotation is emitted or not dynamically.

One possible implementation would be to add an additional class similar to domain_thread_range which would enable the move constructor. The macro implementation would then be the following:

#define NVTX3_V1_FUNC_RANGE_IF_IN(C, D) \
  ::nvtx3::v1::movable_domain_thread_range<D> const nvtx3_range__{}; \
  if (C) { \
    static ::nvtx3::v1::registered_string<D> const nvtx3_func_name__{__func__}; \
    static ::nvtx3::v1::event_attributes const nvtx3_func_attr__{nvtx3_func_name__}; \
    nvtx3_range__ = ::nvtx3::v1::movable_domain_thread_range<D>{nvtx3_func_attr__}; \
  }

If the user wants the condition to only be evaluated once for the whole duration of the program execution, he can cache the result in a static variable.

The downside of making a class that allows a thread range to be movable is that it can allow misuse of the API. For e.g. a user might create such thread range an move it into a functor which is executed in another thread. If this is problematic, this class could be implemented into the detail namespace and documented to warn the users of those invalid cases.

@AntoineFroger
Copy link
Author

I have some code written to implement those macros. Just need to test it before I submit it for review.

@jrhemstad
Copy link
Contributor

The other thing this would require is for the default ctor of movable_domain_thread_range to emit no range. I think we avoid introducing a whole new range type and avoid making thread_range movable by using an IILE and introducing a tag type in domain_thread_range::no_range with a domain_thread_range::domain_thread_range(domain_thread_range::no_range) ctor that emits no range. Something like this:

#define NVTX3_V1_FUNC_RANGE_IF_IN(C, D) \
  ::nvtx3::v1::movable_domain_thread_range<D> const nvtx3_range__ = [&](){
  if (C) { \
    static ::nvtx3::v1::registered_string<D> const nvtx3_func_name__{__func__}; \
    static ::nvtx3::v1::event_attributes const nvtx3_func_attr__{nvtx3_func_name__}; \
    return ::nvtx3::v1::domain_thread_range<D>{nvtx3_func_attr__}; \
   }  else{
      return ::nvtx3::v1::domain_thread_range<D>{::nvtx3::v1::domain_thread_range::no_range};
   } 
};
}

@jrhemstad
Copy link
Contributor

jrhemstad commented Mar 18, 2021

Hm, on second thought, my IILE idea won't work because it still requires the object to be movable...

Can we just use start_range/end_range here instead?

@AntoineFroger
Copy link
Author

AntoineFroger commented Mar 18, 2021

Hm, on second thought, my IILE idea won't work because it still requires the object to be movable...

If we don't touch domain_thread_range and add the movable_domain_thread_range this might not be an issue. What do you think?

Can we just use start_range/end_range here instead?

I would rather not for three reasons:

  1. Tool might process and visualize push/pop and start/end ranges differently. This is the case for Nsight Systems
  2. Semantically, NVTX3_V1_FUNC_RANGE and NVTX3_V1_FUNC_RANGE_IN represent a thread range, not a process range. So it would be weird to rely on domain_process_range for the IF variant
  3. The overhead for start/end might be slightly higher than for push/pop. This is the case for Nsight Systems

@jrhemstad
Copy link
Contributor

If we don't touch domain_thread_range and add the movable_domain_thread_range this might not be an issue. What do you think?

Adding a whole additional type for this one use case is kind of a lot of code.

@AntoineFroger
Copy link
Author

AntoineFroger commented Mar 18, 2021

Adding a whole additional type for this one use case is kind of a lot of code.

We can create a base domain_thread_range_base class which contains most of the shared code (relevant constructors & deleted operators). The domain_thread_range and movable_domain_thread_range could inherit from it and only implemented the pieces specific to their class (move ctor/operator, destructor)

@jrhemstad
Copy link
Contributor

I still don't love adding a new range type if we can avoid it. Here's how we can do it just by adding a tag type to pass to domain_thread_range that indicates no range should be generated: https://godbolt.org/z/v9MYzc

@jrhemstad
Copy link
Contributor

jrhemstad commented Mar 19, 2021

Also, I'm wondering about if nvtx3.hpp really needs to provide this macro. I'm not doubting if it's useful, I'm just questioning if it really needs to be provided by the library.

There are no doubt numerous variants of this macro some people would find useful, e.g., variants that allow specifying parts of the event_attributes. The lack of overloading on macros makes this nasty as we then have to have different names for all of them. As a rule, I think nvtx3.hpp should provide the bare minimum number of convenience macros to avoid bloat and complexity. The question is, do we think NVTX3_FUNC_RANGE_IF and NVTX3_FUNC_RANGE_IF_IN are essential enough to be considered bare minimum.

Also the naming of NVTX3_FUNC_RANGE_IF_IN is potentially confusing. It sounds like the range would only be generated if it's in something.

AntoineFroger pushed a commit to AntoineFroger/NVTX that referenced this issue Mar 27, 2021
Add the `NVTX3_FUNC_RANGE_IF` and `NVTX3_FUNC_RANGE_IN_IF` macros which
are similar to the `NVTX3_FUNC_RANGE` and `NVTX3_FUNC_RANGE_IN` macros
except that they only generate a range if the boolean expression
passed as parameter evaluates to `true`.

Closes NVIDIA#28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants