Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New profiler API #48280
New profiler API #48280
Changes from 27 commits
7896a1c
3697dd0
c8c1e71
3a94803
6f604e4
0700b58
bcf4edf
50fb369
9081309
a4405f8
adae39e
8be3509
07d46bb
0fc16e9
d0ae1f2
48915a5
9b2328c
c249cab
d40387e
5b1b598
5c6a45d
d8db2bf
9fdc30e
a46e065
3af4699
8eca63e
4cc04a7
5d3e080
244b6c9
890562e
cc3b96c
1c6a8ba
4c5734f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "EnablePred" stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "enable predicate". It's probably not the best name, maybe "IterationSchedule" or something like that? (I'm bad at names too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or ProfilerSchedule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a callback instead of an "output_fn", is there a name involving "callback" that would fit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up question, what is the signature of this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an example of initialization will be useful here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both Action and State can inherit from enum. See https://docs.python.org/3.5/library/enum.html.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i was curious if our minimum supported python version supports enums, also the same for dataclasses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're minimum at 3.6 now: https://github.com/pytorch/pytorch/blob/master/cmake/Dependencies.cmake#L963 so you're safe. Please update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest making the arguments to this function kwarg-only so it's more readable (it would also allow for default values)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe "output_fn" -> "done_cb" or "on_done". Otherwise I'd expect that "output_fn" specifies where to put results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here is very challenging to read and looks almost like a state machine. I guess I was expecting something simpler (like taking the modulus of the current iteration or epoch). Is this additional complexity really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is indeed exactly a finite state machine, yes
and we do need to track the states and transitions between them, so a finite state machine seems reasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on why that's necessary (vs taking a modulus)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can e.g. give an example showing that it's not so simple.
So you take the modulus and you know your current state, is it enough?
You still need to know what was the previous state and what where in the sequence you are.
If the current state is active, and the previous is warmup - action is start_trace
If the current state is active, and the previous is inactive - action is start_warmup, followed by start_trace
If the current state is active, and the previous state is active, does it mean we don't need to do anything?
It depends, consider (wait=0, warmup=0, active=2) - sometimes we don't do anything, sometimes we do stop, save, warmup, start.
So you still have to consider a lot of possibilities and edge cases, then in that case, why not just encode states and actions explicitly in the map? The original version of the code that didn't use it was way less readable imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What actually happens when wait=0, warmup=0 and active=2? I would expect the following pattern:
ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE...
Is this not correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"wait=0, warmup=0 and active=2" means don't wait, don't do warm up, trace for two iterations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then the sequence repeats, and we provide access to the traces once they are available to print/save
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. So it's more correctly:
ACTIVE ACTIVE REPORT ACTIVE ACTIVE REPORT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"REPORT" is implied at the every end of enable_pred cycle, will make sure to make it clear from the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still feel it can be simplified. Can we replace the entire EnablePred with a function that returns one of the few enums of what to do on this iteration:
Then the profiler doesn't even need to depend on EnablePred class. It can be just an arbitrary callable with that maps
step_num
intoaction
. Of course you'd provide the code you have here as the default implementation but the user can do their own. E.g. the simple one would be:enable_pred = lambda num: WARMUP if num < 100 else RECORD
for simple profile that doesn't dump intermediate results.This logic about state machine transition is best moved to Profiler directly
As for
output_fn
- what benefit does it have over just puttingif p.step % 100 == 0: <my code>
directly in the training loop? You can keep it, but I'd add it as a separate arg in addition toenable_pred
so that both can be easily-passable lambdasThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good discussion here and above - I agree we should strive to simplify this as much as possible, and then make it as easy to understand as possible via good names and examples.
Warning: I haven't read the entire code yet so I might be missing the point - my comments are about user experience, not so much implementation details.
In my experience, a default behavior is be good for the majority of cases so I'll use that as example: one warmup iteration followed by three recorded iterations.
What frequently changes is when you want to trigger it (should allow simple modulo approach here I think) and what to do with the result. Less frequently, how many iterations to profile and which activities to include.
Given that, the standard use case ought to be something like:
trigger_when: p.step % 1000 == 100
handle_result: write_chrome_trace / print_summary / extract_metrics .... (list of predefined convenience functions)
i.e. trigger profiling at iteration 100, 1100 and so on. Important to note here is that profiling cannot be enabled while at the same time reporting. We need to warmup, record, then stop and process, in order for the overhead to be manageable during the record window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify the above, this will be the behavior:
1100: Warmup
1101-1102: Record
1103: Record, then stop and process, calling the handle_result function asynchronously when processing is complete. As much work as possible is deferred to the processing stage to minimize overhead during recording. Recording should not continue in parallel with processing for detailed profiling, but we also need to consider a light-weight continuous profiling use case, possibly streaming a trace or ongoing extraction of metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dzhulgakov suggestion sounds good, basically let's provide a way to define a sequence of states explicitly through lambda + suggested default impl
one caveat - we might then allow users to specify sequences like: warmup none warmup none, etc - so we'll need to check for that, but i guess that's fine - assuming most users use supplied default and the power users know what they are doing;
@gdankel I think both solutions above should basically allow these kind of use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure... As long as users don't have to deal with this level of detail in the majority of cases, it seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you describe what enable_pred is, and also give an example below of using profiler with
enable_pred
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure what a reader is supposed to take away from this statement. What is an "iteration prediction function"? How is it used with
next_step
? Why is that relevant?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_step
and does not explain what the user should do with it (the docs should say that users should callnext_step
once per training loop iteration, and that an event corresponding to the iteration will be recorded in this call)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we should still support
use_cuda
for backward compatibility?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prints are bad, use
warnings
moduleThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give a more detailed error message, with an example of how to instantiate and pass
enable_pred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this 'self.next_step()' should put before return? Or else the first step info will loss when self.enable_pre==None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should you just have a default
enable_pred
which does this behavior?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case it's an endless tracing (for all iterations), then we should somehow represent it in EnablePred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling state again here leaks details. Can't profile object remember its current state instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is
keep_profiler
true in this case?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that we can also use it after the context manager is finished