New profiler API #48280
Conversation
Summary: Adding new API for the kineto profiler that supports enable predicate function Test Plan: unit test [ghstack-poisoned]
💊 CI failures summary and remediations. As of commit 4c5734f (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages: pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (1/1), Step: "Run tests"
Summary: Adding new API for the kineto profiler that supports enable predicate function Test Plan: unit test ghstack-source-id: 5ecf6a8ba4c154af3fb2d8be3259812317ee53de Pull Request resolved: #48280
cc @gdankel for review too
Summary: Adding new API for the kineto profiler that supports enable predicate function Test Plan: unit test ghstack-source-id: b122814f3cc69edf3eba33db751f11fda4f48674 Pull Request resolved: #48280
Summary: Adding new API for the kineto profiler that supports enable predicate function Test Plan: unit test ghstack-source-id: 64785f82e3437450f82c1a1dbd42f585c6181696 Pull Request resolved: #48280
Summary: Adding new API for the kineto profiler that supports enable predicate function Test Plan: unit test ghstack-source-id: d652f642ab26c20ad3d0a956a6aca69ba707538e Pull Request resolved: #48280
torch/profiler/profiler.py
Outdated
WARMUP = 1
ACTIVE = 2

def __init__(self, wait: int, warmup: int, active: int, output_fn: Optional[Callable[[prof.profile], None]]):
Suggest making the arguments to this function kwarg-only so it's more readable (it would also allow for default values)
And maybe "output_fn" -> "done_cb" or "on_done". Otherwise I'd expect that "output_fn" specifies where to put results
torch/profiler/profiler.py
Outdated
- profiler then starts actively tracing/collecting stats for the next 'active' steps
- after this, profiler returns to the inactive state and cycle repeats

In case output_fn is specified, it is called every time the trace is ready
This sounds like a callback instead of an "output_fn", is there a name involving "callback" that would fit?
Follow-up question, what is the signature of this function?
torch/profiler/profiler.py
Outdated
Arguments:
    activities - list of activity groups (CPU, CUDA)
    enable_pred (optional) - iteration predicate function, used together with `next_step` call
I'm not really sure what a reader is supposed to take away from this statement. What is an "iteration predicate function"? How is it used with next_step? Why is that relevant?
torch/profiler/profiler.py
Outdated
self.active = active
self.output_fn = output_fn

def active_active_fn(step):
The code here is very challenging to read and looks almost like a state machine. I guess I was expecting something simpler (like taking the modulus of the current iteration or epoch). Is this additional complexity really necessary?
it is indeed exactly a finite state machine, yes
and we do need to track the states and transitions between them, so a finite state machine seems reasonable
Can you elaborate on why that's necessary (vs taking a modulus)?
I can e.g. give an example showing that it's not so simple.
So you take the modulus and you know your current state, is it enough?
You still need to know what the previous state was and where in the sequence you are.
If the current state is active, and the previous is warmup - action is start_trace
If the current state is active, and the previous is inactive - action is start_warmup, followed by start_trace
If the current state is active, and the previous state is active, does it mean we don't need to do anything?
It depends, consider (wait=0, warmup=0, active=2) - sometimes we don't do anything, sometimes we do stop, save, warmup, start.
So you still have to consider a lot of possibilities and edge cases; in that case, why not just encode states and actions explicitly in the map? The original version of the code that didn't use it was way less readable imo.
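To make the argument concrete, the (previous state, current state) bookkeeping can be sketched as an explicit transition table. This is a hypothetical reconstruction of the idea under discussion, not the PR's actual code; the action names (start_warmup, start_trace, etc.) are illustrative.

```python
from enum import Enum

class State(Enum):
    INACTIVE = 0
    WARMUP = 1
    ACTIVE = 2

# Explicit (previous, current) -> actions table; encoding the transitions
# in one place avoids scattering edge-case checks through the code.
# (WARMUP -> INACTIVE is an invalid sequence in the PR and raises.)
TRANSITIONS = {
    (State.INACTIVE, State.INACTIVE): [],
    (State.INACTIVE, State.WARMUP): ["start_warmup"],
    (State.INACTIVE, State.ACTIVE): ["start_warmup", "start_trace"],
    (State.WARMUP, State.WARMUP): [],
    (State.WARMUP, State.ACTIVE): ["start_trace"],
    (State.ACTIVE, State.INACTIVE): ["stop_trace", "save"],
    (State.ACTIVE, State.WARMUP): ["stop_trace", "save", "start_warmup"],
}

def actions(prev, cur, new_cycle=False):
    # active -> active is the ambiguous case from the discussion: within one
    # cycle nothing happens, but when a new cycle begins (e.g. wait=0,
    # warmup=0, active=2) we must stop, save, warm up and start again.
    if prev is State.ACTIVE and cur is State.ACTIVE:
        if new_cycle:
            return ["stop_trace", "save", "start_warmup", "start_trace"]
        return []
    return TRANSITIONS[(prev, cur)]
```

A plain modulus gives the current state, but as the comment notes, the actions to run still depend on the pair of states, which is exactly what the table captures.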
What actually happens when wait=0, warmup=0 and active=2? I would expect the following pattern:
ACTIVE ACTIVE ACTIVE ACTIVE ACTIVE...
Is this not correct?
"wait=0, warmup=0 and active=2" means don't wait, don't do warm up, trace for two iterations
then the sequence repeats, and we provide access to the traces once they are available to print/save
Oh interesting. So it's more correctly:
ACTIVE ACTIVE REPORT ACTIVE ACTIVE REPORT
?
"REPORT" is implied at the very end of each enable_pred cycle; will make sure to make that clear in the docs
torch/profiler/profiler.py
Outdated
Arguments:
    activities - list of activity groups (CPU, CUDA)
    enable_pred (optional) - iteration predicate function, used together with `next_step` call
can you describe what enable_pred is, and also give an example below of using profiler with enable_pred?
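For reference, a toy end-to-end sketch of the kind of example being requested. The profiler internals are stubbed out, and the EnablePred/next_step names follow this PR draft rather than the final merged API; treat every name here as illustrative.

```python
class EnablePred:
    """Toy (wait, warmup, active) cycle: which state is step N in?"""
    def __init__(self, wait, warmup, active):
        self.wait, self.warmup, self.active = wait, warmup, active

    def state(self, step):
        pos = step % (self.wait + self.warmup + self.active)
        if pos < self.wait:
            return "inactive"
        if pos < self.wait + self.warmup:
            return "warmup"
        return "active"

class profile:
    """Toy stand-in for the proposed profiler context manager."""
    def __init__(self, enable_pred=None):
        self.enable_pred = enable_pred
        self.step_num = 0
        self.log = []  # what the profiler did on each step

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def next_step(self):  # users call this once per training-loop iteration
        state = (self.enable_pred.state(self.step_num)
                 if self.enable_pred else "active")
        self.log.append(state)
        self.step_num += 1

# Training loop: 1 wait step, 1 warmup step, then 2 traced steps, repeating.
with profile(enable_pred=EnablePred(wait=1, warmup=1, active=2)) as p:
    for batch in range(8):
        # train_step(batch) would go here
        p.next_step()
```

With these parameters the per-step states cycle through inactive, warmup, active, active, which is the pattern the docstring describes.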
torch/profiler/profiler.py
Outdated
- profiler then starts actively tracing/collecting stats for the next 'active' steps
- after this, profiler returns to the inactive state and cycle repeats

In case output_fn is specified, it is called every time the trace is ready
an example of initialization will be useful here
torch/profiler/profiler.py
Outdated
- enable_pred is used for training loop tracing, allowing users to enable profiler on certain iterations and account for the warmup
- when enable_pred is not set, profiler is always active
- next_step uses record_function api to add information about steps in the trace
- The documentation does not explain what next_step is, or what the user should do with it (the docs should say that users should call next_step once per training loop iteration, and that an event corresponding to the iteration will be recorded in this call)
- There should be a link to the record_function documentation here, otherwise the record_function api is not clear
torch/profiler/profiler.py
Outdated
if not self.enable_pred:
    print("Warning: using profiler without enable predicate may result in the skewed " +
          "results, use enable_pred to control the warmup time")
Give a more detailed error message, with an example of how to instantiate and pass enable_pred
torch/profiler/profiler.py
Outdated
          "results, use enable_pred to control the warmup time")

def __enter__(self):
    self.next_step()
Maybe this self.next_step() should be put before the return? Otherwise the first step's info will be lost when self.enable_pred == None.
Looks good, but I think it can be simplified a bit and other comments on docstrings need to be addressed
torch/profiler/profiler.py
Outdated
class EnablePred(object):
    """
    EnablePred describes on which steps profiler is active:
I think "enable predicate". It's probably not the best name, maybe "IterationSchedule" or something like that? (I'm bad at names too)
torch/profiler/profiler.py
Outdated
    In case output_fn is specified, it is called every time the trace is ready
    """
    class Action(object):
we're minimum at 3.6 now: https://github.com/pytorch/pytorch/blob/master/cmake/Dependencies.cmake#L963 so you're safe. Please update
torch/profiler/profiler.py
Outdated
def __enter__(self):
    self.next_step()
    if not self.enable_pred:
should you just have a default enable_pred which does this behavior?
in this case it's endless tracing (for all iterations); then we'd need to somehow represent that in EnablePred
torch/profiler/profiler.py
Outdated
self.step_rec_fn: Optional[prof.record_function] = None

if not self.enable_pred:
    print("Warning: using profiler without enable predicate may result in the skewed " +
prints are bad, use the warnings module
torch/profiler/profiler.py
Outdated
def inactive_warmup_fn(_):
    raise RuntimeError("Incorrect profiler state sequence")

self.actions_map = {
I still feel it can be simplified. Can we replace the entire EnablePred with a function that returns one of a few enums saying what to do on this iteration:
- NONE
- WARMUP (i.e. enabled but not recording)
- RECORD (i.e. enabled)
- RECORD_AND_FLUSH (i.e. enabled, but dump a new iteration)

Then the profiler doesn't even need to depend on the EnablePred class. It can be just an arbitrary callable that maps step_num to an action. Of course you'd provide the code you have here as the default implementation, but the user can write their own. E.g. a simple one would be enable_pred = lambda num: WARMUP if num < 100 else RECORD for a simple profile that doesn't dump intermediate results.

This logic about state machine transitions is best moved into Profiler directly.

As for output_fn - what benefit does it have over just putting if p.step % 100 == 0: <my code> directly in the training loop? You can keep it, but I'd add it as a separate arg in addition to enable_pred so that both can be easily-passable lambdas.
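The suggested design can be sketched as follows: the schedule is just a callable from step number to an action enum, with the (wait, warmup, active) logic supplied as a default implementation. Names follow the comment above; for context, this is roughly the shape the merged torch.profiler API ended up with (a schedule() helper and a ProfilerAction enum, with RECORD_AND_SAVE as the final name of the flush action).

```python
from enum import Enum

class ProfilerAction(Enum):
    NONE = 0              # profiler off
    WARMUP = 1            # enabled but not recording
    RECORD = 2            # recording
    RECORD_AND_FLUSH = 3  # recording, and dump the trace after this step

def schedule(wait, warmup, active):
    """Default implementation: repeat a (wait, warmup, active) cycle."""
    def fn(step_num):
        pos = step_num % (wait + warmup + active)
        if pos < wait:
            return ProfilerAction.NONE
        if pos < wait + warmup:
            return ProfilerAction.WARMUP
        if pos == wait + warmup + active - 1:
            return ProfilerAction.RECORD_AND_FLUSH
        return ProfilerAction.RECORD
    return fn

# The one-liner from the comment, for a profile without intermediate dumps:
enable_pred = lambda num: (ProfilerAction.WARMUP if num < 100
                           else ProfilerAction.RECORD)
```

Note that this also answers the earlier wait=0, warmup=0, active=2 question directly: the schedule yields RECORD, RECORD_AND_FLUSH, RECORD, RECORD_AND_FLUSH, and so on.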
Good discussion here and above - I agree we should strive to simplify this as much as possible, and then make it as easy to understand as possible via good names and examples.
Warning: I haven't read the entire code yet so I might be missing the point - my comments are about user experience, not so much implementation details.
In my experience, a default behavior is good for the majority of cases, so I'll use that as an example: one warmup iteration followed by three recorded iterations.
What frequently changes is when you want to trigger it (should allow simple modulo approach here I think) and what to do with the result. Less frequently, how many iterations to profile and which activities to include.
Given that, the standard use case ought to be something like:
trigger_when: p.step % 1000 == 100
handle_result: write_chrome_trace / print_summary / extract_metrics .... (list of predefined convenience functions)
i.e. trigger profiling at iteration 100, 1100 and so on. Important to note here is that profiling cannot be enabled while at the same time reporting. We need to warmup, record, then stop and process, in order for the overhead to be manageable during the record window.
To clarify the above, this will be the behavior:
1100: Warmup
1101-1102: Record
1103: Record, then stop and process, calling the handle_result function asynchronously when processing is complete. As much work as possible is deferred to the processing stage to minimize overhead during recording. Recording should not continue in parallel with processing for detailed profiling, but we also need to consider a light-weight continuous profiling use case, possibly streaming a trace or ongoing extraction of metrics.
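The trigger pattern just described (fire at p.step % 1000 == 100, one warmup step then three recorded steps) can be expressed as such a schedule callable. This is an illustrative sketch with hypothetical action names, not code from the PR.

```python
from enum import Enum

class Action(Enum):
    NONE = 0
    WARMUP = 1
    RECORD = 2
    RECORD_AND_PROCESS = 3  # last recorded step; kick off async processing

def trigger_schedule(step):
    # Fires once every 1000 steps, starting at step 100:
    # warmup at N*1000+100, record at 101-102, record+process at 103.
    pos = step % 1000
    if pos == 100:
        return Action.WARMUP
    if pos in (101, 102):
        return Action.RECORD
    if pos == 103:
        return Action.RECORD_AND_PROCESS
    return Action.NONE
```

This reproduces the 1100/1101-1102/1103 behavior from the comment while keeping the profiler itself ignorant of the trigger policy.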
@dzhulgakov's suggestion sounds good: basically, let's provide a way to define a sequence of states explicitly through a lambda, plus the suggested default impl
one caveat - we might then allow users to specify sequences like: warmup none warmup none, etc - so we'll need to check for that, but i guess that's fine - assuming most users use supplied default and the power users know what they are doing;
@gdankel I think both solutions above should basically allow these kind of use cases
Sure... As long as users don't have to deal with this level of detail in the majority of cases, it seems reasonable.
torch/profiler/profiler.py
Outdated
self.step_rec_fn.__exit__(None, None, None)
if self.profiler:
    if self.enable_pred:
        if self.enable_pred._num_state(self.step_num) == EnablePred.State.WARMUP:
calling state again here leaks details. Can't the profile object remember its current state instead?
torch/profiler/profiler.py
Outdated
if self.enable_pred:
    if self.enable_pred._num_state(self.step_num) == EnablePred.State.WARMUP:
        self._run_action(EnablePred.Action.START_TRACE)
    self._run_action(EnablePred.Action.STOP_TRACE, keep_profiler=True)
why is keep_profiler true in this case?
so that we can also use it after the context manager is finished
torch/profiler/profiler.py
Outdated
enable_pred: Optional[EnablePred] = None,
record_shapes=False,
profile_memory=False,
with_stack=False):
I wonder whether we should still support use_cuda for backward compatibility?
Summary: Adding new API for the kineto profiler that supports enable predicate function - better support for profiling of training loops (example in test_profiler_kineto_api) Test Plan: python test/test_profiler.py -k test_profiler_kineto_api Differential Revision: [D25142220](https://our.internmc.facebook.com/intern/diff/D25142220) [ghstack-poisoned]
Addressed comments about API and docs cc. @dzhulgakov @ngimel @gdankel
torch/profiler/profiler.py
Outdated
Arguments:
    activities - list of activity groups (CPU, CUDA);
    schedule - a callable takes step (int) as a single parameter and returns
Do we really want to allow people to pass a callable here, instead of wait, warmup, active ints? It seems like writing a valid schedule would be hard and no one would do it?
will point the user to use schedule()
torch/profiler/profiler.py
Outdated
schedule - a callable takes step (int) as a single parameter and returns
    ProfilerAction value - specifies the profiler action on each step;
on_trace_ready (optional) - callable, called each time the trace is ready
    during the profiling;
"trace is ready" is not defined. "when the trace has been collected for the specified number of steps"?
website doc:
Looks great, just the messages can be improved in one of the follow-up PRs
    warn("Incorrect schedule: RECORD followed by NONE")
    self._stop_trace()
else:
    assert prev_action == ProfilerAction.RECORD_AND_SAVE
you shouldn't use assert for user-induced errors. Change it to raise with a better error message.
elif prev_action == ProfilerAction.RECORD:
    pass
else:
    assert prev_action == ProfilerAction.RECORD_AND_SAVE
same here
@ilia-cher merged this pull request in daaf932.
Summary: Pull Request resolved: pytorch#48280 Adding new API for the kineto profiler that supports enable predicate function Test Plan: unit test Reviewed By: ngimel Differential Revision: D25142220 Pulled By: ilia-cher fbshipit-source-id: c57fa42855895075328733d7379eaf3dc1743d14
Stack from ghstack:
Summary:
Adding new API for the kineto profiler that supports enable predicate
function - better support for profiling of training loops (example in test_profiler_kineto_api)
Test Plan:
python test/test_profiler.py -k test_profiler_kineto_api
Differential Revision: D25142220