
Allow running with sampling if Intel PT is unavailable #224

Merged (3 commits into janestreet:master) on Jun 24, 2022

Conversation

@Lamoreauxaj Lamoreauxaj (Contributor) commented Jun 15, 2022

The goal of this PR is to allow magic-trace to run in environments that are not on a physical Intel machine. It adds the ability, if Intel PT is unavailable, to still run magic-trace and have it use perf with sampling instead.

There is now a -sampling flag which enables this, and if Intel PT is not present, sampling is used automatically. The interface is mostly the same, except that -timer-resolution can now be specified as Sample { freq } and -snapshot-size is ignored (with a warning) if passed. Traces appear to default to 512K (approximately ~30ms at the maximum sampling frequency).
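The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not magic-trace's actual code: the type and function names (`collection_mode`, `choose_mode`) are assumptions.

```ocaml
(* Illustrative sketch of the fallback decision described above; names are
   hypothetical, not taken from magic-trace's sources. *)
type collection_mode =
  | Intel_pt
  | Sampling of { freq : int }

let choose_mode ~have_intel_pt ~sampling_requested ~default_freq =
  match have_intel_pt, sampling_requested with
  | _, true -> Sampling { freq = default_freq }
  | true, false -> Intel_pt
  | false, false ->
    (* Intel PT unavailable: warn, then fall back to sampling automatically. *)
    prerr_endline "Warning: Intel PT unavailable; falling back to sampling.";
    Sampling { freq = default_freq }
```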

Some things to note:

  • I added another Event.t variant, Sample, and refactored the type to share information with the other existing event types. These are processed in trace_writer.ml in a similar way to before, but emit ret / call events based on the current state of the callstacks.
  • Perf gives multiple options for reconstructing callgraphs: fp, dwarf, and lbr. LBR is only supported on Intel, so we check for support and use it if possible. Otherwise it defaults to fp, which works if the binary and its associated libraries were compiled with -fno-omit-frame-pointer. Dwarf tends to lead to massive perf.data files and slow decoding, so I chose not to use it as a default.
  • I pulled Perf_line out of perf_tool_backend.ml into a new perf_decode.ml to refactor things a little. It also adds decoding support for the sampling output from perf.
  • In order to take a snapshot, I use --switch-output=signal; however, this has perf outputting multiple timestamped perf.data files. Decoding can now handle multiple files and return multiple pipes. This also makes it easier to reset trace-writing state between snapshots, which fixes #213 (-multi-snapshot leads to staircase traces) as a result.
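The callstack-based ret/call reconstruction mentioned in the first point can be sketched like this. It is an illustrative sketch, not the actual trace_writer.ml: given two consecutive sampled callstacks (outermost frame first), drop the shared prefix, then emit rets for frames that disappeared (innermost first) and calls for frames that appeared (outermost first).

```ocaml
(* Illustrative sketch of turning consecutive sampled callstacks into
   synthetic call/ret events; not magic-trace's actual implementation. *)
type event = Call of string | Ret of string

let diff_callstacks ~prev ~cur =
  (* Strip the common prefix of the two stacks. *)
  let rec strip_common prev cur =
    match prev, cur with
    | p :: ps, c :: cs when String.equal p c -> strip_common ps cs
    | _ -> prev, cur
  in
  let gone, added = strip_common prev cur in
  (* Rets for vanished frames, deepest first; then calls for new frames. *)
  List.rev_map (fun f -> Ret f) gone @ List.map (fun f -> Call f) added
```

For example, `diff_callstacks ~prev:["main"; "f"; "g"] ~cur:["main"; "h"]` yields `[Ret "g"; Ret "f"; Call "h"]`.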

@lpw25 lpw25 (Member) commented Jun 16, 2022

I'm not sure it should fall back automatically. I expect people will often fail to notice they are using the wrong machine if it does that.

@Lamoreauxaj Lamoreauxaj (Contributor, Author) replied:

Yeah, that's fair. It does technically warn the user, at least. But if you think it would be better to exit instead, I can change this behavior.

@lpw25 lpw25 (Member) commented Jun 16, 2022

Maybe a warning is enough, I'm not sure. I'll defer to @Xyene and @cgaebel's judgement.

```ocaml
let process_event index ev =
  (* When processing a new snapshot, clear all [Trace_writer] data in order to
     avoid sharing callstacks, start times, etc. *)
  if index > 0 && not (index = !last_index)
```
Lamoreauxaj (Contributor, Author) commented:

@pbaumbacher, if you do take a look, here is the logic where I reset the Trace_writer state after the end of a single snapshot (given one snapshot per file).
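The per-snapshot reset described above can be sketched as follows. This is a minimal stand-in, not the PR's actual code: with --switch-output=signal, perf emits one timestamped perf.data file per snapshot, so writer state is cleared whenever the file index changes. The `resets` counter here is a hypothetical placeholder for clearing callstacks, start times, etc.

```ocaml
(* Minimal sketch of resetting per-snapshot writer state when the
   perf.data file index changes; [resets] stands in for the real reset. *)
let last_index = ref 0
let resets = ref 0

let process_event index _ev =
  if index > 0 && not (index = !last_index) then incr resets;
  last_index := index
```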

@Xyene Xyene (Member) left a review:

Looking good.

I think it'd also be worth it to add a larger test under tests/. Maybe a C hello-world, or similar.

core/backend_intf.ml (outdated, resolved)
core/event.mli (outdated):

```ocaml
  } (** Represents an event collected from Intel PT. *)
| Sample of { callstack : Location.t list }
    (** Represents an event collected through sampling. *)
| Power of { freq : int } (** Power event collected by Intel PT. *)
```
(Member) commented:
Nit: sort these grouped by sample type, i.e. have Power next to Trace.

(Contributor) commented:

Could also nest by perf mode, along the lines of Pt (Trace _) | Pt (Power _) | Sample _, since any given pipe will consist only of (Trace and Power) or Sample events.

Lamoreauxaj (Contributor, Author) commented:

That does seem like a reasonable option. One thought, though: what if we supported power events which aren't part of Intel PT as well and used the same Power event? (It seems you can get the same information through other perf events.) Also, although we could group them under Pt, they aren't really processed or handled together, which makes me lean towards leaving them as is; it doesn't seem useful to pattern match only on Pt events.

Resolved review threads on: src/perf_capabilities.ml, src/perf_decode.ml, src/perf_tool_backend.ml, src/trace.ml, src/trace_writer.ml, test/perf_script.ml, core/backend_intf.ml
@Lamoreauxaj Lamoreauxaj force-pushed the sampling_backend branch 5 times, most recently from 7b6c9ea to af6b39a on June 22, 2022 14:36
Resolved review threads on: core/collection_mode.ml, src/perf_tool_backend.ml, src/callgraph_mode.ml
@Lamoreauxaj Lamoreauxaj force-pushed the sampling_backend branch 2 times, most recently from 2af1d16 to 2515dcb on June 22, 2022 17:16
Resolved review threads on: src/callgraph_mode.ml, src/trace_writer.ml
…ble.

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
@Xyene Xyene (Member) left a review:

LGTM, thanks!

@Xyene Xyene merged commit e614913 into janestreet:master Jun 24, 2022
Lamoreauxaj added a commit to Lamoreauxaj/magic-trace that referenced this pull request Jun 29, 2022
This feature has two changes related to `callgraph_mode`.

* Based on user feedback from janestreet#224, I added warnings when `-callgraph-mode`
  defaults to either `Dwarf` or `Last_branch_record`, to notify the user of
  potential drawbacks of that selection. These warnings only show if the user
  does not explicitly set the flag.

* Added a fourth option to run with `Last_branch_record { stitched = true }`,
  which uses `perf record --call-graph lbr` and `perf script --stitch-lbr`.
  There are a few design decisions here. First, I decided to add it as a
  variant of `Callgraph_mode` to keep the user-facing interface simple. However,
  this has the awkward effect that it is now a parameter needed for both
  decoding and recording (which `Backend_intf` doesn't support). I think this
  could be useful in other scenarios, so instead of pulling it out to
  `trace.ml` I added a way for `decode_events` to have access to a
  `Recording.Data.t` record. Open to alternative ideas, though.

I also added a wiki page documenting the options for `-callgraph-mode`. It is
not in the PR, but here is the link to the wiki repo:
https://github.com/Lamoreauxaj/magic-trace.wiki.git. See [this
page](https://github.com/Lamoreauxaj/magic-trace/wiki/Sampling-callgraph-modes).
If this PR is merged, the wiki page would need to be merged too (I referenced
it in the CLI help documentation), and since I referenced it, the domain would
need to redirect to the wiki page.

Signed-off-by: Aaron Lamoreaux <alamoreaux@janestreet.com>
Xyene pushed a commit that referenced this pull request Jun 30, 2022