Add profiler #333

Merged
merged 4 commits into from
Jan 27, 2024

Conversation

qwe661234
Collaborator

Based on our observations, a high percentage of true hotspots involve loops or backward jumps, but the number of IRs in these true hotspots varies widely.

Therefore, we believe our profiler can use three indicators to detect hotspots:

  1. Backward jump
  2. Loop
  3. Used frequency
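
As a minimal sketch of how the three indicators might be combined: the field names, the threshold value, and the rule that any single indicator is enough are all assumptions made for illustration, not taken from the rv32emu source.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real trigger value is not stated in this thread.
HOT_THRESHOLD = 4096


@dataclass
class BlockProfile:
    """Per-block counters a profiler could keep (illustrative names)."""
    pc_start: int
    has_backward_jump: bool = False  # block ends in a jump to a lower address
    has_loop: bool = False           # block branches back to its own entry
    invoke_count: int = 0            # how many times the block was executed


def is_hotspot(block: BlockProfile, threshold: int = HOT_THRESHOLD) -> bool:
    """Assumed policy: any one of the three indicators marks the block hot."""
    return (
        block.has_backward_jump
        or block.has_loop
        or block.invoke_count >= threshold
    )
```

A stricter policy (for example, requiring a loop and a minimum count together) would only change the boolean expression in `is_hotspot`.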

Close: #189

Contributor

@jserv jserv left a comment

Check #189 (comment) carefully. You should implement some of the following:

  • Allow Linux perf to profile JITted code, as QEMU does.
  • Generate profiling logs and use a report program to view them, as Mono does.
  • Display a tree-based view, as vmprof does.

It is important to provide interactive ways to browse profiling data with external tools. See "All my favorite tracing tools: eBPF, QEMU, Perfetto, new ones I built and more".

@qwe661234
Collaborator Author

This runtime profiler serves as a trigger for the tier-1 JIT compiler. Do you mean we should provide a log mode for users to dump the profiling information?

@jserv
Contributor

jserv commented Jan 21, 2024

If you are going to close #189, you should consolidate the profiling facilities.

@qwe661234 qwe661234 requested a review from jserv January 21, 2024 12:15
@qwe661234 qwe661234 force-pushed the add_profiler branch 2 times, most recently from 16adbc9 to 473babf Compare January 22, 2024 14:24
@qwe661234 qwe661234 force-pushed the add_profiler branch 2 times, most recently from 1b0be9d to 49938f0 Compare January 23, 2024 00:41
@qwe661234
Collaborator Author

Currently, we only support providing information about the taken branch, the untaken branch, and the IRs of a basic block. I think we can open a separate issue for visualizing the IR graph.

Contributor

@jserv jserv left a comment

  1. Separate the processes of generating and rendering profiling data. Specifically, rv32emu (activated with the -p option) will be responsible for generating profiling data, while the rendering will be handled by the newly added tool rv_profile.
  2. Ensure that block information, including the instructions (referencing ELF files for disassembly as needed), is captured as part of the profiling data. This approach will make the data more informative and valuable for analysis.
  3. The proposed changes aim to equip the system with adequate capabilities for depicting runtime profiling information, a crucial aspect for ongoing JIT development.
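
To illustrate the split between generating and rendering, here is a minimal sketch of a standalone renderer. The CSV layout (`pc_start,pc_end,frequency`) is a hypothetical format chosen for this example; the thread does not specify what rv32emu actually writes, and the function names are illustrative.

```python
import csv
import io
from typing import Dict, List


def load_profile(text: str) -> List[Dict[str, int]]:
    """Parse the hypothetical CSV dump into one dict per basic block."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "pc_start": int(row["pc_start"], 16),
            "pc_end": int(row["pc_end"], 16),
            "frequency": int(row["frequency"]),
        })
    return rows


def render(rows: List[Dict[str, int]]) -> str:
    """Format blocks as a plain-text table, most frequently executed first."""
    lines = [f"{'start':>10} {'end':>10} {'freq':>8}"]
    for r in sorted(rows, key=lambda r: r["frequency"], reverse=True):
        lines.append(
            f"{r['pc_start']:#10x} {r['pc_end']:#10x} {r['frequency']:>8}"
        )
    return "\n".join(lines)


# Made-up sample data standing in for a dump produced by the emulator.
SAMPLE = """pc_start,pc_end,frequency
0x10000,0x10010,3
0x50c40,0x50c60,1200
"""

print(render(load_profile(SAMPLE)))
```

Because the renderer only reads a file, richer views (graphs, trees) can be added later without touching the emulator.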

@jserv
Contributor

jserv commented Jan 23, 2024

The rationale behind segregating the generation and rendering of profiling data includes:

  1. Enabling graphical representations without complicating the main rv32emu program.
  2. Facilitating the use of tools like objdump for disassembling specific blocks, exemplified by commands such as objdump --start-address=0x50c40 --stop-address=0x50c60 -d.
  3. Allowing for the profiling of RISC-V executables in a separate, offline run to collect profiling data. This data can then be utilized to assign accurate weights to the predecessors and successors in superoptimizers.
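
The objdump invocation in point 2 can be assembled from a block's recorded entry and exit addresses. This is a sketch only: `program.elf` is a placeholder path, the helper name is hypothetical, and for RISC-V binaries the `objdump` argument would likely need to name a cross-toolchain objdump, which is an assumption about the local setup.

```python
import shlex
from typing import List


def objdump_command(elf_path: str, entry: int, exit_addr: int,
                    objdump: str = "objdump") -> List[str]:
    """Build an argv list that disassembles one address range of an ELF file."""
    if exit_addr <= entry:
        raise ValueError("exit address must be greater than entry address")
    return [
        objdump,
        f"--start-address={entry:#x}",
        f"--stop-address={exit_addr:#x}",
        "-d",
        elf_path,
    ]


cmd = objdump_command("program.elf", 0x50c40, 0x50c60)
# Prints: objdump --start-address=0x50c40 --stop-address=0x50c60 -d program.elf
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)` when the tool is installed.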

@qwe661234
Collaborator Author

  1. We can achieve this by writing a Python script that parses the generated profiling data.
  2. Do we need a separate debug mode for this? We do not currently have one.

@jserv
Contributor

jserv commented Jan 23, 2024

  1. We can achieve this by writing a Python script that parses the generated profiling data.

Yes, you can write a Python-based renderer for profiling data.

  2. Do we need a separate debug mode for this? We do not currently have one.

No, you can specify the start/stop address for objdump -d without any debugging symbols.

@qwe661234
Collaborator Author

Facilitating the use of tools like objdump for disassembling specific blocks, exemplified by commands such as objdump --start-address=0x50c40 --stop-address=0x50c60 -d.

Do you mean that this feature provides the instruction sequence of a basic block starting at start-address, without any other profiling data?

@jserv
Contributor

jserv commented Jan 24, 2024

It depends on the format/layout of the profiling data recorded during the execution of RISC-V programs. You can simply record the entry/exit address of each block and then use binutils to disassemble that range.

@qwe661234 qwe661234 force-pushed the add_profiler branch 3 times, most recently from 4cb78b9 to ecd2ef6 Compare January 25, 2024 09:25
@qwe661234
Collaborator Author

Currently, the profiler script supports:

  1. printing profiling data from start_address to stop_address
  2. visualizing the IR graph for a specified program counter, as in the example below
    [image: example IR graph]
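
The first feature amounts to filtering blocks by address range. A minimal sketch, assuming each record is a `(pc_start, pc_end, frequency)` tuple and that a block is included only when it lies entirely within the requested range; both the layout and the inclusion rule are assumptions, not the script's documented behavior.

```python
from typing import Iterable, List, Tuple

# Assumed record layout: (pc_start, pc_end, frequency)
Block = Tuple[int, int, int]


def blocks_in_range(blocks: Iterable[Block],
                    start_address: int,
                    stop_address: int) -> List[Block]:
    """Keep blocks whose span lies within start_address..stop_address."""
    return [
        b for b in blocks
        if start_address <= b[0] and b[1] <= stop_address
    ]
```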

@qwe661234 qwe661234 requested a review from jserv January 25, 2024 16:02
@jserv jserv merged commit 6dcfd84 into sysprog21:master Jan 27, 2024
7 checks passed
Development

Successfully merging this pull request may close these issues.

jit: Implement runtime instruction profiling
2 participants