Add PyPerf, example of profiling Python using BPF #2239

Merged
merged 1 commit into from
Feb 28, 2019

palmtenor
Member

@palmtenor palmtenor commented Feb 28, 2019

This is a tool that attaches a BPF program to CPU perf events for profiling. The BPF program understands CPython's internal data structures and is hence able to walk the actual Python stack trace, as opposed to the stack trace of the CPython runtime itself, which is what we would normally get with Linux perf.
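
To make the distinction concrete, here is a minimal in-process sketch of what "walking the Python stack" means: following the chain of frame objects via `f_back` and reading each frame's function name. This is only an analogue. The BPF program has to follow the equivalent pointers in the target process's memory from the outside, which this snippet does not attempt. The helper `python_stack` and the two demo functions are hypothetical names used for illustration.

```python
import sys

def python_stack():
    """Return the function names on the current Python stack, innermost first."""
    names = []
    frame = sys._getframe(1)  # start at the caller of python_stack() (CPython-specific API)
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # each frame links to the frame that called it
    return names

def mul_helper():
    return python_stack()

def mul():
    return mul_helper()

stack = mul()
print(stack[:2])
```

A native profiler sampling the same process would instead see interpreter-internal C functions, which is why the Python-level view has to be reconstructed from CPython's data structures.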

To use the tool, just run the PyPerf binary:

  • Use -d / --duration to specify the intended profiling duration, in milliseconds. The default value, if not specified, is 1000ms.
  • Use -c / --sample-rate to specify the intended profiling sample rate, same as the -c argument of Linux perf. The default value, if not specified, is 1e6.

You can also use -v / --verbose to specify logging verbosity 1 or 2 for more detailed information during profiling.
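
As a rough back-of-the-envelope aid for picking -d and -c, the sketch below estimates how many samples to expect. It assumes that -c, like perf's -c, is the number of events (here CPU cycles) between samples, and it assumes a hypothetical 3 GHz clock with one CPU kept fully busy by the target. Neither the function name nor these figures come from the tool itself.

```python
def expected_samples(duration_ms, cycles_per_sample, cpu_hz, busy_cpus=1):
    """Rough sample count if `busy_cpus` CPUs run flat out for the whole window."""
    cycles = cpu_hz * (duration_ms / 1000.0) * busy_cpus
    return cycles / cycles_per_sample

# Default -d (1000 ms) and -c (1e6) on one busy CPU at an assumed 3 GHz
default_estimate = expected_samples(1000, 1e6, 3e9)
# A very short window, e.g. -d 4, yields only a handful of samples
short_estimate = expected_samples(4, 1e6, 3e9)
print(default_estimate, short_estimate)
```

Under these assumptions a very short duration produces few samples, so a longer window or a smaller -c value may be needed for a meaningful profile.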

The tool is a prototype at this point and is by no means mature. It currently has the following limitations:

  • It only runs on the CPU cycles event.
  • It only works on Python 3.6 at this point. In fact, all Python versions from 3.0 to 3.6 should work; I just need to verify this and change the constant values. In Python 3.7, however, some internal data structures changed, so the actual parsing logic needs to be updated.
  • It currently hard-codes the offsets of Python's internal data structures. It would be better to take a dependency on python-devel and get them directly from the header files.
  • The output is pretty horrible. There is no de-duplication of identical stacks, and we always output the GIL state and thread state as raw integer values. I will need to work on prettifying the output and making better sense of the enum values.
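
Until the tool de-duplicates stacks itself, identical stacks can be collapsed in post-processing. The sketch below is not part of PyPerf; it assumes the samples have already been parsed into lists of frame names (innermost first), and the sample data and the `fold_stacks` helper are made up for illustration.

```python
from collections import Counter

def fold_stacks(samples):
    """Count identical stacks. Each sample is a sequence of frame names."""
    return Counter(tuple(s) for s in samples)

# Hypothetical, already-parsed samples (innermost frame first)
samples = [
    ["test.mul_helper", "test.mul", "test.<module>"],
    ["test.mul_helper", "test.mul", "test.<module>"],
    ["test.mul", "test.<module>"],
]

counts = fold_stacks(samples)
for stack, n in counts.most_common():
    # Print root-first, semicolon-separated, followed by the sample count
    print(";".join(reversed(stack)), n)
```

Printing each stack root-first on a single line with its count is one common layout for feeding flame-graph style tooling, though any aggregation keyed on the full stack would do.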

Landing it in the C++ examples for now; once it's mature enough I will move it to tools/.

Feel free to try it out and let me know what you think!

@yonghong-song
Collaborator

I tried the following example,

-bash-4.4$ vi test.py
-bash-4.4$ cat test.py
def mul_helper(x):
  while True:
    x*x

def mul(x):
  mul_helper(x)

mul(10000)
-bash-4.4$ python3 test.py

And running

-bash-4.4$ pwd
/home/yhs/work/bcc/build/examples/cpp/pyperf
-bash-4.4$ sudo ./PyPerf -d 4

I got the following result:

...
GIL State: 1 Thread State: 2 PthreadID Match State: 1

    test.mul_helper
    test.mul
    test.<module>
PID: 771568 TID: 771568 (python3)

What does "GIL State: 1 Thread State: 2 PthreadID Match State: 1" mean?
I do agree that we need to do a better job of presenting meaningful info here.

Anyway, this is a good start. It is in examples/. People can try it and the tool can evolve.

thanks!

@yonghong-song yonghong-song merged commit ef9d83f into iovisor:master Feb 28, 2019
@palmtenor palmtenor deleted the pyperf branch March 7, 2019 22:38
palexster pushed a commit to palexster/bcc that referenced this pull request Jul 7, 2019
CrackerCat pushed a commit to CrackerCat/bcc that referenced this pull request Jul 31, 2024