Improve language detection performance #2823

Merged
merged 4 commits into from
Dec 11, 2023

@aleks-p aleks-p commented Dec 8, 2023

Closes #2777

  1. If the profile has the pyroscope_spy label, we use it to determine the language
  2. If we do need to look into the string table, we use prefix and suffix matching instead of regular expressions
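
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `detectLanguage`, `spyToLanguage`, `matchers`, and every spy name, prefix, and suffix in it are hypothetical and chosen only for the example. Only the `pyroscope_spy` label name comes from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// spyToLanguage maps pyroscope_spy label values to language names.
// The entries are illustrative assumptions, not the real mapping.
var spyToLanguage = map[string]string{
	"gospy":     "go",
	"pyspy":     "python",
	"rbspy":     "ruby",
	"dotnetspy": "dotnet",
}

// matcher lists prefixes and suffixes that hint at one language.
type matcher struct {
	language string
	prefixes []string
	suffixes []string
}

// matchers is a hypothetical rule set; a real one would be more thorough.
var matchers = []matcher{
	{language: "go", suffixes: []string{".go"}},
	{language: "python", suffixes: []string{".py"}},
	{language: "ruby", suffixes: []string{".rb"}},
	{language: "java", prefixes: []string{"java/", "jdk/"}},
}

// detectLanguage checks the spy label first and scans the string table
// with plain prefix/suffix checks only when the label does not help.
func detectLanguage(labels map[string]string, stringTable []string) string {
	if spy, ok := labels["pyroscope_spy"]; ok {
		if lang, known := spyToLanguage[spy]; known {
			return lang
		}
	}
	for _, s := range stringTable {
		for _, m := range matchers {
			for _, p := range m.prefixes {
				if strings.HasPrefix(s, p) {
					return m.language
				}
			}
			for _, sfx := range m.suffixes {
				if strings.HasSuffix(s, sfx) {
					return m.language
				}
			}
		}
	}
	return "unknown"
}

func main() {
	// Step 1: the label alone decides the language.
	fmt.Println(detectLanguage(map[string]string{"pyroscope_spy": "gospy"}, nil))
	// Step 2: no label, so the string table is scanned.
	fmt.Println(detectLanguage(nil, []string{"", "main", "/app/server.py"}))
}
```

`strings.HasPrefix` and `strings.HasSuffix` are simple byte comparisons, so they avoid the cost of running a regular expression over each string table entry.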

Here is a summary of the gains from change 2:

Regexp:

go.cpu.labels.pprof-10         	  330040	     10853 ns/op
heap-10                        	  195819	     18407 ns/op
dotnet.labels.pprof-10         	  143966	     24949 ns/op
profile_java-10                	   61840	     58016 ns/op
profile_nodejs-10              	  109437	     32871 ns/op
profile_python-10              	  391418	      9156 ns/op
profile_ruby-10                	  307896	     11783 ns/op
profile_rust-10                	  636633	      5668 ns/op

String matching (prefix or suffix):

go.cpu.labels.pprof-10         	 3418387	      1060 ns/op
heap-10                        	 2446440	      1483 ns/op
dotnet.labels.pprof-10         	 1751536	      2057 ns/op
profile_java-10                	 2333244	      1541 ns/op
profile_nodejs-10              	 1722144	      2083 ns/op
profile_python-10              	 4395427	       821 ns/op
profile_ruby-10                	 4546932	       793 ns/op
profile_rust-10                	 2960469	      1221 ns/op

benchstat:

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/pprof
                                                    │    old.txt    │               new.txt               │
                                                    │    sec/op     │   sec/op     vs base                │
_GetProfileLanguage/testdata/go.cpu.labels.pprof-10    10.589µ ± 0%   1.016µ ± 0%  -90.40% (p=0.000 n=10)
_GetProfileLanguage/testdata/heap-10                   18.034µ ± 1%   1.411µ ± 0%  -92.18% (p=0.000 n=10)
_GetProfileLanguage/testdata/dotnet.labels.pprof-10    24.765µ ± 1%   1.981µ ± 1%  -92.00% (p=0.000 n=10)
_GetProfileLanguage/testdata/profile_java-10           57.873µ ± 1%   1.473µ ± 0%  -97.46% (p=0.000 n=10)
_GetProfileLanguage/testdata/profile_nodejs-10         32.770µ ± 0%   2.010µ ± 1%  -93.87% (p=0.000 n=10)
_GetProfileLanguage/testdata/profile_python-10         9125.0n ± 1%   790.8n ± 0%  -91.33% (p=0.000 n=10)
_GetProfileLanguage/testdata/profile_ruby-10          11754.5n ± 1%   772.7n ± 1%  -93.43% (p=0.000 n=10)
_GetProfileLanguage/testdata/profile_rust-10            5.679µ ± 1%   1.184µ ± 0%  -79.16% (p=0.000 n=10)
geomean                                                 16.49µ        1.253µ       -92.40%
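
For reference, the "vs base" column is the relative change in time per operation. A minimal sketch of that arithmetic, using the rounded figures from the first row (`percentChange` is a helper name made up for this example, and recomputing from rounded values can differ from benchstat's figure in the last digit):

```go
package main

import "fmt"

// percentChange returns the relative change from oldVal to newVal in percent.
// A negative result means newVal is smaller, i.e. the new code is faster.
func percentChange(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

func main() {
	// go.cpu.labels.pprof row: 10.589µs before, 1.016µs after.
	fmt.Printf("change: %.2f%%\n", percentChange(10.589, 1.016))
	// The same row expressed as a speedup factor.
	fmt.Printf("speedup: %.1fx\n", 10.589/1.016)
}
```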

As before, there is a chance of false detection, though the result is currently used only for statistics.

@aleks-p aleks-p requested a review from a team as a code owner December 8, 2023 20:04
@aleks-p aleks-p self-assigned this Dec 8, 2023
@aleks-p aleks-p merged commit a338417 into main Dec 11, 2023
19 checks passed
@aleks-p aleks-p deleted the fix/improve-lang-detection-performance branch December 11, 2023 13:59
@korniltsev

There may or may not be a new spy label from the Grafana Agent pyroscope.java component:

https://github.com/grafana/agent/pull/5985/files#diff-a0c3d4195b76376947e1f9e0059dc3538e97c3fb10ba70d9000ea74b902c1f8dR244

Let me know if you have any preference for its value or whether it should be present at all.

@aleks-p

aleks-p commented Dec 18, 2023

Thanks @korniltsev, the value makes sense to me. I'll add the mapping for it.


Successfully merging this pull request may close these issues.

Optimize language detection