Use percpu_counter for obj_alloc counter of Linux-backed caches #10397
LGTM, but wondering why you're using the Linux-only percpu_* rather than aggsum_* as we're using in arc.c?
It seems this code uses the counter almost solely for writes, reading it only sporadically on user request and without requiring atomicity. For such usage aggsum is a waste of resources, since it provides atomic reads that are not needed here. I think a platform-independent KPI for write-mostly counters should be introduced, likely just a wrapper over whatever each platform provides.
I made the mistake of opening this as a proper review. I should have opened it as a draft, since it seems I'll need to iterate a bit more to make it compile in all environments.
@richardelling I did think about using the aggsum code, but that code is licensed under CDDL and the code that I'm changing is in the SPL, which is GPL. I think not many people have touched the aggsum code, so we could relicense it if we wanted to, but that seemed like too much for this change.
@amotin I'd be interested in working with others on such counters. That said, this undertaking is tangential to this PR.
For reference, this is the counter(9) KPI available on FreeBSD for use in write-mostly cases: https://www.freebsd.org/cgi/man.cgi?query=counter&sektion=9&manpath=freebsd-release-ports
Codecov Report
@@ Coverage Diff @@
## master #10397 +/- ##
===========================================
- Coverage 79.55% 63.53% -16.03%
===========================================
Files 391 309 -82
Lines 123830 106544 -17286
===========================================
- Hits 98518 67691 -30827
- Misses 25312 38853 +13541
49a41f2 to b48052d
cc @tonynguien
A previous commit enabled the tracking of object allocations in Linux-backed caches from the SPL layer for debuggability. The commit is: 9a170fc

Unfortunately, it also introduced minor performance regressions that were highlighted by the ZFS perf test-suite. Within Delphix we found that the regression would be from -1%, all the way up to -8% for some workloads.

This commit brings performance back up to par by creating a separate counter for those caches and making it a percpu in order to avoid lock-contention.

The initial performance testing was done by myself, and the final round was conducted by @tonynguien who was also the one that discovered the regression and highlighted the culprit.

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
b48052d to 81e2ff3
A previous commit enabled the tracking of object allocations in Linux-backed caches from the SPL layer for debuggability. The commit is: 9a170fc

Unfortunately, it also introduced minor performance regressions that were highlighted by the ZFS perf test-suite. Within Delphix we found that the regression would be from -1%, all the way up to -8% for some workloads.

This commit brings performance back up to par by creating a separate counter for those caches and making it a percpu in order to avoid lock-contention.

The initial performance testing was done by myself, and the final round was conducted by @tonynguien who was also the one that discovered the regression and highlighted the culprit.

Backported-by: Aron Xu <aron@debian.org>
Signed-off-by: Aron Xu <aron@debian.org>
Closes openzfs#10397
A previous commit enabled the tracking of object allocations in Linux-backed caches from the SPL layer for debuggability. The commit is: 9a170fc

Unfortunately, it also introduced minor performance regressions that were highlighted by the ZFS perf test-suite. Within Delphix we found that the regression would be from -1%, all the way up to -8% for some workloads.

This commit brings performance back up to par by creating a separate counter for those caches and making it a percpu in order to avoid lock-contention.

The initial performance testing was done by myself, and the final round was conducted by @tonynguien who was also the one that discovered the regression and highlighted the culprit.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes openzfs#10397
Motivation
A previous commit enabled the tracking of object allocations
in Linux-backed caches from the SPL layer for debuggability.
The commit is: 9a170fc
Unfortunately, it also introduced minor performance regressions
that were highlighted by the ZFS perf test suite. Within Delphix
we measured regressions ranging from 1% up to 8% for some
workloads.
Description
This commit brings performance back up to par by creating a
separate counter for those caches and making it a percpu counter
in order to avoid lock contention.
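To illustrate why a per-CPU counter avoids contention, here is a minimal userspace sketch in C. It is not the code from this PR: the kernel change uses the Linux `percpu_counter` API (`percpu_counter_init()`, `percpu_counter_add()`, `percpu_counter_sum()`, `percpu_counter_destroy()`), which cannot run outside the kernel. The names `sharded_counter`, `sc_add`, `sc_sum`, `run_workers`, and the constants `NSLOTS` and `CACHE_LINE` are all hypothetical, and each thread slot stands in for a CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSLOTS     8   /* hypothetical: stands in for the number of CPUs */
#define CACHE_LINE 64  /* assumed cache-line size */

/* One slot per "CPU", aligned so that two slots never share a cache line. */
struct slot {
	_Alignas(CACHE_LINE) atomic_int_fast64_t count;
};

struct sharded_counter {
	struct slot slots[NSLOTS];
};

static void sc_init(struct sharded_counter *sc)
{
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&sc->slots[i].count, 0);
}

/* Hot path: the writer touches only its own slot and takes no lock. */
static void sc_add(struct sharded_counter *sc, unsigned slot, int64_t delta)
{
	atomic_fetch_add_explicit(&sc->slots[slot % NSLOTS].count, delta,
	    memory_order_relaxed);
}

/* Cold path: sum every slot. The result is approximate while writers run. */
static int64_t sc_sum(struct sharded_counter *sc)
{
	int64_t total = 0;

	for (int i = 0; i < NSLOTS; i++)
		total += atomic_load_explicit(&sc->slots[i].count,
		    memory_order_relaxed);
	return (total);
}

struct worker_arg {
	struct sharded_counter *sc;
	unsigned slot;
	int iters;
};

static void *worker(void *p)
{
	struct worker_arg *arg = p;

	for (int i = 0; i < arg->iters; i++)
		sc_add(arg->sc, arg->slot, 1);
	return (NULL);
}

/* Run nthreads writers, each adding iters times, and return the final sum. */
static int64_t run_workers(int nthreads, int iters)
{
	struct sharded_counter sc;
	pthread_t tids[NSLOTS];
	struct worker_arg args[NSLOTS];

	assert(nthreads > 0 && nthreads <= NSLOTS);
	sc_init(&sc);
	for (int i = 0; i < nthreads; i++) {
		args[i] = (struct worker_arg){ &sc, (unsigned)i, iters };
		assert(pthread_create(&tids[i], NULL, worker, &args[i]) == 0);
	}
	for (int i = 0; i < nthreads; i++)
		assert(pthread_join(tids[i], NULL) == 0);
	return (sc_sum(&sc));
}
```

Built with `-pthread`, a call such as `run_workers(4, 100000)` returns the exact total once all threads have been joined. The trade-off is the same one discussed in the conversation: writes are cheap because they never touch shared state, while a read has to walk every slot, which is acceptable for a counter that is read only sporadically.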
Testing
I did the initial performance testing, and the final round was
conducted by @tonynguien, who also discovered the regression and
identified the culprit.