Minimize aggsum_compare(&arc_size, arc_c) calls #8901

amotin · 2019-06-13T22:58:26Z

Motivation and Context

On a busy ARC, having arc_size close to arc_c is the desired state. But
in that state it is quite likely that aggsum_compare(&arc_size, arc_c)
will need to flush the per-CPU buckets to get an exact comparison result.
Doing that often in a hot path defeats the whole idea of using aggsum
there, since it replaces a few simple atomic additions with dozens of
lock acquisitions.
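
To make the cost concrete, here is a minimal single-threaded sketch of the idea, not the OpenZFS implementation. Every name in it (`toy_aggsum_t`, `toy_add`, `toy_compare`, `NBUCKETS`, `full_flushes`) is hypothetical, and the locking that makes the real slow path expensive is only marked by a comment. It assumes each bucket may hold at most `borrow` units of unflushed change, so the true sum always lies within `NBUCKETS * borrow` of the flushed base.

```c
#include <assert.h>
#include <stdint.h>

#define	NBUCKETS	4	/* hypothetical number of per-CPU buckets */

/* Toy stand-in for an aggregate sum; not the real aggsum_t. */
typedef struct toy_aggsum {
	int64_t base;			/* changes already folded in */
	int64_t delta[NBUCKETS];	/* per-CPU changes not yet folded in */
	int64_t borrow;			/* max |delta| a bucket may hold */
	unsigned full_flushes;		/* how often compare hit the slow path */
} toy_aggsum_t;

/* Cheap update: touch one bucket, fold it in only when it overflows. */
static void
toy_add(toy_aggsum_t *as, int cpu, int64_t v)
{
	assert(cpu >= 0 && cpu < NBUCKETS);
	as->delta[cpu] += v;
	if (as->delta[cpu] > as->borrow || as->delta[cpu] < -as->borrow) {
		as->base += as->delta[cpu];
		as->delta[cpu] = 0;
	}
}

/* Cheap bounds: no bucket is read, so no per-bucket lock is needed. */
static int64_t
toy_upper_bound(const toy_aggsum_t *as)
{
	return (as->base + NBUCKETS * as->borrow);
}

static int64_t
toy_lower_bound(const toy_aggsum_t *as)
{
	return (as->base - NBUCKETS * as->borrow);
}

/* Exact comparison: cheap when far from target, a full flush when near. */
static int
toy_compare(toy_aggsum_t *as, int64_t target)
{
	if (toy_upper_bound(as) < target)
		return (-1);
	if (toy_lower_bound(as) > target)
		return (1);

	/* Slow path: a real implementation would lock every bucket here. */
	as->full_flushes++;
	for (int i = 0; i < NBUCKETS; i++) {
		as->base += as->delta[i];
		as->delta[i] = 0;
	}
	return ((as->base > target) - (as->base < target));
}
```

In this model the slow path is taken exactly when the target falls between the two bounds, which is the common case once the sum hovers near the target.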

Description

Replacing aggsum_compare() with aggsum_upper_bound() in the code that
increases arc_p while the ARC is growing (arc_size < arc_c) saves,
according to PMC profiles, ~5% of CPU time spent in aggsum code during
sequential writes to 12 ZVOLs with 16KB block size on a large
dual-socket system.
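
The shape of the change can be sketched roughly as below. This is an illustration under assumptions, not the patch itself: `toy_arc_t`, `toy_grow_p`, and the `size_upper` field are hypothetical stand-ins, the clamp of the MRU target to the overall target is an assumption of this sketch, and any other conditions the real code checks before growing arc_p are omitted.

```c
#include <stdint.h>

/* Hypothetical stand-in for the ARC sizing state. */
typedef struct toy_arc {
	int64_t size_upper;	/* cheap upper bound on the ARC size */
	int64_t c;		/* overall target size, as arc_c */
	int64_t p;		/* MRU target size, as arc_p */
} toy_arc_t;

static int64_t
toy_min(int64_t a, int64_t b)
{
	return (a < b ? a : b);
}

/*
 * Grow p by size only if the ARC is certainly below its target.
 * Since true size <= size_upper, size_upper < c implies size < c,
 * so the test never grows p when an exact comparison would not.
 * It may skip growth when size is just under c; that is the loss
 * of precision traded for never flushing the buckets.
 */
static void
toy_grow_p(toy_arc_t *arc, int64_t size)
{
	if (arc->size_upper < arc->c)
		arc->p = toy_min(arc->c, arc->p + size);
}
```

The design choice is that the approximate test is conservative in one direction only, so it can delay growth of the MRU target but cannot grow it when the cache is already at or above its overall target.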

I suppose there is some minor change in arc_p behavior due to the lower
precision of the new code, but I don't think it is a big deal, since it
should affect only a very small window in time (aggsum buckets are
flushed every second) and in ARC size (buckets are limited to 10 average
ARC blocks per CPU).

How Has This Been Tested?

A system with 72 logical cores running FreeBSD head was writing sequentially to 12 ZVOLs with 16KB block size at the same time. Without the patch, the profiler showed ~5% of CPU time spent in aggsum_add() and aggsum_compare(), called from arc_get_data_impl(). With the patch applied, aggsum_compare() is gone from the profile, and aggsum_add() is reduced to about 0.5% in total across a few places. arc_p still grows to almost arc_c during initial warmup, same as before.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.

chrisrd

LGTM

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes openzfs#8901

richardelling approved these changes Jun 13, 2019

amotin force-pushed the aggsum branch from d04ecc6 to 4c17f46 Compare June 13, 2019 23:52

chrisrd approved these changes Jun 13, 2019

allanjude approved these changes Jun 14, 2019

gmelikov approved these changes Jun 14, 2019

behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Jun 14, 2019

behlendorf approved these changes Jun 14, 2019

openzfs deleted a comment from codecov bot Jun 14, 2019

behlendorf merged commit c1b5801 into openzfs:master Jun 14, 2019

amotin deleted the aggsum branch June 14, 2019 21:47
