Commit 41ea28d
mm: memcg: optimize parent iteration in memcg_rstat_updated()

commit 9cee7e8 upstream.

In memcg_rstat_updated(), we iterate the memcg being updated and its
parents to update memcg->vmstats_percpu->stats_updates in the fast path
(i.e. no atomic updates). According to my math, this is 3 memory loads
(and potentially 3 cache misses) per memcg:
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
- Load the address of the parent memcg.

Avoid most of the cache misses by caching a pointer from each struct
memcg_vmstats_percpu to its parent on the corresponding CPU. In this
case, for the first memcg we have 2 memory loads (same as above):
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).

Then for each additional memcg, we need a single load to get the
parent's stats_updates directly. This reduces the number of loads from
about 3N to about 2+N, where N is the number of memcgs we need to
iterate.
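
The load counting above can be illustrated with a small userspace model. This is only a sketch: the `model_*` types, `walk_old()` and `walk_new()` are hypothetical names invented for illustration, not the kernel's structures or functions, and each counted "load" stands for one dependent pointer dereference that may miss the cache.

```c
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct model_percpu {
	unsigned int stats_updates;
	struct model_percpu *parent;	/* cached pointer to the parent's per-CPU data */
};

struct model_memcg {
	struct model_percpu *percpu;	/* stands in for memcg->vmstats_percpu */
	struct model_memcg *parent;
};

/* Old walk: three dependent loads per level. Returns the modelled load count. */
static unsigned int walk_old(struct model_memcg *memcg, unsigned int val)
{
	unsigned int loads = 0;

	while (memcg) {
		struct model_percpu *pc = memcg->percpu;	/* load 1: per-CPU base */

		pc->stats_updates += val;			/* load 2: the counter */
		memcg = memcg->parent;				/* load 3: the parent memcg */
		loads += 3;
	}
	return loads;
}

/* New walk: one load for the per-CPU base, then one per level. */
static unsigned int walk_new(struct model_memcg *memcg, unsigned int val)
{
	struct model_percpu *pc = memcg->percpu;		/* load 1: per-CPU base */
	unsigned int loads = 1;

	for (; pc; pc = pc->parent) {
		/* counter and parent pointer are assumed to share a cacheline */
		pc->stats_updates += val;
		loads += 1;
	}
	return loads;
}
```

Under this accounting, a chain of three groups costs 9 loads with `walk_old()` and 4 with `walk_new()`, which stays within the 2+N bound stated above.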

Additionally, stash a pointer to memcg->vmstats in each struct
memcg_vmstats_percpu such that we can access the atomic counter that all
CPUs fold into, memcg->vmstats->stats_updates.

Accordingly, rename memcg_should_flush_stats() to
memcg_vmstats_needs_flush() and make it accept a struct memcg_vmstats
pointer.
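
As a rough illustration of why the stashed vmstats pointer helps, the sketch below models the update path in userspace with C11 atomics. All `model_*` names, the batch size of 64 and the CPU count of 4 are assumptions made for the example, and the threshold rule in `model_needs_flush()` is a simplification, not a copy of the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BATCH	64	/* assumed per-CPU batch size */
#define MODEL_NR_CPUS	4	/* assumed number of CPUs */

/* Hypothetical stand-ins for the shared and per-CPU stats structures. */
struct model_vmstats {
	atomic_long stats_updates;	/* counter that all CPUs fold into */
};

struct model_percpu {
	unsigned int stats_updates;	/* per-CPU pending updates */
	struct model_percpu *parent;	/* parent's per-CPU data, or NULL */
	struct model_vmstats *vmstats;	/* stashed pointer to the shared stats */
};

/* Takes the shared stats directly, so the owning group is never dereferenced. */
static bool model_needs_flush(struct model_vmstats *vmstats)
{
	return atomic_load(&vmstats->stats_updates) >
	       (long)MODEL_BATCH * MODEL_NR_CPUS;
}

static void model_updated(struct model_percpu *pc, unsigned int val)
{
	for (; pc; pc = pc->parent) {
		pc->stats_updates += val;
		if (pc->stats_updates < MODEL_BATCH)
			continue;	/* fast path: no atomic operation */

		/* Skip the atomic add if this level is already flushable. */
		if (!model_needs_flush(pc->vmstats))
			atomic_fetch_add(&pc->vmstats->stats_updates,
					 (long)pc->stats_updates);
		pc->stats_updates = 0;
	}
}
```

In this model the walk touches only per-CPU structures until a batch fills up, and the shared counter is reached through `pc->vmstats` without going back to the group itself.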

In struct memcg_vmstats_percpu, make sure both pointers together with
stats_updates live on the same cacheline. Finally, update
mem_cgroup_alloc() to take in a parent pointer and initialize the new
cache pointers on each CPU. The percpu loop in mem_cgroup_alloc() may
look concerning, but there are multiple similar loops in the cgroup
creation path (e.g. cgroup_rstat_init()), most of which are hidden
within alloc_percpu().
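
The per-CPU initialization described above might look roughly like the following userspace sketch. The `model_*` names and the fixed CPU count are hypothetical, and a plain array stands in for real per-CPU allocation; the point is only that each CPU's entry is linked to the parent's entry for the same CPU and to the group's shared stats.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4	/* assumed number of CPUs */

/* Hypothetical stand-ins; a plain array replaces real per-CPU storage. */
struct model_vmstats {
	long stats_updates;
};

struct model_percpu {
	unsigned int stats_updates;
	struct model_percpu *parent;	/* same-CPU entry of the parent group */
	struct model_vmstats *vmstats;	/* shared stats of the owning group */
};

struct model_memcg {
	struct model_vmstats vmstats;
	struct model_percpu percpu[MODEL_NR_CPUS];
};

/* Allocate a group and set up the cached pointers on every CPU. */
static struct model_memcg *model_memcg_alloc(struct model_memcg *parent)
{
	struct model_memcg *memcg = calloc(1, sizeof(*memcg));

	if (!memcg)
		return NULL;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_percpu *pc = &memcg->percpu[cpu];

		/* The root group has no parent, so its entries end the walk. */
		pc->parent = parent ? &parent->percpu[cpu] : NULL;
		pc->vmstats = &memcg->vmstats;
	}
	return memcg;
}
```

The loop runs once per CPU at creation time, so its cost is paid when a group is created rather than on every stats update.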

According to Oliver's testing [1], this fixes multiple 30-38%
regressions in vm-scalability, will-it-scale-tlb_flush2, and
will-it-scale-fallocate1. This comes at a cost of 2 more pointers per
CPU (<2KB on a machine with 128 CPUs).

[1] https://lore.kernel.org/lkml/ZbDJsfsZt2ITyo61@xsang-OptiPlex-9020/

[yosryahmed@google.com: fix struct memcg_vmstats_percpu size and alignment]
Link: https://lkml.kernel.org/r/20240203044612.1234216-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20240124100023.660032-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Fixes: 8d59d22 ("mm: memcg: make stats flushing threshold per-memcg")
Tested-by: kernel test robot <oliver.sang@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1 parent 6b97ad9, commit 41ea28d
1 file changed, 35 additions and 21 deletions.