Skip to content

Commit

Permalink
mm: Fix memcg reclaim on memory tiered systems
Browse files Browse the repository at this point in the history
commit 3f1509c ("Revert "mm/vmscan: never demote for memcg
reclaim"") enabled demotion in memcg reclaim, which is the right thing
to do, but introduced a regression in the behavior of
try_to_free_mem_cgroup_pages().

The callers of try_to_free_mem_cgroup_pages() expect it to attempt to
reclaim - not demote - nr_pages from the cgroup. I.e. the memory usage
of the cgroup should reduce by nr_pages. The callers expect
try_to_free_mem_cgroup_pages() to also return the number of pages
reclaimed, not demoted.

However, try_to_free_mem_cgroup_pages() actually unconditionally counts
demoted pages as reclaimed pages. So in practice when it is called it will
often demote nr_pages and return the number of demoted pages to the caller.
Demoted pages don't lower the memcg usage as the caller requested.

I suspect various things work suboptimally on memory systems or don't
work at all due to this:

- memory.high enforcement likely doesn't work (it just demotes nr_pages
  instead of lowering the memcg usage by nr_pages).
- try_charge_memcg() will keep retrying the charge while
  try_to_free_mem_cgroup_pages() is just demoting pages and not actually
  making any room for the charge.
- memory.reclaim has a wonky interface. It advertises to the user it
  reclaims the provided amount but it will actually demote that amount.

There may be more effects to this issue.

To fix these issues I propose shrink_folio_list() to only count pages
demoted from inside of sc->nodemask to outside of sc->nodemask as
'reclaimed'.

For callers such as reclaim_high() or try_charge_memcg() that set
sc->nodemask to NULL, try_to_free_mem_cgroup_pages() will try to
actually reclaim nr_pages and return the number of pages reclaimed. No
demoted pages would count towards the nr_pages requirement.

For callers such as memory_reclaim() that set sc->nodemask,
try_to_free_mem_cgroup_pages() will free nr_pages from that nodemask
with either demotion or reclaim.

Tested this change using memory.reclaim interface. With this change,

	echo "1m" > memory.reclaim

Will cause freeing of 1m of memory from the cgroup regardless of the
demotions happening inside.

	echo "1m nodes=0" > memory.reclaim

Will cause freeing of 1m of node 0 by demotion if a demotion target is
available, and by reclaim if no demotion target is available.

Signed-off-by: Mina Almasry <almasrymina@google.com>
  • Loading branch information
mina authored and intel-lab-lkp committed Dec 4, 2022
1 parent 490acbc commit 332f41f
Show file tree
Hide file tree
Showing 3 changed files with 31 additions and 6 deletions.
7 changes: 5 additions & 2 deletions include/linux/memory-tiers.h
Expand Up @@ -38,15 +38,18 @@ void init_node_memory_type(int node, struct memory_dev_type *default_type);
void clear_node_memory_type(int node, struct memory_dev_type *memtype);
#ifdef CONFIG_MIGRATION
int next_demotion_node(int node);
void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets,
nodemask_t *demote_from_targets);
bool node_is_toptier(int node);
#else
static inline int next_demotion_node(int node)
{
return NUMA_NO_NODE;
}

static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
static inline void node_get_allowed_targets(pg_data_t *pgdat,
nodemask_t *targets,
nodemask_t *demote_from_targets)
{
*targets = NODE_MASK_NONE;
}
Expand Down
10 changes: 9 additions & 1 deletion mm/memory-tiers.c
Expand Up @@ -264,7 +264,8 @@ bool node_is_toptier(int node)
return toptier;
}

void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets,
nodemask_t *demote_from_targets)
{
struct memory_tier *memtier;

Expand All @@ -280,6 +281,13 @@ void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
else
*targets = NODE_MASK_NONE;
rcu_read_unlock();

/*
* Exclude the demote_from_targets from the allowed targets if we're
* trying to demote from a specific set of nodes.
*/
if (demote_from_targets)
nodes_andnot(*targets, *targets, *demote_from_targets);
}

/**
Expand Down
20 changes: 17 additions & 3 deletions mm/vmscan.c
Expand Up @@ -1590,7 +1590,8 @@ static struct page *alloc_demote_page(struct page *page, unsigned long private)
* Folios which are not demoted are left on @demote_folios.
*/
static unsigned int demote_folio_list(struct list_head *demote_folios,
struct pglist_data *pgdat)
struct pglist_data *pgdat,
nodemask_t *demote_from_nodemask)
{
int target_nid = next_demotion_node(pgdat->node_id);
unsigned int nr_succeeded;
Expand All @@ -1614,7 +1615,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
if (target_nid == NUMA_NO_NODE)
return 0;

node_get_allowed_targets(pgdat, &allowed_mask);
node_get_allowed_targets(pgdat, &allowed_mask, demote_from_nodemask);

/* Demotion ignores all cpuset and mempolicy settings */
migrate_pages(demote_folios, alloc_demote_page, NULL,
Expand Down Expand Up @@ -1653,6 +1654,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
LIST_HEAD(free_folios);
LIST_HEAD(demote_folios);
unsigned int nr_reclaimed = 0;
unsigned int nr_demoted = 0;
unsigned int pgactivate = 0;
bool do_demote_pass;
struct swap_iocb *plug = NULL;
Expand Down Expand Up @@ -2085,7 +2087,19 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
/* 'folio_list' is always empty here */

/* Migrate folios selected for demotion */
nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
nr_demoted = demote_folio_list(&demote_folios, pgdat, sc->nodemask);

/*
* Only count demoted folios as reclaimed if the caller has requested
* demotion from a specific nodemask. In this case pages inside the
* noedmask have been demoted to outside the nodemask and we can count
* these pages as reclaimed. If no nodemask is passed, then the caller
* is requesting reclaim from all memory, which should not count
* demoted pages.
*/
if (sc->nodemask)
nr_reclaimed += nr_demoted;

/* Folios that could not be demoted are still in @demote_folios */
if (!list_empty(&demote_folios)) {
/* Folios which weren't demoted go back on @folio_list */
Expand Down

0 comments on commit 332f41f

Please sign in to comment.