Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add overhead of all DBs and rehashing dict count to info. #12913

Merged
merged 30 commits into from
Mar 1, 2024

Conversation

CharlesChen888
Copy link
Contributor

@CharlesChen888 CharlesChen888 commented Jan 5, 2024

Sometimes we need to make fast judgement about why Redis is suddenly taking more memory. One of the reasons is main DB's dicts doing rehashing.

We may use MEMORY STATS to monitor the overhead memory of each DB, but there still lacks a total sum to show an overall trend. So this PR adds the total overhead of all DBs to INFO MEMORY section, together with the total count of rehashing DB dicts, providing some intuitive metrics about main dicts rehashing.

This PR adds the following metrics to INFO MEMORY

  • mem_overhead_db_hashtable_rehashing - only size of ht[0] in dictionaries we're rehashing (i.e. the memory that's gonna get released soon)

and a similar ones to MEMORY STATS:

  • overhead.db.hashtable.lut (complements the existing overhead.hashtable.main and overhead.hashtable.expires which also counts the dictEntry structs too)
  • overhead.db.hashtable.rehashing - temporary rehashing overhead.
  • db.dict.rehashing.count - number of top level dictionaries being rehashed.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. but i think i'd like a few changes:

  1. i don't think we need to have separate metric for the expires dict and the main one, we can count them together in the same metric.
  2. i'd like another metric to count the overhead of the rehashing (i.e. the total memory used by ht[0] in dicts that also have ht[1])

WDYT?

@soloestoy
Copy link
Collaborator

i'd like another metric to count the overhead of the rehashing (i.e. the total memory used by ht[0] in dicts that also have ht[1])

I like this idea, but one concern is that cluster with 16384 slots, in the worst case scenario, we would need to iterate through all the slots to obtain complete information about ht[0] and ht[1]. Perhaps we need to use rehashingStarted and rehashingCompleted to optimize this process.

@oranagra
Copy link
Member

yes, let's do that (keep track of it rather than calculate it on the spot)

src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@redis/core-team please approve the new metrics for INFO MEMORY and MEMORY STATS

src/server.c Outdated Show resolved Hide resolved
@oranagra oranagra added state:major-decision Requires core team consensus approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes labels Jan 25, 2024
src/db.c Outdated Show resolved Hide resolved
src/db.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
@CharlesChen888
Copy link
Contributor Author

CharlesChen888 commented Feb 19, 2024

Now that "kvstore" is merged, I have to rewrite a lot of things.

mem_overhead_hashtable_rehashing and mem_overhead_hashtable_lut are now maintained in kvstoreDictRehashingStarted and kvstoreDictRehashingCompleted, and therefore it is inconvenient to tell whether the rehashing dict is from DBs or from PUBSUB dicts. But I guess including the overhead of pubsub is somehow OK? Sometimes pubsub also takes a lot of memory and we need to check it.

And since databases_rehashing_dict_count may not be so useful and now we need to iterate all DBs to get the exact count, I temporarily dropped it.

Or we can maintain the overhead inside kvstore structure and when info is called, we gather the overhead from all DBs' kvstore. And since in this case we have to iterate all DBs, we can keep the databases_rehashing_dict_count.

src/kvstore.c Outdated Show resolved Hide resolved
src/object.c Outdated Show resolved Hide resolved
src/object.c Outdated Show resolved Hide resolved
src/server.h Show resolved Hide resolved
src/object.c Show resolved Hide resolved
@CharlesChen888
Copy link
Contributor Author

CharlesChen888 commented Feb 22, 2024

Just realized that in test "Redis can resize empty dict" in other.tcl, it is assumed that memory stats shows only the overhead of non-empty DBs. But now all DBs' overhead is displayed and so we need to change the test.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that in test "Redis can resize empty dict" in other.tcl, it is assumed that memory stats shows only the overhead of non-empty DBs. But now all DBs' overhead is displayed and so we need to change the test.

i see the test passes, what do you mean?
the deletion of this?

        # Set a key to enable overhead display of db 0
        r set a b

or do you mean we need another way to distinguish between empty and non empty dicts?

@CharlesChen888
Copy link
Contributor Author

CharlesChen888 commented Feb 22, 2024

i see the test passes, what do you mean?

@oranagra

The test should read overhead.hashtable.main of DB9 (the DB whose dict expanded and then emptied). In the test, it use to be that only DB9 has a key (r set a b) and so memory stats shows only DB9's overhead. In this case we only need to find the first overhead.hashtable.main field and read its value.

However in this PR getMemoryOverheadData is modified and memory stats shows not only non-empty DBs but all DBs, (#12913 (comment)), so we need to specify it is the DB9's overhead that we need to read.

The test would also pass if it reads other DB's overhead, but that's meaningless.

@oranagra
Copy link
Member

ohh, you mean that so far it just looked for any overhead.hashtable.main, and you now modified it to be db.9 overhead.hashtable.main.
ok, so that's solved then.

let's wait to see if @soloestoy has anything else to argue and then merge it.

src/object.c Outdated Show resolved Hide resolved
src/commands/memory-stats.json Outdated Show resolved Hide resolved
src/object.c Show resolved Hide resolved
@CharlesChen888
Copy link
Contributor Author

@soloestoy Any more problems?

Copy link
Collaborator

@soloestoy soloestoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra please approve the new naming database.dict.rehashing.count and the change about kvstoreMemUsage

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm ok with the new name.
please update the top comment, and prepare a redis-doc PR

src/object.c Show resolved Hide resolved
@soloestoy
Copy link
Collaborator

soloestoy commented Feb 29, 2024

My OCD is kicking... I feel it's best to add "db" to the names of these metrics to avoid the misunderstanding that they refer to all hash data types:

INFO MEMORY

  • mem_overhead_hashtable_rehashing -> mem_overhead_db_hashtable_rehashing

MEMORY STATS

  • overhead.hashtable.lut -> overhead.db.hashtable.lut
  • overhead.hashtable.rehashing -> overhead.db.hashtable.rehashing

@oranagra @madolson WDYT?

about the db prefix error, @CharlesChen888 could you change database.dict.rehashing.count to db.dict.rehashing.count and modify the regular expression in memory-stats.json to "^db\\.\\d+$", then try again?

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@soloestoy soloestoy merged commit 4cae99e into redis:unstable Mar 1, 2024
14 checks passed
soloestoy pushed a commit to redis/redis-doc that referenced this pull request Mar 1, 2024
The corresponding doc PR of redis/redis#12913
---------

Co-authored-by: Oran Agra <oran@redislabs.com>
@madolson
Copy link
Contributor

madolson commented Mar 1, 2024

Just retrospectively acknowledging that the updating wording seems better to me as well.

oranagra pushed a commit that referenced this pull request Mar 2, 2024
…#13103)

The new regular expression break the validator:
```
In file included from commands.c:10:
commands_with_reply_schema.def:14528:72: error: stray ‘\’ in program
14528 | struct jsonObjectElement MEMORY_STATS_ReplySchema_patternProperties__db\_\d+__properties_overhead_hashtable_main_elements[] = {
```

The reason is that special characters are not added to to_c_name,
causes special characters to appear in the structure name, causing
c file compilation to fail.

Broken by #12913
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants