[NEW] Add rehashing info to command info #12302

Open
JackyWoo opened this issue Jun 12, 2023 · 5 comments

Comments

@JackyWoo

JackyWoo commented Jun 12, 2023

The problem/use-case that the feature addresses

I want to integrate Redis with ClickHouse (ClickHouse/ClickHouse#50150). However, the SCAN command (and the other SCAN-family commands) can return duplicate keys, so I need to filter them out, and to do that I need to know when duplicates may occur.

I plan to detect this as follows:

  1. keep the previous scan result.
  2. keep the previous scan state (whether Redis was rehashing).
  3. get the current scan state.
  4. if the previous state is false and the current state is true, filter out duplicates between the previous and current scan results.
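For comparison, a client can also filter duplicates without any rehashing state: remember every key already returned during the iteration and drop repeats. This is a swapped-in technique, not the plan above, and it matches the SCAN documentation's guidance that applications must handle duplicates themselves. The sketch assumes a hypothetical `scan_page(cursor)` callable that performs one SCAN call and returns `(next_cursor, keys)`; it is not tied to any particular Redis client library.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical signature: one SCAN call, cursor in -> (next cursor, batch of keys) out.
ScanPage = Callable[[int], Tuple[int, List[str]]]


def scan_unique(scan_page: ScanPage) -> Iterator[str]:
    """Run a full SCAN iteration and yield each key at most once."""
    seen = set()  # every key yielded so far; memory grows with the number of distinct keys
    cursor = 0
    while True:
        cursor, keys = scan_page(cursor)
        for key in keys:
            if key not in seen:
                seen.add(key)
                yield key
        if cursor == 0:  # a returned cursor of 0 means the full iteration is complete
            break


# Fake server for illustration: two pages, with "key:8" returned twice.
pages = {0: (17, ["key:12", "key:8"]), 17: (0, ["key:8", "key:4"])}
print(list(scan_unique(lambda c: pages[c])))  # ['key:12', 'key:8', 'key:4']
```

The trade-off is memory: the `seen` set holds every distinct key, which may be too large for a big keyspace. That is the cost the rehashing-state approach above tries to avoid.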

I tried to find a way to determine this state in Redis but could not (please correct me if I'm wrong), so I think we should add one.

Describe the feature

Add a new section to the INFO command, queried with:

info dictionary

which would return:

rehashidx:-1         // -1: not rehashing; 0..n: the db is rehashing and the value is the current bucket index
scaling:0            // 0 for expanding, 1 for shrinking
ht_size_mask_0:7     // hash table 0 size mask
ht_size_mask_1:7     // hash table 1 size mask
ht_used_0:           // number of used buckets in hash table 0
ht_used_1:           // number of used buckets in hash table 1
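Assuming the proposed section were returned in INFO's usual `field:value` line format, a client could derive the rehashing state used in steps 2 to 4 of the plan roughly as follows. The fields are the ones proposed above and do not exist in Redis today; the `# Dictionary` section header and the helper names are hypothetical.

```python
def parse_info_section(text: str) -> dict:
    """Parse INFO-style 'field:value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and '# Section' headers
            continue
        name, _, value = line.partition(":")  # split on the first ':' only
        fields[name.strip()] = value.strip()
    return fields


def is_rehashing(fields: dict) -> bool:
    # Per the proposal, rehashidx is -1 when not rehashing and a bucket index otherwise.
    return int(fields["rehashidx"]) != -1


def should_filter(prev_rehashing: bool, cur_rehashing: bool) -> bool:
    # Step 4 of the plan: filter only when rehashing started between two SCAN calls.
    return (not prev_rehashing) and cur_rehashing


# Hypothetical reply to the proposed "info dictionary" command.
sample = "# Dictionary\r\nrehashidx:5\r\nscaling:0\r\nht_size_mask_0:7\r\nht_size_mask_1:15\r\n"
fields = parse_info_section(sample)
print(is_rehashing(fields))                  # True
print(should_filter(False, is_rehashing(fields)))  # True
```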

Alternative approach

Add a flag to the SCAN reply that indicates whether the next SCAN call may return duplicate keys, so the application can act on the flag directly.

The new SCAN reply might look like this, with the flag as the third element:

redis 127.0.0.1:6379> scan 0
1) "17"
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
3) "0"

This is a stricter way to handle the case.

@JackyWoo
Author

If the idea is ok, I'd like to submit a PR.

@yossigo
Member

yossigo commented Jun 12, 2023

@JackyWoo We should not expose or rely on such implementation details. Can you describe what you use the key names received by SCAN for and why duplicates are a problem?

@JackyWoo
Author

@yossigo Thanks for your reply.
I am trying to integrate Redis with ClickHouse, with Redis treated as a storage backend for ClickHouse. When I send a query like

select * from table_redis

the result can contain duplicated data.

@yossigo
Member

yossigo commented Jun 12, 2023

I don't have the context, but I imagine that even without duplicate keys a simple SCAN might not be what you're looking for. For example, during the scanning keys could be created and deleted, which means the resulting dataset will not represent a valid point in time.

@JackyWoo
Author

JackyWoo commented Jun 13, 2023

@yossigo Thanks for your advice.

ClickHouse is an OLAP database, and I am trying to use Redis as a storage backend for it. For a database, scanning the whole dataset is a common operation. In Redis there are two ways to do it:

  1. keys *
  2. scan

Because keys * is too heavy (it blocks the server while it walks the entire keyspace), I chose scan.

It is true that multiple SCAN calls are not atomic and cannot represent a valid point in time. But users may care much more about duplicated results, so I raised this issue.

I hope I have made the context clear.

Labels
None yet
Projects
Status: Backlog
Development

No branches or pull requests

2 participants