[NEW] Add rehashing info to command info #12302

JackyWoo · 2023-06-12T06:58:11Z

The problem/use-case that the feature addresses

I want to integrate Redis with ClickHouse (ClickHouse/ClickHouse#50150). However, the scan command (and the other scan variants) can return duplicate keys, so I need to filter out duplicates, and to do that I need to know whether duplicates may be present.

I plan to determine it as follows:

keep the previous scan result.
keep the previous scan state (whether Redis is rehashing).
get the current scan state.
if the previous state is false and the current state is true, filter duplicates across the previous scan result and the current scan result.
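
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: Redis does not currently expose a rehashing flag, so each batch is paired with a hypothetical boolean that stands in for the requested signal, and `filter_on_rehash_start` is a made-up helper name.

```python
from typing import Iterable, Iterator, List, Tuple

# One SCAN call's keys plus the (hypothetical) rehashing state seen for that call.
Batch = Tuple[List[str], bool]


def filter_on_rehash_start(batches: Iterable[Batch]) -> Iterator[str]:
    """Yield keys batch by batch, following the proposed rule literally.

    When the previous state was "not rehashing" and the current state is
    "rehashing", keys of the current batch that already appeared in the
    previous batch (or earlier in the current batch) are dropped.
    """
    prev_keys: List[str] = []
    prev_rehashing = False
    for keys, rehashing in batches:
        if not prev_rehashing and rehashing:
            seen = set(prev_keys)
            for key in keys:
                if key not in seen:
                    seen.add(key)
                    yield key
        else:
            # No transition observed: pass the batch through unchanged.
            yield from keys
        prev_keys, prev_rehashing = keys, rehashing
```

Note that this only compares adjacent batches at the moment rehashing starts, exactly as described in the steps; it makes no claim to catch every duplicate SCAN can return.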

I tried to find a way to determine the Redis scan state but could not find one (please correct me if I am wrong), so I think we should add it.

Describe the feature

Add a section to info.

info dictionary

And we will get

rehashidx:-1       // -1: not rehashing; 0-n: db is rehashing, and the value is the bucket index
scaling:0          // 0 for scaling, 1 for shrinking
ht_size_mask_0:7   // hash table 0 size mask
ht_size_mask_1:7   // hash table 1 size mask
ht_used_0:         // hash table 0 used bucket size
ht_used_1:         // hash table 1 used bucket size

Alternative way

Add a flag to the scan result that indicates whether the next scan may contain duplicated keys, so the application can use the flag directly.

The new scan result might look like:

redis 127.0.0.1:6379> scan 0
1) "17"
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
3) "0"

This is a stricter approach for this case.


JackyWoo · 2023-06-12T07:57:28Z

If the idea is ok, I'd like to submit a PR.

yossigo · 2023-06-12T11:28:28Z

@JackyWoo We should not expose or rely on such implementation details. Can you describe what you use the key names received by SCAN for and why duplicates are a problem?

JackyWoo · 2023-06-12T12:07:52Z

@yossigo Thanks for your reply.
I am trying to integrate Redis with ClickHouse, with Redis treated as a storage backend of ClickHouse. When I send a query like

select * from table_redis

I will get duplicated data.

yossigo · 2023-06-12T13:47:41Z

I don't have the context, but I imagine that even without duplicate keys a simple SCAN might not be what you're looking for. For example, during the scanning keys could be created and deleted, which means the resulting dataset will not represent a valid point in time.

JackyWoo · 2023-06-13T01:09:30Z

@yossigo Thanks for your advice.

ClickHouse is an OLAP database. I am just trying to use Redis as a storage backend for it. For a database, scanning the whole dataset is a common case. In Redis there are two ways to do that:

keys *
scan

Because keys * is too heavy, I chose scan.

Admittedly, multiple scans are not atomic and cannot represent a valid point in time. But users may be much more concerned about duplicated results, so I raised this issue.

I hope I have made the context clear.
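
For reference, duplicates can also be removed entirely on the client side, without any rehashing information from the server. The sketch below is a minimal illustration, not a proposal from this thread: `scan_unique` is a hypothetical helper, and it takes any callable that maps a cursor to a `(next_cursor, keys)` pair, which is the shape a single scan call returns.

```python
from typing import Callable, Iterator, List, Set, Tuple

# A function performing one SCAN step: cursor in, (next_cursor, keys) out.
ScanFn = Callable[[int], Tuple[int, List[str]]]


def scan_unique(scan: ScanFn) -> Iterator[str]:
    """Iterate a full SCAN pass, yielding each key name at most once."""
    seen: Set[str] = set()
    cursor = 0
    while True:
        cursor, keys = scan(cursor)
        for key in keys:
            if key not in seen:
                seen.add(key)
                yield key
        # A returned cursor of 0 marks the end of the iteration.
        if cursor == 0:
            break


# Stand-in for a server: two pages, with "key:8" returned twice.
pages = {0: (17, ["key:12", "key:8"]), 17: (0, ["key:8", "key:4"])}
unique_keys = list(scan_unique(lambda cursor: pages[cursor]))
```

With a real client, the callable would wrap one scan call, for example something like `lambda c: client.scan(cursor=c)` in redis-py (assumed here, not tested against a server). The trade-off is memory: the set holds every key name seen during the pass, which may matter for very large keyspaces.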
