
Using the Redis SCAN command is a bad idea; it does not work the way you expect... #1040

Closed
Zizitto opened this issue Dec 16, 2022 · 6 comments


Zizitto commented Dec 16, 2022

Example:
My Redis instance is also used as cache storage; it holds about 260k keys right now. The cached keys have long TTLs, so they are always present.
Your "addRecord" method always writes a new key (with a much lower TTL) to the end of the key list.

Your "getRecord" method uses a SCAN command with "COUNT" property.
By default "COUNT" property is set to 1000 (by you).

Probably you expect that COUNT property is used to limit the number of rows returned by SCAN command...
But actually, COUNT property limits the number of keys to check with MATCH pattern, by default it will scan only the first 1000 keys stored in Redis.
So it never gets to the rows written by "addRecord" method in my case.

Extending of count property value to something like "300000" is not an option, because it will impact SCAN performance.
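
For reference, the only way to make SCAN actually see every key is to keep feeding the returned cursor back in until it comes back as "0". Here is a rough sketch of such a loop (assuming an ioredis client; the helper name is mine), which also shows why a full scan over a 260k-key instance is expensive:

```ts
import Redis from 'ioredis';

const redis = new Redis();

// Rough sketch: a *complete* SCAN has to loop until the cursor returns
// to "0". COUNT only hints how many keys each iteration examines, so
// the total work still grows with the whole keyspace (260k keys here).
async function scanAll(pattern: string, count = 1000): Promise<string[]> {
  const keys: string[] = [];
  let cursor = '0';
  do {
    const [nextCursor, batch] = await redis.scan(
      cursor,
      'MATCH',
      pattern,
      'COUNT',
      count,
    );
    cursor = nextCursor;
    keys.push(...batch);
  } while (cursor !== '0');
  return keys;
}
```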

I would highly recommend replacing the SCAN command with something else.

Also, I would highly recommend avoiding this package until this issue is fixed.
In my case, the throttler was not working in the production environment for a very long time, until I found this issue.


Zizitto commented Dec 16, 2022

Here is a quick proof. I have 6 keys in the DB:

```
localhost:6379> KEYS *
1) "kt-apiv2-develop:develop-240446:graphql_schema_shop"
2) "kt-apiv2-test:test-240258:variants-expired-start-date"
3) "kt-apiv2-develop:develop-240446:variants-expired-start-date"
4) "kt-apiv2-test:test-240258:graphql_schema_admin"
5) "kt-apiv2-develop:develop-240446:graphql_schema_admin"
6) "kt-apiv2-test:test-240258:graphql_schema_shop"
```

Pattern "kt-apiv2-test:test-240258:*" matches with 3 keys, (2, 4, and 6).
Let`s use a scan command with COUNT "2"

```
localhost:6379> scan 0 MATCH "kt-apiv2-test:test-240258:*" COUNT 2
1) "12"
2) 1) "kt-apiv2-test:test-240258:variants-expired-start-date"
```

As you can see, the result has only one row, because the scan processed only the first 2 keys ("kt-apiv2-develop:develop-240446:graphql_schema_shop" and "kt-apiv2-test:test-240258:variants-expired-start-date").

Let's increase COUNT to 4:

```
localhost:6379> scan 0 MATCH "kt-apiv2-test:test-240258:*" COUNT 4
1) "14"
2) 1) "kt-apiv2-test:test-240258:variants-expired-start-date"
   2) "kt-apiv2-test:test-240258:graphql_schema_admin"
```

Now 2 rows are returned, because the first 4 keys were processed this time.
The third matching key is still not returned.
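
To actually reach the third match, the returned cursor has to be fed back into SCAN until it comes back as 0 (the exact cursor values and batch contents will vary per instance):

```
localhost:6379> scan 14 MATCH "kt-apiv2-test:test-240258:*" COUNT 4
localhost:6379> scan <returned-cursor> MATCH "kt-apiv2-test:test-240258:*" COUNT 4
...repeat until the returned cursor is "0"
```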


Zizitto commented Dec 17, 2022

@kkoomen Please review my pull request, which replaces the usage of the SCAN command with sets:

#1041


kkoomen commented Dec 17, 2022

@Zizitto Hi. Let me start off by answering some of your questions and then discuss your suggestions further:

> You probably expect the COUNT option to limit the number of rows returned by the SCAN command... but COUNT actually limits the number of keys that are checked against the MATCH pattern, so by default the scan only covers the first 1000 keys stored in Redis. In my case it never gets to the rows written by the "addRecord" method.

When implementing this, I was aware of that. I added the option to adjust the SCAN limit because it varies between applications: I've worked with applications where 1000 is more than enough, and I once got a feature request to support 100k or more, so I let people adjust the value. Nonetheless, I am aware that this can raise performance issues.

However, I haven't used Redis clusters and personally don't need them. If people want support for this, they can open a PR and test it with their own database. I know clusters are a headache, so this requires thorough testing as well.

Your database would be a good test subject, so if you are willing to support this plugin with your PR, I'd highly appreciate it; having support for clusters is a nice milestone, I'd say.

FYI: I drafted a possible implementation for Redis instances as well as clusters in #1038. Please have a look at that and let me know, because it might be useful to have a custom class for cluster configuration; but if that isn't needed (as in your PR #1041), then of course I prefer that version.

Unfortunately, my knowledge of clusters isn't sufficient to judge which code could potentially work, so I might lean towards your suggestion if you've tested it thoroughly and are confident about your solution. Whatever implementation we agree on, it will need proper tests in some form.

@kkoomen kkoomen pinned this issue Dec 17, 2022

Zizitto commented Dec 18, 2022

Hi @kkoomen.

About your changes in #1038:
It looks like you changed the "addRecord" method's arguments; that is a breaking change (at least for me).
I will not be able to use this storage with the plugin I'm using now.

Your cluster changes don't help me, because it would be far more valuable to fix the core issue: performance. As far as I can see, the SCAN command is still used; you only wrapped it in a "while" loop, so the performance of the "addRecord" method is now even more dependent on the number of keys stored in the DB.

The performance of sets will be much better.
I've already tested my implementation with Docker Redis and GCP Redis Memorystore.
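
To illustrate the idea (a simplified sketch, not the exact code in #1041; the method shapes mirror the storage interface, but the details are mine): each throttler key gets its own sorted set of hit records, scored by expiration time, so reads and writes touch only that one key instead of scanning the whole keyspace.

```ts
import Redis from 'ioredis';

const redis = new Redis();

// Simplified sketch of a set-based record store: one sorted set per
// throttler key, each member being a hit whose score is its expiry time.
async function addRecord(key: string, ttlMs: number): Promise<void> {
  const expiresAt = Date.now() + ttlMs;
  // Unique member so concurrent hits within the same ms don't collide.
  await redis.zadd(key, expiresAt, `${expiresAt}:${Math.random()}`);
  // Let the whole set expire once no hits remain.
  await redis.pexpire(key, ttlMs);
}

async function getRecord(key: string): Promise<number[]> {
  const now = Date.now();
  // Drop hits that have already expired, then return the rest.
  await redis.zremrangebyscore(key, 0, now);
  const members = await redis.zrange(key, 0, -1);
  return members.map((m) => parseInt(m.split(':')[0], 10));
}
```

With this shape, the cost of both operations depends only on the number of records under a single key, not on the 260k cached keys sitting next to them.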

FYI: my PR also contains a breaking change: I removed the "scanCount" property from the constructor, which will raise an exception for people who still pass this argument after upgrading to the new version. Please note that...

I've also cleaned up a few code-style issues. Please review my PR one more time.


kkoomen commented Dec 18, 2022

Hi @Zizitto

> About your changes in #1038: it looks like you changed the "addRecord" method's arguments; that is a breaking change (at least for me). I will not be able to use this storage with the plugin I'm using now.
>
> Your cluster changes don't help me, because it would be far more valuable to fix the core issue: performance. As far as I can see, the SCAN command is still used; you only wrapped it in a "while" loop, so the performance of the "addRecord" method is now even more dependent on the number of keys stored in the DB.
>
> The performance of sets will be much better. I've already tested my implementation with Docker Redis and GCP Redis Memorystore.

You're right. Thanks for clarifying.

> FYI: my PR also contains a breaking change: I removed the "scanCount" property from the constructor, which will raise an exception for people who still pass this argument after upgrading to the new version. Please note that...

That's not a problem, because the next version bump will be a minor version number.


kkoomen commented Dec 22, 2022

Resolved by #1041

@kkoomen kkoomen closed this as completed Dec 22, 2022
@kkoomen kkoomen unpinned this issue Dec 26, 2022