SORT: use strcoll() when appropriate. #1074

Closed
antirez opened this issue Apr 26, 2013 · 4 comments
@antirez

antirez commented Apr 26, 2013

If you search the Redis mailing list for the "change SORT order" thread, you'll find that Redis up to version 2.6 uses binary comparison between strings for everything except when BY and ALPHA are used together in SORT. In that case strcoll() is used, but since no locale is set, the default "C" locale applies and the effect is binary comparison anyway.

So the net result is that Redis always uses byte-wise lexicographic comparison. That's bad, since SORT is designed to return results that are ready to display to the user, so proper collation should be used.
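
A minimal C sketch of the point above, assuming nothing beyond the standard library: a process that never calls setlocale() starts in the "C" locale, where strcoll() orders strings the same way strcmp() does. The function names here are illustrative and are not taken from the Redis source.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Name of the collation locale currently in effect (query only, no change). */
const char *current_collate_locale(void) {
    return setlocale(LC_COLLATE, NULL);
}

/* Returns 1 when strcoll() and strcmp() agree on the order of a and b. */
int collation_matches_binary(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    int c = strcoll(a, b);
    int s = strcmp(a, b);
    /* Only the sign of each result is meaningful, so compare signs. */
    return ((c > 0) - (c < 0)) == ((s > 0) - (s < 0));
}
```

Once a different LC_COLLATE locale is activated, collation_matches_binary() may return 0 for some pairs, depending on the locale definitions installed on the system.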

The no-brainer would be to use strcoll() everywhere in the context of the SORT command, and use:

setlocale(LC_COLLATE,""); // in order to activate the environment locale for strcoll().

However, SORT + STORE is a write command, and as such should behave in a predictable way so that the AOF and the replication stream are consistent. There is no easy way to know whether a master and a slave are using the same collation, nor do I want to introduce a handshake for this, so the temporary solution for 2.8 could be the following:

  1. Use setlocale as specified above.
  2. Use strcoll() everywhere in the scope of SORT.
  3. However if STORE is used, sort just with strcmp() in a binary way.
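
The three steps above could be sketched roughly as follows. This is only an illustration of the proposal, not the code that ended up in Redis: `sort_alpha` and the `has_store` flag are hypothetical names, and the sketch assumes setlocale(LC_COLLATE,"") was already called once at startup as in step 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: plain byte order, independent of the locale. */
static int cmp_binary(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* qsort() comparator: order defined by the active LC_COLLATE locale. */
static int cmp_collate(const void *a, const void *b) {
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of C strings. With STORE (has_store != 0) the result must be
 * reproducible on every replica, so the locale is ignored. */
void sort_alpha(const char **items, size_t n, int has_store) {
    assert(items != NULL || n == 0);
    if (n == 0) return;
    qsort(items, n, sizeof *items, has_store ? cmp_binary : cmp_collate);
}
```

Calling sort_alpha(v, n, 1) gives the same order on any machine, while sort_alpha(v, n, 0) may differ between hosts with different locale settings.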

This at least means that for many uses (every time STORE is not involved) you can expect SORT + ALPHA to return results in the expected order.

Comments welcomed!

@ghost ghost assigned antirez Apr 26, 2013
@dspezia

dspezia commented Jul 10, 2013

If strcoll is used for sorting, what will happen with strings with \0 characters? Currently, most sort comparisons are done using compareStringObjects which results in sdscmp if both parameters are really strings. sdscmp is using memcmp which makes it binary safe.

We lose this property with strcoll/strcmp.

Now, does it make sense to sort strings containing binary data?
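
To make the binary-safety concern concrete, here is a small C sketch. `binary_safe_cmp` is a hypothetical stand-in written in the spirit of the memcmp-based comparison described above. It is not the actual sdscmp source.

```c
#include <assert.h>
#include <string.h>

/* Compare two buffers with explicit lengths: memcmp() over the common
 * prefix, and if that is equal the shorter buffer sorts first.
 * Embedded \0 bytes are treated like any other byte. */
int binary_safe_cmp(const char *a, size_t alen, const char *b, size_t blen) {
    assert(a != NULL && b != NULL);
    size_t minlen = alen < blen ? alen : blen;
    int cmp = memcmp(a, b, minlen);
    if (cmp != 0) return cmp;
    return (alen > blen) - (alen < blen);
}
```

With this helper, "a\0b" and "a\0c" (both 3 bytes long) compare as different, whereas strcoll() and strcmp() stop at the first \0 and see both as the string "a", so they report them as equal.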

@antirez

antirez commented Jul 12, 2013

Hello Didier,

IMHO the point is your last sentence, it does not make a lot of sense to sort binary data in this context, and this sorting is only done in read only mode, so no effect is created in the database. I expect that when somebody does a SORT ALPHA, the goal is to display the output to some kind of human being :-)

@antirez

antirez commented Jul 12, 2013

Fixed by

Closing.

@antirez antirez closed this as completed Jul 12, 2013
@tsee

tsee commented May 12, 2014

I'd like to see this get some further consideration. If I have binary data in a list, I do think it's perfectly legitimate to want to sort it. If I understand things correctly, right now there is no way to get totally predictable sort order out of "SORT mylist ..." if there's binary data. Using the perl Redis client (since I know first-hand it handles NULs correctly):

$ perl -MRedis -le '$r = Redis->new; $r->del("mylist"); $r->lpush("mylist", $_) for ("a", "b", "a\0b", "a\0c", "a\0a", "a\0d", "c"); print for $r->sort("mylist", "alpha");'
ad
aa
ac
ab
a
b
c

oranagra added a commit that referenced this issue Aug 21, 2022
Until now, Redis officially supported tuning the collation locale only via environment variables; see #1074.
But we had other requests to allow changing it at runtime; see #799 and #11041.

Note that `strcoll()` is used as the Lua comparison function and also for comparing
certain string objects in Redis. As a result, the ordering of some characters may
differ between regions (locales). Below is an example.
```
127.0.0.1:6333> SORT test alpha
1) "<"
2) ">"
3) ","
4) "*"
127.0.0.1:6333> CONFIG GET locale-collate
1) "locale-collate"
2) ""
127.0.0.1:6333> CONFIG SET locale-collate 1
(error) ERR CONFIG SET failed (possibly related to argument 'locale')
127.0.0.1:6333> CONFIG SET locale-collate C
OK
127.0.0.1:6333> SORT test alpha
1) "*"
2) ","
3) "<"
4) ">"
```
This can cause unexpected compatibility issues for Lua scripts and some
Redis commands. This commit adds a new config parameter to control the
locale, which only affects the `LC_COLLATE` category. The example above shows how it
affects the `SORT` command, and the one below shows its influence on Lua scripts.
```
127.0.0.1:6333> CONFIG GET locale-collate
1) "locale-collate"
2) "C"
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(nil)
127.0.0.1:6333> CONFIG SET locale-collate ""
OK
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(integer) 1
```

Co-authored-by: calvincjli <calvincjli@tencent.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
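
For readers unfamiliar with the underlying C call, the following sketch shows one way a runtime setting like this could be validated. It assumes the setting maps directly onto setlocale() for the LC_COLLATE category. `set_collate_locale` is a hypothetical helper, not the function used in the commit.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Try to switch only the collation category to the named locale.
 * Returns 1 on success and 0 when the C library rejects the name,
 * in which case the previous locale stays in effect. */
int set_collate_locale(const char *name) {
    assert(name != NULL);
    return setlocale(LC_COLLATE, name) != NULL;
}
```

Passing "C" always succeeds, and passing "" selects whatever the environment specifies. Whether any other name is accepted depends on the C library and the locales installed on the host, which would be consistent with the error shown for `CONFIG SET locale-collate 1` in the transcript above. Note that setlocale() changes process-wide state, so every later strcoll() call is affected.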
sundb added a commit to sundb/redis that referenced this issue Sep 7, 2022
commit bdf7696
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 19:50:14 2022 +0800

    Fix test fail in eextern test mode

commit 6ada91d
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 17:39:52 2022 +0800

    Optimize comment in test

commit b972e06
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 17:30:29 2022 +0800

    Fix crash due to wrongly split quicklist node

commit 8e51c95
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 16:01:08 2022 +0800

    Fix crash due to delete entry from  compress quicklist node

commit 9022375
Author: Ariel Shtul <ariel.shtul@redislabs.com>
Date:   Tue Aug 23 09:37:59 2022 +0300

    [PERF] use snprintf once in addReplyDouble (redis#11093)

    The previous implementation calls `snprintf` twice, the second time used to
    'memcpy' the output of the first, which could be a very large string.
    The new implementation reserves space for the protocol header ahead
    of the formatted double, and then prepends the string length ahead of it.

    Measured improvement of simple ZADD of some 25%.

commit 407b5c9
Author: Itamar Haber <itamar@redis.com>
Date:   Mon Aug 22 15:05:01 2022 +0300

    Replaces a made-up term with a real one (redis#11169)

commit a534983
Author: Itamar Haber <itamar@redis.com>
Date:   Sun Aug 21 18:15:53 2022 +0300

    Changes "lower" to "capital" in GEO units history notes (redis#11164)

    A overlooked mistake in the redis#11162

commit ca6aead
Author: yourtree <56780191+yourtree@users.noreply.github.com>
Date:   Sun Aug 21 22:55:45 2022 +0800

    Support setlocale via CONFIG operation. (redis#11059)


commit 31ef410
Author: Itamar Haber <itamar@redis.com>
Date:   Sun Aug 21 17:01:17 2022 +0300

    Adds historical note about lower-case geo units support (redis#11162)

    This change was part of redis#9656 (Redis 7.0)

commit c3a0253
Author: Wen Hui <wen.hui.ware@gmail.com>
Date:   Sun Aug 21 00:52:57 2022 -0400

    Add 2 test cases for XDEL and XGROUP CREATE command (redis#11137)

    This PR includes 2 missed test cases of XDEL and XGROUP CREATE command

    1. one test case: XDEL delete multiply id once
    2. 3 test cases:  XGROUP CREATE has ENTRIESREAD parameter,
       which equal 0 (special positive number), 3 and negative value.

    Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com>
    Co-authored-by: Oran Agra <oran@redislabs.com>
    Co-authored-by: Binbin <binloveplay1314@qq.com>
Mixficsol pushed a commit to Mixficsol/redis that referenced this issue Apr 12, 2023
yossigo added a commit to yossigo/redis that referenced this issue May 30, 2023
b6a052fe0 Helper for setting TCP_USER_TIMEOUT socket option (redis#1188)
3fa9b6944 Add RedisModule adapter (redis#1182)
d13c091e9 Fix wincrypt symbols conflict
5d84c8cfd Add a test ensuring we don't clobber connection error.
3f95fcdae Don't attempt to set a timeout if we are in an error state.
aacb84b8d Fix typo in makefile.
563b062e3 Accept -nan per the RESP3 spec recommendation.
04c1b5b02 Fix colliding option values
4ca8e73f6 Rework searching for openssl
cd208812f Attempt to find the correct path for openssl.
011f7093c Allow specifying the keepalive interval
e9243d4f7 Cmake static or shared (redis#1160)
1cbd5bc76 Write a version file for the CMake package (redis#1165)
6f5bae8c6 fix typo
acd09461d CMakeLists.txt: respect BUILD_SHARED_LIBS
97fcf0fd1 Add sdevent adapter
ccff093bc Bump dev version for the next release cycle.
c14775b4e Prepare for v1.1.0 GA
f0bdf8405 Add support for nan in RESP3 double (redis#1133)
991b0b0b3 Add an example that calls redisCommandArgv (redis#1140)
a36686f84 CI updates (redis#1139)
8ad4985e9 fix flag reference
7583ebb1b Make freeing a NULL redisAsyncContext a no op.
2c53dea7f Update version in dev branch.
f063370ed Prepare for v1.1.0-rc1
2b069573a CI fixes in preparation of release
e1e9eb40d Add author information to release-drafter template.
afc29ee1a Update for mingw cross compile
ceb8a8815 fixed cpp build error with adapters/libhv.h
3b15a04b5 Fixup of PR734: Coverage of hiredis.c (redis#1124)
c245df9fb CMake corrections for building on Windows (redis#1122)
9c338a598 Fix PUSH handler tests for Redis >= 7.0.5
6d5c3ee74 Install on windows fixes (redis#1117)
68b29e1ad Add timeout support to libhv adapter. (redis#1109)
722e3409c Additional include directory given by pkg-config (redis#1118)
bd9ccb8c4 Use __attribute__ when building with clang on windows
5392adc26 set default SSL certificate directory
560e66486 Minor refactor
d756f68a5 Add libhv example to our standard Makefile
a66916719 Add adapters/libhv
855b48a81 Fix pkgconfig for hiredis_ssl
79ae5ffc6 Fix protocol error (redis#1106)
61b5b299f Use a windows specific keepalive function. (redis#1104)
fce8abc1c Introduce .close method for redisContextFuncs
cfb6ca881 Add REDIS_OPT_PREFER_UNSPEC (redis#1101)
cc7c35ce6 Update documentation to explain redisConnectWithOptions.
bc8d837b7 fix heap-buffer-overflow (redis#957)
ca4a0e850 uvadapter: reduce number of uv_poll_start calls
35d398c90 Fix cmake config path on Linux. CMake config files were installed to `/usr/local/share/hiredis`, which is not recognizable by `find_package()`. I'm not sure why it was set that way. Given the commit introducing it is for Windows, I keep that behavior consistent there, but fix the rest.
10c78c6e1 Add possibility to prefer IPv6, IPv4 or unspecified
1abe0c828 fuzzer: No alloc in redisFormatCommand() when fail
329eaf9ba Fix heap-buffer-overflow issue in redisvFormatCommad
eaae7321c Polling adapter requires sockcompat.h
0a5fa3dde Regression test for off-by-one parsing error
9e174e8f7 Add do while(0) protection for macros
4ad99c69a Rework asSleep to be a generic millisleep function.
75cb6c1ea Do store command timeout in the context for redisSetTimeout (redis#593)
c57cad658 CMake: remove dict.c form hiredis_sources
8491a65a9 Add Github Actions CI workflow for hiredis: Arm, Arm64, 386, windows. (redis#943)
77e4f09ea Merge pull request redis#964 from afcidk/fix-createDoubleObject
9219f7e7c Merge pull request redis#901 from devnexen/illumos_test_fix
810cc6104 Merge pull request redis#905 from sundb/master
df8b74d69 Merge pull request redis#1091 from redis/ssl-error-ub-fix
0ed6cdec3 Fix some undefined behaviour
507a6dcaa Merge pull request redis#1090 from Nordix/subscribe-oom-error
b044eaa6a Copy error to redisAsyncContext when finding subscribe cb
e0200b797 Merge pull request redis#1087 from redis/const-and-non-const-callback
6a3e96ad2 Maintain backward compatibiliy withour onConnect callback.
e7afd998f Merge pull request redis#1079 from SukkaW/drop-macos-10.15-runner
17c8fe079 Merge pull request redis#931 from kristjanvalur/pr2
b808c0c20 Merge pull request redis#1083 from chayim/ck-drafter
367a82bf0 Merge pull request redis#1085 from stanhu/ssl-improve-options-setting
71119a71d Make it possible to set SSL verify mode
dd7979ac1 Merge pull request redis#1084 from stanhu/sh-improve-ssl-docs
c71116178 Improve example for SSL initialization in README.md
5c9b6b571 Release drafter
a606ccf2a CI: use recommended `vmactions/freebsd-vm@v0`
0865c115b Merge pull request redis#1080 from Nordix/readme-corrections
f6cee7142 Fix README typos
06be7ff31 Merge pull request redis#1050 from smmir-cent/fix-cmake-version
7dd833d54 CI: bump macos runner version
f69fac769 Drop `const` on redisAsyncContext in redisConnectCallback Since the callback is now re-entrant, it can call apis such as redisAsyncDisconnect()
005d7edeb Support calling redisAsyncDisconnect from the onConnected callback, by deferring context deletion
6ed060920 Add async regression test for issue redis#931
eaa2a7ee7 Merge pull request redis#932 from kristjanvalur/pr3
2ccef30f3 Add regression test for issue redis#945
4b901d44a Initial async tests
31c91408e Polling adapter and example
8a15f4d65 Merge pull request redis#1057 from orgads/static-name
902dd047f Merge pull request redis#1054 from kristjanvalur/pr08
c78d0926b Merge pull request redis#1074 from michael-grunder/kristjanvalur-pr4
2b115d56c Whitespace
1343988ce Fix typos
47b57aa24 Add some documentation on connect/disconnect callbacks and command callbacks
a890d9ce2 Merge pull request redis#1073 from michael-grunder/kristjanvalur-pr1
f246ee433 Whitespace, style
94c1985bd Use correct type for getsockopt()
5e002bc21 Support failed async connects on windows.
5d68ad2f4 Merge pull request redis#1072 from michael-grunder/fix-redis7-unit-tests
f4b6ed289 Fix tests so they work for Redis 7.0
95a0c1283 Merge pull request redis#1058 from orgads/win64
eedb37a65 Fix warnings on Win64
47c3ecefc Merge pull request redis#1062 from yossigo/fix-push-notification-order
e23d91c97 Merge pull request redis#1061 from yossigo/update-redis-apt
34211ad54 Merge pull request redis#1063 from redis/fix-windows-tests
9957af7e3 Whitelist hiredis repo path in cygwin
b455b3381 Handle push notifications before or after reply.
aed9ce446 Use official repository for redis package.
d7683f35a Merge pull request redis#1047 from Nordix/unsubscribe-handling
7c44a9d7e Merge pull request redis#1045 from Nordix/sds-updates
dd4bf9783 Use the same name for static and shared libraries
ff57c18b9 Embed debug information in windows static lib, rather than create a .pdb file
8310ad4f5 fix cmake version
7123b87f6 Handle any pipelined unsubscribe in async
b6fb548fc Ignore pubsub replies without a channel/pattern
00b82683b Handle overflows as errors instead of asserting
64062a1d4 Catch size_t overflows in sds.c
066c6de79 Use size_t/long to avoid truncation
c6657ef65 Merge branch 'redis:master' into master
50cdcab49 Fix potential fault at createDoubleObject
fd033e983 Remove semicolon after do-while in _EL_CLEANUP
664c415e7 Illumos test fixes, error message difference fot bad hostname test.

git-subtree-dir: deps/hiredis
git-subtree-split: b6a052fe0959dae69e16b9d74449faeb1b70dbe1
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this issue Jul 31, 2023