Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix LC_COLLATE issue #176 #799

Closed
wants to merge 1 commit into from
Closed

Conversation

charsyam
Copy link
Contributor

sortCompare has a bug in sort.c

if server.sort_alpha is true, sort_alpha has nothing to do with sort_bypattern.

so it has to compare with strcoll instead of compareStringObject.

and add locale config parameter to set locale easily.

@charsyam charsyam mentioned this pull request Nov 29, 2012
@baina
Copy link

baina commented Nov 29, 2012

The Redis server itself stores all data as a binary objects, so it is not dependent on the encoding. The server will just store what is sent by the client (including UTF-8 chars).

Here are a few experiments:

$ echo téléphone | hexdump -C
00000000 74 c3 a9 6c c3 a9 70 68 6f 6e 65 0a |t..l..phone.|
c3a9 is the representation of the 'é' char.

$ redis-cli

set t téléphone
OK
get t
"t\xc3\xa9l\xc3\xa9phone"
Actually the data is correctly stored in the Redis server. However, when it is launched in a terminal, the Redis client interprets the output and applies the sdscatrepr function to transform non printable chars (whose definition is locale dependent, and may be broken for multibyte chars anyway).

A simple workaround is to launch redis-cli with the 'raw' option:

$ redis-cli --raw

get t
téléphone
Your own application will probably use one of the client libraries rather than redis-cli, so it should not be a problem in practice.

@charsyam
Copy link
Contributor Author

@bania. I don't think so. of course, redis server itself stores all data as a binary. but this problem is not about just storing.. for example. in Polish
redis> LPUSH test adam
(integer) 1
redis> LPUSH test łukasz
(integer) 2
redis> LPUSH test zuzanna
(integer) 3

redis> SORT test ALPHA

  1. "adam"
  2. "zuzanna"
  3. "\xc5\x82ukasz"

łukasz has to be ahead of zuzanna.
but SORT result is wrong. we expect below result.

redis> SORT test ALPHA

  1. "adam"
  2. "\xc5\x82ukasz"
  3. "zuzanna"

and my patch is about this problem :) Thank you.

@baina
Copy link

baina commented Nov 29, 2012

@charsyam You're right! Does your patch support redis hash sort(sorting by hash values)?
Thank you.

@charsyam
Copy link
Contributor Author

@baina I don't know. I just found miss cases of sortCompare in sortCommand.

@charsyam
Copy link
Contributor Author

charsyam commented Jan 9, 2013

@antirez I think this issue is obviously bug in special environment to use special letters.

@charsyam charsyam mentioned this pull request Nov 21, 2013
@yoav-steinberg
Copy link
Contributor

Was fixed with similar approach just using env variable instead of configuration. See 123b221.

oranagra added a commit that referenced this pull request Aug 21, 2022
Till now Redis officially supported tuning it via environment variable see #1074.
But we had other requests to allow changing it at runtime, see #799, and #11041.

Note that `strcoll()` is used as Lua comparison function and also for comparison of
certain string objects in Redis, which leads to a problem that, in different regions,
for some characters, the result may be different. Below is an example.
```
127.0.0.1:6333> SORT test alpha
1) "<"
2) ">"
3) ","
4) "*"
127.0.0.1:6333> CONFIG GET locale-collate
1) "locale-collate"
2) ""
127.0.0.1:6333> CONFIG SET locale-collate 1
(error) ERR CONFIG SET failed (possibly related to argument 'locale')
127.0.0.1:6333> CONFIG SET locale-collate C
OK
127.0.0.1:6333> SORT test alpha
1) "*"
2) ","
3) "<"
4) ">"
```
That will cause accidental code compatibility issues for Lua scripts and some
Redis commands. This commit creates a new config parameter to control the
local environment which only affects `Collate` category. Above shows how it
affects `SORT` command, and below shows the influence on Lua scripts.
```
127.0.0.1:6333> CONFIG GET locale-collate
1) " locale-collate"
2) "C"
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(nil)
127.0.0.1:6333> CONFIG SET locale-collate ""
OK
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(integer) 1
```

Co-authored-by: calvincjli <calvincjli@tencent.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
sundb added a commit to sundb/redis that referenced this pull request Sep 7, 2022
commit bdf7696
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 19:50:14 2022 +0800

    Fix test fail in eextern test mode

commit 6ada91d
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 17:39:52 2022 +0800

    Optimize comment in test

commit b972e06
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 17:30:29 2022 +0800

    Fix crash due to wrongly split quicklist node

commit 8e51c95
Author: sundb <sundbcn@gmail.com>
Date:   Tue Sep 6 16:01:08 2022 +0800

    Fix crash due to delete entry from  compress quicklist node

commit 9022375
Author: Ariel Shtul <ariel.shtul@redislabs.com>
Date:   Tue Aug 23 09:37:59 2022 +0300

    [PERF] use snprintf once in addReplyDouble (redis#11093)

    The previous implementation calls `snprintf` twice, the second time used to
    'memcpy' the output of the first, which could be a very large string.
    The new implementation reserves space for the protocol header ahead
    of the formatted double, and then prepends the string length ahead of it.

    Measured improvement of simple ZADD of some 25%.

commit 407b5c9
Author: Itamar Haber <itamar@redis.com>
Date:   Mon Aug 22 15:05:01 2022 +0300

    Replaces a made-up term with a real one (redis#11169)

commit a534983
Author: Itamar Haber <itamar@redis.com>
Date:   Sun Aug 21 18:15:53 2022 +0300

    Changes "lower" to "capital" in GEO units history notes (redis#11164)

    A overlooked mistake in the redis#11162

commit ca6aead
Author: yourtree <56780191+yourtree@users.noreply.github.com>
Date:   Sun Aug 21 22:55:45 2022 +0800

    Support setlocale via CONFIG operation. (redis#11059)

    Till now Redis officially supported tuning it via environment variable see redis#1074.
    But we had other requests to allow changing it at runtime, see redis#799, and redis#11041.

    Note that `strcoll()` is used as Lua comparison function and also for comparison of
    certain string objects in Redis, which leads to a problem that, in different regions,
    for some characters, the result may be different. Below is an example.
    ```
    127.0.0.1:6333> SORT test alpha
    1) "<"
    2) ">"
    3) ","
    4) "*"
    127.0.0.1:6333> CONFIG GET locale-collate
    1) "locale-collate"
    2) ""
    127.0.0.1:6333> CONFIG SET locale-collate 1
    (error) ERR CONFIG SET failed (possibly related to argument 'locale')
    127.0.0.1:6333> CONFIG SET locale-collate C
    OK
    127.0.0.1:6333> SORT test alpha
    1) "*"
    2) ","
    3) "<"
    4) ">"
    ```
    That will cause accidental code compatibility issues for Lua scripts and some
    Redis commands. This commit creates a new config parameter to control the
    local environment which only affects `Collate` category. Above shows how it
    affects `SORT` command, and below shows the influence on Lua scripts.
    ```
    127.0.0.1:6333> CONFIG GET locale-collate
    1) " locale-collate"
    2) "C"
    127.0.0.1:6333> EVAL "return ',' < '*'" 0
    (nil)
    127.0.0.1:6333> CONFIG SET locale-collate ""
    OK
    127.0.0.1:6333> EVAL "return ',' < '*'" 0
    (integer) 1
    ```

    Co-authored-by: calvincjli <calvincjli@tencent.com>
    Co-authored-by: Oran Agra <oran@redislabs.com>

commit 31ef410
Author: Itamar Haber <itamar@redis.com>
Date:   Sun Aug 21 17:01:17 2022 +0300

    Adds historical note about lower-case geo units support (redis#11162)

    This change was part of redis#9656 (Redis 7.0)

commit c3a0253
Author: Wen Hui <wen.hui.ware@gmail.com>
Date:   Sun Aug 21 00:52:57 2022 -0400

    Add 2 test cases for XDEL and XGROUP CREATE command (redis#11137)

    This PR includes 2 missed test cases of XDEL and XGROUP CREATE command

    1. one test case: XDEL delete multiply id once
    2. 3 test cases:  XGROUP CREATE has ENTRIESREAD parameter,
       which equal 0 (special positive number), 3 and negative value.

    Co-authored-by: Ubuntu <lucas.guang.yang1@huawei.com>
    Co-authored-by: Oran Agra <oran@redislabs.com>
    Co-authored-by: Binbin <binloveplay1314@qq.com>
Mixficsol pushed a commit to Mixficsol/redis that referenced this pull request Apr 12, 2023
Till now Redis officially supported tuning it via environment variable see redis#1074.
But we had other requests to allow changing it at runtime, see redis#799, and redis#11041.

Note that `strcoll()` is used as Lua comparison function and also for comparison of
certain string objects in Redis, which leads to a problem that, in different regions,
for some characters, the result may be different. Below is an example.
```
127.0.0.1:6333> SORT test alpha
1) "<"
2) ">"
3) ","
4) "*"
127.0.0.1:6333> CONFIG GET locale-collate
1) "locale-collate"
2) ""
127.0.0.1:6333> CONFIG SET locale-collate 1
(error) ERR CONFIG SET failed (possibly related to argument 'locale')
127.0.0.1:6333> CONFIG SET locale-collate C
OK
127.0.0.1:6333> SORT test alpha
1) "*"
2) ","
3) "<"
4) ">"
```
That will cause accidental code compatibility issues for Lua scripts and some
Redis commands. This commit creates a new config parameter to control the
local environment which only affects `Collate` category. Above shows how it
affects `SORT` command, and below shows the influence on Lua scripts.
```
127.0.0.1:6333> CONFIG GET locale-collate
1) " locale-collate"
2) "C"
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(nil)
127.0.0.1:6333> CONFIG SET locale-collate ""
OK
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(integer) 1
```

Co-authored-by: calvincjli <calvincjli@tencent.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
Till now Redis officially supported tuning it via environment variable see redis#1074.
But we had other requests to allow changing it at runtime, see redis#799, and redis#11041.

Note that `strcoll()` is used as Lua comparison function and also for comparison of
certain string objects in Redis, which leads to a problem that, in different regions,
for some characters, the result may be different. Below is an example.
```
127.0.0.1:6333> SORT test alpha
1) "<"
2) ">"
3) ","
4) "*"
127.0.0.1:6333> CONFIG GET locale-collate
1) "locale-collate"
2) ""
127.0.0.1:6333> CONFIG SET locale-collate 1
(error) ERR CONFIG SET failed (possibly related to argument 'locale')
127.0.0.1:6333> CONFIG SET locale-collate C
OK
127.0.0.1:6333> SORT test alpha
1) "*"
2) ","
3) "<"
4) ">"
```
That will cause accidental code compatibility issues for Lua scripts and some
Redis commands. This commit creates a new config parameter to control the
local environment which only affects `Collate` category. Above shows how it
affects `SORT` command, and below shows the influence on Lua scripts.
```
127.0.0.1:6333> CONFIG GET locale-collate
1) " locale-collate"
2) "C"
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(nil)
127.0.0.1:6333> CONFIG SET locale-collate ""
OK
127.0.0.1:6333> EVAL "return ',' < '*'" 0
(integer) 1
```

Co-authored-by: calvincjli <calvincjli@tencent.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants