When querying a contact in non-English language (UTF-8) it always treats it as case-sensitive #3

aldanor opened this Issue Apr 4, 2013 · 6 comments

2 participants

aldanor commented Apr 4, 2013

So, "abc" will trigger case-insensitive search, but "абв" will not - looks like there's a bug with checking capitalization properly.


That’s a locale-specific collation issue, I guess. Could you provide me with two bits of information?

Your locale settings, as returned by

env | grep -E 'LC_|LANG'

and a few example strings to test against (as I don’t speak any language using a non-Latin alphabet).

aldanor commented Apr 4, 2013

All of my locale variables except LC_ALL were already set to en_US.UTF-8, and I have now set LC_ALL to en_US.UTF-8 as well:

$ locale

However, the case-insensitive matching still doesn't seem to work (although it works out of the box e.g. with Python regex module with flags re.IGNORECASE | re.UNICODE).

Several examples: Путин Вор should be matched by both путин and вор, АБВГД should be matched by абвгд.
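
As a reference point for the expected behaviour, here is a minimal Python 3 sketch using the re flags mentioned above. The helper name `matches` is made up for illustration and is not part of the workflow.

```python
import re

def matches(query, contact):
    """Return True if query occurs in contact, ignoring case (Unicode-aware)."""
    # re.escape keeps the query literal; re.UNICODE is already the default
    # for str patterns in Python 3 and is spelled out only to mirror the flags above.
    pattern = re.escape(query)
    return re.search(pattern, contact, re.IGNORECASE | re.UNICODE) is not None

print(matches("путин", "Путин Вор"))
print(matches("вор", "Путин Вор"))
print(matches("абвгд", "АБВГД"))
```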

aldanor commented Apr 4, 2013


$ echo Путин Вор | awk '{print tolower($0)}'
Путин Вор

$ echo "Путин Вор" | tr '[A-Z]' '[a-z]'
Путин Вор

$ echo "Путин Вор" | perl -e 'print lc <>;'
Путин Вор

$ python -c "import sys; print sys.argv[1].lower()" "Путин Вор"
Путин Вор


$ python -c "import sys; print sys.argv[1].decode('UTF-8').lower()" "Путин Вор"
путин вор
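
The difference between the last two commands is bytes versus decoded text: lowercasing raw UTF-8 bytes only touches ASCII letters, while lowercasing decoded text works on code points. A rough Python 3 illustration of the same effect (the one-liners above are Python 2, so this is an equivalent sketch, not the same code):

```python
name = "Путин Вор"
raw = name.encode("utf-8")  # the bytes a byte-oriented tool sees

# bytes.lower() only maps ASCII A-Z, so the Cyrillic bytes come back unchanged.
unchanged = raw.lower() == raw

# str.lower() operates on Unicode code points and handles Cyrillic.
lowered = name.lower()

print(unchanged)
print(lowered)
```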

Related question on SO:

P.S. just in case, the Russian UTF-8 locale is ru_RU.UTF-8.

aldanor commented Apr 8, 2013

@kopischke Hi again!

I don't know if this will be helpful, but I rewrote the whole thing in Python, and it works perfectly with correct Unicode handling:

Usage: put the script in the workflow folder and replace ./numbers "{query}" with ./ "{query}".

It would be fairly simple to add advanced regex matching (mixing and matching first/last names and company/nickname in any order) and/or to sort results by match quality (i.e. full first/last name matches come first), and you could also easily implement fuzzy matching, much like Alfred does.
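
To make that concrete, here is a small Python 3 sketch of any-order matching with a crude quality ranking. It is not the script referred to above; `score` and `search` are hypothetical names, and the ranking weights are arbitrary assumptions.

```python
def score(query, name):
    """Rank a contact name against a query; lower is better, None means no match."""
    name_tokens = name.casefold().split()
    total = 0
    for q in query.casefold().split():
        if q in name_tokens:                                # whole-word match
            total += 0
        elif any(t.startswith(q) for t in name_tokens):     # prefix match
            total += 1
        elif any(q in t for t in name_tokens):              # substring match
            total += 2
        else:
            return None                                     # every query word must match
    return total

def search(query, contacts):
    """Return the matching contacts, best matches first."""
    hits = [(score(query, c), c) for c in contacts]
    hits = [(s, c) for s, c in hits if s is not None]
    return [c for s, c in sorted(hits, key=lambda pair: pair[0])]

contacts = ["Иван Воронов", "Путин Вор", "John Smith"]
print(search("вор", contacts))
print(search("SMITH john", contacts))
```

Query words are matched independently, so their order does not matter, and str.casefold() keeps the comparison case-insensitive for non-Latin alphabets as well.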

kopischke added a commit that closed this issue Apr 9, 2013
Set case comparison to use en_US.UTF-8 locale
Brings UTF-8 case awareness to string operations. Fixes #3
kopischke closed this in c89fa42 Apr 9, 2013

The issue is solvable in bash: all that was needed was setting the locale inside the Alfred script, as it otherwise defaults to “C”, where case comparison only works on the 7-bit ASCII range. Also: thanks for the suggested Python code, but it (like code in my preferred language, Ruby) is far too slow to be useful in a script filter. The current bash script beats it by a wide margin (on my elderly iMac, average lookup time is below 1 s on first run and below 0.5 s after that; the Python version takes over 4 s on first run and over 2.5 s after that).
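
For anyone launching such a script from their own code rather than from Alfred, the same idea (set the locale explicitly instead of inheriting “C”) can be sketched in Python 3 as below. This is not the bash fix itself; `utf8_env` and `run_filter` are hypothetical helpers, and the sketch assumes the en_US.UTF-8 locale is installed on the system.

```python
import os
import subprocess

def utf8_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment with LC_ALL forced to a UTF-8 locale."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = locale_name  # LC_ALL overrides LANG and the other LC_* variables
    return env

def run_filter(command, query):
    """Run a script filter (a list of argv strings) with the query appended."""
    result = subprocess.run(
        command + [query],
        env=utf8_env(),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

A call would look like `run_filter(["./numbers"], "путин")`, using the script name mentioned earlier in this thread.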


Fixed in release 1.1.0