Fork of pyuca, originally developed by James Tauber

branch: master

initial commit

latest commit 9a0ca45084
Denis Krienbühl authored December 16, 2011
README.md

Python Unicode Collation Algorithm (pyuca)

Originally developed by James Tauber, this module provides a limited way of sorting Unicode strings in the order humans expect.

I stumbled on this module while looking for a sorting solution for a Plone module. While pyuca is not as thorough as the full Unicode Collation Algorithm (UCA), it sorts better than Python's default sorted function, and it does so without relying on the locale module, which is not very useful in a web server environment because it isn't thread-safe.

In fact, the nice thing about pyuca is that it does not need to know about the language of the text (unlike locale). It simply provides a sort function relying on the Default Unicode Collation Element Table.
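
To make the idea of table-driven sorting concrete, here is a minimal sketch of a multi-level sort key. It is not pyuca's implementation: the table `TOY_TABLE`, the function `toy_sort_key` and every weight in it are invented for illustration and are not taken from allkeys.txt. It only shows the general principle that letters are compared first, accents second and case third.

```python
# Toy collation table: character -> (primary, secondary, tertiary) weights.
# All weights are invented for this example; they are NOT real DUCET values.
TOY_TABLE = {
    u"a": (1, 1, 1),
    u"c": (2, 1, 1),
    u"C": (2, 1, 2),       # same letter as "c"; differs only at the tertiary (case) level
    u"e": (3, 1, 1),
    u"\u00e9": (3, 2, 1),  # "é": same letter as "e"; differs only at the secondary (accent) level
    u"f": (4, 1, 1),
}


def toy_sort_key(text):
    """Build a key holding all primary weights, then all secondary, then all tertiary."""
    weights = [TOY_TABLE[ch] for ch in text]
    key = []
    for level in range(3):
        key.extend(w[level] for w in weights)
        key.append(0)  # level separator, lower than any weight in the table
    return tuple(key)


words = [u"Cafe", u"Caf\u00e9", u"Caff"]

# Default sort compares code points, so "é" (U+00E9) lands after "f".
by_code_point = sorted(words)

# Toy key compares letters first, so "Café" stays next to "Cafe".
by_toy_key = sorted(words, key=toy_sort_key)
```

Because the accent difference only appears at the secondary level, it matters only when two words have identical letters, which is why "Café" sorts directly after "Cafe" and before "Caff".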

I decided to put the module up on GitHub because the original on the author's site was unavailable. I notified the author, and I do not claim to have done any of the work :)

Installation

Simply run python setup.py install

Usage

  1. Get the element table from the following link:

    http://www.unicode.org/Public/UCA/latest/allkeys.txt

  2. Try it

    >>> words = [u'Cafe', u'Café', u'Caff']
    
    >>> from pyuca import Collator
    >>> c = Collator('allkeys.txt')
    
    >>> # standard sort
    >>> sorted(words)
    [u'Cafe', u'Caff', u'Café']
    
    >>> # pyuca sort
    >>> sorted(words, key=c.sort_key)
    [u'Cafe', u'Café', u'Caff']
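
Step 1 can also be scripted. The following is a small sketch, assuming Python 3's standard library; the helper name `fetch_table` and its parameters are hypothetical and not part of pyuca. It downloads the element table once and reuses the local copy afterwards.

```python
import os
from urllib.request import urlretrieve

# URL taken from step 1 above.
ALLKEYS_URL = "http://www.unicode.org/Public/UCA/latest/allkeys.txt"


def fetch_table(dest="allkeys.txt", url=ALLKEYS_URL):
    """Download the element table to `dest` unless a file is already there.

    Hypothetical helper for illustration; returns the local path.
    """
    if not os.path.exists(dest):
        urlretrieve(url, dest)  # network access happens only on the first call
    return dest
```

Assuming Collator accepts a file path as in the example above, the result can be passed straight in, for instance Collator(fetch_table()).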
    

More

Original post by James Tauber:

http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm/
