Question: Using in SQLite #8

honzajavorek · 2023-07-08T20:36:49Z

SQLite offers a way to easily add custom functions written in Python. I wondered if I could register czech_sort.key as a custom function, because then I could have Czech sort on any field in my database with ease and portability. However, SQLite doesn't seem to be happy with tuples, raising an exception:

sqlite3.ProgrammingError: User-defined functions cannot return 'tuple' values to SQLite

I looked at the tuple if I can somehow convert it to one of the supported types, but I got scared, the tuple is enormous and undocumented.

Any ideas how this could be done, do I overlook something, is this out of scope, or better done different way?

The text was updated successfully, but these errors were encountered:

encukou · 2023-07-09T09:18:53Z

Yup, this isn't trivial: the way tuples sort matches the sorting rules nicely, and czech_sort takes advantage of that.

The tuple is documented in the code.

You could use something like this, but maybe key could be adapted to make things easier.

def bytes_key(string):
    def _recurse(key):
        match key:
            case tuple():
                # Turn individual items into bytes keys, and join them.
                # After each item, put a `1` byte if there are more items,
                # and a `0` byte if not.
                return b'\x01'.join(_recurse(e) for e in key) + b'\0'
            case str():
                # Encode to UTF-8, add a zero marker at the end.
                # The marker needs to be "smaller" than anything in the string,
                # including embedded `0` bytes.
                # So, the marker is a doubled `0` byte, and any `0` bytes in
                # the string are "escaped" to `0`-`1`.
                return key.encode().replace(b'\0', b'\0\x01') + b'\0\0'
            case int():
                if key < 0:
                    # Negative numbers are smallest, start with a `0` byte.
                    return bytes([0]) + key.to_bytes(8, signed=True)
                if key >= 254:
                    # Large numbers start with a full byte.
                    return bytes([255]) + key.to_bytes(8)
                # Small non-negative numbers use the remaining byte values.
                return (key + 1).to_bytes(1)
            case _:
                raise TypeError(key)
    return _recurse(czech_sort.key(string))

PR with tests welcome :)

encukou · 2023-07-11T18:07:07Z

Thanks! Fixed in #9, out in 1.1.0.

WARNING: Do not store the results of bytes_key in the database. The format can change in future versions of czech_sort.

honzajavorek · 2023-07-11T18:31:28Z

Amazing! Thanks! This will make something convoluted and diffucult very simple and portable.

Btw, I see you dropped Python v2. I think the README will need an update in the compatibility section.

honzajavorek pushed a commit to honzajavorek/czech-sort that referenced this issue Jul 9, 2023

serialize key as bytes, see encukou#8

9e7e22d

encukou closed this as completed Jul 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Using in SQLite #8

Question: Using in SQLite #8

honzajavorek commented Jul 8, 2023

encukou commented Jul 9, 2023

encukou commented Jul 11, 2023

honzajavorek commented Jul 11, 2023

Question: Using in SQLite #8

Question: Using in SQLite #8

Comments

honzajavorek commented Jul 8, 2023

encukou commented Jul 9, 2023

encukou commented Jul 11, 2023

honzajavorek commented Jul 11, 2023