Support for collating function (sequence)? #767

zingchen · 2021-07-27T02:23:52Z

The current version (v1.6) represents collating functions as an enum of 3 values: binary, nocase and rtrim, which corresponds to the 3 pre-built collating functions (also known as collating sequences in SQlite terminology). For many languages, the pre-built collating functions don't work.

Let's consider for example Czech. Two Czech words are compared lexicographically, i.e. letter-by-letter. However, the letter ch (pronounced kh) is represented as a sequence of two characters (Unicode codepoints) c and h, and takes place between h and i in the Latin alphabet. Thus Christina (Czech version of Christine, Khristina) is an eight-letter word, and takes place after Hana and before Klara.

Another example is Vietnamese. A diacriticized letter, e.g. é, ế, is represented as either a single pre-composed character or a sequence of two characters, the un-diacriticized letter (e, ê) followed by the diacritical mark. (Note that ê may be considered diacriticized letter in other contexts.) Diacritical marks represent tone of words. Every word has zero or one diacritical mark. For comparison between two words, the words are first translated into un-diacriticized form, possibly with extra traling spaces so as they have the same length, followed by the diacritical mark. The two words, in translated form, are then compared lexicographically. (In other words, a diacritical mark is always compared against a diacritical mark and such comparison takes place if and only if the un-diacriticized version of the words equal.)

The text was updated successfully, but these errors were encountered:

fnc12 · 2021-07-27T03:43:44Z

Ok you've described the problem and what solution do you offer? How would you solve this issue with raw SQLite? Why don't you use custom collations instead of built-in?

zingchen · 2021-07-27T04:39:10Z

With raw SQLite I'd register custom collating function myCompare() under name "MYCOMPARE" by a C API call to sqlite3_create_collation(myDbConnection, "MYCOMPARE", SQLITE_UTF8, NULL, myCompare), then write COLLATE MYCOMPARE in queries.

So, I think that SQLite-ORM should provide a wrapper for sqlite3_create_collation() which, besides others, maps the collating function (myCompare) to its name ("MYCOMPARE") and a version of collate_xxxx() family, e.g. collate_custom() that uses the mapping to translate the function's address to name for constructing the COLLATE clause.

fnc12 · 2021-07-27T04:42:11Z

there is custom collation support. Just call storage.create_collation(...)

zingchen · 2021-07-27T04:58:20Z

OOPS, I must have mis-read the source code. I'm sorry.

Any hint on how to specify the collating_function myCompare to construct a query? I've searched the examples but didn't find one using collating_function or create_collation.

Thanks.

fnc12 · 2021-07-27T05:07:53Z

auto rows = storage.select(&User::name, where(is_equal(&User::name, "Mercury").collate("ototo")));

zingchen · 2021-07-27T05:48:23Z

Thank you! Out of curiosity, why "ototo" and not &ototo or simply ototo?

The later seem to be more consistent with select(&User::name,...) and collate(nocase) syntax, which is one of the very principles of SQLite-ORM.

fnc12 · 2021-07-27T08:45:48Z

@zingchen several reasons:

custom collation is not very often used feature so it is not bad if you can inject custom string using .collate(...) member function
I was young and silly 😅

If you want I can add strong typed API just like user defined functions have

zingchen · 2021-07-27T13:43:47Z

Please please do it. As ototo is treated equally to nocase and rtrim in the SQLite language, it'd better be so in C++. After all, that's why we love SQLite-ORM. Right?

fnc12 · 2021-07-27T16:52:05Z

Probably I did not understand you quite right. Do you mean that custom collation feature must have a strong typed API just like user defined functions feature has to eliminate the opportunity of typing collation name as string at many places? Like that:

struct MyUnicodeCollation {
    int operator()(int leftLength, const void *lhs, int rightLength, const void *rhs) const {
        //  cute comparison code here..
    }

    static std::string_view name() {
        return "MY_UNICODE_COLLATION";
    }
};
storage.create_collation<MyUnicodeCollation>();
auto rows = storage.select(&User::name, where(is_equal(&User::name, "Mercury").collate<MyUnicodeCollation>()));

?

zingchen · 2021-07-27T17:18:41Z

Yes. The syntax collate<MyUnicodeCollation>() would be the most intuitive one from user's perspective, as it resembles the pattern collate_binary(), collate_nocase() and collate_rtrim().

fnc12 · 2021-07-27T17:19:39Z

Ok I'll add it to the TODO list

zingchen · 2021-07-27T17:21:08Z

Thank you very much!

fnc12 · 2021-07-27T17:24:21Z

You are welcome! https://github.com/fnc12/sqlite_orm/blob/dev/TODO.md please review the latest line in the list. Is the issue actual?

zingchen · 2021-07-27T17:50:57Z

Not so actual, compared to other items in the list. I can live with collate("MY_UNICODE_COLLATION") for now.

Some years ago, a student, who took an intro Ada course from me, told me that he prefers C++ to Ada because operator + (int x, int y) "is better" than function "+" (x, y : integer). The difference is + vs "+". I disagree with him, but since then I understand why Ada failed.

fnc12 · 2021-07-27T17:53:08Z

Cool story!

fnc12 · 2021-08-07T19:06:15Z

@zingchen new template API for custom collations is here #771

fnc12 · 2021-08-08T04:06:30Z

@zingchen merged. Please check it out

zingchen · 2021-08-08T11:22:14Z

GCC encountered errors trying to compile the example collate.cpp with the new sqlite_orm.h (from the dev branch). Here are some errors.

C:\C-lib\sqlite_orm\include/sqlite_orm/sqlite_orm.h:5529:15: error: '::strncpy' has not been declared
 5529 |             ::strncpy(dataCopy, stringChars, stringDataLength + 1);
      |               ^~~~~~~
C:\C-lib\sqlite_orm\include/sqlite_orm/sqlite_orm.h:5431:1: note: 'strncpy' is defined in header '<cstring>'; did you forget to '#include <cstring>'?
 5430 | #include <locale>  //  std::wstring_convert
  +++ |+#include <cstring>
 5431 | 

C:\C-lib\sqlite_orm\include/sqlite_orm/sqlite_orm.h: In member function 'std::tuple<const char*, int> sqlite_orm::statement_binder<V, typename std::enable_if<(std::is_same<V, std::__cxx11::basic_string<char> >::value || std::is_same<V, const char*>::value), void>::type>::string_data(const char*) const':
C:\C-lib\sqlite_orm\include/sqlite_orm/sqlite_orm.h:5542:31: error: there are no arguments to 'strlen' that depend on a template parameter, so a declaration of 'strlen' must be available [-fpermissive]
 5542 |             auto length = int(strlen(s));
      |                               ^~~~~~
C:\C-lib\sqlite_orm\include/sqlite_orm/sqlite_orm.h:5542:31: note: (if you use '-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)

C:\C-lib\sqlite_orm\include/sqlite_orm/sqlite_orm.h:5542:37: error: 'strlen' was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
 5542 |             auto length = int(strlen(s));
      |                               ~~~~~~^~~
In file included from C:/msys64/mingw64/include/c++/10.3.0/cstring:42,
                 from C:\C-lib\sqlite_orm\include/sqlite_orm/sqlite_orm.h:5741,
                 from D:\duong\testCpp\test1\sqlite_orm_examples\collate.cpp:6:
C:/msys64/mingw64/x86_64-w64-mingw32/include/string.h:64:18: note: 'size_t strlen(const char*)' declared here, later in the translation unit
   64 |   size_t __cdecl strlen(const char *_Str);
      |                  ^~~~~~
Process terminated with status 1 (0 minute(s), 1 second(s))
4 error(s), 12 warning(s) (0 minute(s), 1 second(s))

fnc12 · 2021-08-08T18:59:06Z

@zingchen oh damn! The fix is here #772

zingchen · 2021-08-09T01:12:30Z

It works flawlessly now.

(A variation of collate.cpp with added example for the new construct collate<CustomCol>().)

#include <ctime>

#include <string>
#include <iostream>

#include <sqlite_orm/sqlite_orm.h>

using std::cout;
using std::endl;

struct User {
    int id;
    std::string name;
    time_t createdAt;
};

struct Foo {
    std::string text;
    int baz;
};


int main(int, char **) {

    using namespace sqlite_orm;

    // custom collation
    struct CustomCol {
        int operator()(int leftLength, const void* lhs, int rightLength, const void* rhs) const {
            std::cout << static_cast<const char*>(lhs) << ":" << static_cast<const char*>(rhs) << std::endl;
            return 0;
        }

        static const char* name() {
            return "CUSTOMCOL";
        }
    };

    auto storage = make_storage(
        "collate_custom.sqlite",
        make_table("users",
                   make_column("id", &User::id, primary_key()),
                   make_column("name", &User::name),
                   make_column("created_at", &User::createdAt)),
        make_table("foo", make_column("text", &Foo::text, collate_nocase()), make_column("baz", &Foo::baz)));
    storage.sync_schema();
    storage.remove_all<User>();
    storage.remove_all<Foo>();

    storage.insert(User{0, "Lil Kim", std::time(nullptr)});
    storage.insert(User{0, "lil kim", std::time(nullptr)});
    storage.insert(User{0, "Nicki Minaj", std::time(nullptr)});

    storage.create_collation<CustomCol>();

    //  SELECT COUNT(*)
    //  FROM users
    //  WHERE name = 'lil kim'
    auto preciseLilKimsCount = storage.count<User>(where(is_equal(&User::name, "lil kim")));
    cout << "preciseLilKimsCount = " << preciseLilKimsCount << endl;

    //  SELECT COUNT(*) FROM users WHERE name = 'lil kim' COLLATE NOCASE
    auto nocaseCount = storage.count<User>(where(is_equal(&User::name, "lil kim").collate_nocase()));
    cout << "nocaseCount = " << nocaseCount << endl;

    //  SELECT COUNT(*) FROM users WHERE name = 'lil kim' COLLATE CUSTOMCOL
    auto customCount = storage.count<User>(where(is_equal(&User::name, "lil kim").collate<CustomCol>()));
    cout << "customCount = " << customCount << endl;


    //  SELECT COUNT(*) FROM users
    cout << "total users count = " << storage.count<User>() << endl;

    storage.insert(Foo{"Touch", 10});
    storage.insert(Foo{"touch", 20});

    cout << "foo count = " << storage.count<Foo>(where(c(&Foo::text) == "touch")) << endl;

    //  SELECT id
    //  FROM users
    //  ORDER BY name COLLATE RTRIM ASC
    auto rows = storage.select(&User::id, order_by(&User::name).collate_rtrim().asc());
    cout << "rows count = " << rows.size() << endl;

    return 0;
}

Thank you very much.

fnc12 · 2021-08-09T04:16:13Z

@zingchen you are welcome!

fnc12 assigned zingchen Jul 27, 2021

fnc12 added the question label Jul 27, 2021

zingchen changed the title ~~Incomplete support for collating function (sequence)~~ Support for collating function (sequence)? Jul 27, 2021

fnc12 added the in progress label Aug 7, 2021

fnc12 closed this as completed Aug 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for collating function (sequence)? #767

Support for collating function (sequence)? #767

zingchen commented Jul 27, 2021 •

edited

Loading

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021 •

edited

Loading

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021 •

edited

Loading

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

fnc12 commented Aug 7, 2021

fnc12 commented Aug 8, 2021

zingchen commented Aug 8, 2021 •

edited

Loading

fnc12 commented Aug 8, 2021

zingchen commented Aug 9, 2021

fnc12 commented Aug 9, 2021

Support for collating function (sequence)? #767

Support for collating function (sequence)? #767

Comments

zingchen commented Jul 27, 2021 • edited Loading

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021 • edited Loading

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021 • edited Loading

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

zingchen commented Jul 27, 2021

fnc12 commented Jul 27, 2021

fnc12 commented Aug 7, 2021

fnc12 commented Aug 8, 2021

zingchen commented Aug 8, 2021 • edited Loading

fnc12 commented Aug 8, 2021

zingchen commented Aug 9, 2021

fnc12 commented Aug 9, 2021

zingchen commented Jul 27, 2021 •

edited

Loading

zingchen commented Jul 27, 2021 •

edited

Loading

zingchen commented Jul 27, 2021 •

edited

Loading

zingchen commented Aug 8, 2021 •

edited

Loading