Skip to content

[RFC] Case-insensitive ordering — standardize approach across drivers #70

@michalbiarda

Description

@michalbiarda

Problem

When ordering entities by a text column, users expect alphabetical ordering to be case-insensitive — "apple" before "Banana" before "cherry". This works out of the box on MySQL/MariaDB (default utf8mb4_unicode_ci collation) but not on PostgreSQL, where the default collation puts uppercase before lowercase — "Banana" sorts before "apple".

Example: a product catalog ordered by name ASC produces different results depending on the driver, even with identical data.

Options considered

1. Wrap in LOWER() at query time
Simple, no schema changes. But breaks index usage — requires a separate functional index (LOWER(name)) to avoid full sequential scans on large tables.

2. Per-column ICU collation in migrations

CREATE COLLATION IF NOT EXISTS case_insensitive (
  provider = icu,
  locale = 'und-u-ks-level2',
  deterministic = false
);
ALTER TABLE products ALTER COLUMN name TYPE text COLLATE case_insensitive;

Indexes work transparently. Caveat: deterministic = false also makes equality comparisons case-insensitive (WHERE name = 'Apple' matches 'apple'), which may be undesirable.

3. Database-level collation
Must be set at CREATE DATABASE time — not practical for existing databases.

4. Denormalized sort column
Store a pre-normalized name_sort column. Best performance and most flexible, but doubles write burden and requires framework-level conventions.

Questions for discussion

  • Should the framework have an opinion here, or leave it to the application?
  • If we standardize on ICU collation: should the schema generator apply it automatically to string columns, or opt-in via an attribute (e.g. #[CaseInsensitive])?
  • Should query builders expose a caseInsensitive ordering flag that emits LOWER() for portability, leaving DB collation as an advanced opt-in?
  • How should this interact with WHERE clauses — do we want case-insensitive equality too, or only ordering?

Prior art

Django 4.2+ recommends ICU collations on PostgreSQL. Rails leans on DB collation. Doctrine leaves it to the user. None enforce a convention at the ORM level.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions