Maltese MySQL Collation
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.gitignore
Makefile
README.md
install.py
test-table.sql
test.py
uninstall.py
utf8_maltese_ci-mysql_5.1.xml
utf8_maltese_ci-mysql_5.5.xml
utf8_maltese_ci-mysql_5.6.xml

README.md

Maltese MySQL Collation (utf8_maltese_ci)

By John J. Camilleri

About

This is a custom collation for MySQL which correctly sorts strings according to the Maltese alphabet:

a b ċ d e f ġ g għ h ħ i ie j k l m n o p q r s t u v w x ż z

MySQL's utf8_unicode_ci collation treats g and ġ etc. as interchangeable, and utf8_bin places ċ, ġ, ħ, ż after the letter z — neither of which is correct. This collation was written to provide a solution to this.

Notes

  • Prior to version MySQL 5.6, collations cannot handle double character sequences. So, is treated as two separate letters and is sorted after the character sequence gh, which is strictly incorrect. This also means that ie would be sorted after io, which again is wrong.

  • I have only tested this with data in Maltese and English. Compatibility with characters from other languages may not be what you expect. Let me know if you find issues when your data contains other characters which don't sort well with respect to Maltese under this collation.

Installation

Requirements

  • MySQL 5.1+
  • Root/administritive access on the machine where MySQL is running
  • Basic command line, text-editor, and MySQL skills

Installing the collation is relatively simple. It doesn't require recompiling anything, however you will require administrative access on your machine. Adding a new collation does not affect any existing tables; you need to explicitly specify the collation in your create statements and/or queries in order to benefit from it.

1. Preparation

  1. Identify your exact MySQL version with the command SHOW VARIABLES LIKE 'version'
  2. Find an available collation ID on your MySQL server by following the steps here: 5.1, 5.5, 5.6.
    The IDs I chose for the Maltese collation are 225 and 1356 for MySQL 5.1 and 5.5+ respectively. However you should make sure the chosen ID is not in use on your system by running the following:
    SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE ID=1356
    and making sure it turns up no results.
  3. Find the location of MySQL's Index.xml file on your system with the command
    SHOW VARIABLES LIKE 'character_sets_dir'
    On an Ubuntu system this returns /usr/share/mysql/charsets/

2. Installation

Script-based installation

An install script is provided which will automatically patch your Index.xml file for you (you will first need the details above). Here is an example of it in use:

$ make install
sudo ./install.py
[sudo] password for user: 
Script for installing utf8_maltese_ci in MySQL
MySQL version (5.5): ↵
Location of charsets file (/usr/share/mysql/charsets/Index.xml): ↵
Backup existing file [Y/n]? ↵
Backed up to Index.xml.bak
Done

Notes about the install script:

  • You will require Python ≥ 2.5
  • If you do not have Python's lxml library installed, you will lose any XML comments in your Index.xml file.

There is also a corresonding uninstall script, which can be invoked using make uninstall.

Manual installation

If the script above doesn't work for you (or you just want to do things manually) follow these steps:

  1. Open the Index.xml file in a text editor (you will need to be root).
  2. Copy the Maltese <collation ...> section from the correct utf8_maltese_ci-mysql_5.x.xml file for your version of MySQL.
  3. Paste the copied XML into the <charset name="utf8">...</charset> section of your Index.xml file.
  4. Save the file and exit.

3. Restart MySQL

On a Unix system you can usually restart MySQL in one of the following ways:

  • service mysql restart
  • service mysqld restart
  • /etc/init.d/mysql restart
  • /etc/init.d/mysqld restart

4. Testing

Quick test

To make sure the collation has been registered with MySQL, you can execute the supplied test.py script:

$ ./test.py
Script for checking that utf8_maltese_ci is registered in MySQL
MySQL hostname (localhost): ↵
Username (root): ↵
Password (): mypassword↵
OK

Alternatively, you can manually perform the quick test by executing the following query:
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME='utf8_maltese_ci'

Full test

To test that the collation actually behaves as correctly, one should try it out on some test data:

  1. Create a simple test table by running test-table.sql
  2. Run the following query and check how the results are sorted as you'd expect:
    SELECT s FROM maltese_collation_test ORDER BY s ASC COLLATE 'utf8_maltese_ci'