[docs] postgres collation warning #1017

Merged: 1 commit, Nov 11, 2022


@lzap (Contributor) commented Nov 10, 2022

Check the Postgres collation of your Mastodon instance now. A (long) story about incorrectly ordered toots.

I spent three evenings investigating why my instance stopped updating notifications and statuses correctly. I figured out that the statuses were not gone, just ordered incorrectly. As if something had shaken them a bit, but not much, just a bit.

I was debugging goroutines and learning about Universally Unique Lexicographically Sortable Identifiers (ULIDs), the IDs this server assigns to statuses and other objects. No luck. This is what they look like, by the way:

01GHGAC5EHKSQQ0YRPXNWVZ7EJ
01GHGA78BHHQ8A3T6SFVYXAV4Y

These ULIDs are used as unique identifiers, and because they are lexicographically sortable (the leading characters encode a timestamp), Mastodon-compatible implementations take advantage of that and sort by this database column.
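To make the "lexicographically sortable" property concrete, here is a minimal Python sketch, not taken from the server's code. It assumes the standard ULID layout (the first 10 characters are a Crockford base32 timestamp in milliseconds); the helper name `ulid_timestamp` is made up for illustration. It checks that plain string order and decoded time order agree for the two IDs shown above.

```python
from datetime import datetime, timezone

# Crockford base32 alphabet used by ULIDs (it omits I, L, O and U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp(ulid: str) -> datetime:
    """Decode the first 10 characters of a ULID as milliseconds since the Unix epoch."""
    millis = 0
    for ch in ulid[:10]:
        millis = millis * 32 + CROCKFORD.index(ch)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# The two sample IDs from this post.
ids = ["01GHGAC5EHKSQQ0YRPXNWVZ7EJ", "01GHGA78BHHQ8A3T6SFVYXAV4Y"]

# Python compares strings by code point, which matches byte order for ASCII IDs.
by_string = sorted(ids)
by_time = sorted(ids, key=ulid_timestamp)

print("string order matches time order:", by_string == by_time)
for u in by_string:
    print(u, ulid_timestamp(u).isoformat())
```

This only holds when the comparison is a plain byte or code point comparison, which is exactly what a language-specific database collation does not guarantee.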

By now it might be clear, but it took me a while until I finally figured it out: I created my Postgres database on a system with the cs_CZ.UTF-8 locale. Therefore my database was created with the cs_CZ collation.

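See, in Czech we have one special letter, "CH", and in Czech collation it sorts between "H" and "I". So any ID containing "CH" is ordered as if that pair were a single letter after "H", not as "C" followed by "H". That was the problem, and this is the big lesson that I learned.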
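The effect can be reproduced without a database. The sketch below is a deliberately simplified model of Czech collation, not the real glibc or ICU rules: it only treats "CH" as one letter ranked between "H" and "I" and leaves everything else in ASCII order. The two IDs are made up for illustration, and `czech_like_key` is a hypothetical helper.

```python
# Simplified alphabet: digits, then A..H, then the digraph "CH", then I..Z.
# This is an assumption for illustration; real Czech collation has more rules.
ALPHABET = list("0123456789ABCDEFGH") + ["CH"] + list("IJKLMNOPQRSTUVWXYZ")
RANK = {symbol: position for position, symbol in enumerate(ALPHABET)}


def czech_like_key(s: str) -> list[int]:
    """Build a sort key in which the pair 'CH' counts as a single letter after 'H'."""
    key = []
    i = 0
    while i < len(s):
        if s[i:i + 2] == "CH":
            key.append(RANK["CH"])
            i += 2
        else:
            key.append(RANK[s[i]])
            i += 1
    return key


# Two made-up ULID-like IDs; the first is "older" in byte order (C < D).
ids = ["01GHGACH5EKSQQ0YRPXNWVZ7EJ", "01GHGAD78BHQ8A3T6SFVYXAV4Y"]

byte_order = sorted(ids)
czech_like_order = sorted(ids, key=czech_like_key)

print("byte order:      ", byte_order)
print("czech-like order:", czech_like_order)
print("orders differ:   ", byte_order != czech_like_order)
```

Only IDs that happen to contain "CH" are affected, which would explain why the statuses looked just slightly shuffled rather than completely scrambled.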

Always create the SQL database for a Mastodon instance with a "neutral" (English, none or C) collation: C.UTF-8. In the case of Postgres, what you need to do is:

CREATE DATABASE xxx WITH LOCALE 'C.UTF-8' TEMPLATE template0;

To check your collation on Postgres, run:

SELECT datcollate AS collation FROM pg_database WHERE datname = current_database();
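If you want to script the check, a small Python sketch along these lines could classify the value that query returns. The set of "byte order" collation names is an assumption on my side (the usual spellings of the C/POSIX locale), and `collation_is_byte_order` is a hypothetical helper. Other collations are not necessarily broken, they just deserve a closer look.

```python
# Collation names assumed to compare strings byte by byte (C/POSIX family).
BYTE_ORDER_COLLATIONS = {"C", "POSIX", "C.UTF-8", "C.utf8"}


def collation_is_byte_order(datcollate: str) -> bool:
    """Return True if the datcollate value names a C/POSIX-style collation."""
    return datcollate.strip() in BYTE_ORDER_COLLATIONS


# Example values as they might come back from the query above.
for name in ("C.UTF-8", "cs_CZ.UTF-8"):
    verdict = "ok" if collation_is_byte_order(name) else "check your ULID ordering"
    print(f"{name}: {verdict}")
```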

I suppose Czech is not the only language that can cause problems. Check your databases now! Boost it. Thanks! Have fun.

https://social.zapletalovi.com/@lukas/statuses/01GHHJQKMCGSB8TV1SMGE6JDM0

@tsmethurst (Contributor)

Thank you so much for your debugging work!

@tsmethurst tsmethurst merged commit b755906 into superseriousbusiness:main Nov 11, 2022
@lzap lzap deleted the pg-collate-doc branch November 11, 2022 10:13