[docs] postgres collation warning #1017
Merged
Check the Postgres collation of your Mastodon instance now! A story about incorrectly ordered toots (long).
I spent three evenings investigating why my instance had stopped updating notifications and statuses correctly. It turned out the statuses were not gone, just not ordered correctly, as if something had shaken them a bit. Not much, just a bit.
I was debugging goroutines and learning about the Universally Unique Lexicographically Sortable Identifier (ULID), the ID format my instance's server software uses for statuses (ActivityPub itself identifies objects by URI). No luck. This is what they look like, by the way:
01GHGAC5EHKSQQ0YRPXNWVZ7EJ
01GHGA78BHHQ8A3T6SFVYXAV4Y
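ULIDs are used as unique identifiers, and because they are lexicographically sortable (the first 10 characters encode the creation time in milliseconds), some Mastodon-compatible server implementations take advantage of that and order statuses by sorting on this database column.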
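To make the sorting claim concrete, here is a minimal Python sketch (not code from any server implementation) that decodes the timestamp part of the two example ULIDs above. It assumes the standard ULID layout: 10 Crockford base32 characters of millisecond timestamp followed by 16 characters of randomness. The helper name `ulid_timestamp_ms` is hypothetical. Because that alphabet happens to be in ascending ASCII order, a plain byte-order comparison, which is what the C collation does, orders ULIDs by creation time.

```python
# Crockford base32 alphabet used by ULIDs (it omits I, L, O and U).
# Its characters happen to be in ascending ASCII order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the 48-bit millisecond timestamp stored in the first 10 characters."""
    ms = 0
    for ch in ulid[:10].upper():
        ms = ms * 32 + CROCKFORD.index(ch)
    return ms


ids = ["01GHGAC5EHKSQQ0YRPXNWVZ7EJ", "01GHGA78BHHQ8A3T6SFVYXAV4Y"]

# Plain byte-order sort, which is what the C / C.UTF-8 collation does
# for ASCII strings...
by_bytes = sorted(ids, key=lambda s: s.encode("ascii"))
# ...matches sorting by the embedded creation timestamp.
by_time = sorted(ids, key=ulid_timestamp_ms)

print(by_bytes == by_time)  # True
print(by_bytes[0])          # 01GHGA78BHHQ8A3T6SFVYXAV4Y (the older one)
```

This only holds while the comparison is purely character-by-character; any collation that reinterprets character sequences breaks the link between string order and time order.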
It might seem obvious now, but jeez, it took me a while to finally figure it out: I had created my Postgres database on a system with the cs_CZ.UTF-8 locale, so the database was created with cs_CZ collation.
See, in Czech we have a special letter, "CH", and Czech collation sorts it between "H" and "I". ULIDs can contain the character pair "CH", so every ID containing it was sorted out of place. That was the problem, and this is the big lesson I learned.
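The effect can be reproduced with a small, self-contained Python sketch. It uses SQLite from the standard library instead of Postgres, and a hand-written collation called `czech_like` (hypothetical, as are `czech_like_key`, `czech_like_cmp` and the `demo_ids` table). It only models the "CH" rule, not the full cs_CZ collation. An ID with "CH" at some position lands after an ID with "D" at the same position, even though byte order (SQLite's default BINARY collation, like C/C.UTF-8 in Postgres) puts it first.

```python
import sqlite3

# Hypothetical, simplified Czech ordering: "CH" counts as one letter that
# sorts after "H" and before "I". Real cs_CZ collation (glibc or ICU) has
# many more rules; only this digraph is modelled here.
ORDER = list("0123456789ABCDEFGH") + ["CH"] + list("IJKLMNOPQRSTUVWXYZ")
RANK = {tok: i for i, tok in enumerate(ORDER)}


def czech_like_key(s: str) -> tuple:
    """Turn a string into a tuple of ranks, reading "CH" as a single letter."""
    s = s.upper()
    key, i = [], 0
    while i < len(s):
        if s.startswith("CH", i):
            key.append(RANK["CH"])
            i += 2
        else:
            key.append(RANK[s[i]])
            i += 1
    return tuple(key)


def czech_like_cmp(a: str, b: str) -> int:
    """SQLite collation callback: negative, zero or positive like strcmp."""
    ka, kb = czech_like_key(a), czech_like_key(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("czech_like", czech_like_cmp)
conn.execute("CREATE TABLE demo_ids (id TEXT PRIMARY KEY)")
# Two made-up, ULID-like IDs; in byte order "01CH..." comes first.
conn.executemany("INSERT INTO demo_ids VALUES (?)",
                 [("01D0000000",), ("01CH000000",)])

# SQLite's default BINARY collation compares bytes, like C / C.UTF-8.
binary = [r[0] for r in conn.execute("SELECT id FROM demo_ids ORDER BY id")]
czech = [r[0] for r in conn.execute(
    "SELECT id FROM demo_ids ORDER BY id COLLATE czech_like")]

print(binary)  # ['01CH000000', '01D0000000']
print(czech)   # ['01D0000000', '01CH000000']
```

SQLite is used here only because it ships with Python and lets a custom collation be plugged into `ORDER BY`; the point is that the same query returns a different order depending solely on the collation.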
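Always create the SQL database for Mastodon instances with a "neutral" collation such as C.UTF-8 (plain character order, no language-specific rules). In the case of Postgres, run (the LOCALE option needs PostgreSQL 13 or newer; older versions take LC_COLLATE and LC_CTYPE instead):
CREATE DATABASE xxx WITH LOCALE 'C.UTF-8' TEMPLATE template0;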
To check your collation in Postgres, run:
SELECT datcollate AS collation FROM pg_database WHERE datname = current_database();
I suppose Czech is not the only language that can cause problems like this. Check your databases now! Boost it. Thanks! Have fun.
https://social.zapletalovi.com/@lukas/statuses/01GHHJQKMCGSB8TV1SMGE6JDM0