Globally Unique Identifiers for PMP Documents
Why Do We Need Globally (Universally) Unique Identifiers?
For any distributed computing environment, it is essential to use globally (universally) unique identifiers for the content/data items in the system. The intent is to allow participants in the system to uniquely identify data items without centralized coordination. (Wikipedia)
If we used the simple numeric identifiers that are traditional in database systems, identifiers would constantly collide when data items (such as news stories) are published to PMP. Nothing prevents Publisher A from using the same numeric identifier for a story that Publisher B uses for a completely different story. Resolving such conflicts would become a significant and unnecessary hurdle. Globally Unique Identifiers are a well-known, simple, and reliable solution to this problem.
A great feature of GUIDs is that, once we agree on a GUID-generation algorithm, publishers can generate identifiers locally (without central coordination) and still treat them as unique: with properly generated random GUIDs, the probability of a collision is low enough to be ignored in practice, which avoids integration conflicts.
For globally unique document identifiers, PMP uses UUID version 4 identifiers, represented as 32 hexadecimal digits with optional dashes after the 8th, 12th, 16th, and 20th digits (the 8-4-4-4-12 grouping).
ATTENTION: Publishers that generate identifiers locally are required to properly implement UUID v4 generation (per RFC 4122) and are requested to use cryptographically strong random number generation.
The UUID v4 generator that PMP uses is based on the following open-source code: https://github.com/broofa/node-uuid. It is RFC 4122 compliant and uses cryptographically strong random number generation.
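As an illustration of what local generation involves, here is a minimal Python sketch of UUID v4 generation. It is not the PMP generator (which is based on node-uuid); the function name generate_guid is hypothetical. It assumes the operating system's secure random source, reached through Python's standard secrets module, and sets the version and variant bits that RFC 4122 prescribes for version 4.

```python
import secrets
import uuid


def generate_guid() -> str:
    """Return a random (version 4) UUID as a dashed, lowercase string.

    Hypothetical helper for illustration; not part of any PMP SDK.
    """
    # 128 bits from the OS cryptographically strong random source.
    raw = bytearray(secrets.token_bytes(16))
    # Set the four version bits to 0100 (version 4).
    raw[6] = (raw[6] & 0x0F) | 0x40
    # Set the two variant bits to 10 (the RFC 4122 variant).
    raw[8] = (raw[8] & 0x3F) | 0x80
    return str(uuid.UUID(bytes=bytes(raw)))


if __name__ == "__main__":
    guid = generate_guid()
    parsed = uuid.UUID(guid)
    # Confirm the result is a version 4, RFC 4122 variant UUID.
    assert parsed.version == 4
    assert parsed.variant == uuid.RFC_4122
    print(guid)
```

In everyday Python code, uuid.uuid4() from the standard library produces the same kind of identifier; the longer form above only makes the version and variant bits visible.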
Formatting Notes for GUIDs in PMP
We accept GUIDs with or without dashes, normalizing them before validation.
We output GUIDs with dashes, for example: "d7c37b07-7cf2-4558-8b45-7e5694e818fe".
We store GUIDs with dashes, so that output code never has to check for them or re-insert them.
In search documents, we index GUIDs without dashes, so that we can use them in text fields.
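The formatting rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not PMP's actual code: the function names normalize_guid and index_form are hypothetical, and lowercasing the input and rejecting UUIDs that are not version 4 are assumptions rather than documented PMP behavior.

```python
import re
import uuid

# 32 hex digits with an optional dash after the 8th, 12th, 16th and 20th digit.
_GUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{4}-?[0-9a-f]{12}"
)


def normalize_guid(value: str) -> str:
    """Accept a GUID with or without dashes and return the dashed form.

    Hypothetical helper; lowercasing and the version check are assumptions.
    """
    candidate = value.strip().lower()
    if not _GUID_PATTERN.fullmatch(candidate):
        raise ValueError(f"not a well-formed GUID: {value!r}")
    parsed = uuid.UUID(hex=candidate.replace("-", ""))
    # Assumption: only RFC 4122 version 4 identifiers are accepted.
    if parsed.variant != uuid.RFC_4122 or parsed.version != 4:
        raise ValueError(f"not a UUID v4: {value!r}")
    return str(parsed)  # dashed form, used for storage and output


def index_form(value: str) -> str:
    """Return the dash-free form used when indexing search documents."""
    return normalize_guid(value).replace("-", "")


if __name__ == "__main__":
    dashed = normalize_guid("d7c37b077cf245588b457e5694e818fe")
    print(dashed)
    print(index_form(dashed))
```

Normalizing once on input means the stored value, the output value, and the indexed value are all derived from the same canonical dashed string.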