Globally Unique Identifiers for PMP Documents

inadarei edited this page Apr 28, 2013 · 11 revisions

Why Do We Need Globally (Universally) Unique Identifiers?

For any distributed computing environment, it is essential to use globally (universally) unique identifiers for the content and data items in the system. The intent is to allow participants in the system to identify data items uniquely without centralized coordination. (Wikipedia)

If we used the simple numeric identifiers that are traditional in database systems, identifiers would collide constantly when data items (such as news stories) are published to PMP. Nothing prevents Publisher A from using the same numeric identifier for a story that Publisher B uses for a completely different story. Resolving such conflicts would become a significant and unnecessary hurdle. Globally Unique Identifiers are a well-known, simple, and reliable solution to this problem.

A great feature of GUIDs is that, once we agree on a GUID-generation algorithm, publishers can generate identifiers locally (without central coordination) and still avoid integration conflicts: with a correctly implemented generator, the probability of two publishers producing the same identifier is negligible.

For globally unique document identifiers PMP uses UUID version 4 identifiers, represented as 32 hexadecimal digits with optional dashes after the 8th, 12th, 16th, and 20th digits.
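As an illustration of the layout just described, the following sketch uses Python's standard `uuid` module (an assumption for illustration only; PMP itself is not written in Python) to show that the dashed and undashed forms denote the same identifier. The sample value is the one used elsewhere on this page.

```python
import uuid

# The same identifier written with and without the optional dashes.
dashed = "d7c37b07-7cf2-4558-8b45-7e5694e818fe"
compact = "d7c37b077cf245588b457e5694e818fe"

# uuid.UUID accepts both spellings and parses them to the same 128-bit value.
from_dashed = uuid.UUID(dashed)
from_compact = uuid.UUID(compact)
same_value = from_dashed == from_compact

# The version field should be 4 for a UUID version 4 identifier.
version = from_dashed.version

# Dashes after the 8th, 12th, 16th, and 20th digits yield groups of 8-4-4-4-12.
group_lengths = [len(group) for group in dashed.split("-")]
```

Here `from_dashed.hex` gives the 32-digit form and `str(from_compact)` gives the dashed form.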

ATTENTION: Publishers that generate identifiers locally are required to implement UUID v4 generation properly (per RFC 4122) and are requested to use cryptographically strong random number generation. If you do not want to go through the trouble of implementing your own generator, the PMP API provides an endpoint that can generate proper UUIDs for any client.

The UUID v4 generator that PMP uses is based on the open-source node-uuid library (https://github.com/broofa/node-uuid). It is RFC 4122 compliant and uses cryptographically strong random number generation.
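For publishers who generate identifiers locally, a minimal sketch in Python might look like the following. This is not the node-uuid code that PMP uses. It assumes CPython's standard `uuid` module, whose `uuid4()` takes its random bytes from the operating system's random source (`os.urandom`). The helper name `new_document_guid` is hypothetical.

```python
import uuid


def new_document_guid():
    """Return a fresh UUID v4 as a dashed, lowercase string.

    Hypothetical helper: in CPython, uuid.uuid4() reads 16 bytes from
    os.urandom and then sets the RFC 4122 version and variant bits.
    """
    return str(uuid.uuid4())


guid = new_document_guid()

# Parse the result back to confirm the version and variant fields.
parsed = uuid.UUID(guid)
is_v4 = parsed.version == 4
is_rfc4122 = parsed.variant == uuid.RFC_4122
```

Publishers working in another language should check that their UUID library draws from a cryptographically strong source rather than a general-purpose pseudo-random generator.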

Formatting Notes for GUIDs in PMP

Input

We accept GUIDs with or without dashes, normalizing them before validation.
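One way to picture this normalization step is the sketch below. It is an illustration, not PMP's actual code. The function name `normalize_guid`, the lowercasing and whitespace trimming, the choice of the dashed form as the normalized form, and the check of the version and variant digits are all assumptions.

```python
import re

# 32 hex digits, either bare or with dashes in the standard positions.
_COMPACT = re.compile(r"[0-9a-f]{32}")
_DASHED = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def normalize_guid(raw):
    """Return the dashed, lowercase form of a GUID, or raise ValueError."""
    # Assumption: surrounding whitespace and letter case are not significant.
    candidate = raw.strip().lower()

    # Normalize first: drop dashes only if they sit in the standard positions.
    if _DASHED.fullmatch(candidate):
        candidate = candidate.replace("-", "")

    # Then validate the normalized value.
    if not _COMPACT.fullmatch(candidate):
        raise ValueError("not a 32-hex-digit GUID: %r" % raw)
    # Assumption: only UUID v4 is accepted (version digit 4, variant 8, 9, a, or b).
    if candidate[12] != "4" or candidate[16] not in "89ab":
        raise ValueError("not a UUID version 4: %r" % raw)

    return "-".join(
        (candidate[:8], candidate[8:12], candidate[12:16],
         candidate[16:20], candidate[20:])
    )
```

With this sketch, both `"d7c37b077cf245588b457e5694e818fe"` and `"D7C37B07-7CF2-4558-8B45-7E5694E818FE"` normalize to the same dashed string, while input with dashes in non-standard positions is rejected.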

Output

We output GUIDs with dashes, for example: "d7c37b07-7cf2-4558-8b45-7e5694e818fe".

Storage

We store GUIDs with dashes, so that the output step does not have to check for dashes and add them.

Search

In search documents, we index GUIDs without dashes, so that we can use them in text fields.
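The sketch below illustrates the conversion from the stored form to the indexed form, and one plausible reason the undashed form is easier to handle in text fields: many text analyzers split on punctuation. The `naive_tokenize` function is a hypothetical stand-in for such an analyzer, not the behavior of any particular search engine or of PMP's own search backend.

```python
import re


def to_search_form(stored_guid):
    """Strip dashes from a stored GUID to get the form used in the index."""
    return stored_guid.replace("-", "")


def naive_tokenize(text):
    """Hypothetical analyzer: split text into runs of letters and digits."""
    return re.findall(r"[0-9a-z]+", text.lower())


stored = "d7c37b07-7cf2-4558-8b45-7e5694e818fe"
indexed = to_search_form(stored)

# The dashed form breaks into five separate tokens under this analyzer,
# while the undashed form survives as a single token.
dashed_tokens = naive_tokenize(stored)
compact_tokens = naive_tokenize(indexed)
```

A query for a GUID would then need the same dash-stripping applied before it is matched against the index.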