Unique IDs #35
Opening up a discussion around a universal unique ID system for records that all vendors can participate in, allowing a single record to have one unique ID across systems. Need to conduct more research on best practices, and engage in conversation with vendors.
Our IDs are a combination of a 3-letter code identifying the managing Agency (effort is made to avoid Agency Code re-use) and a 4- or 5-digit number, that we call a NUM. Like so: ABC0001. This numbering system itself has been in use for about 40 years, and we've used it for bidirectional data exchange in CIOC (including between several 3rd party systems) for 13 years. Internally, as an implementation detail, we also keep an unchangeable auto-number integer ID.
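As a rough illustration of the format described above, here is a minimal Python sketch that validates and splits an ID like `ABC0001`. The names `NUM_PATTERN`, `Num`, and `parse_num` are hypothetical, and the assumption that the Agency code is always uppercase ASCII is mine, not something stated in this thread.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: three uppercase letters (Agency code) followed by
# four or five digits (the NUM portion), e.g. "ABC0001".
NUM_PATTERN = re.compile(r"([A-Z]{3})(\d{4,5})")


class Num(NamedTuple):
    agency_code: str
    digits: str  # kept as text so leading zeros and width are preserved


def parse_num(value: str) -> Optional[Num]:
    """Return the parts of a NUM-format ID, or None if it does not match."""
    match = NUM_PATTERN.fullmatch(value)
    if match is None:
        return None
    return Num(agency_code=match.group(1), digits=match.group(2))
```

The digits are kept as a string rather than an integer so that `ABC0001` and `ABC00001` stay distinct.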
There are a few advantages to this system:
We do import from systems that don't support the NUM. For those cases, we hold an "external ID" field that stores the incoming ID as untyped text (for flexibility, so we can accept any variety of ID). If a record comes in for import without a NUM-format ID, a new NUM is generated and the incoming ID is kept in the external ID field for future matching and updates. The 3-letter Agency code is still required for all records being imported, even for non-NUM ID records, and is used to generate the NUM. We also keep a source database field separate from the Agency code, meaning that ID, source database, and Agency owner code (maintainer) can all change independently as needed while the record stays traceable over time.
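The import behaviour described above could be sketched roughly like this in Python. Everything here is illustrative: `Store`, `Record`, `import_record`, and the choice to match external IDs per source database are assumptions, not a description of the actual CIOC implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

NUM_PATTERN = re.compile(r"[A-Z]{3}\d{4,5}")  # assumed NUM shape


@dataclass
class Record:
    num: str
    agency_code: str
    source_db: str
    external_id: Optional[str] = None


@dataclass
class Store:
    by_num: Dict[str, Record] = field(default_factory=dict)
    # Assumption: external IDs are matched per source database.
    by_external: Dict[Tuple[str, str], str] = field(default_factory=dict)
    counters: Dict[str, int] = field(default_factory=dict)

    def next_num(self, agency_code: str) -> str:
        """Generate the next unused NUM for an Agency code."""
        n = self.counters.get(agency_code, 0) + 1
        while f"{agency_code}{n:04d}" in self.by_num:
            n += 1
        self.counters[agency_code] = n
        return f"{agency_code}{n:04d}"

    def import_record(self, incoming_id: str, agency_code: str, source_db: str) -> Record:
        if NUM_PATTERN.fullmatch(incoming_id):
            # The incoming ID is already a NUM: use it directly.
            record = self.by_num.get(incoming_id)
            if record is None:
                record = Record(incoming_id, agency_code, source_db)
                self.by_num[incoming_id] = record
            return record
        # Not a NUM: look for an earlier import of the same external ID.
        key = (source_db, incoming_id)
        num = self.by_external.get(key)
        if num is None:
            num = self.next_num(agency_code)
            self.by_num[num] = Record(num, agency_code, source_db, external_id=incoming_id)
            self.by_external[key] = num
        return self.by_num[num]
```

Re-importing the same external ID from the same source returns the existing record instead of minting a second NUM, which is what makes later updates possible.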
I would expect that, if this concept moved beyond Canada, there would be a need to a) extend the size of the Agency codes and therefore the available pool and b) maintain a master list of codes somewhere for exchange, where people could register their code, look up codes, etc. Historically people ping me to ask if anyone is using a code, but I couldn't do that for thousands of additional record owner agencies! lol
Having imported data now from many hundreds of external databases and systems, I can say that we need to support the possibility of long alphanumeric UIDs because there is so much variation in what people use.
If we want the UIDs to be universally unique then we either need some centralized arbiter handing out IDs (can't see that happening anytime soon) or each vendor and/or dataset instance needs its own UID to be prefixed before record IDs, or concatenated across those fields to create one UUID per record.
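To make the second option concrete, here is a small Python sketch of two ways to do it without a central arbiter: a plain prefix, and a name-based UUID (version 5) derived from a per-vendor namespace. The function names and the idea of deriving the namespace from a domain the vendor controls are assumptions for illustration only.

```python
import uuid


def prefixed_id(source_prefix: str, local_id: str) -> str:
    """Concatenate a vendor/dataset prefix with the local record ID."""
    return f"{source_prefix}:{local_id}"


def derived_uuid(source_namespace: uuid.UUID, local_id: str) -> uuid.UUID:
    """Derive a deterministic UUID from a namespace and a local record ID."""
    return uuid.uuid5(source_namespace, local_id)


# Hypothetical: each vendor derives its namespace from a domain it controls.
vendor_a = uuid.uuid5(uuid.NAMESPACE_DNS, "vendor-a.example")
vendor_b = uuid.uuid5(uuid.NAMESPACE_DNS, "vendor-b.example")
```

Because `uuid5` is deterministic, the same vendor and local ID always produce the same UUID, so re-exports stay stable, while the same local ID at two vendors produces two different UUIDs.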
Relatedly, not all software systems (including mine) give UIDs to the child record types in HSDS, like Accessibility and Eligibility, that the spec calls for. For now we've just been populating those fields with the UID of the parent record, but that is not a UUID.
And I'll resist the temptation to assert that best practice in choosing a UID type is to use a monotonically increasing integer. Let's save that for the happy hour session at the First Annual Open Referral conference that @greggish needs to organize in Hawaii someday.
On another note, when one does an Insert, is it expected to provide a UID, or should the Response include one that is autoassigned by the receiving system?
What if the request provides a UID that conflicts with one that already exists in that system? Should it yield an error message and also maybe suggest a new UID that would not conflict, rather than have the poor requestor systematically guess at alternative UIDs?
@kinlane I think the spec needs a position on this.
I think the key question here is whether data providers or data consumers/aggregators/API backends are responsible for ensuring the uniqueness of identifiers.
If data providers are responsible, there are three broad patterns we could adopt:
The GUID pattern is the most distributed. The registered prefix approach the most centralised - but that centralisation can be good for keeping track of the community of users of a common spec and standard, and need not involve heavy processes.
If data consumers are responsible, then they need to assign each incoming source of data a prefix, and use that in their internal records of the ID, whilst recognising that two sources might both have identical identifiers for different resources.
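A minimal Python sketch of the consumer-side approach, assuming a hypothetical `Aggregator` class that hands out its own prefixes (`src1`, `src2`, ...) per source. None of these names come from the spec.

```python
from typing import Dict, Tuple


class Aggregator:
    def __init__(self) -> None:
        self._prefixes: Dict[str, str] = {}  # source name -> assigned prefix
        self._records: Dict[Tuple[str, str], dict] = {}

    def register_source(self, source: str) -> str:
        """Assign a prefix the first time a source is seen, then reuse it."""
        if source not in self._prefixes:
            self._prefixes[source] = f"src{len(self._prefixes) + 1}"
        return self._prefixes[source]

    def ingest(self, source: str, local_id: str, data: dict) -> str:
        """Store a record under (prefix, local ID) and return the internal ID."""
        prefix = self.register_source(source)
        self._records[(prefix, local_id)] = data
        return f"{prefix}:{local_id}"

    def get(self, internal_id: str) -> dict:
        prefix, _, local_id = internal_id.partition(":")
        return self._records[(prefix, local_id)]
```

Two sources that both use the identifier `42` end up as `src1:42` and `src2:42`, so neither overwrites the other.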
On the question of whether the data provider, or the API, should assign an identifier for an INSERT, unless we go with a strict GUID approach, I would suggest that both options need to be available, as in some cases a provider may be synchronising existing identified records: in other cases, creating brand new records.
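Supporting both options could look something like the following Python sketch: the caller may supply an identifier, or leave it out and receive one in the response. It also covers the conflict case raised earlier in the thread by returning an unused suggestion with the error. `RecordStore`, `IdConflict`, and the use of random UUIDs for generated IDs are all assumptions for illustration, not a proposal for the spec's wording.

```python
import uuid
from typing import Dict, Optional


class IdConflict(Exception):
    """Raised when a caller-supplied ID already exists in the store."""

    def __init__(self, requested: str, suggestion: str) -> None:
        super().__init__(f"ID {requested!r} already exists; {suggestion!r} is available")
        self.requested = requested
        self.suggestion = suggestion


class RecordStore:
    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def _new_id(self) -> str:
        # Assumption: generated IDs are random UUIDs.
        return str(uuid.uuid4())

    def insert(self, data: dict, record_id: Optional[str] = None) -> str:
        """Insert a record and return its ID, generating one if none is given."""
        if record_id is None:
            record_id = self._new_id()
        elif record_id in self._records:
            raise IdConflict(record_id, self._new_id())
        self._records[record_id] = data
        return record_id
```

A provider synchronising existing records would pass its own `record_id`, while a client creating a brand new record would omit it and read the assigned ID from the return value.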