
general discussion of unduplication of new client records

Eric Jahn edited this page Jul 27, 2020 · 1 revision

When a report (APR/AHAR) is run, we have to report unduplicated counts of clients. If all of the unduplication has to be done each time a report is run, it will be tedious and time consuming. But if unduplication is done continuously (as a separate microservice?), the duplicates will already have been located and given the same global client ID.
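As a rough sketch of what such a continuous pass could do, the code below groups records that match each other (directly or through a chain of matches) and gives each group one global ID, so a report only has to count distinct global IDs. The function names, the `record_id` key, and the `is_match` callback are all hypothetical; nothing here is an existing implementation, and a real service would avoid comparing every pair of records.

```python
from typing import Callable, Dict, List


def assign_global_ids(
    records: List[dict],
    is_match: Callable[[dict, dict], bool],
) -> Dict[str, int]:
    """Map each record's (hypothetical) "record_id" to a shared group number."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive pairwise comparison: fine for a sketch, too slow for a large system.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    return {rec["record_id"]: find(i) for i, rec in enumerate(records)}


def unduplicated_count(global_ids: Dict[str, int]) -> int:
    """Report-time work reduces to counting distinct global IDs."""
    return len(set(global_ids.values()))
```

With the global IDs precomputed, the report step itself stays cheap no matter how expensive the matching rule is.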

Duplicates mainly arise because there are many agencies in the system, and they may or may not share data with each other. So an agency may think it is adding a new client when that client is already in the system. Some HMIS systems, by default, require the case manager to search first, and all basic identifiers are shared regardless of whether the agencies otherwise share data.

Here is a 2005 document from HUD for implementing a weighted unduplication process: https://www.hudexchange.info/resource/1314/guidelines-unduplicating-and-deidentifying-hmis-client-records/
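To make the idea of a weighted match concrete, here is a minimal sketch. The field names and weights are invented for illustration and are not taken from the HUD guidelines; the only point is that stronger identifiers count for more, and that fields missing on either side are skipped rather than counted as mismatches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientRecord:
    # Hypothetical identifier fields; a real HMIS record has more.
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    dob: Optional[str] = None  # ISO date string, e.g. "1980-01-31"
    ssn: Optional[str] = None


# Illustrative weights only (assumption, not HUD's values).
WEIGHTS = {"ssn": 5, "dob": 2, "last_name": 2, "first_name": 1}


def _normalize(value: Optional[str]) -> Optional[str]:
    """Trim and lowercase; treat empty strings as missing."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def match_score(a: ClientRecord, b: ClientRecord) -> float:
    """Return earned weight / comparable weight, in the range 0.0 to 1.0."""
    earned = 0
    possible = 0
    for field_name, weight in WEIGHTS.items():
        left = _normalize(getattr(a, field_name))
        right = _normalize(getattr(b, field_name))
        if left is None or right is None:
            continue  # cannot compare a field that one side lacks
        possible += weight
        if left == right:
            earned += weight
    return earned / possible if possible else 0.0
```

A caller would compare the score against a threshold (also something to be tuned, not specified here) to decide whether two records are probably the same person.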

By first searching for a preexisting client (see this additional information on client record creation requirements), most duplicates can be spotted before they are ever created.

But we only have an API, so when someone tries to add a new client, the HTTP POST implementation needs to first locate any preexisting clients and apply the preexisting ID. In fact, there need to be two client IDs: a global probabilistic match ID, and a unique ID within that agency. If two agencies specifically share data with each other, they can share a unique ID (but the global one will still exist, and may also be attached to clients outside the two-agency sharing agreement).
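The two-ID idea could look something like the following in-memory sketch of what the POST handler does before storing a client. Everything here is an assumption made for illustration: the `ClientRegistry` class, the `sharing_group` mapping, and especially `is_probable_match`, which stands in for a real probabilistic matcher with a simple exact comparison of SSN and date of birth.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredClient:
    global_id: str   # same for every record believed to be the same person
    local_id: str    # unique within an agency (or a group of sharing agencies)
    agency_id: str
    identifiers: Dict[str, str]


def is_probable_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    # Placeholder rule (assumption): both SSN and DOB present and equal.
    return all(a.get(k) and a.get(k) == b.get(k) for k in ("ssn", "dob"))


@dataclass
class ClientRegistry:
    # Maps agency -> sharing group name; agencies not listed share with no one.
    sharing_group: Dict[str, str] = field(default_factory=dict)
    clients: List[StoredClient] = field(default_factory=list)

    def _group(self, agency_id: str) -> str:
        return self.sharing_group.get(agency_id, agency_id)

    def create_client(self, agency_id: str, identifiers: Dict[str, str]) -> StoredClient:
        """What a POST handler might do before inserting a new client."""
        matches = [c for c in self.clients
                   if is_probable_match(c.identifiers, identifiers)]

        # Same agency already has this client: return it, create nothing.
        for existing in matches:
            if existing.agency_id == agency_id:
                return existing

        # Reuse the global ID of any match, even outside the sharing group.
        global_id = matches[0].global_id if matches else str(uuid.uuid4())

        # Reuse the local ID only when a match belongs to the same sharing group.
        shared = next((c for c in matches
                       if self._group(c.agency_id) == self._group(agency_id)), None)
        local_id = shared.local_id if shared else str(uuid.uuid4())

        stored = StoredClient(global_id, local_id, agency_id, dict(identifiers))
        self.clients.append(stored)
        return stored
```

In this sketch an agency outside the sharing agreement still receives the matching global ID, but never sees the other agencies' local ID.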
