Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Same patient id assigned to multiple patients in a same instance #7670

Closed
1yuv opened this issue Jul 9, 2022 · 25 comments
Closed

Same patient id assigned to multiple patients in a same instance #7670

1yuv opened this issue Jul 9, 2022 · 25 comments

Comments

@1yuv
Copy link
Member

1yuv commented Jul 9, 2022

Describe the bug
I've seen that with same patient_id has been assigned to more than one patient. This causes a huge impact on the service being delivered as two patients will be treated as a single patient. This also has impact on impact monitoring and analytics.

To Reproduce
Steps to reproduce the behavior:

  1. Go to dho baitadi's production instance.
  2. Open document d935a8c916709b7b13a46e01b97fb9c2
  3. Open document 64644cba05d134cef802c0202a61c855
  4. They are assigned a same patient_id 97032.

Expected behavior
Each patient should have an unique patient id generated for them.

Environment

  • Instance: baitadi production instance
  • App: Web App
  • Version: 3.7.1
@1yuv 1yuv added the Type: Bug Fix something that isn't working as intended label Jul 9, 2022
@dianabarsan
Copy link
Member

Checking the instance, I'm noticing that the two documents were created in Sat Feb 17 2018 and Thu Sep 13 2018 respectively.
3.0.0 was released in Nov 14, 2018.
Can you check which version the app was using at the dates when these documents were created?
Can you find any more recent duplicates?

@dianabarsan
Copy link
Member

Very many things have changed in the code that affects how transitions work, that it would be impossible to know what the problem was without knowing exactly which version the server was running at the time the duplicate id documents were created.

@1yuv
Copy link
Member Author

1yuv commented Jul 9, 2022

Hi @dianabarsan , per this comment, upgrade from 2.13.x to 3.7 was completed on December 5, 2019. We saw this document f87bfb6b-b5f7-4b55-bec7-173d10770be1 assigning previously assigned patient id on December 7, 2019.

@dianabarsan
Copy link
Member

dianabarsan commented Jul 9, 2022

I'm suspecting something went wrong with a migration.
I checked the two documents linked in the initial issue, and failed to find the corresponding patients. We have a migration that was supposed to create those patient contacts for every registration. At the same time that migration was added, we switched the way we search for patient_ids for duplicates, from registration searches to contact searches.
I'm suspecting the migration did not finish, given the missing contacts.

It was weekend and I was wrong, I was able to find the patient document for both duplicates.
What I am noticing is that the patients were registered correctly in the more recent document.

For the initial example we have:

  • report d935a8c916709b7b13a46e01b97fb9c2 created on Sat Feb 17 2018
  • report 64644cba05d134cef802c0202a61c855 created on Thu Sep 13 2018
  • patient 64644cba05d134cef802c0202a61d0f0 created on Thu Sep 13 2018

For the second example, we have:

  • report b064e446ca0781548030213eb9b7eb95 created on Wed Apr 11 2018
  • report f87bfb6b-b5f7-4b55-bec7-173d10770be1 created on Sat Dec 07 2019
  • patient b8b45c1522ffdd29322e0a98b3020b71 created on Sat Dec 07 2019

In both these examples, the "later" report ended up creating the patient.

@dianabarsan
Copy link
Member

is it possible that the initially created contacts were accidentally deleted?

@dianabarsan
Copy link
Member

I'm asking because I found a couple of issues that talk about deleting training data on this instance.
For example this issue requests a rollback because real data was deleted by mistake, along with training data: https://github.com/medic/medic-projects/issues/3481 . This issue is from February 2018.

@1yuv
Copy link
Member Author

1yuv commented Jul 13, 2022

is it possible that the initially created contacts were accidentally deleted?

Hi @dianabarsan, I can find older documents and patients as well on production instance and so I doubt that previous id was reassigned because old one got deleted. Even if there was a logic to reuse id as such, wouldn't it be a problem ?

@dianabarsan
Copy link
Member

Even if there was a logic to reuse id as such, wouldn't it be a problem ?

Then we have 2 problems here:

  1. Is it possible for the app to generate a patient_id on a registration without actually creating the patient contact document?
  2. If a patient document was deleted, should we allow the patient_id to be reassigned?
  3. How to fix an instance that already has patient_ids duplicates.

I can find older documents and patients as well on production instance and so I doubt that previous id was reassigned because old one got deleted

I'm not so sure, since I found issues about training data being mixed with real data, and even being deleted by mistake.

@1yuv
Copy link
Member Author

1yuv commented Jul 13, 2022

I'm not so sure, since I found issues about training data being mixed with real data, and even being deleted by mistake

I think the issue you linked is for another instance, and that might have happened for that particular instance.

But, I don't believe same happened for 7 different project instances I listed here. I also searched for likely issues that you linked in above comment.

@dianabarsan
Copy link
Member

Looking at these projects can we:

  1. have a list of duplicates with their reported_date
  2. know which version the app was running at the time of duplicate creation

@1yuv
Copy link
Member Author

1yuv commented Jul 13, 2022

Hi @dianabarsan , I've shared you the requested information. Apart from this case, most of the issues appeared to have occurred within 2.x version.

@dianabarsan
Copy link
Member

I think we have some evidence about why this is happening.

Both API and Sentinel generate a list of patient_ids at the beginning and check for uniqueness, and don't re-check when attaching the patient_id to the new contact. This means that API and Sentinel can have the same patient_id in their "cache" (because at the moment of generating this cache, the patient_id is not used) and then as contacts are created, produce collisions.

Thanks @binokaryg and @1yuv for the investigation!

@1yuv
Copy link
Member Author

1yuv commented Oct 3, 2023

Hi @dianabarsan , as we saw similar problem in 4.x deployment medic/config-moh-nepal#1034 for the reason that you mentioned here. Is it posisble to make it configurable for only one (API or Sentinel) to generate and attach IDs? Or any other options for us to avoid collision ?

@garethbowen
Copy link
Member

I think we should allow both API and Sentinel to generate patient IDs to allow for hybrid SMS and smartphone deployments and instead change the algorithm somehow to not generate conflicts. Some ways I can think of are...

  1. Add a check before saving the doc to ensure the patient_id hasn't been used. This is wasteful most of the time, and only shortens the window a conflict can occur.
  2. Add another digit to the patient id so sentinel IDs start with 1 and API IDs start with 2. This makes IDs harder to type and more error prone.
  3. Modify the checksum... the last digit of the patient id is a checksum digit which is designed to detect if someone has typoed the patient ID. I don't think we ever actually check this digit it just means the chance of a typo matching a different patient is 10% of what it would be. If we change API or sentinel to add 5 to this digit then make it more likely a typoed ID will match, but make it impossible for API and Sentinel to generate the same ID.

@garethbowen
Copy link
Member

@1yuv What's the priority on this? Do you have a workaround?

@1yuv
Copy link
Member Author

1yuv commented Oct 3, 2023

Hi @garethbowen , we haven't found the workaround for this yet. We've mToT starting 12th October for province and districts. Production date is not finalized yet, but it'll be around end of October. We expect workaround of fix to be available by the production time if not possible during ToT.

@garethbowen
Copy link
Member

On the path to horizontal scalability and high availability for API and/or we're going to have to solve this properly because every API will have its own cache and conflicts will get progressively more common.

A more complete solutions would be to use a shared cache instead of a per-process one. This could be a doc in couchdb (which has its own set of problems) or a separate service (eg: redis) which has complexity cost.

I can't help but feel this cache is a premature optimisation - making the view request to find a unique shortcode should be quick enough. Just generate 100 and query the view (like we do now), but just use the first one and throw the rest away. There's still a small chance of conflict if two patients are registered at the same time, but it's now a very small window.

@dianabarsan I'd love your input here, both on a patch for backporting on a tight deadline, but also the more complete solution for when we have time to think it through...

@1yuv
Copy link
Member Author

1yuv commented Oct 5, 2023

On throwing away rest of the codes and using few numbers to avoid collision, are we not already doing that? Because I checked in couple of instances and I see less than 10,000 patient ids that are 5 digit long, same with 6 digit long, and now 7 digit long patient IDs are being generated. Only on one instance there are 16,000 IDs that are 6 digit long and now 7 digit long ids are being generated now. Given, only 10 out of available 100 (10,000 from 99,999) ranges of codes are used already, I doubt on how much it will help in reducing collision by making 1 out of 100.

@garethbowen
Copy link
Member

On throwing away rest of the codes and using few numbers to avoid collision, are we not already doing that?

When the ID generator gets too many collisions it bumps the length automatically. This is a little random so it could happen prematurely.

less than 10,000 patient ids that are 5 digit long

Remember 5 digit long numbers only have 4 digits of random and 1 checksum, so you would expect less than 10k patient IDs that are 5 digits long. Once the random collisions become too frequent, the length just gets bumped.

I doubt on how much it will help in reducing collision by making 1 out of 100.

The change I'm suggesting is instead of finding 100 unique IDs and storing them in memory until they're all used up (which could take days), just generate one when needed and check if it's unique right away. That way we reduce the window of possible collision down from days to milliseconds.

@garethbowen
Copy link
Member

This was introduced in 3.5.0 in this issue but has only become a problem with projects running combined smartphone and sms deployments.

@garethbowen
Copy link
Member

Marking as high priority because the consequence of a duplicate ID can be serious for the patient.

@garethbowen
Copy link
Member

Merged and backported to 3.17.2, 4.2.4, 4.3.2, 4.4.1

@1yuv
Copy link
Member Author

1yuv commented Mar 18, 2024

I am reopenning this issue since we've seen same patient id assigned to multiple patients on the fix release (4.5.0) as well. medic/config-moh-nepal#1197.

@1yuv 1yuv reopened this Mar 18, 2024
@garethbowen
Copy link
Member

garethbowen commented Apr 22, 2024

@1yuv From https://github.com/medic/config-moh-nepal/issues/1197 as Diana says it looks like one of the docs had a patient_id not generated by sentinel. Furthermore the one that replicated first was the one that ran the generate ID code. So we need to figure out how the patient_id got generated on the doc that replicated to the server second. Do you have any context that could help here? Is it possibly being generated offline? Or is the record imported from somewhere else?

At this point I think the cause of the duplicate id is not the same as this original issue.

@dianabarsan
Copy link
Member

I believe the reported issue was closed, it was training data that had duplicate ids.
https://github.com/medic/config-moh-nepal/issues/1197#issuecomment-2025837651

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment