Potential Idempotency Issues in the NormalizeDeviceData Function #271

Closed
NullPointer4096 opened this issue Apr 24, 2023 · 4 comments

Comments

@NullPointer4096

APIs Involved:
EventHubTrigger, EventHubMeasurementCollector, and IDataNormalizationService

Description:
The NormalizeDeviceData function does not currently guard against unintended retries. As a result, it may process the same input events more than once and write duplicate normalized measurements to the output Event Hub. To address this, each input event should be processed only once, so that no duplicate normalized measurements are emitted.

Steps to reproduce:
For each API listed above:

  1. Simulate a failure or error exit after the call to that API but before the function returns.
  2. Run the function once and let it hit that failure or error exit.
  3. Let the function retry automatically with the same input.

Expected behavior:
The NormalizeDeviceData function should process each input event only once and skip any subsequent calls with the same events.

Proposed solution:
Introduce idempotency checks in the NormalizeDeviceData function so that output is sent only once for each unique message.
Although drift in the prevalence counter is not a major problem, it can be avoided by maintaining a LastRequestId field in each table entry and blob entry, which stores the invocation ID of the last NormalizeDeviceData function call. The invocation ID is constant across Azure Function retries. Before updating a table or blob entry, compare item.LastRequestId with FunctionContext.InvocationId. If they are equal, this invocation has already updated the entry and no further action is required. Otherwise, update the prevalence counter and set the LastRequestId field to FunctionContext.InvocationId.

By implementing these changes, the NormalizeDeviceData function will become idempotent and avoid processing the same input events multiple times, ensuring proper usage of the APIs listed above.
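
To make the proposed check concrete, here is a minimal sketch of the LastRequestId comparison. It is written in Python purely for illustration and is not code from this repository. The `Entry` class and the `apply_update` helper are hypothetical names, and the sketch assumes, as stated above, that the invocation ID stays the same across retries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a table or blob entry."""
    prevalence: int = 0
    last_request_id: Optional[str] = None


def apply_update(entry: Entry, invocation_id: str) -> bool:
    """Update the entry at most once per invocation ID.

    Returns True if the update was applied, or False if it was skipped
    because this invocation had already updated the entry.
    """
    if entry.last_request_id == invocation_id:
        # Retry of an invocation that already wrote: nothing to do.
        return False
    entry.prevalence += 1
    entry.last_request_id = invocation_id
    return True


entry = Entry()
first = apply_update(entry, "invocation-1")  # first attempt
retry = apply_update(entry, "invocation-1")  # simulated retry with the same ID
later = apply_update(entry, "invocation-2")  # a genuinely new invocation
print(first, retry, later, entry.prevalence)
```

In a real store the comparison and the write would need to happen atomically (for example through a conditional update), otherwise two concurrent attempts could both pass the check.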

Thank you for your contribution to the GitHub community. I really appreciate your effort in going through this issue.

@namalu
Member

namalu commented May 1, 2023

@NullPointer4096 - Just to clarify this issue: the NormalizeDeviceData function can add duplicate normalized measurements to the Event Hub when an error occurs. Is this correct?

I ask because the data processing has not completed at this point. After this first stage, the duplicate normalized measurements will be read by the MeasurementCollectionToFhir function, and de-duping takes place there. The data output is idempotent at the FHIR Observation level, not at the normalized measurement level.
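
As a rough illustration of why upstream duplicates can be harmless when the final write is idempotent, here is a small Python sketch. It is not the connector's actual logic: the `upsert_observation` helper, the in-memory `store`, and the choice of key fields (`device_id`, `type`, `occurrence_time`) are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical identity of an observation; the real key fields may differ.
ObservationKey = Tuple[str, str, str]


def upsert_observation(store: Dict[ObservationKey, dict], measurement: dict) -> None:
    """Write the measurement under a key derived from its identity.

    Replaying the same measurement overwrites the same slot, so the
    store ends up in the same state however many times it is delivered.
    """
    key = (
        measurement["device_id"],
        measurement["type"],
        measurement["occurrence_time"],
    )
    store[key] = measurement


store: Dict[ObservationKey, dict] = {}
measurement = {
    "device_id": "dev-1",
    "type": "heartrate",
    "occurrence_time": "2023-05-01T00:00:00Z",
    "value": 72,
}

# Deliver the same normalized measurement three times, as a retry might.
for _ in range(3):
    upsert_observation(store, dict(measurement))

print(len(store))
```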

Is this a performance improvement you are suggesting? Or are you using the normalized measurements for something that requires no duplicates be emitted?

@NullPointer4096
Author

NullPointer4096 commented May 3, 2023

Dear @namalu ,

Thanks for getting back to me. Pardon my ignorance, but I was not able to follow the invocation flow from the NormalizeDeviceData function to the MeasurementCollectionToFhir function that you mentioned. The former seems to output to an Event Hub, and the latter seems to be triggered by HTTP POST requests. In addition, I can't find the output Event Hub being used by other functions within this repo. I wonder how these two functions are connected so that the de-dup action can happen. I would really appreciate it if you could elaborate on this flow of functions. Thanks!

Best wishes

@namalu
Member

namalu commented May 3, 2023

@NullPointer4096,

The solution uses Azure Stream Analytics as a processing step before the MeasurementCollectionToFhir Function is invoked. Azure Stream Analytics reads from the Event Hub, groups normalized measurements by identifiers, and sends the grouped measurements to the MeasurementCollectionToFhir Function (via the HttpTrigger).
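
The grouping step can be pictured with the short Python sketch below. It only mimics the idea and is not the Stream Analytics query used by the solution: the `group_measurements` helper and the grouping fields (`device_id`, `type`) are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical grouping key; the real query may group on other identifiers.
GroupKey = Tuple[str, str]


def group_measurements(measurements: Iterable[dict]) -> Dict[GroupKey, List[dict]]:
    """Bucket normalized measurements that share the same identifiers."""
    groups: Dict[GroupKey, List[dict]] = defaultdict(list)
    for m in measurements:
        groups[(m["device_id"], m["type"])].append(m)
    return dict(groups)


batch = [
    {"device_id": "dev-1", "type": "heartrate", "value": 72},
    {"device_id": "dev-1", "type": "heartrate", "value": 75},
    {"device_id": "dev-2", "type": "heartrate", "value": 64},
]

groups = group_measurements(batch)

# Each group would then be sent as one request to the HTTP-triggered function.
for key, members in groups.items():
    print(key, len(members))
```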

@dustinburson
Member

As designed. Eventual consistency and replay are expected.
