Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing to create entities on receipt: reprocessing submissions that resulted in an entity.error #547

Closed
matthew-white opened this issue Nov 17, 2023 · 4 comments · Fixed by getodk/central-backend#1049
Assignees
Labels
backend Requires a change to the API server entities Multiple Encounter workflows

Comments

@matthew-white
Copy link
Member

matthew-white commented Nov 17, 2023

Right now, when you configure an entity list to create entities on submission receipt, you have the option to process pending submissions. Submissions that have already created or updated an entity are not reprocessed. However, I'm noticing that submissions that were processed, but resulted in a processing error (an entity.error event) are reprocessed. I don't think that matters much for submissions that create entities, but I think it has more potential for unexpected outcomes for submissions that update entities.

As noted in getodk/central-backend#1044, it's not known before a submission is processed for entities whether it will have an effect on entities at all, and if it will, whether it's to create or update an entity. Processing pending submissions may process submissions that update entities. I agree with the reasoning in the PR that that's not an issue. However, I think it's more of an issue if such a submission is reprocessed, and that reprocessing is successful even when the first attempt at processing failed.

Steps to reproduce the problem

Set things up:

  • Upload a form that creates an entity list. Publish it.
  • Configure the entity list to create entities on submission approval.

Set up a problematic submission that tries to create an entity:

  • Create a submission that fails to create an entity (e.g., the UUID is invalid, or the label is blank).
  • Approve the submission.
  • Observe that the submission results in an entity.error.
  • Mark the submission as "has issues".

Set up up a problematic submission that tries to update an entity:

  • Create a second submission that would create an entity on approval. Don't approve it yet.
  • Create a third submission to update the entity that the second submission would create.
  • The entity doesn't exist yet, so the third submission should quickly result in an entity.error.

Observe that problematic submissions are reprocessed:

  • Change the entity list to create entities on submission receipt. Select yes for processing pending submissions.
  • Observe that the submission that failed to create an entity has been reprocessed. There are now two entity.errors.
  • Observe that the submission that failed to update an entity has been reprocessed. The second update attempt was successful.

URL of the page

Expected behavior

I'm not sure what the right behavior is. Maybe it's actually nice that if an update failed because an entity didn't exist, the update will be reprocessed. However, I also think that makes it harder to reason about the effects of a change to an entity list's workflow (changing it to create entities on receipt and having it process pending submissions).

Two ideas:

  1. Reprocess submissions with errors, but mention that behavior in Frontend.
  2. Don't reprocess submissions with errors. I think there's a case to be made that the only thing that should fix a submission with an entity.error is a submission edit.

Part of me thinks that it'd be nice to completely separate a change to an entity list workflow from entity updates. In other words, changing an entity list to create entities on submission receipt and processing pending submissions may create entities, but it should never update them. However, I don't know if that separation would play well with the corner case of a submission that either updates or creates entities (create="1" update="1").

Central version shown in version.txt

versions:
15e519a04eefc41049c30d53b6779682f7890764 (v2023.4.0-3-g15e519a)
+f4a1367ed71b2b79306247c30080d42d988418e4 client (v2023.4.0-27-gf4a1367e)
+d444524255f2ab0eadacb82e3436dcec0ae3d198 server (v2023.4.0-37-gd4445242)
@matthew-white matthew-white added needs discussion Discussion needed before work can begin entities Multiple Encounter workflows labels Nov 17, 2023
@ktuite
Copy link
Member

ktuite commented Nov 20, 2023

Not sure where to note this, but this issues seems like an OK place. While investigating this comment from another PR about testing that the last version of a submission is what is used in the convert-pending-submissions process, I noted this scenario:

  • The entity list is configured to create entities on submission approval.
  • A submission is created that would create an entity on approval.
  • The submission is edited to create a different entity.
  • WORKER DOES NOT RUN YET
  • The entity list is reconfigured to create entities on submission receipt. Pending submissions are processed (?convert=true is specified).
  • WHEN WORKER FINALLY RUNS, it processes the FIRST submission event and creates the first entity, not the second entity like it should.

Maybe the event should know the state of the dataset's approval flag at the time the submission came in? Maybe the previous events should get voided or something?

@ktuite
Copy link
Member

ktuite commented Nov 21, 2023

Trying to capture some thoughts I had while writing tests to emulate the scenarios @matthew-white described above.

  • In some cases, it seems nice to be able to process a submission a second time after it yields an error.

    • maybe there was a problem with the uuid or label that can be fixed (but that would probably involve a submission edit)
    • maybe it is the scenario above where an update didn't work the first time because the entity didn't exist yet, but it could be reprocessed later after the entity was created. this feels like it is getting into "offline entity" territory... makes me think of applying updates where the expected base version isn't processed yet or something.
  • In other cases, it seems like unexpected submissions get picked up by Datasets.getUnprocessedSubmissions.

    • the submission above that failed but had its review state changed (hasIssues is one of the review states looked for in the pending submission query)
  • In my note above, it felt like submission were getting double-processed... no, not double-processed but they were getting processed in a different way than I intended.

  • getUnprocessedSubmissions doesn't quite account for this new entity update world

  • I like Matt's comment

    Part of me thinks that it'd be nice to completely separate a change to an entity list workflow from entity updates. In other words, changing an entity list to create entities on submission receipt and processing pending submissions may create entities, but it should never update them.

  • But in the world of entity updates, sometimes reprocessing entity errors could be good?

To be discussed more later!

@ktuite
Copy link
Member

ktuite commented Nov 29, 2023

We decided in the Central meeting today to try to exclude submissions that led to an entity.error event from the pending submissions that get processed.

I can add that to my PR (getodk/central-backend#1049) that currently just tests the scenarios above.

@ktuite ktuite self-assigned this Nov 29, 2023
@matthew-white
Copy link
Member Author

We also decided in the meeting not to worry too much about a change to the approvalRequired flag when there is a backlog of entity events. It would be unusual for someone to change their workflow in a big way after there has already been a large amount of data collection. We're not going to worry about the order in which events are processed (submission.create before dataset.update or vice versa). We're also not going to worry about the possibility that when an older submission.create event from the backlog is processed, the new value of the approvalRequired flag is used.

@matthew-white matthew-white added backend Requires a change to the API server and removed needs discussion Discussion needed before work can begin labels Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Requires a change to the API server entities Multiple Encounter workflows
Projects
Status: ✅ done
Development

Successfully merging a pull request may close this issue.

2 participants