Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Purge: possible issue with attachments when using date ranges #6207

Closed
1 task done
ebruchez opened this issue Feb 27, 2024 · 4 comments
Closed
1 task done

Purge: possible issue with attachments when using date ranges #6207

ebruchez opened this issue Feb 27, 2024 · 4 comments

Comments

@ebruchez
Copy link
Collaborator

ebruchez commented Feb 27, 2024

In the implementation of completeAttachments(), we compute the difference between some attachment paths, and then force-delete some attachments.

We need to make sure that, especially when date ranges are used, we do not delete attachments that we shouldn't delete.

  • review logic
@ebruchez ebruchez changed the title Purge: possible issue with attac Purge: possible issue with attachments when using date ranges Feb 27, 2024
@ebruchez ebruchez self-assigned this Feb 27, 2024
@ebruchez
Copy link
Collaborator Author

ebruchez commented Feb 28, 2024

As a reminder, we intend the following (see #5209):

  • deleting current data ⇒ deleting all historical data
    • because otherwise we'd have orphan or resuscitated historical data
  • similar to the above: we cannot delete data + historical data (DataRevisionHistoryAdt.Include) newer than a certain date unless all historical data is in the range
    • same consequence as the above otherwise
  • we can delete historical data older than a certain date
    • this is the main use case
  • we could delete a range of historical data
    • no problem here either, but I am not yet sure we allow it, and there is no real use case

Things to check:

  • What about purging form definition? It's expected that we don't check dates, right? Check behavior.
    • processFormDefinition()calls processXmlDocument(), which also will purge the form definition, but without using the date range
    • A: We don't allow purging with FormOrData.Form. In that case, we throw a StatusCode.BadRequest.
  • also check that processAttachments() makes sense for purging form definitions
    • A: We don't allow purging with FormOrData.Form. In that case, we throw a StatusCode.BadRequest.

@ebruchez
Copy link
Collaborator Author

ebruchez commented Feb 28, 2024

Independently from the question of attachments, here are more thoughts about things to check regarding the handling of date ranges, when present (dateRangeGteOpt.isDefined || dateRangeLtOpt.isDefined).

Historical data ADT:

  • DataRevisionHistoryAdt.Only
    • current data will remain no matter what
    • so in any case it is ok to delete any portion of historical data without worry, and there is nothing particular to check
  • DataRevisionHistoryAdt.Include
    • if the range doesn't intersect with historical or current data, we are ok: nothing is purged
    • if the range includes all data and historical data, we are ok, we can purge everything
    • if the range intersects with historical data, but doesn't intersect with current data, we are ok
    • if the range intersects with some but not all historical data, and intersects with current data, this is prohibited
  • DataRevisionHistoryAdt.Exclude
    • disallowed for purge, throws StatusCode.BadRequest if requested

In order to detect the disallowed DataRevisionHistoryAdt.Include case above:

  • dateRangeGteOpt.isEmpty: no problem
  • dateRangeGteOpt.isDefined: we need to run through all the historical resultset before purging, so that we can know if there would be any orphan historical entries
    • UPDATE: Actually we only need to know the time of the oldest historical entry.

Tasks:

  • check that code rejects request for purge with DataRevisionHistoryAdt.Exclude
  • maybe make drawing of the DataRevisionHistoryAdt.Include case above
  • update revision history API to return min/max last modified in addition to total count
  • unit tests for DataRevisionHistoryAdt.Include scenarios above
  • unit tests for purging and attachments

@ebruchez
Copy link
Collaborator Author

With date range conditions handled properly (previous comment), we make sure that we eliminate the case of orphan historical data. However, after a purge, we can have:

  • no remaining current data and no remaining historical data
  • remaining current data, but no remaining historical data
  • remaining current data, and some remaining historical data
  • but what we cannot have is some remaining historical data but no current data

In the case where we exclude purging for a document id due to the data ranges condition (previous comment), we do not purge anything for the current document id. So attachments remain and we are ok.

In the other cases, we iterate all current data and historical data, and gather:

  • the attachments processed, that is, for data that we are purging
  • the attachments not processed, that is, for data that is remaining
  • then we delete for processedAttachmentPaths -- otherAttachmentPaths, which achieves the garbagbe collection

In other words, the algorithm was correct, modulo unit tests and the date range issue.

@ebruchez
Copy link
Collaborator Author

So I had written code to not purge data that would leave orphan historical data. That code is in, but with #6210 that code will not actually run. We could, in the future, enable this if desired.

@ebruchez ebruchez moved this from In progress to Done in Orbeon Forms 2023.1.1 Mar 18, 2024
@ebruchez ebruchez added this to Done in Orbeon Forms 2024.1 Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

1 participant