Versioning: support basic data migration #1523

Closed
avernet opened this Issue Jan 31, 2014 · 23 comments

avernet commented Jan 31, 2014

As has been suggested, simply removing the fields that have been deleted and adding the fields that have been added might be good enough as a first shot. This would not guarantee a perfect migration, but might be sufficient for most scenarios. Some kind of user interaction to review the migration would be good to have.

ebruchez commented Jun 11, 2014

Two aspects:

  • associate data with new version
  • migrate data

@ebruchez ebruchez added this to the Review milestone Jun 26, 2014

@avernet avernet modified the milestone: Review Oct 17, 2014

ebruchez commented Mar 8, 2016

+1 from me today for adding/removing fields. Would already cover quite a bit of ground.

ebruchez commented Mar 8, 2016

What would the inputs of the migration process be? Probably:

  • the old form definition
  • the new form definition
  • possibly, data associated with the old form definition

We don't have access to the history of changes done by the user in Form Builder.

An obvious initial idea is to identify what has changed and what has not by looking at control names only.

For migration purposes, we can consider that the relative order of binds/data/controls within their container does not constitute an incompatible change. Migration could still update the order of elements.
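
A minimal sketch of that initial idea, in Python for brevity. The `Container` type and `diff_control_names` function are hypothetical and stand in for whatever representation of sections and grids is actually extracted from the two form definitions. Control names are compared as sets, so a pure reordering within a container is not reported as a change.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical stand-in for a section or grid: a name plus its control names."""
    name: str
    controls: list[str] = field(default_factory=list)


def diff_control_names(old: list[Container], new: list[Container]) -> dict:
    """Compare control names container by container, ignoring relative order."""
    old_by_name = {c.name: set(c.controls) for c in old}
    new_by_name = {c.name: set(c.controls) for c in new}
    changes = {}
    for name in sorted(old_by_name.keys() | new_by_name.keys()):
        before = old_by_name.get(name, set())
        after = new_by_name.get(name, set())
        added, removed = after - before, before - after
        # Only report containers where something was actually added or removed
        if added or removed:
            changes[name] = {"added": added, "removed": removed}
    return changes


old_form = [Container("personal", ["first-name", "last-name"])]
new_form = [Container("personal", ["last-name", "first-name", "email"])]
changes = diff_control_names(old_form, new_form)
```

Here `changes` reports `email` as added to `personal` and nothing as removed. A renamed control would show up as one removal plus one addition, which is the known limitation of looking at names only.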

What would the algorithm look like?

  • easy case
    • bind hierarchy is the same (section names)
    • names added/removed within sections, maybe even moved between sections (not across repeats)
    • no validation/required changes
  • harder cases
    • section added
    • section becomes repeated

If validation/required changes, there is the issue of whether the old data will be valid per the new validation rules.

What would a UI do?

  • try to match if the migration follows an existing, handled scenario
  • summarize the detected changes to the user
  • if needed, ask the user whether to check the data too (e.g. to see if validation rules still apply)
  • proceed to migrate only upon user confirmation

ebruchez commented Mar 18, 2016

One question that was asked recently is why we store enclosing elements for (non-repeated) sections. Doing so prevents moving controls between sections while keeping the data compatible.

My current answer is that:

  • the section information might be useful to some users
  • data migration should take care of that anyway (see comment above)
  • these elements are needed in the internal format for controlling visibility and readonly, and removing/adding them when writing/reading adds complexity (we already do some of that for repeated grids)

We might give more thought to the idea that our data format in the database might not need the enclosing section elements.

ebruchez commented Mar 21, 2016

ebruchez commented Mar 24, 2016

+1 from customer for a comparison feature telling the user what differences there are between the form definitions.

ebruchez commented Apr 7, 2016

If we start by showing differences, what can we realistically do?

  • structural changes
    • controls added/removed within sections, maybe even moved between sections
    • sections added/removed/moved
    • changes to/from repeated
  • value changes
    • metadata changes: form title, etc. (also: i18n)
    • resources changes
    • other changes (min/max, etc.)

The value changes are easy to handle. The structural changes are harder, because they come down to a structural diff algorithm, and that's open-ended. But what we could do is start simple and handle the simple cases.

The diff would output a user-friendly list of changes in a Form Builder dialog.

@ebruchez ebruchez added the 4 Points label Apr 7, 2016

ebruchez commented Apr 7, 2016

Here is a Java library to compute tree edit distance. Their latest library implements very recent algorithms. It is called APTED and is under MIT license.

ebruchez commented Apr 7, 2016

A first step would be the following:

  • write simple form-to-bracket-notation-tree
  • try out APTED to see what results it provides in terms of delete-insert-rename

There is no "move" operation in these tree edit distance problems, so I wonder if that's a good approach.
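
As a rough illustration of the first bullet, here is a sketch in Python that serializes a tree of sections and controls to a curly-brace bracket notation such as `{form{section-1{name}}}`. The `Node` type is hypothetical, labels are assumed not to contain braces, and the exact input format expected by APTED should be checked against its documentation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a section, grid, or control identified by its name."""
    label: str
    children: list["Node"] = field(default_factory=list)


def to_bracket(node: Node) -> str:
    """Serialize a node and its descendants as {label{child}{child}...}."""
    return "{" + node.label + "".join(to_bracket(c) for c in node.children) + "}"


form = Node("form", [
    Node("section-1", [Node("name"), Node("email")]),
    Node("section-2"),
])
notation = to_bracket(form)  # "{form{section-1{name}{email}}{section-2}}"
```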

ebruchez commented Apr 7, 2016

Mmh, it seems that this APTED thing only produces a number, not a list of steps.

ebruchez commented Apr 26, 2016

Our case is a little easier than a general-purpose tree diff because we have constraints:

  • nesting only happens for sections and grids, where grids cannot contain any grids or sections
  • while in theory infinite, in practice section nesting is limited to maybe 3 levels (and in most cases there will be 1 or 2 levels of sections)
  • controls can be seen as a collection of attributes, including their name, LHHA, validations, etc., and they are always leaf items

As a first step we could, for each control, repeated grid, and section, compute a hash based on their properties. This is, BTW, what most tree diff algorithms start by doing anyway. This allows us to determine:

  • whether form definitions are identical (except ordering of sections, grids and controls)
  • whether there were simple additions/deletions of sections, grids and controls
  • whether there were simple moves of sections, grids and controls

This would detect a change in, say, control name or label as an addition and deletion, which of course is not great. But it's a start, and reconciliation of differences between non-identical sections, grids and controls can follow.
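
A sketch of that hashing step, in Python. It assumes each control, grid, or section has been flattened into a `(parent_name, properties)` pair, where `properties` is a plain dictionary that includes a unique `name`. These shapes and the function names are assumptions made for illustration, not the actual form definition model.

```python
from __future__ import annotations

import hashlib
import json


def item_hash(properties: dict) -> str:
    """Stable hash of an item's properties (name, label, validations, ...)."""
    canonical = json.dumps(properties, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def index_items(items: list[tuple[str, dict]]) -> dict[str, tuple[str, str]]:
    """Map each item's hash to (parent name, item name)."""
    return {item_hash(props): (parent, props["name"]) for parent, props in items}


def classify(old_items: list[tuple[str, dict]], new_items: list[tuple[str, dict]]) -> dict:
    """Detect identical definitions, simple additions/deletions, and simple moves."""
    old_index, new_index = index_items(old_items), index_items(new_items)
    added = sorted(new_index[h][1] for h in new_index.keys() - old_index.keys())
    removed = sorted(old_index[h][1] for h in old_index.keys() - new_index.keys())
    # Same hash on both sides but a different parent: the item was moved
    moved = sorted(
        old_index[h][1]
        for h in old_index.keys() & new_index.keys()
        if old_index[h][0] != new_index[h][0]
    )
    return {
        "identical": not (added or removed or moved),
        "added": added,
        "removed": removed,
        "moved": moved,
    }


old_items = [
    ("section-1", {"name": "first-name", "label": "First name"}),
    ("section-1", {"name": "age", "label": "Age"}),
]
new_items = [
    ("section-2", {"name": "first-name", "label": "First name"}),
    ("section-1", {"name": "age", "label": "Your age"}),
]
result = classify(old_items, new_items)
```

In this example `first-name` is reported as moved, while `age` appears in both `added` and `removed` because its label changed, which is the limitation described above.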

ebruchez commented Apr 27, 2016

Interestingly, all XML diff material is hopelessly out of date, but there is more hope if you search "DOM diff". For example this seems reasonably current and also has a paper.

ebruchez commented Apr 27, 2016

And by "reasonably current", that meant 2012. Not great. And while the paper talks about handling moves as a post-processing step, their simple XML diff example online doesn't seem to show a move.

avernet commented Apr 16, 2018

ebruchez commented Aug 9, 2018

ebruchez commented Aug 9, 2018

For missing XML elements, we could add missing elements automatically without migrating the entire data in the database. How would this work?

  1. One possibility: upon loading the form data, the static tree of controls is traversed and compared with the data. If the data is missing an XML element at a given level, it is automatically added.
  2. Another possibility could be to produce migration information at form definition publish time: the previous form definition would be looked up, and local migration directives would be stored in the form definition, such as "add this element if missing". When loading data, the upgrade script would run.

ebruchez commented Aug 10, 2018

Given a hierarchy of controls (either the form definition or another representation), it is easy to create an algorithm that goes through XML data and inserts missing elements.
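
Such an algorithm could look roughly like the following Python sketch, which uses the standard `xml.etree.ElementTree` module. It assumes the hierarchy of controls is available as a template element tree with one child per expected element. The `add_missing` function is hypothetical, it appends missing elements at the end of their parent rather than at the correct position, and it does not treat repeats with zero iterations specially.

```python
import copy
import xml.etree.ElementTree as ET


def add_missing(data: ET.Element, template: ET.Element) -> int:
    """Add to `data` every element that `template` expects but `data` lacks.

    Returns the number of subtrees inserted.
    """
    inserted = 0
    for expected in template:
        matches = data.findall(expected.tag)
        if not matches:
            # Element is absent at this level: copy the whole expected subtree
            data.append(copy.deepcopy(expected))
            inserted += 1
        for match in matches:
            # Element exists (possibly several times): check its children too
            inserted += add_missing(match, expected)
    return inserted


template = ET.fromstring(
    "<form><section-1><first-name/><email/></section-1>"
    "<section-2><comment/></section-2></form>"
)
data = ET.fromstring("<form><section-1><first-name>Ada</first-name></section-1></form>")
count = add_missing(data, template)
```

After the call, `data` contains an empty `email` element in `section-1` and a new `section-2` with its `comment` child, while the existing `first-name` value is left untouched.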

ebruchez commented Aug 10, 2018

Should there be modes/options when publishing and overriding, if the new format just adds or even removes elements, such as:

  • do nothing (current behavior)
  • show an error to the user if the data is incompatible
  • migrate (add/remove elements)
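
To make the three options concrete, here is a small Python sketch. The `MigrationMode` names and the `apply_mode` function are hypothetical, and the data is simplified to a flat mapping of field names to values rather than an XML document.

```python
from enum import Enum


class MigrationMode(Enum):
    """Hypothetical names for the three options listed above."""
    NONE = "none"        # do nothing (current behavior)
    ERROR = "error"      # report incompatible data
    MIGRATE = "migrate"  # add/remove elements


def apply_mode(mode: MigrationMode, data: dict, expected: set) -> dict:
    """Return the data to use, given the field names the new format expects."""
    if mode is MigrationMode.NONE or set(data) == expected:
        return data
    if mode is MigrationMode.ERROR:
        missing = sorted(expected - set(data))
        extra = sorted(set(data) - expected)
        raise ValueError(f"incompatible data: missing {missing}, extra {extra}")
    # MIGRATE: keep values for expected fields, add empty ones, drop the rest
    return {name: data.get(name, "") for name in sorted(expected)}


stored = {"first-name": "Ada", "fax": "123"}
fields = {"first-name", "email"}
migrated = apply_mode(MigrationMode.MIGRATE, stored, fields)
```

With `MIGRATE`, `fax` is dropped and an empty `email` is added. With `ERROR`, the same input raises an exception instead.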

ebruchez commented Aug 16, 2018

@ebruchez ebruchez added the Dogfood label Aug 29, 2018

ebruchez commented Sep 11, 2018

  • check if code from XML Schema generator can be used
    • Rejected, as this code works on the source of the form definition.
  • hook-up at the same spot we have fr-get-document-submission and dataMaybeMigratedFromDatabaseFormat
  • use source of truth and iterate through it to identify missing elements compared to the data provided
  • handle element templates
  • handle section templates
  • handle field removal
    • Q: Anything to do?
    • Q: Should data be preserved or pruned?
    • RESOLUTION: We remove the data.
  • Form Runner option
    • property and metadata setting
    • oxf.fr.detail.data-migration or oxf.fr.data-migration?
  • check whether, when section templates are present, data templates could be found within the main instance (which would be incorrect), as we take the first element with the given name
  • initial tests
  • first commit
  • option of throwing an error if data is not compatible
  • case of POSTed data
  • case of data from service
    • Nothing to do: this is only for the new mode.
  • decide whether/how to handle the correct element order (see also #1361)
  • doc
  • P2: Form Builder options at publication time

What is the source of truth? We could look at:

  • the data template before loading data
  • binds

But there is also the question of section templates. In this case, the data template does not include any data at all, so we would need to go search for the data template in nested section template components. The same would go for binds.

ebruchez commented Sep 11, 2018

For element templates, like for attachments:

<instance filename="" mediatype="" size=""/>
  • we cannot use just the bind name, clearly
  • we cannot look into the XBL template, because we are not in Form Builder and metadata is not available
  • we cannot just look into the default instance, because for example repeats might have 0 iterations

What we can do is look at a combination of the default instance AND repeat templates.
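
A possible shape for that lookup, sketched in Python with `xml.etree.ElementTree`. The function name and the way the default instance and repeat templates are passed in are assumptions. The first element found with a given name wins, and the default instance is searched before the repeat templates.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def collect_element_templates(
    default_instance: ET.Element,
    repeat_templates: list[ET.Element],
) -> dict[str, ET.Element]:
    """Map element names to a template element, including attributes such as
    filename/mediatype/size, searching the default instance first."""
    templates: dict[str, ET.Element] = {}
    for root in [default_instance, *repeat_templates]:
        for elem in root.iter():
            # Keep the first element seen with a given name
            templates.setdefault(elem.tag, elem)
    return templates


# The repeat has 0 iterations in the default instance, so "file" only
# appears in the repeat template.
default_instance = ET.fromstring("<form><section-1><attachments/></section-1></form>")
repeat_template = ET.fromstring(
    '<attachments-iteration><file filename="" mediatype="" size=""/></attachments-iteration>'
)
templates = collect_element_templates(default_instance, [repeat_template])
```

Here `templates["file"]` carries the three empty attributes even though the default instance alone contains no `file` element.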

ebruchez commented Sep 13, 2018

Scenario for section templates:

  • form definition with section templates published and data is created
  • section template is modified to add fields
  • form definition is republished to include new section template
  • existing data is loaded with updated form definition

Now, instance data is handled differently with section templates:

  • Data from bound instance is copied in.
  • If data is missing in the bound instance upon xforms-model-construct-done:
    • The section template's fr-form-template is copied and then mirrored to the bound instance.
    • This case is ok, as fr-form-template is up to date.
    • This should only happen in new mode.
  • If data is present within the section template:
    • It was copied in from the bound instance.
    • It needs to be migrated if needed.

NOTE: The migration map for grids handles section templates.

ebruchez added a commit that referenced this issue Sep 20, 2018

For #1523: initial support for adding missing fields
- insert/remove elements when data is loaded from persistence
- handle element templates
- handle section templates
- use form property
- logging
- test

ebruchez added a commit that referenced this issue Sep 21, 2018

For #1523: initial support for adding missing fields
- insert/remove elements when data is loaded from persistence
- handle element templates
- handle section templates
- use form property
- logging
- test

ebruchez added a commit that referenced this issue Sep 21, 2018

ebruchez added a commit that referenced this issue Sep 22, 2018

ebruchez added a commit that referenced this issue Sep 24, 2018

ebruchez commented Sep 25, 2018
