Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work out what we want to do with Enketo XML, and do that thing #5549

Closed
SCdF opened this issue Mar 28, 2019 · 11 comments
Closed

Work out what we want to do with Enketo XML, and do that thing #5549

SCdF opened this issue Mar 28, 2019 · 11 comments

Comments

@SCdF
Copy link
Contributor

SCdF commented Mar 28, 2019

When creating reports we store the user-entered data effectively twice:

  • Once as the raw XML that Enketo returns to us, for use by subsequent Enketo invocations (i.e editing docs)
  • Once as extract JSON in doc.fields.{}, for use by everything else (on-screen display, configurable logic like rules and schedules, impact and analytics via couch2pg, etc)

Originally we stored the XML in the doc.xml property, as a string.

We later realised that this is inefficient / unhelpful, because CouchDB's view generation performance is a function of the size of a JSON document, and nothing refers to it except editing. So, we moved it into an attachment.

However, this has caused other performance problems:

  • Replication is less efficient in general, because the algorithm requests attachments separately
  • Replication is far less efficient relevant to CouchDB 2.0, because we can now batch document GETs, but not attachments GETs, so the disparity is far more obvious
  • For a reason yet to be investigated, PouchDB view generation is far slower with attachments present (it should be at worst identical).

We need to work out what to do here.

Obvious options include:

  • Prioritise replication speed and PouchDB view generation performance over CouchDB view generation performance and move the XML back into a string against the JSON
  • Delete the XMl altogether, and re-generate the XML from the doc.fields object when we want to edit (hat-tip @dianabarsan for that brilliant suggestion)
  • Do any amount of PouchDB investigation and performance improvements to solve any or all of the problems listed above

Diana's suggestion sounds ideal, and IMO should be investigated first, but it can be up for discussion.

Context: https://docs.google.com/document/d/1kXjsF-orndz0-V5UFf_Esx5J-ogeQ6_tALCwE1K0tvA/edit#heading=h.29bkt8nfi7kw

@garethbowen garethbowen added Type: Performance Make something faster Priority: 2 - Medium Normal priority labels Mar 28, 2019
@garethbowen garethbowen added this to To do in 4.0.0 via automation Sep 3, 2019
@garethbowen
Copy link
Member

Scheduling for 4.0.0 because that's the ideal time to do any data migrations that may be necessary for this.

@garethbowen garethbowen added this to Needs triage in Core Engineers Backlog via automation Sep 18, 2019
@garethbowen garethbowen moved this from Needs triage to To do in Core Engineers Backlog Sep 18, 2019
@garethbowen
Copy link
Member

@dianabarsan assures me that we can reverse engineer from JSON to XML so we should just do that.

@garethbowen garethbowen moved this from To do to Scheduled in Core Engineers Backlog Mar 3, 2020
@kennsippell
Copy link
Member

kennsippell commented Mar 30, 2020

Is it possible to split this issue into two?

  1. Convert JSON to XML and stop attaching the XML to new reports.
  2. Perform the migration to remove the attachment for all legacy reports?

If I understand the approach correctly, this would allow us to discuss scheduling the first issue in a 3.x timeframe and would be beneficial to new projects (and old).

@garethbowen
Copy link
Member

Yes, I should have updated this issue so it now only covers step 1. We've decided not to run any major migrations in 4.0.0 so step 2 will have to wait until 5.0.0, and/or be provided as a custom script that each project can choose to run, or not.

@kennsippell
Copy link
Member

So if we are just converting JSON to XML and not attaching the XML to new reports, is this regarded as a breaking change?

@garethbowen
Copy link
Member

It depends on the implementation, because we may need to store more information to completely reproduce the xml, but in all likelihood it's not a breaking change.

@garethbowen garethbowen removed this from Ready for dev in 4.0.0 Mar 31, 2021
@garethbowen garethbowen added this to Scheduled in 4.0.0 via automation Mar 31, 2021
@dianabarsan dianabarsan self-assigned this May 6, 2021
@abbyad abbyad added this to the 4.0.0 milestone May 18, 2021
@jonathanvq jonathanvq added this to Reviewing or Needs Review in Allies Workstream - DEPRECATED via automation Mar 22, 2022
@garethbowen garethbowen moved this from Reviewing or Needs Review to Ready for dev in Allies Workstream - DEPRECATED Mar 22, 2022
@dianabarsan dianabarsan moved this from Ready for dev to Dev in progress in Allies Workstream - DEPRECATED Apr 12, 2022
@dianabarsan
Copy link
Member

dianabarsan commented May 5, 2022

This is ready for AT on 5549-no-report-xml-attachment. The branch is built on top of the Enketo uplift branch.

There should be no difference for the user when filling or editing forms. You should be able to:

  1. Edit a report, generated previous to this change (and that has a content attachment), and have all form fields filled. The attachment should be removed when saving.
  2. Create a new report. The new report should not have an attachment.
  3. Edit a report, generated after this change, and have all fields filled.

There are a few caveats:

  1. If the form itself is changed, the new/edited fields won't get populated. This behavior existed before, but should be tested for consistency.
  2. Editing a report that created extra docs duplicates all extra docs #7594 - this behavior existed before
  3. Reports created as extra docs in forms are not editable #7605 - the additional context part
    Reports created as extra docs might not always be faithful to the model of the form itself.

@kennsippell
Copy link
Member

Curious - after getting this fix via an upgrade to 4.0, would projects be able to remove attachments from the reports created via 3.x versions without breaking anything?

@dianabarsan
Copy link
Member

dianabarsan commented May 5, 2022

Mixed. Yes for the most part, with one exception.

Some changes have been required:
db-doc fields will be saved in the main report fields. To maintain old report content display, these fields will become "hidden".
Because of (1), it's not safe to default to using fields when editing a report that has a content attachment, so if the content attachment exists, it will be used to populate the form, but it will be deleted upon saving.

Quote from PR description: #7596

So, for reports that created other docs (extra docs), if the attachment was removed, the information about the "extra docs" is lost (the docs themselves exist in the database already). Editing those reports is actually already broken (one of the caveats).

Deleting attachments from existent reports and saving them, if partners decide to do this, should be one of the test scenarios!

@ngaruko ngaruko self-assigned this Aug 31, 2022
@ngaruko
Copy link
Contributor

ngaruko commented Sep 6, 2022

  1. Edit a report, generated previous to this change (and that has a content attachment), and have all form fields filled. The attachment should be removed when saving.

Before : image
After:
image

  1. Create a new report. The new report should not have an attachment.

image

  1. Edit a report, generated after this change, and have all fields filled.

image

dianabarsan added a commit that referenced this issue Sep 7, 2022
Instead of relying on xml attachments, uses report fields to populate form model on report edits.

Some changes have been required:
(1) db-doc fields will be saved in the main report fields. To maintain old report content display, these fields will become "hidden".
(2) Because of (1), it's not safe to default to using fields when editing a report that has a content attachment, so if the content attachment exists, it will be used to populate the form, but it will be deleted upon saving.
(3) bindJsonToXml is refactored so it treats arrays (repeats) correctly.

Limitations:
- Editing a report that created extra docs duplicates all extra docs #7594 - this behavior existed before
- Reports created as extra docs in forms are not editable #7605 - the additional context part
- Reports created as extra docs might not always be faithful to the model of the form itself.

Migrates a bunch of e2e tests from protractor to wdio.

#5549
@dianabarsan
Copy link
Member

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: 2 - Medium Normal priority Type: Performance Make something faster
Projects
Status: Done
Development

No branches or pull requests

6 participants