This repository has been archived by the owner on Apr 16, 2022. It is now read-only.

Pull & push ops overhaul #736

Merged: 106 commits, merged Jun 25, 2019

@ggalmazor (Contributor) commented May 8, 2019

v1.6 QA feedback

Work for the future

  • We should review the message that we put in the table, as opposed to the last available message in the Details dialog, which can be unpredictable due to parallel code.
  • We should add validation to the Aggregate/Central server forms and toggle buttons according to a validation status.
  • We should add support for Central operations in the CLI
  • Discuss what to do with pushing forms to archived projects in Central
  • Improvements and issues of the job tracking code:
    • The UI can group tracking events into each form's Details window, but the CLI gets all messages mixed into the stdout and stderr streams, which makes them super hard for users to follow (the logs do include the form's name, but doing the same in the tracking event messages would result in super bloated user feedback and redundant info in the UI)
    • We currently have an indeterminate progress bar in the UI, so the user has no sense of how much of the job remains to be done
    • It would be easy to get a manifest of the tasks Briefcase will perform before actually doing them, and get the number of tasks, which could be used as the bound for the progress bar.
    • The tracker would then mark tasks as done as progress gets tracked.
  • Create media folders (even when they will be empty) in Aggregate and Central, so that behavior is more consistent
  • See why only one page of Details buttons gets enabled after receiving tracking events
  • Study why, after pushing a form to Aggregate and then deleting it from Aggregate, Briefcase thinks that the form still exists in Aggregate
    • Caching problem?
  • Study how we could dynamically configure the payload size when pushing to Aggregate, using the information Aggregate adds to the OpenRosa headers in all its responses.
  • Study how we could identify when a server is an Aggregate server to limit the number of connections to 2 regardless of what the user has configured in the Settings tab
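The payload-size idea above could look roughly like this. A minimal sketch, assuming the server advertises its limit in the `X-OpenRosa-Accept-Content-Length` response header defined by the OpenRosa submission API, and simplifying by counting only attachment sizes (not the submission XML or multipart overhead). `PayloadBatcher`, `batch` and the fallback value are hypothetical names, not Briefcase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: splits a submission's attachments into batches whose
// total size stays under the limit the server advertises.
class PayloadBatcher {
  static final String HEADER = "X-OpenRosa-Accept-Content-Length";

  // Reads the advertised limit from response headers (case-insensitive name),
  // falling back to the given default when the header is absent or malformed.
  static long acceptedContentLength(Map<String, List<String>> headers, long fallback) {
    return headers.entrySet().stream()
        .filter(e -> e.getKey() != null && e.getKey().equalsIgnoreCase(HEADER))
        .flatMap(e -> e.getValue().stream())
        .findFirst()
        .flatMap(PayloadBatcher::parseLong)
        .orElse(fallback);
  }

  private static Optional<Long> parseLong(String value) {
    try {
      return Optional.of(Long.parseLong(value.trim()));
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  // Greedy batching: keep adding attachment sizes to the current batch until the
  // next one would exceed the limit. An attachment larger than the limit on its
  // own ends up alone in its batch (the server may still reject it).
  static List<List<Long>> batch(List<Long> sizes, long limit) {
    List<List<Long>> batches = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long size : sizes) {
      if (!current.isEmpty() && currentSize + size > limit) {
        batches.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(size);
      currentSize += size;
    }
    if (!current.isEmpty())
      batches.add(current);
    return batches;
  }
}
```

A greedy split keeps attachments in their original order and needs no lookahead; a size-sorted packing could use fewer requests at the cost of reordering uploads.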

Differences between Aggregate and Central

  • Pull features that aren't supported (yet):
    • Start from last
    • Start from date
  • Export features that aren't supported (yet):
    • Pull before export
  • Briefcase can't push submissions belonging to a form version it doesn't know about

New issues

Pending stuff

  • There's some inconsistent toggling of tabs & buttons while a pull/push op is going on
    • Verified that buttons disable while allowing navigation between tabs during pull/push work
    • Verified that buttons re-enable once the pull/push op ends, or the user cancels it.
  • There's some inconsistency about when a pull/push op actually ends (thus, toggling UI widgets at incorrect times)
    • Ensured that a new event is now published to signal the end of the operation, which re-enables the UI at the correct time.
  • "Downloading 0 submissions" could probably be replaced by something more meaningful, such as "No submissions to pull"
    • Completely reviewed tracking language
  • Fix export conf dialog - disabled pull before export with single form
  • Make the export operation refresh the pull cursor when using the "pull before export" option (Aggregate only, UI and CLI)

Guillermo's To Do list

  • Test -plla without specifying a form. Use Kasia's Aggregate server. She reports having pulled only 7 of 60 available forms.
  • Test -sfl. Kasia reports -sfd works as expected, so the best guess is that cursors aren't being saved correctly.
  • Test -psha. Kasia reports:
    • On Aggregate < 2 only forms are pushed
    • On Aggregate >= 2 forms and submissions are correctly pushed but the op doesn't end.
  • Test -e -pb. Kasia reports that no file is created in the export dir.
  • Study why pushing a form takes forever but pushing many forms doesn't
  • Study how to load prefs from their old location so that users can upgrade to v1.6 without losing their saved "pull before export" settings and resume cursors.
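One way to read from the old location is the approach the commit notes further down describe: read the app prefs first, then fall back to the old pull panel prefs. A minimal sketch, assuming each prefs store is modeled as a plain `Map<String, String>` instead of `java.util.prefs.Preferences` to keep it self-contained; `PrefsFallback` is a hypothetical name, and copying the value into the new location on first read is an assumed migration strategy, not something the PR states.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: each Map stands in for a prefs node
// (the app-wide prefs and the old pull panel prefs).
class PrefsFallback {
  private final Map<String, String> appPrefs;    // new location
  private final Map<String, String> legacyPrefs; // old location (pull panel)

  PrefsFallback(Map<String, String> appPrefs, Map<String, String> legacyPrefs) {
    this.appPrefs = appPrefs;
    this.legacyPrefs = legacyPrefs;
  }

  // Prefer the new location; if the key is missing there, try the old one and
  // copy the value over so later reads no longer depend on the old location.
  Optional<String> read(String key) {
    String value = appPrefs.get(key);
    if (value != null)
      return Optional.of(value);
    String legacy = legacyPrefs.get(key);
    if (legacy != null)
      appPrefs.put(key, legacy);
    return Optional.ofNullable(legacy);
  }
}
```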

Solved stuff

  • Now all CLI and UI ops that pull forms use and save the last cursor
  • Now all CLI and UI ops wait until all tasks have completed
    • On the UI, we also react differently when a form has been pulled/pushed, as opposed to when all forms have been pulled/pushed. This lets us precisely sequence when to toggle UI components.
  • Refactored how Briefcase saves Cursors and RemoteServers (Aggregate/Central) into Java prefs to avoid the current indirection created by TransferForms and ExportForms
    • Also took the chance to change pref key naming for consistency and collision avoidance.
  • Errors during a job execution are no longer being swallowed by the executor and show up in the logs
  • We show a deprecation notice when the -pp flag is used in pull ops
  • The new -mhc (max HTTP conns) flag has been enabled and can be used with pull ops.
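A common way job errors get swallowed in Java: an exception thrown by a task passed to `ExecutorService.submit()` is captured in the returned `Future` and only resurfaces (wrapped in an `ExecutionException`) when someone calls `get()`. A minimal sketch of wrapping each job so its failure reaches an error handler; `ReportingJobs`, `runAll` and the `onError` callback are hypothetical and do not reproduce Briefcase's `JobsRunner` API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ReportingJobs {
  // Wraps a job so that any exception is handed to onError instead of staying
  // hidden inside a Future that nobody inspects.
  static Runnable reporting(Runnable job, Consumer<Throwable> onError) {
    return () -> {
      try {
        job.run();
      } catch (Throwable t) {
        onError.accept(t);
      }
    };
  }

  // Runs all jobs on a small pool and waits (up to a minute) for them to finish.
  // onError may be called from several worker threads, so it must be thread-safe.
  static void runAll(Iterable<Runnable> jobs, Consumer<Throwable> onError) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    for (Runnable job : jobs)
      executor.submit(reporting(job, onError));
    executor.shutdown();
    try {
      executor.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

In a real app the handler would log the error; the point is that the failure is observed where it happens rather than depending on a later `get()`.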

Added stuff

  • Support (dummy implementation) for the pulldata() XForm function
  • Pull and push processes should perform better because we now download attachments and submissions using parallel streams. This should not interfere with job cancellation.
  • When getting the list of forms in a remote server, we filter out those without a name or form ID to prevent issues in Briefcase. This won't be necessary once we implement a central database and can have compound keys and/or other ways to identify forms locally.
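The two changes above (filtering unusable forms, downloading in parallel without breaking cancellation) can be sketched together. A minimal sketch, assuming cancellation is exposed as a `BooleanSupplier` that each item checks before starting; `RemoteForm`, `ParallelPull`, `usableForms` and `downloadAll` are hypothetical names.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical entry of a remote server's form list.
class RemoteForm {
  final String name;
  final String formId;

  RemoteForm(String name, String formId) {
    this.name = name;
    this.formId = formId;
  }
}

class ParallelPull {
  // Drops forms without a name or form ID before they reach the rest of the pull.
  static List<RemoteForm> usableForms(List<RemoteForm> forms) {
    return forms.stream()
        .filter(f -> f.name != null && !f.name.isEmpty())
        .filter(f -> f.formId != null && !f.formId.isEmpty())
        .collect(Collectors.toList());
  }

  // Downloads items (e.g. attachments or submissions) with a parallel stream.
  // Each item checks the cancellation flag before starting, so cancelling stops
  // new downloads; downloads already in progress are allowed to finish.
  static <T> void downloadAll(List<T> items, BooleanSupplier cancelled, Consumer<? super T> download) {
    items.parallelStream().forEach(item -> {
      if (!cancelled.getAsBoolean())
        download.accept(item);
    });
  }
}
```

Parallel streams started from a regular thread run on the shared `ForkJoinPool.commonPool()`, so their parallelism is independent of any HTTP connection limit; the connection pool still caps how many requests are actually in flight.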

Stuff we have learned

  • The "open browser" (server url link) feature doesn't work on Ubuntu 16
  • Central doesn't add submission date information to submissions
  • Central rejects submissions of forms with versions unknown to it
  • It looks like AppEngine won't allow more than 2 simultaneous connections (determined experimentally). It's not clear if this restriction is applied for the same client IP or for all sources.
  • Running Briefcase with Java 9 (1.9) will fail due to a missing database driver
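Given the apparent AppEngine limit of 2 simultaneous connections, capping concurrency per server could be as simple as a counting semaphore. A minimal sketch, assuming every request to a given server goes through one shared limiter; `ConnectionLimiter` and `call` are hypothetical, and detecting that a server is an Aggregate/AppEngine instance is out of scope here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical guard that caps simultaneous requests to one server.
class ConnectionLimiter {
  private final Semaphore permits;

  ConnectionLimiter(int maxConnections) {
    // Fair semaphore: waiting requests acquire permits in arrival order.
    this.permits = new Semaphore(maxConnections, true);
  }

  // Blocks until a permit is free, runs the request, and always releases the permit.
  <T> T call(Callable<T> request) throws Exception {
    permits.acquire();
    try {
      return request.call();
    } finally {
      permits.release();
    }
  }
}
```

Usage: create `new ConnectionLimiter(2)` for a server known to be Aggregate, regardless of the value configured in the Settings tab, and wrap each HTTP call in `limiter.call(...)`.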

@ggalmazor added this to the v2.0.0 milestone May 8, 2019
@ggalmazor added this to Needs review in v2.0.0 via automation May 8, 2019
@ggalmazor requested a review from dcbriccetti May 8, 2019 13:26
@dcbriccetti (Contributor)

Hi! This is marked as a draft. Shall I start looking at it anyway? Will there likely be many more changes?

@ggalmazor (Contributor, Author)

Yes! Just a couple more commits to restore HTTP proxy support and add more test coverage. I was hoping to get into a live review session with you. We can discuss it on Slack if you want.

```java
public JPanel container;
private SourcePanel sourcePanel;
private SourcePanelForm sourcePanelForm;
private SourceOrTargetPanel<T> sourceOrTargetPanel;
```
Review comment (Contributor, Author):

Using "SourceOrTarget" from now on, for lack of a better name.

@ggalmazor (Contributor, Author)

I realize some commit messages fall somewhat short of explaining some of the whys, and that I'd like to make some small changes. I'll be reworking the commits during my day so that you have better material during yours, @dcbriccetti. I'll give you a heads-up when it's done.

- The build process (at least when done in IntelliJ) will complain about using the methods managed by the UI Designer tool outside the form classes.
- What we would really want is to use form class instances themselves as containers, but I haven't found a way to do this yet.
- Now the TransferPanelForm has a type parameter to set the kind of panel it's hosting (pull or push).
- Now that the Source class has been segregated into the PullSource and PushTarget classes, this type parameter lets us keep having generic UI components in a type-safe way
- Review language in Request, RequestBuilder & Response classes to disambiguate concepts
- Base Request-Response on InputStreams instead of Strings for improved performance (avoid reprocessing responses as much as possible) and smaller memory footprint on big HTTP interactions
- The RequestBuilder now allows creating POST requests with multipart support, and new response mappers to JSON, and XmlElement
- The Response object includes more info from non success responses, required for the ODK Central integration
- Replace reusing/non-reusing conns with max conn number in Http
  - We always want to reuse connections, so having a factory that doesn't makes no sense
  - We want (in the future) to limit the number of concurrent connections
- Implement POST requests that can send a single or multipart message payload
- For the time being, hardcode concurrent connection number to 8 (1 for tests)
- Add new ways to compose and sequence Jobs
- Improve the JobsRunner's API to deal with type safety in a simpler way, with the tradeoff of losing the builder style API.
  - Also simplified launching Jobs by dropping the type parameter in JobsRunner (effectively making it a void operation) to avoid the complexity of futures and sync/async processes, which we don't need to deal with for now. Instead, call sites need to declare callbacks that will be called on success/failure.
- Move pull classes to an aggregate subpackage
- The key change in this commit is the one that declares a new Cursor arg in InstanceIdBatchGetter to be used as the starting cursor, which allows us to pass one when launching the pull operation.
- Also replace string primitives with Cursor instances to make explicit the important business relation between cursors and getting submission lists from Aggregate
  - We make Cursors required for low level operation such as getting the submission id pages/batches/chunks
  - We make Cursors optional for the high-level pull operation to let the calling site decide which cursor to use: either the last saved cursor or a synthetic cursor built using a date provided by the user
- Rename RemoteServer to AggregateServer and extract RemoteServer interface
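The Cursor changes above can be sketched as a small value type: required at the low level, optional at the high level, where the calling site picks either the last saved cursor or a synthetic one derived from a user-provided date. This is a hypothetical simplification: `Cursor`, `PullPlan`, `fromDate` and `startingCursor` are illustrative names, the date encoding is a placeholder (the real cursor format Aggregate expects is not reproduced), and giving the explicit date precedence over the saved cursor is an assumption.

```java
import java.time.LocalDate;
import java.util.Objects;
import java.util.Optional;

// Hypothetical value type replacing raw cursor strings.
final class Cursor {
  private final String value;

  private Cursor(String value) {
    this.value = Objects.requireNonNull(value);
  }

  static Cursor of(String serverValue) { return new Cursor(serverValue); }

  // The "no cursor" case: start from the beginning of the submission list.
  static Cursor empty() { return new Cursor(""); }

  // Placeholder encoding for a synthetic cursor built from a user-provided date.
  static Cursor fromDate(LocalDate date) { return new Cursor("date:" + date); }

  boolean isEmpty() { return value.isEmpty(); }

  String value() { return value; }

  @Override
  public boolean equals(Object o) {
    return o instanceof Cursor && ((Cursor) o).value.equals(value);
  }

  @Override
  public int hashCode() { return value.hashCode(); }
}

class PullPlan {
  // High level: the cursor is optional and the call site decides which one to use.
  // Low-level calls (e.g. fetching one batch of instance IDs) would take a required Cursor.
  static Cursor startingCursor(Optional<Cursor> lastSaved, Optional<LocalDate> startFrom) {
    if (startFrom.isPresent())
      return Cursor.fromDate(startFrom.get());
    return lastSaved.orElse(Cursor.empty());
  }
}
```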
We are setting the model for a pull/push process:
- One low-level method for each individual HTTP interaction:
  - It has to be aware of the runner status
  - It has to do error management
  - It has to do logging & user feedback
  - It has to be explicit about side-effects by returning void where possible
  - It has to have unit/integration tests
- One high-level method that composes the full operation:
  - It has to compose Jobs that call the individual low-level methods
  - It has to use as many small Jobs as possible
  - The composition has to be coherent with the return type of the low-level methods: supply (non-void) vs run/accept (void)
  - It has to have unit/integration tests
- The implementation uses assets from the Export vertical package that should be moved to a reused horizontal package.
- Move the decision of how to launch the request to the calling site to decouple the operation from the context where it's executed
- Other calling sites might make the decision to launch it synchronously instead
- We control the flow with the JobsRunner API now
- We need this in case the user wants to pull/push forms in bulk. Otherwise, it wouldn't be possible to know which form each tracking event message belongs to
- Change "username" to "email", which is more appropriate
- We need to be able to limit HTTP connections in the push operation as well
- For conformity: we did the same thing in TransferForms
- Now we achieve the same things with less indirection
- *Forms classes are right to handle state that they own (selection of forms, export confs, etc.), but they turn out to be a bad fit for pull sources, which they don't own because it's a cross-cutting concept in our domain
- We want to use fresh credentials every time that we need to create a new connection to a server
- Apache HttpClient stores credentials in two ways:
  - There's a credentials provider that holds the credentials to be used
  - There's an auth cache that saves used auth schemes
- By disabling the auth cache and clearing the credentials provider, we ensure that we use the provided credentials each time a new connection is established.
- Simplify JobsRunner by replacing the onSuccess callback with composition of the success action into the individual jobs
- Normalize side-effects of individual jobs and groups of jobs:
  - tracking event sending: (Pull|Push)Event.Success is sent by the individual jobs, (Pull|Push)Event.Complete is sent by the pull|push operation (when all jobs have completed)
  - The export panel unsets the "exporting mode" when all exports have ended
- Review and document what the JobsRunner onError callback does
- Now we read from the app prefs and the tab (local) prefs, giving preference to what we read on the app prefs, which is the "new way to do it"
- Detected an issue with the design of JobsRunner around waitForCompletion and onComplete. Added TODO comments with enough context to retake this in the future
- This way we avoid making requests that are probably going to fail anyway and simplify error handling in the last big block
- By throwing a named exception, we can handle it downstream in a controlled way.
- We need to handle known exceptions like this one and let only "unexpected exceptions" reach the JobsRunner
- Extract reading & storing methods into RemoteServer and Cursor respectively and reuse everywhere
- RemoteServer read methods get both the app prefs and the pull panel prefs in order to try to read from the old location at the pull panel prefs
- Create code regions for prefs management in Cursor and RemoteServer
- Ensure that we only deal with one instance of prefs object for the pull panel class
  (when combined with the "start from last" feature)
- CLI ops need to be able to set a different briefcase dir (-sd arg) instead of using BriefcasePreference objects to get it.
- Otherwise, one can't run the CLI without having set an sd through the UI, and changing to different storage dirs won't be clean (Briefcase will think that forms pulled into one sd are present in another sd)
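The job model laid out in these commit notes (small jobs that each publish their own success event, with the operation publishing a completion event once all of them have finished) can be approximated with `CompletableFuture`. A minimal sketch; `PullOperation`, `pullForm`, `pullAll` and the string events are hypothetical and do not reproduce Briefcase's `JobsRunner` or its `PullEvent` classes.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.stream.Collectors;

class PullOperation {
  // Thread-safe queue standing in for an event bus.
  final Queue<String> events = new ConcurrentLinkedQueue<>();

  // One small job per form: the success side effect is composed into the job itself.
  CompletableFuture<Void> pullForm(String formId, ExecutorService executor) {
    return CompletableFuture
        .supplyAsync(() -> "submissions of " + formId, executor) // supply: non-void step
        .thenAccept(result -> events.add("Success:" + formId));  // accept: void step
  }

  // The operation publishes Complete only after every job has finished.
  // whenComplete runs even if a job failed, so a UI listening for this event
  // would always get the chance to re-enable its buttons.
  CompletableFuture<Void> pullAll(List<String> formIds, ExecutorService executor) {
    List<CompletableFuture<Void>> jobs = formIds.stream()
        .map(id -> pullForm(id, executor))
        .collect(Collectors.toList());
    return CompletableFuture
        .allOf(jobs.toArray(new CompletableFuture[0]))
        .whenComplete((ignored, error) -> events.add("Complete"));
  }
}
```

If any job fails, the future returned by `pullAll` still completes exceptionally after publishing `Complete`, so the caller can report the error separately.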
@ggalmazor merged commit 8d41565 into getodk:master Jun 25, 2019
v2.0.0 automation moved this from Needs testing to Merged | Done Jun 25, 2019
@ggalmazor deleted the pull_push_overhaul branch June 25, 2019 15:40
Projects: v2.0.0 (Merged | Done)
4 participants