This repository has been archived by the owner on Apr 16, 2022. It is now read-only.

Investigate export of submissions pulled from odk dir. #756

Open
kkrawczyk123 opened this issue Jul 2, 2019 · 9 comments

Comments

@kkrawczyk123
Contributor

Software versions

Briefcase v1.15, 1.16.x, Java v1.x.x, operating system, Aggregate v1.x.x, Collect v1.x.x...

Problem description

I have noticed that instances of the Birds form pulled from the odk dir are not exported to the CSV file. Those instances do not have an instanceId; this shouldn't be treated as an error, and the missing field should simply be ignored during export. When I removed the instanceId from one submission of the All widgets form pulled from the odk dir, the instance was still exported to the CSV file, with only the instanceId value missing.

Steps to reproduce the problem

Export the attached sd content.

Expected behavior

A missing instanceId should not be treated as an error, just as in the All widgets case.

Other information

Briefcase 1.15 and 1.16 behave in the same way, so this is not a regression introduced by the release.
attached sd: forms.zip

@mayank8318
Contributor

@kkrawczyk123 I think this is happening because the Birds form has a "repeat" field, which requires the parent's key or instance ID to link each row in the repeat's CSV back to its parent submission. The All widgets form doesn't have a repeat, so it doesn't need to be linked to any repeat CSV.

@ggalmazor
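
To illustrate the hypothesis above, here is a minimal sketch of why a repeat row might need the parent's instance ID. It is not Briefcase's actual code: the class `RepeatLinkSketch`, the method `repeatRows`, the column order, and the key format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not taken from the Briefcase code base.
class RepeatLinkSketch {
    /** Builds one CSV line per repeat entry: value, parent key, row key. */
    static List<String> repeatRows(Optional<String> parentInstanceId,
                                   String repeatName,
                                   List<String> values) {
        // Without a parent key there is nothing to link the repeat rows to,
        // so this sketch fails here (assumed behavior, mirroring the hypothesis).
        String parentKey = parentInstanceId.orElseThrow(
            () -> new IllegalStateException("No instance ID to link repeat rows to"));
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            // Assumed key format: <parent key>/<repeat name>[<1-based index>]
            String key = parentKey + "/" + repeatName + "[" + (i + 1) + "]";
            rows.add(String.join(",", values.get(i), parentKey, key));
        }
        return rows;
    }

    public static void main(String[] args) {
        repeatRows(Optional.of("uuid:abc"), "bird", List.of("robin", "wren"))
            .forEach(System.out::println);
    }
}
```

A form without repeats never reaches the `orElseThrow` step in this sketch, which would be consistent with the All widgets submission still exporting without its instanceId.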

@ggalmazor
Contributor

@kkrawczyk123, I've added the instanceID meta field to the Birds form here: Birds.zip

Could you try to reproduce the issue again with this form?

@chrissyhroberts

One of my users has reported unusual export behaviour in v1.16.3 and v1.17.
I have also seen this with the GUI, though it does not seem to be a problem with the CLI.

Trying to export to disk seems to halt unexpectedly at n% progress. The GUI does not hang, but the export button becomes enabled again and can be pressed a second time.
Starting the export again also halts, but not necessarily at the same place.

Screenshot attached.
This does not seem to be related to the overall number of submissions to export. The screenshot is from an export of around 90 submissions; I also tried with 92000 submissions.

Annotation 2019-09-27 095243

@chrissyhroberts

Here's some output from briefcase.log when this happened. It looks like a problem with encrypted forms.

2019-09-27 13:43:01,684 [briefcase-pull-4-thread-1] INFO  o.o.briefcase.util.ServerFetcher - already present - skipping fetch: uuid:d48ca856-c8cf-43ef-ae58-19842abb431c
2019-09-27 13:43:01,684 [briefcase-pull-4-thread-1] INFO  o.o.briefcase.util.ServerFetcher - already present - skipping fetch: uuid:fba57dd5-9bcc-4a29-b719-316042736493
2019-09-27 13:43:01,684 [briefcase-pull-4-thread-1] INFO  o.o.briefcase.util.ServerFetcher - already present - skipping fetch: 7GPF8ZMYSJ29516S2QPTKU09F
2019-09-27 13:43:08,485 [ForkJoinPool.commonPool-worker-3] INFO  o.j.core.services.locale.Localizer - getLocaleData finished in 1.659 ms
2019-09-27 13:43:08,500 [ForkJoinPool.commonPool-worker-5] INFO  o.j.core.services.locale.Localizer - getLocaleData finished in 7.894 ms
2019-09-27 13:43:08,500 [Thread-3] INFO  o.j.core.services.locale.Localizer - getLocaleData finished in 19.765 ms
2019-09-27 13:43:08,532 [ForkJoinPool.commonPool-worker-7] INFO  o.j.core.services.locale.Localizer - getLocaleData finished in 2.734 ms
2019-09-27 13:43:15,477 [ForkJoinPool.commonPool-worker-3] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:06.8856577
2019-09-27 13:43:17,055 [ForkJoinPool.commonPool-worker-7] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:08.4658752
2019-09-27 13:43:20,539 [ForkJoinPool.commonPool-worker-3] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:05.0523529
2019-09-27 13:43:22,505 [ForkJoinPool.commonPool-worker-7] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:05.4355617
2019-09-27 13:43:23,091 [Thread-3] ERROR o.o.briefcase.ui.export.ExportPanel - Error while exporting forms
org.opendatakit.briefcase.model.ParsingException: org.opendatakit.briefcase.model.ParsingException: Encrypted file not found
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:600)
	at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:678)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:737)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.opendatakit.briefcase.ui.export.ExportPanel.export(ExportPanel.java:184)
	at java.base/java.lang.Thread.run(Thread.java:835)
Caused by: org.opendatakit.briefcase.model.ParsingException: Encrypted file not found
	at org.opendatakit.briefcase.export.Submission.lambda$getEncryptedFilePath$8(Submission.java:220)
	at java.base/java.util.Optional.orElseThrow(Optional.java:408)
	at org.opendatakit.briefcase.export.Submission.getEncryptedFilePath(Submission.java:220)
	at org.opendatakit.briefcase.export.SubmissionParser.decrypt(SubmissionParser.java:185)
	at org.opendatakit.briefcase.export.SubmissionParser.lambda$parseSubmission$4(SubmissionParser.java:141)
	at java.base/java.util.Optional.flatMap(Optional.java:294)
	at org.opendatakit.briefcase.export.SubmissionParser.parseSubmission(SubmissionParser.java:121)
	at org.opendatakit.briefcase.export.ExportTools.lambda$getValidSubmissions$0(ExportTools.java:30)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.helpCC(ForkJoinPool.java:1115)
	at java.base/java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:1687)
	at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:411)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:575)
	at org.opendatakit.briefcase.export.ExportToCsv.export(ExportToCsv.java:106)
	at org.opendatakit.briefcase.export.ExportToCsv.export(ExportToCsv.java:58)
	at org.opendatakit.briefcase.ui.export.ExportPanel.lambda$export$8(ExportPanel.java:199)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:442)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
2019-09-27 13:43:56,922 [ForkJoinPool.commonPool-worker-5] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:03.7942598
2019-09-27 13:43:58,781 [ForkJoinPool.commonPool-worker-3] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:05.6369274
2019-09-27 13:43:59,484 [ForkJoinPool.commonPool-worker-5] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:02.5627599
2019-09-27 13:44:01,890 [ForkJoinPool.commonPool-worker-3] INFO  o.o.b.export.ExportProcessTracker - Exported in 00:00:03.1143354
2019-09-27 13:44:02,265 [Thread-4] ERROR o.o.briefcase.ui.export.ExportPanel - Error while exporting forms
org.opendatakit.briefcase.model.ParsingException: org.opendatakit.briefcase.model.ParsingException: Encrypted file not found
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:600)
	at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:678)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:737)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.opendatakit.briefcase.ui.export.ExportPanel.export(ExportPanel.java:184)
	at java.base/java.lang.Thread.run(Thread.java:835)
Caused by: org.opendatakit.briefcase.model.ParsingException: Encrypted file not found
	at org.opendatakit.briefcase.export.Submission.lambda$getEncryptedFilePath$8(Submission.java:220)
	at java.base/java.util.Optional.orElseThrow(Optional.java:408)
	at org.opendatakit.briefcase.export.Submission.getEncryptedFilePath(Submission.java:220)
	at org.opendatakit.briefcase.export.SubmissionParser.decrypt(SubmissionParser.java:185)
	at org.opendatakit.briefcase.export.SubmissionParser.lambda$parseSubmission$4(SubmissionParser.java:141)
	at java.base/java.util.Optional.flatMap(Optional.java:294)
	at org.opendatakit.briefcase.export.SubmissionParser.parseSubmission(SubmissionParser.java:121)
	at org.opendatakit.briefcase.export.ExportTools.lambda$getValidSubmissions$0(ExportTools.java:30)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.helpCC(ForkJoinPool.java:1115)
	at java.base/java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:1687)
	at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:411)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:575)
	at org.opendatakit.briefcase.export.ExportToCsv.export(ExportToCsv.java:106)
	at org.opendatakit.briefcase.export.ExportToCsv.export(ExportToCsv.java:58)
	at org.opendatakit.briefcase.ui.export.ExportPanel.lambda$export$8(ExportPanel.java:199)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:442)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

@chrissyhroberts

After further investigation, I think there could be a compatibility issue between modern versions of Briefcase and data collected with older versions of Aggregate.

For instance, I have an old Aggregate server still running (v1.4.15 Production).

If I export with Briefcase v1.17.0, I get a partial export that halts and an empty CSV file.

Screenshot 2019-10-21 11 20 47

If I roll back to Briefcase v1.9.0, I get a complete export with a CSV file.

Screenshot 2019-10-21 11 23 57

NB: in this case I got an error during the export (I'm not sure if it is relevant to the problem, so I've included it in case it is a clue).

I wonder if we need a compatibility grid between versions of Briefcase and Aggregate...

@ggalmazor
Contributor

ggalmazor commented Oct 22, 2019

My first reading of the details you've provided is that modern Briefcase will stall once it gets to a decryption error while processing submissions and that older Briefcase will continue with the export nevertheless.

I'll investigate this and we'll probably have to change things so that errored submissions are ignored without halting the export process.
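
As a rough illustration of the change described above, the following sketch processes each submission inside its own try/catch, so one failure is recorded and the loop moves on. It is not Briefcase's implementation: `TolerantExportSketch`, `Result`, and the `parser` function are hypothetical names, and the loop is sequential for simplicity, whereas the stack trace earlier in this thread shows a parallel stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: skip errored submissions instead of aborting the export.
class TolerantExportSketch {
    static final class Result {
        final List<String> exported = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    static Result export(List<String> submissions, Function<String, String> parser) {
        Result result = new Result();
        for (String submission : submissions) {
            try {
                // A parsing or decryption failure surfaces as a runtime exception
                result.exported.add(parser.apply(submission));
            } catch (RuntimeException e) {
                // Record the failure and continue with the next submission
                result.errors.add(submission + ": " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Function<String, String> parser = s -> {
            if (s.startsWith("bad")) {
                throw new IllegalStateException("Encrypted file not found");
            }
            return s.toUpperCase();
        };
        Result result = export(List.of("first", "bad-one", "third"), parser);
        System.out.println("Exported: " + result.exported);
        System.out.println("Errors: " + result.errors);
    }
}
```

Keeping the errors in a separate list, rather than only logging them, leaves room to report them to the user once the export finishes.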

@chrissyhroberts

That's interesting. Presumably, if a submission hits one of these pad block corrupted errors, it never gets into the CSV (assuming Briefcase skips it and moves on with the export).

So this means that some data that have been submitted will never find their way into the CSV files, and the warning that this error happened is lost somewhere in the Briefcase log.

Is there scope to output new logs such as pull.errors.log and export.errors.log to capture these events?

Presumably errors like this are caused by badly formed submissions and can't be fixed. As such, it would be helpful to output the meta.instance.id values of these submissions to the CSV file. We would want to know if a submission had not been successfully pulled, exported, or decrypted. As the UUID is not encrypted, it would be feasible to capture it as a line of CSV that is otherwise empty. A reason for the error would also be a welcome bonus.

i.e. data would look like this

| start | end | v1 | v2 | v3 | meta.instance.id | errors |
| --- | --- | --- | --- | --- | --- | --- |
| 2019-10-01 1203 | 2019-10-01 1205 | YES | CHEESE | MALE | uuid129fc08f-9fa8-4001-ae81-848a672a3a4b | |
| 2019-10-01 1207 | 2019-10-01 1215 | NO | BEEF | FEMALE | uuid5532a28b-07a4-42fd-a265-c5c9f895b0bc | |
| | | | | | uuid5439e7fb-a2de-4b8a-928b-08684601530a | BAD_PADDING |
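
A minimal sketch of how such an otherwise-empty error row could be produced, assuming a comma-separated layout with the error reason in the last column. The class `ErrorRowSketch` and its parameters are hypothetical and do not describe an existing Briefcase feature.

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed "errors" column.
class ErrorRowSketch {
    /**
     * Builds a CSV line that is empty except for the instance ID
     * and the error reason, which goes in the last column.
     */
    static String errorRow(int columnCount, int instanceIdIndex,
                           String instanceId, String error) {
        String[] cells = new String[columnCount];
        Arrays.fill(cells, "");
        cells[instanceIdIndex] = instanceId;
        cells[columnCount - 1] = error;
        return String.join(",", cells);
    }

    public static void main(String[] args) {
        // Seven columns: start, end, v1, v2, v3, meta.instance.id, errors
        System.out.println(errorRow(
            7, 5, "uuid5439e7fb-a2de-4b8a-928b-08684601530a", "BAD_PADDING"));
    }
}
```

The sketch does no CSV escaping, so it only works when the ID and the error code contain no commas or quotes.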

@ggalmazor
Contributor

ggalmazor commented Oct 23, 2019

> So this means that some data that have been submitted will never find their way in to the CSV files; but the warning that this error happened is lost somewhere in the briefcase log.

I agree with this assessment.

Briefcase already creates a (form name) - errors directory inside the export directory, where it copies source submission files that have a parsing error or are found to be invalid. However, we don't currently include submissions with decryption issues.

We can provide a quick solution to this problem by also copying those submissions there, so that you can process them accordingly. It's not the CSV you're asking for, but it can get you there.

The way this works is that you would find a log message telling you something like: A submission has been excluded from the export output due to some problem: Missing encrypted submission file. The errored submission has been copied to /some/path/form name - errors/parse_error_submission_N.xml. If you didn't expect this, please ask for support at https://forum.opendatakit.org/c/support

  • The log message includes the root cause of the error. In this example, it's Missing encrypted submission file
  • The log message also includes the full path to the errored source submission XML file. In this example, it's /some/path/form name - errors/failed_submission_N.xml, which is inside a subdirectory called (form name) - errors within the selected export directory.
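
The mechanism described above could be sketched roughly as follows. This is an illustration only: `ErrorsDirSketch`, `copyToErrorsDir`, and `logMessage` are hypothetical names, and the `failed_submission_N.xml` file name is one of the two variants mentioned in this comment, chosen arbitrarily.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: copy an errored submission next to the export output.
class ErrorsDirSketch {
    /** Copies the submission into "<form name> - errors" inside the export directory. */
    static Path copyToErrorsDir(Path submissionFile, Path exportDir,
                                String formName, int n) throws IOException {
        Path errorsDir = exportDir.resolve(formName + " - errors");
        Files.createDirectories(errorsDir);
        // File name pattern assumed for illustration
        Path target = errorsDir.resolve("failed_submission_" + n + ".xml");
        return Files.copy(submissionFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Builds a log message with the root cause and the path of the copied file. */
    static String logMessage(String cause, Path copy) {
        return "A submission has been excluded from the export output due to some problem: "
            + cause + ". The errored submission has been copied to " + copy;
    }

    public static void main(String[] args) throws IOException {
        Path exportDir = Files.createTempDirectory("export");
        Path submission = Files.createTempFile("submission", ".xml");
        Path copy = copyToErrorsDir(submission, exportDir, "Birds", 1);
        System.out.println(logMessage("Missing encrypted submission file", copy));
    }
}
```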

@ggalmazor ggalmazor self-assigned this Oct 23, 2019
@getodk-bot
Member

getodk-bot commented Nov 4, 2019

Hello @ggalmazor, you have been unassigned from this issue because you have not updated this issue or any referenced pull requests for over 15 days.

You can reclaim this issue or claim any other issue by commenting @opendatakit-bot claim on that issue.

Thanks for your contributions, and hope to see you again soon!
