Add flag to ignore attachments when downloading submissions csv #150
Comments
Thanks for writing this up, @ghostfreak3000! Did you consider using the OData access? If so, what made you favor the CSV? Did you see that the OData data document provides a straightforward JSON representation? Does your form definition have repeats? We'll definitely consider your specific request for CSV-only access, but I do want to make sure you've considered the JSON representation. Note also that the OData access allows for paging. It won't significantly improve the performance of the export itself, but depending on your connection speed it could still help not to download 80 MB of data each time.
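To make the paging suggestion concrete, here is a minimal Python sketch of pulling submissions page by page rather than in one large download. It assumes the OData feed accepts `$skip` and `$top` query parameters and returns rows under a `value` key; the URL layout, the bearer-token header, and the helper names `iter_submissions` and `make_http_fetcher` are illustrative assumptions, not something confirmed in this thread.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

# A page fetcher takes (skip, top) and returns the parsed JSON document
# for that page, assumed to look like {"value": [...]}.
PageFetcher = Callable[[int, int], Dict]


def iter_submissions(fetch_page: PageFetcher, page_size: int = 500) -> Iterator[Dict]:
    """Yield submissions one at a time, requesting one page per round trip."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    skip = 0
    while True:
        rows = fetch_page(skip, page_size).get("value", [])
        yield from rows
        # A short (or empty) page means there is nothing left to fetch.
        if len(rows) < page_size:
            break
        skip += page_size


def make_http_fetcher(base_url: str, project_id: int, form_id: str, token: str) -> PageFetcher:
    """Build a fetcher for an assumed OData URL layout (hypothetical, adjust to your server)."""
    def fetch(skip: int, top: int) -> Dict:
        query = urllib.parse.urlencode({"$skip": skip, "$top": top}, safe="$")
        form = urllib.parse.quote(form_id, safe="")
        url = f"{base_url}/v1/projects/{project_id}/forms/{form}.svc/Submissions?{query}"
        request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch
```

Keeping the HTTP call behind a small fetcher function means the paging loop can be exercised with a fake fetcher before it is pointed at a real server.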
Hi @lognaturel Context: The particular project we were working on had a tight time constraint (about 2 days), and the "sync ODK to MongoDB" task was just one small milestone of many. Answer: A: Did we consider OData? Yes, we considered OData because the submissions endpoint did not support filtering of data (we wanted to sync only the data for the day the sync ran), but it was dropped for three reasons;
so point 3) effectively negated the main reason for considering OData. B: Does your form definition have repeats? I wouldn't know; I wasn't part of the team that handles form definitions. What I do know is that we were supposed to sync all non-multimedia data (projects, forms, etc.) from ODK to MongoDB regardless of whether more were added ( Another source of headache considering the AOB: I did tell @yanokwa that I wouldn't mind sending in a PR, but I'm not sure when I would have time to do it, so it was decided that the issue, with all its context, would be documented for future reference (in case someone else wants to do it).
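For the "only sync the data for the day the sync ran" requirement, one workaround when the server cannot filter is to filter client-side after download. The sketch below assumes each JSON record carries an ISO 8601 timestamp under `__system.submissionDate`; that field name and the helper `submitted_on` are assumptions for illustration, so check them against your actual payload.

```python
from datetime import date, datetime
from typing import Dict, Iterable, List


def submitted_on(rows: Iterable[Dict], day: date) -> List[Dict]:
    """Keep only the rows whose submission timestamp falls on the given day."""
    kept = []
    for row in rows:
        stamp = row.get("__system", {}).get("submissionDate")
        if not stamp:
            # Skip records that have no timestamp instead of guessing.
            continue
        # Older Python versions reject a trailing "Z", so normalise it first.
        parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if parsed.date() == day:
            kept.append(row)
    return kept
```

This does not reduce the download size, so it only helps with the volume written to MongoDB on each run. The comparison uses the date as written in the timestamp, with no time zone conversion.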
A1. is indeed quite involved. If R is an option at all, the R package ruODK can list projects, forms, form tables, and programmatically download all submissions (with or without media attachments, or with skip logic to only download new or force-download all media attachments) and parse each data type into native R objects. From there, you can write the data to CSV files ready for mongoimport.
We're still finalizing criteria for v1.1 but are likely to add a CSV-only endpoint for form definitions without repeats.
@florianm Never had a reason to try out R. Will definitely give it a try.
Great to hear!
We have just released v1.1, which makes two changes to the API along these lines:
Use case:
Recently we had a requirement to sync form data (minus multimedia) from our ODK instance to one of our MongoDB clusters.
The form data made up ~80 MB of a 2 GB download (not sure of the exact ratio, but roughly 1:25). Most of the download trouble was a result of the large size (and it's still growing: 6 GB as of this writing), so an option to ignore attachments (maybe via a query param) would be nice.
As a side note, because this sync runs every 6 hours (with a standing requirement for it to run every hour), it brought our modest server (2 CPUs, 4 GB RAM) to its knees, and we had to provision a larger one (about 2x larger). I'm thinking that ignoring the attachments might let us scale back the server (considering it only runs ODK and nothing else), but that's not the main concern.