Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support repeats in Google Sheets #1851

Merged
merged 57 commits into from Mar 20, 2018
Merged

Conversation

grzesiek2010
Copy link
Member

Closes #1385

What has been done to verify that this works as intended?

I tested many different forms:
A form with one repeat,
A form with multiple repeats,
Media files in repeats,
Empty answers etc.

Why is this the best possible solution? Were any other approaches considered?

It works in a similar way like Briefcase and Aggregate - repeats are stored in a separate sheet.

Are there any risks to merging this code? If so, what are they?

Of course, uploading to Google sheets should be tested carefully.

Do we need any specific form for testing your changes? If so, please attach one.

Any form with a repeat section.

@shobhitagarwal1612
Copy link
Contributor

shobhitagarwal1612 commented Feb 13, 2018

Form used for testing : Forest Plot Survey

  • 4 form instances were uploaded with repeats.
    • 1st form - 3 repeats
    • 2nd form - 3 repeats (same as 1st)
    • 3rd form - 2 repeats (with blank data)
    • 4th form - 1 repeat

Observations :

  • For each row in the non-repeat group, the column with the start of repeat is appearing twice
    (1st image)
  • As you can see in the image attached below, for each submission the repeats are uploaded twice. The number of rows with data in the second image should only be 9 but they are actually 18.
    (2nd image)
  • How will one be able to recognize the repeats corresponding to a particular row after it is uploaded? For example, if I had not mentioned above the number of repeats for each submission then there was no way to find that out by looking at the sheet.

image

image

@grzesiek2010
Copy link
Member Author

@shobhitagarwal1612 I fixed the issue.

@shobhitagarwal1612
Copy link
Contributor

shobhitagarwal1612 commented Feb 15, 2018

For each row in the non-repeat group, the column with the start of repeat is appearing twice
(1st image)
As you can see in the image attached below, for each submission the repeats are uploaded twice. The number of rows with data in the second image should only be 9 but they are actually 18.
(2nd image)

I can verify that both of the above issue are fixed.

How will one be able to recognize the repeats corresponding to a particular row after it is uploaded? For example, if I had not mentioned above the number of repeats for each submission then there was no way to find that out by looking at the sheet.

Can we link the cell from the first sheet to the second sheet's cell? In that way we will know that which repeat belongs to which row in the first sheet.
Something like SheetName!CellAddress (eg Sheet1!B5)

@lognaturel
Copy link
Member

lognaturel commented Feb 16, 2018

Can we link the cell from the first sheet to the second sheet's cell?

This is really important. I think we should match what Briefcase does -- include the instanceId in the parent sheet and a corresponding parentId in the repeat sheets. I believe this always works with Briefcase because if a form doesn't include an instanceId, Aggregate generates one. In this case, perhaps a form with repeats and no instanceId should be rejected.

I believe what is going on now is that the first question is being used as a link, is that right, @grzesiek2010? That will work in most cases if instanceId is part of the form design but will fail in cases like the forest plot form where no instanceId is specified.

@grzesiek2010
Copy link
Member Author

grzesiek2010 commented Feb 16, 2018

I believe what is going on now is that the first question is being used as a link, is that right

yes and I agree that it's important but Forest Plot Survey is a specific form since it doesn't contain instanceID. I noticed that creating forms using xls or ODK Build an instanceID is always there.

Something like SheetName!CellAddress (eg Sheet1!B5)

I think it might be tricky because someone might delete something manually and then such a link might be wrong and another thing is that Aggregate and Briefcase use instanceId for that so I think my current solution is ok. Maybe we should just check if an instanceId exists and don't send such a form if not.

@shobhitagarwal1612
Copy link
Contributor

I think it might be tricky because someone might delete something manually and then such a link might be wrong and another thing is that Aggregate and Briefcase use instanceId for that so I think my current solution is ok. Maybe we should just check if an instanceId exists and don't send such a form if not.

Agreed 👍

@grzesiek2010
Copy link
Member Author

@lognaturel so what should I do? Leve it as it is or add this?:

Maybe we should just check if an instanceId exists and don't send such a form if not.

@lognaturel
Copy link
Member

lognaturel commented Feb 19, 2018

I noticed that creating forms using xls or ODK Build an instanceID is always there.

Correct. XLSForm puts it at the end of the form and in general, there is no guarantee about the location of instanceID in the form. You should explicitly use the instanceID question as the link no matter where in the form it's located. Forms without an instanceID should not be allowed for sheets.

I have tried two forms that fail to upload the main form body. Some of the repeats are uploaded but not all. groups-repeats.xml.txt fails with "No data found" or "Sorry, no form was uploaded" -- I wasn't able to quickly tell what explains why it's not a consistent error. All student names repeats were uploaded but not all student grades. The other form which failed I unfortunately can't share but the failure mode seems to be the same. I can see that there's an ArrayOutOfBounds exception thrown at answers.put(columnTitles.get(j), Iterables.get(answersToUpload.get(columnTitles.get(j)), j == 0 ? 0 : i)) but I haven't yet had a chance to look beyond that. The stacktrace gets swallowed by the Exception catch-all which is a problem. In general, exceptions should be handled as locally as they possibly can. The new structure that uses Exceptions for control flow violates that.

At a high-level, I recommend spending some time looking for opportunities to make the code more readable and maintainable. That line above is a great example where it's unclear what is going on. A combination of additional comments and breaking 7b17a46 with commit messages that explain some of the decisions would also help a lot. For example, is using a guava Multimap really the best option or could you make the code easier to follow by keeping your own lists of maps? It should be immediately clear from looking at the code that the first question (soon to be changed to instanceID) is used to link the main form to repeats. It's not necessarily clear what elements.add(0, mainLevelColumnElements.get(0)); is for. Similarly, fixBlankColumnNames really should have a comment linking to the issue that it's addressing since now that it's moved it'll be harder to get to the commit that added it.

There are a few methods that take in blank collections and then populate them that should instead return (https://books.google.com/books?id=_i6bDeoCQzsC&pg=PA45&lpg=PA45&dq=Output+arguments+should+be+avoided&source=bl&ots=ep3SFo9d46&sig=GkBX61M8oLwgh6wrxx0BBtgKqU0&hl=en&sa=X&ved=0ahUKEwjz2JLo3q3ZAhUOSK0KHZ_TAkQ4ChDoAQgoMAE#v=onepage&q=Output%20arguments%20should%20be%20avoided&f=false). A clear example is uploadMedia -- it should not pass have a parameter for uploadedMedia but should build the collection and return it. There are very short methods like addMediaFiles that make things harder to follow and could simply stay inline. There are some others that are a little less clear what to do with because currently they populate multiple collections. You can see if you come up with clearer ways of writing them and I will take another look as well.

Breaking uploadOneSubmission down into methods has brought up the challenge of how to propagate errors. I see you first tried to use boolean returns and then ended up using generic Exception throwing. I think this is a structure that should generally be avoided -- http://wiki.c2.com/?DontUseExceptionsForFlowControl. It also causes problems when there are unexpected exceptions like the ArrayOutOfBounds exception I described above.

@grzesiek2010
Copy link
Member Author

@dcbriccetti could you review the code?

I changed almost everything in comparison with the old version of InstanceGoogleSheetsUploader class.
I started with code refactoring since the method responsible for uploading submissions had 400 lines.
Then I implemented changes for uploading repeat groups.
After that, I noticed that the way we use for reading column names and answers (parsing XML file) was really difficult to maintain and read (especially with changes for repeats), so I refactored that functionality.

In the end, I investigated everything again and I noticed that some functions don't work and some are not needed:

Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of really good improvements here. I've provided some more suggestions in line. In particular, it would be great to reduce the amount of state kept to keep things easier to reason about.

There are still a number of methods that throw exceptions and a catch (Exception e) catch-all in uploadOneInstance. @dcbriccetti do you have any best practices to share in this case? The scenario is that a task has multiple subtasks and the failure of any one of them should result in the failure of the overall task. Uploading one instance entails making sure the spreadsheet is set up to receive that instance, gathering all the relevant answers, writing those answers to the spreadsheet, etc. A lot of the calls involved can throw exceptions. Remove the throws Exception in insertRow to see that basically every single line can throw an exception. We don't particularly care about those exceptions and just want to stop (though in an ideal world we would clean up the partial submission).

The original solution included giant methods with lots of try-catches.

}

public boolean isAuthFailed() {
return authFailed;
}

public void setAuthFailed() {
public void setAuthFailedForFalse() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setAuthFailedToFalse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or unsetAuthFailed or resetAuthFailed or setAuthFailed(boolean authFailed)

@@ -199,12 +199,19 @@ private boolean uploadOneSubmission(String id, File instanceFile, String jrFormI
FormDef formDefFromXml = XFormUtils.getFormFromInputStream(new FileInputStream(new File(formFilePath)));

List<TreeElement> mainLevelColumnElements = getColumnElements(formDefFromXml.getMainInstance().getRoot());
TreeElement instanceIDColumn = getInstanceIDColumn(mainLevelColumnElements);
Copy link
Member

@lognaturel lognaturel Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get this from the database to avoid doing a full pass through the form? It's a small optimization but could make a difference when a form has thousands of questions255 questions.

@@ -113,23 +112,20 @@ public static boolean isLocationValid(String answer) {
.matches();
}

private String getGoogleSheetsUrl(Cursor cursor) {
private String getSpreadsheetUrl(Cursor cursor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getGoogleSeetsUrl seems more accurate!

if (!uploadOneSubmission(id, instance, jrformid, token, formFilePath, urlString)) {
cv.put(InstanceColumns.STATUS,
InstanceProviderAPI.STATUS_SUBMISSION_FAILED);
if (token == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems token doesn't change in the method so this check should come before any other work is done.

TreeElement instanceIDElement = getInstanceIDElement(getChildElements(instanceElement));
if (hasRepeatableGroups(instanceElement)) {
if (instanceIDElement == null) {
outcome.results.put(id, "This form contains repeatable group so it should contain an instanceID!");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will show up in the UI so it should be translated. More concise version: "Forms with repeats must define an instanceID."

private String googleSheetsUrl = "";

private String mainSheetTitle;
private String googleSheetsUrl;
private String spreadsheetId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should always be able to get this from a Spreadsheet object.


private void insertRow(List<Object> rowElements, String sheetName) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this method is helpful. If a method name is overloaded, it should be to have different variants with different parameter lists. Here, the two insertRow methods are not related in this way. Making the helper call and catching the exception in the primary insertRow would be clearer.

} else {
String answer = element.getValue() != null ? element.getValue().getDisplayText() : "";
if (new File(instanceFile.getParentFile() + "/" + answer).isFile()) {
answers.put(elementTitle, uploadMediaFile(instanceFile, answer));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to miss that the attachment is actually uploaded here. Clearer:

String mediaHyperlink = uploadMediaFile(instanceFile, answer);
answers.put(elementTitle, mediaHyperlink);

private String spreadsheetId;
private String id;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id is only used for the outcome text. Could jrFormId be used in the outcome instead so that this field isn't needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve eliminated it in my proposed changes.

private Outcome outcome;
private boolean hasWritePermissionToSheet = false;

private boolean hasWritePermissionToSheet;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this field actually useful? It's only ever set to true so I think that means even if you're uploading a bunch of different forms to different sheets, as soon as you hit the first one you have write access to the field is true. I think it can be removed.

@dcbriccetti
Copy link
Contributor

dcbriccetti commented Mar 12, 2018

@dcbriccetti could you review the code?

Sorry, I missed this way back when. One comment right away is that many of the commits could be combined, and that would help reviewers.

@dcbriccetti
Copy link
Contributor

dcbriccetti commented Mar 12, 2018

Here are the changes I recommend, to simplify the error handling:

https://github.com/dcbriccetti/collect/commit/fd36388ef0ffcdb91c093ff1003b413da4d1fd5e

@grzesiek2010
Copy link
Member Author

@lognaturel @dcbriccetti I implemented all improvements.

@lognaturel
Copy link
Member

Thanks for getting those changes in so quickly, @grzesiek2010! @dcbriccetti to summarize, you're not against using exceptions for control flow in a situation like this but introducing a custom exception means that at least we can be sure that it's an "expected" exception. Did I interpret your suggestions right?

I think this is now clear enough that we can move to QA. @dcbriccetti?

@dcbriccetti
Copy link
Contributor

Sure, we can move to QA.

If this were in Scala (or possibly with more modern Java features available), I would probably not recommend the exceptions. (Scala has Either which allows a function to return either a normal value or an object that contains details about the error.)

What I suggested creates a single exception
/** An exception that results in the cancellation of an instance upload, and the presentation of an error to the user */
that encapsulates either a message string or an exception (cause).
It keeps the IOException and other specific exceptions down in the lower levels.

Here’s my commit comment, since it didn’t survive:

Improve InstanceGoogleSheetsUploader error handling

Rename Outcome.results to messagesByInstanceId
Eliminate fields outcome and id
Have lower level methods not modify outcome, but rather return errors via a new exception, UploadException
Remove some duplicate code
In-line resizeSheet
Rename and change isAnyMissingColumn and isNumberOfColumnsValid to throw UploadException

if (!sheetCols.contains(col)) {
missingColumns.add(col);
}
// This method builds a column name by joining all of the containing group names using "-" as a separator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well enjoy the benefits of having this be a doc comment. Pop-up help in the IDE, for instance.
`/** This method builds a column name by joining all of the containing group names using "-" as a separator */
(Please notice the removed extra space.)


// check if root folder exists, if not then create one
driveHelper.getIDOfFolderWithName(GOOGLE_DRIVE_ROOT_FOLDER, null, true);
private String getGoogleSeetsUrl(Cursor cursor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sheets, not Seets

@@ -167,7 +167,7 @@ protected void onDestroy() {

@Override
public void uploadingComplete(HashMap<String, String> result) {
Timber.i("uploadingComplete: Processing results (%d) from upload of %d instances!",
Timber.i("uploadingComplete: Processing messagesByInstanceId (%d) from upload of %d instances!",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a search and replace error.

@mmarciniak90
Copy link
Contributor

Regression has been found!

I checked this case on v1.13.2. Problem was not visible.

Steps to reproduce:

  1. Start form with video - one video.xml.txt
  2. Add video
  3. Send form to Google Sheets (I added a link to sheets to Fallback submission URL)

Current behavior:

  • Error appears

but the file exists in directory

@grzesiek2010
Copy link
Member Author

@mmarciniak90 I fixed the problem and you can continue testing once you update your branch.

selectionArgs, null);
if (cursor != null && cursor.getCount() != 1) {
cursor.close();
if (!new File(filePath).exists()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help reviewers, would you please explain this change and how it fixes the problem? Thanks.

@mmarciniak90
Copy link
Contributor

Tested with success!

Verified on Android: 4.1, 4.2, 5.1, 6.0, 7.0, 8.0

Verified cases:

  • few looped repeat groups
  • one repeat group
  • form with photos, vidoes and audio files in repeat group
  • empty answer
  • empty repeat group
  • form with images in multi select widget
  • regression on aggregate
  • removing groups
  • Google Drive permissions:

@opendatakit-bot unlabel "needs testing"
@opendatakit-bot label "behavior verified"

@dcbriccetti dcbriccetti merged commit 261c9e8 into getodk:master Mar 20, 2018
@lognaturel
Copy link
Member

👏 👏 👏

lakshyagupta21 added a commit to lakshyagupta21/collect that referenced this pull request Mar 21, 2018
shobhitagarwal1612 pushed a commit to shobhitagarwal1612/collect that referenced this pull request May 15, 2018
* Extracted areHeadersEmpty() method

* Extracted areColumnNamesLegal() method

* Extracted areWritePermissionsGranted() method

* Extracted method uploadMedia()

* Extracted resizeSpreadSheet() method

* Extracted addHeaders() method

* Extracted isColumnLengthValid() method

* Extracted readColumnNames method()

* Aded instanceFile as a method parameter

* Extracted readAnswers() method

* Extracted areSubmissionColumnNamesLegal() method

* Extracted insertRow method()

* Extracted areEmptyColumns() method

* Extracted handleBlankColumnNames() method

* Extracted addMissingColumns() method

* Extracted checkForMissingColumns() method

* Extracted addPhotos() method

* Extracted updateValues() method

* Extracted getSheetCols() method

* Extracted prepareListOfValues() method

* Extracted getRowFromList() method

* Use insertRow() method in other places too

* Extracted sleepThread() method

* Removed unnecesarry comments

* Reduced the code using exceptions

* Fixed bug in updateValues() method

* Removed unnecessary areHeadersEmpty() method

* Removed unnecessary code

* Improved naming

* Moved token check up

* Improved comments

* Implemented addSheet() method

* Support repeat groups

* Code improvements

* Fixed token check

* Removed unnecessary validation

* Fixed problem with empty groups

* Fixed problem with doubled submissions

* Fixed problem with empty answers

* Fixed naming

* Fixed lints

* Fixed bug with duplicated repeat groups

* Check for instanceID

* Support nested repeat groups

* Removed usning Exceptions for flow control

* Code improvements

* Removed unnecessary check - Google Sheets allow adding other chars too

* Code improvements

* Improved previous solution - read answers from treeElements insead of parsing submission file

* Ignore empty rows

* Code improvements

* Fixed max columns limit - it's 256 not 255

* Fixing blank headers didn't work

* Ignore template elements

* Code improvements

* Code improvements

* Fixed problem with sending media files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants