Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

409 if single submission has two attachments with identical content #299

Closed
matthew-white opened this issue Oct 14, 2020 · 5 comments
Closed

Comments

@matthew-white
Copy link
Member

matthew-white commented Oct 14, 2020

To reproduce:

  1. Upload a form that contains two image fields.
  2. Fill the form. Select the same file for the two image fields.
  3. When I do this on Enketo, I receive a 409:
<OpenRosaResponse xmlns="http://openrosa.org/http/response" items="0">
  <message nature="error">A resource already exists with sha value(s) of 24f3b9b5a653c052f7d366fe90cdace088d6e3c4.</message>
</OpenRosaResponse>

I am able to modify an existing test to generate a failing case:

 it('should save given attachments', testService((service) =>
   service.login('alice', (asAlice) =>
     asAlice.post('/v1/projects/1/forms?publish=true')
       .set('Content-Type', 'application/xml')
       .send(testData.forms.binaryType)
       .expect(200)
       .then(() => asAlice.post('/v1/projects/1/submission')
         .set('X-OpenRosa-Version', '1.0')
         .attach('my_file1.mp4', Buffer.from('this is test file one'), { filename: 'my_file1.mp4' })
         .attach('xml_submission_file', Buffer.from(testData.instances.binaryType.both), { filename: 'data.xml' })
-        .attach('here_is_file2.jpg', Buffer.from('this is test file two'), { filename: 'here_is_file2.jpg' })
-        .expect(201)
+        .attach('here_is_file2.jpg', Buffer.from('this is test file one'), { filename: 'here_is_file2.jpg' })
+        .expect(201)))));

Note that the issue seems to stem from files with identical content, not files with identical filenames (for which I think there is already a test).

I have a theory that the issue is related to blobs.ensure():

ensure: (blob) => ({ db }) =>
db.select('id').from('blobs').where({ sha: blob.sha })
.then(maybeFirst)
.then((result) => result
.map(({ id }) => id)
.orElseGet(() => db.insert(blob).into('blobs').returning('id').then(([ id ]) => id)))

blobs.ensure() checks whether a blob exists, then inserts it if it does not (SELECT followed by INSERT). However, when a submission is created, its attachments are inserted into the database in parallel. I think that results in a race condition whereby the checks may happen for two attachments with identical content before either blob is inserted (instead of SELECT, INSERT, SELECT, INSERT, the order is SELECT, SELECT, INSERT, INSERT). When I add logging to blobs.ensure(), I think I see that behavior when I run the test above.

Just an idea, maybe it'd be possible to solve this using an INSERT ON CONFLICT clause?

I think this issue is part of what was happening in this forum topic:

https://forum.getodk.org/t/in-odk-central-0-8-some-submissions-the-attachmentspresent-is-different-to-attachmentsexpected-how-avoid-this/26359

It also came up in this forum topic:

https://forum.getodk.org/t/testing-form-on-central-upload-error-a-resource-already-exists/30947

@issa-tseng
Copy link
Member

the issue w insert on conflict is that you have to ship all the bits whether it ends up creating the data or no. and the way openrosa works i feel like duplicate data is maybe likely. adding for update didn't do anything though :/

@lognaturel
Copy link
Member

lognaturel commented Nov 17, 2020

We’re looking at a likely fix for the next Collect release. Since Collect always renames media files we’ll check file hashes and use the actual same file with the same name in multiple places if the user selects the same contents.

However, this limitation is definitely not part of the spec and Enketo or other clients are likely to send the same file with different file names. Maybe the file name can be considered in the constraint?

duplicate data is maybe likely.

I don’t really understand why the constraint is there in the first place, really. Would be good to talk through what this means.

@issa-tseng
Copy link
Member

issa-tseng commented Nov 17, 2020 via email

@lognaturel
Copy link
Member

This does not appear to be fixed in v1.1.1. I have reproduced with https://test.central.getodk.org/#/projects/269/forms/multiple_background/draft/testing and form instance Multiple background recordings_2021-02-08_12-57-27.zip

@lognaturel
Copy link
Member

I think what I'm getting is a slightly different issue which @matthew-white mentioned in his original description. I'll open a separate issue instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants