Integration test about merging shares randomly fails #25830

Closed
PVince81 opened this Issue Aug 17, 2016 · 15 comments

Projects

None yet

2 participants

@PVince81
Collaborator

Seen 4-5 times on different PRs, randomly:

sharing.Merging shares for recipient when shared from outside with user then group and recipient renames in between
Then as "user1" gets properties of folder "merge-test-outside-groups-renamebeforesecondshare-renamed" with:

HTTP error: 404

I have no clue why this fails. Maybe the test order changes sometimes ?
From what I saw when developing these tests, some things do not get cleant up between tests. This is the reason why I chose to use different folder names unique to the tests to make sure they were not shared by other tests.

CC @SergioBertolinSG

@PVince81 PVince81 added this to the 9.1.2 milestone Aug 17, 2016
@SergioBertolinSG
Member

Have you observed this only in stable9.1 or also in master?

@PVince81
Collaborator

The last I observed was on master here: #25639

Not sure about the previous occurrences

@SergioBertolinSG
Member

Running the scenario alone it never fails, also I haven't seen it failing while running only the sharing-v1.feature. I guess it fails only when running the whole suite.

@PVince81
Collaborator

Yeah... and since it randomly fails only on CI, I suspect that in some of the CI nodes maybe the order of the tests is different.

@SergioBertolinSG
Member

No, the order is always the same. Alphabetical order for features and line number for scenarios.

@SergioBertolinSG
Member

From what I saw when developing these tests, some things do not get cleant up between tests.

Do you remember what doesn't get clean after tests? Users are deleted, groups as well.

@PVince81
Collaborator

I think it was the shares that didn't get deleted. Maybe when the user got recreated, the filecache was still there and was reused, which would explain why the shares still existed.
The shares would only be cleared properly after a thorough cron run.

@PVince81 PVince81 modified the milestone: 9.1.3, 9.1.2 Oct 20, 2016
@PVince81 PVince81 referenced this issue Nov 9, 2016
Merged

share api expanded by tags #26583

6 of 9 tasks complete
@PVince81
Collaborator

GRRRRR and it's still there

@PVince81 PVince81 modified the milestone: 9.1.4, 9.1.3 Nov 30, 2016
@PVince81
Collaborator
PVince81 commented Dec 5, 2016

Ok, here's an idea: next time it fails somewhere, let's copy the log and compare it with a successful build. Maybe there will be clues about the test running order which we could somehow enforce locally to get the same results.

@PVince81
Collaborator
PVince81 commented Dec 6, 2016

Here we go, captured results from a failed run: https://s3.owncloud.com/owncloud/index.php/s/VFIlf7xcUlydWFz

@PVince81
Collaborator
PVince81 commented Dec 6, 2016

And here from a succesful run: https://s3.owncloud.com/owncloud/index.php/s/Pcse0rR48r6MVX2

Let's see if we can find anything about test order...

@PVince81
Collaborator
PVince81 commented Dec 6, 2016

So I post-processed the logs a bit and removed timestamps and unified the test URL due to it containing the PR number + a hash.

Here are both processed logs: https://s3.owncloud.com/owncloud/index.php/s/huNPY2MItMoGr2Y
Try diffing, for example: vimdiff jenkins-success.txt jenkins-fail.txt

I diffed them and so far what I observed:

  1. the test order is exactly the same in both
  2. for some queries the HTTP log entry appears before or after the behat entry, shouldn't matter though
  3. nothing suspicious before the merge test unfortunately
  4. after the failed merge test, the share ids are shifted of one. This is possibly because a share is missing or wasn't created properly. I wonder if it's the one share that might make that test here fail.

Now if we think race conditions, what if there is a OCS Share call that already sends a success response before it finished committing the entries in the database ? This means that the PHP request would automatically commit when reaching the end of the code. But if between that moment and now behat already sent the next request, the DB result will likely not be the same as it wasn't committed yet.

This is only a theory. So far I don't see what other kinds of race condition there could be, especially here with this very linear testing...

@PVince81
Collaborator
PVince81 commented Dec 6, 2016

Success:

    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed" # FeatureContext::userMovedFile()
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"                                      # FeatureContext::assureFileIsSharedWithGroup()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
 [207]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
    Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with                                        # FeatureContext::asGetsPropertiesOfFolderWith()
      | {http://owncloud.org/ns}permissions |
    And the single response should contain a property "{http://owncloud.org/ns}permissions" with value "SRDNVCK"                                       # FeatureContext::theSingleResponseShouldContainAPropertyWithValue()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare
 [404]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare
    And as "user1" the folder "/merge-test-outside-groups-renamebeforesecondshare" does not exist                                                      # FeatureContext::asTheFileOrFolderDoesNotExist()

vs failure:

    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed" # FeatureContext::userMovedFile()
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"                                      # FeatureContext::assureFileIsSharedWithGroup()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
 [404]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
    Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with                                        # FeatureContext::asGetsPropertiesOfFolderWith()
      | {http://owncloud.org/ns}permissions |
      HTTP error: 404 (Sabre\DAV\Exception)
    And the single response should contain a property "{http://owncloud.org/ns}permissions" with value "SRDNVCK"                                       # FeatureContext::theSingleResponseShouldContainAPropertyWithValue()
    And as "user1" the folder "/merge-test-outside-groups-renamebeforesecondshare" does not exist                                                      # FeatureContext::asTheFileOrFolderDoesNotExist()

There is something weird happening here:

1.    When folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with user "user1"
2.    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed"
3.    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"

At step 1 user0 shares folder X with user1
At step 2 user1 renamed folder X to Y
At step 3 user0 shares folder X with group1, but here suddenly folder X doesn't exist any more ?! (404)

Now renaming a folder as recipient shouldn't in any way affect the source folder. It should only update the "file_target" in the database.

@PVince81
Collaborator
PVince81 commented Dec 6, 2016

Hmm no, I think I read it wrong because the HTTP request logs appear much later than the behat entries.

The 404 is clearly for the PROPFIND of Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with # FeatureContext::asGetsPropertiesOfFolderWith()

So this means that both the user share then group share get created, and then some magic happens and the folder is not visible any more for "user1".
Or maybe it is renamed back to its original name "merge-test-outside-groups-renamebeforesecondshare".

So the place to research for race conditions is the code that merges the shares, maybe something is making their priority change. From what I remember the merging code sorts by share time. Maybe if the share time of both shares is too close, they might swap ?

@PVince81 PVince81 referenced this issue Dec 6, 2016
Merged

When grouping shares, sort by stime then id #26777

3 of 9 tasks complete
@PVince81
Collaborator
PVince81 commented Dec 6, 2016

Mystery solved: #26777

It was indeed an issue that only appears when the test server is too fast and the share times are equal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment