New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration test about merging shares randomly fails #25830

Closed
PVince81 opened this Issue Aug 17, 2016 · 15 comments

Comments

Projects
None yet
2 participants
@PVince81
Member

PVince81 commented Aug 17, 2016

Seen 4-5 times on different PRs, randomly:

sharing.Merging shares for recipient when shared from outside with user then group and recipient renames in between
Then as "user1" gets properties of folder "merge-test-outside-groups-renamebeforesecondshare-renamed" with:

HTTP error: 404

I have no clue why this fails. Maybe the test order changes sometimes ?
From what I saw when developing these tests, some things do not get cleant up between tests. This is the reason why I chose to use different folder names unique to the tests to make sure they were not shared by other tests.

CC @SergioBertolinSG

@SergioBertolinSG

This comment has been minimized.

Show comment
Hide comment
@SergioBertolinSG

SergioBertolinSG Aug 17, 2016

Member

Have you observed this only in stable9.1 or also in master?

Member

SergioBertolinSG commented Aug 17, 2016

Have you observed this only in stable9.1 or also in master?

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Aug 17, 2016

Member

The last I observed was on master here: #25639

Not sure about the previous occurrences

Member

PVince81 commented Aug 17, 2016

The last I observed was on master here: #25639

Not sure about the previous occurrences

@SergioBertolinSG

This comment has been minimized.

Show comment
Hide comment
@SergioBertolinSG

SergioBertolinSG Aug 17, 2016

Member

Running the scenario alone it never fails, also I haven't seen it failing while running only the sharing-v1.feature. I guess it fails only when running the whole suite.

Member

SergioBertolinSG commented Aug 17, 2016

Running the scenario alone it never fails, also I haven't seen it failing while running only the sharing-v1.feature. I guess it fails only when running the whole suite.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Aug 17, 2016

Member

Yeah... and since it randomly fails only on CI, I suspect that in some of the CI nodes maybe the order of the tests is different.

Member

PVince81 commented Aug 17, 2016

Yeah... and since it randomly fails only on CI, I suspect that in some of the CI nodes maybe the order of the tests is different.

@SergioBertolinSG

This comment has been minimized.

Show comment
Hide comment
@SergioBertolinSG

SergioBertolinSG Aug 17, 2016

Member

No, the order is always the same. Alphabetical order for features and line number for scenarios.

Member

SergioBertolinSG commented Aug 17, 2016

No, the order is always the same. Alphabetical order for features and line number for scenarios.

@SergioBertolinSG

This comment has been minimized.

Show comment
Hide comment
@SergioBertolinSG

SergioBertolinSG Aug 17, 2016

Member

From what I saw when developing these tests, some things do not get cleant up between tests.

Do you remember what doesn't get clean after tests? Users are deleted, groups as well.

Member

SergioBertolinSG commented Aug 17, 2016

From what I saw when developing these tests, some things do not get cleant up between tests.

Do you remember what doesn't get clean after tests? Users are deleted, groups as well.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Aug 17, 2016

Member

I think it was the shares that didn't get deleted. Maybe when the user got recreated, the filecache was still there and was reused, which would explain why the shares still existed.
The shares would only be cleared properly after a thorough cron run.

Member

PVince81 commented Aug 17, 2016

I think it was the shares that didn't get deleted. Maybe when the user got recreated, the filecache was still there and was reused, which would explain why the shares still existed.
The shares would only be cleared properly after a thorough cron run.

@PVince81 PVince81 modified the milestones: 9.1.3, 9.1.2 Oct 20, 2016

@PVince81 PVince81 referenced this issue Nov 9, 2016

Merged

share api expanded by tags #26583

6 of 9 tasks complete
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Nov 30, 2016

Member

GRRRRR and it's still there

Member

PVince81 commented Nov 30, 2016

GRRRRR and it's still there

@PVince81 PVince81 modified the milestones: 9.1.4, 9.1.3 Nov 30, 2016

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 5, 2016

Member

Ok, here's an idea: next time it fails somewhere, let's copy the log and compare it with a successful build. Maybe there will be clues about the test running order which we could somehow enforce locally to get the same results.

Member

PVince81 commented Dec 5, 2016

Ok, here's an idea: next time it fails somewhere, let's copy the log and compare it with a successful build. Maybe there will be clues about the test running order which we could somehow enforce locally to get the same results.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

Here we go, captured results from a failed run: https://s3.owncloud.com/owncloud/index.php/s/VFIlf7xcUlydWFz

Member

PVince81 commented Dec 6, 2016

Here we go, captured results from a failed run: https://s3.owncloud.com/owncloud/index.php/s/VFIlf7xcUlydWFz

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

And here from a succesful run: https://s3.owncloud.com/owncloud/index.php/s/Pcse0rR48r6MVX2

Let's see if we can find anything about test order...

Member

PVince81 commented Dec 6, 2016

And here from a succesful run: https://s3.owncloud.com/owncloud/index.php/s/Pcse0rR48r6MVX2

Let's see if we can find anything about test order...

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

So I post-processed the logs a bit and removed timestamps and unified the test URL due to it containing the PR number + a hash.

Here are both processed logs: https://s3.owncloud.com/owncloud/index.php/s/huNPY2MItMoGr2Y
Try diffing, for example: vimdiff jenkins-success.txt jenkins-fail.txt

I diffed them and so far what I observed:

  1. the test order is exactly the same in both
  2. for some queries the HTTP log entry appears before or after the behat entry, shouldn't matter though
  3. nothing suspicious before the merge test unfortunately
  4. after the failed merge test, the share ids are shifted of one. This is possibly because a share is missing or wasn't created properly. I wonder if it's the one share that might make that test here fail.

Now if we think race conditions, what if there is a OCS Share call that already sends a success response before it finished committing the entries in the database ? This means that the PHP request would automatically commit when reaching the end of the code. But if between that moment and now behat already sent the next request, the DB result will likely not be the same as it wasn't committed yet.

This is only a theory. So far I don't see what other kinds of race condition there could be, especially here with this very linear testing...

Member

PVince81 commented Dec 6, 2016

So I post-processed the logs a bit and removed timestamps and unified the test URL due to it containing the PR number + a hash.

Here are both processed logs: https://s3.owncloud.com/owncloud/index.php/s/huNPY2MItMoGr2Y
Try diffing, for example: vimdiff jenkins-success.txt jenkins-fail.txt

I diffed them and so far what I observed:

  1. the test order is exactly the same in both
  2. for some queries the HTTP log entry appears before or after the behat entry, shouldn't matter though
  3. nothing suspicious before the merge test unfortunately
  4. after the failed merge test, the share ids are shifted of one. This is possibly because a share is missing or wasn't created properly. I wonder if it's the one share that might make that test here fail.

Now if we think race conditions, what if there is a OCS Share call that already sends a success response before it finished committing the entries in the database ? This means that the PHP request would automatically commit when reaching the end of the code. But if between that moment and now behat already sent the next request, the DB result will likely not be the same as it wasn't committed yet.

This is only a theory. So far I don't see what other kinds of race condition there could be, especially here with this very linear testing...

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

Success:

    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed" # FeatureContext::userMovedFile()
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"                                      # FeatureContext::assureFileIsSharedWithGroup()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
 [207]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
    Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with                                        # FeatureContext::asGetsPropertiesOfFolderWith()
      | {http://owncloud.org/ns}permissions |
    And the single response should contain a property "{http://owncloud.org/ns}permissions" with value "SRDNVCK"                                       # FeatureContext::theSingleResponseShouldContainAPropertyWithValue()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare
 [404]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare
    And as "user1" the folder "/merge-test-outside-groups-renamebeforesecondshare" does not exist                                                      # FeatureContext::asTheFileOrFolderDoesNotExist()

vs failure:

    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed" # FeatureContext::userMovedFile()
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"                                      # FeatureContext::assureFileIsSharedWithGroup()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
 [404]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
    Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with                                        # FeatureContext::asGetsPropertiesOfFolderWith()
      | {http://owncloud.org/ns}permissions |
      HTTP error: 404 (Sabre\DAV\Exception)
    And the single response should contain a property "{http://owncloud.org/ns}permissions" with value "SRDNVCK"                                       # FeatureContext::theSingleResponseShouldContainAPropertyWithValue()
    And as "user1" the folder "/merge-test-outside-groups-renamebeforesecondshare" does not exist                                                      # FeatureContext::asTheFileOrFolderDoesNotExist()

There is something weird happening here:

1.    When folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with user "user1"
2.    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed"
3.    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"

At step 1 user0 shares folder X with user1
At step 2 user1 renamed folder X to Y
At step 3 user0 shares folder X with group1, but here suddenly folder X doesn't exist any more ?! (404)

Now renaming a folder as recipient shouldn't in any way affect the source folder. It should only update the "file_target" in the database.

Member

PVince81 commented Dec 6, 2016

Success:

    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed" # FeatureContext::userMovedFile()
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"                                      # FeatureContext::assureFileIsSharedWithGroup()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
 [207]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
    Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with                                        # FeatureContext::asGetsPropertiesOfFolderWith()
      | {http://owncloud.org/ns}permissions |
    And the single response should contain a property "{http://owncloud.org/ns}permissions" with value "SRDNVCK"                                       # FeatureContext::theSingleResponseShouldContainAPropertyWithValue()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare
 [404]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare
    And as "user1" the folder "/merge-test-outside-groups-renamebeforesecondshare" does not exist                                                      # FeatureContext::asTheFileOrFolderDoesNotExist()

vs failure:

    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed" # FeatureContext::userMovedFile()
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares
 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares?path=%2Fmerge-test-outside-groups-renamebeforesecondshare
    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"                                      # FeatureContext::assureFileIsSharedWithGroup()
 [401]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
 [404]: /remote.php/webdav/merge-test-outside-groups-renamebeforesecondshare-renamed
    Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with                                        # FeatureContext::asGetsPropertiesOfFolderWith()
      | {http://owncloud.org/ns}permissions |
      HTTP error: 404 (Sabre\DAV\Exception)
    And the single response should contain a property "{http://owncloud.org/ns}permissions" with value "SRDNVCK"                                       # FeatureContext::theSingleResponseShouldContainAPropertyWithValue()
    And as "user1" the folder "/merge-test-outside-groups-renamebeforesecondshare" does not exist                                                      # FeatureContext::asTheFileOrFolderDoesNotExist()

There is something weird happening here:

1.    When folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with user "user1"
2.    And User "user1" moved folder "/merge-test-outside-groups-renamebeforesecondshare" to "/merge-test-outside-groups-renamebeforesecondshare-renamed"
3.    And folder "/merge-test-outside-groups-renamebeforesecondshare" of user "user0" is shared with group "group1"

At step 1 user0 shares folder X with user1
At step 2 user1 renamed folder X to Y
At step 3 user0 shares folder X with group1, but here suddenly folder X doesn't exist any more ?! (404)

Now renaming a folder as recipient shouldn't in any way affect the source folder. It should only update the "file_target" in the database.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

Hmm no, I think I read it wrong because the HTTP request logs appear much later than the behat entries.

The 404 is clearly for the PROPFIND of Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with # FeatureContext::asGetsPropertiesOfFolderWith()

So this means that both the user share then group share get created, and then some magic happens and the folder is not visible any more for "user1".
Or maybe it is renamed back to its original name "merge-test-outside-groups-renamebeforesecondshare".

So the place to research for race conditions is the code that merges the shares, maybe something is making their priority change. From what I remember the merging code sorts by share time. Maybe if the share time of both shares is too close, they might swap ?

Member

PVince81 commented Dec 6, 2016

Hmm no, I think I read it wrong because the HTTP request logs appear much later than the behat entries.

The 404 is clearly for the PROPFIND of Then as "user1" gets properties of folder "/merge-test-outside-groups-renamebeforesecondshare-renamed" with # FeatureContext::asGetsPropertiesOfFolderWith()

So this means that both the user share then group share get created, and then some magic happens and the folder is not visible any more for "user1".
Or maybe it is renamed back to its original name "merge-test-outside-groups-renamebeforesecondshare".

So the place to research for race conditions is the code that merges the shares, maybe something is making their priority change. From what I remember the merging code sorts by share time. Maybe if the share time of both shares is too close, they might swap ?

@PVince81 PVince81 referenced this issue Dec 6, 2016

Merged

When grouping shares, sort by stime then id #26777

3 of 9 tasks complete
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Dec 6, 2016

Member

Mystery solved: #26777

It was indeed an issue that only appears when the test server is too fast and the share times are equal.

Member

PVince81 commented Dec 6, 2016

Mystery solved: #26777

It was indeed an issue that only appears when the test server is too fast and the share times are equal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment