[stable9.1] Skip null groups in group manager #26871

Merged
merged 8 commits into from Jan 17, 2017
@PVince81
Collaborator

Description

Skip null groups in group manager, because data inconsistency could cause some backends (LDAP?) to return null groups.

Related Issue

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@felixboehm @jvillafanez @butonic

@PVince81 PVince81 added this to the 9.1.4 milestone Dec 23, 2016
@mention-bot

@PVince81, thanks for your PR! By analyzing the history of the files in this pull request, we identified @DeepDiver1975, @bartv2 and @macjohnny to be potential reviewers.

@PVince81
Collaborator

Steps to reproduce:

Problem can be reproduced with memcache/redis by manually causing a cache inconsistency by meddling with the keys:

  1. Setup LDAP
  2. Setup OC to use Redis as memcache
  3. Create a user "zombie300"
  4. Add "zombie300" in two groups "Box100" and "Box105" (I used "memberUid" mode)
  5. flushall in redis to clear all keys, just in case
  6. PROPFIND as zombie300 on the root as this user: curl -X PROPFIND -u zombie300:zombie300 http://localhost/owncloud/remote.php/webdav/ (this is important to not cause web UI polling to repopulate the cache while testing)
  7. Compute key name on CLI: php -r 'print(md5("groupExists" . "Box105"));' => f083920ef775eb16207405c1c4dc57e9
  8. In Redis CLI, find the: keys *f083920ef775eb16207405c1c4dc57e9* => "6a3a92343f0dc7ff3b548564acfcbb91/LDAP-user_ldap--f083920ef775eb16207405c1c4dc57e9" (you will get a different result depending on OC instance id)
  9. Go to LDAP and delete the group "Box105"
  10. In Redis CLI, delete the key from above: del 6a3a92343f0dc7ff3b548564acfcbb91/LDAP-user_ldap--f083920ef775eb16207405c1c4dc57e9
  11. Redo the PROPFIND

Note: the cache key can also exist if the group does not exist, its contents will be the value "false" in JSON format.

Before the fix:

{"reqId":"HU2rhjW37re7hoEcU0vx","remoteAddr":"127.0.0.1","app":"PHP","message":"Call to a member function getGID() on null at \/srv\/www\/htdocs\/owncloud\/apps\/dav\/lib\/Connector\/Sabre\/Principal.php#152","level":3,"time":"2016-12-23T10:53:02+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/","user":"5adb9728-5d44-1036-95d9-3f869d76dba2"}

After the fix, the warning from this PR is logged:

{"reqId":"eSDUQp4TuoeKtPiYBbS7","remoteAddr":"127.0.0.1","app":"core","message":"Warning: user \"5adb9728-5d44-1036-95d9-3f869d76dba2\" belongs to deleted group: \"Box104\"","level":0,"time":"2016-12-23T10:56:03+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/","user":"5adb9728-5d44-1036-95d9-3f869d76dba2"}
@PVince81
Collaborator

So I'd say this PR is mostly about hardening against such inconsistencies until we figure out why they are happening in the first place. Here's a starting point for LDAP's crappy caching: owncloud/user_ldap#30

Please review @DeepDiver1975 @jvillafanez @PhilippSchaffrath @felixboehm @butonic

lib/private/Group/Manager.php
+ if (!is_null($aGroup)) {
+ $groups[$groupId] = $aGroup;
+ } else {
+ \OC::$server->getLogger()->debug('Warning: user "' . $uid . '" belongs to deleted group: "' . $groupId . '"', array('app' => 'core'));
@cdamken
cdamken Dec 23, 2016 Contributor

Would not be better instead debug a higher log level? I would suggest 2

@felixboehm
felixboehm Dec 26, 2016 Contributor

debug log level makes sense, with this code it is a designed functionality to handle users in deleted groups.

@butonic
butonic Jan 9, 2017 Member

The change makes oc gracefully handle null groups. So no warning necessary IMO. But starting the debug log line with 'Warning' is weird. Either the admin should do something to fix this or he shouldn't. If he should we need to tell him whit to do to make the warning go away.

@PVince81
PVince81 Jan 11, 2017 Collaborator

This situation should never happen. The purpose of the warning is to make it possible to find out about such scenarios where the LDAP server (or our LDAP impl) returns inconsistent results.

I'm fine removing the "Warning: " piece of the string.

@felixboehm

๐Ÿ‘

@felixboehm
Contributor

Please add test for deleted groups.

@cdamken
Contributor
cdamken commented Dec 27, 2016

Please add test for deleted groups.

@SergioBertolinSG Please help here

@PVince81
Collaborator
PVince81 commented Jan 9, 2017

The automated test will need to interact directly with memcache to be able to delete the key in question there... good luck

@PVince81
Collaborator
PVince81 commented Jan 12, 2017 edited

Also skip groups in "search" function that is used by the users page and others: 400baf3

Additionally I searched for usages of "groupManager->get(" and found that some parts of the sharing code doesn't properly check for group existence, so I hardened that as well in 8e7e5a9

  • TODO: add unit tests for group manager changes
  • TODO: add unit tests for sharing code changes
  • TODO: integration tests for deleted group cases with existing shares
@PVince81
Collaborator

Just had a quick test with local DB users and shares: when deleting a group in the users admin page, it automatically deletes all matching shares.

But I think in the case of LDAP the group deletion detection happens with a delay, so it makes sense to harden the sharing code to avoid issues when bumping into shares pointing to deleted groups.
I hope we can find a way to automatically test this with LDAP @davitol

@PVince81
Collaborator

Apparently for LDAP without caching, deleting the group in LDAP has a direct effect in OC and any affected shares disappear in the file listing. However they keep the oc_share entries. This makes the hardening above even more important.
Added test cases here owncloud/user_ldap#35 (comment)

@PVince81
Collaborator

Ewww, with redis memcaching enabled (and on this branch) if I delete a group in LDAP, it is still cached. That's ok. The shared mount still appears but the OCS Share API doesn't return the share anymore. So possibly it's reading from a different cache ?!

@PVince81
Collaborator

I debugged into it and the OCS Share API uses getUserGroups which still returns the deleted group as it was cached. Then later it calls inGroup() which itself has its own cache. But that cache was not populated yet, so it grabbed the value directly from LDAP after the deletion, so it returns false.
So this is the cache discrepancy I described in owncloud/user_ldap#30.

The only thing we can do here is to be prepared to receive inconsistent results. This is what this PR is about.

Ok, then let's continue with the unit tests...

@PVince81
Collaborator

integration tests for deleted group cases with existing shares

@SergioBertolinSG do we have integration tests for shares related to deleted groups with the database backend ? (LDAP cases will have to be done in the LDAP app as stated above)

@PVince81
Collaborator

@jvillafanez @butonic @PhilippSchaffrath I need a new review, more code was added.

@SergioBertolinSG
Member

@SergioBertolinSG do we have integration tests for shares related to deleted groups with the database backend ? (LDAP cases will have to be done in the LDAP app as stated above)

No, I don't think so. We have one about orphaned shares, this case is be something similar?

+
+ // such issue can usually happen when dealing with
+ // null groups which usually appear with group backend
+ // caching inconsistencies
@jvillafanez
jvillafanez Jan 13, 2017 Contributor

I'd rather log something. It will be easier to debug possible errors or to know what is being ignored.

- $this->groupManager->method('get')->with('group2')->willReturn($group);
@jvillafanez
jvillafanez Jan 13, 2017 Contributor

I think some adjustments are needed here.

@PVince81
PVince81 Jan 13, 2017 Collaborator

indeed, forgot to use "groupnull"

tests/lib/Share20/ManagerTest.php
+ ->with($path)
+ ->willReturn([$share2]);
+
+ $this->invokePrivate($this->manager, 'userCreateChecks', [$share]);
@jvillafanez
jvillafanez Jan 13, 2017 Contributor

no assertions in this test?

@PVince81
PVince81 Jan 13, 2017 Collaborator

No assertion possible here. This is similar to other tests.
Basically we run the method and no exception is thrown.

@jvillafanez
jvillafanez Jan 17, 2017 Contributor

Checking sebastianbergmann/phpunit-documentation#171 (comment)

A $this->assertNull($this->invokePrivate(....)) might be enough in this case. It's a kind of dummy check that fills the requirements.

@PVince81
Collaborator

No, I don't think so. We have one about orphaned shares, this case is be something similar?

@SergioBertolinSG similar, but not the same. This is about shares that point to deleted groups. Apparently only reproducible with LDAP + caching so far. Would still be good to have tests that confirm that after deleting a group, the matching shares do not appear neither in the PROPFIND nor in the "shared with you" section (OCS Share API query).

@PVince81
Collaborator

@jvillafanez adjusted here c1eea58

PVince81 added some commits Dec 23, 2016
@PVince81 PVince81 Skip null groups in group manager 83318d4
@PVince81 PVince81 Also skip null groups in group manager's search function a9a1e42
@PVince81 PVince81 Add more group null checks in sharing code af97752
@PVince81 PVince81 Add unit tests for null group safety in group manager 6bad08e
@PVince81 PVince81 Add unit tests for sharing code null group checks ee42733
@PVince81 PVince81 Added tests for null groups handling in sharing code 1923e9a
@PVince81 PVince81 Ignore moveShare optional repair in mount provider
In some cases, data is inconsistent in the oc_share table due to legacy
data. The mount provider might attempt to make it consistent but if the
target group does not exist any more it cannot work. In such case we
simply ignore the exception as it is not critical. Keeping the
exception would break user accounts as they would be unable to use
their filesystem.
44874fd
@PVince81 PVince81 Adjust null group handing + tests
56ca62a
@PVince81
Collaborator

@jvillafanez added assertNull() as requested.

@jvillafanez
Contributor

๐Ÿ‘

@PVince81 PVince81 merged commit fe939ed into stable9.1 Jan 17, 2017

3 checks passed

continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
licence/cla Contributor License Agreement is signed.
Details
@PVince81 PVince81 deleted the stable9.1-skipnullgroups branch Jan 17, 2017
@PVince81 PVince81 added a commit that referenced this pull request Jan 17, 2017
@PVince81 PVince81 Skip null groups in group manager (#26871)
* Skip null groups in group manager

* Also skip null groups in group manager's search function

* Add more group null checks in sharing code

* Add unit tests for null group safety in group manager

* Add unit tests for sharing code null group checks

* Added tests for null groups handling in sharing code

* Ignore moveShare optional repair in mount provider

In some cases, data is inconsistent in the oc_share table due to legacy
data. The mount provider might attempt to make it consistent but if the
target group does not exist any more it cannot work. In such case we
simply ignore the exception as it is not critical. Keeping the
exception would break user accounts as they would be unable to use
their filesystem.

* Adjust null group handing + tests
688f06f
@PVince81
Collaborator

Forward port to master: #26956

@PVince81 PVince81 added a commit that referenced this pull request Jan 19, 2017
@PVince81 PVince81 Skip null groups in group manager (#26871) (#26956)
* Skip null groups in group manager (#26871)

* Skip null groups in group manager

* Also skip null groups in group manager's search function

* Add more group null checks in sharing code

* Add unit tests for null group safety in group manager

* Add unit tests for sharing code null group checks

* Added tests for null groups handling in sharing code

* Ignore moveShare optional repair in mount provider

In some cases, data is inconsistent in the oc_share table due to legacy
data. The mount provider might attempt to make it consistent but if the
target group does not exist any more it cannot work. In such case we
simply ignore the exception as it is not critical. Keeping the
exception would break user accounts as they would be unable to use
their filesystem.

* Adjust null group handing + tests

* Fix new group manager tests
58d43f1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment