Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not cache file ids in FileSystemTags inside group folders #28774

Merged
merged 1 commit into from
Sep 14, 2021

Conversation

st3iny
Copy link
Member

@st3iny st3iny commented Sep 9, 2021

The numerical storage id returned by a cache might not be unique because it is passed down the CacheWrapper chain. Multiple caches may return the same numerical id. Additionally, when combined with the CacheJail wrapper multiple files may be mapped to the same numerical id + path combination even if they represent separate entries in the file cache.

In this case the first inserted file id for the combination of numerical id and path "wins" and blocks any subsequent insertions. This will lead to invalid tags being returned.

How to reproduce:

  1. Install groupfolders and files_accesscontrol.
  2. Create a group folder "Groupfolder".
  3. Create another group folder nested inside the first one "Groupfolder/folder-1".
  4. Create a system tag "AccessDenied".
  5. Add a workflow to block access to all files with "AccessDenied" tag.
  6. Tag the inner group folder at "Groupfolder/folder-1" with the new tag.
  7. Try to access the inner folder at "Groupfolder/folder-1".

Expected: The access should be blocked.
Actual: You can navigate to "Groupfolder/folder-1" and list its contents.

@st3iny st3iny added bug 3. to review Waiting for reviews labels Sep 9, 2021
@st3iny st3iny self-assigned this Sep 9, 2021
@ChristophWurst ChristophWurst requested review from a team and PVince81 and removed request for a team September 9, 2021 15:45
Copy link
Member

@ChristophWurst ChristophWurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 🐘

@blizzz
Copy link
Member

blizzz commented Sep 9, 2021

@nickvergessen do you remember which motivation the younger @nickvergessen had when doing so in 2016? Performance (significant?) or side effects (loops?), or other?

@skjnldsv skjnldsv added this to the Nextcloud 23 milestone Sep 14, 2021
@nickvergessen
Copy link
Member

nickvergessen commented Sep 14, 2021

nickvergessen do you remember which motivation the younger nickvergessen had when doing so in 2016? Performance (significant?) or side effects (loops?), or other?

Yes basically. The paths are called X times for every request and repeated with the same path over and over again, because we do things like:

$file = $folder->get('path'); // Runs it once (2+ queries per nesting level)
if ($file->isWriteable()) { // Runs it again (2+ queries per nesting level)
$file->putContent('c'); // Runs it again (2+ queries per nesting level)

etc.

Building the system tag list every time again and again is very bad performancewise. But we also only do it on the owner level, so it worked with shares although there were JailCache applied.

But then again, also for a groupfolder the parents should be the same for everyone? Or can they be mounted to any nesting level? In that case I wouldn't even know which users tags should be considered. So maybe just ignore the cached fileIds when it's a group folder for now, so it continues to work for the rest, while we try to find a performant solution for groupfolders with the expected result.

@st3iny st3iny added 2. developing Work in progress and removed 3. to review Waiting for reviews labels Sep 14, 2021
@st3iny st3iny force-pushed the fix/noid/find-system-tags-of-nested-groupfolders branch from bbedc83 to 9203870 Compare September 14, 2021 09:31
@st3iny st3iny changed the title Do not cache file ids in FileSystemTags Do not cache file ids in FileSystemTags inside group folders Sep 14, 2021
@st3iny
Copy link
Member Author

st3iny commented Sep 14, 2021

So maybe just ignore the cached fileIds when it's a group folder for now, so it continues to work for the rest, while we try to find a performant solution for groupfolders with the expected result.

I refactored it to behave this way. File ids will be cached for any storage other than group folders.

@st3iny st3iny added 3. to review Waiting for reviews and removed 2. developing Work in progress labels Sep 14, 2021
@st3iny st3iny force-pushed the fix/noid/find-system-tags-of-nested-groupfolders branch from 9203870 to 1bb022e Compare September 14, 2021 09:50
@st3iny st3iny force-pushed the fix/noid/find-system-tags-of-nested-groupfolders branch from 1bb022e to d434e7f Compare September 14, 2021 10:15
@st3iny
Copy link
Member Author

st3iny commented Sep 14, 2021

/backport to stable22

@st3iny
Copy link
Member Author

st3iny commented Sep 14, 2021

/backport to stable21

@st3iny
Copy link
Member Author

st3iny commented Sep 14, 2021

/backport to stable20

@st3iny st3iny force-pushed the fix/noid/find-system-tags-of-nested-groupfolders branch from d434e7f to 6a3118c Compare September 14, 2021 12:37
Signed-off-by: Richard Steinmetz <richard@steinmetz.cloud>
@st3iny st3iny force-pushed the fix/noid/find-system-tags-of-nested-groupfolders branch from 6a3118c to 8bd8e97 Compare September 14, 2021 13:11
@st3iny st3iny merged commit 9501e1b into master Sep 14, 2021
@st3iny st3iny deleted the fix/noid/find-system-tags-of-nested-groupfolders branch September 14, 2021 14:04
CarlSchwan added a commit that referenced this pull request Jan 7, 2022
In #28774 we disabled the
caching for the groupfolder application since it worked due to the fact
that in groupfolders, getFileIds could be called with the same $cacheId
and path for actually different groupfolders.

This revert this change and instead add the folderId from the
groupFolder to the cacheId. This solve the issue of the uniqueness of
the cacheId inside GroupFolder. Downside is that we introduce
groupfolder specific implementation inside the server repo.

The seconf optimization is to not consider paths starting with
__groupfolders in executeCheck. This is due to the fact that files in
the groupfolder application call two times executeCheck one time with
the url __groupfolder/<folderId>/<path> and the other time with <path>.
The first time will always return an empty systemTags array while the
second call will return the correct system tags.

Signed-off-by: Carl Schwan <carl@carlschwan.eu>
CarlSchwan added a commit that referenced this pull request Jan 12, 2022
In #28774 we disabled the
caching for the groupfolder application since it worked due to the fact
that in groupfolders, getFileIds could be called with the same $cacheId
and path for actually different groupfolders.

This revert this change and instead add the folderId from the
groupFolder to the cacheId. This solve the issue of the uniqueness of
the cacheId inside GroupFolder. Downside is that we introduce
groupfolder specific implementation inside the server repo.

The seconf optimization is to not consider paths starting with
__groupfolders in executeCheck. This is due to the fact that files in
the groupfolder application call two times executeCheck one time with
the url __groupfolder/<folderId>/<path> and the other time with <path>.
The first time will always return an empty systemTags array while the
second call will return the correct system tags.

Signed-off-by: Carl Schwan <carl@carlschwan.eu>
CarlSchwan added a commit that referenced this pull request Jan 13, 2022
In #28774 we disabled the
caching for the groupfolder application since it worked due to the fact
that in groupfolders, getFileIds could be called with the same $cacheId
and path for actually different groupfolders.

This revert this change and instead add the folderId from the
groupFolder to the cacheId. This solve the issue of the uniqueness of
the cacheId inside GroupFolder. Downside is that we introduce
groupfolder specific implementation inside the server repo.

The seconf optimization is to not consider paths starting with
__groupfolders in executeCheck. This is due to the fact that files in
the groupfolder application call two times executeCheck one time with
the url __groupfolder/<folderId>/<path> and the other time with <path>.
The first time will always return an empty systemTags array while the
second call will return the correct system tags.

Signed-off-by: Carl Schwan <carl@carlschwan.eu>
backportbot-nextcloud bot pushed a commit that referenced this pull request Jan 14, 2022
In #28774 we disabled the
caching for the groupfolder application since it worked due to the fact
that in groupfolders, getFileIds could be called with the same $cacheId
and path for actually different groupfolders.

This revert this change and instead add the folderId from the
groupFolder to the cacheId. This solve the issue of the uniqueness of
the cacheId inside GroupFolder. Downside is that we introduce
groupfolder specific implementation inside the server repo.

The seconf optimization is to not consider paths starting with
__groupfolders in executeCheck. This is due to the fact that files in
the groupfolder application call two times executeCheck one time with
the url __groupfolder/<folderId>/<path> and the other time with <path>.
The first time will always return an empty systemTags array while the
second call will return the correct system tags.

Signed-off-by: Carl Schwan <carl@carlschwan.eu>
backportbot-nextcloud bot pushed a commit that referenced this pull request Jan 14, 2022
In #28774 we disabled the
caching for the groupfolder application since it worked due to the fact
that in groupfolders, getFileIds could be called with the same $cacheId
and path for actually different groupfolders.

This revert this change and instead add the folderId from the
groupFolder to the cacheId. This solve the issue of the uniqueness of
the cacheId inside GroupFolder. Downside is that we introduce
groupfolder specific implementation inside the server repo.

The seconf optimization is to not consider paths starting with
__groupfolders in executeCheck. This is due to the fact that files in
the groupfolder application call two times executeCheck one time with
the url __groupfolder/<folderId>/<path> and the other time with <path>.
The first time will always return an empty systemTags array while the
second call will return the correct system tags.

Signed-off-by: Carl Schwan <carl@carlschwan.eu>
backportbot-nextcloud bot pushed a commit that referenced this pull request Jan 14, 2022
In #28774 we disabled the
caching for the groupfolder application since it worked due to the fact
that in groupfolders, getFileIds could be called with the same $cacheId
and path for actually different groupfolders.

This revert this change and instead add the folderId from the
groupFolder to the cacheId. This solve the issue of the uniqueness of
the cacheId inside GroupFolder. Downside is that we introduce
groupfolder specific implementation inside the server repo.

The seconf optimization is to not consider paths starting with
__groupfolders in executeCheck. This is due to the fact that files in
the groupfolder application call two times executeCheck one time with
the url __groupfolder/<folderId>/<path> and the other time with <path>.
The first time will always return an empty systemTags array while the
second call will return the correct system tags.

Signed-off-by: Carl Schwan <carl@carlschwan.eu>
backportbot-nextcloud bot pushed a commit that referenced this pull request Jan 14, 2022
In #28774 we disabled the
caching for the groupfolder application since it worked due to the fact
that in groupfolders, getFileIds could be called with the same $cacheId
and path for actually different groupfolders.

This revert this change and instead add the folderId from the
groupFolder to the cacheId. This solve the issue of the uniqueness of
the cacheId inside GroupFolder. Downside is that we introduce
groupfolder specific implementation inside the server repo.

The seconf optimization is to not consider paths starting with
__groupfolders in executeCheck. This is due to the fact that files in
the groupfolder application call two times executeCheck one time with
the url __groupfolder/<folderId>/<path> and the other time with <path>.
The first time will always return an empty systemTags array while the
second call will return the correct system tags.

Signed-off-by: Carl Schwan <carl@carlschwan.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3. to review Waiting for reviews bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants