Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative SMB external storage implementation where all users of on… #38139

Closed

Conversation

DeepDiver1975
Copy link
Member

@DeepDiver1975 DeepDiver1975 commented Nov 24, 2020

…e share use the same storage id. User specific access is filtered in a cache wrapper by stating the files on SMB

Description

  • all users share the same storage id for the smb share (loggin via user credentials)
  • fileids are constant over users
  • fileid based concepts work (tagging, comments, direct link, file locking, wopi)

ToDo:

  • find a backwards compatible implementation
    • parameter
    • mount point option
    • own backend class
  • test performance
  • update detection needs love - different users with different permissions need to retrigger a scan ..... somehow ....

How Has This Been Tested?

Screenshots (if appropriate):

Same file id is used and as result comments and tags are show properly on both users:

Screenshot from 2020-11-24 12-12-41

File ID is the same - see last query parameter in url

Screenshot from 2020-11-24 12-10-00

Tow users with different permissions see folder content accordingly:

Screenshot from 2020-11-24 12-08-56

Setup option

@pmaier1 @hodyroff - happy to take your recommendations in terms of naming this checkbox ;-)

Screenshot from 2020-11-24 15-20-54

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Database schema changes (next release will require increase of minor version instead of patch)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Technical debt
  • Tests only (no source changes)

Checklist:

  • Code changes
  • Unit tests added
  • Acceptance tests added
  • Documentation ticket raised:
  • Changelog item, see TEMPLATE

@update-docs
Copy link

update-docs bot commented Nov 24, 2020

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@mmattel
Copy link
Contributor

mmattel commented Nov 24, 2020

As you mentioned using redis for caching stat information. I would propose implementing that from the beginning. Just imagine a big photo stock that gets shared by smb...

(PS: I like that kind of conversation. I propose, you work 😎 🍻 )

@DeepDiver1975
Copy link
Member Author

As you mentioned using redis for caching stat information. I would propose implementing that from the beginning. Just imagine a big photo stock that gets shared by smb...

generally speaking (on a high level) there is no difference if I ask for a file listing over the network talking smb or redis protocol with a smb or radis server. Such a cache might be of interest in special setups or on special purpose - nothing to be added as a general concept right away.

(PS: I like that kind of conversation. I propose, you work sunglasses beers )

hehe - I better reach you how to code ;-)

@mmattel
Copy link
Contributor

mmattel commented Nov 24, 2020

Regarding caching, there is a protocol overhead when using smb (smb is a monster) compared to redis. This could be even more a notifyable impact when the request to redis is on the same physical server than on a seperate one. Just my 2c

@DeepDiver1975
Copy link
Member Author

t to redis is on the same physical server

Productive setups require a shared cache which lives on the network ... this is why I said 'over the network' ;-)
And yes I am aware of the different protocol implementations (thats why I said 'on a high level') ;-)

I am not against caches - but they come at a later stage .....

…e share use the same storage id. User specific access is filtered in a cache wrapper by stating the files on SMB
@DeepDiver1975 DeepDiver1975 force-pushed the feature/shared-cache-for-external-storages branch from 0f2401b to 97b1755 Compare November 24, 2020 14:59
@DeepDiver1975 DeepDiver1975 marked this pull request as ready for review November 24, 2020 16:27
@mmattel
Copy link
Contributor

mmattel commented Nov 25, 2020

Q: when rereading the description text, a question raised. It is about "All users share this storage". Does this mean the following: Having existing users and adding new users, when a smb storage has this option set, the mount is accessible to all of them depending on the credentials set (readonly) / given by the host? In other words, having a ownCloud group where you add all users would do the same but with more overhead work?

@jvillafanez
Copy link
Member

We have a couple of options at the moment:

  • Use username+password auth -> this causes that all the users accessing to the mount point will use the same account to access the external storage. As such, ownCloud will use only one storage view for all the ownCloud users, since such view will remain consistent in regards to what each ownCloud user can see.
  • Use session auth and others -> this causes that each ownCloud user to use different credentials to access to the external storage. Since we can't guarantee that all the ownCloud users will see the same files (each external account might have different permissions), each ownCloud user will use its own storage view.

All of this is bound to the specific storage id the implementation provides, which usually depends on the external account that is used to access to the external storage.

For the case of this PR, what we're doing is that, having different external accounts, we'll still use the same storage view for all the ownCloud users. The limitations exposed above should be mitigated by checking the access to the storage more often.
I guess we're trading DB storage space with performance, because with this change we'll need to check if the user can access to the file returned by the filecache.
The additional advantage is that all the metadata stored in the filecache will also be shared with the users. Comments, tags, etc will be also visible to all the users.

@jvillafanez
Copy link
Member

@DeepDiver1975 could you double-check the following use case?:

  1. Having "/SMB/a/b/c/d" in the backend, not cached yet in ownCloud
  2. Ensure user1 DOESN'T have access to "/SMB/a/b"
  3. Ensure user2 does have access to "/SMB/a/b" and inner folders
  4. With user1, access to "/SMB/a". Folder "/SMB/a/b" shouldn't be visible because user1 doesn't have access there
  5. With user2, try to access to "/SMB/a/b/c/d".

Is step 5 possible under those conditions? At step 4 the "/SMB/a" folder should be present in the filecache. When the user2 access to "/SMB/a" the mtime won't have changed and we'll get the cached contents. The problem is that those contents will be empty because user1 couldn't scan further, in particular "/SMB/a/b" won't appear because it isn't in the cache yet.

@jvillafanez
Copy link
Member

I guess that falls into the "update detection" checkbox in the TODO.

@DeepDiver1975
Copy link
Member Author

Is step 5 possible under those conditions? At step 4 the "/SMB/a" folder should be present in the filecache. When the user2 access to "/SMB/a" the mtime won't have changed and we'll get the cached contents. The problem is that those contents will be empty because user1 couldn't scan further, in particular "/SMB/a/b" won't appear because it isn't in the cache yet.

I already tested this scenario - and you are absolutely right - because the mtime did not change no scan happens and the 'new' folders/files do not show up.
Simplistic solution would be to return always true in hasUpdated() - but I fear this will be a too big performance penalty ....

A more feasible solution might be to store the mtime for each folder per user and use this value for comparison in hasUpdated()
store in redis/memcache: userid -> hash(path) -> mtime

Objections? @jvillafanez

@jvillafanez
Copy link
Member

There could still be some overload. When a change is detected, each user will trigger a scan even if it isn't strictly needed: user1 changes a file and user2, user3, user4.... will update the filecache even if the information is the same for them (same access restrictions for all of them)

Definitely better than triggering a filecache update always, but we'll have to maintain an additional structure in redis. We'll need to put some limits somehow because I don't think we can store the whole SMB storage in redis multiplied by the number of users, specially if there are other things stored there.

There could be some alternatives if we add ACLs into the mix, but it will add even more complexity to the solution. This seems a reasonable approach

@mmattel
Copy link
Contributor

mmattel commented Nov 26, 2020

When a change is detected, each user will trigger a scan even if it isn't strictly needed

Q: what triggers a change detect and can a scan be marked to be in progress so other scan request for the same can be rejected?

@DeepDiver1975
Copy link
Member Author

There could still be some overload. When a change is detected, each user will trigger a scan even if it isn't strictly needed: user1 changes a file and user2, user3, user4.... will update the filecache even if the information is the same for them (same access restrictions for all of them)

Definitely better than triggering a filecache update always, but we'll have to maintain an additional structure in redis. We'll need to put some limits somehow because I don't think we can store the whole SMB storage in redis multiplied by the number of users, specially if there are other things stored there.

There could be some alternatives if we add ACLs into the mix, but it will add even more complexity to the solution. This seems a reasonable approach

But: if a user accesses a smb share with a file explorer - this will also talk to the smb server and this consumes time.
Why are we expecting to be faster? ;-)

Maybe it is just okay that the user has to wait a short while and in return proper information is displayed ....

@sonarcloud
Copy link

sonarcloud bot commented Nov 26, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@jvillafanez
Copy link
Member

Maybe it is just okay that the user has to wait a short while and in return proper information is displayed ....

Having the hasUpdated method return true will cause problems:

  1. user1 scans all the way to "/SMB/a/b/c/d"
  2. user1 comments / tags .... in "/SMB/a/b/c"
  3. user2 enters in "/SMB/a"
  4. Since user2 doesn't have permissions to access to "/SMB/a/b", the folder is removed from the cache
  5. user1 has lost the comments on "/SMB/a/b/c" because the fileid will be different

This brings us to the next problem: what happens if user2 (who has less permissions) has to scan a folder? This could also happen in the previous solution, although less frequently.

Maybe we should use a service account instead for the scanning. This way we ensure that the scans remain stable. We can still use the user accounts in order to check access.

@DeepDiver1975
Copy link
Member Author

Very valid point - perform scanning of the full smb share via cron? Is that reasonable in terms of execution time?

@DeepDiver1975 DeepDiver1975 marked this pull request as draft November 26, 2020 13:52
@jvillafanez
Copy link
Member

I was thinking about using 2 different connections: one for the "normal" access which will use the service account, and a different one for the storage cache using the user account.
Not sure if it will be too complex to adapt the current code for this, but the idea might work.

@DeepDiver1975
Copy link
Member Author

I was thinking about using 2 different connections: one for the "normal" access which will use the service account, and a different one for the storage cache using the user account.

(wnd only) or we leave this to the notificatoin process - this already has a service account and pushes file changes to owncloud.

@DeepDiver1975
Copy link
Member Author

superseded by #38151

@DeepDiver1975 DeepDiver1975 deleted the feature/shared-cache-for-external-storages branch November 30, 2020 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants