Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exclude directories from beeing processed #16534

Closed
wants to merge 2 commits into from

Conversation

mmattel
Copy link
Contributor

@mmattel mmattel commented May 22, 2015

Summary:

  1. Enterprise Storage Systems are capable of Snapshots. These Snapshots are directories with a particular name and keep point in time views of the data. Usually, these snapshot directories are present in each directory
  2. There is no common naming for these directories and will never be. NetApp uses .snapshot and ~snapshot, EMC eg .ckpt, HDS eg .latest and ~latest and so on
  3. Viewing and scanning of these directories does not make any sense as these directories are used to restore files from an OS point of view
  4. You can also have hardlinks to backups with the same background as described in Exlude particular directories from beeing viewed, accessed and scanned  #15385
  5. Scanning a huge datastore containing snapshots does mean that for each snapshot taken ALL files of the datastore are imported into the database again. Example: a datastore containing 200K files, having 15 snapshots means scanning of 3M files where 2.8M are worthless to be imported

There are for sure more resons, but I tried to keep it short.

With this PR, you can define a array of directory names in config.php which are further excluded from beeing processed.

These excluded directory names are not scanned, not viewed, can not be created or renamed (or deleted) or accessed via direct path input from a file explorer.

Access methods covered: webdav, webgui and client.

Following test have been made creating either a file or directory having once a allowed and once a excluded name. The results were the same independent of the storage type (tested on user home and SMB).

The functions shown in the table are except one all triggered and covered by the access methods used in lib/private/files/view.php

For WebDav, CyberDuck and native WebDav access was used on a W7 client.
The IOS client ran the latest available version (3.4.1)

All access tests made passed well.

The table shows the covered methods for all storage types using it:

webdav webgui client IOS
directory create mkdir mkdir mkdir
directory rename rename rename rename
directory delete rmdir rmdir rmdir
file create put(*) touch n.a.
file rename rename rename rename
file delete unlink unlink unlink

n.a. ... function not available
(*) put ... this function is not handled in lib/private/files/view.php but in lib/private/connector/sabre/directory.php which points to ../sabre/file.php

The case manually defining the path in a file explorer to access a excluded directory via a webdav connected client is also covered. (lib/private/connector/sabre/objecttree.php and ../sabre/custompropertiesbackend.php)

Adding a new external storage and let it scan also worked well, excluded directories are not further processed.

@DeepDiver1975 @karlitschek @nickvergessen @Aesculapius @icewind1991 @LukasReschke

(sorry to create another PR, but a rebase messed everything up badly...)

@karlitschek
Copy link
Contributor

I must say that I like this. 👍

*/
'excluded_directories' =>
array (
0 => '.snapshot',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no index please

@mmattel
Copy link
Contributor Author

mmattel commented May 22, 2015

changes according @nickvergessen suggestions
squashed

@PVince81
Copy link
Contributor

So I guess this would not have worked with the "blacklisted_files" option ?

Did you check if encryption still works ? I believe the checks for excluded dirs might need to be added there too.

I also wonder if this should be more than just dirs, and also files. But then it sounds again like "blacklisted_files".

@mmattel
Copy link
Contributor Author

mmattel commented May 22, 2015

Just from the naming side, there is a difference between Blacklisted and Excluded...

isBlacklisted:
--> Listens to write and rename hooks only, calls the function isFileBlacklisted
--> triggered in a late stage

isExcludedDir:
--> Like isFileBlacklisted but is in extended use to cover scanning and access via webdav when the path is entered directly
--> early calling, mostyl direct after the parent function is called

The point is, that the hook does not cover all cases we need and is always in a "late" stage

From the introduction:

These excluded directory names are not scanned, not viewed, can not be created or renamed (or deleted) or accessed via direct path input from a file explorer.

If you take a look into the code you will see, that even you have the hook, the function isFileBlacklisted itself is called quite often and always before the hook gets triggered. imho the hook is late while the function call is used for early checks, but this is just my 2c
I think we miss a precheck condition hook system for filesystem operations for early checks, but this is a different story :-)

@mmattel
Copy link
Contributor Author

mmattel commented May 22, 2015

Regarding encryption:

From: https://github.com/owncloud/documentation/blob/master/user_manual/files/encrypting_files.rst
"Only the data in your files is encrypted, and not the filenames or folder structures."

As isExcludedDir is dealing with the structure but not the contents, nothing to do special.

@mmattel
Copy link
Contributor Author

mmattel commented May 24, 2015

I extended the code to cover one possibility I missed,
only necessary for webdav access with the ability to enter the path manually:

old: /dir/sub/sub/.snapshot
new: /dir/sub/sub/.snapshot/nightly.0

In the old version, the check was only returning true when the excluded directory name was at the last position.

In the new version, the check will return true when the excluded directory name is at any position.

So if someone knows the substructure of .snapshot, he will now not be able to get into.

Directories and files named eg myname.snapshot.me are allowed to be created.

} else {
// search at any position of $dir, eg webdav - when path is entered manually
foreach($excluded as $keyword) {
if (stripos($dir, $keyword) !== false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check wrongly denies hello.snapshot
You need to pre- and append slashes to $keyword before checking for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done and squashed

@nickvergessen
Copy link
Contributor

Please also add test for it?

@mmattel mmattel force-pushed the exclude_directories_III branch 3 times, most recently from 132a02d to 07f3234 Compare May 26, 2015 06:44
@PVince81
Copy link
Contributor

Thanks for the explanation @mmattel. So if I understand well, excluded dirs can still be accessed when requested directly, but would not be synced because a PROPFIND on the parent folder would not list them ?
I understand that the purpose of isExcludedDir is different than isBlacklisted. This should be properly documented once this PR is merged, as I can see the potential confusion for developers reading the code.

@mmattel
Copy link
Contributor Author

mmattel commented May 26, 2015

@PVince81

assumtion: excluded directory name is .snapshot

allowed:
my.snapshot
my.snapshot.name
.snapshot.name
/dir/my.snapshot
/dir/my.snapshot/subsubdir
...

excluded:
.snapshot
/dir/.snapshot
/dir/.snapshot/nightly.0

@nickvergessen gave me a goot hint for a case I did not cover before, the PR is already updated with the implementation.

If you take a look into the coding you see, that I explode the path handed over into components seperated by '/' and do a array match against the excluded directories array. (Note: if there is no '/' present, explode returns the name to be matched against). Only if it exactly matches, I return true. This covers all possible cases as shown above.

I will of course do a proper explanation with a documentation PR

@mmattel
Copy link
Contributor Author

mmattel commented May 26, 2015

@owncloud-bot retest this please

@PVince81
Copy link
Contributor

Thanks. As we are past feature freeze, I'm setting the milestone to 8.2.

CC @MTRichards for this nice addition

@PVince81 PVince81 added this to the 8.2-next milestone May 26, 2015
@@ -588,6 +588,26 @@ static public function isIgnoredDir($dir) {
}

/**
* check if the directory should be excluded
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a more detailed explanation here what "excluded" means ?
Developers reading this should be able to understand the different of isExcludedDir from isBlacklisted, as you explained in your github comment.

@mmattel
Copy link
Contributor Author

mmattel commented May 26, 2015

@PVince81
I have an idea, but I would like to discuss that outside first. Is there an ability to contact you without IRC eg eMail?

@MTRichards
Copy link
Contributor

This is great!

@mmattel
Copy link
Contributor Author

mmattel commented May 26, 2015

Based on a comment from @PVince81, I did some rethinking if it would be possible or a good idea to combine isBlacklisted / isFileBlacklisted and this PR. When I created this PR, I did not wanted to change the current isBlacklisted stuff beeing afraid something could break, but now... In an nutshell, yes this would work and would bring some benefits. In my testsystem, I did all the adoptions and I have to say that this new idea is imho a better approach than the current one, and it does not break anything.

Why I did that rethinking:
isBlacklisted and isExcludedDir are similar, but have a different root causes.
From the code itself they look if a file or path matches a given pattern. isExcludedDir is proof for more possibilities as it looks in case into the complete path for a match (webdav proof) while isBlacklisted looks only at the basename but is used at more places. On the other hand, isExcludedDir is also used at the scanner to avoid scanning of excluded directories ect. And the naming is of course also important for the meaning behind.

Here is what I did very briefly: (on my testsystem)

  • rename the function isBlacklisted (hook) to a more neutral name (isForbiddenFileOrDir_Hook)
  • add the functionality of function isFileBlacklisted into isExcludedDir
  • rename isExcludedDir to isForbiddenFileOrDir
  • remove the code of isFileBlacklisted but call isForbiddenFileOrDir to get a return value. isFileBlacklisted will stay but gets a Depriciated note.
  • rename the function call names from isExcludedDir to isForbiddenFileOrDir in all files where it is used and isFileBlacklisted is not called
  • rename the function call namess from isFileBlacklisted to isForbiddenFileOrDir in all files (8 files in core)
  • currently there is only one app outside core that uses isFileBlacklisted = /apps/contacts/lib/controller/importcontroller.php. Contacts can be changed seperately is easy and secure because isFileBlacklisted gets for the time beeing re-routed to isForbiddenFileOrDir.
  • remove function calls isExcludedDir in view.php as they are not needed anymore

Benefits: code harmonisation, better readability, better functionality and all the good things from both sides are kept

My request: I please you for your comments about this idea and if I should revise this PR or open another PR so you can do a comparison

(I think it is better to ask upfront to make the base idea proof and secure)

@mmattel
Copy link
Contributor Author

mmattel commented May 30, 2015

just rebased (and it worked, I think I got it now hwo to do it... 😄 )

@nickvergessen
Copy link
Contributor

@mmattel please add an issue in the documentation repository and schedule it for 8.2 as well, so we don't forget to document it ;)

@mmattel
Copy link
Contributor Author

mmattel commented Jun 1, 2015

Will do, may take some days as I am off for one week.
@nickvergessen, may I ask you for your comment on the idea I documented above?
A response from the experts would be really helpful.

@@ -277,7 +277,9 @@ protected function getNewChildren($folder) {
if (is_resource($dh)) {
while (($file = readdir($dh)) !== false) {
if (!Filesystem::isIgnoredDir($file)) {
$children[] = $file;
if (!Filesystem::isExcludedDir($file)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!Filesystem::isIgnoredDir($file) && !Filesystem::isExcludedDir($file)) {

to avoid nesting too deep 🐙

@mmattel
Copy link
Contributor Author

mmattel commented Sep 9, 2015

@PVince81
I found the cause of the bug. It are differnent code paths for user local and eg files external (eg smb shares). I am working on and have already fixed it for files_external, next is user local.
Note: this should not be the case having different code paths. All code access to storage should be independent of it´s type and run thru the same codebase and fork afterwards... Just my 2c

@MorrisJobke
Copy link
Contributor

cc @Xenopathic

@mmattel
Copy link
Contributor Author

mmattel commented Sep 9, 2015

I need your support.

Before searching endless, here is the problem:
Having a non local storage added (eg SMB), I can define a exluded directory name in it and it will not be shown in the browser and via WebDav. If I remove the excluded name from config.php, the folder shows up again - fine.
BUT: if I do the same on the users local space, means no use of any external mounts, the directory will show up in the browser and on WebDav independent the excluded directory setting made.

Question: where is the coding made for local files and folders where I can hook in to prevent the access to excluded directories?

@RobinMcCorkell
Copy link
Member

Hmm, perhaps this is something to do with the filesystem check parameter? It is set to 0 for the local storage, meaning any external updates will not be detected, but is set to 1 on external storages, meaning it gets checked once per access. Try setting that parameter to 1 in config.php (sorry, I've forgotten its complete name) and see if the behaviour becomes consistent.

@mmattel
Copy link
Contributor Author

mmattel commented Sep 10, 2015

That is config.php parameter filesystem_check_changes.
"Specifies how often the filesystem is checked for changes made outside ownCloud"
I have tried it with 0, 1 and 2 but there is no difference in the behaviour regarding this PR.

Need to identify the coding place...

@mmattel
Copy link
Contributor Author

mmattel commented Sep 18, 2015

@owncloud-bot retest this please

@mmattel
Copy link
Contributor Author

mmattel commented Sep 18, 2015

@PVince81
Q: why do the checks not start after I have updated ?

@mmattel mmattel force-pushed the exclude_directories_III branch 2 times, most recently from 0f85872 to 7bdcc8f Compare September 18, 2015 15:52
@mmattel
Copy link
Contributor Author

mmattel commented Sep 18, 2015

@owncloud-bot retest this please

@nickvergessen
Copy link
Contributor

@mmattel @owncloud-bot retest this please doesn't help anymore. You need to rebase your branch and force push

extended case: search at any place of the path given

adding case insensitiveness

added suggestions and improvements

removed unnecessary function parameter

new approach according the rethinking

fixed code and unit tests

update comments

improved code, added calls for trashbin copyFromStorage
@mmattel
Copy link
Contributor Author

mmattel commented Sep 21, 2015

@nickvergessen I just rebased and force pushed, but checking did not start...
I tried it for some days, everyday a rebase and force push, but no success. Any ideas how to proceed?

@DeepDiver1975
Copy link
Member

The checks as of now only work on branches within the core repo.

We need to push this .....

@mmattel
Copy link
Contributor Author

mmattel commented Sep 21, 2015

Anything I have to / can do?

@nickvergessen
Copy link
Contributor

@mmattel become a core contributor by signing the agreement and then push your branches to owncloud/core instead of mmattel/core

@karlitschek
Copy link
Contributor

@nickvergessen I can confirm that we have a contributor agreement from @mmattel He is in the right group as the tag "contributor" shows.

@DeepDiver1975
Copy link
Member

@nickvergessen I can confirm that we have a contributor agreement from @mmattel He is in the right group as the tag "contributor" shows.

so @mmattel please clone owncloud/core, create a branch with these changes within the clone and push that branche - CI will start with the checks right away - THX

@mmattel mmattel mentioned this pull request Sep 22, 2015
@mmattel mmattel deleted the exclude_directories_III branch January 4, 2016 17:29
@lock lock bot locked as resolved and limited conversation to collaborators Aug 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants