-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude regular expression improvements #6200
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only looked at the first commit so far.
👎 Because I don't want to introduce more dependency between the syncengine and the config file. I want to move ConfigFile to gui, because i don't want the tools or test to mess or depend on the user's config file
Also, it is a bit strange that the ExcludeFiles
class goes in a csync_xxx file, i'd rather it to be named excludesfiles.{cpp,h} and not have a mix of C-style code and proper C++ in the same file.
assert_int_equal(rc, 0); | ||
rc = _csync_exclude_add(&(csync->excludes), "latex/*/*.tex.tmp"); | ||
assert_int_equal(rc, 0); | ||
excludedFiles->addManualExclude("*.💩"); // is this source file utf8 encoded? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
src/libsync/syncengine.cpp
Outdated
@@ -26,6 +26,7 @@ | |||
#include "propagateremotedelete.h" | |||
#include "propagatedownload.h" | |||
#include "common/asserts.h" | |||
#include "configfile.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid dependency between the syncengine and the config file?
Ideally all the config should be handled by the app.
Please put that in the Folder instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, absolutely.
* | ||
* See ExcludedFiles in csync_exclude. | ||
*/ | ||
std::function<CSYNC_EXCLUDE_TYPE(const char *path, int filetype)> exclude_traversal_fn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put a (smart?) pointer to the ExcludeFiles here and call the relevant function, instead of using this hook-like indirection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer tying ExcludedFiles and csync together through this function because it's more composable/decoupled. Now csync only ever cares about the traversal exclude match function which makes it easy to hook in a different exclude function for testing.
src/csync/csync_exclude.h
Outdated
*/ | ||
bool reloadExcludeFiles(); | ||
|
||
#ifdef CSYNC_TEST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a friend?
(This breaks ODR, and will not work on windows because it mangle the access in the symbol name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/csync/csync_exclude.cpp
Outdated
if (OCC::Utility::fsCasePreserving()) | ||
patternOptions |= QRegularExpression::CaseInsensitiveOption; | ||
_regex.setPatternOptions(patternOptions); | ||
_regex.optimize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why pass the OptimizeOnFirstUsageOption flag if you optimize now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was kept from the initial version. I agree that it can be removed.
src/csync/csync_exclude.cpp
Outdated
_nonRegexExcludes.clear(); | ||
|
||
// Start out with regexes that would match nothing | ||
QString exclude_only = "a^"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this can be done at the end (if exclude_only.isEmpty()) exclude_only = "a^";
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do.
src/csync/csync_exclude.cpp
Outdated
QString patternFileDirRemove = "a^"; | ||
QString patternDirKeep = "a^"; | ||
QString patternDirRemove = "a^"; | ||
auto regexAppend = [](QString &pattern, QString v) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const QString &v
match = CSYNC_FILE_SILENTLY_EXCLUDED; | ||
goto out; | ||
if (blen == 11) { | ||
rc = csync_fnmatch("Desktop.ini", bname, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not bname == "Desktop.ini" (with Qt::CaseInsensitive for file system that needs it) ?
But also, why should we not put this and all other builtin in the regexp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's definitely worthy of further simplification. Some of the patterns above have similar opportunities. Adding to the regex doesn't work easily since these want to be SILENTLY_EXCLUDED.
@ogoffart Addressed comments (separate commits for easy checking, will be rebased away before merge) |
src/csync/csync_exclude.cpp
Outdated
@@ -702,7 +702,7 @@ void ExcludedFiles::prepare() | |||
+ "(?:^|/)(?:" + bnameFileDirRemove + "|" + bnameDirRemove + ")(?:$|/)" | |||
+ ")"); | |||
|
|||
QRegularExpression::PatternOptions patternOptions = QRegularExpression::OptimizeOnFirstUsageOption; | |||
QRegularExpression::PatternOptions patternOptions = QRegularExpression::NoPatternOption; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, i would have kept this flag, but removed the optimize call.
But it does not matter anyway.
Make ExcludedFiles something that is instantiated outside of the CSYNC context and then given to it as a hook. ExcludedFiles still lives in csync_exclude and the internal workings haven't been touched.
Improves full matches by more than an order of magnitude and also improves speed of traversal matches by roughly 20%, judging by the check_csync_exclude performance test.
Neither the documentation at http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13 nor the fnmatch source code in <glibc>/posix/fnmatch_loop.c points toward { causing special matching behavior.
Now all exclude patterns can be translated to regular expressions.
93da62c
to
d1d2765
Compare
Thanks a lot @ckamm for taking this up again :) |
@guruz Current benchmarks on my machine.
The Note that these values are for a high-end linux machine. Speedups were even larger when we profiled windows/osx during the initial investigation. |
Impressive :-) FYI @michaelstingl |
A collection of refactor and improvement of the regex-based codepath. Calls to
fnmatch
are entirely gone.