common: Cache Highlight/Ignore/nick rules, port to QRegularExpression, fixes #415
Branch updated from 07fb3d8 to 3b2cfdc.
Outdated notes to self on work-in-progress version
Branch updated from 3b2cfdc to 3d1a6d9.
Outdated notes to self on work-in-progress version
Branch updated from 6f6f1fd to 461a2a5.
Tentative working draft is here! Just need to do all the documentation and testing... Aside, the [...]
Branch updated from c412436 to 226117c.
Haven't been able to fully get into testing this, but gave it a quick try with both client and core updated. Also to note, my test core doesn't have enough backlog or large enough channels to really test the time-to-load-backlog issue. I'll try to remedy that for more in-depth testing. Initial good findings: [...]
Noted concerns (minor/thoughts): [...]
Fix ignorelisteditdlg.ui's reference to oxygen_icons.qrc, pointing it to the new location.
Branch updated from 226117c to cf53160.
@genius3000 I've addressed the warning issue (by removing it). It's not a crashing issue; it just means that the ignore rule won't ever match. Quassel never warned about this in the past, so it should be fine to leave as is. Supporting wildcard for local/remote highlights is a nice idea, and now's the time to do any backwards-incompatible protocol breaking and string changes before [...]. Thank you for your initial testing! I've got my own testing and documentation to write, then this should be okay...
Branch updated from cf53160 to e8933e0.
Branch updated from 3e48e36 to 666cf88.
2018-9-1: More testing is welcome! I tested the basics (highlight match, nick match, ignore rules) and did more piecemeal tests during development, but I haven't extensively tried patterns outside of [...]
Branch updated from 98bd576 to 8475e2a.
Made a small typo in a test case and the pull request documentation (regex [...]). Edit 2018-9-2: Fixed typos in comments; no change to functionality or tests (fixed description of wildcard setting of internal regex objects).
Add ExpressionMatch class to unify handling of various search expressions in Quassel, including automatically caching the regular expression instances and providing Qt 4/5 compatibility.
The source expression depends on the matching mode:
* ExpressionMatch::MatchPhrase
  Match the entire phrase, looking for whitespace or beginning/end around either side of the phrase. No further processing.
* ExpressionMatch::MatchMultiPhrase
  Same as MatchPhrase, but split the expression on newlines ("\n") and treat the match as successful if any phrase is found. This avoids having to create multiple ExpressionMatch instances just to match multiple phrases.
* ExpressionMatch::MatchWildcard
  Split on ";" and newlines ("\n"), and apply basic wildcard globbing, with "*" representing any characters and "?" a single character. Prefixing a section with "!" turns it into an invert-match, negating any other matching rules. If only invert-match rules exist, matching is true unless an invert-rule applies. "\[...]" escapes the given character.
* ExpressionMatch::MatchRegEx
  Treat the expression as a regular expression, inverting if prefixed with "!" (and not escaped as "\!").
Cached regular expression objects are updated whenever any parameter changes.
When Qt 4 support is dropped, the QT_VERSION macros can be adjusted.
This lays the foundation for performance and readability improvements in future commits.
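The MatchPhrase/MatchMultiPhrase idea can be sketched in standard C++ with std::regex instead of Qt, so the example stands alone. All phrases are escaped and folded into one regular expression that is compiled once. The helper names (escapeRegex, buildMultiPhraseRegex, phraseMatches) are hypothetical and are not Quassel's API. Case-insensitivity is an assumption. The boundary uses \W (any non-word character) rather than whitespace only, following the note on the phrase regular expression in the MultiPhrase examples.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Escape characters that are special in ECMAScript regular expressions,
// so a phrase such as "nick[away]" is matched literally.
std::string escapeRegex(const std::string& text) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: fold every phrase into ONE regex, compiled once.
// A phrase must be bounded on both sides by a non-word character (\W)
// or by the start/end of the message.
std::optional<std::regex> buildMultiPhraseRegex(const std::vector<std::string>& phrases) {
    std::string alternatives;
    for (const auto& phrase : phrases) {
        if (phrase.empty()) continue;  // skip blank lines
        if (!alternatives.empty()) alternatives += '|';
        alternatives += escapeRegex(phrase);
    }
    if (alternatives.empty()) return std::nullopt;  // nothing to highlight
    return std::regex("(^|\\W)(" + alternatives + ")(\\W|$)",
                      std::regex::ECMAScript | std::regex::icase);
}

bool phraseMatches(const std::optional<std::regex>& re, const std::string& message) {
    return re && std::regex_search(message, *re);
}

// Opt-in checks in the spirit of ExpressionMatchTests.
void runPhraseExamples() {
    const auto nicks = buildMultiPhraseRegex({"nick", "test"});
    assert(phraseMatches(nicks, "nick[something]"));  // "[" counts as \W
    assert(phraseMatches(nicks, "pinging test hi"));
    assert(!phraseMatches(nicks, "nicks"));
    assert(!phraseMatches(nicks, "tests"));
    assert(!phraseMatches(buildMultiPhraseRegex({}), "anything"));
}
```

Because the boundary is \W rather than a space, "nick[something]" counts as a match for "nick".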
Add ExpressionMatchTests class to test the functionality of ExpressionMatch, helping ensure it works across Qt 4 and Qt 5.
This is implemented as a series of Q_ASSERT() calls, as Quassel does not yet have an actual testing framework. Once a test framework is settled on, this class can be migrated to unit tests.
To use ExpressionMatchTests:
1. Include it in "quassel.cpp"
2. Call ExpressionMatchTests::runTests() near the end of Quassel::init()
    // DEBUG: Run tests!
    ExpressionMatchTests::runTests();
Handle "\n" and ";" as separator in scope rules. This fixes using newlines in the Configure Ignore Rule dialog. Make use of ExpressionMatch::trimMultiWildcardWhitespace() to handle all of the arcane details, unifying code into one place.
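The separator handling can be illustrated with a small splitter. This is an assumption-laden sketch and not the actual ExpressionMatch::trimMultiWildcardWhitespace(). The names splitMultiWildcard and trim are hypothetical. Keeping escape sequences such as "\;" intact for a later parsing stage is this sketch's own design choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Trim spaces, tabs, and carriage returns from both ends.
std::string trim(const std::string& s) {
    const char* whitespace = " \t\r";
    const auto first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split a multi-wildcard rule on unescaped ';' and '\n',
// trim each part, and drop empty parts. An escape such as "\;" is copied
// through unchanged so a later wildcard parser can treat it as a literal ';'.
std::vector<std::string> splitMultiWildcard(const std::string& rule) {
    std::vector<std::string> parts;
    std::string current;
    auto flush = [&] {
        std::string part = trim(current);
        if (!part.empty()) parts.push_back(part);
        current.clear();
    };
    for (std::size_t i = 0; i < rule.size(); ++i) {
        const char c = rule[i];
        if (c == '\\' && i + 1 < rule.size()) {
            current += c;          // keep the backslash...
            current += rule[++i];  // ...together with the escaped character
        } else if (c == ';' || c == '\n') {
            flush();               // either separator ends the current part
        } else {
            current += c;
        }
    }
    flush();
    return parts;
}

void runSplitExamples() {
    const auto parts = splitMultiWildcard("Alice!*; Bob!*@example.com\n\n  #chan ");
    assert((parts == std::vector<std::string>{"Alice!*", "Bob!*@example.com", "#chan"}));
    const auto escaped = splitMultiWildcard("escaped \\; separator;other");
    assert((escaped == std::vector<std::string>{"escaped \\; separator", "other"}));
}
```

Treating ";" and "\n" identically means a rule typed over several lines in the dialog behaves the same as one typed on a single line.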
Port HighlightRule objects to ExpressionMatch class, providing easy caching and simplifying expression handling. Migrate HighlightRule struct into a full-blown class for easier management and greater assurance over automatic internal cache management. Port HighlightRuleManager to ExpressionMatch for nickname matching, providing easy caching and simplifying expression handling. (Noticing a theme?) Add tons of documentation comments, too, and fix up line lengths.
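One way to get "automatic internal cache management" is to make the compiled regex a private, lazily rebuilt member. Every setter marks the cache dirty, and the next match recompiles. The sketch below uses std::regex and a hypothetical CachedRule class; it is not Quassel's HighlightRule. Treating an invalid pattern as "never matches" is an assumption made for the sketch.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical rule class that owns its compiled regex. Setters only mark the
// cache dirty; the regex is rebuilt lazily on the next match() call, so a rule
// applied to hundreds of messages is compiled once per change.
class CachedRule {
public:
    explicit CachedRule(std::string pattern, bool caseSensitive = false)
        : _pattern(std::move(pattern)), _caseSensitive(caseSensitive) {}

    void setPattern(std::string pattern) {
        if (pattern == _pattern) return;  // unchanged: keep the cache
        _pattern = std::move(pattern);
        _dirty = true;
    }

    void setCaseSensitive(bool caseSensitive) {
        if (caseSensitive == _caseSensitive) return;
        _caseSensitive = caseSensitive;
        _dirty = true;
    }

    bool match(const std::string& message) const {
        if (_dirty) {
            _dirty = false;
            ++_compileCount;
            auto flags = std::regex::ECMAScript;
            if (!_caseSensitive) flags |= std::regex::icase;
            try {
                _regex.emplace(_pattern, flags);
            } catch (const std::regex_error&) {
                _regex.reset();  // invalid pattern: treat the rule as never matching
            }
        }
        return _regex && std::regex_search(message, *_regex);
    }

    int compileCount() const { return _compileCount; }  // for illustration only

private:
    std::string _pattern;
    bool _caseSensitive;
    mutable bool _dirty = true;
    mutable std::optional<std::regex> _regex;
    mutable int _compileCount = 0;
};

void runCacheExamples() {
    CachedRule rule("quassel");
    assert(rule.match("I use Quassel"));  // case-insensitive by default
    assert(rule.match("quassel again"));
    assert(rule.compileCount() == 1);     // compiled once, then reused
    rule.setCaseSensitive(true);
    assert(!rule.match("I use Quassel"));
    assert(rule.compileCount() == 2);     // rebuilt after the change
}
```

Because the cache is private and only the setters can invalidate it, callers cannot forget to refresh it. That is the main argument for turning a plain struct into a class.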
Port IgnoreListItem objects to ExpressionMatch class, providing easy caching and simplifying expression handling. Migrate IgnoreListItem struct into a full-blown class for easier management and greater assurance over automatic internal cache management. Add tons of documentation comments, too, and fix up line lengths. Thanks to @sandsmark for the initial efforts towards the QRegularExpression migration; it helped a lot!
When constructing a CTCP ignore, check that splitting on whitespace results in a non-empty list before taking from it. If it's empty, fall back to assuming any valid sender, i.e. "*" or ".*" depending on whether the rule is set as wildcard or regex. This fixes issues when receiving a CTCP with an invalid CTCP ignore rule such as " ".
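A sketch of the guard follows. It assumes, as the description implies, that the first whitespace-separated token of a CTCP ignore rule is the sender pattern. The rule format in the comments and the name ctcpSenderPattern are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: take the sender pattern from a CTCP ignore rule,
// assuming the first whitespace-separated token is the sender
// (e.g. "nick!*@* VERSION"). A blank rule such as " " splits into no tokens,
// so fall back to "any sender" instead of reading from an empty list.
std::string ctcpSenderPattern(const std::string& rule, bool isRegEx) {
    std::istringstream stream(rule);
    std::vector<std::string> tokens;
    for (std::string token; stream >> token;) tokens.push_back(token);

    if (tokens.empty()) return isRegEx ? ".*" : "*";
    return tokens.front();
}

void runCtcpExamples() {
    assert(ctcpSenderPattern("nick!*@* VERSION", false) == "nick!*@*");
    assert(ctcpSenderPattern(" ", false) == "*");
    assert(ctcpSenderPattern(" ", true) == ".*");
}
```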
Port QtUiMessageProcessor HighlightRule objects to ExpressionMatch class, providing easy caching and simplifying expression handling. Migrate HighlightRule struct into a full-blown class for easier management and greater assurance over automatic internal cache management. Port QtUiMessageProcessor to ExpressionMatch for nickname matching, providing easy caching and simplifying expression handling. Add tons of documentation comments, too, and fix up line lengths.
NOTE: Legacy highlight rules lack support for "sender", "ID", and inverse rules. There's no technical reason for this, just lack of developer time. Feel free to add support in the future! Just make sure to not miss any part. I'd suggest simply replacing the legacy LegacyHighlightRule class in QtUiMessageProcessor with the full one from HighlightRuleManager, and don't forget to fix
> QtUiMessageProcessor::HighlightRule::operator!=()
...and...
> QtUiMessageProcessor::HighlightRule::determineExpressions()
Remove scopeMatch() from util.cpp. Not only is it no longer needed, having been superseded by ExpressionMatch, but the scopeMatch behavior also broke previous Quassel functionality. Details: Channel names can begin with "!", and scopeMatch() does not support escaping "!" at the beginning of rules. Escaping support needs to be added... but ExpressionMatch can now handle this. Let us keep the ugly parser loops confined to one place...
Add NickHighlightMatcher class to unify handling of nick highlight matching in Quassel, including automatically updating the expression matcher instance as needed per network. Cached ExpressionMatch objects are updated on demand after any change in nickname configuration or active/configured nicks. This lays the foundation for performance and readability improvements in future commits.
Port HighlightRuleManager nick highlights to NickHighlightMatcher class, providing easy caching and simplifying expression handling. This fixes nickname caching being reset when switching between networks. Add SIGNAL/SLOT traversal to pass on information about network removal to clean up per-network nickname highlight caches, avoiding memory leaks.
Port QtUiMessageProcessor nick highlights to NickHighlightMatcher class, providing easy caching and simplifying expression handling. This fixes nickname caching being reset when switching between networks. Add SIGNAL/SLOT traversal to pass on information about network removal to clean up per-network nickname highlight caches, avoiding memory leaks.
Branch updated from 8475e2a to 899ca3a.
With the unifying changes in this pull request, I have been wondering how little (or much) work is left to give [...]. We'll see if/when this is merged, maybe.
How do I use this?
Check out the Quassel wiki page on pattern matching.
(Added after this was merged.)
In short
* Add ExpressionMatch class, unifying regular expression handling
  * Phrase, MultiPhrase, Wildcard, MultiWildcard, and RegEx patterns
  * Escaping of "?", "*", "!", and ";" in Wildcard/MultiWildcard
* Add ExpressionMatchTests
* Add NickHighlightMatcher class, unifying and caching nickname highlights
* Port to ExpressionMatch for all, NickHighlightMatcher for highlights
* Remove scopeMatch() function from util.cpp
  * [...] "*", "!", etc
Thanks to @sandsmark for the initial QRegularExpression efforts!
Rationale
* Quassel processes potentially hundreds of messages at a time to apply highlight and ignore rules, some retroactively (ignore rules, local highlights). To improve performance, Quassel should compile regular expressions down to the simplest format and cache them for reuse until changed. Similarly, on Qt 5, porting to QRegularExpression provides further speedups.
* Some IRC channels can begin with "!". Quassel scope matching needs a way to escape "!" to match it literally, e.g. for "!channel".
* Spam messages can contain "*" and "?" characters, such as "**** ATTENTION ****". Quassel should provide a way to escape these characters to match them literally.
Breaking changes
* Wildcard no longer supports character classes, like "[abc]" to match "a", "b", or "c"
  * Use RegEx mode and rewrite the rule instead
  * Note: for ignore rules, "Regular expression" does not affect the scope (legacy reasons)
* "\" in Wildcard rules must now be escaped
  * "\*" translates to literally matching "*", "\?" to literal "?"
  * "\\[...]" translates to literally matching "\[...]"
  * "*", "?", and ";" can be escaped this way
* "!" at the start of Wildcard rules now inverts the rule
  * "!*some message*" sets a rule for everything that does not contain "some message"
  * "\![...]" escapes the "!" to literally match "![...]"
  * "\\![...]" escapes the "\" to literally match "\![...]"
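The inversion semantics can be expressed independently of the pattern syntax. The sketch below is hypothetical: Component, parseComponent, matchComponents, and Matcher are not Quassel names. It follows the MatchWildcard description in the commit messages. A matching inverted component vetoes the match; otherwise any matching normal component wins; a rule made only of inverted components matches by default. Returning false for an empty rule is an assumption. The usage checks plug in std::regex as the matcher, where "\!" already means a literal "!".

```cpp
#include <cassert>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// One component of a multi-part rule, e.g. "Alice!*" or "!Caroline!*".
struct Component {
    bool inverted;     // started with an unescaped "!"
    std::string body;  // pattern text; escapes such as "\!" are left in place
};

// Only a leading "!" inverts; "\!foo" stays a normal component.
Component parseComponent(const std::string& raw) {
    if (!raw.empty() && raw[0] == '!') return {true, raw.substr(1)};
    return {false, raw};
}

using Matcher = std::function<bool(const std::string& pattern, const std::string& text)>;

// Hypothetical evaluation order:
//  1. any matching inverted component vetoes the whole rule;
//  2. otherwise the rule matches if any normal component matches;
//  3. a rule with only inverted components matches by default.
bool matchComponents(const std::vector<Component>& components, const std::string& text,
                     const Matcher& matches) {
    if (components.empty()) return false;  // assumption: an empty rule matches nothing
    bool hasNormal = false;
    bool normalMatched = false;
    for (const auto& component : components) {
        if (component.inverted) {
            if (matches(component.body, text)) return false;
        } else {
            hasNormal = true;
            if (!normalMatched && matches(component.body, text)) normalMatched = true;
        }
    }
    return hasNormal ? normalMatched : true;
}

void runInversionExamples() {
    const Matcher regexSearch = [](const std::string& pattern, const std::string& text) {
        return std::regex_search(text, std::regex(pattern));
    };
    const std::vector<Component> carol = {parseComponent("^Carol"), parseComponent("!^Caroline")};
    assert(matchComponents(carol, "Carol!user@host", regexSearch));
    assert(!matchComponents(carol, "Caroline!user@host", regexSearch));

    const std::vector<Component> onlyInverted = {parseComponent("!spam")};
    assert(matchComponents(onlyInverted, "hello", regexSearch));
    assert(!matchComponents(onlyInverted, "spam here", regexSearch));

    // "\!" is not an inversion; as an ECMAScript regex it matches a literal "!".
    assert(matchComponents({parseComponent("\\!channel")}, "!channel", regexSearch));
}
```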
Implementation
ExpressionMatch
ExpressionMatch implements an iterative parser, tracking state while looping through the list. This is necessary to provide escape sequences for Wildcard and MultiWildcard, akin to how Qt implemented its algorithm, but dropping character class support and adding handling of ";" as a separator. ExpressionMatch will take any input pattern and simplify it into a single regular expression, optimizing for speed.
To avoid separating documentation from code, I've put most of the implementation remarks into the comments of the code itself.
It is not easily possible to implement escape sequences with simple string-to-regex substitution.
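Here is a hypothetical escape-aware wildcard converter in standard C++, showing why a state-tracking loop is needed. It is a simplified sketch and not ExpressionMatch's code. It handles a single wildcard component (no ";" splitting or "!" inversion) and produces an unanchored pattern body. Searching with that body gives the implicit-wildcard behavior shown in the examples. The names wildcardToRegex and wildcardSearch are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical escape-aware wildcard converter: "*" -> any characters,
// "?" -> exactly one character, "\x" -> literal x. Everything else, including
// "[", is matched literally, so there is no character-class support.
std::string wildcardToRegex(const std::string& wildcard) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (std::size_t i = 0; i < wildcard.size(); ++i) {
        char c = wildcard[i];
        if (c == '\\' && i + 1 < wildcard.size()) {
            c = wildcard[++i];  // escaped: fall through and emit it literally
        } else if (c == '*') {
            out += ".*";
            continue;
        } else if (c == '?') {
            out += '.';
            continue;
        }
        // Escape regex metacharacters; a lone trailing "\" also lands here.
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Unanchored search gives the "implicit wildcard" behavior: "filter" acts like "*filter*".
bool wildcardSearch(const std::string& wildcard, const std::string& text) {
    const std::regex re(wildcardToRegex(wildcard), std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(text, re);
}

void runWildcardExamples() {
    assert(wildcardSearch("filter", "#nofilter yo"));
    assert(wildcardSearch("h?llo", "hello"));
    assert(wildcardSearch("\\*\\*\\*\\* ATTENTION \\*\\*\\*\\*", "**** ATTENTION ****"));
    assert(!wildcardSearch("\\*\\*\\*\\* ATTENTION \\*\\*\\*\\*", "xxxx ATTENTION xxxx"));
    assert(!wildcardSearch("[abc]", "b"));   // brackets are literal now
    assert(wildcardSearch("\\\\x", "\\x"));  // "\\" escapes the backslash itself
}
```

A plain find-and-replace of "*" with ".*" would also rewrite the "*" in "\*". The loop avoids this by consuming the escaped character in the same step as its backslash.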
Test cases are implemented as opt-in Q_ASSERT() calls via ExpressionMatchTests::runTests(). These are not run by default; a quick edit to quassel.cpp will add them in for development tests. This is designed to be adaptable to a proper test framework whenever Quassel adopts one.
NickHighlightMatcher
NickHighlightMatcher builds upon ExpressionMatch, using the MultiPhrase mode, and stores nickname-based expression match objects per network.
Whenever a nickname set changes, or a network is removed, the cache is deleted and built anew on demand.
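A per-network cache along these lines can be sketched with a std::map keyed by network. NickMatcherCache and the int NetworkId alias are hypothetical stand-ins. Quassel rebuilds on change notifications (SIGNAL/SLOT); this sketch instead detects a changed nick set by comparing it with the cached copy. removeNetwork() plays the role of the network-removal cleanup.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <vector>

using NetworkId = int;  // stand-in for Quassel's NetworkId type (assumption)

// Hypothetical per-network cache of compiled nickname matchers. Entries are
// built on demand, rebuilt only for the network whose nick set changed, and
// erased when a network is removed so stale entries are not kept around.
class NickMatcherCache {
public:
    bool match(NetworkId network, const std::vector<std::string>& nicks, const std::string& message) {
        auto it = _cache.find(network);
        if (it == _cache.end() || it->second.nicks != nicks) {
            it = _cache.insert_or_assign(network, Entry{nicks, build(nicks)}).first;
            ++_builds;
        }
        const auto& regex = it->second.regex;
        return regex && std::regex_search(message, *regex);
    }

    void removeNetwork(NetworkId network) { _cache.erase(network); }
    int builds() const { return _builds; }  // for illustration only

private:
    struct Entry {
        std::vector<std::string> nicks;
        std::optional<std::regex> regex;
    };

    // MultiPhrase-style regex: escape each nick, join with "|", require \W or ^/$ around it.
    static std::optional<std::regex> build(const std::vector<std::string>& nicks) {
        static const std::string special = R"(\^$.|?*+()[]{})";
        std::string alternatives;
        for (const auto& nick : nicks) {
            if (nick.empty()) continue;
            if (!alternatives.empty()) alternatives += '|';
            for (char c : nick) {
                if (special.find(c) != std::string::npos) alternatives += '\\';
                alternatives += c;
            }
        }
        if (alternatives.empty()) return std::nullopt;
        return std::regex("(^|\\W)(" + alternatives + ")(\\W|$)",
                          std::regex::ECMAScript | std::regex::icase);
    }

    std::map<NetworkId, Entry> _cache;
    int _builds = 0;
};

void runNickCacheExamples() {
    NickMatcherCache cache;
    assert(cache.match(1, {"alice"}, "alice: hi"));
    assert(cache.match(2, {"bob"}, "hey bob"));
    assert(cache.match(1, {"alice"}, "ping alice"));  // network 1 entry reused
    assert(cache.builds() == 2);
    assert(!cache.match(1, {"alice2"}, "ping alice"));  // nick set changed: rebuilt
    assert(cache.builds() == 3);
}
```

Keying the cache by network is what keeps a switch between networks from discarding the other networks' compiled matchers.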
Examples
Legend
🏁 0.12.5 stable release
☑️ 0.13rc1
Modes of operation
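Nickname matching always uses MatchMode::MultiPhrase.
When "Regular expressions"/RegEx is NOT checked, the MatchMode for patterns is:
* Highlight rule contents: Phrase
* Highlight rule sender/channel: MultiWildcard (0.13rc1)
* Ignore rule contents: Wildcard
* Ignore rule scope: MultiWildcard
When "Regular expressions"/RegEx is checked, the MatchMode for patterns is:
* Highlight rule contents: RegEx
* Highlight rule sender/channel: RegEx
* Ignore rule contents: RegEx
* Ignore rule scope: MultiWildcard
Ignore rule scope is fixed to MultiWildcard for backwards compatibility.
Phrase (Phrase)
This is used for highlight rules when RegEx is not checked.
🏁 In stable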
word
A word.
has a word in it
wording
🆕 New to this PR
There's a leading space; this can be used for more exact matches
, spaced
is spaced out
;spaced
Multiple phrases (MultiPhrase)
This is used for nickname matching.
🏁 In stable
nick
nick[something]
othernick: ...
pinging test hi
nicks
tests
Quassel 0.12.5 also would match nick[something]; if this isn't desirable, it can be changed, albeit with some complexity, by adding a new Nickname matching mode. Quassel currently assumes non-word characters are not valid in nicknames, as per the \W in the phrase regular expression.
Wildcard (Wildcard)
This is used for ignore rule contents when Regular expression is not checked.
🆕 Changed in this PR
Asking questions? Nopea
Basking questions? Nope.
Asking questions. Nope.
Asking questions? Nopes.
🆕 New to this PR
Implicit wildcard is supported, too
filter
filtering
#nofilter yo
🆕 New to this PR
Exclamation points can be escaped at beginning (not required elsewhere)
!filter
!yes filtering
filter
🆕 New to this PR
Escape character can be escaped
\!filter
\!yes filtering
filter
!filter
Multiple wildcards (MultiWildcard)
This is used for ignore rule scope regardless of the Regular expression checkbox, and for highlight rule sender/channel when RegEx is not checked.
🆕 Changed in this PR
Newlines and ";" are interchangeable in ignore rule scope; highlight rules treat Enter as submit.
Alice![...]
Bob![...]@example.com
Carol[...]![...], except as noted below
Dan![...]
escaped ; separator
!not-inverted
\!slash-prefixed
Caroline![...]
Malice![...]
John!
☑️ In 0.13rc1
Implicit wildcard is supported, too
Announce[...]![...]
Wheatley!aperture@[...]
Regular expression (RegEx)
This is used for ignore rule contents and highlight rule contents/sender/channel when RegEx is checked.
🏁 In stable
simpleA*escape-match
simpleA*escape-matchBBBB
not above
simpleA*escape-mat
simple*escape-match
simpleABBBBescape-matchBBBB
🏁 In stable
Inverted rules are supported, too
invertA*escape-match
invertA*escape-matchBBBB
🆕 Changed in this PR
Exclamation points can be escaped at beginning (not required elsewhere)
!simpleA*escape-matchBBBB
simpleA*escape-matchBBBB
🆕 New to this PR
Escape character can be escaped at beginning
\!simpleA*escape-matchBBBB
!simpleA*escape-matchBBBB
Though I checked, I may still have made mistakes in the examples above. If you want some samples to test with, try out the above.