Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #3262 Bad word filter #3353

Merged
merged 6 commits into from Jul 27, 2018
Merged

Fixes #3262 Bad word filter #3353

merged 6 commits into from Jul 27, 2018

Conversation

@effone
Copy link
Member

effone commented Jul 22, 2018

Attempt to fix #3262
Implemented effective usage of * and +

Note: This is gonna break all the pre-defined bad word filters. Its is required to describe in release note to redefine bad word filters as per the new rule after applying this patch.

Edit:
Here is a draft md that can be included in release notes.
https://github.com/effone/misc.drafts/blob/master/badword.md

@effone effone added the b:1.8 label Jul 22, 2018
@effone effone requested review from euantorano and Shade- Jul 22, 2018
@effone
Copy link
Member Author

effone commented Jul 22, 2018

Note: Need to revise infinity loop check after this being merged. Here:
https://github.com/mybb/mybb/blob/feature/admin/modules/config/badwords.php#L50

I am not placing that modification now because doing that this PR will invariably introduce conflict.

^ Done.

@effone effone removed the s:in-progress label Jul 24, 2018
@effone
Copy link
Member Author

effone commented Jul 24, 2018

@euantorano PR completed. Thanks for sequential merge.

effone added 2 commits Jul 24, 2018
@euantorano
Copy link
Member

euantorano commented Jul 24, 2018

Great, will test this one ASAP.

@effone effone added s:in-progress and removed s:in-progress labels Jul 25, 2018
@Eldenroot
Copy link
Contributor

Eldenroot commented Jul 26, 2018

Tested, seems to be working fine, good job!

@@ -650,18 +650,12 @@ function parse_badwords($message, $options=array())
$badword['replacement'] = "*****";
}

if($badword['regex'])

This comment has been minimized.

@Shade-

Shade- Jul 27, 2018 Contributor

The whole PR works as expected, however, I'm not sure this is the intended behavior. IMHO, a filter with Regex option turned off should replace things without using regexes; so, if you specify whatever123* and Regex is off, it should replace the word literally.

This comment has been minimized.

@effone

effone Jul 27, 2018 Author Member

We are doing regex with or without regex is on. If it is on we consider the declaration as regex and validate, then use (store), if it is off we create the regex pattern through new function and parse.

Hence turning regex off will generate a pattern for whatever123* as whatever[^\s\n]* and will surely catch whatever123.

snap

This comment has been minimized.

@Shade-

Shade- Jul 27, 2018 Contributor

Briefly tested that, and it doesn’t work. Even with regex turned off, it also replaces words containing the filtered word, which is wrong.

This comment has been minimized.

@effone

effone Jul 27, 2018 Author Member

I honestly am not getting the issue. Can you please state steps with what intended and what is happening?

This comment has been minimized.

@Shade-

Shade- Jul 27, 2018 Contributor

Well, turns out I was testing in the wrong branch. It works as expected.

This comment has been minimized.

@effone

effone Jul 27, 2018 Author Member

Ha ha, thanks for the merge :D

@Shade- Shade- merged commit 7741b4d into mybb:feature Jul 27, 2018
@effone effone deleted the effone:badword-filter branch Jul 27, 2018
@Eldenroot
Copy link
Contributor

Eldenroot commented Jul 27, 2018

@effone - please check the last comment - abusing wordfilter by url tags, this will need a new issue and PR in future

@effone
Copy link
Member Author

effone commented Jul 27, 2018

Noted.

@dvz
Copy link
Contributor

dvz commented Aug 16, 2018

Issues:

Resolution:

Patch:
#3353

Impact:

  • Word as regex set to 'YES':

    • Effect: N/A. Onwards, words defined as REGEX by an admin will be checked for validity of the expression before it gets saved to confirm no warning or error at front end with invalid expression defined.
    • Corrective Action: N/A. Those who prefer to go a step further; can re-declare the existing REGEX patterns to be sure about their validity.
  • Word as regex set to 'NO':

    • Effect: The dynamic word filters (using symbols to catch unknown characters) already set by admins will not work as intended due to the logic change in symbol (*) usage. From now onwards the symbols '*' (any number of any character) and '+' (one number of any character) will be used efficiently. For example: *on* will catch 'congo', 'ontology' or 'moron'. However, on+ will catch 'one' it will not catch 'ton' or 'onion'. my++ will catch 'mybb' and 'myth', but will not catch 'mya','mummy' or 'mystery'.
    • Corrective Action: Admins are required to edit all the already existing dynamic word filters and define the word as per the new logic to achieve intended behavior.

Files Changed:

  • inc/class_parser.php
  • admin/modules/config/badwords.php
  • inc/languages/english/admin/config_badwords.lang.php

// Ensure we run the replacement enough times but not recursively (i.e. not while(preg_match..))
$message = preg_replace("#(^|\W)".$badword['badword']."(?=\W|$)#i", '\1'.$badword['replacement'], $message);

This comment has been minimized.

@dvz

dvz Aug 16, 2018 Contributor

removed boundaries for words make the new implementation filter strings that are part of longer words

This comment has been minimized.

@effone

effone Aug 17, 2018 Author Member

I don't get what you say here.
Edit: Nvm. #3399

}

// Neutralize multiple adjacent wildcards and generate pattern
$ptrn = array('/[\*]{1}[\+]+/', '/[\+]+[\*]{1}/', '/[\*]+/');

This comment has been minimized.

@dvz

dvz Aug 16, 2018 Contributor

patterns like [\*]{1} can be simply written as \*

@Eldenroot
Copy link
Contributor

Eldenroot commented Aug 17, 2018

Except that would be nice to open a new issue - bad filter ignores words placed between url tags etc. So it canbe abused easily.

dvz added a commit that referenced this pull request Aug 17, 2018
* Bad word filter - extended refinement

#3353 (comment)
#3353 (comment)

* Ungroup single special character
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants
You can’t perform that action at this time.