Fixes #3262 Bad word filter #3353

Merged
Shade- merged 6 commits into mybb:feature from effone:badword-filter Jul 27, 2018

Member

effone commented Jul 22, 2018

Attempt to fix #3262
Implemented effective usage of * and +

Note: This will break all the predefined bad word filters. The release notes need to tell admins to redefine their bad word filters according to the new rules after applying this patch.

Edit:
Here is a draft md that can be included in release notes.
https://github.com/effone/misc.drafts/blob/master/badword.md

@effone effone added the b:1.8 label Jul 22, 2018

@effone effone requested review from euantorano and Shade- Jul 22, 2018

Member

effone commented Jul 22, 2018

Note: The infinite loop check needs to be revised after this is merged. Here:
https://github.com/mybb/mybb/blob/feature/admin/modules/config/badwords.php#L50

I am not including that modification now because doing so in this PR would inevitably introduce a conflict.

^ Done.

@effone effone removed the s:in-progress label Jul 24, 2018

Member

effone commented Jul 24, 2018

@euantorano PR completed. Thanks for the sequential merge.

effone added some commits Jul 24, 2018

Member

euantorano commented Jul 24, 2018

Great, will test this one ASAP.

@effone effone added s:in-progress and removed s:in-progress labels Jul 25, 2018

Contributor

Eldenroot commented Jul 26, 2018

Tested, seems to be working fine, good job!

@@ -650,18 +650,12 @@ function parse_badwords($message, $options=array())
$badword['replacement'] = "*****";
}
if($badword['regex'])

@Shade-

Shade- Jul 27, 2018

Contributor

The whole PR works as expected; however, I'm not sure this is the intended behavior. IMHO, a filter with the Regex option turned off should replace things without using regexes; so, if you specify whatever123* and Regex is off, it should replace the word literally.

@effone

effone Jul 27, 2018

Member

We match with a regex whether or not the regex option is on. If it is on, we treat the declaration as a regex, validate it, and then store and use it; if it is off, we build the regex pattern through the new function and parse with that.

Hence turning regex off will generate a pattern for whatever123* as whatever[^\s\n]* and will surely catch whatever123.

snap
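
The validation step for the regex-on path could look roughly like the sketch below. The helper name `is_valid_badword_regex` is hypothetical and is not the function added by this PR; the `#` delimiters and the `i` flag are assumed from the `preg_replace()` call visible in the diff.

```php
<?php
// Hypothetical helper, not MyBB's actual function: report whether an
// admin-supplied expression compiles as a PCRE pattern.
function is_valid_badword_regex(string $expr): bool
{
    // Assumption: the parser wraps the expression in '#' delimiters with the
    // 'i' flag, so an unescaped '#' inside $expr also makes it invalid.
    $pattern = '#' . $expr . '#i';

    // preg_match() returns false (and raises a warning, silenced here) when
    // the pattern fails to compile; 0 or 1 means it compiled.
    return @preg_match($pattern, '') !== false;
}

var_dump(is_valid_badword_regex('fo+bar'));    // bool(true)
var_dump(is_valid_badword_regex('(unclosed')); // bool(false)
```

Checking at save time means a broken expression is rejected in the admin panel instead of producing warnings whenever a post is parsed.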

@Shade-

Shade- Jul 27, 2018

Contributor

Briefly tested that, and it doesn’t work. Even with regex turned off, it also replaces words containing the filtered word, which is wrong.

@effone

effone Jul 27, 2018

Member

I honestly don't understand the issue. Can you please list the steps, with what you expected and what actually happens?

@Shade-

Shade- Jul 27, 2018

Contributor

Well, turns out I was testing in the wrong branch. It works as expected.

@effone

effone Jul 27, 2018

Member

Ha ha, thanks for the merge :D

@Shade- Shade- merged commit 7741b4d into mybb:feature Jul 27, 2018

@effone effone deleted the effone:badword-filter branch Jul 27, 2018

Contributor

Eldenroot commented Jul 27, 2018

@effone - please check the last comment about abusing the word filter with url tags; this will need a new issue and PR in the future.

Member

effone commented Jul 27, 2018

Noted.

Contributor

dvz commented Aug 16, 2018

Issues:

Resolution:

Patch:
#3353

Impact:

  • Word as regex set to 'YES':

    • Effect: N/A. From now on, words defined as REGEX by an admin are checked for validity before they are saved, so that an invalid expression cannot cause warnings or errors on the front end.
    • Corrective Action: N/A. Admins who want to go a step further can re-declare their existing REGEX patterns to confirm they are valid.
  • Word as regex set to 'NO':

    • Effect: Dynamic word filters (those using symbols to catch unknown characters) already set by admins will not work as intended, due to the logic change in the usage of the '*' symbol. From now on, the symbols '*' (any number of any characters) and '+' (exactly one character of any kind) are used effectively. For example, *on* will catch 'congo', 'ontology' or 'moron'. However, on+ will catch 'one' but not 'ton' or 'onion'; my++ will catch 'mybb' and 'myth', but not 'mya', 'mummy' or 'mystery'.
    • Corrective Action: Admins need to edit all existing dynamic word filters and redefine each word according to the new logic to get the intended behavior.

Files Changed:

  • inc/class_parser.php
  • admin/modules/config/badwords.php
  • inc/languages/english/admin/config_badwords.lang.php
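
The wildcard rules described under Impact can be illustrated with a minimal sketch. The function names `wildcard_to_pattern` and `filter_catches` are hypothetical and this is not the code merged in the PR; it assumes '*' maps to `[^\s\n]*`, '+' maps to a single `[^\s\n]`, and the filter has to span a whole word.

```php
<?php
// Hypothetical helper, not MyBB's actual code: build a PCRE pattern from a
// wildcard filter where '*' is any run of non-whitespace characters and '+'
// is exactly one non-whitespace character.
function wildcard_to_pattern(string $filter): string
{
    // Escape regex metacharacters and the '#' delimiter. This also turns
    // '*' into '\*' and '+' into '\+', which are then swapped for classes.
    $quoted = preg_quote(trim($filter), '#');
    $body = strtr($quoted, array(
        '\\*' => '[^\s\n]*',
        '\\+' => '[^\s\n]',
    ));

    // Assumption: the filter must cover a whole word, so it is wrapped in
    // the boundary checks used by the older preg_replace() call in the diff.
    return '#(^|\W)' . $body . '(?=\W|$)#i';
}

// True when the filter would catch something in $text.
function filter_catches(string $filter, string $text): bool
{
    return preg_match(wildcard_to_pattern($filter), $text) === 1;
}

var_dump(filter_catches('*on*', 'moron'));   // bool(true)
var_dump(filter_catches('on+', 'one'));      // bool(true)
var_dump(filter_catches('on+', 'onion'));    // bool(false)
var_dump(filter_catches('my++', 'myth'));    // bool(true)
var_dump(filter_catches('my++', 'mystery')); // bool(false)
```

Under these assumptions the sketch reproduces every example listed in the Impact notes.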
// Ensure we run the replacement enough times but not recursively (i.e. not while(preg_match..))
$message = preg_replace("#(^|\W)".$badword['badword']."(?=\W|$)#i", '\1'.$badword['replacement'], $message);

@dvz

dvz Aug 16, 2018

Contributor

Removing the word boundaries makes the new implementation filter strings that are part of longer words.
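
A small sketch of the difference this comment points at. The filter word `cat`, the replacement and the sample message are made up for illustration; the bounded pattern copies the `(^|\W)` and `(?=\W|$)` wrappers from the line quoted above.

```php
<?php
// Hypothetical filter word, replacement and message, for illustration only.
$word = 'cat';
$replacement = '***';
$message = 'cat concatenate';

// With boundaries: only standalone occurrences of the word are replaced.
$bounded = preg_replace(
    '#(^|\W)' . preg_quote($word, '#') . '(?=\W|$)#i',
    '\1' . $replacement,
    $message
);

// Without boundaries: the word is also replaced inside longer words.
$unbounded = preg_replace(
    '#' . preg_quote($word, '#') . '#i',
    $replacement,
    $message
);

echo $bounded, "\n";   // *** concatenate
echo $unbounded, "\n"; // *** con***enate
```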

@effone

effone Aug 17, 2018

Member

I don't get what you say here.
Edit: Nvm. #3399

}
// Neutralize multiple adjacent wildcards and generate pattern
$ptrn = array('/[\*]{1}[\+]+/', '/[\+]+[\*]{1}/', '/[\*]+/');

@dvz

dvz Aug 16, 2018

Contributor

patterns like [\*]{1} can be simply written as \*
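
A quick way to check the equivalence is to run both spellings over the same inputs. This sketch takes the first pattern from the `$ptrn` array above and compares it with its simplified form; the sample strings and the `<W>` marker are made up for illustration.

```php
<?php
// First pattern from $ptrn, and the simplified spelling suggested in review.
$verbose = '/[\*]{1}[\+]+/';
$simple  = '/\*\++/';

// Hypothetical sample inputs: a '*' followed by one or more '+' should match.
$samples = array('a*+b', 'a*++b', 'a+*b', 'a*b', 'a+b');

foreach ($samples as $sample) {
    $v = preg_replace($verbose, '<W>', $sample);
    $s = preg_replace($simple, '<W>', $sample);
    echo $sample, ' => ', $v, ($v === $s ? ' (same)' : ' (differs)'), "\n";
}
// a*+b => a<W>b (same)
// a*++b => a<W>b (same)
// a+*b => a+*b (same)
// a*b => a*b (same)
// a+b => a+b (same)
```

A single-character class with a `{1}` quantifier matches exactly what the escaped character matches alone, so the shorter form changes readability only.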

Contributor

Eldenroot commented Aug 17, 2018

Apart from that, it would be nice to open a new issue: the bad word filter ignores words placed between url tags etc., so it can be abused easily.

effone added a commit to effone/mybb that referenced this pull request Aug 17, 2018

dvz added a commit that referenced this pull request Aug 17, 2018

Bad word filter - extended refinement (#3399)
* Bad word filter - extended refinement

#3353 (comment)
#3353 (comment)

* Ungroup single special character