Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #3262 Bad word filter #3353

Merged
merged 6 commits into from
Jul 27, 2018
Merged

Fixes #3262 Bad word filter #3353

merged 6 commits into from
Jul 27, 2018

Conversation

effone
Copy link
Member

@effone effone commented Jul 22, 2018

Attempt to fix #3262
Implemented effective usage of * and +

Note: This is gonna break all the pre-defined bad word filters. Its is required to describe in release note to redefine bad word filters as per the new rule after applying this patch.

Edit:
Here is a draft md that can be included in release notes.
https://github.com/effone/misc.drafts/blob/master/badword.md

@effone effone added the b:1.8 Branch: 1.8.x label Jul 22, 2018
@effone effone requested review from euantorano and Shade- July 22, 2018 03:35
@effone
Copy link
Member Author

effone commented Jul 22, 2018

Note: Need to revise infinity loop check after this being merged. Here:
https://github.com/mybb/mybb/blob/feature/admin/modules/config/badwords.php#L50

I am not placing that modification now because doing that this PR will invariably introduce conflict.

^ Done.

@effone effone added the s:in-progress Status: In Progress. Some work completed label Jul 24, 2018
@effone effone removed the s:in-progress Status: In Progress. Some work completed label Jul 24, 2018
@effone
Copy link
Member Author

effone commented Jul 24, 2018

@euantorano PR completed. Thanks for sequential merge.

@euantorano
Copy link
Member

Great, will test this one ASAP.

@effone effone added s:in-progress Status: In Progress. Some work completed and removed s:in-progress Status: In Progress. Some work completed labels Jul 25, 2018
@Eldenroot
Copy link
Contributor

Tested, seems to be working fine, good job!

@@ -650,18 +650,12 @@ function parse_badwords($message, $options=array())
$badword['replacement'] = "*****";
}

if($badword['regex'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole PR works as expected, however, I'm not sure this is the intended behavior. IMHO, a filter with Regex option turned off should replace things without using regexes; so, if you specify whatever123* and Regex is off, it should replace the word literally.

Copy link
Member Author

@effone effone Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing regex with or without regex is on. If it is on we consider the declaration as regex and validate, then use (store), if it is off we create the regex pattern through new function and parse.

Hence turning regex off will generate a pattern for whatever123* as whatever[^\s\n]* and will surely catch whatever123.

snap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Briefly tested that, and it doesn’t work. Even with regex turned off, it also replaces words containing the filtered word, which is wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly am not getting the issue. Can you please state steps with what intended and what is happening?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, turns out I was testing in the wrong branch. It works as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha ha, thanks for the merge :D

@Shade- Shade- merged commit 7741b4d into mybb:feature Jul 27, 2018
@effone effone deleted the badword-filter branch July 27, 2018 10:48
@Eldenroot
Copy link
Contributor

@effone - please check the last comment - abusing wordfilter by url tags, this will need a new issue and PR in future

@effone
Copy link
Member Author

effone commented Jul 27, 2018

Noted.

@dvz
Copy link
Member

dvz commented Aug 16, 2018

Issues:

Resolution:

Patch:
#3353

Impact:

  • Word as regex set to 'YES':

    • Effect: N/A. Onwards, words defined as REGEX by an admin will be checked for validity of the expression before it gets saved to confirm no warning or error at front end with invalid expression defined.
    • Corrective Action: N/A. Those who prefer to go a step further; can re-declare the existing REGEX patterns to be sure about their validity.
  • Word as regex set to 'NO':

    • Effect: The dynamic word filters (using symbols to catch unknown characters) already set by admins will not work as intended due to the logic change in symbol (*) usage. From now onwards the symbols '*' (any number of any character) and '+' (one number of any character) will be used efficiently. For example: *on* will catch 'congo', 'ontology' or 'moron'. However, on+ will catch 'one' it will not catch 'ton' or 'onion'. my++ will catch 'mybb' and 'myth', but will not catch 'mya','mummy' or 'mystery'.
    • Corrective Action: Admins are required to edit all the already existing dynamic word filters and define the word as per the new logic to achieve intended behavior.

Files Changed:

  • inc/class_parser.php
  • admin/modules/config/badwords.php
  • inc/languages/english/admin/config_badwords.lang.php


// Ensure we run the replacement enough times but not recursively (i.e. not while(preg_match..))
$message = preg_replace("#(^|\W)".$badword['badword']."(?=\W|$)#i", '\1'.$badword['replacement'], $message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed boundaries for words make the new implementation filter strings that are part of longer words

Copy link
Member Author

@effone effone Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what you say here.
Edit: Nvm. #3399

}

// Neutralize multiple adjacent wildcards and generate pattern
$ptrn = array('/[\*]{1}[\+]+/', '/[\+]+[\*]{1}/', '/[\*]+/');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patterns like [\*]{1} can be simply written as \*

@Eldenroot
Copy link
Contributor

Except that would be nice to open a new issue - bad filter ignores words placed between url tags etc. So it canbe abused easily.

dvz pushed a commit that referenced this pull request Aug 17, 2018
* Bad word filter - extended refinement

#3353 (comment)
#3353 (comment)

* Ungroup single special character
lairdshaw pushed a commit to lairdshaw/mybb that referenced this pull request Oct 11, 2021
[Rebased for 1.9 by Laird]

* bad word filter

* spelling mistake in comment

* new parser method, regex loop check

* Cleanup

* var rev
lairdshaw pushed a commit to lairdshaw/mybb that referenced this pull request Oct 11, 2021
[Rebased for 1.9 by Laird]

* Bad word filter - extended refinement

mybb#3353 (comment)
mybb#3353 (comment)

* Ungroup single special character
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
b:1.8 Branch: 1.8.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bad word filter
5 participants