UTF-8 added words not being detected #7

Closed
Priler opened this issue Jan 16, 2023 · 3 comments
Labels: bug (Something isn't working)

Comments

Priler commented Jan 16, 2023

The following code does not work: the added Cyrillic word is not detected.

use rustrict::{CensorStr, Type};
use rustrict::add_word;

fn main() {
    #[cfg(feature = "customize")]
    {
        unsafe {
            add_word("плохоеслово", Type::PROFANE & Type::SEVERE);
        }
    }

    let inappropriate = "hello плохоеслово".is_inappropriate();
    println!("{}", inappropriate); // false
}

The same code with English characters does work:

use rustrict::{CensorStr, Type};
use rustrict::add_word;

fn main() {
    #[cfg(feature = "customize")]
    {
        unsafe {
            add_word("badword", Type::PROFANE & Type::SEVERE);
        }
    }

    let inappropriate = "hello badword".is_inappropriate();
    println!("{}", inappropriate); // true
}

Also, is there a way to add new words in bulk, or to somehow extend the default word list?

Context

I am using the latest rustrict version (0.5.10).

Priler added the bug label Jan 16, 2023
finnbear (Owner) commented

Thanks for the issue! I've changed the character replacement strategy to allow matching certain non-ASCII characters. Your example now works in version 0.5.11.

> Also, is there a way to massively add new words?

You can call add_word as many times as you want. #6 did ask for a more ergonomic API, and it is something I'm considering :)
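
As a rough sketch of what "calling add_word many times" could look like for a larger list, the snippet below parses a newline-separated word list and registers each entry through a callback. The helper names `parse_word_list` and `register_all` are hypothetical and not part of rustrict; the actual `add_word` call (as written in the issue) is left in a comment so the sketch compiles without the crate.

```rust
/// Hypothetical helper: splits a newline-separated list into words,
/// skipping blank lines and lines that start with `#`.
fn parse_word_list(text: &str) -> Vec<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect()
}

/// Hypothetical helper: passes every word to `register` and returns
/// how many words were handed over.
fn register_all<F: FnMut(&str)>(words: &[&str], mut register: F) -> usize {
    for word in words {
        register(word);
    }
    words.len()
}

fn main() {
    // In a real bot this text would probably be read from a file at startup.
    let list = "# extra words\nbadword\n\nплохоеслово\n";
    let words = parse_word_list(list);

    let count = register_all(&words, |word| {
        // Assumption: with rustrict's `customize` feature enabled, the call
        // from the issue would go here, once, before any filtering starts:
        //
        // unsafe { rustrict::add_word(word, rustrict::Type::PROFANE & rustrict::Type::SEVERE) };
        println!("registering {}", word);
    });

    println!("registered {} words", count);
}
```

Keeping the registration behind a callback is only a design choice for this sketch: it keeps the parsing testable on its own and leaves the single `unsafe` call in one place.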

Priler commented Jan 16, 2023

Thanks, I can confirm that everything works fine now in 0.5.11.

About add_word: will there be any performance impact if I call it, say, 10 thousand times?
I want to use rustrict in my Telegram chat bot as a profanity filter module, so I need an easy-to-use way of extending the default dictionary.

finnbear commented Jan 16, 2023

> About the add_word, is there will be any performance impact if I call it, let's say ... 10 thousand times?

There are two parts to the performance impact:

  • Calling add_word that many times will add a small, possibly negligible, delay at startup.
  • Subsequent filtering will be slightly slower (I'm guessing around 25%), because a larger word list uses more memory, so a smaller proportion of it fits in the CPU cache. It will not be 10,000 times slower than with a single word, since the filter never iterates over every word in the word list.
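
To illustrate why a lookup does not have to scan the whole word list, here is a toy prefix tree (trie). It is only a sketch of the general idea and is not rustrict's actual data structure; `ToyTrie` and its methods are made up for this example. The number of nodes visited during a lookup is bounded by the length of the queried word, not by how many words have been inserted.

```rust
use std::collections::HashMap;

/// Toy trie node. This is an illustration, NOT rustrict's implementation.
#[derive(Default)]
struct Node {
    children: HashMap<char, Node>,
    is_word: bool,
}

#[derive(Default)]
struct ToyTrie {
    root: Node,
}

impl ToyTrie {
    /// Inserts a word one `char` at a time, so non-ASCII text works too.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for c in word.chars() {
            node = node.children.entry(c).or_default();
        }
        node.is_word = true;
    }

    /// Returns whether the word is present and how many nodes were visited.
    fn contains(&self, word: &str) -> (bool, usize) {
        let mut node = &self.root;
        let mut steps = 0;
        for c in word.chars() {
            match node.children.get(&c) {
                Some(next) => {
                    node = next;
                    steps += 1;
                }
                // No matching child: stop early without touching other words.
                None => return (false, steps),
            }
        }
        (node.is_word, steps)
    }
}

fn main() {
    let mut trie = ToyTrie::default();
    // Ten thousand synthetic entries, mirroring the number from the question.
    for i in 0..10_000 {
        trie.insert(&format!("word{}", i));
    }
    trie.insert("плохоеслово");

    let (found, steps) = trie.contains("плохоеслово");
    println!("found = {}, nodes visited = {}", found, steps);
}
```

The cache effect mentioned above is not modelled here; measuring the real filter on a realistic word list is the only reliable way to confirm the estimated slowdown.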
