jinrunheng/sensitive-words-filter


sensitive-words-filter

This is a sensitive-word filter written in Java.

Sensitive word list

The sensitive word list is copied from: https://github.com/lyy720301/Sensitive-word

Maven

<dependency>
  <groupId>io.github.jinrunheng</groupId>
  <artifactId>sensitive-words-filter</artifactId>
  <version>0.0.1</version>
</dependency>

How it works

Prefix tree (Trie)

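As a rough illustration of how a prefix tree stores a word list (a sketch, not the library's TrieFilter source), each node maps a character to a child node and flags whether a stored word ends there. Words that share a prefix share nodes, so checking a word costs one step per character regardless of how many words are stored. The method names put, exist and remove mirror this README; the class SimpleTrie and its pruning strategy are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie sketch; NOT the library's implementation.
public class SimpleTrie {
    // Each node maps a character to its child and records whether a word ends here.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();

    // Walk (creating nodes as needed) one node per character; mark the last one.
    public void put(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isEnd = true;
    }

    // The full path must exist AND end on a word-end node,
    // so a bare prefix of a stored word does not count.
    public boolean exist(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.isEnd;
    }

    // Unmark the word end, then prune nodes that no longer lead to any word.
    public void remove(String word) {
        remove(root, word, 0);
    }

    // Returns true when `node` leads to no word any more and its parent may drop it.
    private boolean remove(Node node, String word, int i) {
        if (i == word.length()) {
            if (!node.isEnd) return false; // word was never stored
            node.isEnd = false;
            return node.children.isEmpty();
        }
        Node child = node.children.get(word.charAt(i));
        if (child == null) return false;
        if (remove(child, word, i + 1)) {
            node.children.remove(word.charAt(i));
        }
        return !node.isEnd && node.children.isEmpty();
    }

    public static void main(String[] args) {
        SimpleTrie trie = new SimpleTrie();
        trie.put("test");
        trie.put("tea");
        System.out.println(trie.exist("te"));   // false: prefix only
        trie.remove("test");
        System.out.println(trie.exist("test")); // false
        System.out.println(trie.exist("tea"));  // true: shared prefix "te" kept
    }
}
```

Pruning on removal keeps the tree from accumulating dead branches when words are frequently added and removed; a simpler design could just clear the end flag.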

Usage

Initialization

TrieFilter trieFilter = new TrieFilter();

Loading a sensitive-word file

The file must contain one sensitive word per line:

aaa
bbb
ccc
...
trieFilter.batchAdd(filePath); // filePath: String path to the word-list file
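The README does not say how batchAdd parses the file. As an illustration only, a hypothetical helper that reads the one-word-per-line format might look like this; UTF-8 encoding, trimming whitespace and skipping blank lines are all assumptions, not documented library behaviour.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordListReader {
    // Hypothetical helper (not part of the library): reads a UTF-8 word list
    // with one word per line, trimming whitespace and skipping blank lines.
    public static List<String> readWords(Path file) throws IOException {
        // Files.lines is lazy and keeps the file open, so close it with try-with-resources.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.map(String::trim)
                        .filter(line -> !line.isEmpty())
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, List.of("aaa", "bbb", "", "  ccc  "), StandardCharsets.UTF_8);
        System.out.println(readWords(tmp)); // [aaa, bbb, ccc]
        Files.delete(tmp);
    }
}
```

Each returned word could then be passed to trieFilter.put; whether batchAdd does exactly this internally is not confirmed by the README.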

trieFilter supports adding and removing sensitive words, and checking whether a given word is in the dictionary built so far:

@Test
public void testTrieFilterBasicMethod(){
    TrieFilter trieFilter = new TrieFilter();
    Assertions.assertFalse(trieFilter.exist("test"));
    trieFilter.put("test");
    Assertions.assertTrue(trieFilter.exist("test"));
    trieFilter.remove("test");
    Assertions.assertFalse(trieFilter.exist("test"));
}

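Filtering sensitive words

filter(sentence, '*') returns a copy of sentence in which every character of each matched sensitive word is replaced by the given mask character: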

@Test
public void testFilterMethod() {
    TrieFilter trieFilter = new TrieFilter();
    trieFilter.put("abc");
    trieFilter.put("bf");
    trieFilter.put("be");
    trieFilter.put("faf");
    String sentence = "xwabfabcfaf";
    String filteredSentence = trieFilter.filter(sentence, '*');
    Assertions.assertEquals("xwa********", filteredSentence); // expected value first
}
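One plausible way to implement filter on a trie (an assumption; the library's source may differ, for example in how overlapping words are resolved) is a left-to-right scan: at each position, walk the trie to find the longest stored word starting there, mask it and jump past it; otherwise copy the character and advance by one. The class TrieMaskSketch is hypothetical; this sketch reproduces the expected output of the test above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical longest-match masking filter; NOT the library's implementation.
public class TrieMaskSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    public void put(String word) {
        Node n = root;
        for (char c : word.toCharArray()) n = n.next.computeIfAbsent(c, k -> new Node());
        n.end = true;
    }

    // Length of the longest stored word starting at `start`, or 0 if none.
    private int longestMatch(String s, int start) {
        Node n = root;
        int best = 0;
        for (int i = start; i < s.length(); i++) {
            n = n.next.get(s.charAt(i));
            if (n == null) break;                  // no stored word continues this way
            if (n.end) best = i - start + 1;       // remember the longest word seen so far
        }
        return best;
    }

    // Scan left to right, replacing each matched word with `mask` characters.
    public String filter(String s, char mask) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int len = longestMatch(s, i);
            if (len > 0) {
                for (int k = 0; k < len; k++) out.append(mask);
                i += len;                          // continue after the masked word
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TrieMaskSketch f = new TrieMaskSketch();
        for (String w : new String[] {"abc", "bf", "be", "faf"}) f.put(w);
        System.out.println(f.filter("xwabfabcfaf", '*')); // xwa********
    }
}
```

In "xwabfabcfaf" the scan masks "bf", then "abc", then "faf" back to back, which is why eight consecutive characters are replaced.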

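Symbols inserted between the characters of a sensitive word are ignored during matching, so they cannot be used to evade the filter. The whole matched span, including the symbols inside it, is masked: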

@Test
public void testSentenceWithSymbol() {
    TrieFilter trieFilter = new TrieFilter();
    trieFilter.put("abcabc");
    String sentence = "^a^^^b/c&ab&&c^";
    String filteredSentence = trieFilter.filter(sentence, '*');
    Assertions.assertEquals("^*************^", filteredSentence); // expected value first
}
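A sketch of how such symbol tolerance could work (an assumption about the approach, not the library's code): while walking the trie, skip symbol characters, but never start a match on one, so a leading symbol is kept and the masked span runs from the word's first to its last real character. Here a "symbol" is assumed to be any character for which Character.isLetterOrDigit returns false (CJK ideographs count as letters); the library's definition may differ. The class SymbolTolerantFilter is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol-skipping filter; NOT the library's implementation.
public class SymbolTolerantFilter {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    // Assumption: anything that is not a letter or digit is an interfering symbol.
    // Stored words should therefore not contain symbols themselves.
    static boolean isSymbol(char c) {
        return !Character.isLetterOrDigit(c);
    }

    public void put(String word) {
        Node n = root;
        for (char c : word.toCharArray()) n = n.next.computeIfAbsent(c, k -> new Node());
        n.end = true;
    }

    // Length of the longest span starting at `start` that spells a stored word
    // once symbols are skipped; the span always ends on a real (non-symbol) character.
    private int longestMatch(String s, int start) {
        if (isSymbol(s.charAt(start))) return 0; // never start a match on a symbol
        Node n = root;
        int best = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isSymbol(c)) continue;           // skip symbols inside a word
            n = n.next.get(c);
            if (n == null) break;
            if (n.end) best = i - start + 1;     // span includes skipped symbols
        }
        return best;
    }

    public String filter(String s, char mask) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int len = longestMatch(s, i);
            if (len > 0) {
                for (int k = 0; k < len; k++) out.append(mask);
                i += len;
            } else {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SymbolTolerantFilter f = new SymbolTolerantFilter();
        f.put("abcabc");
        System.out.println(f.filter("^a^^^b/c&ab&&c^", '*')); // ^*************^
    }
}
```

Because best is updated only on a word-ending character, the trailing ^ after the last c stays unmasked, matching the test above.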

License

Apache License 2.0
