forked from moowahaha/despamilator
Commit
added new filter, put count and remove methods into text (what a class!!)
Stephen Hardisty committed Sep 1, 2011
1 parent 9af88e0, commit ba2dd1f
Showing 13 changed files with 92 additions and 28 deletions.
@@ -0,0 +1,38 @@
require 'despamilator/filter'

module DespamilatorFilter

  class WeirdPunctuation < Despamilator::Filter

    def name
      'Weird Punctuation'
    end

    def description
      'Detects unusual use of punctuation.'
    end

    def parse subject
      text = subject.text.without_uris
      text.gsub!(/\w&\w/, '')
      matches = text.remove_and_count!(/(?:\W|\s|^)(#{punctuation})/)
      matches += text.remove_and_count!(/(#{punctuation})(#{punctuation})/)
      matches += text.remove_and_count!(/(#{punctuation})$/)
      matches += text.remove_and_count!(/(?:\W|\s|^)\d+(#{punctuation})/)

      subject.register_match!({:score => 0.015 * matches, :filter => self}) if matches > 0
    end

    private

    def punctuation
      @punctuation ||= %w{~ ` ! @ # $ % ^ & * _ - + = , / ? | \\ : ; ' "}.map do |punctuation_character|
        Regexp.escape(punctuation_character)
      end.join('|')

      @punctuation
    end

  end

end
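For readers without the rest of the commit at hand, the counting logic in `parse` can be approximated by a minimal standalone sketch. This is not the gem's implementation. `remove_and_count!` is written here as a hypothetical free function that deletes matches one at a time until none remain, `weird_punctuation_matches` and `weird_punctuation_score` are illustrative names, and the `without_uris` step is left out because its behavior is not shown in this diff.

```ruby
# Hypothetical stand-in for the text helper used by the filter (an assumption,
# not the gem's code): removes matches of +pattern+ from +text+ one at a time,
# in place, and returns how many removals were made.
def remove_and_count!(text, pattern)
  count = 0
  count += 1 while text.sub!(pattern, '')
  count
end

# Same character list as the filter, escaped and joined into one alternation.
PUNCTUATION = %w{~ ` ! @ # $ % ^ & * _ - + = , / ? | \\ : ; ' "}
              .map { |character| Regexp.escape(character) }
              .join('|')

# Counts "weird" punctuation using the same four patterns, in the same order,
# as the filter's parse method. URI stripping is deliberately omitted.
def weird_punctuation_matches(input)
  text = input.dup
  text.gsub!(/\w&\w/, '') # ampersands between word characters are ignored
  matches  = remove_and_count!(text, /(?:\W|\s|^)(#{PUNCTUATION})/)
  matches += remove_and_count!(text, /(#{PUNCTUATION})(#{PUNCTUATION})/)
  matches += remove_and_count!(text, /(#{PUNCTUATION})$/)
  matches += remove_and_count!(text, /(?:\W|\s|^)\d+(#{PUNCTUATION})/)
  matches
end

# Each match contributes 0.015 to the score, as in the filter.
def weird_punctuation_score(input)
  0.015 * weird_punctuation_matches(input)
end
```

Under these assumptions, `weird_punctuation_matches('%D :-D >:-[ 123, l 89.')` counts five matches, which lines up with the 0.075 score in the accompanying spec. Removing one match at a time matters here: deleting `>:` brings a space and a `-` together, which creates a further match that a single pass over the original string would miss.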
@@ -0,0 +1,18 @@
describe DespamilatorFilter::WeirdPunctuation do

  the_name_should_be 'Weird Punctuation'
  the_description_should_be 'Detects unusual use of punctuation.'

  despamilator_should_apply_the_filter_for('^this^')

  a_single_match_of('>', should_score: 0.015)
  a_multiple_match_of('%D :-D >:-[ 123, l 89.', should_score: 0.075)

  it 'should ignore weird punctuation in urls' do
    parsing('http://www.blah.com?x=1&y=z').should have_score(0)
  end

  it 'should ignore ampersands surrounded by letters' do
    parsing('j&r').should have_score(0)
  end
end