Commit

fixed a serious bug that caused filter_without_english_characters to affect all probers that came after the prober that called it first
Jeff Hodges committed Jul 6, 2007
1 parent 2f2d4a3 commit 3688fc9
Showing 1 changed file with 9 additions and 4 deletions.
13 changes: 9 additions & 4 deletions lib/rchardet/charsetprober.rb
@@ -53,13 +53,18 @@ def get_confidence
 end

 def filter_high_bit_only(aBuf)
-  aBuf.gsub!(/([\x00-\x7F])+/, ' ')
-  return aBuf
+  # DO NOT USE `gsub!`
+  # It will remove all characters from the buffer that is later used by
+  # other probers. This is because gsub! removes data from the instance variable
+  # that will be passed to later probers, while gsub makes a new instance variable
+  # that will not.
+  newBuf = aBuf.gsub(/([\x00-\x7F])+/, ' ')
+  return newBuf
 end

 def filter_without_english_letters(aBuf)
-  aBuf.gsub!(/([A-Za-z])+/,' ')
-  return aBuf
+  newBuf = aBuf.gsub(/([A-Za-z])+/,' ')
+  return newBuf
 end

 def filter_with_english_letters(aBuf)
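
The difference the commit relies on can be reproduced in isolation. The sketch below is not code from rchardet: `filter_in_place` and `filter_copy` are hypothetical stand-ins for the old and new bodies of `filter_without_english_letters`, and `shared` stands in for the buffer that is handed to several probers in turn. In Ruby terms, `gsub!` rewrites the String object the caller passed in, so every later holder of that object sees the filtered text, while `gsub` returns a new String and leaves the original alone.

```ruby
# Hypothetical stand-in for the old version: mutates the caller's String.
def filter_in_place(buf)
  buf.gsub!(/[A-Za-z]+/, ' ') # returns nil when nothing matched, hence the explicit return
  buf
end

# Hypothetical stand-in for the fixed version: builds and returns a new String.
def filter_copy(buf)
  buf.gsub(/[A-Za-z]+/, ' ')
end

shared = "abc 123 def".dup # the buffer several probers would receive

filtered = filter_copy(shared)
p filtered # "  123  "
p shared   # "abc 123 def" (later probers still see the full input)

filter_in_place(shared)
p shared   # "  123  " (the letters are gone for every later prober)
```

Returning a copy costs one extra allocation per call, which seems a reasonable price for keeping each prober's view of the input independent.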