Initial version
nono committed Apr 3, 2011
0 parents commit 3ca492e
Showing 8 changed files with 194 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
.bundle
Gemfile.lock
pkg
1 change: 1 addition & 0 deletions .rspec
@@ -0,0 +1 @@
--color
1 change: 1 addition & 0 deletions Gemfile
@@ -0,0 +1 @@
gemspec
20 changes: 20 additions & 0 deletions MIT-LICENSE
@@ -0,0 +1,20 @@
Copyright (c) 2011 Bruno Michel

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
55 changes: 55 additions & 0 deletions README.md
@@ -0,0 +1,55 @@
HTML Spellchecker
=================

Want to spellcheck an HTML string properly? This gem is for you.
It's powered by [Nokogiri](http://nokogiri.org/) and
[hunspell-ffi](https://github.com/ahaller/hunspell-ffi)!


How to use it
-------------

It's very simple. Install it with RubyGems:

    gem install html_spellchecker

Or, if you use Bundler, add it to your `Gemfile`:

    gem "html_spellchecker", "~> 0.1"

Then you can use it in your code:

    require "html_spellchecker"
    HTML_Spellchecker.english.spellcheck("<p>This is xzqwy.</p>")
    # => '<p>This is <mark class="misspelled">xzqwy</mark>.</p>'

The HTML_Spellchecker class is initialized with two paths: the affix
(`.aff`) and dictionary (`.dic`) files for Hunspell. There are helpers
(`HTML_Spellchecker.english` and `HTML_Spellchecker.french`) to create
instances for the English and French dictionaries.
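For another language, pass the constructor the paths to that language's files. The `english` and `french` helpers read them from `/usr/share/hunspell`; the sketch below assumes other dictionaries live in the same directory under `<locale>.aff` / `<locale>.dic` names. `hunspell_paths` is a hypothetical helper, not part of this gem:

```ruby
# Hypothetical helper, not part of html_spellchecker: builds the affix and
# dictionary paths for a locale, assuming the same /usr/share/hunspell
# layout that HTML_Spellchecker.english and HTML_Spellchecker.french use.
def hunspell_paths(locale, dir = "/usr/share/hunspell")
  %w(aff dic).map { |ext| File.join(dir, "#{locale}.#{ext}") }
end

aff, dic = hunspell_paths("de_DE")
puts aff  # prints /usr/share/hunspell/de_DE.aff
puts dic  # prints /usr/share/hunspell/de_DE.dic

# With the gem installed and the German dictionary present (assumption):
# german = HTML_Spellchecker.new(aff, dic)
```

Whether `de_DE` files exist, and where, depends on the Hunspell packages installed on your system.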

Then call the `spellcheck` method: give it an HTML string and it
returns the same string with misspelled words enclosed in `<mark>`
tags (with the `misspelled` class).
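For every text node it visits, the gem scans for words (`\p{Word}+` on Ruby 1.9) and passes each one to the dictionary's `check` method. A minimal stdlib-only sketch of that marking step on a plain-text string follows; `StubDictionary` and `mark_misspelled` are hypothetical names, and the stub stands in for the Hunspell object (anything answering `check(word)` with true or false works):

```ruby
require "erb"
require "set"

# Hypothetical stand-in for the Hunspell object: the gem only calls
# check(word), which must return true for correctly spelled words.
class StubDictionary
  def initialize(words)
    @words = Set.new(words.map(&:downcase))
  end

  def check(word)
    @words.include?(word.downcase)
  end
end

# Splits plain text into word and non-word pieces (the capture group keeps
# the words in the result), escapes every piece for HTML, and wraps each
# word the dictionary rejects in <mark class="misspelled">.
def mark_misspelled(text, dict)
  text.split(/(\p{Word}+)/).map do |piece|
    escaped = ERB::Util.html_escape(piece)
    if piece.match?(/\A\p{Word}+\z/) && !dict.check(piece)
      %(<mark class="misspelled">#{escaped}</mark>)
    else
      escaped
    end
  end.join
end

dict = StubDictionary.new(%w(is not a word))
puts mark_misspelled("xzqwy is not a word!", dict)
# prints <mark class="misspelled">xzqwy</mark> is not a word!
```

This sketch splits the raw text before escaping it, so the letters of an escaped entity such as `&lt;` are never sent to the dictionary.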

HTML_Spellchecker does not check the spelling inside special tags
like `<code>`: it only spellchecks the tags listed in
`HTML_Spellchecker.spellcheckable_tags`.
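The tag check happens while walking the parsed tree: an element whose name is not in the set is returned untouched, so its text never reaches the dictionary (the top-level fragment itself is always walked). A stdlib-only sketch of that rule follows, using a hypothetical `ElementNode`/`TextNode` tree instead of Nokogiri nodes and collecting the words that would be checked rather than rewriting the HTML:

```ruby
require "set"

# Hypothetical tree types standing in for Nokogiri element and text nodes.
ElementNode = Struct.new(:name, :children)
TextNode = Struct.new(:content)

# Plays the role of HTML_Spellchecker.spellcheckable_tags in this sketch.
CHECKED_TAGS = Set.new(%w(p strong em))

# Returns the words that would be sent to the dictionary: text nodes are
# scanned for words, and elements are descended into only when their tag
# name is in the whitelist (skipped elements contribute nothing).
def checkable_words(node, tags = CHECKED_TAGS)
  case node
  when TextNode
    node.content.scan(/\p{Word}+/)
  when ElementNode
    return [] unless tags.include?(node.name)
    node.children.flat_map { |child| checkable_words(child, tags) }
  else
    []
  end
end

doc = ElementNode.new("p", [
  TextNode.new("Call "),
  ElementNode.new("code", [TextNode.new("puts 'Hi'")]),
  TextNode.new(" with "),
  ElementNode.new("strong", [TextNode.new("care")])
])

p checkable_words(doc)  # prints ["Call", "with", "care"]
```

In the gem itself the list is a `Set` behind a class-level accessor, so it can be changed in place, for example `HTML_Spellchecker.spellcheckable_tags.delete("a")` to stop checking link text.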


Issues or Suggestions
---------------------

Found an issue or have a suggestion? Please report it on
[GitHub's issue tracker](http://github.com/nono/HTML-Spellchecker/issues).

If you want to make a pull request, please run the specs first:

    rspec spec


Credits
-------

Thanks to [Andreas Haller](https://github.com/ahaller) for the hunspell-ffi gem.

Copyright (c) 2011 Bruno Michel <bmichel@menfin.info>, released under the MIT license
17 changes: 17 additions & 0 deletions html_spellchecker.gemspec
@@ -0,0 +1,17 @@
Gem::Specification.new do |s|
  s.name = "html_spellchecker"
  s.version = "0.1.0"
  s.date = Time.now.utc.strftime("%Y-%m-%d")
  s.homepage = "http://github.com/nono/HTML-Spellchecker"
  s.authors = "Bruno Michel"
  s.email = "bmichel@menfin.info"
  s.description = "Want to spellcheck an HTML string properly? This gem is for you."
  s.summary = "Want to spellcheck an HTML string properly? This gem is for you."
  s.extra_rdoc_files = %w(README.md)
  s.files = Dir["MIT-LICENSE", "README.md", "Gemfile", "lib/**/*.rb"]
  s.require_paths = ["lib"]
  s.rubygems_version = %q{1.3.7}
  s.add_dependency "nokogiri", "~>1.4"
  s.add_dependency "hunspell-ffi", "=0.1.3.alpha2"
  s.add_development_dependency "rspec", "~>2.4"
end
62 changes: 62 additions & 0 deletions lib/html_spellchecker.rb
@@ -0,0 +1,62 @@
# Encoding: UTF-8

require "hunspell-ffi"
require "nokogiri"
require "set"


class HTML_Spellchecker
  def self.english
    @english ||= self.new("/usr/share/hunspell/en_US.aff", "/usr/share/hunspell/en_US.dic")
  end

  def self.french
    @french ||= self.new("/usr/share/hunspell/fr_FR.aff", "/usr/share/hunspell/fr_FR.dic")
  end

  def initialize(aff, dic)
    @dict = Hunspell.new(aff, dic)
  end

  def spellcheck(html)
    Nokogiri::HTML::DocumentFragment.parse(html).spellcheck(@dict)
  end

  class <<self
    attr_accessor :spellcheckable_tags
  end
  self.spellcheckable_tags = Set.new(%w(p ol ul li div header article nav section footer aside dd dt dl
                                        span blockquote cite q mark ins del table td th tr tbody thead tfoot
                                        a b i s em small strong hgroup h1 h2 h3 h4 h5 h6))
end

class Nokogiri::HTML::DocumentFragment
  def spellcheckable?
    true
  end
end

class Nokogiri::XML::Node
  def spellcheck(dict)
    if spellcheckable?
      inner = children.map {|child| child.spellcheck(dict) }.join
      children.remove
      add_child Nokogiri::HTML::DocumentFragment.parse(inner)
    end
    to_html(:indent => 0)
  end

  def spellcheckable?
    HTML_Spellchecker.spellcheckable_tags.include? name
  end
end

class Nokogiri::XML::Text
  WORDS_REGEXP = RUBY_VERSION =~ /^1\.8/ ? /\w+/ : /\p{Word}+/

  def spellcheck(dict)
    to_xhtml(:encoding => 'UTF-8').gsub(WORDS_REGEXP) do |word|
      dict.check(word) ? word : "<mark class=\"misspelled\">#{word}</mark>"
    end
  end
end
35 changes: 35 additions & 0 deletions spec/html_spellchecker_spec.rb
@@ -0,0 +1,35 @@
# encoding: UTF-8
path = File.expand_path(File.dirname(__FILE__) + "/../lib/")
$LOAD_PATH.unshift(path) unless $LOAD_PATH.include?(path)
require "html_spellchecker"


describe HTML_Spellchecker do
  let(:checker) { HTML_Spellchecker.english }

  it "doesn't modify correct sentences" do
    correct = "<p>This is a sentence with correct words.</p>"
    checker.spellcheck(correct).should == correct
  end

  it "marks spelling errors" do
    incorrect = "<p>xzqwy is not a word!</p>"
    checker.spellcheck(incorrect).should == "<p><mark class=\"misspelled\">xzqwy</mark> is not a word!</p>"
  end

it "doesn't try to spellcheck code tags" do
txt = "<code>class Foo\ndef hello\nputs 'Hi'\nend\nend</code>"
checker.spellcheck(txt).should == txt
end

it "can use different dictionnaries" do
french_text = "<p>Ceci est un texte correct, mais xzqwy n'est pas un mot</p>"
expected = french_text.gsub('xzqwy', '<mark class="misspelled">xzqwy</mark>')
HTML_Spellchecker.french.spellcheck(french_text).should == expected
end

it "can spellcheck nested tags" do
txt = "<p>This is <strong>Important and <em>xzqwy</em></strong>!</p>"
checker.spellcheck(txt).should == txt.gsub('xzqwy', '<mark class="misspelled">xzqwy</mark>')
end
end
