improve de-HTML5ization

commit c8b0af6403b54442f6ff37b35a6238967bef016a 1 parent 1561761
@mislav authored
Showing with 9 additions and 9 deletions.
  1. +9 −9 dehtml5.rb
18 dehtml5.rb
@@ -1,16 +1,16 @@
require 'nokogiri'
doc = Nokogiri::HTML ARGF
-doc.search('article, section, figure, figcaption').reverse.each do |elem|
- type = elem.name
- type = 'caption' if type == 'figcaption'
- elem.name = 'div'
+doc.search('article, section, figure, figcaption, hgroup, mark').reverse.each do |elem|
+ type = elem.name
+ type = 'caption' if type == 'figcaption'
+ elem.name = type == 'mark' ? 'span' : 'div'
- classnames = elem['class'].to_s.lstrip.split(/\s+/)
- unless classnames.include? type
- classnames << type
- elem['class'] = classnames.join(' ')
- end
+ classnames = elem['class'].to_s.lstrip.split(/\s+/)
+ unless classnames.include? type
+ classnames << type
+ elem['class'] = classnames.join(' ')
+ end
end
puts doc
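
The per-element logic in this diff (pick a replacement tag, then record the original element name as a class) can be sketched as pure functions that run without Nokogiri. The helper names `legacy_tag`, `legacy_class`, and `with_class` are hypothetical and do not appear in `dehtml5.rb`. They only mirror what the loop body does. Unlike the script, `with_class` always returns a rejoined string, so it also normalizes whitespace when the class is already present.

```ruby
# Hypothetical helpers mirroring the loop body of dehtml5.rb.
# These names are illustrative only; the script itself inlines this logic.

# Tag that replaces an HTML5 element: `mark` is inline, so it becomes
# `span`; every other matched element becomes `div`.
def legacy_tag(name)
  name == 'mark' ? 'span' : 'div'
end

# Class name that records the original element. `figcaption` is
# shortened to `caption`; other names are used unchanged.
def legacy_class(name)
  name == 'figcaption' ? 'caption' : name
end

# Append `type` to a class attribute value unless it is already listed.
# `to_s` handles a missing attribute (nil), and `lstrip` avoids the empty
# first entry that `split(/\s+/)` yields for a string with leading whitespace.
def with_class(class_attr, type)
  classnames = class_attr.to_s.lstrip.split(/\s+/)
  classnames << type unless classnames.include?(type)
  classnames.join(' ')
end
```

For example, `with_class(nil, legacy_class('figcaption'))` returns `"caption"`, and `with_class('section intro', 'section')` returns `"section intro"` without adding a duplicate.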