82d0251 @imathis improved starting point
imathis authored Oct 18, 2009
#
# = RubyPants - SmartyPants ported to Ruby
#
# Ported by Christian Neukirchen <mailto:chneukirchen@gmail.com>
# Copyright (C) 2004 Christian Neukirchen
#
# Incorporates ideas, comments and documentation by Chad Miller
# Copyright (C) 2004 Chad Miller
#
# Original SmartyPants by John Gruber
# Copyright (C) 2003 John Gruber
#

#
# = RubyPants - SmartyPants ported to Ruby
#
# == Synopsis
#
# RubyPants is a Ruby port of the smart-quotes library SmartyPants.
#
# The original "SmartyPants" is a free web publishing plug-in for
# Movable Type, Blosxom, and BBEdit that easily translates plain ASCII
# punctuation characters into "smart" typographic punctuation HTML
# entities.
#
#
# == Description
#
# RubyPants can perform the following transformations:
#
# * Straight quotes (<tt>"</tt> and <tt>'</tt>) into "curly" quote
#   HTML entities
# * Backtick-style quotes (<tt>``like this''</tt>) into "curly" quote
#   HTML entities
# * Dashes (<tt>--</tt> and <tt>---</tt>) into en- and em-dash
#   entities
# * Three consecutive dots (<tt>...</tt> or <tt>. . .</tt>) into an
#   ellipsis entity
#
# This means you can write, edit, and save your posts using plain old
# ASCII straight quotes, plain dashes, and plain dots, but your
# published posts (and final HTML output) will appear with smart
# quotes, em-dashes, and proper ellipses.
#
# RubyPants does not modify characters within <tt><pre></tt>,
# <tt><code></tt>, <tt><kbd></tt>, <tt><math></tt> or
# <tt><script></tt> tag blocks. Typically, these tags are used to
# display text where smart quotes and other "smart punctuation" would
# not be appropriate, such as source code or example markup.
#
#
# == Backslash Escapes
#
# If you need to use literal straight quotes (or plain hyphens and
# periods), RubyPants accepts the following backslash escape sequences
# to force non-smart punctuation. It does so by transforming the
# escape sequence into a decimal-encoded HTML entity:
#
#   \\    \"    \'    \.    \-    \`
#
# This is useful, for example, when you want to use straight quotes as
# foot and inch marks: 6'2" tall; a 17" iMac. (Use <tt>6\'2\"</tt>
# and <tt>17\"</tt>, respectively.)
#
#
# == Algorithmic Shortcomings
#
# One situation in which quotes will get curled the wrong way is when
# apostrophes are used at the start of leading contractions. For
# example:
#
#   'Twas the night before Christmas.
#
# In the case above, RubyPants will turn the apostrophe into an
# opening single-quote, when in fact it should be a closing one. I
# don't think this problem can be solved in the general case--every
# word processor I've tried gets this wrong as well. In such cases,
# it's best to use the proper HTML entity for closing single-quotes
# (``&#8217;``) by hand.
#
#
# == Bugs
#
# To file bug reports or feature requests (other than the shortcoming
# described above), please send email to: mailto:chneukirchen@gmail.com
#
# If the bug involves quotes being curled the wrong way, please send
# example text to illustrate.
#
#
# == Authors
#
# John Gruber did all of the hard work of writing this software in
# Perl for Movable Type and almost all of this useful documentation.
# Chad Miller ported it to Python to use with Pyblosxom.
#
# Christian Neukirchen provided the Ruby port, as a general-purpose
# library that follows the *Cloth API.
#
#
# == Copyright and License
#
# === SmartyPants license:
#
# Copyright (c) 2003 John Gruber
# (http://daringfireball.net)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
#   notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
#   notice, this list of conditions and the following disclaimer in
#   the documentation and/or other materials provided with the
#   distribution.
#
# * Neither the name "SmartyPants" nor the names of its contributors
#   may be used to endorse or promote products derived from this
#   software without specific prior written permission.
#
# This software is provided by the copyright holders and contributors
# "as is" and any express or implied warranties, including, but not
# limited to, the implied warranties of merchantability and fitness
# for a particular purpose are disclaimed. In no event shall the
# copyright owner or contributors be liable for any direct, indirect,
# incidental, special, exemplary, or consequential damages (including,
# but not limited to, procurement of substitute goods or services;
# loss of use, data, or profits; or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability, or tort (including negligence or otherwise) arising in
# any way out of the use of this software, even if advised of the
# possibility of such damage.
#
# === RubyPants license
#
# RubyPants is a derivative work of SmartyPants and smartypants.py.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
#   notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
#   notice, this list of conditions and the following disclaimer in
#   the documentation and/or other materials provided with the
#   distribution.
#
# This software is provided by the copyright holders and contributors
# "as is" and any express or implied warranties, including, but not
# limited to, the implied warranties of merchantability and fitness
# for a particular purpose are disclaimed. In no event shall the
# copyright owner or contributors be liable for any direct, indirect,
# incidental, special, exemplary, or consequential damages (including,
# but not limited to, procurement of substitute goods or services;
# loss of use, data, or profits; or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability, or tort (including negligence or otherwise) arising in
# any way out of the use of this software, even if advised of the
# possibility of such damage.
#
#
# == Links
#
# John Gruber:: http://daringfireball.net
# SmartyPants:: http://daringfireball.net/projects/smartypants
#
# Chad Miller:: http://web.chad.org
#
# Christian Neukirchen:: http://kronavita.de/chris
#


class RubyPants < String
  VERSION = "0.1"

  # Allowed elements in the options array:
  #
  # 0  :: do nothing
  # 1  :: set all
  # 2  :: set all, using old school en- and em-dash shortcuts
  # 3  :: set all, using inverted old school en- and em-dash shortcuts
  # -1 :: stupefy (translate HTML entities to their ASCII counterparts)
  #
  # <tt>:quotes</tt>        :: quotes
  # <tt>:backticks</tt>     :: backtick quotes (``double'' only)
  # <tt>:allbackticks</tt>  :: backtick quotes (``double'' and `single')
  # <tt>:dashes</tt>        :: dashes
  # <tt>:oldschool</tt>     :: old school dashes
  # <tt>:inverted</tt>      :: inverted old school dashes
  # <tt>:ellipses</tt>      :: ellipses
  # <tt>:convertquotes</tt> :: convert <tt>&quot;</tt> entities to
  #                            <tt>"</tt> for Dreamweaver users
  # <tt>:stupefy</tt>       :: translate SmartyPants HTML entities
  #                            to their ASCII counterparts.
  #
  def initialize(string, options=[2])
    super string
    @options = [*options]
  end

  # Apply SmartyPants transformations.
  def to_html
    do_quotes = do_backticks = do_dashes = do_ellipses = do_stupefy = nil
    convert_quotes = false

    if @options.include? 0
      # Do nothing.
      return self
    elsif @options.include? 1
      # Do everything, turn all options on.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :normal
    elsif @options.include? 2
      # Do everything, turn all options on, use old school dash shorthand.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :oldschool
    elsif @options.include? 3
      # Do everything, turn all options on, use inverted old school
      # dash shorthand.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :inverted
    elsif @options.include?(-1)
      do_stupefy = true
    else
      do_quotes = @options.include? :quotes
      do_backticks = @options.include? :backticks
      do_backticks = :both if @options.include? :allbackticks
      do_dashes = :normal if @options.include? :dashes
      do_dashes = :oldschool if @options.include? :oldschool
      do_dashes = :inverted if @options.include? :inverted
      do_ellipses = @options.include? :ellipses
      convert_quotes = @options.include? :convertquotes
      do_stupefy = @options.include? :stupefy
    end

    # Parse the HTML
    tokens = tokenize

    # Keep track of when we're inside a tag block that must be left
    # untouched (<pre>, <code>, <kbd>, <script> or <math>).
    in_pre = false

    # The result is accumulated here.
    result = ""

    # This is a cheat, used to get some context for one-character
    # tokens that consist of just a quote char. What we do is remember
    # the last character of the previous text token, to use as context
    # to curl single-character quote tokens correctly.
    prev_token_last_char = ""

    tokens.each { |token|
      if token.first == :tag
        result << token[1]
        if token[1] =~ %r!<(/?)(?:pre|code|kbd|script|math)[\s>]!
          in_pre = ($1 != "/")  # Opening or closing tag?
        end
      else
        t = token[1]

        # Remember last char of this token before processing.
        # The two-argument form returns a one-character String on
        # both Ruby 1.8 and later versions.
        last_char = t[-1, 1]

        unless in_pre
          t = process_escapes t

          t.gsub!(/&quot;/, '"') if convert_quotes

          if do_dashes
            t = educate_dashes t if do_dashes == :normal
            t = educate_dashes_oldschool t if do_dashes == :oldschool
            t = educate_dashes_inverted t if do_dashes == :inverted
          end

          t = educate_ellipses t if do_ellipses

          # Note: backticks need to be processed before quotes.
          if do_backticks
            t = educate_backticks t
            t = educate_single_backticks t if do_backticks == :both
          end

          if do_quotes
            if t == "'"
              # Special case: single-character ' token
              if prev_token_last_char =~ /\S/
                t = "&#8217;"
              else
                t = "&#8216;"
              end
            elsif t == '"'
              # Special case: single-character " token
              if prev_token_last_char =~ /\S/
                t = "&#8221;"
              else
                t = "&#8220;"
              end
            else
              # Normal case:
              t = educate_quotes t
            end
          end

          t = stupefy_entities t if do_stupefy
        end

        prev_token_last_char = last_char
        result << t
      end
    }

    # Done
    result
  end

  protected

  # Return the string after processing the following backslash
  # escape sequences. This is useful if you want to force a "dumb" quote
  # or other character to appear.
  #
  # Escaped are:
  #   \\    \"    \'    \.    \-    \`
  #
  def process_escapes(str)
    str.gsub(/\\\\/, '&#92;').
      gsub(/\\"/, '&#34;').
      gsub(/\\'/, '&#39;').
      gsub(/\\\./, '&#46;').
      gsub(/\\-/, '&#45;').
      gsub(/\\`/, '&#96;')
  end

  # Return the string, with each instance of "<tt>--</tt>" translated
  # to an em-dash HTML entity.
  #
  def educate_dashes(str)
    str.gsub(/--/, '&#8212;')
  end

  # Return the string, with each instance of "<tt>--</tt>" translated
  # to an en-dash HTML entity, and each "<tt>---</tt>" translated to
  # an em-dash HTML entity.
  #
  def educate_dashes_oldschool(str)
    str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
  end

  # Return the string, with each instance of "<tt>--</tt>" translated
  # to an em-dash HTML entity, and each "<tt>---</tt>" translated to
  # an en-dash HTML entity. Two reasons why: First, unlike the en- and
  # em-dash syntax supported by +educate_dashes_oldschool+, it's
  # compatible with existing entries written before SmartyPants 1.1,
  # back when "<tt>--</tt>" was only used for em-dashes. Second,
  # em-dashes are more common than en-dashes, and so it sort of makes
  # sense that the shortcut should be shorter to type. (Thanks to
  # Aaron Swartz for the idea.)
  #
  def educate_dashes_inverted(str)
    str.gsub(/---/, '&#8211;').gsub(/--/, '&#8212;')
  end

  # Return the string, with each instance of "<tt>...</tt>" translated
  # to an ellipsis HTML entity. Also converts the case where there are
  # spaces between the dots.
  #
  def educate_ellipses(str)
    str.gsub('...', '&#8230;').gsub('. . .', '&#8230;')
  end

  # Return the string, with <tt>``backticks''</tt>-style double quotes
  # translated into HTML curly quote entities.
  #
  def educate_backticks(str)
    str.gsub("``", '&#8220;').gsub("''", '&#8221;')
  end

  # Return the string, with <tt>`backticks'</tt>-style single quotes
  # translated into HTML curly quote entities.
  #
  def educate_single_backticks(str)
    str.gsub("`", '&#8216;').gsub("'", '&#8217;')
  end

  # Return the string, with "educated" curly quote HTML entities.
  #
  def educate_quotes(str)
    punct_class = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^_`{|}~]'

    str = str.dup

    # Special case if the very first character is a quote followed by
    # punctuation at a non-word-break. Close the quotes by brute
    # force:
    str.gsub!(/^'(?=#{punct_class}\B)/, '&#8217;')
    str.gsub!(/^"(?=#{punct_class}\B)/, '&#8221;')

    # Special case for double sets of quotes, e.g.:
    #   <p>He said, "'Quoted' words in a larger quote."</p>
    str.gsub!(/"'(?=\w)/, '&#8220;&#8216;')
    str.gsub!(/'"(?=\w)/, '&#8216;&#8220;')

    # Special case for decade abbreviations (the '80s):
    str.gsub!(/'(?=\d\ds)/, '&#8217;')

    close_class = %![^\ \t\r\n\\[\{\(\-]!
    dec_dashes = '&#8211;|&#8212;'

    # Get most opening single quotes:
    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)'(?=\w)/,
              '\1&#8216;')
    # Single closing quotes:
    str.gsub!(/(#{close_class})'/, '\1&#8217;')
    str.gsub!(/'(\s|s\b|$)/, '&#8217;\1')
    # Any remaining single quotes should be opening ones:
    str.gsub!(/'/, '&#8216;')

    # Get most opening double quotes:
    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)"(?=\w)/,
              '\1&#8220;')
    # Double closing quotes:
    str.gsub!(/(#{close_class})"/, '\1&#8221;')
    str.gsub!(/"(\s|s\b|$)/, '&#8221;\1')
    # Any remaining quotes should be opening ones:
    str.gsub!(/"/, '&#8220;')

    str
  end

  # Return the string, with each SmartyPants HTML entity translated to
  # its ASCII counterpart.
  #
  def stupefy_entities(str)
    str.
      gsub(/&#8211;/, '-').      # en-dash
      gsub(/&#8212;/, '--').     # em-dash

      gsub(/&#8216;/, "'").      # open single quote
      gsub(/&#8217;/, "'").      # close single quote

      gsub(/&#8220;/, '"').      # open double quote
      gsub(/&#8221;/, '"').      # close double quote

      gsub(/&#8230;/, '...')     # ellipsis
  end

  # Return an array of the tokens comprising the string. Each token is
  # either a tag (possibly with nested tags contained therein, such
  # as <tt><a href="<MTFoo>"></tt>), or a run of text between
  # tags. Each element of the array is a two-element array; the first
  # is either :tag or :text; the second is the actual value.
  #
  # Based on the <tt>_tokenize()</tt> subroutine from Brad Choate's
  # MTRegex plugin. <http://www.bradchoate.com/past/mtregex.php>
  #
  # This is actually the easier variant using tag_soup, as used by
  # Chad Miller in the Python port of SmartyPants.
  #
  def tokenize
    tag_soup = /([^<]*)(<[^>]*>)/

    tokens = []

    prev_end = 0
    scan(tag_soup) {
      tokens << [:text, $1] if $1 != ""
      tokens << [:tag, $2]

      prev_end = $~.end(0)
    }

    if prev_end < size
      tokens << [:text, self[prev_end..-1]]
    end

    tokens
  end
end
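
The tokenizing and tag-skipping steps above can be tried in isolation. The following is a minimal standalone sketch, not part of RubyPants: the helper names `tokenize_html` and `transform_outside_code` are hypothetical, while the two regular expressions are copied from `tokenize` and `to_html`.

```ruby
# Hypothetical helper: split HTML into [:tag, ...] and [:text, ...] pairs
# using the same tag-soup pattern as RubyPants#tokenize.
def tokenize_html(html)
  tag_soup = /([^<]*)(<[^>]*>)/
  tokens = []
  prev_end = 0
  html.scan(tag_soup) do
    text, tag = $1, $2
    tokens << [:text, text] unless text.empty?
    tokens << [:tag, tag]
    prev_end = $~.end(0)
  end
  # Anything after the last tag is trailing text.
  tokens << [:text, html[prev_end..-1]] if prev_end < html.size
  tokens
end

# Hypothetical helper: yield each text run to the block unless it sits
# inside <pre>, <code>, <kbd>, <script> or <math>; tags pass through as-is.
def transform_outside_code(html)
  in_pre = false
  tokenize_html(html).map do |kind, value|
    if kind == :tag
      in_pre = ($1 != "/") if value =~ %r!<(/?)(?:pre|code|kbd|script|math)[\s>]!
      value
    else
      in_pre ? value : yield(value)
    end
  end.join
end

tokens = tokenize_html('<p>He said "hi" <code>x--y</code> ok</p>')
tokens.each { |kind, value| puts "#{kind}: #{value.inspect}" }

converted = transform_outside_code('<p>a--b <code>x--y</code></p>') do |text|
  text.gsub('--', '&#8212;')
end
puts converted
```

Like the original, this sketch keeps a single flag rather than a stack, so it assumes the protected tags are not nested inside one another.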
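
To compare the three dash modes and the backslash escapes side by side, here is a small standalone sketch. The helper names `educate_dashes_for` and `escape_literals` are hypothetical and only reuse the substitutions from `educate_dashes`, `educate_dashes_oldschool`, `educate_dashes_inverted` and `process_escapes`; only three of the six escapes are shown.

```ruby
# Hypothetical helper: apply one of the three dash conventions.
# The longer "---" must be replaced before "--" in the two-dash modes.
def educate_dashes_for(str, mode)
  case mode
  when :normal    then str.gsub('--', '&#8212;')
  when :oldschool then str.gsub('---', '&#8212;').gsub('--', '&#8211;')
  when :inverted  then str.gsub('---', '&#8211;').gsub('--', '&#8212;')
  else str
  end
end

# Hypothetical helper: turn backslash escapes into decimal entities so
# that later quote processing leaves them alone.
def escape_literals(str)
  str.gsub('\\\\', '&#92;').gsub('\\"', '&#34;').gsub("\\'", '&#39;')
end

sample = 'pages 10--20 --- done'
[:normal, :oldschool, :inverted].each do |mode|
  puts "#{mode}: #{educate_dashes_for(sample, mode)}"
end

# %q keeps the backslashes literally, as they would appear in a post.
puts escape_literals(%q(6\'2\" tall))
```

In `:normal` mode a run of three hyphens becomes an em-dash followed by a stray hyphen, which is one reason to pick `:oldschool` or `:inverted` when both dash lengths appear in the text.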