#
# = RubyPants -- SmartyPants ported to Ruby
#
# Ported by Christian Neukirchen <mailto:chneukirchen@gmail.com>
#   Copyright (C) 2004 Christian Neukirchen
#
# Incorporates ideas, comments and documentation by Chad Miller
#   Copyright (C) 2004 Chad Miller
#
# Original SmartyPants by John Gruber
#   Copyright (C) 2003 John Gruber
#

#
# = RubyPants -- SmartyPants ported to Ruby
#
# == Synopsis
#
# RubyPants is a Ruby port of the smart-quotes library SmartyPants.
#
# The original "SmartyPants" is a free web publishing plug-in for
# Movable Type, Blosxom, and BBEdit that easily translates plain ASCII
# punctuation characters into "smart" typographic punctuation HTML
# entities.
#
#
# == Description
#
# RubyPants can perform the following transformations:
#
# * Straight quotes (<tt>"</tt> and <tt>'</tt>) into "curly" quote
#   HTML entities
# * Backtick-style quotes (<tt>``like this''</tt>) into "curly" quote
#   HTML entities
# * Dashes (<tt>--</tt> and <tt>---</tt>) into en- and em-dash
#   entities
# * Three consecutive dots (<tt>...</tt> or <tt>. . .</tt>) into an
#   ellipsis entity
#
# This means you can write, edit, and save your posts using plain old
# ASCII straight quotes, plain dashes, and plain dots, but your
# published posts (and final HTML output) will appear with smart
# quotes, em-dashes, and proper ellipses.
#
# RubyPants does not modify characters within <tt><pre></tt>,
# <tt><code></tt>, <tt><kbd></tt>, <tt><math></tt> or
# <tt><script></tt> tag blocks. Typically, these tags are used to
# display text where smart quotes and other "smart punctuation" would
# not be appropriate, such as source code or example markup.
#
#
# == Backslash Escapes
#
# If you need to use literal straight quotes (or plain hyphens and
# periods), RubyPants accepts the following backslash escape sequences
# to force non-smart punctuation. It does so by transforming the
# escape sequence into a decimal-encoded HTML entity:
#
#   \\    \"    \'    \.    \-    \`
#
# This is useful, for example, when you want to use straight quotes as
# foot and inch marks: 6'2" tall; a 17" iMac. (Use <tt>6\'2\"</tt>
# and <tt>17\"</tt>, respectively.)
#
#
# == Algorithmic Shortcomings
#
# One situation in which quotes will get curled the wrong way is when
# apostrophes are used at the start of leading contractions. For
# example:
#
#   'Twas the night before Christmas.
#
# In the case above, RubyPants will turn the apostrophe into an
# opening single-quote, when in fact it should be a closing one. I
# don't think this problem can be solved in the general case; every
# word processor I've tried gets this wrong as well. In such cases,
# it's best to use the proper HTML entity for closing single-quotes
# ("<tt>&#8217;</tt>") by hand.
#
#
# == Bugs
#
# To file bug reports or feature requests (other than for the
# shortcoming described above), please send email to:
# mailto:chneukirchen@gmail.com
#
# If the bug involves quotes being curled the wrong way, please send
# example text to illustrate.
#
#
# == Authors
#
# John Gruber did all of the hard work of writing this software in
# Perl for Movable Type and almost all of this useful documentation.
# Chad Miller ported it to Python to use with Pyblosxom.
#
# Christian Neukirchen provided the Ruby port, as a general-purpose
# library that follows the *Cloth API.
#
#
# == Copyright and License
#
# === SmartyPants license:
#
# Copyright (c) 2003 John Gruber
# (http://daringfireball.net)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
#   notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
#   notice, this list of conditions and the following disclaimer in
#   the documentation and/or other materials provided with the
#   distribution.
#
# * Neither the name "SmartyPants" nor the names of its contributors
#   may be used to endorse or promote products derived from this
#   software without specific prior written permission.
#
# This software is provided by the copyright holders and contributors
# "as is" and any express or implied warranties, including, but not
# limited to, the implied warranties of merchantability and fitness
# for a particular purpose are disclaimed. In no event shall the
# copyright owner or contributors be liable for any direct, indirect,
# incidental, special, exemplary, or consequential damages (including,
# but not limited to, procurement of substitute goods or services;
# loss of use, data, or profits; or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability, or tort (including negligence or otherwise) arising in
# any way out of the use of this software, even if advised of the
# possibility of such damage.
#
# === RubyPants license
#
# RubyPants is a derivative work of SmartyPants and smartypants.py.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
#   notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
#   notice, this list of conditions and the following disclaimer in
#   the documentation and/or other materials provided with the
#   distribution.
#
# This software is provided by the copyright holders and contributors
# "as is" and any express or implied warranties, including, but not
# limited to, the implied warranties of merchantability and fitness
# for a particular purpose are disclaimed. In no event shall the
# copyright owner or contributors be liable for any direct, indirect,
# incidental, special, exemplary, or consequential damages (including,
# but not limited to, procurement of substitute goods or services;
# loss of use, data, or profits; or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability, or tort (including negligence or otherwise) arising in
# any way out of the use of this software, even if advised of the
# possibility of such damage.
#
#
# == Links
#
# John Gruber:: http://daringfireball.net
# SmartyPants:: http://daringfireball.net/projects/smartypants
#
# Chad Miller:: http://web.chad.org
#
# Christian Neukirchen:: http://kronavita.de/chris
#


class RubyPants < String

  # Create a new RubyPants instance with the text in +string+.
  #
  # Allowed elements in the options array:
  #
  # 0  :: do nothing
  # 1  :: enable all, using only em-dash shortcuts
  # 2  :: enable all, using old school en- and em-dash shortcuts (*default*)
  # 3  :: enable all, using inverted old school en- and em-dash shortcuts
  # -1 :: stupefy (translate HTML entities to their ASCII counterparts)
  #
  # If you don't like any of these defaults, you can pass symbols to change
  # RubyPants' behavior:
  #
  # <tt>:quotes</tt>        :: quotes
  # <tt>:backticks</tt>     :: backtick quotes (``double'' only)
  # <tt>:allbackticks</tt>  :: backtick quotes (``double'' and `single')
  # <tt>:dashes</tt>        :: dashes
  # <tt>:oldschool</tt>     :: old school dashes
  # <tt>:inverted</tt>      :: inverted old school dashes
  # <tt>:ellipses</tt>      :: ellipses
  # <tt>:convertquotes</tt> :: convert <tt>&quot;</tt> entities to
  #                            <tt>"</tt> for Dreamweaver users
  # <tt>:stupefy</tt>       :: translate RubyPants HTML entities
  #                            to their ASCII counterparts.
  #
  def initialize(string, options=[2])
    super string
    @options = [*options]
  end
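=begin
Example (a sketch, not part of RubyPants): how the option values
documented above resolve to individual transformations. The helper
resolve_pants_options, the constant DASH_STYLE_FOR_LEVEL and the flag
hash it returns are hypothetical; the precedence is assumed to mirror
the if/elsif chain at the top of to_html (0, then 1, 2, 3, then -1,
then the symbol options).

```ruby
# Hypothetical helper (not part of RubyPants): resolves an options
# array the same way RubyPants#to_html does and returns the flags.
DASH_STYLE_FOR_LEVEL = { 1 => :normal, 2 => :oldschool, 3 => :inverted }.freeze

def resolve_pants_options(options = [2])
  options = [*options]
  flags = { quotes: false, backticks: false, dashes: nil,
            ellipses: false, convert_quotes: false, stupefy: false }

  if options.include?(0)
    # 0 wins over everything else: no transformation at all.
    flags
  elsif (level = [1, 2, 3].find { |n| options.include?(n) })
    # 1, 2 and 3 enable everything and differ only in dash style;
    # as in to_html, the smallest level present takes precedence.
    flags.merge(quotes: true, backticks: true, ellipses: true,
                dashes: DASH_STYLE_FOR_LEVEL[level])
  elsif options.include?(-1)
    flags.merge(stupefy: true)
  else
    # Symbol options. If several dash symbols are given, :inverted
    # beats :oldschool, which beats :dashes (the order to_html checks).
    dashes = nil
    dashes = :normal    if options.include?(:dashes)
    dashes = :oldschool if options.include?(:oldschool)
    dashes = :inverted  if options.include?(:inverted)
    backticks = options.include?(:allbackticks) ? :both : options.include?(:backticks)
    flags.merge(quotes: options.include?(:quotes),
                backticks: backticks,
                dashes: dashes,
                ellipses: options.include?(:ellipses),
                convert_quotes: options.include?(:convertquotes),
                stupefy: options.include?(:stupefy))
  end
end

p resolve_pants_options                      # the default, [2]
p resolve_pants_options([:quotes, :dashes])  # symbols only
```

Note that a numeric level silently overrides any symbols passed
alongside it, because the symbol branch is only reached when no
number matches.
=end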

  # Apply SmartyPants transformations.
  def to_html
    do_quotes = do_backticks = do_dashes = do_ellipses = do_stupefy = nil
    convert_quotes = false

    if @options.include? 0
      # Do nothing.
      return self
    elsif @options.include? 1
      # Do everything, turn all options on.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :normal
    elsif @options.include? 2
      # Do everything, turn all options on, use old school dash shorthand.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :oldschool
    elsif @options.include? 3
      # Do everything, turn all options on, use inverted old school
      # dash shorthand.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :inverted
    elsif @options.include?(-1)
      # Only translate entities back to ASCII.
      do_stupefy = true
    else
      do_quotes =      @options.include? :quotes
      do_backticks =   @options.include? :backticks
      do_backticks =   :both      if @options.include? :allbackticks
      do_dashes =      :normal    if @options.include? :dashes
      do_dashes =      :oldschool if @options.include? :oldschool
      do_dashes =      :inverted  if @options.include? :inverted
      do_ellipses =    @options.include? :ellipses
      convert_quotes = @options.include? :convertquotes
      do_stupefy =     @options.include? :stupefy
    end

    # Parse the HTML
    tokens = tokenize

    # Keep track of when we're inside <pre>, <code>, <kbd>, <script>
    # or <math> tags.
    in_pre = false

    # The result is accumulated here.
    result = ""

    # This is a cheat, used to get some context for one-character
    # tokens that consist of just a quote char. What we do is remember
    # the last character of the previous text token, to use as context
    # to curl single-character quote tokens correctly.
    prev_token_last_char = nil

    tokens.each { |token|
      if token.first == :tag
        result << token[1]
        if token[1] =~ %r!<(/?)(?:pre|code|kbd|script|math)[\s>]!
          in_pre = ($1 != "/")  # Opening or closing tag?
        end
      else
        t = token[1]

        # Remember last char of this token before processing.
        last_char = t[-1].chr

        unless in_pre
          t = process_escapes t

          t.gsub!(/&quot;/, '"')  if convert_quotes

          if do_dashes
            t = educate_dashes t            if do_dashes == :normal
            t = educate_dashes_oldschool t  if do_dashes == :oldschool
            t = educate_dashes_inverted t   if do_dashes == :inverted
          end

          t = educate_ellipses t  if do_ellipses

          # Note: backticks need to be processed before quotes.
          if do_backticks
            t = educate_backticks t
            t = educate_single_backticks t  if do_backticks == :both
          end

          if do_quotes
            if t == "'"
              # Special case: single-character ' token
              if prev_token_last_char =~ /\S/
                t = "&#8217;"
              else
                t = "&#8216;"
              end
            elsif t == '"'
              # Special case: single-character " token
              if prev_token_last_char =~ /\S/
                t = "&#8221;"
              else
                t = "&#8220;"
              end
            else
              # Normal case:
              t = educate_quotes t
            end
          end

          t = stupefy_entities t  if do_stupefy
        end

        prev_token_last_char = last_char
        result << t
      end
    }

    # Done
    result
  end
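=begin
Example (a sketch, not part of RubyPants): the protected-region logic
of the token loop in to_html, isolated from the typography rules. It
assumes tokens in the same [:tag, ...] / [:text, ...] shape that
tokenize returns. The names PROTECTED_TAG and transform_outside_pre,
and passing the transformation as a block, are hypothetical; the
regex is the one to_html uses.

```ruby
# Same tag test as RubyPants#to_html: $1 is "/" for a closing tag.
PROTECTED_TAG = %r!<(/?)(?:pre|code|kbd|script|math)[\s>]!

# Hypothetical helper: applies the given block to text tokens only
# when we are not inside <pre>, <code>, <kbd>, <script> or <math>.
def transform_outside_pre(tokens)
  in_pre = false
  tokens.map do |type, value|
    if type == :tag
      # An opening protected tag switches protection on; a closing one off.
      in_pre = ($1 != "/") if value =~ PROTECTED_TAG
      value
    else
      in_pre ? value : yield(value)
    end
  end.join
end

tokens = [[:text, "a--b "], [:tag, "<code>"], [:text, "c--d"],
          [:tag, "</code>"], [:text, " e--f"]]
puts transform_outside_pre(tokens) { |t| t.gsub("--", "&#8211;") }
```

A single boolean is enough here because protected blocks are assumed
not to nest; a closing tag always switches protection off.
=end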

  protected

  # Return the string after processing the following backslash escape
  # sequences. This is useful if you want to force a "dumb" quote or
  # other character to appear.
  #
  # Escaped are:
  #   \\    \"    \'    \.    \-    \`
  #
  def process_escapes(str)
    str.gsub('\\\\', '&#92;').
      gsub('\"', '&#34;').
      gsub("\\\'", '&#39;').
      gsub('\.', '&#46;').
      gsub('\-', '&#45;').
      gsub('\`', '&#96;')
  end
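=begin
Example (a sketch, not part of RubyPants): a standalone equivalent of
process_escapes, applied to the foot-and-inch example from the
"Backslash Escapes" section. The method name escape_backslashes and
the ESCAPES table are hypothetical, used only because process_escapes
is protected. The resulting numeric entities contain no quote
characters, so the later quote pass leaves them alone.

```ruby
# Hypothetical standalone equivalent of RubyPants#process_escapes.
# Hash order is preserved, so the escaped-backslash sequence is
# replaced first, exactly as in the original method chain.
ESCAPES = {
  '\\\\' => '&#92;', # backslash backslash -> literal backslash
  '\"'   => '&#34;', # backslash + double quote -> straight double quote
  "\\'"  => '&#39;', # backslash + single quote -> straight single quote
  '\.'   => '&#46;', # backslash + period -> period
  '\-'   => '&#45;', # backslash + hyphen -> hyphen
  '\`'   => '&#96;'  # backslash + backtick -> backtick
}.freeze

def escape_backslashes(str)
  ESCAPES.reduce(str) { |s, (seq, entity)| s.gsub(seq, entity) }
end

puts escape_backslashes(%q{He is 6\'2\" tall; a 17\" iMac.})
```

Handling the doubled backslash first means that a backslash, a
second backslash and then a quote is read as an escaped backslash
followed by an ordinary (still curlable) quote.
=end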

  # Return the string, with each instance of "<tt>--</tt>" translated to
  # an em-dash HTML entity.
  #
  def educate_dashes(str)
    str.gsub(/--/, '&#8212;')
  end

  # Return the string, with each instance of "<tt>--</tt>" translated to
  # an en-dash HTML entity, and each "<tt>---</tt>" translated to an
  # em-dash HTML entity.
  #
  def educate_dashes_oldschool(str)
    str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
  end

  # Return the string, with each instance of "<tt>--</tt>" translated
  # to an em-dash HTML entity, and each "<tt>---</tt>" translated to
  # an en-dash HTML entity. There are two reasons for this inversion.
  # First, unlike the en- and em-dash syntax supported by
  # +educate_dashes_oldschool+, it's compatible with existing entries
  # written before SmartyPants 1.1, back when "<tt>--</tt>" was only
  # used for em-dashes. Second, em-dashes are more common than
  # en-dashes, so it makes sense for their shortcut to be the shorter
  # one to type. (Thanks to Aaron Swartz for the idea.)
  #
  def educate_dashes_inverted(str)
    str.gsub(/---/, '&#8211;').gsub(/--/, '&#8212;')
  end
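=begin
Example (a sketch, not part of RubyPants): the three dash modes side
by side, plus the old-school mode with its two passes swapped. This
shows why both the old-school and inverted methods replace "---"
before "--". The names dash_entities and oldschool_swapped and the
two entity constants are hypothetical.

```ruby
EN_DASH_ENTITY = '&#8211;'
EM_DASH_ENTITY = '&#8212;'

# Hypothetical helper; each branch mirrors one educate_dashes* method.
def dash_entities(str, mode)
  case mode
  when :normal    then str.gsub('--', EM_DASH_ENTITY)
  when :oldschool then str.gsub('---', EM_DASH_ENTITY).gsub('--', EN_DASH_ENTITY)
  when :inverted  then str.gsub('---', EN_DASH_ENTITY).gsub('--', EM_DASH_ENTITY)
  else raise ArgumentError, "unknown dash mode: #{mode.inspect}"
  end
end

# Old-school mode with the passes swapped, for comparison: "--" is
# consumed first, so every "---" leaves a stray hyphen behind.
def oldschool_swapped(str)
  str.gsub('--', EN_DASH_ENTITY).gsub('---', EM_DASH_ENTITY)
end

text = '1990--1999 --- a decade'
[:normal, :oldschool, :inverted].each { |mode| puts "#{mode}: #{dash_entities(text, mode)}" }
puts "swapped: #{oldschool_swapped(text)}"
```

The :normal mode only knows "--", so a "---" in its input becomes an
em-dash followed by a leftover hyphen.
=end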

  # Return the string, with each instance of "<tt>...</tt>" translated
  # to an ellipsis HTML entity. Also converts the case where there are
  # spaces between the dots.
  #
  def educate_ellipses(str)
    str.gsub('...', '&#8230;').gsub('. . .', '&#8230;')
  end

  # Return the string, with "<tt>``backticks''</tt>"-style double quotes
  # translated into HTML curly quote entities.
  #
  def educate_backticks(str)
    str.gsub("``", '&#8220;').gsub("''", '&#8221;')
  end

  # Return the string, with "<tt>`backticks'</tt>"-style single quotes
  # translated into HTML curly quote entities.
  #
  def educate_single_backticks(str)
    str.gsub("`", '&#8216;').gsub("'", '&#8217;')
  end
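=begin
Example (a sketch, not part of RubyPants): why the :allbackticks mode
runs the double-backtick pass before the single-backtick pass, as
to_html does. The helper names double_backticks and single_backticks
are hypothetical copies of educate_backticks and
educate_single_backticks.

```ruby
# Hypothetical copy of RubyPants#educate_backticks.
def double_backticks(str)
  str.gsub('``', '&#8220;').gsub("''", '&#8221;')
end

# Hypothetical copy of RubyPants#educate_single_backticks.
def single_backticks(str)
  str.gsub('`', '&#8216;').gsub("'", '&#8217;')
end

text = %q{``Hello,'' she said, `quietly'.}
puts single_backticks(double_backticks(text))  # order used by to_html
puts double_backticks(single_backticks(text))  # reversed, for comparison
```

In the reversed order, every backtick and apostrophe has already been
turned into a single curly quote, so each doubled pair comes out as
two single quotes instead of one double quote.
=end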

  # Return the string, with "educated" curly quote HTML entities.
  #
  def educate_quotes(str)
    punct_class = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^_`{|}~]'

    str = str.dup

    # Special case if the very first character is a quote followed by
    # punctuation at a non-word-break. Close the quotes by brute
    # force:
    str.gsub!(/^'(?=#{punct_class}\B)/, '&#8217;')
    str.gsub!(/^"(?=#{punct_class}\B)/, '&#8221;')

    # Special case for double sets of quotes, e.g.:
    #   <p>He said, "'Quoted' words in a larger quote."</p>
    str.gsub!(/"'(?=\w)/, '&#8220;&#8216;')
    str.gsub!(/'"(?=\w)/, '&#8216;&#8220;')

    # Special case for decade abbreviations (the '80s):
    str.gsub!(/'(?=\d\ds)/, '&#8217;')

    close_class = %![^\ \t\r\n\\[\{\(\-]!
    dec_dashes = '&#8211;|&#8212;'

    # Get most opening single quotes:
    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)'(?=\w)/,
             '\1&#8216;')
    # Single closing quotes:
    str.gsub!(/(#{close_class})'/, '\1&#8217;')
    str.gsub!(/'(\s|s\b|$)/, '&#8217;\1')
    # Any remaining single quotes should be opening ones:
    str.gsub!(/'/, '&#8216;')

    # Get most opening double quotes:
    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)"(?=\w)/,
             '\1&#8220;')
    # Double closing quotes:
    str.gsub!(/(#{close_class})"/, '\1&#8221;')
    str.gsub!(/"(\s|s\b|$)/, '&#8221;\1')
    # Any remaining quotes should be opening ones:
    str.gsub!(/"/, '&#8220;')

    str
  end
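=begin
Example (a sketch, not part of RubyPants): a heavily simplified
stand-in for educate_quotes that keeps only its three core rules. A
quote opens after whitespace (or at the start) when followed by a word
character, closes after any other non-whitespace character, and
anything left over opens. The decade, double-set, punctuation and
entity special cases are deliberately dropped. The names CURLY_QUOTES
and curl_quotes are hypothetical. The 'Twas line reproduces the
shortcoming described under "Algorithmic Shortcomings".

```ruby
# Hypothetical table: opening and closing entity for each quote char.
CURLY_QUOTES = {
  "'" => ['&#8216;', '&#8217;'],
  '"' => ['&#8220;', '&#8221;']
}.freeze

# Hypothetical, simplified stand-in for RubyPants#educate_quotes.
def curl_quotes(str)
  CURLY_QUOTES.reduce(str) do |s, (quote, (opening, closing))|
    q = Regexp.escape(quote)
    # Rule 1: after whitespace (or at the start) and before a word char.
    s = s.gsub(/(\A|\s)#{q}(?=\w)/) { "#{$1}#{opening}" }
    # Rule 2: directly after any other non-whitespace character.
    s = s.gsub(/(\S)#{q}/) { "#{$1}#{closing}" }
    # Rule 3: whatever is left opens.
    s.gsub(quote, opening)
  end
end

puts curl_quotes(%q{He said "hello" to 'them'.})
puts curl_quotes(%q{'Twas the night before Christmas.})
puts curl_quotes("don't")
```

Because rule 1 only looks one character back and one character
ahead, a leading contraction is indistinguishable from an opening
quote, which is why the full method falls back to hand-written
entities for that case.
=end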

  # Return the string, with each RubyPants HTML entity translated to
  # its ASCII counterpart.
  #
  # Note: This is not reversible (but exactly the same as in SmartyPants)
  #
  def stupefy_entities(str)
    str.
      gsub(/&#8211;/, '-').      # en-dash
      gsub(/&#8212;/, '--').     # em-dash

      gsub(/&#8216;/, "'").      # open single quote
      gsub(/&#8217;/, "'").      # close single quote

      gsub(/&#8220;/, '"').      # open double quote
      gsub(/&#8221;/, '"').      # close double quote

      gsub(/&#8230;/, '...')     # ellipsis
  end
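=begin
Example (a sketch, not part of RubyPants): why stupefying is not
reversible. The table below is a hypothetical standalone copy of the
substitutions in stupefy_entities. Text educated in old-school mode
from "1990--1999 --- ..." comes back as "1990-1999 -- ...", and the
opening/closing distinction between curly quotes is lost.

```ruby
# Hypothetical standalone copy of RubyPants#stupefy_entities.
STUPEFY = {
  '&#8211;' => '-',    # en-dash
  '&#8212;' => '--',   # em-dash
  '&#8216;' => "'",    # open single quote
  '&#8217;' => "'",    # close single quote
  '&#8220;' => '"',    # open double quote
  '&#8221;' => '"',    # close double quote
  '&#8230;' => '...'   # ellipsis
}.freeze

def stupefy(str)
  STUPEFY.reduce(str) { |s, (entity, ascii)| s.gsub(entity, ascii) }
end

educated = '1990&#8211;1999 &#8212; &#8220;the &#8217;90s&#8221;&#8230;'
puts stupefy(educated)
```
=end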

  # Return an array of the tokens comprising the string. Each token is
  # either a tag (possibly with nested tags contained therein, such
  # as <tt><a href="<MTFoo>"></tt>) or a run of text between
  # tags. Each element of the array is a two-element array; the first
  # is either :tag or :text; the second is the actual value.
  #
  # Based on the <tt>_tokenize()</tt> subroutine from Brad Choate's
  # MTRegex plugin. <http://www.bradchoate.com/past/mtregex.php>
  #
  # This is actually the easier variant using tag_soup, as used by
  # Chad Miller in the Python port of SmartyPants. Note that the
  # +tag_soup+ regex ends a tag at the first <tt>></tt>, so this
  # variant does not keep nested tags like the example above together.
  #
  def tokenize
    tag_soup = /([^<]*)(<[^>]*>)/

    tokens = []

    prev_end = 0
    scan(tag_soup) {
      tokens << [:text, $1]  if $1 != ""
      tokens << [:tag, $2]

      prev_end = $~.end(0)
    }

    if prev_end < size
      tokens << [:text, self[prev_end..-1]]
    end

    tokens
  end
end
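=begin
Example (a sketch, not part of RubyPants): the tag_soup tokenizer as a
standalone method, so its output can be inspected directly. The names
TAG_SOUP and tokenize_html are hypothetical; the regex and the
trailing-text handling follow RubyPants#tokenize. The second call
shows the nested-tag limitation: the tag ends at the first ">".

```ruby
# Text up to the next '<', then everything up to the next '>'.
TAG_SOUP = /([^<]*)(<[^>]*>)/

# Hypothetical standalone version of RubyPants#tokenize.
def tokenize_html(html)
  tokens = []
  prev_end = 0
  html.scan(TAG_SOUP) do
    match = Regexp.last_match
    tokens << [:text, match[1]] unless match[1].empty?
    tokens << [:tag, match[2]]
    prev_end = match.end(0)
  end
  # Text after the last tag is never matched by TAG_SOUP, so add it here.
  tokens << [:text, html[prev_end..-1]] if prev_end < html.size
  tokens
end

p tokenize_html('<p>He said "hi" -- <code>x--y</code></p> done')
p tokenize_html('<a href="<MTFoo>">x</a>')
```
=end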