#
# = RubyPants - SmartyPants ported to Ruby
#
# Ported by Christian Neukirchen <mailto:chneukirchen@gmail.com>
#   Copyright (C) 2004 Christian Neukirchen
#
# Incorporates ideas, comments and documentation by Chad Miller
#   Copyright (C) 2004 Chad Miller
#
# Original SmartyPants by John Gruber
#   Copyright (C) 2003 John Gruber
#

#
# = RubyPants - SmartyPants ported to Ruby
#
# == Synopsis
#
# RubyPants is a Ruby port of the smart-quotes library SmartyPants.
#
# The original "SmartyPants" is a free web publishing plug-in for
# Movable Type, Blosxom, and BBEdit that easily translates plain ASCII
# punctuation characters into "smart" typographic punctuation HTML
# entities.
#
#
# == Description
#
# RubyPants can perform the following transformations:
#
# * Straight quotes (<tt>"</tt> and <tt>'</tt>) into "curly" quote
#   HTML entities
# * Backticks-style quotes (<tt>``like this''</tt>) into "curly" quote
#   HTML entities
# * Dashes (<tt>--</tt> and <tt>---</tt>) into en- and em-dash
#   entities
# * Three consecutive dots (<tt>...</tt> or <tt>. . .</tt>) into an
#   ellipsis entity
#
# This means you can write, edit, and save your posts using plain old
# ASCII straight quotes, plain dashes, and plain dots, but your
# published posts (and final HTML output) will appear with smart
# quotes, em-dashes, and proper ellipses.
#
# RubyPants does not modify characters within <tt><pre></tt>,
# <tt><code></tt>, <tt><kbd></tt>, <tt><math></tt> or
# <tt><script></tt> tag blocks. Typically, these tags are used to
# display text where smart quotes and other "smart punctuation" would
# not be appropriate, such as source code or example markup.
#
#
# == Backslash Escapes
#
# If you need to use literal straight quotes (or plain hyphens and
# periods), RubyPants accepts the following backslash escape sequences
# to force non-smart punctuation. It does so by transforming the
# escape sequence into a decimal-encoded HTML entity:
#
#   \\    \"    \'    \.    \-    \`
#
# This is useful, for example, when you want to use straight quotes as
# foot and inch marks: 6'2" tall; a 17" iMac. (Write <tt>6\'2\"</tt>
# and <tt>17\"</tt>, respectively.)
#
#
# == Algorithmic Shortcomings
#
# One situation in which quotes will get curled the wrong way is when
# apostrophes are used at the start of leading contractions. For
# example:
#
#   'Twas the night before Christmas.
#
# In the case above, RubyPants will turn the apostrophe into an
# opening single-quote, when in fact it should be a closing one. I
# don't think this problem can be solved in the general case--every
# word processor I've tried gets this wrong as well. In such cases,
# it's best to use the proper HTML entity for closing single-quotes
# (``&#8217;``) by hand.
#
#
# == Bugs
#
# To file bug reports or feature requests (except for the shortcoming
# described above), please send email to:
# mailto:chneukirchen@gmail.com
#
# If the bug involves quotes being curled the wrong way, please send
# example text to illustrate.
#
#
# == Authors
#
# John Gruber did all of the hard work of writing this software in
# Perl for Movable Type and almost all of this useful documentation.
# Chad Miller ported it to Python to use with Pyblosxom.
#
# Christian Neukirchen provided the Ruby port, as a general-purpose
# library that follows the *Cloth API.
#
#
# == Copyright and License
#
# === SmartyPants license:
#
# Copyright (c) 2003 John Gruber
# (http://daringfireball.net)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
#   notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
#   notice, this list of conditions and the following disclaimer in
#   the documentation and/or other materials provided with the
#   distribution.
#
# * Neither the name "SmartyPants" nor the names of its contributors
#   may be used to endorse or promote products derived from this
#   software without specific prior written permission.
#
# This software is provided by the copyright holders and contributors
# "as is" and any express or implied warranties, including, but not
# limited to, the implied warranties of merchantability and fitness
# for a particular purpose are disclaimed. In no event shall the
# copyright owner or contributors be liable for any direct, indirect,
# incidental, special, exemplary, or consequential damages (including,
# but not limited to, procurement of substitute goods or services;
# loss of use, data, or profits; or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability, or tort (including negligence or otherwise) arising in
# any way out of the use of this software, even if advised of the
# possibility of such damage.
#
# === RubyPants license
#
# RubyPants is a derivative work of SmartyPants and smartypants.py.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
#   notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
#   notice, this list of conditions and the following disclaimer in
#   the documentation and/or other materials provided with the
#   distribution.
#
# This software is provided by the copyright holders and contributors
# "as is" and any express or implied warranties, including, but not
# limited to, the implied warranties of merchantability and fitness
# for a particular purpose are disclaimed. In no event shall the
# copyright owner or contributors be liable for any direct, indirect,
# incidental, special, exemplary, or consequential damages (including,
# but not limited to, procurement of substitute goods or services;
# loss of use, data, or profits; or business interruption) however
# caused and on any theory of liability, whether in contract, strict
# liability, or tort (including negligence or otherwise) arising in
# any way out of the use of this software, even if advised of the
# possibility of such damage.
#
#
# == Links
#
# John Gruber::           http://daringfireball.net
# SmartyPants::           http://daringfireball.net/projects/smartypants
#
# Chad Miller::           http://web.chad.org
#
# Christian Neukirchen::  http://kronavita.de/chris
#


class RubyPants < String
  VERSION = "0.1"

  # Allowed elements in the options array:
  #
  # 0  :: do nothing
  # 1  :: set all
  # 2  :: set all, using old school en- and em-dash shortcuts
  # 3  :: set all, using inverted old school en- and em-dash shortcuts
  # -1 :: stupefy (translate HTML entities to their ASCII counterparts)
  #
  # <tt>:quotes</tt>        :: quotes
  # <tt>:backticks</tt>     :: backtick quotes (``double'' only)
  # <tt>:allbackticks</tt>  :: backtick quotes (``double'' and `single')
  # <tt>:dashes</tt>        :: dashes
  # <tt>:oldschool</tt>     :: old school dashes
  # <tt>:inverted</tt>      :: inverted old school dashes
  # <tt>:ellipses</tt>      :: ellipses
  # <tt>:convertquotes</tt> :: convert <tt>&quot;</tt> entities to
  #                            <tt>"</tt> for Dreamweaver users
  # <tt>:stupefy</tt>       :: translate SmartyPants HTML entities
  #                            to their ASCII counterparts.
  #
  def initialize(string, options=[2])
    super string
    @options = [*options]
  end
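
=begin
The constructor stores its options as <tt>[*options]</tt>, so a single flag
and an array of flags end up in the same shape. The following is a minimal
stand-alone sketch of that idiom; +normalize_options+ is a hypothetical helper
name used only for illustration and is not part of RubyPants.

```ruby
# Hypothetical helper (not part of RubyPants): mirrors `@options = [*options]`.
def normalize_options(options)
  [*options]
end

p normalize_options(2)                   # a bare flag is wrapped in an array
p normalize_options([:quotes, :dashes])  # an array is kept as it is
p normalize_options(nil)                 # nil becomes an empty array
```

Because the result is always an array, +to_html+ can test every flag with
<tt>@options.include?</tt> regardless of how the caller passed it.
=end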

  # Apply SmartyPants transformations.
  def to_html
    do_quotes = do_backticks = do_dashes = do_ellipses = do_stupefy = nil
    convert_quotes = false

    if @options.include? 0
      # Do nothing.
      return self
    elsif @options.include? 1
      # Do everything, turn all options on.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :normal
    elsif @options.include? 2
      # Do everything, turn all options on, use old school dash shorthand.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :oldschool
    elsif @options.include? 3
      # Do everything, turn all options on, use inverted old school
      # dash shorthand.
      do_quotes = do_backticks = do_ellipses = true
      do_dashes = :inverted
    elsif @options.include?(-1)
      do_stupefy = true
    else
      do_quotes      = @options.include? :quotes
      do_backticks   = @options.include? :backticks
      do_backticks   = :both if @options.include? :allbackticks
      do_dashes      = :normal if @options.include? :dashes
      do_dashes      = :oldschool if @options.include? :oldschool
      do_dashes      = :inverted if @options.include? :inverted
      do_ellipses    = @options.include? :ellipses
      convert_quotes = @options.include? :convertquotes
      do_stupefy     = @options.include? :stupefy
    end

    # Parse the HTML
    tokens = tokenize

    # Keep track of when we're inside a tag block whose content must not
    # be modified (<pre>, <code>, <kbd>, <script> or <math>).
    in_pre = false

    # The result is accumulated here.
    result = ""

    # This is a cheat, used to get some context for one-character
    # tokens that consist of just a quote char. What we do is remember
    # the last character of the previous text token, to use as context
    # to curl single-character quote tokens correctly.
    prev_token_last_char = ""

    tokens.each { |token|
      if token.first == :tag
        result << token[1]
        if token[1] =~ %r!<(/?)(?:pre|code|kbd|script|math)[\s>]!
          in_pre = ($1 != "/") # Opening or closing tag?
        end
      else
        t = token[1]

        # Remember last char of this token before processing.
        last_char = t[-1]

        unless in_pre
          t = process_escapes t

          t.gsub!(/&quot;/, '"') if convert_quotes

          if do_dashes
            t = educate_dashes t if do_dashes == :normal
            t = educate_dashes_oldschool t if do_dashes == :oldschool
            t = educate_dashes_inverted t if do_dashes == :inverted
          end

          t = educate_ellipses t if do_ellipses

          # Note: backticks need to be processed before quotes.
          if do_backticks
            t = educate_backticks t
            t = educate_single_backticks t if do_backticks == :both
          end

          if do_quotes
            if t == "'"
              # Special case: single-character ' token
              if prev_token_last_char =~ /\S/
                t = "&#8217;"
              else
                t = "&#8216;"
              end
            elsif t == '"'
              # Special case: single-character " token
              if prev_token_last_char =~ /\S/
                t = "&#8221;"
              else
                t = "&#8220;"
              end
            else
              # Normal case:
              t = educate_quotes t
            end
          end

          t = stupefy_entities t if do_stupefy
        end

        prev_token_last_char = last_char
        result << t
      end
    }

    # Done
    result
  end
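
=begin
When a text token consists of nothing but one quote character, +to_html+ has
no surrounding text to look at, so it falls back on the last character of the
previous text token. A minimal stand-alone sketch of that decision follows;
+curl_lone_quote+ is a hypothetical name chosen for this illustration, not a
method of RubyPants.

```ruby
# Hypothetical helper (not part of RubyPants): pick an entity for a token
# that is exactly one quote character, given the last character of the
# previous text token ("" when there is none).
def curl_lone_quote(quote, prev_char)
  # A preceding non-whitespace character means the quote closes something.
  closing = prev_char =~ /\S/
  case quote
  when "'" then closing ? "&#8217;" : "&#8216;"
  when '"' then closing ? "&#8221;" : "&#8220;"
  else quote
  end
end

puts curl_lone_quote('"', "d")  # previous text ended in a word: closing quote
puts curl_lone_quote('"', " ")  # previous text ended in a space: opening quote
puts curl_lone_quote("'", "")   # no previous text at all: opening quote
```

This situation typically arises when markup sits right next to a quote, as in
<tt><em>word</em>"</tt>, where the quote becomes a token of its own.
=end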

  protected

  # Return the string after processing the following backslash
  # escape sequences. This is useful if you want to force a "dumb" quote
  # or other character to appear.
  #
  # The escape sequences are:
  #   \\    \"    \'    \.    \-    \`
  #
  def process_escapes(str)
    str.gsub(/\\\\/, '&#92;').
      gsub(/\\"/, '&#34;').
      gsub(/\\'/, '&#39;').
      gsub(/\\\./, '&#46;').
      gsub(/\\-/, '&#45;').
      gsub(/\\`/, '&#96;')
  end
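
=begin
The escape handling maps each backslash sequence to a decimal HTML entity so
that later passes no longer see the raw character. The sketch below shows the
same mapping as a single pass over a lookup table instead of six chained
+gsub+ calls. It is an alternative formulation written for illustration;
+ESCAPE_ENTITIES+ and +escape_sketch+ are hypothetical names, and it is
expected (not proven) to agree with +process_escapes+ on ordinary input.

```ruby
# Hypothetical stand-alone variant (not part of RubyPants).
# Maps the escaped character to its decimal HTML entity.
ESCAPE_ENTITIES = {
  "\\" => "&#92;",
  '"'  => "&#34;",
  "'"  => "&#39;",
  "."  => "&#46;",
  "-"  => "&#45;",
  "`"  => "&#96;"
}.freeze

def escape_sketch(str)
  # Match a backslash followed by one escapable character and look it up.
  str.gsub(/\\([\\"'.\-`])/) { ESCAPE_ENTITIES[$1] }
end

puts escape_sketch(%q{6\'2\" tall})  # prints 6&#39;2&#34; tall
```

Scanning left to right means a doubled backslash is consumed as one unit
before the character after it is considered, which is the same reason the
chained version replaces <tt>\\\\</tt> first.
=end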

  # Return the string, with each instance of "<tt>--</tt>" translated
  # to an em-dash HTML entity.
  #
  def educate_dashes(str)
    str.gsub(/--/, '&#8212;')
  end

  # Return the string, with each instance of "<tt>--</tt>" translated
  # to an en-dash HTML entity, and each "<tt>---</tt>" translated to an
  # em-dash HTML entity.
  #
  def educate_dashes_oldschool(str)
    str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
  end

  # Return the string, with each instance of "<tt>--</tt>" translated
  # to an em-dash HTML entity, and each "<tt>---</tt>" translated to
  # an en-dash HTML entity. Two reasons why: First, unlike the en- and
  # em-dash syntax supported by +educate_dashes_oldschool+, it's
  # compatible with existing entries written before SmartyPants 1.1,
  # back when "<tt>--</tt>" was only used for em-dashes. Second,
  # em-dashes are more common than en-dashes, and so it sort of makes
  # sense that the shortcut should be shorter to type. (Thanks to
  # Aaron Swartz for the idea.)
  #
  def educate_dashes_inverted(str)
    str.gsub(/---/, '&#8211;').gsub(/--/, '&#8212;')
  end
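
=begin
Both the old school and the inverted variants replace <tt>---</tt> before
<tt>--</tt>. The order matters, because a two-dash pattern applied first
would eat the front of every three-dash run. A small stand-alone sketch of
the effect; the two method names are hypothetical and exist only for this
comparison.

```ruby
# Hypothetical stand-alone copies (not part of RubyPants).

# Same order as educate_dashes_oldschool: longest pattern first.
def dashes_oldschool_sketch(str)
  str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
end

# Deliberately wrong order, to show what goes missing.
def dashes_wrong_order(str)
  str.gsub(/--/, '&#8211;').gsub(/---/, '&#8212;')
end

puts dashes_oldschool_sketch("pages 3--5 --- roughly")
# prints: pages 3&#8211;5 &#8212; roughly

puts dashes_wrong_order("a --- b")
# prints: a &#8211;- b   (an en-dash plus a stray hyphen, no em-dash)
```
=end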

  # Return the string, with each instance of "<tt>...</tt>" translated
  # to an ellipsis HTML entity. Also converts the case where there are
  # spaces between the dots.
  #
  def educate_ellipses(str)
    str.gsub('...', '&#8230;').gsub('. . .', '&#8230;')
  end

  # Return the string, with <tt>``backticks''</tt>-style double quotes
  # translated into HTML curly quote entities.
  #
  def educate_backticks(str)
    str.gsub("``", '&#8220;').gsub("''", '&#8221;')
  end

  # Return the string, with <tt>`backticks'</tt>-style single quotes
  # translated into HTML curly quote entities.
  #
  def educate_single_backticks(str)
    str.gsub("`", '&#8216;').gsub("'", '&#8217;')
  end

  # Return the string, with "educated" curly quote HTML entities.
  #
  def educate_quotes(str)
    punct_class = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^_`{|}~]'

    str = str.dup

    # Special case if the very first character is a quote followed by
    # punctuation at a non-word-break. Close the quotes by brute
    # force:
    str.gsub!(/^'(?=#{punct_class}\B)/, '&#8217;')
    str.gsub!(/^"(?=#{punct_class}\B)/, '&#8221;')

    # Special case for double sets of quotes, e.g.:
    #   <p>He said, "'Quoted' words in a larger quote."</p>
    str.gsub!(/"'(?=\w)/, '&#8220;&#8216;')
    str.gsub!(/'"(?=\w)/, '&#8216;&#8220;')

    # Special case for decade abbreviations (the '80s):
    str.gsub!(/'(?=\d\ds)/, '&#8217;')

    close_class = %![^\ \t\r\n\\[\{\(\-]!
    dec_dashes = '&#8211;|&#8212;'

    # Get most opening single quotes:
    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)'(?=\w)/,
              '\1&#8216;')
    # Single closing quotes:
    str.gsub!(/(#{close_class})'/, '\1&#8217;')
    str.gsub!(/'(\s|s\b|$)/, '&#8217;\1')
    # Any remaining single quotes should be opening ones:
    str.gsub!(/'/, '&#8216;')

    # Get most opening double quotes:
    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)"(?=\w)/,
              '\1&#8220;')
    # Double closing quotes:
    str.gsub!(/(#{close_class})"/, '\1&#8221;')
    str.gsub!(/"(\s|s\b|$)/, '&#8221;\1')
    # Any remaining quotes should be opening ones:
    str.gsub!(/"/, '&#8220;')

    str
  end

  # Return the string, with each SmartyPants HTML entity translated to
  # its ASCII counterpart.
  #
  def stupefy_entities(str)
    str.
      gsub(/&#8211;/, '-').   # en-dash
      gsub(/&#8212;/, '--').  # em-dash

      gsub(/&#8216;/, "'").   # open single quote
      gsub(/&#8217;/, "'").   # close single quote

      gsub(/&#8220;/, '"').   # open double quote
      gsub(/&#8221;/, '"').   # close double quote

      gsub(/&#8230;/, '...')  # ellipsis
  end

  # Return an array of the tokens comprising the string. Each token is
  # either a tag (possibly with nested tags contained therein, such
  # as <tt><a href="<MTFoo>"></tt>), or a run of text between
  # tags. Each element of the array is a two-element array; the first
  # is either :tag or :text; the second is the actual value.
  #
  # Based on the <tt>_tokenize()</tt> subroutine from Brad Choate's
  # MTRegex plugin. <http://www.bradchoate.com/past/mtregex.php>
  #
  # This is actually the easier variant using tag_soup, as used by
  # Chad Miller in the Python port of SmartyPants.
  #
  def tokenize
    tag_soup = /([^<]*)(<[^>]*>)/

    tokens = []

    prev_end = 0
    scan(tag_soup) {
      tokens << [:text, $1] if $1 != ""
      tokens << [:tag, $2]

      prev_end = $~.end(0)
    }

    if prev_end < size
      tokens << [:text, self[prev_end..-1]]
    end

    tokens
  end
end
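
=begin
The tokenizer splits the input into alternating text and tag tokens with a
single "tag soup" regular expression, then appends whatever text follows the
last tag. The sketch below repeats that logic as a free-standing function so
its output can be inspected directly; +tokenize_sketch+ is a hypothetical
name, and the function takes the string as an argument because it lives
outside the String subclass.

```ruby
# Hypothetical stand-alone copy of the tokenizing logic (not part of RubyPants).
def tokenize_sketch(html)
  tokens = []
  prev_end = 0

  # Each match is "text up to the next tag" followed by the tag itself.
  html.scan(/([^<]*)(<[^>]*>)/) do
    tokens << [:text, $1] unless $1.empty?
    tokens << [:tag, $2]
    prev_end = $~.end(0)
  end

  # Anything after the last tag is one final text token.
  tokens << [:text, html[prev_end..-1]] if prev_end < html.size
  tokens
end

tokenize_sketch('<p>Say "hi" to <code>x--y</code> now</p>').each { |tok| p tok }
```

Only the <tt>:text</tt> tokens are candidates for smart punctuation, and
+to_html+ additionally skips those that fall between an opening and closing
<tt><code></tt>-style tag, which is why <tt>x--y</tt> above would be left
alone.
=end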