Significant overhaul, details follow. Closes #39

* Convert book source to DocBook XML 5.0.
* Write XSL stylesheets to convert the XML to HTML5.
* Delete railroad diagrams
* Add Glossary.
* Copy-edit messages.
* Change References to Bibliography.
* Switch to CodeRay for example highlighting.
* Temporarily remove Sinatra integration.
commit df3378fa142fff557e5266b8a94f00fe074d59ab 1 parent 7399338
@runpaint authored
Showing with 186 additions and 17,384 deletions.
  1. +4 −0 .gitignore
  2. +74 −65 Rakefile
  3. +0 −9 apache.mustache
  4. +34 −0 book.xml
  5. +0 −41 chapter.mustache
  6. +0 −3  chapter_css.mustache
  7. +0 −463 chapters/classes.html
  8. +0 −524 chapters/closures.html
  9. +0 −711 chapters/encoding.html
  10. +0 −297 chapters/enumerables.html
  11. +0 −1,025 chapters/files.html
  12. +0 −1,497 chapters/flow.html
  13. +0 −1,273 chapters/io.html
  14. +0 −102 chapters/keywords.html
  15. +0 −1,526 chapters/messages.html
  16. +0 −1,238 chapters/methods.html
  17. +0 −372 chapters/modules.html
  18. +0 −1,060 chapters/numerics.html
  19. +0 −704 chapters/objects.html
  20. +0 −413 chapters/programs.html
  21. +0 −368 chapters/punctuation.html
  22. +0 −130 chapters/references.html
  23. +0 −1,135 chapters/regexps.html
  24. +0 −1,748 chapters/strings.html
  25. +0 −1,380 chapters/variables.html
  26. +0 −1  css/chapter.mustache
  27. +0 −1  css/index.mustache
  28. +0 −1  css/main.mustache
  29. +0 −75 css/pygments.css
  30. +0 −72 css/syntax.css
  31. +0 −1  css/syntax.mustache
  32. +0 −31 css/toc.css
  33. +0 −1  css/toc.mustache
  34. +1 −0  docbook
  35. +0 −20 error.mustache
  36. +0 −17 errors/text-303.html
  37. +0 −1  example.mustache
  38. +4 −2 examples/_by-selector.rb
  39. +1 −0  examples/and-and-nil-receiver.txt
  40. +4 −11 examples/aref-music.rb
  41. +12 −0 examples/argf-shell.txt
  42. +14 −0 examples/argv-shell.txt
  43. +0 −3  examples/bang-messages.rb
  44. +4 −2 examples/call-selector.rb
  45. +8 −4 examples/each_-selector.rb
  46. +3 −6 examples/implicit-receiver-for-private-method.rb
  47. +0 −6 examples/message-chaining-inconsistent-return-value.rb
  48. +0 −2  examples/message-chaining-return-self.rb
  49. +2 −1  examples/message-chaining.rb
  50. +2 −1  examples/message-expressions-omit-parentheses-syntaxerror.rb
  51. +1 −0  examples/non-block-local-variables-clo.rb
  52. +0 −7 examples/non-block-local-variables-example.rb
  53. 0  figures/non-block-local-variables-example.rb → examples/non-block-local-variables.rb
  54. +3 −5 examples/object-send.rb
  55. +0 −5 examples/parentheses-disambiguate-bare-identifiers.rb
  56. +4 −2 examples/parentheses-omitted-in-message-expressions.rb
  57. +1 −1  examples/predicate-messages.rb
  58. +6 −3 examples/receiver-selector-arguments-syntax.rb
  59. +1 −1  examples/respond-to.rb
  60. +2 −0  examples/send-if-respond-to.txt
  61. +1 −0  examples/ternary-nil-receiver.txt
  62. +0 −2  figures/_by-selector.rb
  63. +0 −7 figures/alias-for-wrapping.rb
  64. +0 −1  figures/alias-statement.ebnf
  65. BIN  figures/alias-statement.png
  66. +0 −4 figures/ampersand-prefix-with-argument.rb
  67. +0 −6 figures/and-operator-fig.rb
  68. +0 −7 figures/anonymous-class.rb
  69. +0 −17 figures/aref-music.rb
  70. +0 −1  figures/argument-list.ebnf
  71. BIN  figures/argument-list.png
  72. +0 −2  figures/array-literal-w.rb
  73. +0 −1  figures/array-literal.ebnf
  74. BIN  figures/array-literal.png
  75. +0 −3  figures/asterisk-selector.rb
  76. +0 −8 figures/at_exit.rb
  77. +0 −4 figures/backticks.rb
  78. +0 −8 figures/bang-messages.rb
  79. +0 −8 figures/bare-selector-syntax.rb
  80. +0 −6 figures/basicobject-method-missing.rb
  81. +0 −1  figures/begin-block.ebnf
  82. BIN  figures/begin-block.png
  83. +0 −7 figures/begin-block.rb
  84. +0 −1  figures/begin-statement.ebnf
  85. BIN  figures/begin-statement.png
  86. +0 −1  figures/binary-literal.ebnf
  87. BIN  figures/binary-literal.png
  88. +0 −1  figures/block-literal.ebnf
  89. BIN  figures/block-literal.png
  90. +0 −9 figures/block-local-variables-example.rb
  91. +0 −9 figures/block-local-variables-nameerror.rb
  92. +0 −3  figures/block-local-variables-syntax.rb
  93. +0 −1  figures/block-parameter.ebnf
  94. BIN  figures/block-parameter.png
  95. +0 −1  figures/block.ebnf
  96. BIN  figures/block.png
  97. +0 −1  figures/boolean-and-operator.ebnf
  98. BIN  figures/boolean-and-operator.png
  99. +0 −1  figures/boolean-not-operator.ebnf
  100. BIN  figures/boolean-not-operator.png
  101. +0 −1  figures/boolean-or-operator.ebnf
  102. BIN  figures/boolean-or-operator.png
  103. +0 −1  figures/break-statement.ebnf
  104. BIN  figures/break-statement.png
  105. +0 −3  figures/break.rb
  106. +0 −1  figures/byte-escape.ebnf
  107. BIN  figures/byte-escape.png
  108. +0 −16 figures/call-selector.rb
  109. +0 −24 figures/case-cf-if.rb
  110. +0 −15 figures/case-statement-fig.rb
  111. +0 −1  figures/case-statement-syntax.ebnf
  112. BIN  figures/case-statement-syntax.png
  113. +0 −1  figures/catch-statement.ebnf
  114. BIN  figures/catch-statement.png
  115. +0 −18 figures/catch-throw.rb
  116. +0 −1  figures/character-escape.ebnf
  117. BIN  figures/character-escape.png
  118. +0 −6 figures/character-escape.rb
  119. +0 −1  figures/character-literal.ebnf
  120. BIN  figures/character-literal.png
  121. +0 −2  figures/class-keyword-open-class.rb
  122. +0 −2  figures/class-keyword.rb
  123. +0 −1  figures/class-new-constructor.rb
  124. +0 −12 figures/class-variables-inherited-top.rb
  125. +0 −9 figures/class-variables-inherited.rb
  126. +0 −10 figures/class-variables-overtaken.rb
  127. +0 −21 figures/class-variables-siblings.rb
  128. +0 −7 figures/closure-binding-nameerror.rb
  129. +0 −16 figures/closure-example.rb
  130. +0 −8 figures/closure-proc-binding.rb
  131. +0 −5 figures/closure-with-argument.rb
  132. +0 −1  figures/conditional-false.rb
  133. +0 −1  figures/conditional-true.rb
  134. +0 −9 figures/const-missing.rb
  135. +0 −1  figures/constant-reference.ebnf
  136. BIN  figures/constant-reference.png
  137. +0 −5 figures/constant-reference.rb
  138. +0 −1  figures/control-escape.ebnf
  139. BIN  figures/control-escape.png
  140. +0 −2  figures/control-escape.rb
  141. +0 −15 figures/custom-internal-iterator.rb
  142. +0 −1  figures/decimal-literal.ebnf
  143. BIN  figures/decimal-literal.png
  144. +0 −5 figures/def-explicit-return-multiple.rb
  145. +0 −1  figures/def-expression.ebnf
  146. BIN  figures/def-expression.png
  147. +0 −4 figures/def-implicit-return-multiple.rb
  148. +0 −3  figures/def-implicit-return.rb
  149. +0 −6 figures/def-instance-method.rb
  150. +0 −5 figures/def-non-ascii-name.rb
  151. +0 −6 figures/def-singleton-method.rb
  152. +0 −4 figures/define-method-class-eval.rb
  153. +0 −8 figures/define-method.rb
  154. +0 −4 figures/define-singleton-method.rb
  155. +0 −6 figures/defined.rb
  156. +0 −1  figures/digit.ebnf
  157. BIN  figures/digit.png
  158. +0 −1  figures/double-quoted-string-literal-q.ebnf
  159. BIN  figures/double-quoted-string-literal-q.png
  160. +0 −4 figures/double-quoted-string-literal-q.rb
  161. +0 −1  figures/double-quoted-string-literal.ebnf
  162. BIN  figures/double-quoted-string-literal.png
  163. +0 −3  figures/each-selector.rb
  164. +0 −5 figures/each_-selector.rb
  165. +0 −8 figures/element-set-selector.rb
  166. +0 −1  figures/else-clause.ebnf
  167. BIN  figures/else-clause.png
  168. +0 −1  figures/elsif-clause.ebnf
  169. BIN  figures/elsif-clause.png
  170. +0 −3  figures/empty-splat.rb
  171. +0 −1  figures/end-block.ebnf
  172. BIN  figures/end-block.png
  173. +0 −7 figures/end-block.rb
  174. +0 −1  figures/ensure-clause.ebnf
  175. BIN  figures/ensure-clause.png
  176. +0 −14 figures/ensure-clause.rb
  177. +0 −3  figures/equal-equal-selector.rb
  178. +0 −4 figures/equal-tilde-selector.rb
  179. +0 −1  figures/escape.ebnf
  180. BIN  figures/escape.png
  181. +0 −1  figures/exception-clauses.ebnf
  182. BIN  figures/exception-clauses.png
  183. +0 −1  figures/expression-list.ebnf
  184. BIN  figures/expression-list.png
  185. +0 −13 figures/extend-class.rb
  186. +0 −1  figures/float-literal.ebnf
  187. BIN  figures/float-literal.png
  188. +0 −1  figures/for-loop.ebnf
  189. BIN  figures/for-loop.png
  190. +0 −5 figures/for-loop.rb
  191. +0 −9 figures/global-variables.rb
  192. +0 −4 figures/greater-than-selector.rb
  193. +0 −1  figures/hash-literal.ebnf
  194. BIN  figures/hash-literal.png
  195. +0 −1  figures/hash-member.ebnf
  196. BIN  figures/hash-member.png
  197. +0 −9 figures/here-doc.ebnf
  198. BIN  figures/here-doc.png
  199. +0 −16 figures/here-doc.rb
  200. +0 −1  figures/hex-byte-escape.rb
  201. +0 −1  figures/hex-codepoint.ebnf
  202. BIN  figures/hex-codepoint.png
  203. +0 −1  figures/hex-digit.ebnf
  204. BIN  figures/hex-digit.png
  205. +0 −1  figures/hexadecimal-literal.ebnf
  206. BIN  figures/hexadecimal-literal.png
  207. +0 −1  figures/identifier.ebnf
  208. BIN  figures/identifier.png
  209. +0 −5 figures/if-statement-simple.rb
  210. +0 −6 figures/if-statement-with-else-clause.rb
  211. +0 −17 figures/if-statement-with-elsif-and-else-clause.rb
  212. +0 −1  figures/if-statement.ebnf
  213. BIN  figures/if-statement.png
  214. +0 −24 figures/implicit-receiver-for-private-method.rb
  215. +0 −5 figures/integer-downto-loop.rb
  216. +0 −1  figures/integer-downto.ebnf
  217. BIN  figures/integer-downto.png
  218. +0 −1  figures/integer-literal.ebnf
  219. +0 −3  figures/integer-times-loop.rb
  220. +0 −1  figures/integer-times.ebnf
  221. BIN  figures/integer-times.png
  222. +0 −5 figures/integer-upto-loop.rb
  223. +0 −1  figures/integer-upto.ebnf
  224. BIN  figures/integer-upto.png
  225. +0 −7 figures/kernel-binding-example.rb
  226. +0 −8 figures/lambda-keyword-examples.rb
  227. +0 −10 figures/lambda-literal-syntax.rb
  228. +0 −1  figures/lambda-literal.ebnf
  229. BIN  figures/lambda-literal.png
  230. +0 −7 figures/lambda-semantics-break.rb
  231. +0 −7 figures/lambda-semantics-return.rb
  232. +0 −3  figures/left-shift-selector.rb
  233. +0 −4 figures/less-than-selector.rb
  234. +0 −5 figures/local-variables-assignment.rb
  235. +0 −1  figures/local-variables-uninitialized.rb
  236. +0 −9 figures/local-variables.rb
  237. +0 −12 figures/loop-loop-fig.rb
  238. +0 −1  figures/loop-loop.ebnf
  239. BIN  figures/loop-loop.png
  240. +0 −10 figures/message-chaining-inconsistent-return-value.rb
  241. +0 −6 figures/message-chaining-return-self.rb
  242. +0 −4 figures/message-chaining.rb
  243. +0 −6 figures/message-expression-with-block.rb
  244. +0 −1  figures/message-expression.ebnf
  245. BIN  figures/message-expression.png
  246. +0 −6 figures/message-expressions-omit-parentheses-syntaxerror.rb
  247. +0 −1  figures/meta-escape.ebnf
  248. BIN  figures/meta-escape.png
  249. +0 −2  figures/meta-escape.rb
  250. +0 −9 figures/method-defined.rb
  251. +0 −7 figures/method-invocation-with-block-literal.rb
  252. +0 −21 figures/method-lookup.rb
  253. +0 −1  figures/method-parameter-list.ebnf
  254. BIN  figures/method-parameter-list.png
  255. +0 −4 figures/method-receiving-block-ref.rb
  256. +0 −5 figures/method-using-block-argument.rb
  257. +0 −5 figures/method-using-block-given-yield.rb
  258. +0 −5 figures/method-using-block-given.rb
  259. +0 −17 figures/module-attr-accessor.rb
  260. +0 −20 figures/module-function-extend-self.rb
  261. +0 −19 figures/module-function-private-method.rb
  262. +0 −16 figures/module-function.rb
  263. +0 −3  figures/module-namespace-and-mixin.rb
  264. +0 −30 figures/module-namespacing-extend-self.rb
  265. +0 −12 figures/module-namespacing.rb
  266. +0 −10 figures/module-new.rb
  267. +0 −1  figures/module.ebnf
  268. BIN  figures/module.png
  269. +0 −16 figures/named-arguments-with-defaults.rb
  270. +0 −16 figures/named-arguments-with-hash.rb
  271. +0 −1  figures/next-statement.ebnf
  272. BIN  figures/next-statement.png
  273. +0 −9 figures/next.rb
  274. +0 −4 figures/nil-guard.rb
  275. +0 −1  figures/non-symbol-character.ebnf
  276. BIN  figures/non-symbol-character.png
  277. +0 −1  figures/non-zero-digit.ebnf
  278. BIN  figures/non-zero-digit.png
  279. +0 −4 figures/not-operator-fig.rb
  280. +0 −8 figures/object-send-p.rb
  281. +0 −8 figures/object-send.rb
  282. +0 −9 figures/object-try.rb
  283. +0 −1  figures/octal-byte-escape.rb
  284. +0 −1  figures/octal-digit.ebnf
  285. BIN  figures/octal-digit.png
  286. +0 −1  figures/octal-literal.ebnf
  287. BIN  figures/octal-literal.png
  288. +0 −1  figures/operator-method-selector.ebnf
  289. BIN  figures/operator-method-selector.png
  290. +0 −11 figures/optional-arguments.rb
  291. +0 −1  figures/optional-parameter.ebnf
  292. BIN  figures/optional-parameter.png
  293. +0 −1  figures/optional-parameters.ebnf
  294. BIN  figures/optional-parameters.png
  295. +0 −14 figures/or-operator-fig.rb
  296. +0 −8 figures/parallel-assignment-equal.rb
  297. +0 −7 figures/parallel-assignment-more-lvalues.rb
  298. +0 −3  figures/parallel-assignment-more-rvalues.rb
  299. +0 −4 figures/parallel-assignment-one-lvalue.rb
  300. +0 −4 figures/parallel-assignment-one-rvalue-nil.rb
Sorry, we could not display the entire diff because too many files (742) changed.
4 .gitignore
@@ -1,3 +1,7 @@
*.swp
+*~
+\#*
+\.\#*
out/
+build/
examples/*.html
139 Rakefile
@@ -1,71 +1,91 @@
-require_relative 'lib/read-ruby'
-include ReadRuby
+# -*-Ruby-*-
+require 'pathname'
+
+# To validate locally, set the 'docbook' symlink to point to the system DocBook
+# schema/stylesheet directory. On Debian and its derivatives this is
+# /usr/share/xml/docbook/
+DOCBOOK_RNG = 'docbook/schema/rng/5.0/docbookxi.rng'
+BOOK_XML = 'book.xml'
+HTML_XSL = "xsl/html5.xsl"
+OUT_DIR = Pathname('out')
+EX_DIR = Pathname('examples/')
+PRISTINE_DIR = Pathname('www')
RSYNC_EXCLUDE = %w{*examples/*html apache.conf*}
-MINIFIER = {html: 'h5-min', css: 'yuicompressor', js: 'yuicompressor'}
-task :default => :html
+task :validate => [:relaxng, :nvdl, :h5_valid]
+
+task :relaxng do
+ sh "xmllint --xinclude --noout --relaxng #{DOCBOOK_RNG} #{BOOK_XML} 2>&1"
+end
+
+task :nvdl do
+ sh "onvdl #{BOOK_XML}"
+end
-desc 'Rebuild HTML, CSS, and JS'
task :html do
OUT_DIR.rmtree if OUT_DIR.exist?
cp_r PRISTINE_DIR, OUT_DIR
- Pathname.glob("#{OUT_DIR}/**/*").select(&:symlink?).each do |file|
- target, name = OUT_DIR.join(file.readlink), file
- next unless target.extname == TEMPLATE_EXT
- template = target.sub_ext('').basename.to_s
- rendered = begin
- Object.const_get(template.capitalize).new(name).tap do |o|
- o.fixup if o.respond_to?(:fixup)
- end.render
- rescue NameError => e
- Mustache.render(OUT_DIR.join(target).read)
- end
- name.unlink
- open(name, ?w){|f| f.print rendered}
+ sh "xsltproc --stringparam out_dir #{OUT_DIR.expand_path} " +
+ "--xinclude #{HTML_XSL} #{BOOK_XML} " +
+ " >#{IO::NULL}"
+end
+
+task :minify => :html do
+ OUT_DIR.each_child do |f|
+ next unless f.extname == '.html'
+ path = f.to_path
+ sh "h5-min #{path} >#{path}.min"
+ mv "#{path}.min", path
+ sh "gzip -cn #{path} >#{path}.gz"
end
end
-def minify ext
- OUT_DIR.find do |file|
- next unless file.file? and ext.include?(file.extname[1..-1])
- if min = MINIFIER[ :"#{file.extname[1..-1]}" ]
- sh "#{min} #{file} > #{file}.min"
- mv "#{file}.min", file
+task :h5_valid => :html do
+ OUT_DIR.each_child do |f|
+ next unless f.extname == '.html'
+ next if (path = f.to_path).include?('google')
+ results = `h5-valid #{f}`
+ unless $?.success?
+ warn "Invalid: #{f}"
+ warn results
end
- sh "gzip --best -cn #{file} > #{file}.gz"
end
end
-desc 'Minify HTML, CSS, and JS'
-task :minify do
- minify %w{js css}
- Rake::Task[:inline].invoke
- minify %w{html xml}
-end
+task :highlight => :html do
+ require 'nokogiri'
-[Example, Railroad].each do |klass|
- name = klass.to_s.downcase + 's'
- desc "Rebuild #{name}"
- task name.to_sym do
- klass.each do |source|
- file klass.target(source) => source do
- klass.generate source
- end.invoke
+ Pathname.glob("#{OUT_DIR}/*html").each do |html|
+ nok = Nokogiri::HTML(html.read)
+ next unless nok.at('code.ruby')
+ nok.css('code.ruby').each do |code|
+ ex = Pathname("#{EX_DIR}/#{code['id'].sub(/^ex\./,'')}.html")
+ next unless ex.exist?
+ code.parent.swap(ex.read)
end
+ open(html, ?w) {|f| f << nok.to_s}
end
end
-desc 'Push to GitHub'
-task :push do
- sh 'git push github'
-end
+CODERAY = %Q{
+ <a href=/examples/%s>
+ <pre class=ruby><code>%s</code></pre>
+ </a>
+}
-desc 'Rebuild everything & minify'
-task :all => [:examples, :railroads, :html, :minify]
+task :coderay do
+ require 'coderay'
-desc 'Rebuild everything then upload'
-task :upload => [:push, :all, :rsync]
+ EX_DIR.each_child do |pa|
+ next unless pa.extname == '.rb'
+ html = CodeRay.highlight_file(pa.to_path)
+ nok = Nokogiri::HTML(html)
+ open(pa.sub_ext('.html'), ?w) do |f|
+ f << CODERAY % [pa.basename, nok.at('pre').inner_html]
+ end
+ end
+end
desc 'Upload current build'
task :rsync do
@@ -75,22 +95,11 @@ task :rsync do
sh "rsync #{exclude} --delete -vazL out/ ruby:/home/public"
end
-desc 'Start webserver to browse locally'
-task :browse do
- system './lib/read-ruby/browse.rb'
-end
+desc 'Push to GitHub'
+task :push do
+ sh 'git push github'
+end
-desc 'Inline CSS'
-task :inline do
- OUT_DIR.each_child.select(&:file?).each do |file|
- if file.extname[1..-1] == 'html'
- nok = Nokogiri::HTML(file.read)
- nok.search('link[@rel=stylesheet]').each do |link|
- next unless link['href'].start_with?(?/)
- css = (OUT_DIR + (link['href'][1..-1] << '.css')).read
- link.swap("<style>#{css}</style>")
- end
- open(file, ?w){|f| f.print nok.to_s}
- end
- end
-end
+task :default => [:html, :highlight, :minify, :validate]
+
+# TODO: Add Sinatra integration back in
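The `:highlight` task above locates each pre-rendered example by stripping an `ex.` prefix from the `id` of a `code.ruby` element and looking for a matching `.html` file under `examples/`. A minimal Ruby sketch of just that lookup follows. The helper name `example_html_for` is hypothetical: the Rakefile inlines the expression and defines no such method.

```ruby
require 'pathname'

# Hypothetical helper (not part of the Rakefile): mirrors the expression the
# :highlight task uses to map a <code> element's id to an example file.
def example_html_for(id, ex_dir = Pathname('examples/'))
  # "ex.aref-music" -> "aref-music"; ids without the prefix pass through.
  name = id.sub(/^ex\./, '')
  # cleanpath collapses the doubled slash produced by the trailing "/".
  Pathname("#{ex_dir}/#{name}.html").cleanpath
end

puts example_html_for('ex.aref-music')
# => examples/aref-music.html
```

`cleanpath` appears here only to tidy the doubled slash that results from `EX_DIR` ending in `/`. The Rakefile leaves the path as built, which the filesystem resolves to the same file.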
9 apache.mustache
@@ -1,9 +0,0 @@
-NameVirtualHost read-ruby:80
-
-<VirtualHost read-ruby:80>
- DocumentRoot {{{ dir }}}/out
- AccessFileName .htstatic
- <Directory />
- AllowOverride All
- </Directory>
-</VirtualHost>
34 book.xml
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="utf-8"?>
+<book xmlns='http://docbook.org/ns/docbook'
+ xmlns:xi='http://www.w3.org/2001/XInclude'
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xml:id="read-ruby"
+ xml:lang="en"
+ version="5.0">
+ <title>Read Ruby 1.9</title>
+ <!-- Chapters -->
+ <xi:include href="src/programs.xml"/>
+ <xi:include href="src/variables.xml"/>
+ <xi:include href="src/messages.xml"/>
+ <xi:include href="src/objects.xml"/>
+ <xi:include href="src/classes.xml"/>
+ <xi:include href="src/modules.xml"/>
+ <xi:include href="src/methods.xml"/>
+ <xi:include href="src/closures.xml"/>
+ <xi:include href="src/encoding.xml"/>
+ <xi:include href="src/enumerables.xml"/>
+ <xi:include href="src/files.xml"/>
+ <xi:include href="src/flow.xml"/>
+ <xi:include href="src/numerics.xml"/>
+ <xi:include href="src/strings.xml"/>
+ <xi:include href="src/regexps.xml"/>
+ <xi:include href="src/io.xml"/>
+
+ <xi:include href="src/bibliography.xml"/>
+
+ <!-- Appendices -->
+ <xi:include href="src/punctuation.xml"/>
+ <!-- Merge keywords.xml into glossary.xml -->
+ <!-- <xi:include href="src/keywords.xml"/> -->
+ <xi:include href="src/glossary.xml"/>
+</book>
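book.xml above pulls each chapter in through `xi:include`, with `src/keywords.xml` commented out pending its merge into the glossary. As a rough, text-level illustration of which sources that leaves active, the Ruby sketch below strips XML comments and collects the remaining `href` values. It is not an XInclude processor (the Rakefile relies on `xmllint` and `xsltproc` with `--xinclude` for that). The method name `included_sources` and the trimmed sample document are assumptions made for illustration.

```ruby
# Hypothetical helper: list xi:include targets, ignoring commented-out ones.
def included_sources(xml)
  xml.gsub(/<!--.*?-->/m, '')                 # drop XML comments first
     .scan(/<xi:include\s+href="([^"]+)"/)   # capture each remaining href
     .flatten
end

# Trimmed stand-in for the appendix portion of book.xml.
sample = <<~XML
  <book xmlns:xi='http://www.w3.org/2001/XInclude'>
    <xi:include href="src/punctuation.xml"/>
    <!-- <xi:include href="src/keywords.xml"/> -->
    <xi:include href="src/glossary.xml"/>
  </book>
XML

p included_sources(sample)
# => ["src/punctuation.xml", "src/glossary.xml"]
```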
41 chapter.mustache
@@ -1,41 +0,0 @@
-{{> preamble }}
-<link rel='index up' href=/toc>
-<title>{{{ title }}} (Read Ruby 1.9)</title>
-<link href=/c/chapter rel=stylesheet>
-{{#next_url}}
-<link href=/{{{ next_url }}} rel=next>
-{{/next_url}}
-
-<header>
- <h1>Read Ruby 1.9</h1>
-
- <nav>
- <ol>
- <li><a href=/toc>Read Ruby</a>
- <li>{{{ title }}} <span>(draft)</span>
- </ol>
- {{> search }}
- </nav>
-</header>
-
-<article>
- <header>
- <h1>{{{ title }}}</h1>
- {{#main_sections}}
- <nav>
- {{{ main_sections }}}
- </nav>
- {{/main_sections}}
- </header>
-
-{{{ article }}}
-</article>
-
-<footer>
- <a href=//github.com/runpaint/read-ruby>Text and figures</a> licensed under
- a <a rel=license
- href=//creativecommons.org/licenses/by-nc-sa/2.0/uk/>Creative Commons License</a>.
-</footer>
-
-<script src=//ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js></script>
-<script src=/ui.js></script>
3  chapter_css.mustache
@@ -1,3 +0,0 @@
-{{> css/main}}
-{{> css/chapter}}
-{{> css/syntax}}
463 chapters/classes.html
@@ -1,463 +0,0 @@
-<link rel=next href=/modules>
-<article>
- <h1 id=classes>Classes</h1>
-
- <!-- http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html -->
- <!-- What does it mean to dup or clone a class? -->
-
- <p>A <dfn id=class>class</dfn><a href=#fn-biological>†</a> is a
- <i>classification</i> of objects. It defines a set of methods, and can mint
- objects in its image.</p>
-
- <section>
- <h1 id=names>Names</h1>
-
- <!-- Mention Ruby's preference for flat hierarchies -->
-
- <p>A class<a href=#fn-non-ascii>†</a> is named with a <a
- href=/variables#constants>constant</a>, and this name can be retrieved
- as a <code>String</code> by <code>Class#name</code>. This does not apply
- to <a href=#anonymous>anonymous</a> classes, of course, which have a name
- of <code>nil</code>. Conventionally class names use camel-case
- capitalization: the initial letter of each word is capitalized, and spaces
- between the words are removed. For example:
- <code>RubyProgrammingLanguage</code> or <code>NutsAndBolts</code>. This
- distinguishes a class from a constant <i>qua</i> constant, as the latter
- is named in uppercase.
- </section>
-
- <section>
- <h1 id=inheritance>Inheritance</h1>
-
- <p>A class <dfn title=inheritance>inherits</dfn> behaviour and certain
- state-class variables and constants-from another class called its <dfn
- title=superclass>superclass</dfn>. The exception is
- <code>BasicObject</code>, which sits at the top of the class hierarchy.
- The default superclass is <code>Object</code>. Classes that inherit from a
- given class are its <dfn title=subclasses>subclasses</dfn>. A subclass is,
- therefore, a specialisation of its superclass.</p>
-
- <section>
- <h1 id=superclass>Superclass</h1>
-
- <p>The superclass is a <code>Class</code> object. It is typically specified as a
- constant literal, but can be any expression evaluating to the same. Once
- a class has been created, its superclass cannot be changed.
- <code>Class#superclass</code> returns the receiver’s superclass as a
- <code>Class</code> object, or <code>nil</code> if the receiver is
- <code>BasicObject</code>.
- </section>
-
- <section>
- <h1 id=ancestors>Ancestors</h1>
-
- <p>The <dfn>ancestors</dfn> of a class are the classes and modules it
- inherits from: its superclass and mixed-in modules, then their
- ancestors, and so on up until the root of the inheritance hierarchy.
- They are returned, in order, by <code>Module#ancestors</code> as an
- <code>Array</code> of <code>Class</code> objects.
- </section>
-
- <!-- http://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/class/removal.rb
- -->
-
- <p>Inheritance merely determines the initial behaviour of a class; the
- subclass can diverge by defining, redefining, or removing methods, or
- modifying state. It occurs because the method- and constant-lookup
- algorithms consider the superclass when unable to resolve a given name
- against the current class. It is worth stating explicitly that instance
- variables and class variables are <em>not</em> inherited. <a
- href=/references#refFLAN08 class=ref>Flanagan &amp; Matsumoto</a> (pp.
- 239–240) note a corollary: <q>If a subclass uses an instance variable with
- the same name as a variable used by one of its ancestors, it will
- overwrite the value of its ancestor’s variable.</q></p>
-
- <section>
- <h1 id=class-inherited><code>Class#inherited</code> Hook</h1>
-
- <p>If a class defines a singleton method named <code>:inherited</code>, it
- is invoked when the class is inherited with the subclass as its argument.
-
- <!-- TODO: Note order re:
- http://redmine.ruby-lang.org/issues/show/2793 -->
- </section>
- </section>
-
- <section>
- <h1 id=creation>Creation</h1>
-
- <section>
- <h1 id=class-keyword><code>class</code> Keyword</h1>
-
- <p>The <code>class <var>name</var> &lt; <var>superclass</var>…end</code>
- expression opens a class named <var>name</var>. If the constant
- <var>name</var> is already defined it must refer to an existing class,
- otherwise a <code>TypeError</code> is raised. If <var>name</var> was
- previously undefined, it is created to refer to a new <code>Class</code>
- object. The <code>&lt; <var>superclass</var></code> portion may be
- omitted, in which case the <var>superclass</var> defaults to
- <code>Object</code>. <var>superclass</var> may be any expression that
- evaluates to a <code>Class</code> object. The class body, which may be
- empty, is the elliptical region in the expression. It introduces a new
- context in which <code>self</code> refers to the class.</p>
-
- <figure class=left id=class-keyword-open-class.rb>
- <figcaption>Usage of the <code>class</code> keyword to <i>open</i>
- a class named <code>Dog</code>
- </figure>
-
- <section>
- <h1 id=reopening>Reopening Classes</h1>
-
- <p>If <code>class</code> is used with the name of a pre-existing class
- that class is <i>re-opened</i>. If a method is defined in a re-opened
- class with the same name as a pre-existing method in the same class
- the old method is overwritten with the new. Classes can be made
- immutable, effectively preventing them from being reopened by freezing
- the class object. Frozen classes raise <code>RuntimeError</code>s when
- methods are defined, or variables manipulated, in their context.</p>
-
- <figure class=left id=reopened-class.rb>
- <figcaption>Re-defining a method in an existing class
- </figure>
- </section>
- </section>
-
- <section>
- <h1 id=class-new><code>Class.new</code></h1>
-
- <p>The <code><var>name</var> = Class.new do…end</code> constructor may
- be used to similar effect. The principal difference is that existing
- classes are overwritten rather than reopened.</p>
-
- <figure class=left id=class-new-constructor.rb>
- <figcaption>Usage of the <code>Class.new</code> constructor to
- create a class named <code>Dog</code>
- </figure>
-
- <section>
- <h1 id=anonymous>Anonymous Classes</h1>
-
- <p>When a class is named with a constant it is accessible wherever
- that constant is in scope. If this behaviour is not desirable, a class
- can be made anonymous by assigning the value of <code>Class.new</code>
- to a local variable, thus restricting the class to the local scope.
- Subsequently assigning this variable to a constant names the
- class.</p>
-
- <figure class=left id=anonymous-class.rb>
- <figcaption>Creating an anonymous class with
- <code>Class.new</code>
- </figure>
- </section>
- </section>
-
- <section>
- <h1 id=structs>Structs</h1>
-
- <p><code>Struct</code> is a class generator, particularly useful for
- classes that only need to wrap data. It is instantiated with a list
- of attribute names as <code>Symbol</code>s, and returns a
- <code>Class</code> object with accessors and writers for each
- attribute. The generated class can be instantiated with a list of
- arguments, which are assigned to the corresponding attributes.</p>
-
- <figure class=left id=struct-new.rb>
- <figcaption>Creating a class with <code>Struct.new</code>
- </figure>
-
- <p>A popular idiom is to create a class that inherits from a
- <code>Struct</code>: the <code>Struct</code> defines the simple
- attributes, and the class body adds behaviour/customisations.</p>
-
- <figure class=left id=struct-inheritance.rb>
- <figcaption>A class may inherit from a <code>Struct</code> to
- augment the structure’s behaviour
- </figure>
- <!-- OpenStruct -->
- </section>
-
- <section>
- <h1 id=nesting>Nesting</h1>
-
- <p>A class may be defined within the body of another class. The fully
- qualified name of the inner class is then
- <code><var>outer</var>::<var>inner</var></code>: the name of the
- enclosing class (<var>outer</var>) separated from that of the enclosed
- (<var>inner</var>) with the <a
- href=/variables#constants-references>scope operator</a>. This
- <dfn>nesting</dfn> behaviour is primarily used for namespacing, with <a
- href=/modules>modules</a> being an alternative. However, it does not
- affect inheritance: if the inner class is to inherit from the outer
- class, it must do so explicitly. The nesting of a class is returned as
- an <code>Array</code> of <code>Class</code> objects by
- <code>Module.nesting</code>, where the first element is the innermost
- class, and the last the outermost.
- </section>
- </section>
-
- <section>
- <h1 id=context>Context</h1>
-
- <p><code>Class#class_eval</code> takes a string or block which it
- evaluates in the receiver’s context, setting <code>self</code> to the
- receiver. The evaluated code can access the class’s state, invoke its
- singleton methods, and define methods. <code>Class#instance_exec</code> is
- similar, but accepts any number of arguments which it passes to the
- required block.
-
- <!-- TODO: Track
- http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/26774 in
- case this changes -->
- </section>
-
- <section>
- <h1 id=singleton>Singleton Classes</h1>
-
- <p>Every object is associated with two classes: that with which it was
- instantiated, and an anonymous class specific to the object: its
- <dfn>singleton class</dfn><a href=#fn-eigenclass>†</a>. That a singleton
- class is unique to a particular object means that methods defined within
- it-<a href=/methods#singleton>singleton</a> methods-are also unique to
- that object.
-
- <p>Further, an object’s class-i.e. the one which instantiated it-is the
- superclass of its singleton class. Upon receiving a message an object
- asks its singleton class for a method; the singleton class searches its
- instance methods and included modules, then repeats the query to its
- superclass. This process continues, recursively, up the inheritance
- hierarchy until a suitable method is located. Therefore, singleton methods
- override all others because the singleton class is the first place
- searched.
-
- <p>However, the singleton classes of <code>Class</code> objects behave
- slightly differently. Consider two classes, <var>c</var> and <var>p</var>.
- Now, if the superclass of <var>c</var> is <var>p</var>, then the
- superclass of the singleton class of <var>c</var> is the singleton class
- of <var>p</var>. This seemingly convoluted arrangement creates an
- inheritance hierarchy of singleton classes parallel to that of normal
- classes, allowing class methods to be inherited.</p>
-
- <figure id=class-singleton-class.rb>
- <figcaption>A class’s singleton class inherits from that of its
- superclass, therefore a class inherits its parent’s class methods.
- </figure>
-
- <p>The singleton class is a curious hybrid between class and module
- because although it has a superclass, it cannot be instantiated. However,
- the latter shortcoming is surely a blessing, as without it class
- hierarchies would be plexiform. Regardless, the abstractionists will
- delight in the fact that a singleton class has its own singleton class,
- <i>ad infinitum</i>&hellip;
-
- <p>Instances of the <code>Integer</code>, <code>Float</code>, and
- <code>Symbol</code> classes are the only objects not to have a
- singleton class; attempting to open one causes a <code>TypeError</code>
- to be raised.</p>
-
- <p>The <code>Kernel#singleton_class</code><a
- href=#fn-singleton-syntax>‡</a> method returns the receiver’s singleton
- class as a <code>Class</code> object. It is typically paired with
- <code>#class_eval</code> so as to operate within the context of the class.
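As a rough illustration of the two paragraphs above, the sketch below pairs `singleton_class` with `class_eval` to define a method on one object only, then shows the `TypeError` raised for objects that cannot have a singleton class. The `greeter` object and the `singleton_class_of` helper are hypothetical.

```ruby
greeter = Object.new

# Define a method inside the singleton class: only `greeter` receives it.
greeter.singleton_class.class_eval do
  def greet
    "hi"
  end
end

greeting = greeter.greet                     # the singleton method is found
shared = Object.new.respond_to?(:greet)      # other objects are unaffected

# Hypothetical helper: returns nil where no singleton class can exist.
def singleton_class_of(object)
  object.singleton_class
rescue TypeError
  nil
end

puts greeting, shared
p singleton_class_of(1), singleton_class_of(:name), singleton_class_of(1.5)
```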
- </section>
-
- <section>
- <h1 id=state>State</h1>
-
- <p>A class may store its state in <a href=/variables#class>class
- variables</a>, as discussed previously. However, due to the unpopular
- semantics of class variables, class instance variables may be used
- instead.</p>
-
- <!-- The canonical example of class variables is an object maintaining
- an initialization count. Show how to do this with class instance
- variables, e.g.
- self.class.class_eval{ defined?(@count) ? @count += 1 : @count = 1}
- Optionally with scaffolding to make accessing class variables from
- instance methods more palpable
- -->
- <section>
- <h1 id=class-instance-variables>Class Instance Variables</h1>
-
- <p>An instance variable used within a class definition, outside of an
- instance method, is a <dfn title="class instance variable">class
- instance variable</dfn>. It is not to be confused with a class
- variable. Both kinds of variables are associated with the class, as
- opposed to its instances. The primary advantage of class instance
- variables over class variables is that they don't exhibit the latter’s
- awkward sharing semantics: class instance variables are not shared with
- subclasses. However, class instance variables cannot be referenced in
- instance methods—as in that context they are normal instance
- variables—so are not necessarily appropriate substitutes.
-
- <p>Accessor methods can be created for class instance variables by using
- <code>Module#attr_accessor</code> and <code>Module#attr_writer</code>
- inside the class’s singleton class.</p>
-
- <figure class=left id=module-attr-accessor.rb>
- <figcaption>Accessors for a class’s class instance variables are
- created inside its singleton class with <code>attr_accessor</code>.
- </figure>
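A possible sketch of the technique, assuming a hypothetical `Widget` class that counts its own instantiations: the counter lives in a class instance variable, and the accessor is created inside the singleton class. Note that the subclass gets its own, unshared counter.

```ruby
class Widget
  @count = 0                # class instance variable: belongs to Widget itself

  class << self
    attr_accessor :count    # accessor defined in the singleton class
  end

  def initialize
    # Instance methods cannot see @count directly; go through the accessor.
    self.class.count += 1
  end
end

class Gadget < Widget
  @count = 0                # not shared with Widget
end

2.times { Widget.new }
Gadget.new

puts Widget.count, Gadget.count
```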
- <!-- TODO: Show more general approach for attr_ methods:
- Note:
- http://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/class/attribute_accessors.rb
- class Class
- def cattr_accessor(*syms)
- (class << self; self; end).instance_eval do
- attr_accessor *syms
- end
- end
- end
- -->
-
- </section>
- </section>
-
- <section>
- <h1 id=instances>Instances</h1>
-
- <p><code>ObjectSpace.each_object(<var>class</var>)</code> returns an
- <code>Enumerator</code> of a <var>class</var>’s instances.
- </section>
-
- <section>
- <h1 id=methods>Methods</h1>
-
- <p>The methods defined on a class can be listed with
- <code>Object#methods</code> or <code>Module#instance_methods</code>. They
- are returned as an <code>Array</code> of <code>Symbol</code>s. If either
- method is given an argument of <code>false</code>, superclass methods are
- omitted.</p>
-
- <section>
- <h1 id=method-defined><code>method_defined?</code> Predicate</h1>
-
- <p>The <code>Module#method_defined?</code> predicate accepts a method
- name as argument and returns <code>true</code> if the named instance
- method is defined on the receiver; <code>false</code> otherwise.
- <code>Module#public_method_defined?</code>,
- <code>Module#private_method_defined?</code>, and
- <code>Module#protected_method_defined?</code> behave in a similar
- fashion, but also require the named method to have the corresponding <a
- href=/methods#visibility>visibility</a>. These predicates are clearly
- similar to <a href=/messages#responding><code>#respond_to?</code></a>
- but they differ as follows:
-
- <ul>
- <li>They test the instance methods of a class or module;
- <code>#respond_to?</code> tests the methods defined on its receiver.
- <li>They can only be used on classes or modules;
- <code>#respond_to?</code> can be used with any object inheriting from
- <code>Object</code>.
- <li>They don’t consult <a
- href=/methods#respond-to-missing><code>#respond_to_missing?</code></a>—whereas
- <code>#respond_to?</code> does—which means that they don’t reflect
- methods defined with <a
- href=/methods#method-missing><code>method_missing</code></a>.
- <li>They return <code>true</code> for methods unimplemented on the
- user’s platform; <code>#respond_to?</code> behaves conversely.
- </ul>
-
- <figure class=left id=method-defined.rb>
- <figcaption>Contrasting <code>#method_defined?</code> with
- <code>#respond_to?</code>
- </figure>
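The differences listed above might be seen in a sketch such as the following, which assumes a hypothetical `Ghost` class that answers one message through `method_missing`.

```ruby
class Ghost
  def visible
    true
  end

  # Answer :haunt dynamically; defer everything else to the default behaviour.
  def method_missing(name, *args, &block)
    name == :haunt ? "boo" : super
  end

  def respond_to_missing?(name, include_private = false)
    name == :haunt || super
  end
end

defined_visible = Ghost.method_defined?(:visible)  # a real instance method
defined_haunt   = Ghost.method_defined?(:haunt)    # ignores respond_to_missing?
responds_haunt  = Ghost.new.respond_to?(:haunt)    # consults respond_to_missing?

puts defined_visible, defined_haunt, responds_haunt
```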
-
- <p>Either approach is normally preferable to
- <code><var>object</var>.methods.include?(<var>selector</var>)</code>,
- which has all of the disadvantages of <code>#method_defined?</code>,
- in addition to being more verbose and less efficient.
- </section>
- </section>
-
- <section>
- <h1 id=missing>Missing Classes</h1>
-
- <p>When a constant is used without being defined the enclosing class is
- sent a <code>:const_missing</code> message with the constant name as a
- <code>Symbol</code> argument. This is similar to
- <code>:method_missing</code>, but for classes.
- </section>
-
- <section>
- <h1 id=enumeration>Enumeration</h1>
-
- <p><code>ObjectSpace.each_object(Class)</code> enumerates all
- <code>Class</code> objects currently defined. Therefore, to enumerate the
- subclasses of a given class, this list must be filtered as shown in the
- figure below.</p>
-
- <!-- http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/18824 -->
- <figure id=enumerating-subclasses.rb class=left>
- <figcaption>Enumerating a class’s subclasses.
- </figure>
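One way the filtering could look is sketched below. The `subclasses_of` helper and the `Shape` hierarchy are hypothetical, and the comparison with `<` is one of several ways to test for descent.

```ruby
# Hypothetical helper: every Class object that descends from klass.
def subclasses_of(klass)
  ObjectSpace.each_object(Class).select { |candidate| candidate < klass }
end

class Shape; end
class Circle < Shape; end
class Ellipse < Circle; end

names = subclasses_of(Shape).map(&:name).sort
p names
```

`Class#<` returns `true` only for strict descendants, so `Shape` itself is excluded from the result.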
- </section>
-
- <section>
- <h1 id=type>Type</h1>
-
- <p><q>In many object-oriented languages, class names are used…for the
- type of objects generated from the class.</q> <a class=ref
- href=/references#refBRUCE02>Bruce</a> (p 20). <a
- href=/references#refKLAS95 class=ref>Klas &amp; Schrefl</a> concur:
- <q>A class…defines…the type of [its] instances</q> (p 10). Applying
- this notion to Ruby is problematic because while it is certainly
- possible for a method to dynamically <i>type check</i> its arguments
- with the <code>Kernel#is_a?(<var>class</var>)</code> predicate, this
- approach is both insufficient and unnecessary.
-
- <p>It is insufficient because an object’s class is not indicative of its
- suitability for a specific role. Class-based type checking rests on
- the premise that all objects of a given type will respond to the same
- messages in the same fashion. However, Ruby’s classes may be modified at
- will—allowing their methods to be redefined or removed—so two objects of
- the same class will not necessarily provide the same behaviour. Similarly,
- methods may be defined on, or removed from, individual objects, again
- breaking the assumption.
-
- <p>It is unnecessary because Ruby offers a superior, meritocratic
- alternative called “duck typing” (<a href=/references#refTHOM09
- class=ref>Thomas et al.</a>, p 372): if an object responds to the
- messages it would be sent in the course of a computation it
- constitutes suitable input. The yardstick is ability; not class.
-
- <p>An optimistic method simply assumes that its arguments are suitable;
- allowing them to raise a <code>NoMethodError</code> if sent a message they
- don't understand. This allows for particularly flexible <abbr
- title="Application Programming Interface">API</abbr>s at the cost of
- potentially obscure error messages for nonsensical arguments. More
- typically, the <code>Kernel#respond_to?(<var>selector</var>)</code>
- predicate is used to determine the suitability of an object. For instance,
- a method may raise an <code>ArgumentError</code> unless its argument
- responds to <code>:&lt;&lt;</code>. A refinement may be to send an argument
- the appropriate <code>:try_convert</code> message, raising an exception if
- <code>nil</code> is returned.
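The two checks described above can be sketched as follows. Both method names (`append_greeting`, `first_element`) are hypothetical; `Array.try_convert` is used as one example of a `try_convert` method.

```ruby
# Accept anything that can be appended to, whatever its class.
def append_greeting(sink)
  unless sink.respond_to?(:<<)
    raise ArgumentError, "expected an object that responds to <<"
  end
  sink << "hello"
end

# Ask the argument to convert itself; nil means it cannot.
def first_element(list)
  array = Array.try_convert(list)
  raise TypeError, "cannot convert #{list.class} to Array" if array.nil?
  array.first
end

p append_greeting([])
p append_greeting(String.new)
p first_element([3, 4])
```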
- </section>
- <footer>
- <h1>Footnotes</h1>
-
- <ol>
- <li id=fn-biological>The term <i>class</i> is roughly analogous to its
- biological definition where it denotes a taxonomic rank, however this
- analogy does not extend to subclasses. That is, a subclass of a class is
- termed a <i>subclass</i>; not an <i>order</i>.
- <li id=fn-non-ascii>Class names containing non-ASCII characters cannot
- be referred to from source files with a different <a
- href=/programs#source-encoding>source encoding</a>. For example, a
- class name containing the character Greek small letter lamda (<code
- class=u title='λ'>U+03BB</code>) can only be referenced from source
- files using the UTF-8 source encoding. <!-- FIXME: What about other
- Unicode character sets? -->
- <li id=fn-eigenclass><a href=/references#refFLAN08>Flanagan &amp;
- Matsumoto</a> (2008, pp. 257–258) use the term <i>eigenclass</i>,
- instead, but the preferred nomenclature is now <i>singleton class</i>.
- See <a href=//redmine.ruby-lang.org/issues/show/1082>Feature #1082: add
- Object#singleton_class method</a> for the background.
- <li id=fn-singleton-syntax>Prior to Ruby 1.9.2 the peculiar <code>class
- &lt;&lt; <var>object</var>…end</code> construct— the
- <code>class</code> keyword followed by two less-than signs, an
- expression evaluating to an object, then the class body—was used to open
- the singleton class of <var>object</var>.
- </ol>
- </footer>
-</article>
524 chapters/closures.html
@@ -1,524 +0,0 @@
-<link rel=next href=/flow>
-<!-- TODO
- * Closures as thunks
- * Example of creating custom control structures:
- cond(conditional) ->{ ... }
- or
- cond {
- conditional1 => ->{ },
- conditional2 => true,
- }
-
-
-def where(args)
- ->(file) do
- stat = File.stat file
- args.all?{|k, v| stat.send(k) == v}
- end
-end
-Dir.glob('/etc/*').select &where(uid: 0, zero?: true)
--->
-<article>
- <h1 id=closures>Closures</h1>
-
- <p><q cite=/references#refGRAHAM96>A closure is a combination of a
- function and an environment.</q> (<a href=/references#refGRAHAM96
- class=ref>Graham</a>, 1996, pp. 107–109). The function is a parametrised
- block of executable code, and the <q
- cite=/references#refSCOTT06>referencing environment</q> (<a
- href=/references#refSCOTT06 class=ref>Scott</a>, 2006, pp. 138–140), or
- <a href=#binding><dfn>binding</dfn></a>, is a
- reference to the lexical environment of the closure’s creation site. The
- binding represents its variables as references, which are de-referenced in
- the environment in which the closure is called, every time it is called. The figure
- below provides the typical demonstration of this concept:</p>
-
- <figure id=closure-example.rb class=left>
- <figcaption>A closure encapsulates a block of code and its associated
- binding
- </figure>
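A minimal sketch of the idea, not the book's own figure: the hypothetical `make_counter` method returns a closure that keeps using a local variable after the method has returned, and a second closure shows that a variable is looked up each time the closure is called.

```ruby
def make_counter
  count = 0
  -> { count += 1 }       # the closure keeps a reference to `count`
end

counter = make_counter
first  = counter.call
second = counter.call     # the same `count` is updated on each call

x = 1
show = -> { x }
x = 2                     # reassigned after the closure was created
seen = show.call          # the closure sees the current value

puts first, second, seen
```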
-
- <p>A closure is an instance of the <code>Proc</code> class, which provides
- methods for calling the closure and accessing its binding. The following
- example shows a closure being called with <code>Proc#[]</code> and an
- <a href=#parameters>argument</a>.</p>
-
- <figure id=closure-with-argument.rb class=left>
- <figcaption>A closure may accept arguments from its calling environment
- </figure>
-
- <!--
- This section is awkward. It does not describe a typical literal because
- it can't standalone. It applies to message expressions, certain
- keywords, and this section. Maybe introduce it in Programs?
- -->
- <section>
- <h1 id=procs><code>Proc</code> Literals</h1>
-
- <p>A block literal creates a <code>Proc</code> object which
- accepts the arguments provided in the optional parameter list
- and represents a sequence of zero or more statements. Unlike
- the other literals in this section, block literals must not
- appear in the top-level context; they must either terminate a
- message expression, an appropriate keyword expression, or
- a lambda literal.</p> <!-- Is this restriction accurate? -->
-
- <!-- TODO: Block local variable declarations -->
- <figure class=railroad class=left>
- <img id=required-parameters.png>
- <img id=optional-parameter.png>
- <img id=optional-parameters.png>
- <div style=width:590px class=centre>
- <img class=inline id=rest-parameter.png>
- <img class=inline id=block-parameter.png>
- </div>
- <img id=parameter-list.png>
- <img id=block.png>
- <img id=block-literal.png>
- <figcaption>Syntax diagram of the block literal
- </figure>
-
- </section>
-
- <section>
- <h1 id=semantics>Semantics</h1>
-
- <p>A <code>Proc</code> has either <a href=#proc-semantics>proc
- semantics</a> or <a href=#lambda-semantics>lambda semantics</a>. Its
- semantics determine how it handles unexpected arguments and control flow
- statements, such as <code>return</code>, appearing within the body of
- the closure. The differences are summarised in the following table, and
- elaborated below.</p>
-
- <table id=closures-semantics-table>
- <caption>A comparison of <code>Proc</code> semantics</caption>
- <tr>
- <th></th>
- <th>Lambda
- <th>Proc
- </tr>
- <tr>
- <th>Extra arguments
- <td><a href=#lambda-semantics-arguments>Raise <code>ArgumentError</code></a>
- <td><a href=#proc-semantics-arguments>Ignored</a>
- </tr>
- <tr>
- <th>Omitted arguments
- <td><a href=#lambda-semantics-arguments>Raise <code>ArgumentError</code></a>
- <td><a href=#proc-semantics-arguments>Assigned <code>nil</code></a>
- </tr>
- <tr>
- <th><code>Array</code> arguments
- <td><a href=#lambda-semantics-arguments>Never exploded</a>
- <td><a href=#proc-semantics-arguments>Exploded if necessary</a>
- </tr>
- <tr>
- <th><code>return</code>
- <td><a href=#lambda-semantics-return>Returns from the lambda itself</a>
- <td><a href=#proc-semantics-return>Returns from the creation site
- method</a>
- </tr>
- <tr>
- <th><code>break</code>
- <td><a href=#lambda-semantics-break>Returns from the lambda itself</a>
- <td><a href=#proc-semantics-break>Returns from the call site method</a>
- </tr>
- </table>
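Some rows of the table can be exercised with a sketch like the one below. The `outcome`, `proc_return`, and `lambda_return` helpers are hypothetical names introduced for illustration.

```ruby
pr = proc   { |a, b| [a, b] }
la = lambda { |a, b| [a, b] }

# Hypothetical helper: the block's value, or the ArgumentError class if raised.
def outcome
  yield
rescue ArgumentError => e
  e.class
end

p pr.call(1)                    # omitted argument becomes nil
p pr.call(1, 2, 3)              # extra argument is ignored
p pr.call([1, 2])               # Array is exploded
p outcome { la.call(1) }        # lambdas are strict about arity
p outcome { la.call([1, 2]) }   # the Array counts as a single argument

def proc_return
  proc { return :from_proc }.call    # returns from proc_return itself
  :after_proc
end

def lambda_return
  lambda { return :from_lambda }.call  # returns only from the lambda
  :after_lambda
end

p proc_return, lambda_return
```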
-
- <section>
- <h1 id=lambda-predicate><code>#lambda?</code> Predicate</h1>
-
- <p><code>Proc#lambda?</code> is a predicate which returns
- <code>true</code> if the receiver has lambda semantics;
- <code>false</code> if it has proc semantics.
- </section>
-
- <section>
- <h1 id=proc-semantics>Proc Semantics</h1>
-
- <section>
- <h1 id=proc-semantics-arguments>Argument Passing</h1>
-
- <p>A <code>Proc</code> with proc semantics interprets the
- arguments it receives with <a href=/flow#yield-arguments>yield
- semantics</a>.
- </section>
-
- <section>
- <h1 id=proc-semantics-control-flow>Control Flow Statements</h1>
-
- <section>
- <h1 id=proc-semantics-return><code>return</code></h1>
-
- <p>Returns from the lexically enclosing method of the
- <code>Proc</code>’s <em>creation</em> site.</p>
-
- <figure id=proc-semantics-return.rb class=left>
- <figcaption>Within a <code>Proc</code> with proc
- semantics, <code>return</code> jumps to the <code>Proc</code>
- creation site
- </figure>
-
- <p>If the <code>Proc</code> was not created within a method,
- e.g. at the top level, or the method has already returned, a
- <code>LocalJumpError</code> is raised.</p>
-
- <figure id=proc-semantics-return-localjump.rb class=left>
- <figcaption>A <code>LocalJumpError</code> is raised
- when a <code>Proc</code> with proc semantics tries to
- <code>return</code> without an enclosing method.
- </figure>
- </section>
-
- <section>
- <h1 id=proc-semantics-break><code>break</code></h1>
-
- <p>Returns from the lexically enclosing method of the
- <code>Proc</code>’s <em>call</em> site.</p>
-
- <figure id=proc-semantics-break.rb class=left>
- <figcaption>Within a <code>Proc</code> with proc semantics,
- <code>break</code> jumps to the <code>Proc</code> call site
- </figure>
-
- <p>A <code>LocalJumpError</code> is raised if
- <code>break</code> is used from a block no longer in scope,
- e.g. at the top-level of a block created with
- <code>Proc.new</code> or <code>proc</code>.</p>
-
- <figure id=proc-semantics-break-localjump.rb class=left>
- <figcaption>A <code>Proc</code> created with
- <code>proc</code> or <code>Proc.new</code> cannot use
- <code>break</code> at the top level
- </figure>
- </section>
- </section>
- </section>
-
- <section>
- <h1 id=lambda-semantics>Lambda Semantics</h1>
-
- <section>
- <h1 id=lambda-semantics-arguments>Argument Passing</h1>
-
- <p>A <code>Proc</code> with lambda semantics interprets the
- arguments it receives with <dfn>invocation semantics</dfn>:
- according to the same rules as method invocation. This has the
- following implications:</p>
-
- <ul>
- <li>Superfluous arguments cause an <code>ArgumentError</code> to
- be raised.
- <li>Omitted arguments cause an <code>ArgumentError</code> to be
- raised.
- <li><code>Array</code> arguments are <em>not</em> automatically exploded.
- </ul>
- </section>
-
- <section>
- <h1 id=lambda-semantics-control-flow>Control Flow Statements</h1>
-
- <section>
- <h1 id=lambda-semantics-return><code>return</code></h1>
-
- <p>Returns from the <code>Proc</code> as if it were a method.</p>
-
- <figure id=lambda-semantics-return.rb class=left>
- <figcaption>Within a <code>Proc</code> with lambda semantics,
- <code>return</code> returns from the <code>Proc</code> itself
- </figure>
- </section>
-
- <section>
- <h1 id=lambda-semantics-break><code>break</code></h1>
-
- <p>Acts exactly like <code>return</code>.</p>
-
- <figure id=lambda-semantics-break.rb class=left>
- <figcaption>Within a <code>Proc</code> with lambda semantics,
- <code>break</code> returns from the <code>Proc</code> itself
- </figure>
- </section>
- </section>
- </section>
-
- <section>
- <h1 id=control-flow>Control Flow</h1>
-
- <p>Control flow statements other than <code>break</code> or
- <code>return</code> operate in the same way for both kinds of
- <code>Proc</code>s.</p>
-
- <dl>
- <dt><code>next</code>
- <dd>Returns its arguments to the <a
- href=/flow#yield><code>yield</code> statement</a> or method that
- invoked the <code>Proc</code>.
- <dt><code>redo</code>
- <dd>Jump to the beginning of the <code>Proc</code>
- <dt><code>retry</code>
- <dd>Always raises a <code>LocalJumpError</code>
- <dt><code>raise</code>
- <dd>Propagates the exception up the call stack: through any
- enclosing block, then to the invoking method.
- </dl>
- </section>
- </section>
-
- <section>
- <h1 id=creation>Creation</h1>
-
- <section>
- <h1 id=proc-new><code>Proc.new</code></h1>
-
- <p><code>Proc.new</code> creates a <code>Proc</code> with proc
- semantics from the given block.</p>
-
- <p>If the block is omitted, the block with which the lexically
- enclosing method was invoked is used in its place. If the method was
- not invoked with a block, or there is not an enclosing method, an
- <code>ArgumentError</code> is raised.
- </section>
-
- <section>
- <h1 id=proc-keyword><code>proc</code> keyword</h1>
-
- <p>The <code>proc</code> keyword is a synonym for <a
- href=#proc-new><code>Proc.new</code></a>: it creates a
- <code>Proc</code> with proc semantics from the given block. Without a
- block argument an <a href=#creation-with-implicit-block>implicit
- block</a> is assumed.
- </section>
-
- <section>
- <h1 id=ampersand-prefixed-parameter><code>&amp;<var>parameter</var></code></h1>
-
- <p>A method or lambda whose parameter list includes an identifier
- prefixed with an ampersand assigns to the parameter a
- <code>Proc</code> with proc semantics representing the block literal
- that the method/lambda was sent. For more details, see <a
- href=/methods#block-arguments>Methods: Block Arguments</a>, which
- includes an <a
- href=/methods#method-using-block-arguments>example</a>.
- </section>
-
- <section>
- <h1 id=lambda-keyword><code>lambda</code> keyword</h1>
-
- <p>The <code>lambda</code> keyword creates a <code>Proc</code> with
- lambda semantics from the given block. Without a
- block argument an <a href=#creation-with-implicit-block>implicit
- block</a> is assumed.</p>
-
- <figure id=lambda-keyword-examples.rb class=left>
- <figcaption><code>lambda</code> takes a block with which it creates
- a <code>Proc</code> with lambda characteristics
- </figure>
- </section>
-
- <section>
- <h1 id=lambda-literal>Lambda Literal (<code>->(){}</code>)</h1>
-
- <figure class=railroad class=left>
- <img id=lambda-literal.png>
- <figcaption>Syntax diagram of the lambda literal
- </figure>
-
- <p>A literal of the form
- <code>-&gt;(<var>parameter<sub>0</sub></var></code>…<code><var>parameter<sub>n</sub></var>)
- { <var>statements</var> }</code> instantiates a <code>Proc</code>
- object with lambda characteristics. The optional parameter list takes
- the same form as that used in method definitions <!-- link -->. It may
- be omitted entirely. <var>statements</var> is zero or more
- statements. For example, <code>-&gt; { 42 }</code>, or <code>-&gt;(a,
- b) { a + b }</code>.</p>
-
- <figure id=lambda-literal-syntax.rb class=left>
- <figcaption><code>-&gt;(){}</code> creates a <code>Proc</code> with
- lambda characteristics
- </figure>
- </section>
- </section>
-
- <section>
- <h1 id=calling>Calling</h1>
-
- <p>A <code>Proc</code> can be invoked in the following ways:</p>
-
- <dl>
- <dt><code>Proc#call(<var>arg</var><sub>0</sub>,…,<var>arg</var><sub>n</sub>)</code>
- <dd>Also invoked with the syntax below.
- <dt><code><var>proc</var>.(<var>arg</var><sub>0</sub>,…,<var>arg</var><sub>n</sub>)</code>
- <dd>A syntactical shortcut for <code>Proc#call</code>. The
- parentheses are mandatory, even if there are no arguments.
- <dt><code>Proc#yield(<var>arg</var><sub>0</sub>,…,<var>arg</var><sub>n</sub>)</code>
- <dd>An instance method with the selector <code>:yield</code>;
- distinct from the <code>yield</code> keyword.
- <dt><code>Proc#[<var>arg</var><sub>0</sub>,…,<var>arg</var><sub>n</sub>]</code>
- <dd>The square brackets are mandatory, even if there are no arguments.
- <dt><code>Proc#=== <var>arg</var></code>
- <dd>Allows <code>Proc</code>s to be used in
- <code>case</code> expressions. It requires exactly one argument, so is
- unsuitable for a <code>Proc</code> with <a
- href=#lambda-semantics>lambda semantics</a> that has an arity other than 1.
- </dl>
-
- <figure id=proc-calling.rb class=left>
- <figcaption>Syntax for invoking a <code>Proc</code>
- </figure>
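The invocation forms listed above are sketched below with two hypothetical lambdas, `square` and `even`.

```ruby
square = ->(n) { n * n }

results = [
  square.call(3),    # Proc#call
  square.(3),        # shortcut for #call
  square.yield(3),   # Proc#yield, a method rather than the keyword
  square[3],         # Proc#[]
  square === 3       # Proc#===
]
p results

# Proc#=== lets a Proc act as a `when` condition.
even = ->(n) { n.even? }
label = case 4
        when even then "even"
        else "odd"
        end
puts label
```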
- </section>
-
- <section>
- <h1 id=parameters>Parameters</h1>
-
- <p>A <code>Proc</code> may be defined with a parameter list, which
- describes the arguments it accepts. The syntax of the parameter list
- content mostly mirrors that of <a href=/methods#method-arguments>method
- parameter lists</a>, with the following differences:</p>
-
- <ul>
- <li>It is enclosed within a pair of vertical lines (<code>|</code>)
- rather than parentheses; the vertical lines are mandatory if a
- parameter list is specified.
- <li>It is specified as the first element of the block associated with
- the <code>Proc</code>: after the opening curly bracket or the
- <code>do</code> keyword.
- <li>If the <a href=#lambda-literal>lambda literal</a> syntax is used,
- the vertical lines must be omitted and the parameter list must be
- specified within the parentheses following <code>-&gt;</code> à la
- method parameter lists; <em>not</em> in the block.
- <li>The parameter list of a closure with <a href=#proc-semantics>proc
- semantics</a> may include a trailing comma after the last parameter.
- This has no syntactical meaning, but serves to indicate that
- additional arguments are explicitly ignored.
- </ul>
-
- <figure id=proc-parameters.rb class=left>
- <figcaption>The parameter list for a <code>Proc</code> is enclosed
- within vertical lines
- </figure>
-
- <section>
- <h1 id=block-local-variables>Block-Local Variables</h1>
-
- <p>A closure may define <dfn>block-local variables</dfn>: local
- variables which are distinct from those with the same name in an outer
- lexical scope.</p>
-
- <p>Block-local variables are defined in the parameter list after the
- non-block-local parameters, and before the closing vertical line. They
- are specified as a comma-separated list of identifiers, with a
- semicolon preceding the first:
- <code>|<var>param</var><sub>0</sub>,…,<var>param</var><sub>n</sub>;<var>block-local</var><sub>0</sub>,…,<var>block-local</var><sub>n</sub>|</code>.
- The semicolon is mandatory, even if the list of block-local variables
- is not preceded by any regular parameters.</p>
-
- <p>In the case of <a href=#lambda-literal>lambda literals</a>,
- block-local variables are specified in the same manner before the
- closing parenthesis of the parameter list, i.e.
- <code>-&gt;(<var>param</var><sub>0</sub>,…,<var>param</var><sub>n</sub>;<var>block-local</var><sub>0</sub>,…,<var>block-local</var><sub>n</sub>){}</code>.</p>
-
- <figure id=block-local-variables-syntax.rb class=left>
- <figcaption>Block-local variables are specified after regular
- parameters and preceded by a semicolon
- </figure>
-
- <p>If a variable <var>v</var> is defined block-local:
- <ol>
- <li>If <var>v</var> was defined in an outer scope, its value is saved.
- <li>Within the block <var>v</var> is assigned <code>nil</code>,
- then behaves as any other local variable.
- <li>Upon leaving the block, <var>v</var> is assigned the value it
- had originally in the outer scope.
- </ol>
- </p>
-
- <figure id=block-local-variables-example.rb class=left>
- <figcaption>Block-local variables are distinct from those with
- the same name in an outer lexical scope
- </figure>
-
- <p>By contrast, if <var>v</var> is not defined as block-local, it
- retains the value it was assigned inside the block, even after leaving
- the block scope. However, defining a variable, <var>w</var>, inside
- the block which did not exist in the outer scope, does not define it
- in the outer scope. In both examples, <var>w</var> is undefined upon
- leaving the block.</p>
-
- <figure id=non-block-local-variables-example.rb class=left>
- <figcaption>Non-block-local variables defined prior to the block
- retain the value they were assigned inside the block even after
- leaving it
- </figure>
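The contrast between the two cases could be sketched as follows; the variable names are arbitrary.

```ruby
v = :outer
[1].each { |n; v| v = n }   # `v` after the semicolon is block-local
after_block_local = v       # the outer `v` is untouched

w = :outer
[1].each { |n| w = n }      # no declaration: the block shares the outer `w`
after_shared = w

p after_block_local, after_shared
```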
- </section>
- </section>
-
- <section>
- <h1 id=binding>Binding</h1>
-
- <p>We have <a href=#closures>already introduced</a> the concept of a
- binding as a reference to the closure’s referencing environment. We have
- demonstrated that the binding is dynamic, resolving variables referenced
- within a closure relative to the environment in which it was called. An
- implication is that these variables must be defined in the closure
- itself or exist in the closure’s environment prior to its creation: they
- can be modified or re-assigned subsequently, but they must have been
- assigned.</p>
-
- <figure id=closure-binding-nameerror.rb class=left>
- <figcaption>Variables referenced, but not defined, by a closure must
- exist in its environment at creation time
- </figure>
-
- <p>Closures are <q>self-contained: they contain everything the procedure
- needs in order to be applied.</q> (<a href=/references#refFRIEDMAN08
- class=ref>Friedman &amp; Wand</a>, 2008, pp. 79–82). Therefore, the
- binding must also <q>…hold all the information necessary to execute a
- method, such as the value of <code>self</code>, and the block, if any,
- that would be invoked by a <a
- href=/flow#yield><code>yield</code></a>.</q> (<a
- href=/references#refFLAN08 class=ref>Flanagan &amp; Matsumoto</a>,
- 2008, pp. 202–203).
-
- <p>A closure’s binding is encapsulated by a <code>Binding</code> object,
- which is obtained with <code>Proc#binding</code>. It can then be used to
- execute other code in the same environment with a method such as
- <code>eval</code>.</p>
-
- <figure id=closure-proc-binding.rb class=left>
- <figcaption><code>Proc#binding</code> returns a closure’s binding,
- allowing other code to be executed in its context
- </figure>
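A minimal sketch of using a closure's binding with `eval`, assuming a hypothetical `make_closure` method whose local variable `secret` is captured by the returned closure.

```ruby
def make_closure
  secret = 42
  -> { secret }
end

closure = make_closure

# Read a variable from the closure's environment.
read = eval("secret", closure.binding)

# Code evaluated in the binding shares that environment, so an
# assignment made there is visible to the closure afterwards.
eval("secret = 7", closure.binding)
updated = closure.call

puts read, updated
```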
-
- <section class=note>
- <h1 id=kernel-binding><code>Kernel.binding</code></h1>
-
- <p><code>Kernel.binding</code> returns a <code>Binding</code> object
- representing the referencing environment at the time the method is
- invoked. That is, it generalises the concept of bindings to any
- object.</p>
-
- <figure id=kernel-binding-example.rb class=left>
- <figcaption><code>Kernel.binding</code> returns the binding of the
- call site
- </figure>
- </section>
- </section>
-
- <section>
- <h1 id=methods>Methods</h1>
-
- <p>A closure can be converted to a method with <a
- href=/methods#dynamic-definition><code>Module#define_method</code></a>.
- Likewise, a <a href=/methods#method-objects><code>Method</code>
- object</a> can be converted to a <code>Proc</code> with
- <code>Method#to_proc</code>.
-
- <p>However, <code>Method</code> objects are not closures: they do not
- have access to local variables in their parent scope. <q>The only
- binding retained by a <code>Method</code> object, therefore, is the
- value of <code>self</code>…</q> (<a class=ref
- href=/references#refFLAN08>Flanagan &amp; Matsumoto</a>, 2008, pp.
- 203–204)
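Both conversions are sketched below. The `Calculator` class and the variable names are hypothetical.

```ruby
class Calculator
  doubler = ->(n) { n * 2 }
  define_method(:double, &doubler)   # closure becomes an instance method
end

doubled = Calculator.new.double(4)

# A Method object converted to a Proc, then passed as a block.
add_ten = 10.method(:+).to_proc
sums = [1, 2, 3].map(&add_ten)

p doubled, sums
```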
- </section>
-</article>
711 chapters/encoding.html
@@ -1,711 +0,0 @@
-<link rel=next href=/strings>
-<article>
- <h1 id=encoding>Encoding</h1>
-
- <p>An encoding is a mapping between byte sequences and characters<a
- href=#fn-encoding-dfn>†</a>. Each program source file,
- <code>String</code>, <code>Symbol</code>, <code>Regexp</code>,
- <code>File</code>, and <code>IO</code> object is, relatively independently,
- associated with its own encoding. This <a
- href=/strings#associate>association</a> is simply a statement, that has
- either been made explicitly about a specific object, or derived from a
- corresponding default encoding. It may even be spurious.
-
- <p>The encoding associated with a source file, the <a href=#source><i>source
- encoding</i></a>, is by default US-ASCII. If a source file contains
- characters outside of this encoding, it must specify which one, otherwise
- Ruby refuses to load it.
-
- <p>The encoding associated with <a
- href=/strings#encoding><code>String</code></a>s, <a
- href=/strings#symbol-encoding><code>Symbol</code></a>s, and <a
- href=/regexps#encoding><code>Regexp</code></a>s, is by default the source
- encoding of the file in which they are contained. However, if their literals
- contain certain character escapes, this is changed implicitly. As with
- source files, this association can be overridden on a per-object basis.
-
- <p>An <code>IO</code> or <code>File</code> object represents an external
- data stream, whose encoding is termed its <a href=#external><i>external
- encoding</i></a>. Data read from a stream is associated with this
- encoding. Unless set explicitly, it defaults to an encoding inferred from
- the user’s environment. Both types of object <em>may</em> also be associated
- with an <a href=#internal><i>internal encoding</i></a>: that which the
- programmer desires it to have. If set explicitly, data read from the stream
- is transcoded to the internal encoding; while data written to it is
- transcoded to the external encoding. The internal encoding is never inferred
- or derived, so by default no transcoding occurs.
-
- <p><a href=#transcoding><i>Transcoding</i></a> is quite distinct from mere
- <i>association</i>. Whereas the latter changes an attribute of an object,
- the former converts its contents: translating its underlying bytes to their
- equivalent representation in another encoding. This chapter discusses both
- topics, but it is essential to be cognizant of their difference.</p>
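The difference can be made concrete with a short sketch. It assumes `String#encode` for transcoding and `String#force_encoding` for association, neither of which is named in the passage above; the string is written with a `\u` escape so that it does not depend on the source encoding.

```ruby
utf8 = "\u00e9"                                      # e with acute accent, UTF-8

transcoded = utf8.encode("ISO-8859-1")               # bytes are converted
relabelled = utf8.dup.force_encoding("ISO-8859-1")   # bytes are only relabelled

p utf8.bytes.to_a         # two bytes in UTF-8
p transcoded.bytes.to_a   # one byte in ISO-8859-1
p relabelled.bytes.to_a   # unchanged: still the two UTF-8 bytes
p relabelled.length       # now counted as two single-byte characters
```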
-
- <section>
- <h1 id=class>The <code>Encoding</code> Class</h1>
-
- <p>Ruby represents the encodings that she understands as instances of
- the <code>Encoding</code> class, defining each as a constant under the
- <code>Encoding</code> namespace. The constant is named after the
- upper-case encoding name, with low lines replacing hyphen-minus
- characters. For example, <code>Encoding::UTF_8</code> or
- <code>Encoding::Windows_1250</code>. Given an encoding name as a
- <code>String</code>, the corresponding <code>Encoding</code> object may
- be retrieved with <code>Encoding.find(<var>name</var>)</code>.
-
- <p>A list of built-in encodings may be retrieved as an <code>Array</code>
- of <code>Encoding</code> objects with the <code>Encoding.list</code>
- method. <code>Encoding.aliases</code> returns a <code>Hash</code> whose
- keys are encoding aliases, and whose values are the names of the
- corresponding built-in encodings. Methods that expect encodings as arguments accept instances of
- <code>Encoding</code>, or <code>String</code>s naming a built-in encoding
- or its alias. The <code>Encoding</code> object associated with a
- <code>String</code>, <code>Symbol</code>, or <code>Regexp</code> is
- returned by their <code>#encoding</code> method.
- </section>
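The lookups described in this section can be sketched as follows. This is a minimal illustration rather than an exhaustive tour; the variable names are chosen for the example only.

```ruby
# Look an encoding up by name; aliases such as BINARY are accepted too.
utf8 = Encoding.find('utf-8')
binary = Encoding.find('BINARY')

# Encoding.list holds Encoding objects; Encoding.aliases maps each alias
# name to the name of the encoding it stands for.
listed = Encoding.list.include?(utf8)
aliased_name = Encoding.aliases.fetch('BINARY')

# Methods that expect an encoding accept an Encoding object or a name.
by_object = 'abc'.dup.force_encoding(Encoding::ISO_8859_1)
by_name = 'abc'.dup.force_encoding('ISO-8859-1')

puts utf8.name
puts by_object.encoding == by_name.encoding
```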
-
- <section>
- <h1 id=source>Source Encoding</h1>
-
- <p>The <dfn>source encoding</dfn> is the character encoding of a given
- source file. It is US-ASCII by default. A <code>SyntaxError</code> is
- raised when a source file contains one or more characters invalid in the
- source encoding.
-
- <p>A file’s source encoding may be specified inline by means of a
- <dfn>coding comment</dfn>: a specially formatted comment that declares the
- encoding of the lines that follow. If omitted, the default source encoding
- is assumed. If a source file contains a <a
- href=/programs#shebang>shebang</a> line, the coding comment must appear
- on the second line; otherwise it must appear on the first.</p>
-
- <p>The coding comment is a US-ASCII string which begins with a number
- sign (<code class=u title='#'>U+0023</code>) and contains<a
- href=#fn-coding-contain>†</a> the string <code>coding</code>
- followed by an equals sign (<code class=u title='='>U+003D</code>) or
- colon (<code class=u title=':'>U+003A</code>) then the name of the
- source encoding. The encoding name is one of those returned by
- <code>Encoding.name_list</code> written in a case-insensitive fashion.
-
- <p>The source encoding of the currently executing code can be obtained
- with the <a href=/programs#encoding><code>__ENCODING__</code></a>
- keyword.</p>
-
- <figure id=coding-comment.rb>
- <figcaption>Setting and querying the source encoding.
- </figure>
- </section>
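A small sketch of a coding comment in use, assuming the file is saved as UTF-8. The variable names are illustrative only.

```ruby
# coding: utf-8
# The coding comment above must be the first line of the file, or the
# second if a shebang line is present.

# __ENCODING__ reports the source encoding of the code being executed.
source_encoding = __ENCODING__

# String literals are associated with the source encoding.
literal_encoding = 'résumé'.encoding

puts source_encoding.name
```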
-
- <section>
- <h1 id=external>External Encoding</h1>
-
- <p>The encoding of the data in an <code>IO</code> stream is known by Ruby
- as the object’s <dfn>external encoding</dfn>. Every <code>IO</code> object
- has an external encoding, so data read from it will be associated with the
- same. Ruby infers the default external encoding with the following steps<a
- href=#fn-external-encoding-algo>*</a>, stopping as soon as she finds one
- which is usable:
-
- <dl>
- <dt><code>Encoding.default_external=</code>
- <dd>If an <code>Encoding</code> object, or name, has been assigned to
- <code>Encoding.default_external=</code>, that is the default external
- encoding.
- <dt>Interpreter’s <code>-E</code> switch
- <dd>If the Ruby interpreter was invoked with an
- <code>-E<var>encoding</var></code> option, <var>encoding</var> is the
- default external encoding.
- <dt>Locale encoding
- <dd>Use the encoding derived from the user’s environment, as explained
- below.
- </dl>
-
- <section>
- <h1 id=locale>Locale Encoding</h1>
-
- <p>The <dfn>locale encoding</dfn> is an encoding inferred from the
- user’s environment that Ruby supports. It is determined in two distinct
- stages, the first of which is to interrogate the user’s environment for
- his preferred encoding:
-
- <ol>
- <li>Inspect relevant environment variables, e.g.
- <code>LANG</code>, <code>LC_CTYPE</code>, or <code>LC_ALL</code>
- <li>Or, if the platform is Windows or Cygwin, invoke C’s
- <code>nl_langinfo_codeset()</code> or Windows’ <a
- href=//msdn.microsoft.com/en-us/library/ms683162(VS.85).aspx><code>GetConsoleCP()</code></a>
- function.
- </ol>
-
- <p>The result of this search, or <code>nil</code> if it failed, is the
- <dfn>locale charmap</dfn> encoding, and is returned by
- <code>Encoding.locale_charmap</code>. Ruby must now correlate this
- encoding with the encodings she supports to determine the <dfn>locale
- encoding</dfn>:
-
- <ol>
- <li>If the locale charmap encoding is known to Ruby (that
- is, <code>Encoding.find(Encoding.locale_charmap)</code> returns an
- <code>Encoding</code> object), that becomes the locale encoding.
- <li>If the locale charmap encoding couldn’t be determined, the locale
- encoding is US-ASCII.
- <li>Otherwise, the locale encoding is ASCII-8BIT.
- </ol>
- </section>
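The second stage above can be sketched in Ruby. `locale_encoding_for` is a hypothetical helper written for this illustration; it mirrors the three rules as described and is not the interpreter's own implementation.

```ruby
# Hypothetical helper: maps a locale charmap name (or nil) to a locale
# encoding by following the three rules listed above.
def locale_encoding_for(charmap)
  # The charmap could not be determined.
  return Encoding::US_ASCII if charmap.nil?

  # The charmap names an encoding that Ruby knows.
  Encoding.find(charmap)
rescue ArgumentError
  # Encoding.find raises ArgumentError for names it does not recognise.
  Encoding::ASCII_8BIT
end

# The interpreter's own answer is available as Encoding.find('locale').
puts locale_encoding_for(Encoding.locale_charmap).name
```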
-
- <!--
- $ LANG=braile ruby -e 'p [Encoding.locale_charmap, Encoding.find("locale")]'
- ["ANSI_X3.4-1968", #<Encoding:US-ASCII>]
- -->
-
- <p>By default, as the name suggests, all <code>IO</code> objects have the
- default external encoding as their external encoding. However, this may
- also be set on a per-stream basis by specifying an external encoding when
- <a href=/io#open>opening</a> an I/O stream, or with
- <code>IO#set_encoding(<var>encoding</var>)</code>, where
- <var>encoding</var> is an encoding name or <code>Encoding</code> object.
- The external encoding of a stream may be queried with
- <code>IO#external_encoding</code>, which returns the corresponding
- <code>Encoding</code> object. Note, however, that if the stream is in
- write-only mode, and wasn’t explicitly assigned an external encoding, this
- method returns <code>nil</code>.
- </section>
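The per-stream settings described above might be exercised like this. The sketch assumes the default internal encoding is unset, so no transcoding takes place; the temporary file and variable names are illustrative.

```ruby
require 'tempfile'

ext = data_encoding = changed = write_only = nil

Tempfile.create('external') do |tmp|
  tmp.binmode
  tmp.write("caf\xE9".b) # "café" encoded as ISO-8859-1
  tmp.close

  # Name the external encoding in the mode string when opening the stream.
  File.open(tmp.path, 'r:iso-8859-1') do |io|
    ext = io.external_encoding
    data_encoding = io.read.encoding # the data is tagged, not transcoded

    # The association can also be changed once the stream is open.
    io.set_encoding('windows-1252')
    changed = io.external_encoding
  end

  # A write-only stream with no explicit encoding reports nil.
  File.open(tmp.path, 'w') { |io| write_only = io.external_encoding }
end

puts ext.name
```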
-
- <section>
- <h1 id=internal>Internal Encoding</h1>
-
- <p>Optionally, an <code>IO</code> object may also be associated with an
- <dfn>internal encoding</dfn>. This is the encoding that the programmer
- wishes to use with the data in a stream. The default value of an
- <code>IO</code> object’s internal encoding is equal to the <dfn>default
- internal encoding</dfn> (<code>Encoding.default_internal</code>), which
- is determined as follows:</p>
-
- <dl>
- <dt><code>Encoding.default_internal=</code>
- <dd>If an <code>Encoding</code> object, or name, has been assigned to
- <code>Encoding.default_internal=</code>, that is the default internal
- encoding.
- <dt>Interpreter’s <code>-E</code> switch
- <dd>If the Ruby interpreter was invoked with an
- <code>-E<var>external</var>:<var>internal</var></code> or
- <code>-E:<var>internal</var></code> option, where both
- <var>external</var> and <var>internal</var> are valid encoding names,
- the default internal encoding is <var>internal</var>.
- <dt>Interpreter’s <code>-U</code> switch
- <dd>If the interpreter was invoked with the <code>-U</code> switch, the
- default internal encoding is UTF-8
- </dl>
-
- <p>If all the above steps failed, the default internal encoding is
- <code>nil</code>. Therefore, unlike the default external encoding which is
- inferred automatically from one’s locale, the default internal encoding is
- <code>nil</code> unless set explicitly.
-
- <p>The internal encoding of an <code>IO</code> may be changed from this
- default on a per-stream basis by specifying an internal encoding when <a
- href=/io#open>opening</a> the stream, or with
- <code>IO#set_encoding(<var>external</var>, <var>internal</var>)</code>,
- where both <var>external</var> and <var>internal</var> are encoding names
- or <code>Encoding</code> objects. Alternatively,
- <code>IO#set_encoding</code> may be given a <code>String</code> of the
- form <code><var>external</var>:<var>internal</var></code>.
-
- <p>An internal encoding of <code>nil</code> means that the programmer does
- not express a preference for how the data he reads and writes via I/O is
- encoded. Accordingly, Ruby associates the data with the stream’s external
- encoding, and gets out of the way. This, too, is the situation if the
- stream’s external encoding is equal to its internal encoding: the data is
- already encoded how the programmer requires.
-
- <p>The internal encoding is relevant when it is both set (that is, has a
- non-<code>nil</code> value) and differs from the <a href=#external>external
- encoding</a>. To honour the internal encoding, Ruby <a
- href=#transcoding>transcodes</a> data read from a stream from the
- external to the internal encoding, and transcodes
- data written to the stream from the internal to the external encoding.
-
- <p>The transcoding works exactly the same as <a
- href=#transcoding><code>String#encode</code></a>, so the <a
- href=#encode-options-hash><code>#encode</code> options
- <code>Hash</code></a> may be merged with the <a
- href=#options-hash><code>IO</code> options <code>Hash</code></a> wherever the latter is accepted.
- For example, it may be supplied as the final argument of <a
- href=/io#init><code>IO.new</code></a> or <code>IO#set_encoding</code>.
- </section>
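A sketch of transcoding in both directions, assuming a scratch file created with `Tempfile`. The variable names are illustrative.

```ruby
require 'tempfile'

raw_bytes = internal = read_back = nil

Tempfile.create('internal') do |tmp|
  tmp.close

  # Writing: the UTF-8 String is transcoded to the external encoding.
  File.open(tmp.path, 'w:iso-8859-1') { |io| io.write("caf\u00e9") }
  raw_bytes = File.binread(tmp.path).bytes

  # Reading: an "external:internal" mode suffix requests transcoding from
  # the external encoding to the internal one.
  File.open(tmp.path, 'r:iso-8859-1:utf-8') do |io|
    internal = io.internal_encoding
    read_back = io.read
  end
end

puts read_back
```

On disk the final character occupies a single byte, as ISO-8859-1 requires, yet the program only ever handles UTF-8 text.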
-
- <section>
- <h1 id=ascii-8bit>ASCII-8BIT</h1>
-
- <p>Ruby defines an encoding named <code>ASCII-8BIT</code>, with an alias
- of <code>BINARY</code>, which does not correspond to any known encoding.
- It is intended to be associated with binary data, such as the bytes that
- make up a <abbr title='Portable Network Graphics'>PNG</abbr> image, so has
- no restrictions on content. One byte always corresponds with one
- character. This allows a <code>String</code>, for instance, to be treated
- as a <a href=/strings#bytes>bag of bytes</a> rather than a sequence of
- characters. <code>ASCII-8BIT</code>, then, effectively corresponds to the
- absence of an encoding, so methods that expect an encoding name recognise
- <code>nil</code> as a synonym.
- </section>
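The byte-per-character property can be seen by comparing a UTF-8 `String` with a binary copy of it. This is a minimal sketch; the names are illustrative.

```ruby
text = "caf\u00e9" # UTF-8: the final character occupies two bytes
bytes = text.b     # a copy of the same bytes, associated with ASCII-8BIT

text_length = text.length   # counts characters
byte_length = bytes.length  # counts bytes, one per "character"

# BINARY is an alias, so both constants name the same object.
same_object = Encoding::BINARY.equal?(Encoding::ASCII_8BIT)

puts "#{text_length} characters, #{byte_length} bytes"
```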
-
- <section>
- <h1 id=compatibility>Compatibility</h1>
-
- <p>Methods of <code>String</code> and <code>Regexp</code> that take
- another such object as an argument require the encodings associated with
- the objects to be <dfn>compatible</dfn>. An encoding is always
- compatible with itself, so operations involving two objects associated
- with the same encoding are allowed. Likewise, two objects are compatible
- if they are both <a href=/strings#ascii-only>ASCII-only</a>.
-
- <p>The compatibility of other combinations of encodings can be
- determined with <code>Encoding.compatible?</code>, which compares the
- encoding of its two arguments, which are either <code>Encoding</code>
- objects or objects associated with encodings. If they are compatible,
- the encoding which would result from their combination is returned;
- otherwise, <code>nil</code> results. Operating on objects with
- incompatible encodings causes an
- <code>Encoding::CompatibilityError</code> exception to be raised.
- </section>
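A short sketch of the compatibility rules, using two `String`s that both contain a non-ASCII character and one that is ASCII-only. The variable names are illustrative.

```ruby
utf8 = "caf\u00e9"
latin1 = "caf\xE9".dup.force_encoding('ISO-8859-1')
ascii = 'abc'.encode('ISO-8859-1') # ASCII-only, despite its encoding

# Both operands hold non-ASCII data in different encodings: incompatible.
incompatible = Encoding.compatible?(utf8, latin1)

# An ASCII-only operand is compatible; the result names the encoding
# that combining the two objects would produce.
combined = Encoding.compatible?(utf8, ascii)
joined = utf8 + ascii

error = nil
begin
  utf8 + latin1
rescue Encoding::CompatibilityError => e
  error = e
end

puts error.class
```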
-
- <section>
- <h1 id=transcoding>Transcoding</h1>
-
- <p><dfn>Transcoding</dfn> a <code>String</code> converts its bytes to the
- equivalent byte sequences in a given encoding, with which it associates
- the result. It is typically performed with <code>String#encode</code>, which
- returns its receiver transcoded from a <var>source</var> encoding to a
- <var>target</var> encoding. <code>String#encode!</code> operates in the
- same manner, but transcodes the receiver in-place.
-
- <p>By default, <var>source</var> is the receiver’s current encoding, and
- <var>target</var> is the <a href=#internal>default internal</a> encoding.
- When called with one encoding argument, this becomes the <var>target</var>
- encoding. When called with two encoding arguments, the first is the
- <var>target</var>, the second is the <var>source</var>. This last form is
- mainly useful when the <code>String</code> is associated with <a
- href=#ascii-8bit><code>ASCII-8BIT</code></a>: it associates the
- <code>String</code> with <var>source</var>, then transcodes from
- <var>source</var> to <var>target</var>.
-
- <p>If a character in the <code>String</code> does not exist in the
- <var>target</var> encoding, or the <code>String</code> contains bytes
- invalid in its current encoding, an exception is raised. This behaviour
- can be changed by supplying an <var>options</var> <code>Hash</code> as the
- final argument, whose form is described in the table that follows.</p>
-
- <table id=options-hash class=border>
- <caption>The keys and values that are recognised in the
- <var>options</var> <code>Hash</code> accepted by
- <code>String#encode</code> and <code>String#encode!</code>. The
- <i>Key</i> column names a key of the <code>Hash</code>, and the
- <i>Values</i> column specifies its possible values.
- <thead>
- <tr>
- <th>Key
- <th>Values
- <th>Description
- <tbody class=zebra>
- <tr>
- <td><code>:cr_newline</code>
- <td><code>true</code> or <code>false</code>
- <td>Whether to convert <code>\n</code> to <code>\r</code>.
- <tr>
- <td><code>:crlf_newline</code>
- <td><code>true</code> or <code>false</code>
- <td>Whether to convert <code>\n</code> to <code>\r\n</code>.
- <tr>
- <td><code>:invalid</code>
- <td><code>:replace</code> or <code>nil</code>
- <td>A value of <code>:replace</code> causes byte sequences invalid in
- the source encoding to be replaced with the replacement string.
- A value of <code>nil</code>, which is the default, causes an
- <code>Encoding::InvalidByteSequenceError</code> exception to be
- raised in this scenario.
- <tr>
- <td><code>:replace</code>
- <td><code>String</code>
- <td>The <dfn>replacement string</dfn> used by the
- <code>:invalid</code> or <code>:undef</code> options. By default,
- it is <code class=u>U+FFFD</code> for Unicode encodings and
- <code>?</code> for others.
- <tr>
- <td><code>:undef</code>
- <td><code>:replace</code> or <code>nil</code>
- <td>A value of <code>:replace</code> causes characters undefined in
- the destination encoding to be replaced with the replacement
- string. A value of <code>nil</code>, which is the default, causes
- an <code>Encoding::UndefinedConversionError</code> exception to be
- raised in this scenario.
- <tr>
- <td><code>:universal_newline</code>
- <td><code>true</code> or <code>false</code>
- <td>When true, <code>\r\n</code> and <code>\r</code> are converted
- to <code>\n</code>.
- <tr>
- <td><code>:xml</code>
- <td><code>:text</code> or <code>:attr</code>
- <td>Replaces <code>&amp;</code> with <code>&amp;amp;</code>,
- <code>&lt;</code> with <code>&amp;lt;</code>, <code>&gt;</code>
- with <code>&amp;gt;</code>, and undefined characters with a
- hexadecimal entity of the form
- <code>&amp;#x<var>hex</var>;</code>, where <var>hex</var> is a
- sequence of hexadecimal digits. In addition, when a value
- of <code>:attr</code> is supplied, <code>"</code> is replaced with
- <code>&amp;quot;</code>.
- </table>
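The argument forms and several of the options in the table can be sketched as follows. The sample strings and variable names are chosen for illustration only.

```ruby
utf8 = "caf\u00e9"

# One encoding argument: the target.
latin1 = utf8.encode('ISO-8859-1')

# Two encoding arguments: the target, then the source.
from_bytes = "caf\xE9".b.encode('UTF-8', 'ISO-8859-1')

# U+2603 has no equivalent in US-ASCII, so this raises by default.
undefined = nil
begin
  "\u2603".encode('US-ASCII')
rescue Encoding::UndefinedConversionError => e
  undefined = e
end

# :undef and :invalid substitute the replacement string instead.
replaced = "\u2603".encode('US-ASCII', undef: :replace)
custom = "\u2603".encode('US-ASCII', undef: :replace, replace: '*')
repaired = "abc\xFF".encode('ISO-8859-1', invalid: :replace)

# Decorators: XML escaping and newline conversion.
escaped = '<a & b>'.encode('UTF-8', xml: :text)
crlf = "a\nb".encode('UTF-8', crlf_newline: true)

puts escaped
```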
-
- <section>
- <h1 id=converter><code>Encoding::Converter</code></h1>
-
- <p>The <code>Encoding::Converter</code> class provides additional
- control over the transcoding process.
- <code>Encoding::Converter.new</code> takes a source encoding as its
- first argument, and a destination encoding as its second. Both may be
- given as encoding names or <code>Encoding</code> objects. An <a
- href=#options-hash>options <code>Hash</code></a> may be supplied as a
- third argument.</p>
-
- <section>
- <h1 id=conversion-path>Conversion Path</h1>
-
- <p>Text is transcoded along a <dfn>conversion path</dfn>. Each step
- involves a source encoding and a destination encoding. In the simple
- case, the conversion path will have only one step: from the given
- source encoding to the given destination encoding. However, more
- complex transcoding requires intermediate stages, e.g. to transcode
- Big5 into ISO-8859-9, we must first transcode to UTF-8: Big5 to UTF-8,
- then UTF-8 to ISO-8859-9. The source and destination encodings that
- are currently in use are returned by
- <code>Encoding::Converter#source_encoding</code> and
- <code>Encoding::Converter#destination_encoding</code>, respectively,
- as <code>Encoding</code> objects.</p>
-
- <p>The various newline conversion options and those which perform
- escaping are termed <dfn title='transcoding
- decorators'>decorators</dfn>, and also feature in the conversion
- path. If the destination encoding is ASCII-compatible, they appear as
- the final steps, i.e. after any encoding pairs. Otherwise, they appear
- before the final step.
-
- <p><code>Encoding::Converter#convpath</code> returns an
- <code>Array</code> of steps in the conversion path. Steps which convert
- between two encodings are represented as an <code>Array</code> of the
- respective <code>Encoding</code> objects. A step which applies a
- decorator appears as a <code>String</code> naming the decorator.
-
- <p><code>Encoding::Converter.new</code> may be invoked with an
- <code>Array</code> in this form as an argument. The instantiated
- converter then uses this conversion path rather than inferring one
- from its arguments.
- </section>
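A sketch of inspecting and supplying a conversion path. The encodings are chosen because converting between them passes through UTF-8; the variable names are illustrative.

```ruby
ec = Encoding::Converter.new('ISO-8859-1', 'EUC-JP', crlf_newline: true)

# Two encoding steps via UTF-8, then the decorator as the final step,
# because EUC-JP is ASCII-compatible.
path = ec.convpath

source = ec.source_encoding
destination = ec.destination_encoding

# The same path, given explicitly as an Array of steps.
direct = Encoding::Converter.new(
  [%w[ISO-8859-1 UTF-8], %w[UTF-8 EUC-JP], 'crlf_newline']
)

p path
```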
-
- <section>
- <h1 id=convert>Piecemeal Conversion</h1>
-
- <p>An <code>Encoding::Converter</code> object can perform piecemeal
- transcoding, by repeatedly calling
- <code>Encoding::Converter#convert</code> with the next fragment of
- input. The fragment is transcoded and returned, associated with the
- destination encoding. However, because each fragment is always assumed
- to be part of a larger source, it may legitimately end mid-character,
- i.e. prior to a character boundary. These trailing bytes are
- buffered internally, and the successfully transcoded characters are
- returned. Then, when <code>#convert</code> is called next, its
- argument is assumed to supply the remaining bytes. If an unambiguously
- invalid byte sequence is encountered, an exception is raised.
-
- <p>Conceptually, we can explain this process as follows. When an
- <code>Encoding::Converter</code> instance is created, an empty
- <var>pending</var> buffer is created. Each time it is called,
- <code>#convert</code> initialises two empty buffers of its own:
- <var>source</var> and <var>destination</var>. It copies its argument
- into <var>source</var>, which it then processes byte-by-byte:
-
- <ol>
- <li>The byte is appended to <var>pending</var> and removed
- from <var>source</var>. The next action depends on the contents of
- <var>pending</var>:
- <ol>
- <li>If it constitutes a valid character in the source encoding
- that has an equivalent in the destination encoding, it is
- transcoded and written to <var>destination</var>.
- <var>pending</var> is emptied.
- <li>If it constitutes a byte sequence that could be valid in the
- source encoding, but currently isn’t, it is left in
- <var>pending</var> in the hope that the next call to
- <code>#convert</code> will supply the remaining bytes.
- <li>If it is invalid in the source encoding, regardless of
- subsequent input, an
- <code>Encoding::InvalidByteSequenceError</code> exception is
- raised.
- <li>If it is valid in the source encoding, but a corresponding
- character does not exist in the destination encoding, an
- <code>Encoding::UndefinedConversionError</code> exception is
- raised.
- </ol>
- <li>When the source buffer is empty, the destination buffer is
- returned, then emptied.
- </ol>
-
- <p>Thus, after <code>#convert</code> returns <var>destination</var>,
- <var>pending</var> may not be empty. An implication is that a call to
- <code>#convert</code> may raise an exception because, when combined
- with the contents of <var>pending</var>, its argument was invalid;
- i.e. an exception may be raised even if the argument is in itself
- valid. Therefore, when there is no more text to transcode,
- <code>Encoding::Converter#finish</code> should be called to signal
- that the contents of <var>pending</var> should be transcoded and
- returned. If <var>pending</var> isn’t empty when <code>#finish</code>
- is called, this normally results in one of the aforementioned
- exceptions being raised, because if its contents constituted a valid
- character, they would have already been returned by
- <code>#convert</code>. However, if the destination encoding is a
- stateful encoding such as ISO/IEC 2022, there may legitimately be
- bytes left in <var>pending</var>, which <code>#finish</code> flushes
- out. The lesson is that <code>#finish</code> should always be called
- when there is no more text to transcode.
- </section>
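Piecemeal conversion can be sketched by splitting a two-byte UTF-8 character across two calls. The second converter shows why `#finish` matters for a stateful destination encoding. The variable names are illustrative.

```ruby
ec = Encoding::Converter.new('UTF-8', 'ISO-8859-1')

# "café" in UTF-8, split between the two bytes that encode the "é".
first = ec.convert("caf\xC3".b) # the lone 0xC3 byte is held back
second = ec.convert("\xA9".b)   # supplies the rest of the character
rest = ec.finish                # nothing is left over

# ISO-2022-JP is stateful: #finish emits the escape sequence that
# switches back to ASCII.
jis = Encoding::Converter.new('UTF-8', 'ISO-2022-JP')
body = jis.convert("\u3042")
reset = jis.finish

p [first, second, rest]
p [body, reset]
```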
-
- <section>
- <h1 id=primitive-convert>Primitive Conversion</h1>
-
- <p><code>Encoding::Converter#convert</code> is built atop
- <code>Encoding::Converter#primitive_convert</code>, which provides
- even more control over the process. Unlike <code>#convert</code>, the
- source and destination buffers must be specified explicitly: the
- former as the first argument, the latter as the second. Both should be
- <code>String</code>s holding, respectively, the text to be transcoded,
- and the <code>String</code> in which to store the result. If the
- source buffer is an empty <code>String</code> it may be given as
- <code>nil</code>, instead. Neither buffer can be frozen, as they are,
- respectively, depleted and replenished in the course of the operation.
-
- <p>Bytes are written from the source buffer to the destination buffer
- via the pending buffer, as with <code>#convert</code>. However, this
- time instead of exceptions being raised for erroneous input, a
- <code>Symbol</code> is returned, as explained subsequently, which
- describes the problem. Thus, the programmer may elect to resolve the
- error before calling <code>#primitive_convert</code> again to resume
- the conversion. Due to the use of the pending buffer,
- <code>Encoding::Converter#finish</code> should still be used, as
- described previously.
-
- <p>By default, the destination buffer is appended to. If an
- <code>Integer</code> offset is given as the third argument to
- <code>#primitive_convert</code>, it specifies the byte index after which
- the transcoded text should be written. An <code>ArgumentError</code> is
- raised if the offset is given and greater than the byte size of the
- destination buffer. If this argument is specified as <code>nil</code>,
- the default behaviour is followed.
-
- <p>An optional fourth argument, given as an <code>Integer</code>,
- specifies the maximum size in bytes of the destination buffer; by
- default this value is <code>nil</code> which denotes an absence of a
- limit. If this limit is non-<code>nil</code> and the size of the
- destination buffer reaches it, transcoding will stop and
- <code>:destination_buffer_full</code> will be returned.
-
- <p>An optional fifth argument specifies one or both of the following
- options as a <code>Hash</code> or a bitwise OR of the corresponding
- constants:
-
- <dl>
- <dt><code>after_output: true</code> / <code>Encoding::Converter::AFTER_OUTPUT</code>
- <dd>After writing a character to the destination buffer, stop, and
- return <code>:after_output</code>.
- <dt><code>partial_input: true</code> / <code>Encoding::Converter::PARTIAL_INPUT</code>
- <dd>The source buffer is known to be incomplete, i.e. it ends outside
- of a character boundary. If this option is given and the last byte(s)
- of the source buffer don’t form a complete character in the source
- encoding, <code>:source_buffer_empty</code> is returned. This
- indicates that the remainder of the source text should be assigned to
- the source buffer, and <code>#primitive_convert</code> called again.
- </dl>
-
- <p>The return value is one of the following <code>Symbol</code>s:
-
- <dl>
- <dt><code>:invalid_byte_sequence</code>
- <dd>The source buffer contains a byte sequence invalid in the
- source encoding, regardless of any following bytes. Equivalent
- to the <code>Encoding::InvalidByteSequenceError</code> exception
- being raised.
- <dt><code>:incomplete_input</code>
- <dd>The source buffer ends prematurely, presumably prior to a
- character boundary, but is potentially valid if additional input is
- supplied. Nevertheless, this state is regarded as exceptional,
- equivalent to an <code>Encoding::InvalidByteSequenceError</code>
- being raised, because the <code>:partial_input</code> option is
- <code>false</code>. If, as expected, no more input is supplied, the
- result will end with an invalid byte sequence. Conversely, if
- <code>:partial_input</code> was <code>true</code>, the unexceptional
- <code>:source_buffer_empty</code> <code>Symbol</code> would be
- returned instead.
- <dt><code>:undefined_conversion</code>
- <dd>A character has been encountered in the source buffer which,
- although legal in the source encoding, has no equivalent in the
- destination encoding. Equivalent to an
- <code>Encoding::UndefinedConversionError</code> exception being
- raised.
- <dt><code>:after_output</code>
- <dd>If the <code>:after_output</code> option is given, after each
- character is converted this <code>Symbol</code> is returned.
- <dt><code>:destination_buffer_full</code>
- <dd>If a non-<code>nil</code> value has been given for the fourth
- argument, this <code>Symbol</code> indicates that the destination
- buffer has reached the given limit.
- <dt><code>:source_buffer_empty</code>
- <dd>The source buffer ends prematurely, presumably prior to a
- character boundary, and the <code>:partial_input</code> option has
- been given. The source buffer should be replenished and transcoding
- resumed.
- <dt><code>:finished</code>
- <dd>Conversion is finished, either naturally or because
- <code>Encoding::Converter#finish</code> has been called.
- </dl>
- </section>
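Several of the return values can be provoked with short inputs. The following is a sketch; the `convert_once` helper is hypothetical, introduced only to create a fresh converter and destination buffer for each case.

```ruby
# Hypothetical helper: runs one primitive conversion with a fresh
# converter and returns the status, leftover source, and destination.
def convert_once(from, to, src, *rest)
  ec = Encoding::Converter.new(from, to)
  dst = String.new
  status = ec.primitive_convert(src, dst, *rest)
  [status, src, dst]
end

# All input consumed and converted.
finished, = convert_once('UTF-8', 'UTF-16BE', 'pi'.dup)

# A one-byte limit on the output stops the conversion early.
full, = convert_once('UTF-8', 'UTF-16BE', 'pi'.dup, nil, 1)

# U+3042 has no ISO-8859-1 equivalent: output so far is kept in the
# destination, and the unread input stays in the source buffer.
undefined, leftover, converted =
  convert_once('UTF-8', 'ISO-8859-1', "a\u3042b".dup)

# A trailing lead byte is an error unless partial input was announced.
incomplete, = convert_once('UTF-8', 'ISO-8859-1', "caf\xC3".b)
partial, = convert_once('UTF-8', 'ISO-8859-1', "caf\xC3".b, nil, nil,
                        Encoding::Converter::PARTIAL_INPUT)

p [finished, full, undefined, incomplete, partial]
```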
-
- <section>
- <h1 id=error-context>Error Context</h1>
-
- <p>When an error occurs during transcoding, it is often necessary to
- understand its context so as to recover. The exceptions raised by
- <code>#convert</code> are augmented with accessors for gleaning this
- information. Additionally,
- <code>Encoding::Converter#primitive_errinfo</code> provides detailed
- information about the last error in the form of an <code>Array</code>