
add docs

1 parent 4cb8d0a commit 6203524e4bc1a874408608130add629d2d71848e @mislav committed May 16, 2012
Showing with 152 additions and 4 deletions.
  1. +98 −0 Readme.md
  2. +5 −0 models.rb
  3. +49 −4 rfc.rb
98 Readme.md
@@ -0,0 +1,98 @@
+Pretty RFC
+==========
+
+The goal of this project is to collect and reformat official RFC documents and
+popular drafts.
+
+RFCs, as published officially, are in an unsightly and impractical paged format.
+What's worse, the official format of most RFCs is plain text, even though they
+are authored in richer formats such as XML.
+
+Running the app
+---------------
+
+Dependencies:
+
+* git
+* Ruby 1.9
+* rake
+* Bundler
+* libxml2
+* PostgreSQL
+
+By default, the app will try to connect to the database named "rfc" on localhost
+without a username or password. This can be changed with the `DATABASE_URL`
+environment variable. If the database doesn't exist, the bootstrap script will
+try to create it.
+
+~~~ sh
+# initialize dependencies and database
+$ script/bootstrap
+
+# start the server
+$ bundle exec rackup
+
+# now visit http://localhost:9292/
+~~~
+
+The RFC index
+-------------
+
+The [index of all RFCs][index] is pulled from FTP:
+ftp://ftp.rfc-editor.org/in-notes/rfc-index.xml
+
+Then the metadata for each RFC entry is imported into the database. This is done
+by the ["import_index" rake task][rakefile] as part of the bootstrap process.
+
+The search index
+----------------
+
+Searching is done with [PostgreSQL full text searching][textsearch]. The
+necessary indexes, stored procedures and triggers for this are in the
+[Searchable][] module.
+
+The search results ordering is not perfect, but it is improved by bringing in a
+[popularity score from faqs.org][pop]. This is done by the ["import_popular" rake
+task][rakefile] as part of the bootstrap process.
+
+Fetching and rendering RFCs
+---------------------------
+
+When an RFC is first requested and it has never been processed, the app tries to
+look up its source XML document and render it to HTML. The XML lookup goes as
+follows:
+
+1. The fetcher tries to find the XML in http://xml.resource.org/public/rfc/xml/
+ where some RFCs in the 2000–53xx range can be found.
+
+2. Failing that, it fetches the metadata for the RFC from
+ http://datatracker.ietf.org/doc/
+
+3. If the datatracker links to the XML, the fetcher uses that link. There
+ probably won't be one, though.
+
+4. When there is no XML link, the fetcher looks up the draft name for the RFC
+ and checks if it can at least find the XML for its draft at
+ http://www.ietf.org/id/
+
+**Note:** This process only discovers XML sources for a small subset of RFCs.
+This is the biggest problem I have right now. The XML and nroff files in which
+RFCs were authored are usually not published, but are archived by rfc-editor.org
+and available on request by email.
+
+I'm investigating whether there is a way to retrieve these source files in bulk.
+
+If I can't obtain them, I will have to reformat RFCs by parsing the current
+publications instead of the source XML. This might be a lot of work.
+
+When obtained, the XML is parsed and rendered to HTML by the [RFC][] module.
+The templates used for generating HTML are in [templates/][templates].
+
+
+ [index]: http://www.rfc-editor.org/getbulk.html
+ [rakefile]: https://github.com/mislav/rfc/blob/master/Rakefile
+ [searchable]: https://github.com/mislav/rfc/blob/master/searchable.rb
+ [rfc]: https://github.com/mislav/rfc/blob/master/rfc.rb
+ [templates]: https://github.com/mislav/rfc/tree/master/templates
+ [textsearch]: http://www.postgresql.org/docs/9.1/static/textsearch-intro.html
+ [pop]: http://www.faqs.org/rfc-pop1.html
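The four-step XML lookup described in the Readme above is essentially a chain of fallbacks: try each source in order and stop at the first one that yields a document. A minimal sketch of that control flow, assuming each source is a lambda returning an XML string or `nil` (the `find_xml` name and the stubbed lambdas are hypothetical; the real `RfcFetcher` performs HTTP requests instead):

```ruby
# Hypothetical sketch of the fallback chain; network access is stubbed out
# so that only the order of attempts is shown.

# Try each [name, source] pair in order and return the first XML found.
def find_xml(rfc_number, sources)
  sources.each do |name, source|
    xml = source.call(rfc_number)
    return [name, xml] if xml
  end
  nil # no source had XML for this RFC
end

sources = [
  [:xml_resource_org, ->(_n) { nil }],                     # step 1: xml.resource.org
  [:datatracker_link, ->(_n) { nil }],                     # steps 2-3: link from datatracker metadata
  [:draft_xml,        ->(n) { n == 2616 ? "<rfc/>" : nil }] # step 4: XML of the originating draft
]

p find_xml(2616, sources)  # prints [:draft_xml, "<rfc/>"]
p find_xml(9999, sources)  # prints nil
```

Returning the source name alongside the XML makes it easy to log which step succeeded, which matters here because most RFCs fall through every step.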
5 models.rb
@@ -1,6 +1,8 @@
require_relative 'rfc'
require 'active_support/core_ext/date_time/conversions'
+# The main model which represents an RFC. It delegates persistence to RfcEntry
+# and XML fetching to RfcFetcher.
class RfcDocument
extend Forwardable
@@ -92,6 +94,7 @@ def fetch_and_render href_resolver
require 'dm-timestamps'
require_relative 'searchable'
+# A lightweight database model that stores metadata and rendered HTML for an RFC.
class RfcEntry
include DataMapper::Resource
extend Searchable
@@ -141,6 +144,8 @@ def keywords=(value)
require 'net/http'
require 'nokogiri'
+# Responsible for discovering and fetching the XML source file for a
+# specific publication.
class RfcFetcher
XML_URL = 'http://xml.resource.org/public/rfc/xml/%s.xml'
DRAFTS_URL = 'http://www.ietf.org/id/'
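`RfcDocument` extends `Forwardable` so that it can hand persistence off to `RfcEntry` and XML discovery off to `RfcFetcher`, as its new doc comment says. Which methods it actually forwards is not visible in this diff, so the sketch below uses hypothetical stand-in classes (`DemoEntry`, `DemoFetcher`, `DemoDocument`) to show the pattern with Ruby's standard `Forwardable`:

```ruby
require 'forwardable'

# Hypothetical stand-ins; the real RfcEntry is a DataMapper model.
DemoEntry = Struct.new(:title, :html)

class DemoFetcher
  def fetch_xml
    "<rfc/>" # a real fetcher would perform the HTTP lookups
  end
end

class DemoDocument
  extend Forwardable
  # Persisted attributes are read from and written to the entry object.
  def_delegators :@entry, :title, :html, :html=
  # XML discovery is handed off to the fetcher, exposed here as #xml.
  def_delegator :@fetcher, :fetch_xml, :xml

  def initialize(entry, fetcher = DemoFetcher.new)
    @entry = entry
    @fetcher = fetcher
  end
end

doc = DemoDocument.new(DemoEntry.new("HTTP/1.1"))
doc.html = "<p>rendered</p>"
puts doc.title  # prints HTTP/1.1
puts doc.xml    # prints <rfc/>
```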
53 rfc.rb
@@ -7,15 +7,22 @@
require 'active_support/core_ext/array/grouping'
require 'erubis'
+# This module and the accompanying templates in "templates/" implement parsing of
+# RFCs in XML format (per RFC 2629) and rendering them to modern HTML.
+#
+# The XML elements are described in:
+# http://xml.resource.org/authoring/draft-mrose-writing-rfcs.html
module RFC
+ # The latest timestamp of when any of the dependent source files have changed.
def self.last_modified
@last_modified ||= begin
files = [__FILE__] + Dir['templates/**/*']
files.map {|f| File.mtime f }.max
end
end
- class NodeWrapper < DelegateClass(Nokogiri::XML::Node)
+ # A base class for decorating XML nodes as different data objects.
+ class NodeWrapper < DelegateClass(Nokogiri::XML::Node)
extend ActiveSupport::Memoizable
# a reference to the main Document object
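`RFC.last_modified` returns the newest modification time across `rfc.rb` and everything under `templates/`. A standalone sketch of that computation, using a temporary directory instead of the real templates (the `last_modified(files)` signature is hypothetical):

```ruby
require 'tmpdir'

# Newest mtime among the given paths, as in RFC.last_modified.
def last_modified(files)
  files.map { |f| File.mtime(f) }.max
end

Dir.mktmpdir do |dir|
  old_file = File.join(dir, "a.erb")
  new_file = File.join(dir, "b.erb")
  File.write(old_file, "a")
  File.write(new_file, "b")
  # Backdate one file so the newest one is unambiguous.
  File.utime(Time.at(0), Time.at(0), old_file)
  latest = last_modified(Dir[File.join(dir, "*")])
  puts latest == File.mtime(new_file)  # prints true
end
```

Because the original memoizes the value with `||=`, template edits made while the process is running are presumably not reflected until it restarts.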
@@ -38,6 +45,10 @@ def wrap(node, klass, *args)
element
end
+ # "iref" element is for adding terms to the index. There's no need for
+ # indexes in digital media, so this is ignored.
+ #
+ # "cref" is for internal comments in drafts.
IGNORED_ELEMENTS = %w[iref cref]
def element_names
@@ -60,6 +71,8 @@ def text_at(path)
node = at(path) and node.text
end
+ # Change the internal node that this object delegates to by performing a
+ # query. If a block is given, changes it only for the duration of the block.
def scope(path)
old_node = __getobj__
node = at(path)
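Only the first lines of `scope` appear in this diff. Its comment says it swaps the delegated node permanently, or only for the duration of a block if one is given. The sketch below illustrates that behaviour under stated assumptions: it builds on Ruby's `SimpleDelegator`, uses nested Hashes in place of Nokogiri nodes, and uses `Hash#fetch` in place of `at(path)`. The `Scoped` class is hypothetical.

```ruby
require 'delegate'

# Hypothetical sketch of the scope pattern over nested Hashes.
class Scoped < SimpleDelegator
  def scope(key)
    old_obj = __getobj__
    __setobj__(old_obj.fetch(key))
    return self unless block_given?
    begin
      yield self
    ensure
      # Restore the previous target even if the block raises.
      __setobj__(old_obj)
    end
  end
end

doc = Scoped.new({ front: { title: "Example" }, middle: {} })
doc.scope(:front) { |d| puts d[:title] }  # prints Example
puts doc.key?(:middle)                    # prints true (scope was restored)
doc.scope(:front)                         # permanent change without a block
puts doc[:title]                          # prints Example
```

The `ensure` clause is the important design choice: without it, an exception inside the block would leave the wrapper pointing at the wrong node.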
@@ -82,7 +95,21 @@ def self.define_predicate(name)
define_method(:"#{name}?") { !self.send(name).blank? }
end
- # TODO: remove memoization
+ # For each public method added, set it to be memoized and define a
+ # same-named predicate method that tests if the original method's result is
+ # not blank.
+ #
+ # Examples
+ #
+ # # method is defined
+ # def role
+ # self['role']
+ # end
+ #
+ # # a predicate method is automatically available
+ # obj.role?
+ #
+ # TODO: remove implicit memoization
def self.method_added(name)
if public_method_defined?(name) and not method_defined?(:"_unmemoized_#{name}") and
name !~ /_unmemoized_|_memoizable$|^freeze$|[?!=]$/ and instance_method(name).arity.zero?
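The `method_added` hook documented above gives every public, zero-arity method a same-named predicate. A simplified, self-contained sketch of just the predicate half, assuming a hand-rolled `blank?` check in place of ActiveSupport's and leaving memoization out. `AutoPredicates` and `Author` are hypothetical names:

```ruby
# Hypothetical sketch: auto-define `name?` for each public zero-arity method.
module AutoPredicates
  # Rough stand-in for ActiveSupport's Object#blank?.
  def self.blank?(value)
    case value
    when nil, false then true
    when String then value.strip.empty?
    else value.respond_to?(:empty?) && value.empty?
    end
  end

  def method_added(name)
    super
    # Skip predicates, bang and setter methods (this also stops recursion,
    # since define_method below triggers method_added(:"name?")).
    return if name.to_s.end_with?("?", "!", "=")
    return unless public_method_defined?(name) && instance_method(name).arity.zero?
    predicate = :"#{name}?"
    return if method_defined?(predicate)
    define_method(predicate) { !AutoPredicates.blank?(public_send(name)) }
  end
end

class Author
  extend AutoPredicates

  def initialize(attrs)
    @attrs = attrs
  end

  def role
    @attrs["role"]
  end
end

puts Author.new({ "role" => "editor" }).role?  # prints true
puts Author.new({}).role?                      # prints false
```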
@@ -146,6 +173,7 @@ def elements
when 'texttable' then all << wrap(node, Table)
when 't'
text = wrap(node, Text)
+ # detect if this block of text actually belongs to a definition list
in_definition_list = all.last.is_a? DefinitionList
if text.definition? in_definition_list
all << DefinitionList.new(document) unless in_definition_list
@@ -210,6 +238,7 @@ def elements
end
end
+ # detect if this element is just a list container
def list?
element_names == %w[list] and text_children.all?(&:blank?)
end
@@ -257,6 +286,7 @@ def style
type
end
+ # detect when a list is used for indicating a note block
def note?
first_element_child.text =~ /\A\s*Note:\s/
end
@@ -320,7 +350,8 @@ def columns
end
def rows
- search('./c').map {|c| wrap(c, Text) }.in_groups_of(columns.size, false)
+ cells = search('./c').map {|c| wrap(c, Text) }
+ cells.in_groups_of(columns.size, false)
end
def preamble
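The refactored `rows` method splits the flat list of `<c>` cells into rows of `columns.size` cells. Passing `false` to ActiveSupport's `in_groups_of` turns off padding, which makes it equivalent to plain Ruby's `each_slice`. The sample cells below are hypothetical:

```ruby
# Flat list of table cells, as read from <c> elements (hypothetical data).
cells = %w[200 OK 404 Not\ Found 500]
columns = 2

# Same result as cells.in_groups_of(columns, false): no nil padding.
rows = cells.each_slice(columns).to_a
p rows  # prints [["200", "OK"], ["404", "Not Found"], ["500"]]
```

The last row stays short when the cell count is not a multiple of the column count. With the default fill value, `in_groups_of` would pad it with `nil` instead.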
@@ -362,9 +393,11 @@ def series
end
end
+ # Represents the parsed RFC document as a whole.
class Document < NodeWrapper
attr_accessor :href_resolver
+ # Initialize the document by parsing a string or IO stream as XML
def initialize(from)
super Nokogiri::XML(from)
scope '/rfc'
@@ -444,13 +477,17 @@ def keywords
all('./front/keyword/text()').map(&:text)
end
- # TODO: add memoization
+ # TODO: add memoization when implicit memoization is gone
def anchor_map
all('.//*[@anchor]').each_with_object({}) do |node, map|
map[node['anchor']] = node
end
end
+ # Look up what an anchor string points to in order to figure out the string
+ # it should display at the point of reference.
+ #
+ # TODO: improve this mess
def lookup_anchor(name)
if node = anchor_map[name]
if 'reference' == node.node_name
@@ -465,6 +502,9 @@ def lookup_anchor(name)
end
end
+ # Resolve the target string as a URL or internal link.
+ #
+ # TODO: improve this mess and ensure that internal links are unique
def href_for(target)
if (target =~ /^[\w-]+:/) == 0
target
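`href_for` returns the target unchanged when it starts with a URL scheme such as `http:` or `mailto:`. The `else` branch is cut off in this diff, so the `"#anchor"` fallback below is an assumption. The sketch also replaces `(target =~ /^.../) == 0` with `match?` and `\A`, which anchors to the start of the string directly rather than checking that a line-start match happened at offset 0:

```ruby
# Hypothetical sketch of href_for; the internal-link format is assumed.
def href_for(target)
  if target.match?(/\A[\w-]+:/)
    target            # already a URL with a scheme
  else
    "##{target}"      # assumed: treat anything else as an in-page anchor
  end
end

puts href_for("http://www.ietf.org/id/")  # prints http://www.ietf.org/id/
puts href_for("section-3")                # prints #section-3
```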
@@ -479,6 +519,7 @@ def references
end
end
+ # Template helpers for HTML rendering
module Helpers
def section_title(section)
"<h#{section.level}>#{h section.title}</h#{section.level}>"
@@ -530,8 +571,11 @@ def nbsp(str)
end
end
+ # Template rendering helpers.
module TemplateHelpers
extend self
+ # Templates are rendered using the provided object as execution context. The
+ # object is additionally decorated with Helpers and TemplateHelpers modules.
def render(obj, template = obj.template_name)
file = "templates/#{template}.erb"
eruby = Erubis::Eruby.new File.read(file), filename: File.basename(file)
@@ -543,6 +587,7 @@ def render(obj, template = obj.template_name)
end
end
+# If this script was called directly, render given XML and output HTML on STDOUT.
if __FILE__ == $0
rfc = RFC::Document.new ARGF
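`TemplateHelpers#render` evaluates a template with the given object as its execution context, after decorating that object with helper modules. A minimal sketch of the same idea with some substitutions: it uses Ruby's bundled ERB in place of Erubis, a hypothetical `DemoHelpers` module in place of `Helpers` and `TemplateHelpers`, and an inline template string instead of reading `templates/#{template}.erb`:

```ruby
require 'erb'

# Hypothetical helper module standing in for RFC::Helpers.
module DemoHelpers
  def h(str)
    ERB::Util.html_escape(str)
  end
end

Section = Struct.new(:title, :level)

# Evaluate the template with `obj` as self. `extend` adds the helpers to
# this one object only; other Section instances are unaffected.
def render(obj, template)
  obj.extend(DemoHelpers)
  ERB.new(template).result(obj.instance_eval { binding })
end

section = Section.new("Terms & Conventions", 2)
puts render(section, "<h<%= level %>><%= h title %></h<%= level %>>")
# prints <h2>Terms &amp; Conventions</h2>
```

Using the object itself as the binding lets templates call `title` and `level` directly, just like methods on the wrapped RFC nodes.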
