6203524 Mislav Marohnić add docs
authored
Pretty RFC
==========

The goal of this project is to collect and reformat official RFC documents and
popular drafts.

RFCs, as published officially, are in an unsightly and impractical paged format.
What's worse, the official format of most RFCs is plain text, even though they
are authored in richer formats such as XML.

Running the app
---------------

Dependencies:

* git
* Ruby 1.9
* rake
* Bundler
* libxml2
* PostgreSQL

By default, the app will try to connect to the database named "rfc" on localhost
without a username or password. This can be changed with the `DATABASE_URL`
environment variable. If the database doesn't exist, the [bootstrap
script][bootstrap] will try to create it.

~~~ sh
# initialize dependencies and database
$ script/bootstrap

# start the server
$ bundle exec rackup

# now visit http://localhost:9292/
~~~

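The `DATABASE_URL` fallback described above could be resolved along these lines. This is only a sketch, not the app's actual code: the `postgres://localhost/rfc` URL form and the `database_settings` helper are assumptions made for illustration.

~~~ ruby
require 'uri'

# Assumed URL form for the default: database "rfc" on localhost, no credentials.
DEFAULT_DATABASE_URL = 'postgres://localhost/rfc'

# Hypothetical helper: reads DATABASE_URL from the given environment (ENV by
# default) and splits it into connection settings.
def database_settings(env = ENV)
  uri = URI.parse(env['DATABASE_URL'] || DEFAULT_DATABASE_URL)
  {
    :host     => uri.host,
    :port     => uri.port,                   # nil when the URL has no port
    :database => uri.path.sub(%r{\A/}, ''),  # "/rfc" becomes "rfc"
    :user     => uri.user,                   # nil when the URL has no credentials
    :password => uri.password
  }
end

# Usage with an explicit override instead of the real environment:
custom = database_settings({
  'DATABASE_URL' => 'postgres://alice:secret@db.example.com:5433/rfc_test'
})
~~~
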
The RFC index
-------------

The [index of all RFCs][index] is pulled from FTP:
ftp://ftp.rfc-editor.org/in-notes/rfc-index.xml

Then the metadata for each RFC entry is imported into the database. This is done
by the ["import_index" rake task][rakefile] as part of the bootstrap process.

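To give an idea of what importing the index involves, here is a rough sketch of reading entries out of an index document. It is not the code of the rake task: it uses REXML from the Ruby standard distribution rather than libxml2, the `parse_index` helper is hypothetical, and the `rfc-entry`, `doc-id` and `title` element names are assumed from the published index format. The inline sample stands in for the downloaded file.

~~~ ruby
require 'rexml/document'

# Tiny stand-in for rfc-index.xml so the sketch runs offline.
SAMPLE_INDEX = <<XML
<rfc-index>
  <rfc-entry>
    <doc-id>RFC2616</doc-id>
    <title>Hypertext Transfer Protocol -- HTTP/1.1</title>
  </rfc-entry>
  <rfc-entry>
    <doc-id>RFC5322</doc-id>
    <title>Internet Message Format</title>
  </rfc-entry>
</rfc-index>
XML

# Hypothetical helper: returns one hash of metadata per <rfc-entry> element.
def parse_index(xml)
  doc = REXML::Document.new(xml)
  entries = []
  doc.root.each_element do |entry|
    next unless entry.name == 'rfc-entry'
    fields = {}
    entry.each_element { |child| fields[child.name] = child.text }
    entries << { :doc_id => fields['doc-id'], :title => fields['title'] }
  end
  entries
end

# Each hash would become one database row in a real import.
entries = parse_index(SAMPLE_INDEX)
~~~
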
The search index
----------------

Searching is done with [PostgreSQL full text searching][textsearch]. The
necessary indexes, stored procedures and triggers for this are in the
[Searchable][] module.

The ordering of search results is not perfect, but it is improved by bringing in a
[popularity score from faqs.org][pop]. This is done by the ["import_popular" rake
task][rakefile] as part of the bootstrap process.

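For readers unfamiliar with PostgreSQL full text search, a query of this kind might look roughly like the one built below. This is an illustration, not the query the app runs: the `rfcs` table, the `title` and `popularity` columns, the `search_query` helper and the use of popularity as a tie-breaker after `ts_rank` are all assumptions. Only `to_tsvector`, `plainto_tsquery`, `ts_rank` and the `@@` operator are real PostgreSQL features.

~~~ ruby
# Hypothetical helper: builds a parameterized full text search query.
# Table and column names are made up; the real schema is set up by Searchable.
def search_query(terms, limit = 20)
  sql = <<-SQL
    SELECT id, title
    FROM rfcs
    WHERE to_tsvector('english', title) @@ plainto_tsquery('english', $1)
    ORDER BY ts_rank(to_tsvector('english', title), plainto_tsquery('english', $1)) DESC,
             popularity DESC
    LIMIT $2
  SQL
  [sql, [terms, limit]]
end

# The pair can be handed to a driver that supports $1-style placeholders,
# for example the pg gem's exec_params.
sql, params = search_query('http caching')
~~~
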
Fetching and rendering RFCs
---------------------------

When an RFC is first requested and it has never been processed, the app tries to
look up its source XML document and render it to HTML. The XML lookup goes as
follows:

1. The fetcher tries to find the XML in http://xml.resource.org/public/rfc/xml/
   where some RFCs in the 2000–53xx range can be found.

2. Failing that, it fetches the metadata for the RFC from
   http://datatracker.ietf.org/doc/

3. If there is a link to the XML from the datatracker, use that. There probably
   won't be a link, though.

4. When there is no XML link, the fetcher looks up the draft name for the RFC
   and checks if it can at least find the XML for its draft at
   http://www.ietf.org/id/

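The lookup steps above form a fallback chain: try each source in order and stop at the first one that yields XML. A minimal sketch of that pattern follows. The `find_xml` helper, the `rfcNNNN.xml` file naming, the RFC number and the draft name are all made up for illustration, and a hash stands in for real HTTP requests so the example runs offline.

~~~ ruby
# Hypothetical helper: tries each lookup step in order and returns the name
# and result of the first step that yields something, or nil if none does.
def find_xml(steps)
  steps.each do |name, step|
    xml = step.call
    return [name, xml] unless xml.nil?
  end
  nil
end

# Stand-in for the network. Only the draft URL "exists" here; every other
# URL behaves like a missing document.
FAKE_WEB = {
  'http://www.ietf.org/id/draft-example-00.xml' => '<rfc/>'
}
fetch = lambda { |url| FAKE_WEB[url] }

number = 9999  # made-up RFC number
steps = [
  # Step 1: the XML archive (file naming is an assumption).
  [:xml_archive, lambda { fetch.call("http://xml.resource.org/public/rfc/xml/rfc#{number}.xml") }],
  # Steps 2 and 3: datatracker metadata, which usually has no XML link.
  [:datatracker_link, lambda { nil }],
  # Step 4: the XML of the draft the RFC was published from.
  [:draft, lambda { fetch.call('http://www.ietf.org/id/draft-example-00.xml') }]
]

source, xml = find_xml(steps)
~~~
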
**Note:** This process only discovers XML sources for a small subset of RFCs.
This is the biggest problem I have right now. The XML and nroff files in which
RFCs were authored are usually not published, but are archived by rfc-editor.org
and available on request by email.

I'm investigating whether there is a way to retrieve these source files in bulk.

If I am unable to obtain them, I will have to reformat RFCs by parsing the current
publications instead of the source XML. This might be a lot of work.

When obtained, the XML is parsed and rendered to HTML by the [RFC][] module.
The templates used for generating HTML are in [templates/][templates].

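The parse-then-render step can be pictured with the small sketch below. It is not the RFC module: ERB and REXML are stand-ins from the Ruby standard distribution, the inline template is hypothetical (the real ones live in `templates/`), and the `<rfc>`, `<front>`, `<title>` and `<section title="...">` elements are assumed from the xml2rfc vocabulary.

~~~ ruby
require 'rexml/document'
require 'erb'

# Hypothetical stand-in for a template file.
PAGE_TEMPLATE = ERB.new(<<HTML)
<h1><%= ERB::Util.html_escape(title) %></h1>
<ul>
<% sections.each do |name| %>
  <li><%= ERB::Util.html_escape(name) %></li>
<% end %>
</ul>
HTML

# Hypothetical helper: extracts the title and top-level section names from
# the source XML and feeds them to the template.
def render_rfc(xml)
  doc = REXML::Document.new(xml)
  title = REXML::XPath.first(doc, '/rfc/front/title').text
  sections = REXML::XPath.match(doc, '/rfc/middle/section').map do |section|
    section.attributes['title']
  end
  PAGE_TEMPLATE.result(binding)
end

# Made-up source document for the example.
SAMPLE_RFC = <<XML
<rfc>
  <front><title>Sample Document</title></front>
  <middle>
    <section title="Introduction"></section>
    <section title="Security Considerations"></section>
  </middle>
</rfc>
XML

html = render_rfc(SAMPLE_RFC)
~~~
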

[index]: http://www.rfc-editor.org/getbulk.html
[rakefile]: https://github.com/mislav/rfc/blob/master/Rakefile
[searchable]: https://github.com/mislav/rfc/blob/master/searchable.rb
[rfc]: https://github.com/mislav/rfc/blob/master/rfc.rb
[bootstrap]: https://github.com/mislav/rfc/blob/master/script/bootstrap
[templates]: https://github.com/mislav/rfc/tree/master/templates
[textsearch]: http://www.postgresql.org/docs/9.1/static/textsearch-intro.html
[pop]: http://www.faqs.org/rfc-pop1.html