#!/usr/bin/perl -w
#
# Version history:
#
# sep-offprint 1.0 - John MacFarlane - July 19, 2007
#   + include supplements in the order they are linked to
#   + always put notes at the end
#   + preprocess supplements and notes, in addition to index.html
# sep-offprint 0.9 - John MacFarlane - March 8, 2007
#   + fixed regex for stripping off SEP header (thanks to George Galfalvi)
# sep-offprint 0.8 - John MacFarlane - February 22, 2007
#   + strip off "(Stanford Encyclopedia of Philosophy)" from
#     HTML title (thanks to Uri Nodelman)
# sep-offprint 0.7 - John MacFarlane - January 23, 2007
#   + include supplements, if present (thanks to Dan Robins)
#   + removed unnecessary call to lwp-rget (Dan Robins)
#   + added --linkcolor option (JM and Dan Robins)
#   + added error checking: error exit if index.html not found
#   + fixed '--version' and adjusted '--help' output
# sep-offprint 0.6 - John MacFarlane - August 30, 2006
# sep-offprint 0.5 - John MacFarlane - August 25, 2006
# sep-offprint 0.4 - John MacFarlane - August 22, 2006
# sep-offprint 0.3 - John MacFarlane - May 25, 2005
#
# Synopsis:
#
# produces a PDF or postscript "offprint" of a Stanford
# Encyclopedia of Philosophy (SEP) article
#
# Argument is an entry name from SEP, as it appears in the URL.
# For example, to get the article on classical logic, which is at
# http://plato.stanford.edu/entries/logic-classical/, just type
#
#     perl sep-offprint logic-classical
#
# and it will create logic-classical.pdf.
#
# There are many command-line options. For a list, type
#
#     perl sep-offprint --help
#
# The programs html2ps and ps2pdf must be in the user's path:
#
# html2ps can be found at http://user.it.uu.se/~jan/html2ps.html.
# Download the tarball or zip file and run the "install" script.
#
# ps2pdf is part of Ghostscript -- many users will have it
# already: http://www.cs.wisc.edu/~ghost/doc/AFPL/get851.htm
#
# In addition, the LWP package for Perl must be installed.
#
# For more information and updates, see
# http://philosophy.berkeley.edu/macfarlane/sep-offprint.html

my $version_number = '1.0';

use Getopt::Long;
use File::Temp qw/ tempdir /;
use File::Copy;
use Cwd;

# printhelp - prints a usage message to STDERR and exits (via die)

sub printhelp {
    die
"Produces a PDF offprint from a Stanford Encyclopedia of Philosophy article.
(http://plato.stanford.edu/)

Usage:    sep-offprint [options] <entry name>

Examples: sep-offprint russell
          sep-offprint --1up --ps --paper a4 frege

Options (* indicates a default):

  --1up                 print one page per sheet, portrait orientation
  --2up                 print two pages per sheet, landscape orientation*
  --ps                  produce postscript (PS) output
  --pdf                 produce PDF output*
  --font <font>         use <font> (Times*, Helvetica, Palatino, Courier)
  --size <size>         use <size> (10pt, 12pt, 14pt*, 16pt)
  --align <align>       use <align> (left, justify*)
  --paper <papersize>   specify <papersize> (letter*, legal, a4)
  --linkcolor <color>   specify color of hyperlinks (black*, gray, blue, ...)
  --localpath <path>    look for entry in a subdirectory of <path>
  --help                this message
  --version             prints version number\n";
}

# slurp - slurps contents of a file and returns as a string;
# takes filename as argument

sub slurp {
    my $file = shift;
    local $/;    # undefine the record separator so one read returns the whole file
    open(my $fh, '<', $file) or die "Couldn't open $file to read: $!";
    my $contents = <$fh>;
    close($fh);
    return $contents;
}

# preprocess_html - preprocess HTML file, stripping out navigation bars,
# etc., and replacing entity references with appropriate characters or images.
# takes filename as argument; falls back to $_ when called without one

sub preprocess_html {
    my $file = shift;
    $file = $_ unless defined $file;
    my $contents = slurp $file;

    # get rid of header stuff
    $contents =~ s/<body>.*?<!--DO NOT MODIFY THIS LINE AND ABOVE-->/<body><div id="content"><div id="aueditable">/gs;

    # get rid of "(Stanford Encyclopedia of Philosophy)" in title:
    $contents =~ s/<title>(.*)\ \(Stanford Encyclopedia of Philosophy\)/<title>$1/;

    # make publication date into regular paragraph
    $contents =~ s/<br \/><span class="xsmall">(.*)<\/span><\/h1>/<\/h1><p>$1<\/p>/g;

    # center copyright notice
    $contents =~ s/<div id="foot">(.*?)<\/div>/<center>$1<\/center>/gs;

    # replace numeric character references
    my %replacements = (
        "&\#133;" => "&hellip;",
        "&\#145;" => "&lsquo;",
        "&\#146;" => "&rsquo;",
        "&\#147;" => "&ldquo;",
        "&\#148;" => "&rdquo;",
        "&\#149;" => "&bull;",
        "&\#150;" => "&minus;",
        "&\#257;" => "a",
        "&\#261;" => "a",
        "&\#263;" => "c",
        "&\#269;" => "c",
        "&\#281;" => "e",
        "&\#299;" => "i",
        "&\#321;" => "L",
        "&\#322;" => "l",
        "&\#324;" => "n",
        "&\#333;" => "o",
        "&\#345;" => "r",
        "&\#346;" => "S",
        "&\#347;" => "s",
        "&\#351;" => "s",
        "&\#363;" => "u",
        "&\#365;" => "u",
        "&\#369;" => "u",
        "&\#378;" => "z",
        "&\#380;" => "z",
        "&\#381;" => "Z",
        "&\#599;" => "u",
        "&\#768;" => "",
        "&\#769;" => "",
        "&\#770;" => "",
        "&\#771;" => "",
        "&\#772;" => "",
        "&\#773;" => "",
        "&\#775;" => "",
        "&\#803;" => "",
        "&\#8209;" => "-",
        "&\#8600;" => "<img alt=\"southeast-arrow\" src=\"http:\/\/plato.stanford.edu\/symbols\/searrow.gif\">",
        "<sup>&\#9484;<\/sup>" => "<img alt=\"left-corner-quote\" src=\"http:\/\/plato.stanford.edu\/symbols\/l-corner-quote.gif\">",
        "<sup>&\#9488;<\/sup>" => "<img alt=\"right-corner-quote\" src=\"http:\/\/plato.stanford.edu\/symbols\/r-corner-quote.gif\">",
        "&\#8463;" => "<img alt=\"hbar\" src=\"http:\/\/plato.stanford.edu\/symbols\/hbar.gif\">",
        "&\#9633;" => "<img alt=\"Box\" src=\"http:\/\/plato.stanford.edu\/symbols\/Box.gif\">"
    );
    while ( my ($ref, $rep) = each(%replacements) ) {
        $contents =~ s/$ref/$rep/g;
    }

    # write back to file
    open(my $out, '>', $file) or die "Couldn't open $file to write: $!";
    print $out $contents;
    close($out);
}

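# Illustration only: the replacement loop in preprocess_html as a standalone
# function. replace_entities is a hypothetical name that the script does not
# call. Unlike the loop above, this sketch wraps each key in \Q...\E so that
# it is always matched literally; that quoting is an addition, not part of
# the original.
#
#   replace_entities('a&#8209;b', { '&#8209;' => '-' })   returns 'a-b'

sub replace_entities {
    my ($text, $map) = @_;
    foreach my $ref (keys %$map) {
        my $rep = $map->{$ref};
        $text =~ s/\Q$ref\E/$rep/g;    # literal match of the reference
    }
    return $text;
}
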
#
# parse command-line options
#

GetOptions( '1up|1'       => \$oneup,
            '2up|2'       => \$twoup,
            'ps'          => \$ps,
            'pdf'         => \$pdf,
            'font=s'      => \$fontfamily,
            'size=s'      => \$fontsize,
            'align=s'     => \$textalign,
            'paper=s'     => \$papersize,
            'linkcolor=s' => \$linkcolor,
            'localpath=s' => \$localpath,
            'help|h'      => \$help,
            'version|v'   => \$version);

if ($version) {die "sep-offprint $version_number\n";};

if ($#ARGV < 0) {&printhelp;};
$entryname = $ARGV[0];

# derive entry name from argument:
# convert to lowercase and replace spaces with hyphens
$entryname =~ tr/A-Z/a-z/;
$entryname =~ tr/ /-/;

# remove SEP url if specified
$entryname =~ s{http://plato\.stanford\.edu/entries/}{};

# remove everything from the first slash on (e.g. /index.html) if specified
$entryname =~ s{/.*}{};

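# Illustration only: the entry-name normalization above, gathered into a
# helper. normalize_entry_name is a hypothetical name; nothing in this
# script calls it, and it is a sketch of the same steps rather than a
# replacement for them.
#
#   normalize_entry_name('Logic-Classical')
#       returns 'logic-classical'
#   normalize_entry_name('http://plato.stanford.edu/entries/frege/index.html')
#       returns 'frege'

sub normalize_entry_name {
    my $name = shift;
    $name =~ tr/A-Z/a-z/;                                  # lowercase
    $name =~ tr/ /-/;                                      # spaces to hyphens
    $name =~ s{http://plato\.stanford\.edu/entries/}{};    # drop SEP URL prefix
    $name =~ s{/.*}{};                                     # drop trailing path
    return $name;
}
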
if ($help) {&printhelp;};
if (not ($pdf or $ps)) {$pdf=1};
if ($oneup) {$twoup = 0} else {$twoup = 1};
if (not $fontsize) {$fontsize = "14pt"};
if (not $fontfamily) {$fontfamily = "Times"};
if (not $textalign) {$textalign = "justify"};
if (not $papersize) {$papersize = "letter"};
if (not $linkcolor) {$linkcolor = "black"};

# create temporary directory
$temp = tempdir ( CLEANUP => 1 );
$current = getcwd;  # working directory from which sep-offprint is run

# get all the source files and put them in temp directory,
# then change to temp directory

if ($localpath) {
    $footer = "$localpath/$entryname/";
    while (<$localpath/$entryname/*.*>) {
        copy($_,$temp) or warn "Could not copy $_ to $temp: $!\n";
    };
    chdir $temp or die "Could not change to directory $temp: $!\n";
    (-e "index.html") or die "Could not find index.html in $localpath/$entryname/\n";
}
else {
    $footer = "http://plato.stanford.edu/entries/$entryname/";
    chdir $temp or die "Could not change to directory $temp: $!\n";
    # download all the HTML files
    system("lwp-rget --quiet http://plato.stanford.edu/entries/$entryname/");
    (-e "index.html") or die "Could not download files from http://plato.stanford.edu/entries/$entryname/\nAre you sure you have the right entry name?\n";
};

# create blank html file to work around html2ps bug.
# without this blank file after notes.html, html2ps will cut off
# the last page of an entry if it occurs in the left column in 2up mode.

$blank = "blankpage";

open FILE, ">$blank" or die "unable to open $blank: $!";

print FILE <<EOF;
<html>
<head>
<title>&nbsp;</title>
</head>
<body>
<p>&nbsp;</p>
</body>
</html>
EOF

close FILE;

# create a configuration file with appropriate footers

$html2psrc = "html2psrc";

open FILE, ">$html2psrc" or die "unable to open $html2psrc: $!";

print FILE <<EOF;
BODY {
  font-size: $fontsize;
  font-family: $fontfamily;
  text-align: $textalign;
}
A:link {
  color: $linkcolor;
}
\@page {
  margin-left: 2.5cm;
  margin-right: 2.5cm;
  margin-top: 2.5cm;
  margin-bottom: 2.5cm;
}
\@html2ps {
  option {
    twoup: $twoup;
    landscape: $twoup;
    number: 0;
  }
  paper { type: $papersize }
  header {
    right: "STANFORD ENCYCLOPEDIA OF PHILOSOPHY";
    left: \$T;
  }
  footer {
    left: \$N;
    right: $footer;
  }
}
EOF

close FILE;

# name of temporary file to hold postscript output of html2ps
$pstemp = "pstemp";

# preprocess all the html files in the working (i.e., temp) directory
preprocess_html foreach <*.html>;

#
# determine the order in which the HTML pages should be processed:
#

# go through index.html and make a list of the .html files called
# in order

my $indexContents = slurp "index.html";
my @indexWords = split(/"|#/, $indexContents);  # ...a href=" blah.html #yada"...
my @localHtmlRefs = grep(/^\w+\.html$/, @indexWords);

# this is the perl version of uniq -- remove duplicates,
# preserving the order of the original
undef %seen;
my @uniqueHtmlRefs = grep(!$seen{$_}++, @localHtmlRefs);

# set $notes to "notes.html" if there are notes
my $notes = "";
if (grep { $_ eq "notes.html" } @uniqueHtmlRefs) {
    $notes = "notes.html"
}

# make a space-separated list of the HTML files to process, in order,
# discarding index.html and notes.html (matched as whole names, so that
# a file such as footnotes.html is left alone)
my $orderedHtmlFiles = join(' ', grep { !/^(index|notes)\.html$/ } @uniqueHtmlRefs);

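# Illustration only: the link-ordering steps above as a single function.
# local_html_refs is a hypothetical name, not something the script calls.
# Given HTML text, it returns (in list context) the local .html file names
# in the order they are first referenced, without duplicates.
#
#   local_html_refs('<a href="supplement.html">S</a> <a href="notes.html#1">1</a> <a href="supplement.html#x">S</a>')
#       returns ('supplement.html', 'notes.html')

sub local_html_refs {
    my $html = shift;
    my %seen_ref;
    return grep { !$seen_ref{$_}++ }     # keep the first occurrence only
           grep { /^\w+\.html$/ }        # bare local file names
           split /"|#/, $html;           # break at quotes and fragment marks
}
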
# call html2ps to create the postscript version of the entry
system("html2ps -D -U -f $html2psrc -o $pstemp index.html " . $orderedHtmlFiles . " $notes $blank");

# create pdf if requested
# (system returns 0 on success, so the message is printed only if ps2pdf succeeds)
if ($pdf) {system("ps2pdf -sPAPERSIZE=$papersize $pstemp $current/$entryname.pdf") == 0 and print STDERR "Created $entryname.pdf\n";};

# copy ps file if requested
if ($ps) {copy($pstemp, "$current/$entryname.ps") && print STDERR "Created $entryname.ps\n";};

# note: temporary directory will be deleted automatically on exit