This read-me.pod file has a double purpose: it describes how the layout system functions, and it serves as an input file for a demo of the layout system.
In other words, either you read the POD file as is to learn what will happen, or you read the generated HTML or PDF file to learn what has just happened.
I developed this layout system after coming across a board game whose rules are available on the net. I thought it would be nice if this game had a larger audience, including French-speaking people. In addition, the rules were rather short, so I decided to translate them into French.
Like many Perl programmers, I like using the POD format, because it is simple and yet provides the most basic features: chapters, lists, links, bold and italic characters. The only basic feature missing is tables. Alas, there were several tables in the game rules. Of course, when using POD, you can always show a table as preformatted text, like this:
.-----.-----.-----.
|  8  |  1  |  6  |
.-----.-----.-----.
|  3  |  5  |  7  |
.-----.-----.-----.
|  4  |  9  |  2  |
.-----.-----.-----.
The result is acceptable when using perldoc to display the POD or when reading it in your favourite text editor, but the same thing in HTML within <pre> tags or in LATEX within a verbatim environment is quite frustrating. Having the following is much better:
+-----+-----+-----+
|  8  |  1  |  6  |
+-----+-----+-----+
|  3  |  5  |  7  |
+-----+-----+-----+
|  4  |  9  |  2  |
+-----+-----+-----+
(Of course, if you read this using perldoc or your favourite text editor, or if you browse it on the GitHub website, you will not see much difference.)
There is the Pseudo::POD module, which generates pretty tables in HTML, LATEX and other advanced display formats, but when you edit your source file under Emacs / vi / other or when you display the file with perldoc, the result is disappointing.
Another solution consists in generating the HTML / LATEX file, and then editing it with Emacs to replace the ASCII-art table with a proper table. See the table-recognize and table-generate-source functions, as well as the other table-xxx functions. The downside is that this is incompatible with automatic processing and makefiles.
This is why I created these scripts: to lay out nice tables from a makefile procedure, without requiring any intervention once make is running.
You just need to download the repository and store its files in a single directory. Then, run make for a demo of what the programmes do.
There are a few prerequisites for the programmes.
A complete installation of Perl. If you have installed Perl with your distro's .rpm's or .deb's, make sure that perldoc, pod2html and pod2latex have been installed, too. Sometimes, they are not in the main Perl package.
A rather recent version of Emacs. Version 23 is fine, I think that version 22 is fine, too, but version 21 is too old.
An HTML browser, compatible with UTF-8 and tables. But nowadays, does anybody still lack such a browser?
A LATEX installation, including the commands latex and pdflatex. You will also need the Palatino (ppl), Helvetica (phv) and Courier (pcr) fonts, and the babel, fontenc, textcomp, graphicx, hyperref and breakurl packages, even if pod2latex does not benefit from breakurl. Actually, I always use all these packages, in the same manner many Perl programmers use strict and warnings. But if you do not like these fonts and packages, you can edit the .skel files to remove them and possibly add others.
You can read the chapter about table-based text in the Emacs documentation. Anyway, here is what you need to know, even if you use another text editor.
The table must be interpreted by pod2html and pod2latex as pre-formatted text. Therefore, each line must begin with some space.
In addition, the tables must be recognized as such by Emacs. Therefore, the lines delimiting the rows and the columns must be made of minus signs - and pipes |. The junctions between lines must be plus signs +.
Note: For technical reasons, I will show examples below, where the line junctions are dots instead of plus signs. Just imagine they are plus signs.
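The rules above can be checked mechanically before running the whole chain. Here is a minimal sketch in Python (not one of the project's Perl or E-lisp scripts); the function name check_table_block and the wording of the messages are made up for this illustration, and the check only looks at the outer shape of each line.

```python
import re

# A line made only of junctions, dashes, pipes and spaces is a separator line.
BORDER_ONLY = re.compile(r"^[+\-| ]+$")


def check_table_block(lines):
    """Return a list of (line_number, problem) pairs for one table block."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # POD only treats a line as preformatted if it starts with whitespace.
        if not line[:1].isspace():
            problems.append((number, "line does not begin with a space"))
        body = line.strip()
        if "." in body and BORDER_ONLY.match(body.replace(".", "+")):
            # A separator line drawn with dots, as in the examples of this file.
            problems.append((number, "junctions are dots, not plus signs"))
        elif not body or body[0] not in "+|" or body[-1] not in "+|":
            problems.append((number, "line is not closed by + or | on both sides"))
    return problems
```

An empty result means the block has the outer shape described above; it does not guarantee that Emacs will accept the table.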
In this table, all cells have the same size.
.----.----.----.----.
| 13 |  2 |  3 | 16 |
.----.----.----.----.
|  1 | 14 | 15 |  4 |
.----.----.----.----.
|  8 | 11 | 10 |  5 |
.----.----.----.----.
| 12 |  7 |  6 |  9 |
.----.----.----.----.
which results in:
+----+----+----+----+
| 13 |  2 |  3 | 16 |
+----+----+----+----+
|  1 | 14 | 15 |  4 |
+----+----+----+----+
|  8 | 11 | 10 |  5 |
+----+----+----+----+
| 12 |  7 |  6 |  9 |
+----+----+----+----+
In this table, several cells from the same row have been merged into a wide cell.
.----.----.----.----.----.----.----.----.
| H  |                             | He |
.----.----.----.----.----.----.----.----.
| Li | Be | B  | C  | N  | O  | F  | Ne |
.----.----.----.----.----.----.----.----.
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
.----.----.----.----.----.----.----.----.
which results in:
+----+----+----+----+----+----+----+----+
| H  |                             | He |
+----+----+----+----+----+----+----+----+
| Li | Be | B  | C  | N  | O  | F  | Ne |
+----+----+----+----+----+----+----+----+
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
+----+----+----+----+----+----+----+----+
Yet, for the upper line, the line junctions are not really line junctions. Therefore, the plus signs can be replaced by dashes, giving a plain horizontal line.
.----.-----------------------------.----.
| H  |                             | He |
.----.----.----.----.----.----.----.----.
| Li | Be | B  | C  | N  | O  | F  | Ne |
.----.----.----.----.----.----.----.----.
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
.----.----.----.----.----.----.----.----.
which results in:
+----+-----------------------------+----+
| H  |                             | He |
+----+----+----+----+----+----+----+----+
| Li | Be | B  | C  | N  | O  | F  | Ne |
+----+----+----+----+----+----+----+----+
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
+----+----+----+----+----+----+----+----+
In the following table, some cells are the result of a merger of basic cells from the same column. For example:
.------------------.--------------.------.------.
|                  | Cretaceous   |  130 |   65 |
|                  .--------------.------.------.
| Secondary        | Jurassic     |  200 |  130 |
|                  .--------------.------.------.
|                  | Triassic     |  235 |  200 |
.------------------.--------------.------.------.
|                  | Permian      |  280 |  235 |
|                  .--------------.------.------.
|                  | Carboniferous|  350 |  280 |
|                  .--------------.------.------.
|                  | Devonian     |  400 |  350 |
| Primary          .--------------.------.------.
|                  | Silurian     |  440 |  400 |
|                  .--------------.------.------.
|                  | Ordovician   |  500 |  440 |
|                  .--------------.------.------.
|                  | Cambrian     |  540 |  500 |
.------------------.--------------.------.------.
results in:
+------------------+--------------+------+------+
|                  | Cretaceous   |  130 |   65 |
|                  +--------------+------+------+
| Secondary        | Jurassic     |  200 |  130 |
|                  +--------------+------+------+
|                  | Triassic     |  235 |  200 |
+------------------+--------------+------+------+
|                  | Permian      |  280 |  235 |
|                  +--------------+------+------+
|                  | Carboniferous|  350 |  280 |
|                  +--------------+------+------+
|                  | Devonian     |  400 |  350 |
| Primary          +--------------+------+------+
|                  | Silurian     |  440 |  400 |
|                  +--------------+------+------+
|                  | Ordovician   |  500 |  440 |
|                  +--------------+------+------+
|                  | Cambrian     |  540 |  500 |
+------------------+--------------+------+------+
Note that if you read a LATEX-generated version, there is a problem: the word "Primary" has disappeared. This is because it is aligned with a partial horizontal line and the LATEX table generator cannot cope with this situation. The HTML version, on the other hand, is fine. To obtain a better version in LATEX, you can shift the word one line up or down:
.------------------.--------------.------.------.
|                  | Cretaceous   |  130 |   65 |
|                  .--------------.------.------.
| Secondary        | Jurassic     |  200 |  130 |
|                  .--------------.------.------.
|                  | Triassic     |  235 |  200 |
.------------------.--------------.------.------.
|                  | Permian      |  280 |  235 |
|                  .--------------.------.------.
|                  | Carboniferous|  350 |  280 |
|                  .--------------.------.------.
| Primary          | Devonian     |  400 |  350 |
|                  .--------------.------.------.
|                  | Silurian     |  440 |  400 |
|                  .--------------.------.------.
|                  | Ordovician   |  500 |  440 |
|                  .--------------.------.------.
|                  | Cambrian     |  540 |  500 |
.------------------.--------------.------.------.
which results in:
+------------------+--------------+------+------+
|                  | Cretaceous   |  130 |   65 |
|                  +--------------+------+------+
| Secondary        | Jurassic     |  200 |  130 |
|                  +--------------+------+------+
|                  | Triassic     |  235 |  200 |
+------------------+--------------+------+------+
|                  | Permian      |  280 |  235 |
|                  +--------------+------+------+
|                  | Carboniferous|  350 |  280 |
|                  +--------------+------+------+
| Primary          | Devonian     |  400 |  350 |
|                  +--------------+------+------+
|                  | Silurian     |  440 |  400 |
|                  +--------------+------+------+
|                  | Ordovician   |  500 |  440 |
|                  +--------------+------+------+
|                  | Cambrian     |  540 |  500 |
+------------------+--------------+------+------+
The downside is that now, in the HTML version, the word "Primary" is no longer vertically centred. I must admit that this lack of vertical alignment would not have been noticed if my script had not removed all <br /> tags.
Go easy on the rowspans. In some cases, the table-generate-source function gets confused while generating HTML. For example:
.----.----.----.
|  1 | 11 | 21 |
|  2 | 12 | 22 |
|  3 .----.----.
|  4 | 14 | 24 |
.----. 15 | 25 |
|  6 | 16 | 26 |
|  7 .----. 27 |
|  8 | 18 | 28 |
.----.----.----.
yields:
+----+----+----+
|  1 | 11 | 21 |
|  2 | 12 | 22 |
|  3 +----+----+
|  4 | 14 | 24 |
+----+ 15 | 25 |
|  6 | 16 | 26 |
|  7 +----+ 27 |
|  8 | 18 | 28 |
+----+----+----+
In LATEX, the result is correct, but not so in HTML.
According to the Emacs documentation, you can have cells that span several columns and several rows at the same time, provided they are rectangular. For example,
.--.-----------------------------------------------.--.
|H |                                               |He|
.--.--.                             .--.--.--.--.--.--.
|Li|Be|                             |B |C |N |O |F |Ne|
.--.--.                             .--.--.--.--.--.--.
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
is forbidden. On the other hand, you can have:
.--.--.-----------------------------.--------------.--.
|H |  |                             |              |He|
.--.--.                             .--.--.--.--.--.--.
|Li|Be|                             |B |C |N |O |F |Ne|
.--.--.                             .--.--.--.--.--.--.
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
or:
.--.-----------------------------------------------.--.
|H |                                               |He|
.--.--.-----------------------------.--.--.--.--.--.--.
|Li|Be|                             |B |C |N |O |F |Ne|
.--.--.                             .--.--.--.--.--.--.
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
which give respectively:
+--+--+-----------------------------+--------------+--+
|H |  |                             |              |He|
+--+--+                             +--+--+--+--+--+--+
|Li|Be|                             |B |C |N |O |F |Ne|
+--+--+                             +--+--+--+--+--+--+
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+--+-----------------------------------------------+--+
|H |                                               |He|
+--+--+-----------------------------+--+--+--+--+--+--+
|Li|Be|                             |B |C |N |O |F |Ne|
+--+--+                             +--+--+--+--+--+--+
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Actually, the first possibility, with the one empty cell, is still recognised by Emacs as a table, although there is some confusion about the extent of this cell. The HTML generation gives incorrect results, while the LATEX generation gives a quite acceptable result. See for yourself:
+--+-----------------------------------------------+--+
|H |                                               |He|
+--+--+                             +--+--+--+--+--+--+
|Li|Be|                             |B |C |N |O |F |Ne|
+--+--+                             +--+--+--+--+--+--+
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The layout system functions with the following steps.
The standard conversion programme, pod2html or pod2latex.
An E-lisp script, interpreted by Emacs in batch mode, that is, without opening a window and without input from a human-operated keyboard. This script searches for tables inside <pre> tags in the generated HTML, or inside verbatim environments in the generated LATEX file, and replaces them with real <table> tables or real tabular environments.
A Perl script, which performs various updates.
That's all for the HTML file, but for LATEX, you still have to generate .dvi or .pdf files.
The E-lisp script includes two interactive functions, which are nothing more than wrappers calling a third function with the proper parameters. Even if the script is meant to be used in batch mode, it can be useful from time to time to call these functions interactively from Emacs, to debug them or to add new features.
The main body of the script is a loop with a regexp search for a horizontal table line. By default, if the search finds nothing, this triggers an error which halts the script. As a result, the lines after the end of the loop are not executed, including the save-file command. In addition, in a makefile, the script ends with a status code indicating a failure and the make process stops. So we prefer a search which does not halt execution, but which returns a value indicating that the process must leave the loop. This is achieved by giving a third parameter to the re-search-forward function. As a result, we have to provide a second parameter to the function, too. But this is no big deal: the point-max function is perfect for this purpose.
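The same idea, a search that reports "nothing found" as a value instead of aborting, can be illustrated outside Emacs. The sketch below is in Python, not E-lisp, and is only an analogy: Python's search returns None where re-search-forward with a non-nil third parameter returns nil. The pattern and the function name table_line_offsets are assumptions made for this example, not the regexp used by the script.

```python
import re

# A horizontal table line such as +---+---+, alone on its line.
TABLE_LINE = re.compile(r"^[ \t]*\+-[-+]*\+[ \t]*$", re.MULTILINE)


def table_line_offsets(text):
    """Return the start offset of every horizontal table line in text."""
    offsets = []
    position = 0
    while True:
        match = TABLE_LINE.search(text, position)
        if match is None:
            # Nothing left to find: leave the loop quietly, no exception.
            break
        offsets.append(match.start())
        position = match.end()
    # Code placed here still runs, which is the point of the non-failing
    # search: in the E-lisp script, this is where the buffer gets saved.
    return offsets
```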
Each iteration deals with a different table. The iteration uses the following steps:
The script locates the beginning of the text span that contains the table, that is, the <pre> tag, or the LATEX statement \begin{verbatim}.
The script places the point inside the table and activates the table functions of Emacs (by using table-recognize).
The script generates the source of the table in the target language, HTML or LATEX.
The script locates the end of the text span that contains the table, that is, the </pre> tag, or the LATEX statement \end{verbatim}.
The script deletes the pre or verbatim source.
The script switches to the buffer with the generated source. It makes a few amendments. For example, for HTML, I do not care about the precise amount of space and the exact number of line breaks. So the script removes the &nbsp; entities and the <br /> tags. And for LATEX, the script inserts vertical spaces to prevent the table from being glued to the preceding and following paragraphs.
The script copies and pastes the generated source into the HTML or LATEX file.
And the script loops, searching for a new table to process.
Upon exiting the loop, the buffer is written into a file, the name of which is the input filename, shortened by one char. So the extension .html1 becomes .html and the extension .tex1 becomes .tex.
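To make the steps above concrete, here is a much simplified sketch in Python. It is not the E-lisp script and does not use the Emacs table functions: it assumes bare <pre> tags, regular grids with one text line per row and no merged cells, and all names (replace_tables, grid_to_html, output_name) are invented for this illustration.

```python
import re

PRE_BLOCK = re.compile(r"<pre>\n?(.*?)</pre>", re.DOTALL)
BORDER = re.compile(r"^\s*\+[-+]+\+\s*$")


def grid_to_html(block):
    """Turn a regular grid table (no merged cells) into an HTML table."""
    rows = []
    for line in block.splitlines():
        line = line.strip()
        # Content lines start and end with a pipe; border lines are skipped.
        if line.startswith("|") and line.endswith("|"):
            cells = [cell.strip() for cell in line[1:-1].split("|")]
            rows.append("<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


def replace_tables(html):
    """Replace every <pre> block holding a table; leave other blocks alone."""
    def convert(match):
        block = match.group(1)
        lines = [line for line in block.splitlines() if line.strip()]
        # Only blocks framed by a top and a bottom border count as tables.
        if lines and BORDER.match(lines[0]) and BORDER.match(lines[-1]):
            return grid_to_html(block)
        return match.group(0)
    return PRE_BLOCK.sub(convert, html)


def output_name(input_name):
    """Drop the last character, so that read-me.html1 gives read-me.html."""
    return input_name[:-1]
```

Ordinary preformatted blocks, such as code listings, are left untouched because their first and last lines are not table borders.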
If the table is not correct, for example if the top line or the bottom line is missing, the result will be a truncated table. Thus:
| Vanish        | Evaporated    | Invisible     |
.---------------.---------------.---------------.
| Stays         | Present       | Permanent     |
.---------------.---------------.---------------.
| Missing       | Hidden        | Absent        |
gives:
| Vanish        | Evaporated    | Invisible     |
+---------------+---------------+---------------+
| Stays         | Present       | Permanent     |
+---------------+---------------+---------------+
| Missing       | Hidden        | Absent        |
The script even crashes with:
| Vanish        | Evaporated    | Invisible     |
.---------------.---------------.---------------.
| Missing       | Hidden        | Absent        |
Also, if two tables are contiguous, without a blank line between them, only the first one will appear in the generated file. Thus:
.---------------.---------------.
| Stays         | Present       |
.---------------.---------------.
| Constant      | Permanent     |
.---------------.---------------.
.---------------.---------------.---------------.
| Vanish        | Evaporated    | Invisible     |
.---------------.---------------.---------------.
| Missing       | Hidden        | Absent        |
.---------------.---------------.---------------.
will result in:
+---------------+---------------+
| Stays         | Present       |
+---------------+---------------+
| Constant      | Permanent     |
+---------------+---------------+
+---------------+---------------+---------------+
| Vanish        | Evaporated    | Invisible     |
+---------------+---------------+---------------+
| Missing       | Hidden        | Absent        |
+---------------+---------------+---------------+
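Both pitfalls, a missing top or bottom line and two tables glued together, can be detected before the file reaches Emacs. The following Python sketch is a heuristic of my own for this read-me, not part of the scripts; it expects one block of consecutive non-blank lines and relies on the fact that a well-formed table never has two full border lines in a row.

```python
import re

# A full border line: plus signs at both ends, only dashes and plus signs between.
FULL_BORDER = re.compile(r"^\+[-+]*\+$")


def table_problems(lines):
    """Return the problems found in one block of consecutive table lines."""
    rows = [line.strip() for line in lines]
    if not rows:
        return ["empty block"]
    problems = []
    if not FULL_BORDER.match(rows[0]):
        problems.append("top line is missing")
    if not FULL_BORDER.match(rows[-1]):
        problems.append("bottom line is missing")
    # Two full borders in a row mean one table ends where the next begins.
    for above, below in zip(rows, rows[1:]):
        if FULL_BORDER.match(above) and FULL_BORDER.match(below):
            problems.append("two tables are contiguous")
    return problems
```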
The Perl scripts make a few simple but necessary modifications.
The text I translated had a copyright notice and a trademark notice. No problem for the copyright: the POD E<copy> string gives the proper result. But I do not know of anything similar for the trademark symbol. So the Perl script looks for the \bTM\b regexp with word boundaries (so the words "ATMOSPHERE" and "HETMAN" and especially the acronym "HTML" will not be modified) and replaces it with the proper HTML sequence.
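The effect of the word boundaries is easy to check. This is a minimal sketch in Python rather than the Perl of the actual script; the function name mark_trademarks is made up, and &trade; is my assumption for "the proper HTML sequence".

```python
import re

# \b on both sides: TM must stand alone as a word to be replaced.
TRADEMARK = re.compile(r"\bTM\b")


def mark_trademarks(html):
    """Replace a standalone TM with the HTML trademark entity."""
    # &trade; is assumed here; the real script may emit another sequence.
    return TRADEMARK.sub("&trade;", html)
```

With this pattern, "Frobnitz TM" becomes "Frobnitz &trade;", while "HTML", "ATMOSPHERE" and "HETMAN" are left alone because their "TM" is surrounded by other word characters.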
In the same way, for the present read-me.pod file, the \bLATEX\b regexp is replaced with a sequence that includes vertical shifts of the letters A and epsilon E.
The ajust-tex script has the same functions as ajust-html, plus a few others.
Firstly, for the LATEX file, we switch from utf-8 encoding to ISO-8859-1 encoding, which was not necessary for HTML.
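For reference, the re-encoding itself is a one-liner in most languages. Here is a Python sketch (the real script is written in Perl, and the function name utf8_to_latin1 is invented); note that a character with no ISO-8859-1 equivalent, such as the euro sign, makes this version fail rather than silently lose data.

```python
def utf8_to_latin1(data):
    """Re-encode UTF-8 bytes as ISO-8859-1 bytes."""
    # decode() fails on invalid UTF-8, and encode() fails on any character
    # that ISO-8859-1 cannot represent.
    return data.decode("utf-8").encode("iso-8859-1")
```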
Secondly, in the rules I translated, there were several references to other paragraphs of the rules, such as:
(see Short background --- p.1)
I have translated these references in the POD translation and I have added L< > tags around the paragraph title, while keeping the page number. So the generated HTML will contain meaningless page numbers, because if you print the rules, your page numbers will vary a lot, depending on the font size you use. On the other hand, the LATEX output (.dvi or .pdf) will have stable page numbers, so it is worth using the right page numbers in the generated file. LATEX provides a command for this purpose, \pageref, but pod2latex does not generate it. So the Perl script parses the LATEX source to insert this command where appropriate.
This is done in two passes. The first pass locates all \section statements and similar and stores the corresponding label. The second pass locates all references such as
(see Short background --- p.1)
then extracts the label associated with the paragraph title ("Short background" in this case) and replaces the page number with the \pageref statement. That gives:
Cf. "Presentation of the read-me file" --- p.1, cf. "Short Background" --- p.1, cf. Installation --- p.1, cf. "How to write tables in your editor" --- p.1, cf. "Basic Example" --- p.1, cf. "Wide Cell" --- p.1, cf. "Heightened Cell" --- p.1, cf. "Two-Dimension Extensions" --- p.1, cf. "How It Works" --- p.1, cf. "How the E-lisp script works" --- p.1, cf. "How the Perl Scripts Works" --- p.1.
This means that you parse a LATEX file with regexps, which is as dirty as parsing HTML with regexps. But the LATEX source is a very simple one and therefore you can use simple regexps to search for simple patterns in a simple source text.
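The two passes can be sketched with two such simple regexps. The code below is Python, not the Perl of the real script, and it rests on assumptions: headings are taken to look like \section{Title\label{Some_label}...}, references like "Title --- p.1", and the function names are invented for this example.

```python
import re

# Assumed heading shape: \section{Title\label{Label}...} (also \subsection etc.)
HEADING = re.compile(r"\\(?:sub)*section\*?\{([^\\{}]+)\\label\{([^{}]+)\}")


def collect_labels(tex):
    """First pass: map each heading title to its label."""
    return {title.strip(): label for title, label in HEADING.findall(tex)}


def insert_pagerefs(tex):
    """Second pass: replace hard-coded page numbers with \\pageref."""
    labels = collect_labels(tex)
    if not labels:
        return tex
    # Longest titles first, so a title that contains another one wins.
    titles = "|".join(re.escape(t) for t in sorted(labels, key=len, reverse=True))
    # Up to three non-word characters (closing quotes) may follow the title.
    reference = re.compile(r"(" + titles + r")(\W{0,3} --- p\.)\d+")

    def fix(match):
        label = labels[match.group(1)]
        return f"{match.group(1)}{match.group(2)}\\pageref{{{label}}}"

    return reference.sub(fix, tex)
```

A replacement function is used instead of a replacement string so that the backslash of \pageref does not need to be escaped for the regexp engine.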
Thirdly, the LATEX file generated by pod2latex is not a standalone file, unlike the generated HTML file. So the ajust-tex script merges this generated file with a skeleton file.
A last remark, which applies not only to my layout system, but also to LATEX in general. When a LATEX document contains page references, some page numbers may be wrong because the generation uses an old .aux file. This is why the latex or pdflatex command is run twice in a row. The first run refreshes the .aux file with up-to-date page numbers. The second run generates the output file with the proper page numbers.
Coding in E-lisp is fine, but coding in Perl is better. I plan to write a Perl module which would do the same as table-recognize and perhaps also table-generate-source (although there are already modules generating HTML tables on CPAN). I do not know yet whether this will be a module dealing with plain text or a pod2html and pod2latex plug-in.
The programmes and the additional files are published under the same terms as Perl: GNU General Public License (GPL) or Artistic License.