Write tables in POD and convert them into HTML tables and LATEX tables
Perl
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.gitignore
README.pod
README.skel
ajust-html
ajust-tex
lisez-moi.pod
lisez-moi.skel
makefile
tableau
test.html.ref
test.html1
test.skel
test.tex.ref
test.tex1

README.pod

READ-ME

Presentation of the read-me file

This read-me.pod file has a double purpose.

  1. Describe how the layout system functions.

  2. Be used as an input file for a demo of the layout system.

In other words, either you read the POD file as is to learn what will happen, or you read the HTML generated file or the PDF generated file to learn what has juste happened.

Short Background

I have developed this layout system because I noticed a boardgame, the rules of which are available on the net. I though it would be nice if this game had a larger audience, including French-speaking people. In addition, the rules were rather short, so I decided to translate them in French.

As many Perl programmers, I like using the POD format, because it is simple and yet it provides most basic features: chapters, lists, links, bold chars, italics. The only missing basic feature is tables. Alas, there were several tables in the game rules. Of course, when using POD, you can always show a table by using preformatted text, as this:

.-----.-----.-----.
|  8  |  1  |  6  |
.-----.-----.-----.
|  3  |  5  |  7  |
.-----.-----.-----.
|  4  |  9  |  2  |
.-----.-----.-----.

The result is acceptable when using perldoc to display the POD or when reading it in your favourite text editor, but the same thing in HTML with <pre> tags or in LATEX within a verbatim environment is quite frustrating. Having the following is much better:

+-----+-----+-----+
|  8  |  1  |  6  |
+-----+-----+-----+
|  3  |  5  |  7  |
+-----+-----+-----+
|  4  |  9  |  2  |
+-----+-----+-----+

(Of course, if you read this using perldoc or you favourite text editor, or if you browse it on the Git Hub website, you will not see much difference).

There is the Pseudo::POD module, which allows us to generate pretty tables in HTML, LATEX and other advanced display formats, but when you edit your source file under Emacs / vi / other or when you display the file with perldoc, the result is disappointing.

Another solution consists in generating the HTML / LATEX file, and then edit it using Emacs, to replace the ASCII-art table with a proper table. See the table-recognize and table-generate-source functions, as well as the other table-xxx functions. The downside is that this is incompatible with automatic processing and makefiles.

This is why I created these scripts, to layout nice tables from a makefile procedure, without requiring any intervention once make is running.

Installation

You just need to download the repository and store its files into the same directory. Then, run make for a demo of what the programmes do.

There are a few prerequisites for the programmes.

  • A complete installation of Perl. If you have installed Perl with your distro's .rpm's or .deb's, make sure that perldoc, pod2html and pod2latex have been installed, too. Sometimes, they are not in the main Perl package.

  • A rather recent version of Emacs. Version 23 is fine, I think that version 22 is fine, too, but version 21 is too much old.

  • An HTML browser, compatible with UTF-8 and tables. But nowadays, is there still anybody who has not such a browser?

  • A LATEX installation, including the commands latex and pdflatex. You will need also the Palatino (ppl), Helvetica (phv) and Courier (pcr) fonts, and the babel, fontenc, textcomp, graphicx, hyperref et breakurl packages, even if pod2latex does not benefit from breakurl. Actually, I always use all these packages, in the same manner many Perl programmers use strict and warnings. But if you do not like these fonts and packages, you can edit the .skel files to remove them and possibly add others.

How to write tables in your editor

You can read the chapter about table-based text in the Emacs documentation. Anyway, here is what you need to know, even if you use another text editor.

The table must be interpreted by pod2html and pod2latex as pre-formatted text. Therefore, each line must begin with some space.

In addition, the tables must be recognized as such by Emacs. Therefore, the lines delimiting the rows and the columns must be minus signs - and pipes |. The junctions between lines must be plus signs +.

Note: For technical reasons, I will show examples below, where the line junctions are dots instead of plus signs. Just imagine they are plus signs.

Basic Example

In this table, all cells have the same size.

.----.----.----.----.
| 13 |  2 |  3 | 16 |
.----.----.----.----.
|  1 | 14 | 15 |  4 |
.----.----.----.----.
|  8 | 11 | 10 |  5 |
.----.----.----.----.
| 12 |  7 |  6 |  9 |
.----.----.----.----.

which results in:

+----+----+----+----+
| 13 |  2 |  3 | 16 |
+----+----+----+----+
|  1 | 14 | 15 |  4 |
+----+----+----+----+
|  8 | 11 | 10 |  5 |
+----+----+----+----+
| 12 |  7 |  6 |  9 |
+----+----+----+----+

Wide Cell

In this table, several cells from the same row have been merged into a wide cell.

.----.----.----.----.----.----.----.----.
| H  |                             | He |
.----.----.----.----.----.----.----.----.
| Li | Be | B  | C  | N  | O  | F  | Ne |
.----.----.----.----.----.----.----.----.
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
.----.----.----.----.----.----.----.----.

which results in:

+----+----+----+----+----+----+----+----+
| H  |                             | He |
+----+----+----+----+----+----+----+----+
| Li | Be | B  | C  | N  | O  | F  | Ne |
+----+----+----+----+----+----+----+----+
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
+----+----+----+----+----+----+----+----+

Yet, for the upper line, the line junctions are not really line junctions. Therefore, the plus signs can be replaced by dashes, for an horizontal plain line.

.----.-----------------------------.----.
| H  |                             | He |
.----.----.----.----.----.----.----.----.
| Li | Be | B  | C  | N  | O  | F  | Ne |
.----.----.----.----.----.----.----.----.
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
.----.----.----.----.----.----.----.----.

which results in:

+----+-----------------------------+----+
| H  |                             | He |
+----+----+----+----+----+----+----+----+
| Li | Be | B  | C  | N  | O  | F  | Ne |
+----+----+----+----+----+----+----+----+
| Na | Mg | Al | Si | P  | S  | Cl | Ar |
+----+----+----+----+----+----+----+----+

Heightened Cell

In the following table, some cells are the result of a merger of basic cells from the same column. For example:

.------------------.--------------.------.------.
|                  | Cretaceous   |  130 |   65 |
|                  .--------------.------.------.
| Secondary        | Jurassic     |  200 |  130 |
|                  .--------------.------.------.
|                  | Triassic     |  235 |  200 |
.------------------.--------------.------.------.
|                  | Permian      |  280 |  235 |
|                  .--------------.------.------.
|                  | Carboniferous|  350 |  280 |
|                  .--------------.------.------.
|                  | Devonian     |  400 |  350 |
| Primary          .--------------.------.------.
|                  | Silurian     |  440 |  400 |
|                  .--------------.------.------.
|                  | Ordovician   |  500 |  440 |
|                  .--------------.------.------.
|                  | Cambrian     |  540 |  500 |
.------------------.--------------.------.------.

results in:

+------------------+--------------+------+------+
|                  | Cretaceous   |  130 |   65 |
|                  +--------------+------+------+
| Secondary        | Jurassic     |  200 |  130 |
|                  +--------------+------+------+
|                  | Triassic     |  235 |  200 |
+------------------+--------------+------+------+
|                  | Permian      |  280 |  235 |
|                  +--------------+------+------+
|                  | Carboniferous|  350 |  280 |
|                  +--------------+------+------+
|                  | Devonian     |  400 |  350 |
| Primary          +--------------+------+------+
|                  | Silurian     |  440 |  400 |
|                  +--------------+------+------+
|                  | Ordovician   |  500 |  440 |
|                  +--------------+------+------+
|                  | Cambrian     |  540 |  500 |
+------------------+--------------+------+------+

Note that if you read a LATEX-generated version, there is a problem: the word "Primary" has disappeared. This is because it is aligned with a partial horizontal line and the LATEX table generator cannot cope with this situation. On the other side, the HTML version is fine. To obtain a better version in LATEX, you can shift the word one line upwards or downward:

.------------------.--------------.------.------.
|                  | Cretaceous   |  130 |   65 |
|                  .--------------.------.------.
| Secondary        | Jurassic     |  200 |  130 |
|                  .--------------.------.------.
|                  | Triassic     |  235 |  200 |
.------------------.--------------.------.------.
|                  | Permian      |  280 |  235 |
|                  .--------------.------.------.
|                  | Carboniferous|  350 |  280 |
|                  .--------------.------.------.
| Primary          | Devonian     |  400 |  350 |
|                  .--------------.------.------.
|                  | Silurian     |  440 |  400 |
|                  .--------------.------.------.
|                  | Ordovician   |  500 |  440 |
|                  .--------------.------.------.
|                  | Cambrian     |  540 |  500 |
.------------------.--------------.------.------.

which results in:

+------------------+--------------+------+------+
|                  | Cretaceous   |  130 |   65 |
|                  +--------------+------+------+
| Secondary        | Jurassic     |  200 |  130 |
|                  +--------------+------+------+
|                  | Triassic     |  235 |  200 |
+------------------+--------------+------+------+
|                  | Permian      |  280 |  235 |
|                  +--------------+------+------+
|                  | Carboniferous|  350 |  280 |
|                  +--------------+------+------+
| Primary          | Devonian     |  400 |  350 |
|                  +--------------+------+------+
|                  | Silurian     |  440 |  400 |
|                  +--------------+------+------+
|                  | Ordovician   |  500 |  440 |
|                  +--------------+------+------+
|                  | Cambrian     |  540 |  500 |
+------------------+--------------+------+------+

The downside is that now, in the HTML version, the "Primary" word is no longer center-aligned vertically. I must admit that the absence of vertical alignment would not have been noticed if my script had not removed all <br /> tags.

Go easy on the rowspans. In some cases, the table-generate function gets confused while generating HTML. For example:

.----.----.----.
|  1 | 11 | 21 |
|  2 | 12 | 22 |
|  3 .----.----.
|  4 | 14 | 24 |
.----. 15 | 25 |
|  6 | 16 | 26 |
|  7 .----. 27 |
|  8 | 18 | 28 |
.----.----.----.

yields:

+----+----+----+
|  1 | 11 | 21 |
|  2 | 12 | 22 |
|  3 +----+----+
|  4 | 14 | 24 |
+----+ 15 | 25 |
|  6 | 16 | 26 |
|  7 +----+ 27 |
|  8 | 18 | 28 |
+----+----+----+

In LATEX, the result is correct, but not so in HTML.

Two-Dimension Extensions

According to Emacs' documentation, you can have cells that spans several columns and several rows at the same time, provided they are rectangular. For example,

.--.-----------------------------------------------.--.
|H |                                               |He|
.--.--.                             .--.--.--.--.--.--.
|Li|Be|                             |B |C |N |O |F |Ne|
.--.--.                             .--.--.--.--.--.--.
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.

is forbidden. On the other side, you can have:

.--.--.-----------------------------.--------------.--.
|H |  |                             |              |He|
.--.--.                             .--.--.--.--.--.--.
|Li|Be|                             |B |C |N |O |F |Ne|
.--.--.                             .--.--.--.--.--.--.
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.

or:

.--.-----------------------------------------------.--.
|H |                                               |He|
.--.--.-----------------------------.--.--.--.--.--.--.
|Li|Be|                             |B |C |N |O |F |Ne|
.--.--.                             .--.--.--.--.--.--.
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.

which give respectively:

+--+--+-----------------------------+--------------+--+
|H |  |                             |              |He|
+--+--+                             +--+--+--+--+--+--+
|Li|Be|                             |B |C |N |O |F |Ne|
+--+--+                             +--+--+--+--+--+--+
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

+--+-----------------------------------------------+--+
|H |                                               |He|
+--+--+-----------------------------+--+--+--+--+--+--+
|Li|Be|                             |B |C |N |O |F |Ne|
+--+--+                             +--+--+--+--+--+--+
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Actually, the first possibility, with the one empty cell, is still recognised by Emacs as a table, although there is some confusion about the extent of this cell. The HTML generation gives incorrect results, while the LATEX generation gives a quite acceptable result. See for yourself:

+--+-----------------------------------------------+--+
|H |                                               |He|
+--+--+                             +--+--+--+--+--+--+
|Li|Be|                             |B |C |N |O |F |Ne|
+--+--+                             +--+--+--+--+--+--+
|Na|Mg|                             |Al|Si|P |S |Cl|Ar|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
 

How It Works

The layout system functions with the following steps.

  1. The standard conversion programme pod2html or pod2latex

  2. An E-lisp script, interpreted by Emacs in batch mode, because you can launche Emacs in batch mode, that is, without opening a window and without input from the human-operated keyboard. This script searches for tables inside <pre> tags in the generated HTML, or inside verbatim environments in the generated LATEX file, and replace them with real <table> tables or real tabular environments.

  3. A Perl script, which performs various updates.

  4. That's all for the HTML file, but for LATEX, you still have to generate .dvi or .pdf files.

How the E-lisp script works

Ths E-lisp script includes two interactive functions, which are nothing more than wrappers calling a third function with the proper parameteres. Even if the script is meant to be used in batch mode, it can be useful from time to time to call these functions interactively from Emacs, to debug them or to add new features.

The script main body is a loop with a regexp search for a table horizontal line. Usually, if the search finds nothing, this triggers an exception which halts the script. As a result, the lines after the end of the loop are not executed, including the save-file command. In addition, in a makefile, the script ends with a status code indicating a failure and the make process stops. So we prefer a search which does not halt execution, but which returns an internal value indicating that the process must leave the loop. This is achieved by giving a third parameter to the re-search-forward function. As a result, we have to provide a second parameter to the function. But this is no big deal, the point-max function is perfect for this purpose.

Each iteration deals with a different table. The iteration uses the following steps:

  1. The script locates the beginning of the text span that contains the table, that is, the <pre> tag, or the LATEX statement \begin{verbatim}.

  2. The script places the point inside the table and activates the table functions of Emacs (by using table-recognize).

  3. The script generates the source of the table in the target language, HTML or LATEX.

  4. The script locates the end of the text span that contains the table, that is, the </pre> tag, or the LATEX statement \end{verbatim}.

  5. The script deletes the pre or verbatim source.

  6. The script switches to the buffer with the generated source. It makes a few amendments. For example, for HTML, I do not care about the precise amount of space and the exact number of line breaks. So the script removes the &nbsp; entities and the <br /> tags. And for LATEX, the script inserts vertical spaces to prevent the table from being glued to the preceding and following paragraphs.

  7. The script copies and pastes the generated source into the HTML or LATEX file.

And the script loops, searching a new table to process.

Upon exiting the loop, the buffer is written into a file, the name of which is the input filename, shortened by one char. So the extension .html1 becomes .html and the extension .tex1 becomes .tex.

Known problems

If the table is not correct, for example if the top line or the bottom line are missing, the result will be a truncated array. Thus:

| Vanish        | Evaporated    | Invisible     |
.---------------.---------------.---------------.
| Stays         | Present       | Permanent     |
.---------------.---------------.---------------.
| Missing       | Hidden        | Absent        |

gives :

| Vanish        | Evaporated    | Invisible     |
+---------------+---------------+---------------+
| Stays         | Present       | Permanent     |
+---------------+---------------+---------------+
| Missing       | Hidden        | Absent        |

And even the script crashes with:

| Vanish        | Evaporated    | Invisible     |
.---------------.---------------.---------------.
| Missing       | Hidden        | Absent        |

Also, if two arrays are contiguous, without a blank line, only the first one will appear in the generated file. Thus:

.---------------.---------------.
| Stays         | Present       |
.---------------.---------------.
| Constant      | Permanent     |
.---------------.---------------.
.---------------.---------------.---------------.
| Vanish        | Evaporated    | Invisible     |
.---------------.---------------.---------------.
| Missing       | Hidden        | Absent        |
.---------------.---------------.---------------.

will result in:

+---------------+---------------+
| Stays         | Present       |
+---------------+---------------+
| Constant      | Permanent     |
+---------------+---------------+
+---------------+---------------+---------------+
| Vanish        | Evaporated    | Invisible     |
+---------------+---------------+---------------+
| Missing       | Hidden        | Absent        |
+---------------+---------------+---------------+

How the Perl Scripts Works

The Perl scripts executes a few simple modifications, but necessary ones.

ajust-html

The text I translated had a copyright and a trademark notices. No problem for the copyright, the POD E<copy> string gives the proper result. But I do not know anything similar for the trademark symbol. So the Perl script looks for the \bTM\b regexp with word boundaries (so the "ATMOSPHERE" and "HETMAN" words and especially the "HTML" acronym will not be modified) and replaces it with the proper HTML sequence.

Same thing, for the present read-me.pod file, the \bLATEX\b regexp is replaced with the sequence including vertical shifts of the letters A and epsilon E.

ajust-tex

This script has the same functions as ajust-html, plus a few others.

Firstly, for the LATEX file, we switch from utf-8 encoding to ISO-8859-1 encoding, which was not necessary for HTML.

Secondly, in the rules I translated, there were several references to other paragraphs of the rules, such as:

(see Short background --- p.1)

I have translated these references in the POD translation and I have added L< > tags around the paragraph title, while keeping the page number. So the generated HTML will contain meaningless page numbers, because if you print the rules, your page numbers will vary a lot, depending on the font size you use. On the other hand, the LATEX output (.dvi or .pdf) will have stable page numbers, so it is worth using the right page numbers in the generated file. LATEX provides a function for this purpose: \thepageref, but pod2latex does not generate it. So the Perl script parses the LATEX source to insert this function where appropriate.

This is done in two passes. The first pass locates all \section statements and similar and stores the corresponding label. The second pass locates all references

(see Short background --- p.1)

extracts the label associated to the paragraph title ("Short background" in this case) and replaces the page number with the \pageref statement. That gives:

Cf. "Presentation of the read-me file" --- p.1, cf. "Short Background" --- p.1, cf. Installation --- p.1, cf. "How to write tables in your editor" --- p.1, cf. "Basic Example" --- p.1, cf. "Wide Cell" --- p.1, cf. "Heightened Cell" --- p.1, cf. "Two-Dimension Extensions" --- p.1, cf. "How It Works" --- p.1, cf. "How the E-lisp script works" --- p.1, cf. "How the Perl Scripts Works" --- p.1.

This means that you parse a LATEX file with regexps, which is as dirty as parsing HTML with regexps. But the LATEX source is a very simple one and therefore you can use simple regexps to search for simple patterns in a simple source text.

Thirdly, the LATEX file generated by pod2latex is not a standalone file, like the HTML generated file. So the ajust-tex script merges this generated file with a skeleton file.

A last remark, which applies not only to my layout system, but also to LATEX in general. When a LATEX document contains page references, it can happen that some page numbers are skewed, because the generation uses an old .aux file. This is why the latex or pdflatex command is used twice in a row. The first time, its purpose is to refresh the .aux file with up-to-date page numbers. The second time, its purpose is to generate the output file with the proper page numbers.

What next?

Coding in E-lisp is fine, but coding in Perl is better. I plan to write a Perl module which would do the same as table-recognize and perhaps also table-generate (although there are already modules generating HTML tables on CPAN). I do not know yet if this will be a module dealing with plain text or a pod2html and pod2latex plug-in.

License

The programmes and the additional files are published under the same terms as Perl: GNU General Public License (GPL) or Artistic License.