Emacs Lisp XSLT
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
====================== README for ``xxml.el`` ====================== .. contents:: .. sectnum:: Presentation ============ Lennart Staflin's PSGML is a wonderful Emacs tool for editing SGML files, these cover HTML and XML files as well. ``xxml.el`` builds over PSGML by providing highlighting features which I like better. It also provides commands for re-indentation of lines and refilling of entities, in such a way that the SGML structure is visually restored or preserved. ``xxml.el`` depends on SGML characters like ``<``and ``>`` not being changed. This documentation exists as http://xxml.progiciels-bpi.ca/index.html in HTML form, or as a plain text `README file`__. The distribution also contains administrative files. The URL http://xxml.progiciels-bpi.ca/xxml.el repeats file ``xxml.el`` from the distribution, but this is merely for convenience. The SuSE distribution includes ``xxml.el`` within the ``psgml`` package. __ https://github.com/pinard/xxml/blob/master/README ``xxml.el`` seems pretty usable as it stands, despite we know that some problems remain, see below. Please gently report problems, suggestions or other comments to François Pinard, mailto:email@example.com. Installation ============ This code has been initially written for Emacs 20.3.11 and PSGML 1.1.5. It seems to work now with Emacs 21.2.92 and PSGML 1.2.3, so I guess it should work for intermediate versions as well. When one uses Emacs to visit an SGML or HTML file, or even a DTD, and with the proper setup, PSGML loads itself and installs a special edition mode. The idea is to modify this mechanic slightly, so the ``xxml.el`` file gets loaded as well, to provide a few extra features. Here is how I link this module from my ``.emacs`` file:: (autoload 'sgml-mode "psgml" "Major mode to edit SGML files." t) (autoload 'html-mode "xxml" "Major mode to edit HTML files." t) (autoload 'xxml-mode-routine "xxml") (add-hook 'sgml-mode-hook 'xxml-mode-routine) Highlighting ============ Principles ---------- ``xxml.el`` goal, as far as highlighting goes, was first to give a better appearance to opening tags though lighter separate coloring for attribute names and values. Closing tags colouring was unrelated with opening tags, ``xxml.el`` rather recycles the colouring of angular brackets from opening tags into the brackets of closing tags: brackets ``<`` and ``>`` have a uniform color for all kind of tags, yet within tags, colour gives a quick clue at the kind of tag. We gain legibility. Character entities, either symbolic, decimal or hexadecimal, are rendered specially. However, I prefer avoiding entities when easily doable, favouring real characters instead. In particular, `` `` is clumsy for the eye, while non breakable spaces can be used directly, and are displayed as a grey underline. Recipe for usage ---------------- Unbreakable spaces are easily produced with command ``M-_``, which ``xxml.el`` adds to PSGML. Known problems -------------- A small problem is still unsolved. The comment block containing PSGML options, at end of file, is not always *fontified* on initial visit. One has to revisit the file once more to get it right. This is strange, but innocuous, so I did not spend much time on this one. Indenting ========= Principles ---------- The indenting step has to be greater than zero, otherwise one needs a deep and vivid knowledge of the associated DTD to quickly interpret SGML code. An indenting step of 2, which is implicit in SGML mode, seems too aggressive on horizontal space, which is a precious resource. Happily for us, the only intermediate value is quite acceptable. There are editing difficulties when closing tags appear far below opening tags or when nesting is deep. Genuine PSGML mode comes to the rescue through its moving commands (like ``M-C-f``, ``M-C-b`` or ``M-C-u`` for a few useful ones), or throught its synoptic abilities which allow for visually folding parts of the SGML structure. There is no ``M-C-q`` command in PSGML for adequately re-indenting all lines in the scope of an SGML element, as Emacs permits for LISP, Perl or C statements or expressions. So ``xxml.el`` provides one. Indenting a lengthy text with ``xxml.el`` may take quite a while, a progress indicator is updated every second or so. The delay may be adjusted. Recipe for usage ---------------- The ``M-C-q`` command finds out the smallest SGML element around the cursor and re-indents those lines. If the cursor is close to the beginning of file, it is likely that this command will indent more lines and be slower. Since this command relies on PSGML, best is to declare the DTD properly. CHECK: The ``M-C-q`` command discovers which SGML element holds the cursor, then re-indents all lines of this element, without otherwise modifying the lines. More lines are processed when the cursor is located near the outside of the overall structure. When the cursor is at the beginning of the file, the whole file is processed. Of course, ``xxml.el`` depends on PSGML for analysing the text structure, so at the time, the DTD ought to be correctly declared. The command uses the default indentation step, but it may be overridden through the usage of a prefix argument. Value 0 forces the removal of all indentation, making all tags appear flush with the left margin. A negative prefix argument flags that white lines around tags should get removed, in which case the absolute value of the prefix argument is used as the indentation step. This command tries to split or merge lines as needed with the goal of making the structural information very explicit, often at the expense of vertical space. Yet, all attributes are packed after the opening tag, all on one possibly long line. Re-indentation has side effects under control of user options. It may for example remove end tags which are forbidden. CHECK: Unless the command is prefixed, it manages so each tag gets alone on its line. This underlines the structural information even more, as each tag is then indented separately. If a tag spans many lines because it has numerous attributes, they all get merged in a single long line. This may look strange, but it helps later analysis for structural refilling, and the tag may also be exploded onto many lines through any ``M-q`` command (see below). CHECK: So, by default, the ``M-C-q`` command cuts lines around tags while indenting, because experience taught us that this is the most useful thing to do on average. One has to use ``C-u M-C-q`` to inhibit cuts. Known problems -------------- While lines are being re-indented or re-cut, ``xxml.el`` makes a special effort to protect suffix white space. On the other hand, re-indenting after cut removes all meaning to prefixes, and I do not know if this creates a practical problem or not. If yes, this is a delicate problem with no evident solution. Another difficulty is that cutting lines might introduce spurious ``#CDATA`` holding only white space, where DTD just does not permit. I vaguely remember diagnostics from ``nsgmls`` yielding me to think that SGML is not that "free field" after all. If some cuts are just unwelcome, my approach may convey a serious problem. With enough luck, PSGML might give enough access to the digested DTD so cuts could be inhibited, depending on the needs and spots in an SGML text. Reordering of attributes sometimes mangle text, so I inactivated it. Refilling ========= Principles ---------- About filling lines, SGML is not different from most languages. There are many ways to tackle the problem and decide how to proceed. ``xxml.el`` uses a few strong principles, for cutting down possibilities and guiding decisions. This command tries to get rid of whitespace, within preset left and right margins, while leaving visual clues to the logical imbrication structure. In SGML as well as for most languages, there is no single solution to the refilling problem, so arbitrary guidelines have to be preset and followed. Here are a few of those we selected: - an increase of the margin means a deeper dive into the SGML structure; - whitespace may be spared more aggressively, as highlighting offers clues; - start tags indentation is to be more prominent than for end tags; - end tags are batched on one line exactly as their start tags have been; - within text, marked annotations (like bold, say) are handled atomically; - white lines are to be left alone if possible. CHECK: A closing tag has to be either on the same horizontal line as the corresponding opening tag, or else, it ought to be vertically aligned with it. This rules out the frequent habit of bunching many opening tags at the beginning of a paragraph, then bunching them in reverse order at the end of the paragraph, refilling everything together. All the contrary, this principle says that if opening tags are bunched on a line, the closing tags have to be bunched on the exact same line. Another consequence is that a textual paragraph holds annotations (an italic fragment, for example), refilling the paragraph may not split the annotation on many lines, all spaces within the annotation have to be locally considered as non breakable, as if the annotation was some kind of structural super-word, matter of speaking. CHECK: Some closing tags may be elided, as per the DTD. When an opening tag does not have an explicit corresponding closing tag, the alignment rule above does not hold, because there is nothing to align. Consequently, in such case, refilling may be a little more aggressive and effective. CHECK: Not everything is refilled. Refilling ignores SGML comments or SGML declarations. I might change this if the need arises, but I did not feel that need so far. A distinction is needed between structural refilling and textual refilling. Some elements need their CDATA unaltered, the most common example being ``<pre>`` within HTML, so ``xxml.el`` should never blindly refill all textual data. CHECK: Care has been taken for cursor to apparently stick with its context while (indentation and) refilling goes on. Cursor should merely move over the wave. Recipe for usage ---------------- The ``M-q`` command finds out the smallest SGML element around the cursor, then does a structural refilling of all lines for this element to the value of ``fill-column``, trying to find the most compact layout which would respect both the edition margins and the refilling principles. If the cursor is close to the beginning of file, it is likely that this command will refill more lines and be slower. The command uses the default indentation step, but it may be overridden through the usage of a prefix argument. Value 0 forces the removal of all indentation, making all tags appear flush with the left margin. A negative prefix argument flags that white lines around tags should get removed, in which case the absolute value of the prefix argument is used as the indentation step. Refilling has side effects under control of user few options. It may for example adjust the case of tag or attribute ids, yet if this is not done, start tags and end tags still correspond if their id only differ by the case used. Refilling is also shy of modifying SGML comments or SGML declarations, which have to be refilled "by hand", at least for now. CHECK: The prefixed version of the same command, ``C-u M-q``, triggers a more aggressive refilling, in which refilling is textual as well as structural. (An option exists for always forcing this aggressiveness, so command prefix may be omitted and still yield the same effect.) CHECK: A simple heuristic makes usage much simpler. If the cursor is positioned within a text at the time of the ``M-q`` command, then the filling goes textual without the need of prefixing the command. CHECK: Since refilling depends heavily on correct indentation, ``xxml.el`` does not take any chance, and refilling is always preceded by automatic re-indentation. It is never required to separately trigger indentation through a separate command. It may take some more time, but the results are more dependable. It practically means that the ``M-q`` command is sufficient in most situations. Known problems -------------- Trailing space on lines is not always removed while refilling. The first line of a refilled text is not truncated when too long. While refilling goes, tag names and attribute names are automatically down-cased. An option variable exists to inhibit this behaviour. All ``xxml.el`` code tries to disregard case when recognising names, so an opening tag and a closing tag may be seen as identical even if written with different casing. This is OK for SGML and HTML, but I think I read that DSSSL cares about casing. Some later option might be needed for fine-tuning this aspect. Checking whether if there is a closing tag for every opening tag requires more CPU cycles, so it might require more time to refill a big text. Consequently, I generalised progress indicators so they could be used for refilling as well as for indentation. However, PSGML produces its own diagnostics while repositioning, these were overrunning ``xxml.el`` progress indicators, I implemented an ugly stunt meant to silence PSGML. ``is-breakable`` should apply after (implied) end tag, and include <p>. ``is-splittable-before`` not used anymore and so, no recently tested. ``is-splittable-after`` has never been implemented, maybe not useful. ``is-shrink-wrappable`` to be rethought and debugged, now inactive. Clean-up ======== Principles ---------- There is a lot of noise out there, especially in the realm of HTML. Some people debug their HTML structures using lenient browsers rather than good conformance tools. A few HTML composition programs or specialised editors, sometimes going as far as claiming themselves as "experts", produce random abomination and garbage. These should be avoided like plague. Before starting to work on an existing set of SGML files, like the HTML of a Web site, one would ideally do a serious job merely to clean out that site, and get a conforming set of well indented pages. This is normally done once, and not required anymore as long as work habits stay reasonable afterwards. Recipe for usage ---------------- As cleaning is only needed when one takes a old site in charge, and not afterwards, there is no short key binding for cleaning operations. The command ``M-x xxml-cleanup`` currently does little. It transforms Microsoft end of lines into Unix end of lines and recodes character entities representing a non breakable space to the Latin-1 character. It also removes ClarisWorks specific garbage. The command ``C-u M-x xxml-cleanup`` has the supplementary effect of ensuring a file prologue and epilogue. Unless the file already declares some DTD, the prologue will receive the value of ``xxml-default-prolog`` when not nil. The epilogue gets edition options for PSGML. Known problems -------------- A lot is needed in the area of cleaning out HTML created by various monsters. I consider much more a relief than a problem that I was not exposed to various garbage generators, long enough to need more cleaning functions within ``xxml.el``. But surely, many are less lucky than me, and may consider that ``xxml.el`` is lacking in this area. For one, I'll surely add more cleaning functions if I ever need them. History ======= I originally wrote ``xxml.el`` mainly for my associate Laurent and me, for direct SGML (and HTML) editing. Karl Eichwalder much helped me at getting started with PSGML and with SGML matters in general, so I was happy to give him a copy of ``xxml.el``. His suggestions and criticism allowed for a quicker stabilisation of the package in its beginnings. Debugging ``xxml.el`` has been a bit difficult, as it progressively relied on a few Emacs features I was not very familiar with, and for which I discovered and experimented strengths and limitations along the way. I wrote once about ``xxml.el`` to the Gnits gang, and mailing lists for SGMLtools et DocBook. Someone wrote me he was working on a similar project, which was announcing to be difficult without PSGML, on the other hand, he said he correctly interfaced with Emacs "Speed bar", which ``xxml.el`` (or rather I :-) is not familiar with. Nowadays, I do not edit SGML as often as I used to, but Laurent never stopped, he keeps telling me that he uses ``xxml.el`` heavily. Refilling (and the automatic indentation going with it) is probably his most heavily used command. Highlighting is undoubtedly comfortable, but yet, refilling is probably the main ``xxml.el`` feature for us.