Html2xhtml

Html2xhtml is a command-line tool that converts HTML files to XHTML files. The path of the HTML input file can be provided as a command-line argument. If it is not, the input is read from stdin.

Html2xhtml always tries to generate valid XHTML files. It is able to correct many common errors in input HTML files without loss of information. However, for some errors, html2xhtml may decide to lose some information in order to generate valid XHTML output. This can be avoided with the -e option, which allows html2xhtml to generate non-valid output in these cases.

Html2xhtml can generate XHTML output that complies with one of the following document types: XHTML 1.0 (Transitional, Strict and Frameset), XHTML 1.1, XHTML Basic and XHTML Mobile Profile.

HOW TO RUN THE PROGRAM

For full information about how to run the program see doc/html2xhtml.txt in the source code distribution, the html2xhtml.txt file in the Windows binaries ZIP file or the html2xhtml manpage. Some examples are shown below.

  • By default, the program reads the input file from its standard input and dumps the output file to its standard output:
cat input.html | html2xhtml
  • The input can also be specified as a command line argument:
html2xhtml input.html
  • In order to save the output to a file, redirect the standard output:
html2xhtml input.html > output.html
  • Alternatively, you can specify the output file name with the -o option:
html2xhtml input.html -o output.html
  • Select the document type of the output with -t:
html2xhtml input.html -t 1.1 -o output.html

The available values are:

  • transitional: XHTML 1.0 Transitional
  • frameset: XHTML 1.0 Frameset
  • strict: XHTML 1.0 Strict
  • 1.1: XHTML 1.1
  • basic-1.0: XHTML Basic 1.0
  • basic-1.1: XHTML Basic 1.1
  • mp: XHTML Mobile Profile
  • print-1.0: XHTML Print 1.0

Use "transitional" if you just want to tidy up the markup.

Choose an output character encoding (by default, the program uses the character encoding detected in the input):

html2xhtml input.html --ocs utf-8 -o output.html

Get the list of available character sets:

./src/html2xhtml --lcs
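
The single-file invocations above extend naturally to a whole directory. The following is a minimal sketch and is not part of the html2xhtml distribution: the function name convert_dir and the HTML2XHTML environment variable are hypothetical, and it assumes that html2xhtml returns a non-zero exit status when a conversion fails. It uses only the -t and -o options shown above.

```shell
# Convert every .html file in a source directory, writing the results with
# the same file names into a destination directory.
# convert_dir and HTML2XHTML are illustrative names, not part of html2xhtml.
convert_dir() {
    src_dir=$1
    dest_dir=$2
    doctype=${3:-transitional}        # any value accepted by -t
    tool=${HTML2XHTML:-html2xhtml}    # override to point at another binary
    failures=0

    mkdir -p "$dest_dir" || return 1
    for input in "$src_dir"/*.html; do
        [ -e "$input" ] || continue   # the glob matched nothing
        output=$dest_dir/$(basename "$input")
        # Same form as: html2xhtml input.html -t 1.1 -o output.html
        if ! "$tool" "$input" -t "$doctype" -o "$output"; then
            echo "conversion failed: $input" >&2
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every conversion succeeded.
    [ "$failures" -eq 0 ]
}
```

For instance, calling convert_dir site xhtml strict would attempt to convert each site/*.html file into an XHTML 1.0 Strict file under xhtml/.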

HOW TO COMPILE AND INSTALL THE PROGRAM FROM THE SOURCE TARBALL

Enter the main directory of the source distribution and type:

$ ./configure
$ make

You can run the test battery in order to check that the program is working as expected:

$ cd tests
$ ./test.sh
$ cd ..

If you want to install the program on your system, then type (it may require root privileges):

$ make install

See ./INSTALL for more information.

The program has been tested to compile on GNU/Linux and with MinGW on Windows. With MinGW, the EXE file to use is the one the compiler creates inside src\.libs, not the one in src. It depends on the libiconv-2.dll file, which is distributed with MinGW (inside the bin\ subdirectory of the main MinGW installation directory).

HOW TO COMPILE AND INSTALL THE PROGRAM FROM THE GIT SOURCES

The source code in the Git repository does not include the files generated by the autotools. In order to build the ./configure script, run the following commands from the main directory of the sources:

$ aclocal
$ libtoolize
$ touch config.rpath
$ autoheader
$ automake --add-missing
$ autoconf

On OS X, use the glibtoolize command instead of libtoolize.

After that, you should have the ./configure script and can proceed as explained above:

$ ./configure
$ make
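
The autotools steps above can be collected into a small helper so that the libtoolize/glibtoolize choice is made automatically. This is only a sketch under stated assumptions: the function names pick_libtoolize and bootstrap are hypothetical and not part of the repository, and OS X is detected by the kernel name Darwin that uname -s reports.

```shell
# Pick the libtoolize command name for a given kernel name, as printed by
# `uname -s`. Both function names here are illustrative.
pick_libtoolize() {
    case "$1" in
        Darwin) echo glibtoolize ;;   # OS X
        *)      echo libtoolize ;;
    esac
}

# Run the autotools steps listed above, stopping at the first failure.
# Must be called from the main directory of the sources.
bootstrap() {
    libtoolize_cmd=$(pick_libtoolize "$(uname -s)")
    aclocal &&
    "$libtoolize_cmd" &&
    touch config.rpath &&
    autoheader &&
    automake --add-missing &&
    autoconf
}
```

With these functions defined in the current shell, running bootstrap && ./configure && make from the main directory would perform the whole build.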