perl-web-markup - A pure-Perl HTML and XML processor
The following modules are available:
- Web::HTML::Parser: An HTML parser.
- Web::XML::Parser: An XML parser.
- Web::HTML::Serializer: An HTML serializer.
- Web::XML::Serializer: An XML serializer.
- Web::XPath::Parser: An XPath 1.0 parser.
- Web::XPath::Evaluator: An XPath 1.0 evaluator.
- Web::HTML::Table: An implementation of the HTML table model.
- Web::HTML::Microdata: An implementation of HTML microdata.
- Web::RDF::XML::Parser: An RDF/XML parser.
- Web::Feed::Parser: An RSS and Atom parser.
- Web::HTML::Validator: A DOM conformance checker (for HTML and XML).
- Web::GPX::Parser: A GPX parser.
These modules require Perl 5.14 or later. They require Encode, which is included in the Perl distribution, and modules from the perl-web-encodings package <https://github.com/manakai/perl-web-encodings>, which is a submodule of the Git repository. The Web::RDF::XML::Parser module has additional submodule dependencies (see its documentation for details).
In addition, a DOM implementation is required as input (and output) to these modules, although there is no direct dependency. For the XPath modules, see Web::XPath::Evaluator for its requirements on the DOM implementation. For the other modules, the DOM implementation must support a subset of the features defined in the DOM Standard, the DOM Parsing and Serialization Standard, DOM3 Core, DOM Document Type Definitions, DOM Perl Binding, and manakai's DOM Extensions. An example of such a DOM implementation is the Web::DOM modules in the perl-web-dom package <https://github.com/manakai/perl-web-dom>.
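As a rough illustration of how a parser and a DOM implementation fit together, the following minimal sketch parses a character string into a DOM document. It assumes that Web::HTML::Parser provides new and parse_char_string, and that Web::DOM::Document (from perl-web-dom) provides new and title; the title_of_html helper is hypothetical and exists only for this example. See each module's documentation for the authoritative interface.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a character string as HTML into a DOM
# document and return the document's title.
# Assumption: Web::HTML::Parser (this package) and Web::DOM::Document
# (perl-web-dom) are installed and provide the methods used below.
sub title_of_html {
  my ($chars) = @_;
  require Web::HTML::Parser;
  require Web::DOM::Document;

  my $doc = Web::DOM::Document->new;   # DOM object the parser fills in
  my $parser = Web::HTML::Parser->new;
  $parser->parse_char_string ($chars => $doc);
  return $doc->title;
}

# Only run the demonstration when both packages can be loaded.
if (eval { require Web::HTML::Parser; require Web::DOM::Document; 1 }) {
  print title_of_html ('<!DOCTYPE html><title>Hello</title><p>World'), "\n";
} else {
  print "Web::HTML::Parser or Web::DOM::Document is not installed\n";
}
```

The parser writes its result into a document object supplied by the caller, which is why any DOM implementation meeting the requirements above can be substituted for Web::DOM::Document.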
The Web::Feed::Parser module and the Web::GPX::Parser module require modules from the perl-web-datetime <https://github.com/manakai/perl-web-datetime> and perl-web-url <https://github.com/manakai/perl-web-url> packages.
Validator modules such as Web::HTML::Validator and Web::RDF::Checker require additional external modules; see their documentation.
The perl-web-dom package <https://github.com/manakai/perl-web-dom> implements the DOM interfaces, which include standard ways to parse and serialize HTML/XML documents. Those parsing and serialization features are implemented using the perl-web-markup package.
Most of these modules were originally developed under the name "Whatpm" in 2007-2008 <https://suika.suikawiki.org/www/markup/html/whatpm/readme> and then merged into the manakai-core package <https://suika.suikawiki.org/www/manakai-core/doc/web/>. They were split out again into this separate package in 2013.
The latest version of these modules is available at the GitHub repository: <https://github.com/manakai/perl-web-markup>.
Test results can be reviewed at Travis CI <https://travis-ci.org/manakai/perl-web-markup>.
Known issues are recorded at <https://manakai.g.hatena.ne.jp/task/4/> and GitHub Issues.
Wakaba <wakaba@suikawiki.org>.
Copyright 2007-2021 Wakaba <wakaba@suikawiki.org>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.