Web::DOM - A Perl DOM implementation
use Web::DOM::Document;
my $doc = new Web::DOM::Document;
my $el = $doc->create_element ('a');
$el->set_attribute (href => 'http://www.whatwg.org/');
$doc->append_child ($el);
The Web::DOM modules are a pure-Perl DOM implementation. They implement various Web standard specifications, including the DOM Living Standard and the HTML Living Standard.
The Web::DOM::Document module provides the new method, which returns a new document object. It corresponds to the new Document () constructor in a JavaScript Web browser environment.
my $doc = new Web::DOM::Document; # XML document by default
$doc->manakai_is_html (1); # Change to HTML document
Using the document object, the application can create various DOM objects with standard DOM methods:
my $el = $doc->create_element ('p'); # HTML element
my $el = $doc->create_element_ns ($nsurl, $qname);
$el->set_attribute (class => 'hoge fuga');
my $text = $doc->create_text_node ('text');
my $comment = $doc->create_comment ('data');
Please note that DOM attributes and methods are available under a perlish_underscored_name rather than the domSpecificationsCamelCaseName.
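The naming convention can be approximated by a small conversion routine. The following is only a sketch: dom_name_to_perl is a hypothetical helper, not part of the modules, and the authoritative mapping is the one defined by the DOM Perl Binding specification, which may treat some names differently (for example, names with an acronym in the middle).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Web::DOM): approximate the
# camelCase -> underscored mapping for common DOM names.
sub dom_name_to_perl {
  my ($name) = @_;
  # Insert "_" between a lowercase letter or digit and an uppercase letter.
  $name =~ s/([a-z0-9])([A-Z])/$1_$2/g;
  # Perl-side names are all lowercase.
  return lc $name;
}

for my $dom_name (qw(innerHTML getElementById createElementNS querySelectorAll)) {
  printf "%-18s => %s\n", $dom_name, dom_name_to_perl ($dom_name);
}
```

The results for these four names match the method names used elsewhere in this document (inner_html, get_element_by_id, create_element_ns, query_selector_all).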
Alternatively, you can instantiate the document object from an HTML or XML string, using the DOMParser interface:
my $parser = new Web::DOM::Parser;
my $doc = $parser->parse_from_string ($string, 'text/html');
my $doc = $parser->parse_from_string ($string, 'application/xhtml+xml');
Your favorite query methods are also available:
$el = $doc->get_element_by_id ('site-logo');
$el = $doc->query_selector ('article > p:first-child');
$el = $doc->evaluate ('//div[child::p]', $doc)->iterate_next;
$col = $doc->get_elements_by_tag_name ('p');
$col = $doc->get_elements_by_class_name ('blog-entry');
$col = $doc->images;
For more information, see the documentation of the relevant modules. For example, the methods available on the document object are listed in the Web::DOM::Document documentation. Frequently used modules include:
- Web::DOM::Document
  The Document interface.
- Web::DOM::Element
  The Element interface.
- Web::DOM::Exception
  The DOMException interface.
- Web::DOM::HTMLCollection
  The HTMLCollection interface.
- Web::DOM::Parser
  The DOMParser interface.
The modules implement manakai's DOM Perl Binding specification <http://suika.suikawiki.org/~wakaba/wiki/sw/n/manakai%27s%20DOM%20Perl%20Binding>, which defines the mapping between WebIDL/DOM and Perl.
As a general rule, an object implementing the DOM interface I is an instance of the class Web::DOM::I (or of a subclass of it). However, applications should not rely on this, as the class inheritance hierarchy could differ from the interface's and could change in a future revision of the module implementation. In particular, applications should not test whether an object is an instance of an interface that is defined with the [NoInterfaceObject] extended attribute. For example, the ParentNode interface is defined with that extended attribute. The Web::DOM::Document class inherits from the Web::DOM::ParentNode class, as the Document interface implements the ParentNode interface according to the DOM Standard, but applications should not test $node->isa ('Web::DOM::ParentNode').
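One way to avoid depending on class names is to test what an object reports about itself. The following is a minimal sketch under stated assumptions: is_element_node is a hypothetical helper, and My::FakeElement is a stand-in class so that the example runs without the modules installed. It relies only on the node_type method and the ELEMENT_NODE constant, which real node objects provide.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: decide by the node_type value, not by class name.
sub is_element_node {
  my ($obj) = @_;
  return 0 unless blessed ($obj);
  return 0 unless $obj->can ('node_type') and $obj->can ('ELEMENT_NODE');
  return $obj->node_type == $obj->ELEMENT_NODE ? 1 : 0;
}

{
  # Stand-in class for illustration only; not a Web::DOM class.
  package My::FakeElement;
  sub new { return bless {}, shift }
  sub ELEMENT_NODE () { 1 }   # ELEMENT_NODE is 1 in the DOM Standard
  sub node_type { return 1 }
}

my $result = is_element_node (My::FakeElement->new);
print $result ? "element\n" : "not an element\n";
```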
The constructor of a DOM interface, if any, is implemented as the new class method. For example, the constructor of the Document interface can be invoked by Web::DOM::Document->new.
Attributes, methods, and constants of a DOM interface are accessible as methods of the object implementing the interface. For example, the innerHTML attribute of the Element interface is accessible as the inner_html method of element objects. If a method corresponding to an attribute is invoked with no argument, it acts as the getter of the attribute. If the method is invoked with an argument, it acts as the setter of the attribute.
$string_returned_by_getter = $el->inner_html;
$el->inner_html ($string_received_by_setter);
$string_returned_by_method = $el->get_attribute ($string);
$el->node_type == $el->ELEMENT_NODE;
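The combined getter/setter pattern can be illustrated with a toy class. This is a sketch for illustration only: My::ToyElement is hypothetical and does not reflect how Web::DOM::Element is actually implemented (the real inner_html parses and serializes markup).

```perl
use strict;
use warnings;

{
  # Toy class for illustration only; not a Web::DOM class.
  package My::ToyElement;

  sub new { return bless { inner_html => '' }, shift }

  # Combined accessor: getter with no argument, setter with one argument.
  sub inner_html {
    my $self = shift;
    if (@_) {
      $self->{inner_html} = '' . $_[0];  # setter: store the stringified value
      return;
    }
    return $self->{inner_html};          # getter: return the stored value
  }
}

my $el = My::ToyElement->new;
$el->inner_html ('<b>hi</b>');           # acts as the setter
my $markup = $el->inner_html;            # acts as the getter
print "$markup\n";
```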
Some objects accept array operations:
@children = @{$el->child_nodes};
$length = @{$el->child_nodes};
$first_child = $el->child_nodes->[0];
$second_child = $el->child_nodes->[1];
$second_last_child = $el->child_nodes->[-2];
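Array operations on an object are possible in Perl through overloading of the @{} dereference operator. The following sketch shows the general mechanism with a hypothetical My::ToyNodeList class. It is not the modules' implementation. The handler returns a copy of the internal list, so writing through the array reference does not modify the object.

```perl
use strict;
use warnings;

{
  # Toy collection for illustration only; not a Web::DOM class.
  package My::ToyNodeList;

  # Return a fresh array reference whenever the object is used as one.
  use overload
      '@{}' => sub { return [ @{ $_[0]->{items} } ] },
      fallback => 1;

  sub new {
    my $class = shift;
    return bless { items => [@_] }, $class;
  }
}

my $list = My::ToyNodeList->new (qw(a b c));
my @children = @{$list};          # all items
my $length = @{$list};            # number of items
my $first_child = $list->[0];
my $second_last_child = $list->[-2];
print "$length items, first=$first_child, second last=$second_last_child\n";
```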
The following classes have a constructor (i.e. the new method):
- Web::DOM::Document
- Web::DOM::Event and its subclasses
- Web::DOM::Implementation
- Web::DOM::Parser
- Web::DOM::XMLSerializer
- Web::DOM::XPathEvaluator
The following modules export constants (when they are loaded with the use statement):
- Web::DOM::Attr
- Web::DOM::AttributeDefinition
- Web::DOM::Event
- Web::DOM::Exception
- Web::DOM::HTMLTrackElement
- Web::DOM::KeyboardEvent
- Web::DOM::Node
- Web::DOM::NodeFilter
- Web::DOM::XPathResult
- Web::DOM::WheelEvent
Some classes contain private methods and variables. Applications must not invoke or use them. As a general rule, methods whose names start with _ are private, although there might be exceptions (e.g. the _manakai_border_spacing_x method, reflecting the CSS -manakai-border-spacing-x property, is not a private method). Anything EXCEPT the following is private and should not be used:
- DOM APIs as documented in the relevant pod documentation
  For example, Web::DOM::Node::child_nodes, Web::DOM::Implementation::create_document, Web::DOM::Event::new, and Web::DOM::Node::ELEMENT_NODE are explicitly mentioned in their pod sections.
- Perl standard operations
  For example, the can and isa methods of any object, the "" and 0+ operations of any object, the $Web::DOM::Document::VERSION variable, and the use Web::DOM::Node operation (which implicitly invokes the Web::DOM::Node::import method).
  Applications can also rely on the isa method with a class name derived from a DOM interface name whose definition does not contain [NoInterfaceObject]. For example, $object->isa ('Web::DOM::Node') does (and will) work as intended, while $object->isa ('Web::DOM::CanvasPathMethod') (defined with [NoInterfaceObject]) or $object->isa ('Web::DOM::StringArray') (not derived from a DOM interface name) might not. However, it is not considered good practice to compare objects by their class names in sophisticated object-oriented programs.
Public APIs are not intended to be changed in backward-incompatible ways in later stages of the development of these modules unless it is really necessary for significant reasons (e.g. security concerns, or to resolve spec compatibility issues). Anything else could be changed, including the package/file mapping of classes that do not provide constructors or constants.
Specifications defining features supported by the modules include:
- DOM
  DOM Standard <http://dom.spec.whatwg.org/>.
- DOMPARSING
  DOM Parsing and Serialization Standard <http://domparsing.spec.whatwg.org/>.
- DOM3CORE
  Document Object Model (DOM) Level 3 Core Specification <http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/DOM3-Core.html>.
- DOMXPATH
  Document Object Model XPath <http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html>.
- HTML
  HTML Standard <http://www.whatwg.org/specs/web-apps/current-work/>.
- DOMDTDEF
  DOM Document Type Definitions <http://suika.suikawiki.org/www/markup/xml/domdtdef/domdtdef>.
- DOMPERL
  manakai's DOM Perl Binding <http://suika.suikawiki.org/~wakaba/wiki/sw/n/manakai%27s%20DOM%20Perl%20Binding>.
- MANAKAI
  manakai DOM Extensions <http://suika.suikawiki.org/~wakaba/wiki/sw/n/manakai%20DOM%20Extensions>.
For the complete list of relevant specifications, see the documentation of the modules.
The modules require Perl 5.10 or later.
The following features require the perl-web-markup package <https://github.com/manakai/perl-web-markup>: inner_html, outer_html, insert_adjacent_html, DOMParser, and XMLSerializer (Web::HTML::Parser and its family); XPathEvaluator and XPathExpression (Web::XPath::Parser and related modules); and manakai_get_properties (Web::HTML::Microdata).
The following features require the perl-web-css package <https://github.com/manakai/perl-web-css>: query_selector, query_selector_all, CSSStyleSheet, CSSRule and its subclasses, and CSSStyleDeclaration.
The following features require the perl-web-encodings package <https://github.com/manakai/perl-web-encodings>: the setter of the input_encoding method of Document and Entity.
Features performing URL-related operations require the perl-web-url package <https://github.com/manakai/perl-web-url>, which depends on the perl-web-encodings package <https://github.com/manakai/perl-web-encodings>. Such features include: base_uri, manakai_set_url, manakai_entity_uri, manakai_entity_base_uri, declaration_base_uri, manakai_declaration_base_uri, action, cite, codebase, data, formaction, href, longdesc, object, ping, poster, and src.
The following features require modules in the perl-web-datetime package <https://github.com/manakai/perl-web-datetime>: value of Web::DOM::AtomDateConstruct, create_atom_feed_document, create_atom_entry_element, updated_element, and published_element.
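Because these dependencies are optional, an application may want to check at run time whether a required module is installed before calling a feature that needs it. The following is a minimal sketch: module_available is a hypothetical helper, not part of the modules, and only module names mentioned above are probed.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded, else 0.
sub module_available {
  my ($module) = @_;
  # Convert Foo::Bar to Foo/Bar.pm, the form require expects for a string.
  (my $file = "$module.pm") =~ s{::}{/}g;
  return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Web::HTML::Parser Web::XPath::Parser Web::CSS::Parser)) {
  printf "%-20s %s\n", $module,
      module_available ($module) ? 'available' : 'missing';
}
```

Note that the check loads the module as a side effect when it is available.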
How CSS style sheets are parsed and what the CSSOM tree structure looks like depend on how many CSS features are supported by the user agent. Since the web-dom module set by itself is not a rendering engine, most CSS features are considered "not supported", so by default parsing discards most CSS declarations. If you would like to build a CSS-based application on top of the web-dom module set, you should turn on the features you support through the Web::CSS::MediaResolver module in the web-css package. The Web::CSS::MediaResolver object for a document's CSS parser can be accessed like this:
use Web::CSS::Parser;
my $parser = Web::CSS::Parser->get_parser_for_document ($doc);
my $resolver = $parser->media_resolver;
... where $doc is the document node with which the CSS style sheet in question will be associated. Then, you can set the "supported" flag of the features you support, like this:
$resolver->{prop}->{display} = 1;
$resolver->{prop_value}->{display}->{block} = 1;
For more information on usage of the resolver, see Web::CSS::MediaResolver in the web-css package.
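When many properties and values need to be enabled, the flags can be set in a loop. The following is a sketch under stated assumptions: enable_css_support is a hypothetical helper, and a plain hash reference stands in for the resolver so that the example runs without the web-css package. It only writes the prop and prop_value entries shown above. In real use, pass the object returned by $parser->media_resolver.

```perl
use strict;
use warnings;

# Hypothetical helper: mark properties and their keyword values as supported.
# $spec maps a property name to an array reference of supported values.
sub enable_css_support {
  my ($resolver, $spec) = @_;
  for my $prop (sort keys %$spec) {
    $resolver->{prop}->{$prop} = 1;
    $resolver->{prop_value}->{$prop}->{$_} = 1 for @{ $spec->{$prop} };
  }
  return $resolver;
}

# Stand-in for the object returned by $parser->media_resolver.
my $resolver = {};
enable_css_support ($resolver, { display => [qw(block inline none)] });

print join (' ', sort keys %{ $resolver->{prop_value}->{display} }), "\n";
```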
The latest version of the modules is available from the GitHub repository: <https://github.com/manakai/perl-web-dom>.
Test results can be reviewed at: <https://travis-ci.org/manakai/perl-web-dom>.
The manakai project has developed several generations of DOM implementations. The current DOM3 implementation <https://github.com/wakaba/manakai/tree/master/lib/Message/DOM> had been in development since 2007.
The Web::DOM modules have been developed as a replacement for those modules, supporting the current DOM Standard. They do not reuse most of the code of the older implementation, and many useless DOM3 features are not implemented. However, they do implement some DOM3 features that are really necessary for backward compatibility, as well as non-standard manakai extensions. It should be possible for applications using the old implementation to migrate to the new implementation by just replacing class names and the like.
The following features, fully or partially implemented in previous versions of the manakai DOM implementations, are considered obsolete and will not be implemented by these modules unless they are reintroduced by some DOM specification or found to be necessary for backward compatibility:
DOMImplementationRegistry, DOMImplementationSource, DOMImplementationList, DOM features, DOMStringList, StringExtended, read-only nodes, EntityReference, CDATASection, replaceWholeText, isElementContentWhitespace, specified setter, hasReplacementTree setter, DOM3 configuration parameters, configuration parameters for DOM3 spec compatible DTD-based node operations, DOM3 DOMError, DOM Standard DOMError, DOMErrorHandler, UserDataHandler, DOMLocator, isId and family, internalSubset, TypeInfo and schemaTypeInfo, DOM3 LS, namespaces for DOM3 events, EventException, MutationEvent, MutationNameEvent, TextEvent, DocumentEvent->canDispatch, DocumentType->implementation, Document->createXHTMLDocument, URIReference, InternetMediaType, MANAKAI_FILTER_OPAQUE, Document->manakaiCreateSerialWalker, SerialWalker.
HTMLElement->irrelevant, HTMLAnchorElement->media, HTMLAreaElement->media, HTMLCommandElement, HTMLDataGridElement, HTMLEventSourceElement, HTMLIsIndexElement, HTMLLegendElement->form, HTMLMenuElement->autosubmit, HTMLBlockquoteElement, HTMLStrictlyInlineContainerExtended, HTMLStructuredInlineContainerExtended, HTMLSectioningElementExtended, HTMLListElementExtended, HTMLDListElementExtended, CSSStyleDeclaration->styleFloat.
Overloaded operators ==, !=, and .=, and write operations through the overloaded @{} and %{} operators for NodeList, NamedNodeMap, and HTMLCollection.
Attr, Entity, and AttributeDefinition nodes can no longer contain Text nodes.
By default, the DocumentType node can no longer contain ProcessingInstruction nodes as children. The old behavior can be restored by setting a true value to the manakai-allow-doctype-children configuration parameter (see Web::DOM::Configuration).
The strict_error_checking attribute no longer disables random exceptions as defined in the DOM3 specification; its scope is formally defined in the manakai DOM Extensions specification [MANAKAI].
The initial milestone of the project is to reimplement the subset of DOM supported by the original manakai DOM implementation <https://github.com/wakaba/manakai/tree/master/lib/Message/DOM>, except for obsolete features. The following features will be (re)implemented in due course:
- CSSOM Cascading API
  getComputedStyle [CSSOM], Element.prototype.manakaiComputedStyle, Window.prototype.manakaiGetComputedStyle, Window.prototype.setDocument [MANAKAI]
- WebVTT DOM [HTML] [WEBVTT]
More features not supported by previous versions of the manakai DOM implementations are expected to be implemented as well, including but not limited to:
- HTMLFormControlsCollection, HTMLOptionsCollection [HTML]
- Mutation observers [DOM]
- Selectors API Level 2 features
- DocumentStyle API [CSSOM]
- <?xml-stylesheet?> API [CSSOM]
- @font-face, @page [CSSOM]
- SVGElement->style [CSSOM]
- GetStyleUtils, PseudoElement [CSSOM]
- New mutation methods [DOM]
  prepend, append, before, after, replace, remove
- DOM Ranges
  DOM Ranges interfaces and methods [DOM]; Ranges support in DOM Core methods and attributes [DOM]; Range.prototype.createContextualFragment [DOMPARSING].
- Shadow DOM [DOM]
- Custom Elements [DOM, HTML]
In addition, the source code of the modules includes many "XXX" markers, indicating TODO items.
Middle priority: URL; Encoding; Promise.
Lower priority: Form API; HTMLMediaElement and related interfaces; Canvas; The ImageBitmap interface; The Screen interface; SVG; DnD; The RelatedEvent interface; The Window interface and related interfaces; The History interface and related interfaces; The Location interface; The Navigator interface and related interfaces; Scripting; Workers; Console; XHR; EventSource; WebSocket; postMessage and related interfaces; Storage; IndexedDB; Fullscreen; Notifications; JS-compatible Date and JSON objects.
Very low priority: Zip; XSLT 1.0.
At the time of writing, there is no plan to implement the properties attribute of the HTMLElement interface (instead, the manakaiGetProperties method is implemented).
Methods returning an index or position in some list or string, whose IDL type is a number type, do not convert the value as specified by the WebIDL specification and the DOM Perl Binding specification. This should not be a problem, as it is not realistic to have lists of items whose length is greater than, or nearly equal to, 2**31 in either Perl's runtime environment or real-world use cases.
Although the modules implement APIs as used in the Web platform, they do not support the Web's security model, i.e. the same-origin policy. That model does not make sense for Perl applications.
Wakaba <wakaba@suikawiki.org>.
Copyright 2007-2019 Wakaba <wakaba@suikawiki.org>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.