$Id: Readme,v 1.2 2003/09/12 12:09:18 harryf Exp $
XML_SaxFilters provides a foundation for using Sax filters in PHP.
The original code base was developed by Luis Argerich and published on
Sourceforge. Luis discussed how SaxFilters work, using the Sourceforge
classes as an example, in Chapter 10 of Wrox "PHP 4 XML".

Luis kindly gave permission to modify the code and license for
inclusion in PEAR.

This version of the Sax Filters makes significant changes to Luis's
original code (backwards compatibility is definitely broken): it separates
abstract classes from interfaces, provides interfaces for data readers
and writers, and provides methods to help parse XML documents recursively
with filters (for example AbstractFilter::setParent()) for documents where
the structure can vary significantly.

Sax filtering is an approach to making Sax-based parsing of XML documents
modular and easy to maintain. The parser delegates events to a child filter,
which may in turn delegate events to another filter. In general it's possible
to implement filters for a document which are as flexible and powerful as DOM.

For some discussions on Sax filtering try; (Java) (Python) (Perl)

The API provided by XML_SaxFilters is a little different from that commonly
used in other languages, in that it introduces the concepts of "parent" and
"child".
A "parent" of the current filter is the filter (or parser) "upstream" which
receives XML event notifications before the current filter.
A "child" is a filter "downstream" of the current filter (or parser) to
which XML events are delegated.
The top of the "family tree" of filters is always the parser itself, which
can have children but cannot have parents. Filters can have parents and
children.
The parsers themselves never handle any XML events personally but always
delegate to a filter.
The parser accepts an object implementing the reader interface from which
it streams the XML.
The filters can be given an object implementing the writer interface
to write output to.
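
The parent/child delegation described above can be sketched in a few lines
of plain PHP. This is only an illustration of the idea, not the
XML_SaxFilters API: the classes SketchFilter, UpperCaseTags and Recorder are
hypothetical names, and the events are fired by hand where a real parser
would sit at the top of the chain.

```php
<?php
// Hypothetical classes for illustration; not part of XML_SaxFilters.

// Base filter: forwards every event to its child, if it has one.
class SketchFilter
{
    public $parent = null;
    public $child = null;

    // Link a downstream filter and record ourselves as its parent.
    public function setChild($child)
    {
        $this->child = $child;
        $child->parent = $this;
    }

    public function open($tag, $attribs)
    {
        if ($this->child !== null) {
            $this->child->open($tag, $attribs);
        }
    }

    public function close($tag)
    {
        if ($this->child !== null) {
            $this->child->close($tag);
        }
    }

    public function data($data)
    {
        if ($this->child !== null) {
            $this->child->data($data);
        }
    }
}

// Upper-cases tag names, then passes the event downstream.
class UpperCaseTags extends SketchFilter
{
    public function open($tag, $attribs)
    {
        parent::open(strtoupper($tag), $attribs);
    }

    public function close($tag)
    {
        parent::close(strtoupper($tag));
    }
}

// End of the chain: records what it receives.
class Recorder extends SketchFilter
{
    public $log = array();

    public function open($tag, $attribs)
    {
        $this->log[] = "open $tag";
    }

    public function close($tag)
    {
        $this->log[] = "close $tag";
    }

    public function data($data)
    {
        $this->log[] = "data $data";
    }
}

// Stand-in for the parser: top of the chain, it only delegates.
$source = new SketchFilter();
$upper = new UpperCaseTags();
$recorder = new Recorder();
$source->setChild($upper);
$upper->setChild($recorder);

// Fire the events a parser would produce for <item>hello</item>.
$source->open('item', array());
$source->data('hello');
$source->close('item');

echo implode("\n", $recorder->log), "\n";
```

Run as is, the recorder ends up with "open ITEM", "data hello" and
"close ITEM": the tag events were rewritten by the filter in the middle,
while the character data passed through untouched.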

For an example of SAX filters in action with PHP try;
(the example uses Luis Argerich's original Sax Filters).

Some potential things to do with SaxFilters (there are probably loads more):
- Perform simple XML parsing in a structured manner (see rssfilter.php example)
- Transform XML into something else (see xml2html.php example)
- Build a template parser where the template tags are themselves XML
  (see template.php example)

 - Implements parsers for both the native PHP XML extension (Expat) and
   PEAR::XML_HTMLSax.
 - Reading and writing of data is separated from the parsers by
   classes implementing the Reader and Writer interfaces. This lets
   the SaxFilters read from and write to any data container.
 - Using the filter methods:
    - setChild()
    - unsetChild()
    - setParent()
    - unsetParent()
    - attachToParent()
    - detachFromParent()
   It's possible to have one filter create another while parsing is
   in progress, which allows for "recursive" parsing of an XML document
   whose structure is not known beforehand. This can be particularly
   powerful when dealing with documents like HTML or XUL, where structures
   can vary wildly from document to document.
 - Most of the classes provided by SaxFilters are abstract or interfaces
   (interfaces are currently "virtual" in PHP4 but coming soon to PHP5).
   The intention is to provide a solid basis for building filters for
   all sorts of common XML document formats (contributions appreciated).
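
The idea of one filter creating another while parsing is in progress can be
sketched with PHP's native XML extension alone. Everything named here
(Dispatcher, ListCollector, the sample document) is hypothetical and does
not use the XML_SaxFilters classes or their setChild()/unsetChild() methods;
it only shows a filter deciding at parse time which child should see the
events. Nested <list> elements are not handled in this sketch.

```php
<?php
// Hypothetical classes for illustration; not part of XML_SaxFilters.

// Collects the trimmed text found inside one <list> element.
class ListCollector
{
    public $items = array();

    public function open($tag, $attribs)
    {
    }

    public function close($tag)
    {
    }

    public function data($data)
    {
        $data = trim($data);
        if ($data !== '') {
            $this->items[] = $data;
        }
    }
}

// Creates a child when <list> opens and drops it when </list> closes.
class Dispatcher
{
    public $child = null;
    public $lists = array();

    public function open($tag, $attribs)
    {
        if ($this->child === null && $tag === 'list') {
            $this->child = new ListCollector(); // child created mid-parse
            return;
        }
        if ($this->child !== null) {
            $this->child->open($tag, $attribs);
        }
    }

    public function close($tag)
    {
        if ($this->child !== null && $tag === 'list') {
            $this->lists[] = $this->child->items; // keep the result
            $this->child = null;                  // detach the child
            return;
        }
        if ($this->child !== null) {
            $this->child->close($tag);
        }
    }

    public function data($data)
    {
        if ($this->child !== null) {
            $this->child->data($data);
        }
    }
}

$dispatcher = new Dispatcher();

// Native parser; case folding off so tag names arrive as written.
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    function ($p, $tag, $attribs) use ($dispatcher) {
        $dispatcher->open($tag, $attribs);
    },
    function ($p, $tag) use ($dispatcher) {
        $dispatcher->close($tag);
    }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use ($dispatcher) {
        $dispatcher->data($data);
    }
);

$xml = '<page><title>Menu</title><list><item>tea</item>'
     . '<item>coffee</item></list></page>';
xml_parse($parser, $xml, true);

print_r($dispatcher->lists);
```

The text of <title> is never collected because no child exists at that
point; only the events between <list> and </list> reach the collector.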

++ Usage Notes

- When using the ExpatParser, it defaults to XML_OPTION_CASE_FOLDING = 0.
  If you need everything converted to upper case to make it easier to match
  tag names, you should use;
  To do the same with the HtmlSaxParser you need;

- With PHP4, any classes you instantiate from XML_SaxFilters (or built
  on SaxFilters) should be instantiated by reference, e.g.
  $filter = & new MyFilter();
  Weird things start to happen if you don't, depending on what you're doing.

- Using the HTMLSaxParser depends on PEAR::XML_HTMLSax being installed.

- The SaxFilters only define handlers for open tag, close tag and character
  data events. If there's a demand for more (e.g. entity handling) say the word.

- Including the main file XML_SaxFilters.php includes the AbstractFilter and
  FilterInterface classes. Parsers, Readers and Writers need to be included
  on a per use basis from the PEAR XML/SaxFilters namespace.
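
For reference, XML_OPTION_CASE_FOLDING mentioned in the notes above is an
option of PHP's native XML extension. The sketch below calls that extension
directly through xml_parser_set_option() rather than through the
XML_SaxFilters classes, so it shows the effect of the option and not the
package's own way of setting it. collectTagNames() is a hypothetical helper
written for this example.

```php
<?php
// Uses PHP's native XML extension directly, not the XML_SaxFilters classes.

// Parse $xml and return the element names as reported by the parser.
// collectTagNames() is a hypothetical helper for this sketch.
function collectTagNames($xml, $caseFolding)
{
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // closing tags are ignored in this sketch
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<doc><Item/></doc>';
$asWritten = collectTagNames($xml, 0); // names as written in the document
$folded = collectTagNames($xml, 1);    // names folded to upper case

print_r($asWritten);
print_r($folded);
```

With folding off the handler sees "doc" and "Item"; with folding on it sees
"DOC" and "ITEM", which is what makes case-insensitive tag matching easier.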

++ Limitations
- The HTMLSaxParser needs to be watched carefully right now because of
  some minor issues in XML_HTMLSax 1.0 (fixes coming soon). These can
  all be worked around in a concrete filter, but for now HTMLSaxParser
  behaves a little differently from ExpatParser.

++ Example Use
Further examples are available in the examples directory of this package.

require_once 'XML/SaxFilters.php'; // This is the normal way to do it

// Define a custom handler class - just displays stuff
class SimpleFilter extends XML_SaxFilters_AbstractFilter
/* implements XML_SaxFilters_FilterInterface */
{
    // Parsed output stored here
    var $output = '';
    // For whitespace indentation
    var $indent = '';

    // Called when parsing starts
    function startDoc()
    {
        $this->output.="Parsing started\n";
    }

    // Opening tag handler
    function open(& $tag,& $attribs)
    {
        $this->output.=$this->indent."Open: $tag";
        $sep = '';
        if ( count($attribs) > 0 )
        {
            $this->output.=' (';
            foreach ( $attribs as $key => $value )
            {
                $this->output.="$sep$key: $value";
                $sep = ', ';
            }
            $this->output.=')';
        }
        $this->output.="\n";
        $this->addIndent();
    }

    // Closing tag handler
    function close(& $tag)
    {
        $this->removeIndent();
        $this->output.=$this->indent."Close: $tag\n";
    }

    // Character data handler
    function data(& $data)
    {
        $data = trim($data);
        if ( $data !== '' ) {
            $this->output.=$this->indent."Data: $data\n";
        }
    }

    // Called at end of parsing
    function endDoc()
    {
        $this->output.="Parsing finished\n";
    }

    // Add one level of indentation
    function addIndent()
    {
        $this->indent.=' ';
    }

    // Remove one level of indentation
    function removeIndent()
    {
        $this->indent = substr_replace($this->indent,'',0,1);
    }
}

// A simple XML document (the root element name is illustrative)
$doc = <<<EOD
<?xml version="1.0"?>
<languages>
    <language name="PHP" version="4.3.2">
        PHP is number 1 for building web based applications.
    </language>
    <language name="Python" version="2.2.3">
        Python is number 1 for cross platform desktop applications.
    </language>
    <language name="Perl" version="5.8.0">
        Perl is number 1 for text and batch processing.
    </language>
</languages>
EOD;

// This is where the action takes place

// Create the parser (use native SAX extension, StringReader, XML document)
$parser = & XML_SaxFilters_createParser('Expat','String',$doc);

// This uses PEAR::XML_HTMLSax instead
// $parser = & XML_SaxFilters_createParser('HTMLSax','String',$doc);

// Instantiate the filter above
$filter = & new SimpleFilter();

// Add the filter to the parser
$parser->setChild($filter);

// Parse
if ( ! $parser->parse() ) {
    $error = $parser->getError();
    echo $error->getMessage();
} else {
    echo '<pre>'.$filter->output.'</pre>';
}