AParser

A simple text parser for any text format. Memory usage stays low (about two buffers) even for large files.

Usage

Basic

$parser = new ListParser();
$parser->open('http://google.com/sitemap.xml');
$parser->parseList(
    '<urlset',  // beginning of the list
    '<url>',    // beginning of each item
    function() use ($parser) {
        // Called for each item: seek to the opening tag, then read up to the next '</'.
        printf(
            "loc: %s\npriority: %s\n\n",
            $parser->parseBetween('<loc>', '</'),
            $parser->parseBetween('<priority>', '</')
        );
    }
);

To parse several files with one parser, use the parseFiles method.

$parser = new ListParser();
$parser->parseFiles('
        http://google.com/sitemap.xml
        http://php.net/sitemap.xml
    ',
    '<urlset',
    '<url>',
    function() use ($parser) {
        printf(
            "loc: %s\n",
            $parser->parseBetween('<loc>', '</')
        );
    }
);
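The file list is passed as one string with one location per line. Assuming parseFiles treats any run of whitespace as a separator (the README does not state this explicitly), the splitting step could look like the sketch below. `splitFileList` is a hypothetical helper, not part of AParser.

```php
<?php
// Hypothetical helper, not part of AParser: turn a whitespace-separated
// list of URLs or paths (like the string passed to parseFiles) into an array.
function splitFileList(string $list): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty pieces left by leading/trailing whitespace.
    return preg_split('/\s+/', $list, -1, PREG_SPLIT_NO_EMPTY);
}

$files = splitFileList('
        http://google.com/sitemap.xml
        http://php.net/sitemap.xml
    ');
print_r($files);
```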

To store results in an array, return them from the handler (note that a large result array can cause high memory usage):

$parser = new ListParser();
$parser->open('http://google.com/sitemap.xml');
$result = $parser->parseList(
    '<urlset',
    '<url>',
    function() use ($parser) {
        return [
            'loc' => $parser->parseBetween('<loc>', '</'),
            'priority' => $parser->parseBetween('<priority>', '</'),
        ];
    }
);

Storing results in an array also works with the parseFiles method.

Besides parseBetween, there are two more methods: seekTo and parseTo.

The parseTo method returns the string from the current position up to the specified string, but it cannot parse a string longer than one buffer.

The seekTo method moves the file pointer to the specified string. It can seek across any number of buffers while using less than two buffers of memory.

The parseBetween method combines the two: it seeks to its first argument and then parses up to its second argument.
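To make the memory bound concrete, here is a minimal sketch of how the three methods could work on a stream. It assumes the parser keeps a small string buffer instead of calling fseek (so non-seekable HTTP streams work too), and `BufferedCursor` is a hypothetical class, not AParser's code. seekTo keeps only the last strlen($needle) - 1 bytes between reads, so a match split across two buffers is still found; parseTo returns null when the text would exceed one buffer; and, as an assumption of this sketch, both skip past the string they searched for.

```php
<?php
// Hypothetical sketch, not AParser's actual code: a cursor over a readable
// stream that holds only about two buffers of text at a time.
final class BufferedCursor
{
    private string $buf = '';  // text read from the stream but not consumed yet

    public function __construct(private $handle, private int $bufferSize = 4096)
    {
    }

    // Append up to one buffer of new data; returns false at end of stream.
    private function fill(): bool
    {
        $chunk = fread($this->handle, $this->bufferSize);
        if ($chunk === false || $chunk === '') {
            return false;
        }
        $this->buf .= $chunk;
        return true;
    }

    // Move just past the next $needle, reading as many buffers as needed.
    public function seekTo(string $needle): bool
    {
        while (($pos = strpos($this->buf, $needle)) === false) {
            // Keep only the tail that could be the start of a split match.
            $keep = strlen($needle) - 1;
            $this->buf = $keep > 0 ? substr($this->buf, -$keep) : '';
            if (!$this->fill()) {
                return false;
            }
        }
        $this->buf = substr($this->buf, $pos + strlen($needle));
        return true;
    }

    // Return the text before $needle and move past it, or null if that text
    // would be longer than one buffer (or $needle never appears).
    public function parseTo(string $needle): ?string
    {
        while (($pos = strpos($this->buf, $needle)) === false) {
            // Earliest offset at which a match could still appear after reading more.
            $earliest = strlen($this->buf) - strlen($needle) + 1;
            if ($earliest > $this->bufferSize || !$this->fill()) {
                return null;
            }
        }
        if ($pos > $this->bufferSize) {
            return null;
        }
        $text = substr($this->buf, 0, $pos);
        $this->buf = substr($this->buf, $pos + strlen($needle));
        return $text;
    }

    // Seek past $from, then read up to $to.
    public function parseBetween(string $from, string $to): ?string
    {
        return $this->seekTo($from) ? $this->parseTo($to) : null;
    }
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, '<urlset><url><loc>http://example.com/</loc><priority>0.8</priority></url></urlset>');
rewind($stream);

// A 20-byte buffer forces the URL to arrive in two separate reads.
$cursor = new BufferedCursor($stream, 20);
$loc = $cursor->parseBetween('<loc>', '</');
$priority = $cursor->parseBetween('<priority>', '</');
echo $loc, "\n", $priority, "\n";
```

With a 20-byte buffer the 19-character URL straddles two reads, yet parseTo still returns it because it fits within one buffer.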

Extending

It can be more flexible to extend the ListParser or AParser class with your own:

class MyParser extends ListParser
{
    public $buffer = 4096;
    public $encoding = 'UTF-8';

    public $beginOfList = '<urlset';
    public $beginOfItem = '<url>';

    public function parseItem()
    {
        printf(
            "loc: %s\npriority: %s\n\n",
            $this->parseBetween('<loc>', '</'),
            $this->parseBetween('<priority>', '</')
        );
    }
}

$myParser = new MyParser();
$myParser->open('http://google.com/sitemap.xml');
$myParser->parseList();
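Judging from the example, extending works like a template method: the base class runs the loop and calls parseItem() once per item, reading settings such as beginOfItem from properties that a subclass can override. A minimal sketch of that pattern follows; the class names and the `between` helper are hypothetical, and it works on an in-memory string instead of AParser's buffered stream.

```php
<?php
// Hypothetical sketch of the extension pattern, not AParser's code: the base
// class owns the loop, subclasses override properties and parseItem().
abstract class SketchListParser
{
    public string $beginOfItem = '<item>';
    protected string $item = '';  // raw text of the item currently being parsed

    // Hook that subclasses implement; called once per item.
    abstract public function parseItem();

    // Simplified parseBetween that only looks inside the current item.
    protected function between(string $from, string $to): ?string
    {
        $start = strpos($this->item, $from);
        if ($start === false) {
            return null;
        }
        $start += strlen($from);
        $end = strpos($this->item, $to, $start);
        return $end === false ? null : substr($this->item, $start, $end - $start);
    }

    public function parseList(string $document): array
    {
        $results = [];
        // Everything after each $beginOfItem marker is treated as one item.
        foreach (array_slice(explode($this->beginOfItem, $document), 1) as $item) {
            $this->item = $item;
            $results[] = $this->parseItem();
        }
        return $results;
    }
}

class SketchUrlParser extends SketchListParser
{
    public string $beginOfItem = '<url>';

    public function parseItem()
    {
        return $this->between('<loc>', '</');
    }
}

$parser = new SketchUrlParser();
$result = $parser->parseList('<urlset><url><loc>a</loc></url><url><loc>b</loc></url></urlset>');
print_r($result);
```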

Parse images

To parse images, use the ImageParser class and write a parseItem handler that returns an array with two elements: src, the URL of the image to download, and dest, the local path to save it to.
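What happens to the returned pair is up to ImageParser. Assuming it downloads src and writes it to dest, creating missing directories along the way, a minimal stand-alone equivalent might look like this; `saveItem` is hypothetical and not ImageParser's actual code.

```php
<?php
// Hypothetical sketch, not ImageParser's code: store one handler result of
// the form ['src' => url-or-path, 'dest' => local path].
function saveItem(array $item): bool
{
    if (!isset($item['src'], $item['dest'])) {
        return false;  // skip results missing either key
    }
    $dir = dirname($item['dest']);
    // Create missing directories such as 'parsed/google-cat-and-fox/'.
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        return false;
    }
    // copy() reads URLs when allow_url_fopen is enabled, as well as local paths.
    return copy($item['src'], $item['dest']);
}

// Local usage: a temporary file stands in for a downloaded image.
$src = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($src, 'fake image bytes');
$dest = sys_get_temp_dir() . '/aparser-sketch-' . uniqid() . '/cat.jpg';
$ok = saveItem(['src' => $src, 'dest' => $dest]);
var_dump($ok);
```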

Let's download some cute cats and foxes from Google:

$parser = new ImageParser();
$parser->parseFiles('
        http://www.google.ru/search?tbm=isch&q=cat
        http://www.google.ru/search?tbm=isch&q=fox
    ',
    '<table class="images_table',
    '<td style="width:25%;word-wrap:break-word">',
    function() use ($parser) {
        return [
            'src' => $parser->parseBetween('src="', '"'),
            'dest' => 'parsed/google-cat-and-fox/' .
                preg_replace(
                    '/[^\w\d]+/',
                    '.',
                    html_entity_decode(
                        strip_tags(
                            $parser->parseBetween('</a>', '</td>')
                        )
                    )
                ),
        ];
    }
);
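The dest expression in the example builds a file name from the image caption. Pulled out into a helper (the name `captionToFileName` is hypothetical), the same pipeline can be tested on its own:

```php
<?php
// Hypothetical helper repeating the dest-name pipeline from the example:
// strip HTML tags, decode entities, then collapse every run of
// non-word characters into a single dot.
function captionToFileName(string $html): string
{
    // \w already covers digits, so '[^\w\d]' from the example is equivalent to '\W'.
    return preg_replace('/\W+/', '.', html_entity_decode(strip_tags($html)));
}

echo captionToFileName('<b>Cute</b> cat &amp; kitten.jpg'), "\n"; // Cute.cat.kitten.jpg
```

Because runs of spaces and punctuation become a single dot, a caption that starts or ends with punctuation yields a file name with a leading or trailing dot.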

License

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

makryl/AParser
