An example HTTP parser that fetches remote HTML pages and extracts information about them.
Install the application's dependencies with Composer. If you are not familiar with Composer, please visit http://getcomposer.org.
Once composer install has been run, use the following command to run the example application:
./examples/console.php http-parser:fetch:products http://url.com/you/want/to/scrape | python -m json.tool
You can also run the URI meta command:
./examples/console.php http-parser:fetch:urimeta http://url.com/you/want/to/scrape | python -m json.tool
Both commands accept a verbose -v flag to receive console output detailing the behaviour of the scrape.
./examples/console.php http-parser:fetch:products http://url.com/you/want/to/scrape -v
The test suite can be run via ./vendor/bin/phpunit from the root directory of the repository after completing the composer install.
- The description will be embedded within the DOM, not the meta tags.
- Sizes will be returned in KB (kilobytes).
- FetchProductsCommand - fetches an HTML page, builds a set of product information, and returns it as a JSON string to the console.
- FetchUriMetaCommand - fetches a remote HTML page and returns a JSON object representing metadata about the URI.
- Url - model representing our HTML metadata.
- Product - model representing data about a product.
- ProductList - model representing a set of products.
- MoneyDecorator - model for formatting a Money object into a string.
Models should implement the JsonSerializable interface for returning data to the UI via the console.
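As a minimal sketch of that convention, a model implementing JsonSerializable can be passed straight to json_encode (the class shape and property names below are illustrative, not the repository's actual code):

```php
<?php

// Illustrative model; property names are assumptions, not the
// repository's actual Product API.
class Product implements \JsonSerializable
{
    public function __construct(
        private string $title,
        private string $price
    ) {
    }

    // json_encode() calls this automatically for JsonSerializable objects.
    public function jsonSerialize(): mixed
    {
        return [
            'title' => $this->title,
            'price' => $this->price,
        ];
    }
}

echo json_encode(new Product('Widget', '9.99'), JSON_PRETTY_PRINT);
```

This keeps the console commands trivial: they can json_encode whatever model the service layer returns.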
- HttpFetch - basic service which acts as a facade to GuzzleHttp and hydrates the domain models.
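A facade over Guzzle might look roughly like the following sketch (the class and method names are illustrative, not the repository's actual HttpFetch API; Composer autoloading is assumed):

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

// Illustrative facade: hides the Guzzle client behind a single method
// so callers never deal with HTTP details directly.
class HttpFetch
{
    public function __construct(private Client $client)
    {
    }

    public function fetchBody(string $uri): string
    {
        try {
            $response = $this->client->request('GET', $uri);
        } catch (GuzzleException $e) {
            // Re-throw transport failures as a plain runtime error so the
            // console layer can report them without knowing about Guzzle.
            throw new \RuntimeException(
                sprintf('Failed to fetch %s: %s', $uri, $e->getMessage()),
                0,
                $e
            );
        }

        return (string) $response->getBody();
    }
}
```

From here the service can parse the returned body and hydrate the domain models before handing them back to the commands.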
- Exception handling is poor and needs to be significantly improved; failures are not handled gracefully.
- Unit tests are missing for key HttpFetch methods.
- More logging is needed.
- More output options (to file, other formats) would be nice.