Added the option to include images with relative URLs, added/fixed up some tests. #5

wants to merge 20 commits into



bborn commented Mar 2, 2011

No description provided.

+1 for the optional relative image URLs.

kennyma commented Oct 21, 2011


dparis and others added some commits Feb 29, 2012

Fixed bug where raw_html content was being processed before the encoding was enforced, leading to an invalid UTF-8 encoding exception
* Rewrote image_extractor to use more idiomatic Ruby
* Rewrote huge parts of internal_document to be more DRY and produce less garbage
* Integrated the htmlentities gem for generalized HTML entity decoding
* Fixed HTML entity decoding so that it happens when content is extracted, rather than doing it on the source document, which can break parsing
* Stubbed out the network calls in the test suite, resulting in dramatically faster tests
* General garbage, speed, and style tweaks
* Removed trailing whitespace from many files
* Made the ImageExtractor logger customizable; pass false for no logger
* In the same vein, default options are now defined once and passed down to the various pieces of the parser
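
A rough illustration of the two ordering fixes described above (enforce the encoding before touching the content, and decode HTML entities only on extracted text) might look like the sketch below. It is not the code from these commits: `enforce_utf8` and `extract_text` are hypothetical helper names, the tag stripping is a crude regex rather than a real parser, and the standard library's `CGI.unescapeHTML` stands in for the htmlentities gem so the example has no dependencies. `String#scrub` assumes Ruby 2.1 or later.

```ruby
require 'cgi'

# Hypothetical helper: make sure the raw bytes are valid UTF-8 *before*
# any regex or parsing work, so later steps cannot raise on bad bytes.
def enforce_utf8(raw_html)
  text = raw_html.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub('?')
end

# Hypothetical helper: strip tags first, then decode entities on the
# extracted text only. Decoding the whole source first would turn
# "&lt;b&gt;" into a real-looking tag and change what gets stripped.
def extract_text(fragment)
  stripped = fragment.gsub(/<[^>]+>/, '') # crude stand-in for real parsing
  CGI.unescapeHTML(stripped).strip
end

# Raw bytes as they might arrive from the network: valid UTF-8 for "é",
# plus one stray invalid byte (\xFF).
raw = "<p>Caf\xC3\xA9 &amp; bar \xFF &lt;b&gt;</p>".b

html = enforce_utf8(raw)
puts extract_text(html) # prints: Café & bar ? <b>
```

Reversing the order in `extract_text` (decoding first, stripping second) would drop the literal `<b>` from the output, which is the kind of parsing breakage the commit message refers to.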

bborn closed this Oct 12, 2012
