- remove bundled htmlentities in favor of a gem dependency
- also extract links from area and frame tags
- fix etagfilter bug
- add max_depth option to the crawler configuration to limit the crawl to a
specific depth
- add support for http proxies including basic authentication
- remove rubyful_soup support
- make RDig compatible with Ferret 0.10.x
- no longer works with Ferret 0.9.x and earlier
- Bug fix release: fixed handling of unparseable URLs
- file system crawling
- optional url rewriting before indexing, e.g. for linking to results
via http and building the index directly from the file system
- PDF title extraction with pdfinfo
- removed dependency on mkmf which doesn't seem to exist in Ruby 1.8.2
- made content extractors more flexible - instances now use a given
configuration instead of the global one. This allows the
WordContentExtractor to use an HtmlContentExtractor with its own
configuration that is independent of the global config.
- Bug fix release
- add pdf and Word content extraction capabilities using the tools
from the xpdf-utils and wv packages
- additional content extractors may be plugged in by extending
the ContentExtractor class
- initial release