
Release Notes - Heritrix 3.4.0-20210803

Andy Jackson edited this page Aug 3, 2021 · 1 revision

Summary of changes since the previous release; see the full changelog for more details.

Additions

  • ExtractorChrome: reduce request duplication between browser and frontier #416 (ato)
  • ExtractorChrome: Capture requests made by the browser #411 (ato)
  • Add ExtractorChrome to contrib #403 (ato)
  • Add basic syntax highlighting to the crawl.log viewer #408 (ato)
  • JDK 16 compatibility #418 (ato)

Changes

  • Upgrade httpclient to 4.5 #397 (anjackson)
  • Don't extract data URIs #423 (ato)
  • ToeThread: ensure currentCuri is finished before exiting #421 (ato)
  • Switch from Travis CI to GitHub Actions #404 (ato)
  • Speed up test suite #405 (ato)
  • Fix a couple of boring Maven warnings #407 (ato)
  • Fix and document the -r option which runs a named job on startup #406 (ato)
  • Upgrade maven-assembly-plugin to 3.3.0 to fix file permissions #414 (ato)
  • WARC writer stats fixes #410 (ato)

Removals

None

Bugfixes

  • Fix WARC-IP-Address and use a common server-ip CrawlURI attribute for all protocols #409 (ato)
  • Jobs can get stuck STOPPING with "Interrupt leaving unfinished CrawlURI" #420
  • Groovy version is incompatible with JDK 16+ #419
  • module java.base does not export sun.security.tools.keytool to unnamed module @1ece4432 #417
  • Distribution package has broken filesystem permissions #413
  • Add WARC-IP-Address header to WARCWriterChainProcessor #396
