Feb 6, 2019
Create updated last_released_daemon tag; points now to release-candid…


  • The PDF filtering code has been hardened to withstand processing uncharacteristic PDF files with excessively large in-memory representations, without filling up the heap and without requiring changes to existing plugins.


  • The proxy failed to normalize URLs in requests that include an AUID.

  • Cancelling hashes started from DebugPanel or HasherService frequently did not work, and sometimes crashed the daemon.

  • Aborting crawls using crawlPriorityAuMap did not work.


@dlvargas dlvargas released this Nov 7, 2018 · 2461 commits to master since this release

Bug Fixes

  • Upgraded third-party libraries to address security vulnerabilities reported against them. Updated versions include Apache PDFBox 1.8.16 (CVE-2018-11797), Apache Commons Compress 1.18 (CVE-2018-11771) and FasterXML Jackson 2.9.7 (CVE-2018-7489).
  • Some ServeContent invocations failed on AUs with multiple crawl-start URLs when some of those start URLs did not exist.

@dlvargas dlvargas released this Jul 24, 2018 · 2461 commits to master since this release


  • The new metadata type "File" supports indexing of arbitrary publication types. Support is in place for both publication-level items (MetadataField.PUBLICATION_TYPE_FILE) and article-level items (MetadataField.ARTICLE_TYPE_FILE). Article-level file items are assumed to have a publication-level file parent even if one is not explicitly defined. Item metadata beyond the standard access URL, publisher, and provider may be stored as arbitrary key-value pairs in a MetadataField.FIELD_MD_MAP.

  • The Content Configuration web service now adds AUs from their TDB definition rather than by AUID, matching the way other subsystems add AUs: it includes non-definitional parameters and chooses the least full repository.

  • Deep crawl status information (lastDeepCrawl, lastDeepCrawlResult, lastCompletedDeepCrawl, lastCompletedDeepCrawlDepth) is tracked and reported in the UI, and through the getAuStatus() and queryAus() Web services.

  • Debug Panel and AU Status now include a "Validate Files" action which runs the plugin's ContentValidator on all files in the AU, reporting any ValidationFailures thrown.

  • In lieu of a MIME-type content validator factory, plugins may specify an au_url_mime_validation_map. ValidationFailures will occur for URLs that match one of the patterns but whose Content-Type does not match the corresponding MIME-type. E.g.,

	<string>/doi/pdf(plus)?/, application/pdf</string>
	<string>/doi/(abs|full)/, text/html</string>
  • ContentValidators may throw ContentValidationException.LogOnly to record a warning message without causing validation failure.

  • The "Files" list from AU Status now includes a PollWeight column.
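
The effect of an au_url_mime_validation_map can be sketched in a few lines. The following Python is only an illustration of the rule described above, not the daemon's implementation (which is Java). The function names are hypothetical, and the sketch assumes that each entry splits at its last comma, that a pattern may match anywhere in the URL, that the first matching entry wins, and that Content-Type parameters such as charset are ignored.

```python
import re
from typing import List, Optional, Tuple

# Entries copied from the example above: "<URL pattern>, <expected MIME type>".
VALIDATION_MAP = [
    "/doi/pdf(plus)?/, application/pdf",
    "/doi/(abs|full)/, text/html",
]


def parse_entry(entry: str) -> Tuple[re.Pattern, str]:
    """Split one map entry at its last comma into (compiled pattern, MIME type)."""
    pattern, mime = entry.rsplit(",", 1)
    return re.compile(pattern.strip()), mime.strip().lower()


def expected_mime(url: str, entries: List[str]) -> Optional[str]:
    """Return the MIME type required by the first matching pattern, or None."""
    for entry in entries:
        pattern, mime = parse_entry(entry)
        # Assumption: the pattern may match anywhere in the URL.
        if pattern.search(url):
            return mime
    return None


def validate(url: str, content_type: str, entries: List[str]) -> bool:
    """True if the URL matches no pattern or its Content-Type has the expected MIME type."""
    required = expected_mime(url, entries)
    if required is None:
        return True  # URL is not covered by the map, so nothing to check
    # Assumption: parameters such as "; charset=UTF-8" are ignored.
    actual = content_type.split(";", 1)[0].strip().lower()
    return actual == required
```

Under these assumptions, a URL containing /doi/pdfplus/ that is served as text/html would be reported as a failure, while a URL that matches neither pattern is not checked by the map at all.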

Bug Fixes

  • SubscriptionManager omitted non-definitional parameters when adding subscribed AUs.

  • The Link Rewriter rewrote in-page links ("#ref"), breaking them.

  • Metadata item type inference reversed BOOKCHAPTER and BOOKVOLUME in some circumstances.

  • In the queryAus() web service, selecting the newContentCrawlUrls field caused a fatal error.

  • The LastMetadataIndex field in getAuStatus() and queryAus() web services was not accessible using

  • Fixed unsafe database resource closings and incorrect comparisons in metadata-handling code.

  • Fixed active task removal when metadata indexing for an AU is disabled.

Jun 27, 2018

@dlvargas dlvargas released this Feb 13, 2018 · 5373 commits to master since this release


  • Allow the content configuration Web Service to use the same storage volume selection logic as the UI when adding AUs.

Bug Fixes

  • Bug fixes in ServeContent link rewriting and OpenURL resolver.

  • Properly trigger configuration of AUs after synchronizing whole title subscriptions.
