Bug Fixes

  • Upgraded third-party libraries to address security vulnerabilities reported against them. Updated versions include Apache PDFBox 1.8.16 (CVE-2018-11797), Apache Commons Compress 1.18 (CVE-2018-11771) and FasterXML Jackson 2.9.7 (CVE-2018-7489).
  • Some ways of invoking ServeContent could fail on AUs that have multiple crawl-start URLs when some of the start URLs do not exist.

@dlvargas dlvargas released this Jul 24, 2018 · 1397 commits to master since this release

Features

  • The new metadata type "File" supports indexing of arbitrary publication types. Support is in place for both publication level items (MetadataField.PUBLICATION_TYPE_FILE) and article level items (MetadataField.ARTICLE_TYPE_FILE). Article level file items will be assumed to have a publication level file parent even if not explicitly defined. Item metadata beyond the standard access URL, publisher, and provider may be stored as arbitrary key-value pairs in a MetadataField.FIELD_MD_MAP.

  • Content Configuration web service now adds AUs from their TDB definition rather than by AUID. This matches the way other subsystems add AUs: non-definitional parameters are included and the least full repository is chosen.

  • Deep crawl status information (lastDeepCrawl, lastDeepCrawlResult, lastCompletedDeepCrawl, lastCompletedDeepCrawlDepth) is tracked and reported in the UI, and through the getAuStatus() and queryAus() Web services.

  • Debug Panel and AU Status now include a "Validate Files" action which runs the plugin's ContentValidator on all files in the AU, reporting any ValidationFailures thrown.

  • In lieu of a MIME-type content validator factory, plugins may specify an au_url_mime_validation_map. ValidationFailures will occur for URLs that match one of the patterns but whose Content-Type does not match the corresponding MIME-type. E.g.,

   <entry>
     <string>au_url_mime_validation_map</string>
     <list>
       <string>/doi/pdf(plus)?/, application/pdf</string>
       <string>/doi/(abs|full)/, text/html</string>
     </list>
   </entry>

  • ContentValidators may throw ContentValidationException.LogOnly to record a warning message without causing validation failure.

  • The "Files" list from AU Status now includes a PollWeight column.
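
As a rough illustration of the au_url_mime_validation_map semantics described above, the following Python sketch parses entries of the form "pattern, MIME-type" and reports a mismatch when a URL matches a pattern but its Content-Type differs. It is not the daemon's implementation (which is Java). The function names, the unanchored regex search, the first-match-wins rule, and the stripping of Content-Type parameters such as charset are all assumptions made for the example.

```python
import re

def parse_validation_map(entries):
    """Split each "url-pattern, mime-type" entry into (compiled regex, mime)."""
    rules = []
    for entry in entries:
        # Split on the last comma so that commas inside the pattern survive.
        pattern, mime = entry.rsplit(",", 1)
        rules.append((re.compile(pattern.strip()), mime.strip().lower()))
    return rules

def mime_mismatch(rules, url, content_type):
    """Return the expected MIME type if url matches a rule but content_type
    differs; return None when no rule matches or the type agrees."""
    # Assumption: parameters such as "; charset=UTF-8" are ignored.
    actual = content_type.split(";", 1)[0].strip().lower()
    for regex, expected in rules:
        # Assumption: unanchored match, and the first matching rule decides.
        if regex.search(url):
            return None if actual == expected else expected
    return None

# The two entries from the XML example above.
RULES = parse_validation_map([
    "/doi/pdf(plus)?/, application/pdf",
    "/doi/(abs|full)/, text/html",
])
```

With these rules, a URL under /doi/pdfplus/ served as text/html would be flagged, while a URL that matches neither pattern is never flagged regardless of its Content-Type.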

Bug Fixes

  • SubscriptionManager omitted non-definitional parameters when adding subscribed AUs.

  • The Link Rewriter rewrote in-page links ("#ref"), breaking them.

  • Metadata item type inference reversed BOOKCHAPTER and BOOKVOLUME in some circumstances.

  • In the queryAus() web service, selecting the newContentCrawlUrls field caused a fatal error.

  • The LastMetadataIndex field in getAuStatus() and queryAus() web services was not accessible using daemonstatusservice.py.

  • Fixed unsafe database resource closings and incorrect comparisons in metadata-handling code.

  • Fixed active task removal when metadata indexing for an AU is disabled.
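
The in-page link problem listed above is easy to reproduce in any link rewriter that treats every href as a URL to be redirected. The Python sketch below shows one way to leave fragment-only links alone. It is not the daemon's Link Rewriter; the function name and the "/ServeContent?url=" prefix are assumptions chosen for illustration.

```python
from urllib.parse import quote, urljoin

def rewrite_link(href, base_url, serve_prefix="/ServeContent?url="):
    """Rewrite href to point at a content-serving endpoint, leaving
    fragment-only links ("#ref") untouched so they stay within the page."""
    if href.startswith("#"):
        # In-page reference: rewriting it would send the browser elsewhere.
        return href
    # Resolve relative links against the page URL before encoding them.
    absolute = urljoin(base_url, href)
    return serve_prefix + quote(absolute, safe="")
```

For example, rewrite_link("#ref", page_url) returns "#ref" unchanged, while a relative link such as "c.html" is resolved against the page URL and percent-encoded into the query string.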

Jun 27, 2018

@dlvargas dlvargas released this Feb 13, 2018 · 4309 commits to master since this release

Features

  • Allow the content configuration Web Service to use the same storage volume selection logic as the UI when adding AUs.

Bug Fixes

  • Bug fixes in ServeContent link rewriting and OpenURL resolver.

  • Properly trigger configuration of AUs after synchronizing whole title subscriptions.

@dlvargas dlvargas released this Oct 16, 2017 · 4311 commits to master since this release

Features

  • The ViewContent screen now offers an option in the upper pane to run a link extractor on the content displayed in the lower pane.

Bug Fixes

  • Fixed a bug in the title subscription management screen's tabbed interface, which
    under some circumstances could cause the loss of title subscription data previously
    entered in other tabs.

@dlvargas dlvargas released this Jul 15, 2017 · 4318 commits to master since this release

Features

  • Plugins may compute the starting URL(s) that should be used to browse
    an AU's content. If plugin_access_url_factory is set to the name of a
    FeatureUrlHelperFactory, then the FeatureUrlHelper's getAccessUrls()
    method will be invoked and the resulting list will be used in place of
    the AU's start URLs in contexts where the user is presented with
    starting points to browse an AU. (E.g., manifest index pages in
    ServeContent and the proxy.) See FeatureUrlHelper.

  • Plugins may also compute feature URLs. If a value in the
    au_feature_urls map is the name of a FeatureUrlHelperFactory, then the
    FeatureUrlHelper's getFeatureUrls() method will be invoked instead of
    expanding a printf template. See FeatureUrlHelper.

    Plugins that synthesize manifest pages (e.g., for bulk ingest content)
    should generally set both plugin_access_url_factory and the au_volume
    feature to a FeatureUrlHelperFactory.

  • The AU status page now has two ServeContent links: "Serve Content" and
    "Serve AU".

    "Serve Content" does what "Serve AU" has historically done: it feeds the
    bibliographic information for that AU (usually issn&year or isbn&year)
    to the OpenURL resolver to find AUs that contain that content. In
    the case of multiple publishers or providers there may be more than
    one AU. However, the results can be misleading or unintuitive in cases
    where the bibliographic information in the title DB is incomplete.

    "Serve AU" now serves that specific AU and no longer depends on OpenURL
    resolution.

  • The AU XPath expressions in org.lockss.crawler.crawlPriorityAuMap and
    org.lockss.poll.pollWeightAuMap can now refer to the variable $myhost,
    which is set to the value of org.lockss.platform.fqdn. This allows
    crawl and poll priorities to be set differently on different boxes.

Bug Fixes

  • OpenUrlResolver's results were influenced by the availability of pages
    at the publisher site, even when org.lockss.serveContent.neverProxy
    was true.

  • Link rewriting in ServeContent used the wrong base URL when serving
    redirected pages.

  • Deactivating AUs or reloading a plugin could cause rapid, excessive
    logging.

  • A race condition when deactivating AUs could cause CrawlManager to exit
    and the daemon to restart.

  • Files with erroneous Content-Encoding compression headers caused
    errors in login page checkers.

  • URLs with path components longer than 255 chars and containing an
    encoded slash weren't decoded properly, resulting in their
    children not being seen by URL iterators.

  • Changes to the name of an AU in the title DB, with no other changes,
    weren't reflected in status displays, etc., until daemon restart.
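
The release note above does not say how files with erroneous Content-Encoding headers are now handled. One common way to tolerate them, sketched here in Python purely as an illustration, is to attempt the advertised decoding and fall back to the raw bytes when it fails. The function name and the fallback policy are assumptions, not a description of the daemon's code.

```python
import gzip
import zlib

def decode_body(raw, content_encoding):
    """Return the decoded body, falling back to the raw bytes when the
    Content-Encoding header claims compression that is not actually there."""
    encoding = (content_encoding or "").strip().lower()
    try:
        if encoding in ("gzip", "x-gzip"):
            return gzip.decompress(raw)
        if encoding == "deflate":
            return zlib.decompress(raw)
    except (OSError, EOFError, zlib.error):
        # Header was wrong (or the data is corrupt): treat the body as unencoded.
        return raw
    # No encoding, or one this sketch does not handle: pass the bytes through.
    return raw
```

A checker built on such a helper would still see the page text when a server labels an uncompressed response as gzip, rather than failing on the decompression error.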

Jun 16, 2017