
merge lewismc:master with apache:master #2

Merged: 14 commits merged on Jun 11, 2021

Commits on Feb 16, 2021

  1. 2fae4cd

Commits on Feb 18, 2021

  1. 5250d62
  2. 2724578

Commits on Mar 15, 2021

  1. NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty

    - remove Jetty (serving JSP pages) for HTTP protocol plugin tests
    - replace JSP pages by header/content strings held in unit test classes
    sebastian-nagel committed Mar 15, 2021
    d193137

Commits on Mar 16, 2021

  1. Merge pull request #574 from sebastian-nagel/NUTCH-2596-http-protocol-plugin-test-remove-jsp

    NUTCH-2596 Remove org.mortbay.jetty from unit tests of HTTP protocol plugins
    sebastian-nagel authored Mar 16, 2021
    81fb7bc

Commits on Mar 21, 2021

  1. NUTCH-2857 Upgrade from JDK1.8 --> JDK11 (#573)

    * NUTCH-2857 Upgrade from JDK1.8 --> JDK11
    lewismc authored Mar 21, 2021
    b91fae5

Commits on Mar 27, 2021

  1. NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization

    - if the URL includes a port, the protocol is not normalized
    - add unit tests to verify correct behavior
    sebastian-nagel committed Mar 27, 2021
    c454a64
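The NUTCH-2858 rule (leave the URL untouched when it carries an explicit port, so the port cannot be dropped while the protocol is rewritten) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the plugin's source: the class `PortAwareProtocolNormalizer` and its in-memory host-to-protocol map are hypothetical stand-ins for the rules the plugin reads from protocols.txt.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;

// Hypothetical sketch of the NUTCH-2858 rule; not the actual urlnormalizer-protocol code.
public class PortAwareProtocolNormalizer {

    // Hypothetical host -> protocol rules (stand-in for entries read from protocols.txt).
    private final Map<String, String> hostProtocols;

    public PortAwareProtocolNormalizer(Map<String, String> hostProtocols) {
        this.hostProtocols = hostProtocols;
    }

    public String normalize(String urlString) {
        URL url;
        try {
            url = new URL(urlString);
        } catch (MalformedURLException e) {
            return urlString; // unparseable input is returned as-is in this sketch
        }
        // getPort() returns -1 when no port is given explicitly. With an explicit
        // port the URL is left unchanged, so the port can no longer be lost.
        if (url.getPort() != -1) {
            return urlString;
        }
        String protocol = hostProtocols.get(url.getHost());
        if (protocol == null || protocol.equals(url.getProtocol())) {
            return urlString;
        }
        // Rebuild with the configured protocol, keeping host, path, query and
        // fragment (user info is ignored for brevity).
        String file = url.getFile();
        if (url.getRef() != null) {
            file += "#" + url.getRef();
        }
        try {
            return new URL(protocol, url.getHost(), file).toString();
        } catch (MalformedURLException e) {
            return urlString;
        }
    }
}
```

With a rule mapping `example.org` to `https`, `http://example.org/a?b=1` becomes `https://example.org/a?b=1`, while `http://example.org:8080/a` is returned unchanged. Returning early is the simplest way to guarantee the port survives; rewriting the protocol while preserving the port would also be possible, but a port often only makes sense together with its original protocol.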

Commits on Mar 29, 2021

  1. NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization

    - add note in config file that URLs including a port are left
      unchanged
    sebastian-nagel committed Mar 29, 2021
    d749920
  2. NUTCH-2859: urlnormalizer-protocol: allow to normalize domains

    - host names starting with `*.` are matched as suffixes:
      `*.example.org` matches `example.org`, `www.example.org`,
      `www.subdomain.example.org`, etc.
    - allow reading the config file protocols.txt from hdfs://
      or any other file system supported by Hadoop
    - add Javadoc package documentation
    - document configuration properties in nutch-default.xml
    - reduce memory footprint by deduplicating protocol strings,
      so that identical protocol values reference the same object
    sebastian-nagel committed Mar 29, 2021
    081c826
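The two core ideas of NUTCH-2859, matching `*.example.org` as a domain suffix and deduplicating protocol strings so equal values share one object, can be sketched in Java as follows. This is a minimal sketch under stated assumptions: `DomainProtocolRules`, `addRule` and `lookup` are hypothetical names, and the plugin's real data structures may differ. The lookup walks from the full host name towards its parent domains, so the most specific suffix rule wins and `badexample.org` does not match `*.example.org`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the NUTCH-2859 lookup; class and method names are illustrative only.
public class DomainProtocolRules {

    private final Map<String, String> exactHosts = new HashMap<>();
    private final Map<String, String> domainSuffixes = new HashMap<>();
    // Pool used to deduplicate protocol strings: equal values share one String instance.
    private final Map<String, String> protocolPool = new HashMap<>();

    public void addRule(String host, String protocol) {
        String shared = protocolPool.computeIfAbsent(protocol, p -> p);
        if (host.startsWith("*.")) {
            // "*.example.org" is stored as the suffix "example.org".
            domainSuffixes.put(host.substring(2), shared);
        } else {
            exactHosts.put(host, shared);
        }
    }

    /** Returns the configured protocol for a host, or null if no rule matches. */
    public String lookup(String host) {
        String protocol = exactHosts.get(host);
        if (protocol != null) {
            return protocol;
        }
        // Try the host itself, then each parent domain at a dot boundary:
        // www.sub.example.org, sub.example.org, example.org, org.
        String candidate = host;
        while (true) {
            protocol = domainSuffixes.get(candidate);
            if (protocol != null) {
                return protocol;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }
}
```

After `addRule("*.example.org", "https")`, `lookup` returns `https` for `example.org`, `www.example.org` and `www.subdomain.example.org`, and `null` for `badexample.org`. Because of the pool, rules added later with an equal but distinct `"https"` string store the same instance.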

Commits on Apr 1, 2021

  1. NUTCH-2855 Update org.elasticsearch.client (#577)

    * NUTCH-2855 Update org.elasticsearch.client
    lewismc authored Apr 1, 2021
    2837039

Commits on Apr 6, 2021

  1. Merge pull request #576 from sebastian-nagel/NUTCH-2859-urlnormalizer-protocol-domain-rules

    NUTCH-2859: urlnormalizer-protocol: allow to normalize domains
    sebastian-nagel authored Apr 6, 2021
    6c02da0

Commits on May 31, 2021

  1. 0d6eaa3

Commits on Jun 1, 2021

  1. Merge pull request #648 from sebastian-nagel/NUTCH-2866-metadata-tostring

    NUTCH-2866 Fix MetaData.toString() to return "key=value ..."
    sebastian-nagel authored Jun 1, 2021
    18d2872
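NUTCH-2866 changes the string form of the metadata container to "key=value ...". A minimal sketch of that rendering for a multi-valued map follows. It assumes one `key=value` pair per value, separated by single spaces; `SimpleMetadata` is a hypothetical class, and the exact separators used by Nutch's own `toString()` are an assumption, not taken from the merged code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical multi-valued metadata container; not Nutch's own metadata class.
public class SimpleMetadata {

    // LinkedHashMap keeps keys in insertion order, so toString() output is stable.
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    public void add(String name, String value) {
        entries.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    @Override
    public String toString() {
        // One "key=value" pair per value, separated by single spaces.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            for (String value : e.getValue()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey()).append('=').append(value);
            }
        }
        return sb.toString();
    }
}
```

Adding `Content-Type=text/html`, then `lang=en` and `lang=de`, renders as `Content-Type=text/html lang=en lang=de`. A key with several values is repeated rather than joined, which keeps each pair readable in log lines.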

Commits on Jun 3, 2021

  1. NUTCH-2864 Upgrade Dockerfile to use JDK 11 (#647)

    * NUTCH-2864 Upgrade Dockerfile to use JDK 11
    lewismc authored Jun 3, 2021
    cc8d76a