
merge lewismc:master with apache:master #2

Merged: lewismc merged 14 commits into lewismc:master from apache:master on Jun 11, 2021.

@lewismc (Owner) commented on Jun 11, 2021

merge lewismc:master with apache:master

lewismc and others added 14 commits February 16, 2021 10:40
- remove Jetty (which served JSP pages) from the HTTP protocol plugin tests
- replace JSP pages with header/content strings held in the unit test classes
…-plugin-test-remove-jsp

NUTCH-2596 Remove org.mortbay.jetty from unit tests of HTTP protocol plugins
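Replacing JSP pages with header/content strings means each test can keep its expected HTTP response as plain constants. The following is a minimal, hypothetical Java sketch of such a fixture. `CannedResponse`, `HEADER`, `CONTENT` and `toBytes` are invented names and are not taken from the Nutch test classes. The sketch only shows how the raw response bytes could be assembled (header, computed `Content-Length`, blank line, body) for a simple test server to write to a socket.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical test fixture: a canned HTTP response kept as plain strings
 * in the test class instead of a JSP page served by Jetty.
 */
class CannedResponse {
  // Status line and headers, each terminated by CRLF as HTTP/1.1 requires.
  static final String HEADER = "HTTP/1.1 200 OK\r\n"
      + "Content-Type: text/html; charset=UTF-8\r\n";

  static final String CONTENT = "<html><body><p>Hello, Nutch!</p></body></html>";

  /** Assembles the raw bytes a minimal test server would write to the socket. */
  static byte[] toBytes(String header, String content) {
    byte[] body = content.getBytes(StandardCharsets.UTF_8);
    String fullHeader = header
        + "Content-Length: " + body.length + "\r\n" // length in bytes, not chars
        + "\r\n"; // an empty line ends the header block
    byte[] head = fullHeader.getBytes(StandardCharsets.US_ASCII);
    byte[] out = new byte[head.length + body.length];
    System.arraycopy(head, 0, out, 0, head.length);
    System.arraycopy(body, 0, out, head.length, body.length);
    return out;
  }
}
```

Computing `Content-Length` from the encoded body keeps the fixture correct even if a test's content later contains non-ASCII characters.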
- if a URL includes a port, its protocol is not normalized
- add unit tests to verify the correct behavior
- add a note in the config file that URLs including a port are left
  unchanged
- host names starting with `*.` are matched as suffixes:
  `*.example.org` matches `example.org`, `www.example.org`,
  `www.subdomain.example.org`, etc.
- allow reading the config file protocols.txt from hdfs://
  or any other file system supported by Hadoop
- add Javadoc package documentation
- document configuration properties in nutch-default.xml
- reduce the memory footprint by deduplicating protocol strings
  so that identical protocol values reference the same object
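The list above describes three rules. A URL with an explicit port is left alone, a `*.example.org` entry matches `example.org` and all of its subdomains, and equal protocol strings are shared. The hypothetical Java sketch below illustrates those rules together. `ProtocolRules`, `addRule`, `protocolFor` and `normalize` are invented names. This is not the code of the Nutch urlnormalizer-protocol plugin, and rules are added programmatically here instead of being read from protocols.txt.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of host-based protocol rules; not the Nutch plugin code. */
class ProtocolRules {
  // Exact host name -> protocol, e.g. "example.com" -> "https".
  private final Map<String, String> exactHosts = new HashMap<>();
  // Suffix rules from "*.example.org" entries, stored without the leading "*.".
  private final Map<String, String> domainSuffixes = new HashMap<>();
  // Pool of protocol strings so equal values share one String instance.
  private final Map<String, String> protocolPool = new HashMap<>();

  private String dedup(String protocol) {
    String p = protocol.toLowerCase(Locale.ROOT);
    // Returns the already pooled instance if an equal string was seen before.
    return protocolPool.computeIfAbsent(p, k -> k);
  }

  /** Registers a rule; hosts starting with "*." are treated as domain suffixes. */
  void addRule(String host, String protocol) {
    String h = host.toLowerCase(Locale.ROOT);
    String p = dedup(protocol);
    if (h.startsWith("*.")) {
      domainSuffixes.put(h.substring(2), p);
    } else {
      exactHosts.put(h, p);
    }
  }

  /** Returns the protocol configured for a host, or null if no rule applies. */
  String protocolFor(String host) {
    String h = host.toLowerCase(Locale.ROOT);
    String p = exactHosts.get(h);
    if (p != null) {
      return p;
    }
    // Strip leading labels one at a time so "www.sub.example.org" is checked
    // against "sub.example.org", then "example.org", then "org".
    String candidate = h;
    while (true) {
      p = domainSuffixes.get(candidate);
      if (p != null) {
        return p;
      }
      int dot = candidate.indexOf('.');
      if (dot < 0) {
        return null;
      }
      candidate = candidate.substring(dot + 1);
    }
  }

  /** Rewrites the scheme if a rule matches; URLs with an explicit port are unchanged. */
  String normalize(String url) {
    URI uri;
    try {
      uri = new URI(url);
    } catch (URISyntaxException e) {
      return url; // leave unparsable input untouched
    }
    String host = uri.getHost();
    if (host == null || uri.getScheme() == null || uri.getPort() != -1) {
      return url;
    }
    String protocol = protocolFor(host);
    if (protocol == null || protocol.equalsIgnoreCase(uri.getScheme())) {
      return url;
    }
    // Replace only the scheme, keeping the rest of the URL verbatim.
    return protocol + url.substring(url.indexOf(':'));
  }
}
```

Suffixes are matched on whole labels, so `*.example.org` does not match `notexample.org`. Because candidates are checked from longest to shortest, the most specific suffix rule wins.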
…-protocol-domain-rules

NUTCH-2859: urlnormalizer-protocol: allow to normalize domains
…ring

NUTCH-2866 Fix MetaData.toString() to return "key=value ..."
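The commit title only gives the target format, "key=value ...". The following minimal sketch shows such a `toString()` for a multi-valued metadata map. It assumes one `key=value` pair per value, separated by single spaces. `SimpleMetadata` is a hypothetical stand-in, not Nutch's metadata class, and its exact output may differ from the real fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical multi-valued metadata holder; not Nutch's metadata class. */
class SimpleMetadata {
  // Insertion-ordered so toString() output is predictable.
  private final Map<String, List<String>> entries = new LinkedHashMap<>();

  void add(String name, String value) {
    entries.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  /** Renders each value as "key=value", space-separated; a repeated key appears once per value. */
  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, List<String>> e : entries.entrySet()) {
      for (String value : e.getValue()) {
        if (sb.length() > 0) {
          sb.append(' ');
        }
        sb.append(e.getKey()).append('=').append(value);
      }
    }
    return sb.toString();
  }
}
```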
@lewismc merged commit 1da6c02 into lewismc:master on Jun 11, 2021
lewismc pushed a commit that referenced this pull request Mar 10, 2022
NUTCH-2940 Develop Gradle Core Build for Apache Nutch