crawl level and host level limits on *novel* (not deduplicated) bytes and urls #138
… StatisticsSelfTest.java in this branch

* seed-limits:
  change class originally known as SeedLimitsEnforcer to SourceQuotaEnforcer; make it a Processor instead of a DecideRule (because checking quota at link scoping time doesn't work, since many urls which would go over quota can be added to the frontier); support quotas on any of the fields tracked by CrawledBytesHistotable
  fix checkpointing problems with new statsBySource
  SeedLimitsEnforcer (contrib) DecideRule that rejects CrawlURI if source seed byte or document limit has been reached
  SourceSeedDecideRule applies the configured decision for any URI discovered from one of a set of seeds
  add support to StatisticsTracker to keep a CrawledBytesHistotable per source tag when trackSources is enabled; integration test for this functionality
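The reasoning behind enforcing the quota in a Processor rather than at link scoping time can be illustrated with a rough sketch. This is not the code from this branch: `SourceQuotaSketch`, its methods, and the field name `novel` used below are hypothetical stand-ins. The sketch assumes only what the commit message says: totals are kept per source tag, and the check runs when a URI is about to be processed, against what has actually been fetched so far.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the SourceQuotaEnforcer from this branch. */
public class SourceQuotaSketch {
    // sourceTag -> (field name -> running total); a simplified stand-in
    // for keeping one table of crawl totals per source tag.
    private final Map<String, Map<String, Long>> statsBySource = new HashMap<>();

    // field name -> limit; any tracked field can carry a quota.
    private final Map<String, Long> quotas = new HashMap<>();

    public void setQuota(String field, long limit) {
        quotas.put(field, limit);
    }

    /** Add the result of a completed fetch to the totals for its source tag. */
    public void tally(String sourceTag, String field, long amount) {
        statsBySource
            .computeIfAbsent(sourceTag, k -> new HashMap<>())
            .merge(field, amount, Long::sum);
    }

    /**
     * Intended to be called at processing time, just before a URI is fetched.
     * Returns true once any configured quota has been reached for the source.
     */
    public boolean isOverQuota(String sourceTag) {
        Map<String, Long> stats = statsBySource.get(sourceTag);
        if (stats == null) {
            return false; // nothing fetched yet for this source
        }
        for (Map.Entry<String, Long> quota : quotas.entrySet()) {
            long used = stats.getOrDefault(quota.getKey(), 0L);
            if (used >= quota.getValue()) {
                return true;
            }
        }
        return false;
    }
}
```

Because the check reads totals for fetches that have already completed, URIs queued before the limit was hit are still caught when their turn comes, which a check made only when links are scoped would miss.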
…ase where this can happen currently is if basic auth is configured for a url, but fails and url returns "401 Unauthorized")
* master:
  license header
  check that sourceTag of CrawlURI actually matches configured sourceTag
  remove already-outdated stuff from javadoc
  handle multiple clauses for same user agent in robots.txt
  Hook in submitted seeds properly.
  avoid spurious logging
  try very hard to start url consumer, and therefore bind the queue to the routing key, so that no messages are dropped, before crawling starts (should always work unless rabbitmq is down); some other tweaks for clarity and stability