Default: This parameter is mandatory
Name of the project used internally by the engine
Default: Optional parameter
Host and port combination that the telnet console binds to, e.g. localhost:7654
Default: Optional parameter
Host and port combination that the HTTP API server binds to, e.g. localhost:5555
Default: Optional parameter
Host and port combination of the Redis server, which is required for the HTTP API frontend as well as for storage.
Default: This parameter is mandatory
List of scrapers that will be executed by the engine
Default: This parameter is mandatory
Internal name of the scraper
Default: This parameter is mandatory
Base URL from which crawling starts
Default: 1 millisecond
Number of milliseconds to wait between requests
Default: Optional parameter
List of patterns that the URL currently being scraped is validated against. See patterns configuration.
Default: Optional parameter
Short name of the extractor struct that implements the Extractable interface. By default, LinkExtractor (link) is used.
Default: This parameter is mandatory
Either contains or regexp. The former uses plain string matching; the latter uses a regular expression.
Default: This parameter is mandatory
Value to match against: a plain string when the type is contains, or a regular expression when the type is regexp.
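The two pattern types described above can be illustrated with a minimal Go sketch. The `Pattern` struct and its `Matches` method are hypothetical names chosen for this example and are not taken from the engine's source; only the `contains` and `regexp` type values come from the description above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Pattern mirrors one entry of a scraper's patterns list.
// The struct and its field names are assumptions made for illustration.
type Pattern struct {
	Type    string // "contains" or "regexp"
	Pattern string // substring or regular expression, depending on Type
}

// Matches reports whether url satisfies the pattern.
func (p Pattern) Matches(url string) (bool, error) {
	switch p.Type {
	case "contains":
		// Plain substring match.
		return strings.Contains(url, p.Pattern), nil
	case "regexp":
		// Compile on every call for simplicity; a real engine would
		// likely compile once and cache the result.
		re, err := regexp.Compile(p.Pattern)
		if err != nil {
			return false, err
		}
		return re.MatchString(url), nil
	default:
		return false, fmt.Errorf("unknown pattern type %q", p.Type)
	}
}

func main() {
	p := Pattern{Type: "contains", Pattern: "/issues"}
	ok, err := p.Matches("http://golangweekly.com/issues/100")
	fmt.Println(ok, err)
}
```

Under these assumptions, a `contains` pattern of `/issues` accepts any URL containing that substring, while a `regexp` pattern allows stricter rules such as anchoring or requiring a numeric issue ID.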
project: test
tcpaddress: localhost:7654
redisaddress: localhost:6379
httpaddress: localhost:5555
scrapers:
  - name: golang
    url: http://golangweekly.com
    requestlimit: 200
    patterns:
      - type: contains
        pattern: /issues
  - name: scrapinghub
    url: https://blog.scrapinghub.com
    requestlimit: 200
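As a rough illustration of how the mandatory parameters and the 1 millisecond default could be enforced once such a file is loaded, here is a hedged Go sketch. The `Config` and `Scraper` types, the `Validate` and `Delay` methods, and the error messages are all assumptions made for this example; the engine's actual types may differ, and YAML parsing is left out to keep the sketch free of third-party dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Scraper holds a subset of the per-scraper settings (hypothetical type).
type Scraper struct {
	Name         string
	URL          string
	RequestLimit int // milliseconds to wait between requests
}

// Config holds a subset of the top-level settings (hypothetical type).
type Config struct {
	Project  string
	Scrapers []Scraper
}

// Validate checks that the parameters documented as mandatory are present.
func (c Config) Validate() error {
	if c.Project == "" {
		return errors.New("project is mandatory")
	}
	if len(c.Scrapers) == 0 {
		return errors.New("scrapers is mandatory")
	}
	for i, s := range c.Scrapers {
		if s.Name == "" {
			return fmt.Errorf("scrapers[%d]: name is mandatory", i)
		}
		if s.URL == "" {
			return fmt.Errorf("scrapers[%d]: url is mandatory", i)
		}
	}
	return nil
}

// Delay returns the wait between requests, falling back to the
// documented default of 1 millisecond when no limit is set.
func (s Scraper) Delay() time.Duration {
	if s.RequestLimit <= 0 {
		return time.Millisecond
	}
	return time.Duration(s.RequestLimit) * time.Millisecond
}

func main() {
	cfg := Config{
		Project: "test",
		Scrapers: []Scraper{
			{Name: "golang", URL: "http://golangweekly.com", RequestLimit: 200},
			{Name: "scrapinghub", URL: "https://blog.scrapinghub.com", RequestLimit: 200},
		},
	}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for _, s := range cfg.Scrapers {
		fmt.Println(s.Name, "waits", s.Delay(), "between requests")
	}
}
```

The struct literal in `main` mirrors the sample configuration above, so both scrapers pass validation and each waits 200 milliseconds between requests.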