Elegant Scraper and Crawler Framework for Golang
Latest commit 7c1d26d (Sep 17, 2018) by vosmith: Merge pull request #222 from pyjac/master, "Proxy URL for request"

Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, such as data mining, data processing, or archiving.


Features

  • Clean API
  • Fast (>1k requests/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-Unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensions

Example

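package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links; Request.Visit resolves relative URLs
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Print each URL just before it is requested
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}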

See the _examples folder for more detailed examples.

Installation

go get -u github.com/gocolly/colly/...

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on Freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

If you are using Colly in a project, please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

License

See LICENSE.txt in the repository for the license terms.