Elegant Scraper and Crawler Framework for Golang

Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Features

  • Clean API
  • Fast (>1k requests/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-Unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensions

Example

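package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Create a collector with the default configuration
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Start scraping from the project home page
	c.Visit("http://go-colly.org/")
}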

See the examples folder for more detailed examples.

Installation

go get -u github.com/gocolly/colly/v2/...

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open-source projects that use Colly:

If you are using Colly in a project, please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

License

