Burrow

Because gophers don't crawl. They burrow.

An experiment in writing a document crawler in Go.

The API aims to be expressive yet succinct. Take, for example, the task of crawling through HTML documents by following anchor hrefs:

crawl.Through(urlsUsingAnchor).BeginWith(seedUrls, crawledUrlSink)

Note that the current example server implementation does not persist crawled entities to disk. Instead, it keeps a pool of URLs in memory, polling and removing them as the request multiplexer sees fit. If scalability is a concern and you expect more than urlSinkSize concurrent requests, I would recommend using an actual crawling engine.

hpxro7/burrow
