crawler

A web crawler limited to a single domain. When crawling example.com, it visits every page within the example.com domain but does not follow links to external sites such as Facebook or Instagram accounts, or to subdomains such as other.example.com. Given a URL, it should output a site map showing which static assets each page depends on and the links between pages.

building

The crawler is written in Go and depends on the following Go package:

code.google.com/p/go.net/html

Dependencies can be installed using the command:

go get -u <package_name>

To run the project tests easily, you will also need the following:

Python3 - http://www.python.org/

Finally, to download and build the command-line tool, use the following commands:

go get -u github.com/rafaeljusto/crawler
go build -o crawler github.com/rafaeljusto/crawler/app

deploying

To deploy the project you will need the program below:

FPM - https://github.com/jordansissel/fpm (Debian packages)
