flerro/simple-crawler

Crawler

A multi-threaded recursive crawler, implemented with Apache Camel. Its crawling ability is deliberately basic: only text/html content is downloaded and stored locally.
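To illustrate the behavior described above, here is a minimal sketch of a multi-threaded recursive crawl that keeps only text/html pages. It is not this project's code and does not use Apache Camel: it relies on plain JDK concurrency classes, and the names `CrawlSketch`, `Page`, `crawl`, and the `a.example` URLs are all hypothetical. The fetcher is passed in as a function so the example runs without network access, and link extraction is intentionally naive (absolute URLs in double-quoted `href` attributes only).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlSketch {
    /** Hypothetical response model: a Content-Type value plus the body. */
    static final class Page {
        final String contentType;
        final String body;
        Page(String contentType, String body) {
            this.contentType = contentType;
            this.body = body;
        }
    }

    // Naive extraction: absolute http(s) URLs in double-quoted href attributes.
    private static final Pattern HREF = Pattern.compile("href=\"(https?://[^\"]+)\"");

    /** True for "text/html", with or without parameters such as a charset. */
    static boolean isHtml(String contentType) {
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("text/html");
    }

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Crawls from start and returns the URLs of the HTML pages that were kept. */
    static Set<String> crawl(String start, Function<String, Page> fetcher, int threads) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        Set<String> stored = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Phaser pending = new Phaser(1); // the calling thread is the first party
        visit(start, fetcher, visited, stored, pool, pending);
        pending.arriveAndAwaitAdvance(); // blocks until every scheduled task is done
        pool.shutdown();
        return stored;
    }

    private static void visit(String url, Function<String, Page> fetcher,
                              Set<String> visited, Set<String> stored,
                              ExecutorService pool, Phaser pending) {
        if (!visited.add(url)) {
            return; // already scheduled by another task
        }
        pending.register(); // one party per in-flight task
        pool.execute(() -> {
            try {
                Page page = fetcher.apply(url);
                if (page == null || !isHtml(page.contentType)) {
                    return; // unknown URL or non-HTML content: skip it
                }
                stored.add(url); // a real crawler would write page.body to disk here
                for (String link : extractLinks(page.body)) {
                    visit(link, fetcher, visited, stored, pool, pending);
                }
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website, so the sketch needs no network.
        Map<String, Page> site = new HashMap<>();
        site.put("http://a.example/", new Page("text/html; charset=utf-8",
                "<a href=\"http://a.example/b\">b</a><a href=\"http://a.example/i.png\">i</a>"));
        site.put("http://a.example/b", new Page("text/html",
                "<a href=\"http://a.example/\">home</a>"));
        site.put("http://a.example/i.png", new Page("image/png", ""));
        System.out.println(new TreeSet<>(crawl("http://a.example/", site::get, 4)));
    }
}
```

The shared `visited` set stops the link cycle between the two HTML pages from being followed forever, and the `Phaser` lets the caller wait for tasks that themselves schedule further tasks. Resolving relative links and writing files to disk are left out of this sketch.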

Build

Adjust the configuration in application.properties, then build with Maven.

Usage

Package the fat JAR from the project directory:

mvn package

Then start the crawler with:

java -jar target/scraper-0.0.1.jar http://mywebsite.com 

Pages are saved to the output folder in the current directory. To stop crawling, press CTRL+C.
