jordillachmrf/stormcrawler

This project was generated by the StormCrawler Maven Archetype as a starting point for building your own crawler. Have a look at the code and resources and modify them to your heart's content.

With Storm installed, you must first generate an uberjar:

mvn clean package

before submitting the topology using the storm command:

storm jar target/stormcrawler-1.0-SNAPSHOT.jar com.marfeel.CrawlTopology -conf crawler-conf.yaml -local

This runs the topology in local mode. Remove the '-local' flag to run the topology in distributed mode.
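As a minimal sketch of the difference between the two modes, the snippet below builds the submit command with or without the '-local' flag. The `LOCAL` variable is a hypothetical toggle introduced for illustration; the snippet only prints the command, so you can inspect it before running it yourself.

```shell
# Hypothetical toggle (not part of the project): 1 = local mode, 0 = distributed mode.
LOCAL=0

# Arguments shared by both modes, taken from the command above.
ARGS="-conf crawler-conf.yaml"

# Append -local only when local mode is requested.
if [ "$LOCAL" = "1" ]; then
  ARGS="$ARGS -local"
fi

CMD="storm jar target/stormcrawler-1.0-SNAPSHOT.jar com.marfeel.CrawlTopology $ARGS"

# Print the command instead of executing it.
echo "$CMD"
```

Distributed mode assumes the storm client is configured to reach a running cluster.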

You can also use Flux to do the same:

storm jar target/stormcrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local crawler.flux --sleep 86400000

Note that in local mode, Flux uses a default TTL of 60 seconds for the topology. The --sleep option overrides this with a value in milliseconds, so the command above runs the topology for 24 hours (86400000 ms).
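To pick a different run time, the --sleep value can be derived from a duration in hours. This is a small sketch; `RUN_HOURS` and `SLEEP_MS` are hypothetical variable names used only for illustration.

```shell
# Hypothetical variable: how long the local topology should run, in hours.
RUN_HOURS=24

# Convert hours to milliseconds: hours * 60 min * 60 s * 1000 ms.
SLEEP_MS=$((RUN_HOURS * 60 * 60 * 1000))

echo "$SLEEP_MS"   # prints 86400000 for RUN_HOURS=24

# Print the resulting Flux command rather than executing it.
echo "storm jar target/stormcrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local crawler.flux --sleep $SLEEP_MS"
```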

It is best to run the topology with --remote to benefit from the Storm UI and logging. In that case, the topology runs continuously, as intended.
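A sketch of the remote variant follows, assuming the same jar and crawler.flux file as in the local command and a storm client configured to reach your cluster. The `JAR` and `REMOTE_CMD` variables are hypothetical names for illustration. The --sleep option is left out because it only applies to local mode.

```shell
# Same jar as in the local-mode example above.
JAR=target/stormcrawler-1.0-SNAPSHOT.jar

# Swap --local for --remote to submit the Flux topology to the cluster.
REMOTE_CMD="storm jar $JAR org.apache.storm.flux.Flux --remote crawler.flux"

# Print the command instead of executing it.
echo "$REMOTE_CMD"
```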
