Skip to content
Fetching latest commit…
Cannot retrieve the latest commit at this time.
..
Failed to load latest commit information.
bin
doc
lib
src
README
build.properties
build.xml
pom.xml

README

The "examples" folder contains a set of examples to showcase Bixo.

The SimpleCrawlTool demonstrates a simple web crawler. You provide
it with a starting domain name (e.g. cnn.com) and a user agent name,
and it will crawl for the specified number of loops.

The WebMiningTool demonstrates a focused crawler with a focus on 
extracting (un)structured data from web pages. It demonstrates how 
to analyze the fetched data from a page using a DOM parser.  


To build a job jar:

% cd <path to Bixo>/examples
% ant clean job

To run the job jar:

% hadoop jar build/bixo-examples-job-1.0-SNAPSHOT.jar bixo.examples.crawl.SimpleCrawlTool \
	-agentname <your agent name> -domain <target domain> -numloops <number of loops> -outputdir <results dir>

Note that you should pick a number of loops > 1, so that it crawls more than just the top-level
page of your target domain.

To create an Eclipse project:

% cd <path to Bixo>/examples
% ant eclipse

after which you would import the project into your Eclipse workspace.

Something went wrong with that request. Please try again.