The "examples" folder contains a set of examples to showcase Bixo.

The DemoCrawlTool demonstrates a simple web crawler. You provide
it with a starting domain name and a user agent name,
and it will crawl for the specified number of loops.

The DemoWebMiningTool is an example of a focused crawler with an emphasis on
extracting structured and unstructured data from web pages. It demonstrates how
to analyze the data fetched from a page using a DOM parser. The result is a text
file with one entry per line:
	<source url>\t<extracted img url>\t<description text from the alt attribute, if any>
( is a good starting point 
to understand focused crawling)
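
Because each entry is tab-separated, the result file is easy to post-process
with standard tools. The following is only a sketch: the function name
images_with_alt is hypothetical, and it assumes the results have been copied
to a local file with the three fields described above.

```shell
# Hypothetical helper, not part of Bixo.
# Prints "<img url>\t<alt text>" for every entry that has a non-empty description.
# Assumes three tab-separated fields per line: source URL, image URL, alt text.
images_with_alt() {
    awk -F '\t' 'NF >= 3 && $3 != "" { print $2 "\t" $3 }' "$1"
}
```

For example, "images_with_alt <local copy of the result file>" would list only
the images that carry a description.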

To build a job jar:

% cd <path to Bixo>/examples
% ant clean job

To run the job jar (for DemoCrawlTool and DemoWebMiningTool):

% hadoop jar build/bixo-examples-job-<x.y.z>.jar bixo.examples.crawl.DemoCrawlTool \
	-agentname <your agent name> -domain <target domain> -numloops <number of loops> -outputdir <results dir>

Note that you should pick a number of loops > 1, so that it crawls more than just the top-level
page of your target domain.
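
If you run the crawl from a script, a small guard can catch a single-loop run
before the job is submitted. This is a minimal sketch, not part of Bixo: the
function names and the BIXO_JAR variable are hypothetical, and BIXO_JAR must
point at the job jar you built (its version is not assumed here).

```shell
# Hypothetical wrapper around the DemoCrawlTool command shown above.

# Succeeds only when the argument is an integer greater than 1.
require_multiple_loops() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -gt 1 ]
}

# Usage: run_demo_crawl <agent name> <target domain> <number of loops> <results dir>
run_demo_crawl() {
    agent=$1 domain=$2 loops=$3 outdir=$4
    if ! require_multiple_loops "$loops"; then
        echo "numloops should be an integer > 1 (got '$loops')" >&2
        return 2
    fi
    # BIXO_JAR is an assumption: set it to the path of your built job jar.
    hadoop jar "$BIXO_JAR" bixo.examples.crawl.DemoCrawlTool \
        -agentname "$agent" -domain "$domain" -numloops "$loops" -outputdir "$outdir"
}
```

The guard only checks the argument; it does not change how the tool itself
interprets -numloops.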

% hadoop jar build/bixo-examples-job-<x.y.z>.jar bixo.examples.webmining.DemoWebMiningTool \
	-agentname <your agent name> -workingdir <results dir>

To create an Eclipse project:

% cd <path to Bixo>/examples
% ant eclipse

Then import the project into your Eclipse workspace.
