webvis

A visually stimulating web crawler. Displays a live-updating graph of all domains visited.

How to Build

Requires Leiningen.

Run lein uberjar or lein bin.

Usage

CLI

./target/webvis <URL> [-h --help] [-d --depth] [-b --blacklist] [-w --workers] [-c --concurrency]

The crawler begins crawling at the URL provided.

If a depth is specified, the crawler goes no deeper than that depth, where the root domain is at depth 0. Depth counts domains crawled from the root domain, not URLs crawled from the root URL.
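To illustrate the difference between domain depth and URL depth, here is a minimal sketch. It is not taken from the webvis source: url->domain, link-depth and within-depth? are hypothetical helpers, the URL's host stands in for its domain, and the depth limit is assumed to be inclusive.

```clojure
(import '(java.net URI))

;; Hypothetical helper: treat the host part of a URL as its "domain".
(defn url->domain [url]
  (.getHost (URI. url)))

;; Hypothetical helper: depth of a link found on a page.
;; A link that stays on the parent's domain keeps the parent's depth;
;; a link to a different domain is one level deeper.
(defn link-depth [parent-url parent-depth link-url]
  (if (= (url->domain parent-url) (url->domain link-url))
    parent-depth
    (inc parent-depth)))

;; Assumption: a link is crawled when its depth does not exceed the limit.
(defn within-depth? [max-depth depth]
  (<= depth max-depth))

;; Following many links inside example.com never increases the depth.
(link-depth "http://example.com/" 0 "http://example.com/a/b/c")
;; Leaving example.com for another domain does.
(link-depth "http://example.com/" 0 "http://other.org/")
```

Under these assumptions, a depth of 1 would cover the root domain and every domain it links to directly, however many pages each of them has.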

The number of worker threads and the maximum number of concurrent requests can also be set. One worker is usually enough. Keep the request concurrency low so the crawler does not overload any server. The defaults are 1 worker and a maximum of 2 concurrent requests.
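One common way to cap concurrent requests is a counting semaphore. The sketch below shows the general idea only; it is not necessarily how webvis enforces its limit, and make-limiter and fake-request are hypothetical names.

```clojure
(import '(java.util.concurrent Semaphore))

;; Hypothetical helper: returns a function that runs a zero-argument
;; function f while holding one of max-concurrent permits.
(defn make-limiter [max-concurrent]
  (let [permits (Semaphore. (int max-concurrent))]
    (fn [f]
      (.acquire permits)
      (try
        (f)
        (finally (.release permits))))))

;; Counters used to observe how many fake requests overlap.
(def in-flight (atom 0))
(def peak (atom 0))

;; Stand-in for an HTTP request: record the overlap, then sleep briefly.
(defn fake-request []
  (let [n (swap! in-flight inc)]
    (swap! peak max n)
    (Thread/sleep 20)
    (swap! in-flight dec)))

;; Start 8 fake requests on separate threads with at most 2 in flight,
;; then wait for all of them to finish.
(let [limited (make-limiter 2)
      tasks   (doall (repeatedly 8 #(future (limited fake-request))))]
  (run! deref tasks))
```

After the run, @peak should not exceed 2, which matches the default request limit described above.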

Blacklisted domains will not be crawled.
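A blacklist check could look like the sketch below. The blacklisted? helper is hypothetical, and matching subdomains of a blacklisted domain is an assumption; the crawler itself may compare domains differently.

```clojure
(require '[clojure.string :as str])
(import '(java.net URI))

;; Hypothetical helper: true when the URL's host is a blacklisted domain
;; or (by assumption) a subdomain of one.
(defn blacklisted? [blacklist url]
  (let [host (.getHost (URI. url))]
    (boolean
      (and host
           (some (fn [domain]
                   (or (= host domain)
                       (str/ends-with? host (str "." domain))))
                 blacklist)))))

(def blacklist ["facebook.com" "yahoo.com"])

(blacklisted? blacklist "http://www.facebook.com/some/page")
(blacklisted? blacklist "http://example.com/")
```

Prefixing the domain with a dot before the suffix check keeps a host such as notyahoo.com from matching yahoo.com.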

REPL

To create a spider:

create-spider: [max-concurrent-reqs blacklist] [max-concurrent-reqs]

(def spider (create-spider 2 ["facebook.com" "yahoo.com"]))

The spider can then begin crawling with:

build-web: [spider url worker-count max-depth]

(build-web spider "http://example.com" 1 4)

A max depth of -1 will cause the spider to crawl forever.

To stop the spider from crawling:

(freeze! spider)

This removes all workers. The spider resumes crawling once another worker is added:

(spawn-worker spider)

To kill a spider, rendering it forever unusable:

;; eek!
(kill! spider)

License

Copyright © 2015 FIXME

Distributed under the Eclipse Public License v1.0 (https://www.eclipse.org/legal/epl-v10.html)
