Web Crawler with Redis Graph

Read the blog post.

Web crawler built with Node.js. It fetches site data from a given URL and recursively follows links across the web.

Sites can be traversed with either breadth-first search or depth-first search.
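
The two strategies differ only in how the crawl frontier is consumed: breadth-first takes the oldest discovered URL (a queue), depth-first the newest (a stack). The sketch below is a generic illustration of that idea, not this package's internals; fetchLinks is a hypothetical helper that returns the URLs found on a page.

  // Generic crawl loop: breadth-first uses a FIFO queue, depth-first a LIFO stack.
  // fetchLinks(url) is a hypothetical helper returning the URLs found on a page.
  async function crawl(startUrl, searchAlgorithm, fetchLinks) {
    const frontier = [startUrl];
    const visited = new Set();

    while (frontier.length > 0) {
      // breadthFirstSearch: take from the front; depthFirstSearch: take from the back.
      const url = searchAlgorithm === 'breadthFirstSearch' ? frontier.shift() : frontier.pop();
      if (visited.has(url)) continue;
      visited.add(url);

      for (const link of await fetchLinks(url)) {
        if (!visited.has(link)) frontier.push(link);
      }
    }
    return visited;
  }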

Every crawled URL is saved to a graph, represented as an adjacency list and stored in Redis.
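
One straightforward way to keep such an adjacency list in Redis is a set per page, keyed by the page URL, with each outgoing link as a member. The sketch below (node-redis v4+) only illustrates that idea; the 'links:' key prefix is a made-up example, not necessarily the scheme this package uses.

  // Hypothetical adjacency-list layout: one Redis set of outgoing links per page.
  import { createClient } from 'redis';

  const client = createClient(); // defaults to redis://127.0.0.1:6379
  await client.connect();

  // Record an edge: the main page links to the about page.
  await client.sAdd('links:https://example.com/', 'https://example.com/about');

  // Read back all outgoing links (neighbors) of a page.
  const neighbors = await client.sMembers('links:https://example.com/');
  console.log(neighbors);

  await client.quit();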

Installation

npm install --save redis-web-crawler

Usage

Run a local Redis server to store the output: $ redis-server

Create a new crawler instance and pass in a configuration object. Call the run method to begin crawling.

  import WebCrawler from 'redis-web-crawler';

  const crawlerSettings = {
    startUrl: 'https://en.wikipedia.org/wiki/Main_Page',
    followInternalLinks: false,
    searchDepthLimit: null,
    searchAlgorithm: "breadthFirstSearch",
  };

  const crawler = new WebCrawler(crawlerSettings);
  crawler.run();

Configuration Properties

Name                 Type     Description
startUrl             string   A valid URL of a page with links.
followInternalLinks  boolean  Toggle searching through internal site links.
searchDepthLimit     integer  Set a limit on the recursive URL requests.
searchAlgorithm      string   "breadthFirstSearch" or "depthFirstSearch"
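
For example, a crawl that follows internal links, stops after three levels of recursion, and traverses depth-first could be configured as follows (the startUrl here is just a placeholder):

  const depthLimitedSettings = {
    startUrl: 'https://example.com/',
    followInternalLinks: true,
    searchDepthLimit: 3,
    searchAlgorithm: 'depthFirstSearch',
  };

  new WebCrawler(depthLimitedSettings).run();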

Exporting the Redis Graph

  • clone the Redis Dump repo
  • run the commands to install its gem dependencies (refer to redis-dump/README)
  • with the Redis server up and running:
    • note the host and port of the redis-server (e.g. 6371)
    • in the project root folder, run ./bin/redis-dump -u 127.0.0.1:6371 > db_full.json
    • view the Redis export in db_full.json

spencerlepine.com  ·  GitHub @spencerlepine  ·  Twitter @spencerlepine
