sunnypurewal/crawlbot-server

Crawlbot Server

crawlbot-server is a web front-end for crawlbot.

Usage

const Server = require("crawlbot-server")
const server = new Server()

server.onHTML = (html, url) => {
  // Do something with the html here
}

server.listen((options) => {
  console.log(`Crawlbot Server started at ${options.host}:${options.port}`)
})

This will start the server on the default port of 9999. Visit http://localhost:9999 in your browser to begin. Check the console output for HTTP status codes and errors.
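
As a rough illustration of what an onHTML handler might do, the sketch below pulls the page title out of the HTML string. The extractTitle helper is hypothetical and not part of crawlbot-server. It uses a simple regular expression rather than a real HTML parser, so treat it as a starting point only.

```javascript
// Hypothetical helper (not part of crawlbot-server): returns the trimmed
// contents of the first <title> element, or null if none is found.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)
  return match ? match[1].trim() : null
}

// Possible wiring, assuming the onHTML(html, url) callback shown above:
// server.onHTML = (html, url) => {
//   console.log(`${url}: ${extractTitle(html)}`)
// }
```

For anything beyond quick experiments, a proper HTML parser is likely a safer choice than a regular expression.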

Notes

Crawlbot uses getsitemap under the hood, which means it only crawls websites that have valid sitemaps.
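
To give a sense of what a sitemap provides to a crawler, here is a minimal sketch that collects the `<loc>` entries from a sitemap XML string. This is not how getsitemap is implemented. The extractSitemapUrls function is a hypothetical name, and the regular expression ignores details such as XML entities and nested sitemap index files.

```javascript
// Hypothetical helper (not part of getsitemap or crawlbot): returns every
// URL found inside a <loc> element of a sitemap XML string.
function extractSitemapUrls(xml) {
  const urls = []
  for (const match of xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)) {
    urls.push(match[1])
  }
  return urls
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
</urlset>`

console.log(extractSitemapUrls(sample))
```

A site whose sitemap yields no URLs would give the crawler nothing to visit, which is why sites without valid sitemaps are skipped.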

It also uses hittp to make HTTP requests, so requests to the same host are automatically delayed by 3 seconds so that crawlbot does not overload the target server.
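
The per-host delay can be pictured with a small sketch like the one below. It is an assumption about the general technique, not hittp's actual code. The HostThrottle class and its reserve method are hypothetical names, and the 3000 ms default simply mirrors the 3 second delay mentioned above.

```javascript
// Hypothetical helper (not part of hittp): tracks, per host, the earliest
// time the next request may start.
class HostThrottle {
  constructor(delayMs = 3000) {
    this.delayMs = delayMs
    this.nextAllowed = new Map() // host -> earliest start timestamp in ms
  }

  // Returns how many milliseconds the caller should wait before requesting
  // `url`, and reserves that slot so later requests to the same host queue
  // behind it. `now` is injectable to keep the logic easy to test.
  reserve(url, now = Date.now()) {
    const host = new URL(url).host
    const earliest = this.nextAllowed.get(host) ?? now
    const startAt = Math.max(earliest, now)
    this.nextAllowed.set(host, startAt + this.delayMs)
    return startAt - now
  }
}

const throttle = new HostThrottle()
const waitMs = throttle.reserve("https://example.com/page")
setTimeout(() => {
  // The HTTP request for this URL would be issued here.
}, waitMs)
```

Requests to different hosts do not wait on each other in this sketch. Only repeated requests to the same host are spaced out.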
