HTCRAWL

Htcrawl is a Node.js module for recursively crawling single-page applications (SPAs) using JavaScript.
It uses headless Chrome to load and analyze web applications, and it is built on top of Puppeteer, from which it inherits all of its functionality.
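To make "recursive crawling" concrete, here is a generic sketch of the idea. It is not htcrawl's actual algorithm. The SPA is modeled as a hard-coded, hypothetical map of DOM states, each listing the events that can be triggered and the state each event leads to. A real crawler discovers these transitions by firing events in headless Chrome; the sketch only shows the recursion and the bookkeeping that keeps it from looping forever.

```javascript
// Hypothetical SPA model (an assumption for illustration): each DOM state maps
// the events a crawler could trigger to the state that event leads to.
const app = {
  home: { "click #menu": "menu", "click #login": "login" },
  menu: { "click #about": "about", "click #home": "home" },
  login: { "submit form": "home" },
  about: {},
};

// Depth-first recursive crawl. Each state is visited once, and the event
// sequence that first reached it is recorded so it could be replayed later.
function crawl(state, paths = new Map(), path = []) {
  if (paths.has(state)) return paths; // already explored: avoid infinite loops
  paths.set(state, path);
  for (const [event, next] of Object.entries(app[state])) {
    crawl(next, paths, [...path, event]);
  }
  return paths;
}

const paths = crawl("home");
for (const [state, events] of paths) {
  console.log(state, "<-", events.join(" > ") || "(start)");
}
```

Recording the event path per state matters for SPAs: because states are reached by interaction rather than by URL, a crawler typically needs the sequence of events to get back to a given state.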

With htcrawl, you can roll your own DOM-XSS scanner in fewer than 60 lines of JavaScript!
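A DOM-XSS scanner of this kind typically injects payloads into the inputs the crawler discovers and watches for them to execute. The sketch below shows only the bookkeeping half of that idea, under stated assumptions: each payload carries a unique marker, and a hypothetical sentinel function, here called `__xssHit`, reports the marker back when the payload runs. None of these names (`__xssHit`, `makePayload`, `reportHit`) come from htcrawl's API.

```javascript
// Hypothetical DOM-XSS bookkeeping; the sentinel name "__xssHit" and the
// payload template are assumptions, not part of htcrawl.
let nextId = 0;
const injected = new Map(); // marker -> description of where the payload went

// Build a payload tagged with a fresh marker and remember where it was used.
function makePayload(location) {
  const marker = `xss${nextId++}`;
  injected.set(marker, location);
  return `<img src=x onerror="window.__xssHit('${marker}')">`;
}

// The sentinel would call this with the marker of the payload that executed;
// it returns the injection point, or null for an unknown marker.
function reportHit(marker) {
  return injected.has(marker) ? injected.get(marker) : null;
}

const payload = makePayload("input[name=q]");
console.log(payload);
console.log(reportHit("xss0"));
```

In a Puppeteer-based setup, the sentinel could be wired to `reportHit` with Puppeteer's `page.exposeFunction()`; how htcrawl gives access to the underlying page is covered in its API documentation.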

More information is available at htcrawl.org.

SAMPLE USAGE

const htcrawl = require('htcrawl');

// CommonJS scripts have no top-level await, so wrap the calls in an async function
(async () => {
  const crawler = await htcrawl.launch("https://htcrawl.org");

  // Print the URL of each AJAX (XHR) request
  crawler.on("xhr", e => {
    console.log("XHR to " + e.params.request.url);
  });

  // Start crawling!
  await crawler.start();
})();
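The event handler in the sample can do more than print. The variation below collects each distinct XHR URL in a `Set`. So that it runs without a browser, a plain Node.js `EventEmitter` stands in for the crawler (an assumption for illustration only) and emits made-up events shaped like the `e.params.request.url` object the sample reads.

```javascript
const { EventEmitter } = require("events");

// Stand-in for the crawler object (an assumption, so the example runs offline).
const crawler = new EventEmitter();

// Collect each distinct XHR URL instead of logging every request.
const xhrUrls = new Set();
crawler.on("xhr", e => {
  xhrUrls.add(e.params.request.url);
});

// Simulated events with made-up URLs; a real crawl would emit these itself.
for (const url of [
  "https://example.com/api/items",
  "https://example.com/api/user",
  "https://example.com/api/items",
]) {
  crawler.emit("xhr", { params: { request: { url } } });
}

console.log([...xhrUrls]); // two distinct URLs
```

With the real crawler, the same handler body would go inside the `crawler.on("xhr", ...)` call of the sample, and the collected set could be inspected after `crawler.start()` resolves.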

DOCUMENTATION

API documentation can be found at https://htcrawl.org/api/.

LICENSE

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

ABOUT

Written by Filippo Cavallarin. This project is a spin-off of Htcap (https://github.com/fcavallarin/htcap, https://htcap.org).