A tool to crawl a site and log any resources that return a 404. Results are presented with a searchable todo-style checklist.
- Install Node
- Clone repo
git clone git@github.com:hudakdidit/site_crawler.git
- Install dependencies
npm install
- Set up the config file: run
mv config-example.json config.json
then update the site and port properties as necessary.
- Start by running
npm run crawl
to crawl the site you added in the last step. This will create the JSON 'database' (used as the data for the React front-end). Depending on the size of the site, the crawler may take some time, so check your email and get coffee. A progress bar will indicate how far along the crawler is.
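Before a long crawl, it can help to sanity-check the values in config.json. The snippet below is only a sketch and is not part of this repo: `validateConfig` is a hypothetical helper, and while the `site` and `port` property names come from the setup step above, the value shapes it accepts (an absolute http/https URL and an integer port) are assumptions.

```javascript
// Hypothetical helper, not part of the repo: returns a list of problems
// found in a parsed config.json. The accepted value shapes are assumptions.
function validateConfig(config) {
  const errors = [];

  // Assumption: `site` is an absolute http(s) URL string.
  let url = null;
  try {
    url = new URL(config.site);
  } catch (err) {
    errors.push('site must be an absolute URL, e.g. "http://example.com"');
  }
  if (url && url.protocol !== 'http:' && url.protocol !== 'https:') {
    errors.push('site must use http or https');
  }

  // Assumption: `port` is an integer TCP port (numeric strings are accepted).
  const port = Number(config.port);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push('port must be an integer between 1 and 65535');
  }

  return errors;
}

// Example with made-up values:
console.log(validateConfig({ site: 'http://localhost:8080', port: 3000 })); // prints []
```

To check a real file, parse it first with `JSON.parse(require('fs').readFileSync('config.json', 'utf8'))` and pass the result to the helper. An empty array means both properties look usable.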
TODO
Start the crawler script.
npm run crawl
Start webpack and the express web server
npm start
Start webpack, the express web server, and the web crawler
npm run dev-crawl
Start the express web server
npm run server