A lightweight and configurable web crawler built with Node.js.
This crawler recursively extracts links from websites using Node.js, Axios, and Cheerio. It respects depth limits and avoids duplicate visits for efficient crawling.
- Asynchronous operation for optimal performance
- Recursive link extraction with configurable depth
- Deduplication of visited URLs
- Targeted crawling capability (e.g., specific domain)
- Extensible codebase for easy customization
- Error handling and reporting
- Node.js (runtime)
- Axios (HTTP requests)
- Cheerio (HTML parsing)
The crawler fetches each page with Axios and parses the HTML with Cheerio. It maintains a set of visited URLs and recursively follows links within the configured depth limit and target domain, continuing until every reachable link has been visited or the maximum depth is reached.
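A minimal sketch of that loop, assuming the structure implied above (the actual `crawler.js` may differ; `MAX_DEPTH` and `targetDomain` mirror the config names used in this README, everything else is illustrative):

```js
// Minimal sketch of the crawl loop described above -- not the actual crawler.js.
const axios = require('axios');
const cheerio = require('cheerio');

const MAX_DEPTH = 2;                 // how many link levels to follow
const targetDomain = 'example.com';  // only follow links on this domain
const visited = new Set();           // deduplicates already-crawled URLs

async function crawl(url, depth = 0) {
  if (depth > MAX_DEPTH || visited.has(url)) return;
  visited.add(url);

  try {
    const { data } = await axios.get(url); // fetch the page
    const $ = cheerio.load(data);          // parse the HTML

    // Resolve every <a href> against the current page and keep only
    // links that stay on the target domain.
    const links = $('a[href]')
      .map((_, el) => new URL($(el).attr('href'), url).href)
      .get()
      .filter((href) => new URL(href).hostname.endsWith(targetDomain));

    for (const link of links) {
      await crawl(link, depth + 1); // recurse one level deeper
    }
  } catch (err) {
    console.error(`Failed to crawl ${url}: ${err.message}`); // error reporting
  }
}

crawl(`https://${targetDomain}`);
```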
- Clone the repository
- Install dependencies:
  `npm install`
- Configure `MAX_DEPTH` and `targetDomain` in `crawler.js` (example values in the sketch below)
- Run: `node crawler.js`
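A hedged illustration of what the configuration step might look like at the top of `crawler.js` (the variable names come from this README; the values and comments are placeholders):

```js
// Hypothetical configuration block in crawler.js -- adjust before running.
const MAX_DEPTH = 3;                    // follow links at most 3 levels deep
const targetDomain = 'example.com';     // only crawl pages on this domain
```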
Contributions are welcome! Open issues or submit pull requests.