A simple and concurrent web crawler written in Go.
This project is a web crawler that takes a starting URL and crawls every page within the same domain. It is designed to be fast and efficient, using goroutines to crawl pages concurrently (a minimal sketch of this pattern follows the feature list below).
The crawler outputs a report of the crawled pages and the number of internal links pointing to each page. Results can also be exported to a CSV file for further analysis.

Features:
- Concurrent crawling using goroutines
- Control the maximum concurrency
- Limit the number of pages to crawl
- Export crawl results to a CSV file
- Export detailed analysis to a CSV file
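Limiting concurrency in a crawler like this is typically done with a buffered channel used as a semaphore together with a `sync.WaitGroup`. The sketch below illustrates that pattern only, under stated assumptions; `fetchAndParse` is a hypothetical placeholder, and the project's actual implementation may differ.

```go
// Illustrative sketch of concurrency-limited crawling with goroutines.
// This is not the project's code; fetchAndParse is a hypothetical helper.
package main

import (
	"fmt"
	"sync"
)

type crawler struct {
	mu       sync.Mutex
	pages    map[string]int // URL -> number of internal links seen pointing to it
	maxPages int
	sem      chan struct{} // buffered channel acting as a concurrency semaphore
	wg       sync.WaitGroup
}

func (c *crawler) crawl(rawURL string) {
	defer c.wg.Done()
	c.sem <- struct{}{}        // acquire a slot; blocks once maxConcurrency is reached
	defer func() { <-c.sem }() // release the slot when this page is done

	c.mu.Lock()
	if len(c.pages) >= c.maxPages {
		c.mu.Unlock()
		return
	}
	if _, seen := c.pages[rawURL]; seen {
		c.pages[rawURL]++ // another internal link points to an already-crawled page
		c.mu.Unlock()
		return
	}
	c.pages[rawURL] = 1
	c.mu.Unlock()

	// fetchAndParse stands in for the HTTP fetch and same-domain link extraction.
	for _, link := range fetchAndParse(rawURL) {
		c.wg.Add(1)
		go c.crawl(link)
	}
}

func fetchAndParse(rawURL string) []string { return nil }

func main() {
	c := &crawler{
		pages:    make(map[string]int),
		maxPages: 100,
		sem:      make(chan struct{}, 5), // maxConcurrency = 5
	}
	c.wg.Add(1)
	go c.crawl("https://example.com")
	c.wg.Wait()
	fmt.Println(c.pages)
}
```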
Requirements:

- Go 1.21 or higher
Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/stkisengese/go-web-crawler.git
  ```

- Navigate to the project directory:

  ```bash
  cd go-web-crawler
  ```

- Build the project:

  ```bash
  go build
  ```

Usage:

To run the web crawler, use the following command:

```bash
./go-web-crawler <URL> <maxConcurrency> <maxPages> [options]
```

Arguments:

- `URL`: The starting URL to crawl (e.g., https://example.com)
- `maxConcurrency`: The maximum number of concurrent requests (e.g., 5)
- `maxPages`: The maximum number of pages to crawl (e.g., 100)
Options:
- `--csv FILE`: Export crawl results to a CSV file.
- `--detailed-csv FILE`: Export detailed analysis to a CSV file.
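For reference, CSV export in Go is usually a thin wrapper around the standard `encoding/csv` package. The sketch below is illustrative only; the `page,internal_links` column layout is an assumption based on the report described above, not the tool's documented output format.

```go
// Illustrative sketch of writing crawl results to a CSV file.
// The "page,internal_links" schema is an assumption, not the project's actual format.
package main

import (
	"encoding/csv"
	"log"
	"os"
	"strconv"
)

func writeCSV(path string, pages map[string]int) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	// Header row, followed by one row per crawled page.
	if err := w.Write([]string{"page", "internal_links"}); err != nil {
		return err
	}
	for page, count := range pages {
		if err := w.Write([]string{page, strconv.Itoa(count)}); err != nil {
			return err
		}
	}
	return w.Error()
}

func main() {
	pages := map[string]int{"https://example.com": 3, "https://example.com/about": 1}
	if err := writeCSV("results.csv", pages); err != nil {
		log.Fatal(err)
	}
}
```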
Example:
```bash
./go-web-crawler https://example.com 5 100 --csv results.csv
```

Contributions are welcome! Please feel free to submit a pull request or open an issue.
This project is licensed under the MIT License. See the LICENSE file for details.