A concurrent web crawler written in Go. It crawls all internal pages of a single website, extracts basic page data (heading, first paragraph, outgoing links, image URLs), and writes the result to a JSON report.
- Go 1.26 or newer
    go build -o crawler

    ./crawler BASE_URL MAX_CONCURRENCY MAX_PAGES

- BASE_URL: the site to crawl (e.g. https://example.com)
- MAX_CONCURRENCY: max number of pages fetched in parallel
- MAX_PAGES: stop after this many pages have been crawled
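The three positional arguments could be validated along the following lines. This is only a sketch, not the crawler's actual code: the `config` type, the `parseArgs` helper, and the rule that both numeric arguments must be positive are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
	"strconv"
)

// config holds the three positional arguments.
// The type and its field names are hypothetical, chosen for this sketch.
type config struct {
	baseURL        *url.URL
	maxConcurrency int
	maxPages       int
}

// parseArgs validates BASE_URL, MAX_CONCURRENCY and MAX_PAGES, in that order.
// Requiring positive integers is an assumption, not documented behavior.
func parseArgs(args []string) (config, error) {
	if len(args) != 3 {
		return config{}, errors.New("usage: crawler BASE_URL MAX_CONCURRENCY MAX_PAGES")
	}
	base, err := url.Parse(args[0])
	if err != nil {
		return config{}, fmt.Errorf("invalid BASE_URL: %w", err)
	}
	if base.Scheme == "" || base.Host == "" {
		return config{}, fmt.Errorf("BASE_URL must be absolute, got %q", args[0])
	}
	conc, err := strconv.Atoi(args[1])
	if err != nil || conc < 1 {
		return config{}, fmt.Errorf("MAX_CONCURRENCY must be a positive integer, got %q", args[1])
	}
	pages, err := strconv.Atoi(args[2])
	if err != nil || pages < 1 {
		return config{}, fmt.Errorf("MAX_PAGES must be a positive integer, got %q", args[2])
	}
	return config{baseURL: base, maxConcurrency: conc, maxPages: pages}, nil
}

func main() {
	cfg, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("crawling %s with up to %d parallel fetches, stopping after %d pages\n",
		cfg.baseURL, cfg.maxConcurrency, cfg.maxPages)
}
```

Rejecting a BASE_URL without a scheme early avoids ambiguity later, since `url.Parse` would otherwise treat a bare `example.com` as a relative path.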
The crawler only follows links on the same domain as BASE_URL.
    ./crawler https://example.com 5 50

This writes a report.json file in the current directory containing one entry per crawled page:
    [
      {
        "url": "https://example.com/",
        "heading": "Example Heading",
        "first_paragraph": "First paragraph text...",
        "outgoing_links": ["https://example.com/about"],
        "image_urls": ["https://example.com/logo.png"]
      }
    ]

    go test ./...