A simple and concurrent web crawler written in Go.
This project is a web crawler that takes a starting URL and crawls every page within the same domain. It is designed to be fast and efficient, using goroutines to crawl pages concurrently (a minimal sketch of this pattern follows the feature list below).
The crawler outputs a report of the crawled pages and the number of internal links pointing to each page. Results can also be exported to a CSV file for further analysis.

Features:
- Concurrent crawling using goroutines
- Control the maximum concurrency
- Limit the number of pages to crawl
- Export crawl results to a CSV file
- Export detailed analysis to a CSV file
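Limiting concurrency in a crawler like this is typically done with a buffered channel used as a semaphore together with a `sync.WaitGroup`. The sketch below illustrates that pattern only, under stated assumptions; `fetchAndParse` is a hypothetical placeholder, and the project's actual implementation may differ.

```go
// Illustrative sketch of concurrency-limited crawling with goroutines.
// This is not the project's code; fetchAndParse is a hypothetical helper.
package main

import (
	"fmt"
	"sync"
)

type crawler struct {
	mu       sync.Mutex
	pages    map[string]int // URL -> number of internal links seen pointing to it
	maxPages int
	sem      chan struct{} // buffered channel acting as a concurrency semaphore
	wg       sync.WaitGroup
}

func (c *crawler) crawl(rawURL string) {
	defer c.wg.Done()
	c.sem <- struct{}{}        // acquire a slot; blocks once maxConcurrency is reached
	defer func() { <-c.sem }() // release the slot when this page is done

	c.mu.Lock()
	if len(c.pages) >= c.maxPages {
		c.mu.Unlock()
		return
	}
	if _, seen := c.pages[rawURL]; seen {
		c.pages[rawURL]++ // another internal link points to an already-crawled page
		c.mu.Unlock()
		return
	}
	c.pages[rawURL] = 1
	c.mu.Unlock()

	// fetchAndParse stands in for the HTTP fetch and same-domain link extraction.
	for _, link := range fetchAndParse(rawURL) {
		c.wg.Add(1)
		go c.crawl(link)
	}
}

func fetchAndParse(rawURL string) []string { return nil }

func main() {
	c := &crawler{
		pages:    make(map[string]int),
		maxPages: 100,
		sem:      make(chan struct{}, 5), // maxConcurrency = 5
	}
	c.wg.Add(1)
	go c.crawl("https://example.com")
	c.wg.Wait()
	fmt.Println(c.pages)
}
```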
Requirements:

- Go 1.21 or higher
Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/stkisengese/go-web-crawler.git
  ```

- Navigate to the project directory:

  ```bash
  cd go-web-crawler
  ```

- Build the project:

  ```bash
  go build
  ```

Usage:

To run the web crawler, use the following command:

```bash
./go-web-crawler <URL> <maxConcurrency> <maxPages> [options]
```

Arguments:

- `URL`: The starting URL to crawl (e.g., https://example.com)
- `maxConcurrency`: The maximum number of concurrent requests (e.g., 5)
- `maxPages`: The maximum number of pages to crawl (e.g., 100)
Options:
- `--csv FILE`: Export crawl results to a CSV file.
- `--detailed-csv FILE`: Export detailed analysis to a CSV file.
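For reference, CSV export in Go is usually a thin wrapper around the standard `encoding/csv` package. The sketch below is illustrative only; the `page,internal_links` column layout is an assumption based on the report described above, not the tool's documented output format.

```go
// Illustrative sketch of writing crawl results to a CSV file.
// The "page,internal_links" schema is an assumption, not the project's actual format.
package main

import (
	"encoding/csv"
	"log"
	"os"
	"strconv"
)

func writeCSV(path string, pages map[string]int) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	// Header row, followed by one row per crawled page.
	if err := w.Write([]string{"page", "internal_links"}); err != nil {
		return err
	}
	for page, count := range pages {
		if err := w.Write([]string{page, strconv.Itoa(count)}); err != nil {
			return err
		}
	}
	return w.Error()
}

func main() {
	pages := map[string]int{"https://example.com": 3, "https://example.com/about": 1}
	if err := writeCSV("results.csv", pages); err != nil {
		log.Fatal(err)
	}
}
```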
Example:
```bash
./go-web-crawler https://example.com 5 100 --csv results.csv
```

Contributions are welcome! Please feel free to submit a pull request or open an issue.
This project is licensed under the MIT License. See the LICENSE file for details.