Crawl is a simple web crawler written in Go that recursively crawls a website and prints all the URLs it finds. It uses the gocolly
library for web scraping and URL parsing.
- Recursively crawls a website and prints all the URLs it finds.
- Restricts crawling to a specific base URL to prevent crawling outside the target website.
- Handles relative and absolute URLs.
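For context, here is a minimal sketch of how those three behaviors might fit together with gocolly. This is an illustration under assumptions, not the repository's actual code: the crawl function, the colly v2 import path, and the example URL are all placeholders.

```go
package main

import (
	"fmt"
	"log"
	"net/url"

	"github.com/gocolly/colly/v2"
)

// crawl recursively visits startURL and prints every URL it finds on
// the same host. Illustrative sketch only; the repository's actual
// implementation may differ.
func crawl(startURL string) error {
	u, err := url.Parse(startURL)
	if err != nil {
		return err
	}

	// Restrict crawling to the start URL's host so the crawler
	// never leaves the target website.
	c := colly.NewCollector(colly.AllowedDomains(u.Hostname()))

	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		// AbsoluteURL resolves relative links against the current page,
		// so both relative and absolute hrefs are handled.
		link := e.Request.AbsoluteURL(e.Attr("href"))
		if link == "" {
			return
		}
		fmt.Println(link)
		// Visit skips URLs outside AllowedDomains and URLs that were
		// already visited, which bounds the recursion.
		_ = e.Request.Visit(link)
	})

	return c.Visit(startURL)
}

func main() {
	if err := crawl("http://example.com"); err != nil {
		log.Fatal(err)
	}
}
```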
Make sure you have Go installed on your system. If not, you can download it from the official website: https://golang.org/dl/
- Clone this repository to your local machine:
git clone https://github.com/fusion212/SourceCodeCrawlsWithin.git
- Change into the project directory:
cd SourceCodeCrawlsWithin
- Build the executable binary:
go build crawl.go
You can use the "crawl" tool in two ways:
- Provide URLs as command-line arguments:
./crawl http://example.com https://example.com/page1
- Pipe URLs from standard input:
echo -e 'http://example.com\nhttps://example.com/page1' | ./crawl
The tool will recursively crawl each URL and print all the URLs it finds on the same domain.
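A hedged sketch of how the two input modes could be selected internally, assuming the tool falls back to standard input when no arguments are given (gatherURLs is a hypothetical helper name, not from the repository):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// gatherURLs collects the URLs to crawl: command-line arguments if
// any were given, otherwise one URL per line from standard input.
// Hypothetical helper for illustration only.
func gatherURLs() []string {
	if len(os.Args) > 1 {
		return os.Args[1:]
	}
	var urls []string
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			urls = append(urls, line)
		}
	}
	return urls
}

func main() {
	for _, u := range gatherURLs() {
		// The real tool would start a crawl here; this sketch just echoes.
		fmt.Println("crawling:", u)
	}
}
```

Reading arguments first and only then consulting standard input mirrors common Unix tool behavior, which is why both invocation styles shown above work interchangeably.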