URL Extractor

A simple, fast, and asynchronous web crawler to extract all URLs from a website.

Features

  • Asynchronous Crawling: Uses asyncio and aiohttp for fast, concurrent crawling.
  • Subdomain Matching: Can crawl and extract URLs from the main domain and its subdomains.
  • Max Pages Limit: Allows setting a maximum number of pages to crawl.
  • Command-Line Interface: Provides a simple CLI to specify the start URL, max pages, and number of workers.
  • Graceful Shutdown: Stops cleanly once the page limit is reached or all queued URLs have been processed.
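The features above can be sketched as a worker-pool crawl loop. This is an illustrative sketch, not the repository's actual code: the names `in_scope` and `crawl` are hypothetical, and the page-fetching function is injected as a parameter so the example stays self-contained (the real project fetches pages with aiohttp).

```python
# Hypothetical sketch of the crawl loop; `in_scope` and `crawl` are
# illustrative names, not taken from the repository.
import asyncio
from urllib.parse import urlparse


def in_scope(url: str, root_host: str) -> bool:
    """True if url is on the root domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == root_host or host.endswith("." + root_host)


async def crawl(start_url: str, fetch, max_pages: int = 100, workers: int = 5):
    """Crawl concurrently; `fetch(url)` is a coroutine returning a list of links."""
    root_host = urlparse(start_url).hostname or ""
    queue: asyncio.Queue = asyncio.Queue()
    seen = {start_url}
    await queue.put(start_url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                for link in await fetch(url):
                    # Enforce the page limit and subdomain scope on enqueue.
                    if in_scope(link, root_host) and link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        await queue.put(link)
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()        # graceful shutdown: wait until every queued URL is done
    for t in tasks:
        t.cancel()
    return seen
```

The `queue.join()` / `task_done()` pairing is what makes the shutdown graceful: the crawl returns only after every enqueued URL has been fully processed, after which the idle workers are cancelled.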
