Sitemap-based web crawler that efficiently searches for specific phrases across a website and logs results.
Sitemaps are fantastic resources, but manually combing through them is tedious. I wanted a quick way to find specific content patterns within a website's structure. Tracebound does just that. It leverages the sitemap to crawl all linked pages efficiently, hunting for any phrase or keyword I specify. It's been a fun little experiment in focused web crawling!
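The general idea can be sketched in a few lines of standard-library Python. This is only an illustration, not Tracebound's actual code: the function names (`extract_urls`, `contains_phrase`, `search_site`) are made up for this example, matching is a plain case-insensitive substring check, and it assumes a flat sitemap rather than a sitemap index that points to other sitemaps.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>, <url>, and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml):
    """Return every <loc> value found in a sitemap document (str or bytes)."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def contains_phrase(page_text, phrase):
    """Case-insensitive substring match (hypothetical matching rule)."""
    return phrase.lower() in page_text.lower()


def fetch(url, timeout=10.0):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def search_site(sitemap_url, phrase):
    """Fetch the sitemap, visit each listed page, and collect matching URLs."""
    hits = []
    for url in extract_urls(fetch(sitemap_url)):
        try:
            text = fetch(url).decode("utf-8", errors="replace")
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            print(f"skipped {url}: {exc}")
            continue
        if contains_phrase(text, phrase):
            hits.append(url)
    return hits
```

Calling something like `search_site("https://example.com/sitemap.xml", "pricing")` would return the list of page URLs whose raw HTML contains the phrase. The example URL and phrase are placeholders.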
pip install -r requirements.txt
python3 main.py
- Regular Expression Support
- Fuzzy Search
- CSV Export
- Web Interface
This project is licensed under the MIT License. This means you are free to use, copy, modify, and distribute the software for any purpose, even commercial ones, as long as you include the copyright notice and license information.
Paul - mail@fled.dev