fled-dev/Tracebound

Tracebound

Sitemap-based web crawler that efficiently searches for specific phrases across a website and logs results.

About the Project

Sitemaps are fantastic resources, but combing through them by hand is tedious. I wanted a quick way to find specific content patterns within a website's structure, and Tracebound does just that: it reads the site's sitemap, crawls the pages listed there, and searches each one for any phrase or keyword I specify, logging the results. It's been a fun little experiment in focused web crawling!
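
To make the idea concrete, here is a minimal sketch of a sitemap-driven phrase search using only the Python standard library. It is not Tracebound's actual code: the function names (`extract_urls`, `fetch`, `search_site`), the case-insensitive substring match, and the timeout value are all assumptions made for illustration. It also assumes a plain `<urlset>` sitemap and does not follow sitemap index files.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def fetch(url, timeout=10):
    """Download a URL and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def search_site(sitemap_url, phrase, fetch=fetch):
    """Return the sitemap URLs whose page body contains the phrase.

    The match is a case-insensitive substring test, which is an
    assumption; `fetch` is injectable so the logic can be tried offline.
    """
    needle = phrase.lower()
    hits = []
    for url in extract_urls(fetch(sitemap_url)):
        if needle in fetch(url).lower():
            hits.append(url)
    return hits
```

Calling `search_site("https://example.com/sitemap.xml", "some phrase")` would return the list of matching page URLs, which could then be written to a log. Passing the fetcher as a parameter keeps the matching logic separate from the network code.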

Getting Started

Installation

pip install -r requirements.txt

Run Locally

python3 main.py

Roadmap

  • Regular Expression Support
  • Fuzzy Search
  • CSV Export
  • Web Interface

License

This project is licensed under the MIT License. This means you are free to use, copy, modify, and distribute the software for any purpose, even commercial ones, as long as you include the copyright notice and license information.

Contact

Paul - mail@fled.dev
