Page Prowler

Page Prowler is a tool for finding and extracting links from websites based on specified search terms. You can use it directly from the command line, or start the bundled Echo web server, which exposes an HTTP API. The API uses the Asynq library to manage queued crawl jobs.

Usage

page-prowler [command]

Commands

  • api: Starts the API server.
  • matchlinks: Crawls the specified website and extracts links that match the provided search terms. Can be run from the command line or via a POST request to /v1/matchlinks on the API server.
  • clearlinks: Clears the Redis set for a given siteid.
  • getlinks: Gets the list of links for a given siteid.
  • worker: Starts the Asynq worker.
  • help: Displays help about any command.
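
For example, to display detailed help for a single command (here, matchlinks; this assumes the usual help [command] form):

./page-prowler help matchlinks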

Building

To build Page Prowler, clone the repository and compile the binary using the following commands:

git clone https://github.com/jonesrussell/page-prowler.git
cd page-prowler
go build

Alternatively, you can use the provided Makefile to build the project:

make all

This runs the fmt, lint, test, and build targets defined in the Makefile.
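
For reference, a minimal sketch of what those targets might look like. The recipes below are illustrative assumptions (including the choice of golangci-lint as the linter); see the Makefile in the repository for the actual definitions:

# Illustrative sketch only; the real Makefile may differ.
all: fmt lint test build

fmt:
	go fmt ./...

lint:
	golangci-lint run

test:
	go test ./...

build:
	go build -o page-prowler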

Command Line

To crawl a site and extract matching links from the command line, use the following command:

./page-prowler matchlinks --url="https://www.example.com" --searchterms="keyword1,keyword2" --siteid=siteID --maxdepth=1 --debug

Replace "https://www.example.com" with the URL you want to crawl, "keyword1,keyword2" with the search terms you want to look for, siteID with your site ID, and 1 with the maximum depth of the crawl.

API

To start the API server, use the following command:

./page-prowler api

Then, you can send a POST request to start a crawl:

curl -X POST -H "Content-Type: application/json" -d '{
  "URL": "https://www.example.com",
  "SearchTerms": "keyword1,keyword2",
  "CrawlSiteID": "siteID",
  "MaxDepth": 3,
  "Debug": true
}' http://localhost:3000/v1/matchlinks

As before, replace "https://www.example.com" with the URL you want to crawl, "keyword1,keyword2" with your search terms, siteID with your site ID, and 3 with the maximum crawl depth.
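
Crawl requests submitted through the API are queued with Asynq, so a worker must be running to process them. Start one in a separate terminal:

./page-prowler worker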

Configuration

Page Prowler reads its configuration from a .env file. You can specify the Redis host, port, and password there. For example:

REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_AUTH=yourpassword
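
If you don't have a Redis instance available, one quick way to start one for local development is with Docker. This is illustrative only, not a project requirement; any reachable Redis server works, as long as its password matches REDIS_AUTH:

docker run -d -p 6379:6379 redis redis-server --requirepass yourpassword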

Contributing

Contributions are welcome! Please feel free to submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
