snitch2

A web crawler to search a domain for target links.

Installation

pip install snitch2

Getting Started

from snitch2 import snitch

starting_url = 'http://joelcolucci.com'
target_url = 'github.com'

results = snitch(starting_url, target_url)

Example results

{
    "start_url": "http://joelcolucci.com",
    "target_url": "github.com",
    "pages_crawled": 1,
    "guilty_total": 1,
    "guilty_results": [{
        "page_uri": "http://joelcolucci.com",
        "target_url": "github.com",
        "guilty_link": {
            "kind": "external",
            "text": "github.com/joelcolucci",
            "uri": "https://github.com/joelcolucci",
            "href": "https://github.com/joelcolucci",
            "domain": "joelcolucci.com",
            "type": "absolute"
        }
    }]
}
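The results above can be post-processed with ordinary dictionary access. The sketch below assumes that `snitch` returns a Python dict shaped like the example (if it returns a JSON string instead, run it through `json.loads` first). The helper name `guilty_links_by_page` is hypothetical and not part of snitch2.

```python
def guilty_links_by_page(results):
    """Group matching link URIs by the page they were found on.

    Assumes `results` is a dict with the same keys as the example above.
    """
    by_page = {}
    for entry in results.get("guilty_results", []):
        page = entry["page_uri"]
        by_page.setdefault(page, []).append(entry["guilty_link"]["uri"])
    return by_page


# Trimmed copy of the example results shown above, so this runs offline.
example_results = {
    "start_url": "http://joelcolucci.com",
    "target_url": "github.com",
    "pages_crawled": 1,
    "guilty_total": 1,
    "guilty_results": [{
        "page_uri": "http://joelcolucci.com",
        "target_url": "github.com",
        "guilty_link": {
            "kind": "external",
            "text": "github.com/joelcolucci",
            "uri": "https://github.com/joelcolucci",
        },
    }],
}

for page, links in guilty_links_by_page(example_results).items():
    print(page, "->", ", ".join(links))
```

Grouping by `page_uri` makes it easy to see which pages need attention when a crawl returns many matches.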

Be responsible

Snitch2 does not currently check robots.txt when crawling a domain. Please be responsible!
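One way to be responsible is to check the site's robots.txt yourself before starting a crawl. The sketch below uses `urllib.robotparser` from the Python standard library. The helper `is_crawl_allowed` and the sample robots.txt text are hypothetical and not part of snitch2; the sample is parsed from a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

print(is_crawl_allowed(SAMPLE_ROBOTS_TXT, "http://example.com/"))
print(is_crawl_allowed(SAMPLE_ROBOTS_TXT, "http://example.com/private/page"))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real robots.txt. Note that this only checks the URLs you pass in; it does not change which pages snitch2 visits once the crawl starts.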
