torspider

It does things that crawl Tor.

The initial idea was inspired by terrible jokes on Discord about Tor analytics. Much of the crawling code avoids reinventing the wheel thanks to this crawler.

Licence is AGPL.

Notes

  • docker-compose run --rm spider python init_db.py initializes the database.
  • docker-compose up --scale spider=4 -d starts four spider containers in the background for parallel (multispider) crawling.
  • Rebloom is a required Redis module; its Bloom filter is used to filter out duplicate URLs.
  • POSTGRES_URL is assumed to point at a connection pooler that does its own pooling, such as PgBouncer or Pgpool-II.
  • Postgres MUST be the database, because the code relies on Postgres-specific features.
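Duplicate URL filtering works by asking a Bloom filter whether a URL has probably been seen before enqueueing it. The following is a minimal in-memory Python sketch of that idea, not the project's code. The `BloomFilter` class, the `should_crawl` helper, the filter sizes and the `.onion` URLs are all hypothetical placeholders. In the real setup the equivalent operations would be Rebloom's `BF.ADD` / `BF.EXISTS` commands against Redis, presumably shared by all spider containers.

```python
import hashlib


class BloomFilter:
    """Hypothetical in-memory stand-in for a Rebloom filter (BF.ADD / BF.EXISTS)."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions by double hashing two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids repeats
        for i in range(self.k):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> bool:
        """Set the item's bits. Return True if any bit was newly set, meaning the
        item was definitely not seen before (like BF.ADD returning 1)."""
        added = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            mask = 1 << bit
            if not self.bits[byte] & mask:
                self.bits[byte] |= mask
                added = True
        return added

    def __contains__(self, item: str) -> bool:
        # All k bits set means "probably seen" (like BF.EXISTS returning 1).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def should_crawl(seen: BloomFilter, url: str) -> bool:
    """Hypothetical helper: crawl a URL only if the filter had not seen it yet."""
    return seen.add(url)


if __name__ == "__main__":
    seen = BloomFilter()
    urls = [
        "http://placeholderaddress.onion/",
        "http://placeholderaddress.onion/about",
        "http://placeholderaddress.onion/",  # duplicate, should be skipped
    ]
    to_crawl = [u for u in urls if should_crawl(seen, u)]
    print(to_crawl)
```

A Bloom filter can report false positives (a new URL wrongly treated as already seen) but never false negatives, so the trade-off is occasionally skipping a page in exchange for a fixed, small memory footprint.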

About

Public, admittedly bad code that crawls Tor for terrible homemade spaghetti analytics.
