A very simple Scrapy email hunter. It only searches up to the first result on each website. No JS support.

p371k9/mailhunt

mailhunt

mailhunt is a very simple Scrapy crawler for scraping email addresses from websites. It does not render JavaScript-generated content, and on each website it stops searching at the first result.
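To illustrate the "first result only" behaviour, here is a minimal sketch of how an email address might be pulled out of a page's raw HTML. The pattern and the `first_email` helper are assumptions made for illustration; the spider in this repository may use a different expression or extraction method.

```python
import re

# Deliberately simple pattern (an assumption, not taken from the spider).
# It misses obfuscated addresses and can accept some invalid ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def first_email(html):
    """Return the first email-like string in the text, or None if there is none."""
    match = EMAIL_RE.search(html)
    return match.group(0) if match else None


sample = '<p>Write to <a href="mailto:info@example.com">info@example.com</a></p>'
print(first_email(sample))
```

Because only the static HTML is inspected, addresses that a page inserts with JavaScript would not be found this way.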

To search a single website:

scrapy crawl hunter -a url=https://bieler-lang.de/

To search multiple sites:

scrapy crawl hunter -a list=teszt/urls.lll -o list.csv

Here .lll is a list file, that is, a single-column .csv without a header row, with one URL per line.
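As a rough sketch of what such a list file looks like to a program, the snippet below reads a single-column, headerless CSV and collects its URLs. The `read_url_list` helper and the example.com addresses are hypothetical; how the spider itself loads the file is not shown in this README.

```python
import csv
import io


def read_url_list(stream):
    """Yield the first column of every non-empty row; no header row is expected."""
    for row in csv.reader(stream):
        if row and row[0].strip():
            yield row[0].strip()


# In-memory stand-in for a file such as teszt/urls.lll (contents are made up).
sample = io.StringIO("https://example.com/\nhttps://example.org/\n")
urls = list(read_url_list(sample))
print(urls)
```

To read a real file instead, pass an object opened with `open(path, newline="")` in place of the `StringIO` stand-in.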

The spider only searches web pages that match the strings listed in the RLS variable in settings.py. Like any Scrapy setting, the RLS list can be overridden from the command line with -s. For example:

scrapy crawl hunter -a url=https://bieler-lang.de/ -s RLS="['contact', 'about', 'kontakt', 'über']"

or:

scrapy crawl hunter -a list=path/to/list.de -s RLS="['contact', 'about', 'kontakt', 'über']" -o de.csv
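A value passed with -s reaches the spider as a string, so a list written as "['contact', 'about']" has to be turned back into a Python list before it can be used as a filter. The sketch below shows one way to do this and to keep only the links that contain one of the strings. Both helpers are hypothetical, and matching against the link URL by case-insensitive substring is an assumption; the actual spider may parse and apply RLS differently.

```python
import ast


def parse_rls(value):
    """Turn a value such as "['contact', 'about']" into a list of lowercase strings."""
    if isinstance(value, str):
        # literal_eval only accepts Python literals, so it is safer than eval.
        value = ast.literal_eval(value)
    return [str(item).lower() for item in value]


def matching_links(links, rls):
    """Keep the links whose URL contains at least one of the RLS strings."""
    return [link for link in links if any(s in link.lower() for s in rls)]


rls = parse_rls("['contact', 'about', 'kontakt', 'über']")
links = [
    "https://example.com/products",
    "https://example.com/Kontakt",
    "https://example.com/about-us",
]
print(matching_links(links, rls))
```

With this kind of filter, a shorter RLS list means fewer pages are visited per site, while a longer one widens the search.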

Tested with Scrapy 2.5.1.
