
Email Extractor

With God speed
Extract email addresses from an entire website by crawling its URLs.

Just a 10 KB binary; extracting emails from 1000 URLs takes about 10 seconds.

A free command-line utility that extracts email addresses by crawling a given URL, up to a crawl depth or number of URLs set by the user. It collects the URLs it finds on each page and, at the same time, extracts email addresses using simple Go routines. Depth options control how far the crawl goes from the starting page; a minimal sketch of the approach follows.
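
Here is a minimal Go sketch of that idea: fetch a seed page, collect its links, then fetch every page in parallel with one goroutine each and pull emails out with a regular expression. The regexes, the fixed depth-1 crawl, and the unbounded goroutines are simplifications for illustration, not the project's actual implementation:

package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
	"sync"
)

// Assumed patterns for the example; the real tool's may differ.
var emailRe = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
var linkRe = regexp.MustCompile(`href="(https?://[^"]+)"`)

// fetch returns a page body, or "" on any error.
func fetch(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		return ""
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

func main() {
	seed := "https://kevincobain2000.github.io"

	// Collect the seed page plus its outgoing links (depth 1).
	urls := []string{seed}
	for _, m := range linkRe.FindAllStringSubmatch(fetch(seed), -1) {
		urls = append(urls, m[1])
	}

	var mu sync.Mutex
	var wg sync.WaitGroup
	emails := map[string]bool{}

	// One goroutine per URL: fetch and extract emails in parallel.
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			for _, e := range emailRe.FindAllString(fetch(u), -1) {
				mu.Lock()
				emails[e] = true
				mu.Unlock()
			}
		}(u)
	}
	wg.Wait()

	// Print each unique email once.
	for e := range emails {
		fmt.Println(e)
	}
}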

Quick Setup: One command to install the lightweight binary.

Blazing speeds: Extracts by crawling URLs in parallel, with light CPU usage.

Crawling capability: Crawls each page, follows the links it finds, and extracts email addresses.

Beautiful: Colorful output, with an option to write results to a file.

Dependency Free: No dependencies to install from pip or npm. Just download and run.

CI

go-build-time go-test-runtime coverage go-binary-size go-mod-dependencies


Install

curl -sL https://raw.githubusercontent.com/kevincobain2000/email_extractor/master/install.sh | sh
mv email_extractor /usr/local/bin/
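
Verify the install by printing the version (the -version flag is documented under All Options below):

email_extractor -version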

Usage

Simple usage

email_extractor -url=kevincobain2000.github.io

Advanced usage

# Do not crawl URLs (extract from the given URL only)
email_extractor -depth=0 -url=kevincobain2000.github.io

# Write emails to a file
email_extractor -out=marketing.txt -url=kevincobain2000.github.io

# Extract from up to 100 URLs
email_extractor -limit-urls=100 -url=kevincobain2000.github.io

All Options

  -depth int
    	depth of urls to crawl.
    	-1 for url provided & all depths (both backward and forward)
    	0  for url provided (only this)
    	1  for url provided & until first level (forward)
    	2  for url provided & until second level (forward) (default -1)
  -ignore-queries
    	ignore query params in the url
    	Note: pagination links are usually query params
    	Set it to false, if you want to crawl such links
    	 (default true)
  -limit-emails int
    	limit of emails to crawl (default 1000)
  -limit-urls int
    	limit of urls to crawl (default 1000)
  -out string
    	file to write to (default "emails.txt")
  -parallel
    	crawl urls in parallel (default true)
  -sleep int
    	sleep in milliseconds before each request to avoid getting blocked
  -timeout int
    	timeout limit in milliseconds for each request (default 10000)
  -url string
    	url to crawl
  -version
    	prints version
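
As an example of combining the options above (every flag shown is from this list; the particular combination is illustrative):

# Crawl up to 500 URLs, two levels deep, keeping pagination links,
# sleeping 500 ms between requests and writing results to leads.txt
email_extractor -url=kevincobain2000.github.io -depth=2 -limit-urls=500 -ignore-queries=false -sleep=500 -out=leads.txt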

Samples

Screenshot

Performance

It crawled 1000 URLs and found 300 email addresses in about 11 seconds.

bar chart

Changelog

  • v1.0 - Python implementation to extract email addresses by crawling URLs. Installation using pip.
  • v2.0 - 100x performance improvement by using goroutines.
  • v2.5 - 2x performance improvement by not opening the same URL again.
  • v2.6 - Added depth option for crawling URLs.
  • v2.8 - Limit on email addresses, and a possible fix for relative URLs.
  • v2.10 - Adds a hint for the HTTP status code.
  • v2.12 - Option to crawl in parallel, and better messaging.

About

Yes, it works! God speed. Email extractor by full URL crawl. Extract emails and web URLs from a website with a full crawl, or limit the number and depth of URLs to crawl, from the terminal.
