GhostCrawler

Empowering Exploration: GhostCrawler - Open Source Intelligence for the Dark Web

Supported search engines:
- ahmia
- darksearchio
- onionland
- notevil
- darksearchenginer
- phobos
- GhostCrawler2server
- torgle
- GhostCrawler2engine
- tordex
- tor66
- tormax
- haystack
- multivac
- evosearch
- deeplink
Installation:

With pip:
pip3 install GhostCrawler

From source:
git clone https://github.com/Pandya-mayur/GhostCrawler.git
cd GhostCrawler
python3 setup.py install
GhostCrawler2 "Ethical Hacking" --engines onionland torgle tor66 haystack
Help:
usage: GhostCrawler2 [-h] [--proxy PROXY] [--output OUTPUT]
                     [--continuous_write CONTINUOUS_WRITE] [--limit LIMIT]
                     [--engines [ENGINES [ENGINES ...]]]
                     [--exclude [EXCLUDE [EXCLUDE ...]]]
                     [--fields [FIELDS [FIELDS ...]]]
                     [--field_delimiter FIELD_DELIMITER] [--mp_units MP_UNITS]
                     search

positional arguments:
  search                The search string or phrase

optional arguments:
  -h, --help            show this help message and exit
  --proxy PROXY         Set Tor proxy (default: 127.0.0.1:9050)
  --output OUTPUT       Output file (default: output_$SEARCH_$DATE.txt), where
                        $SEARCH is replaced by the first chars of the search
                        string and $DATE is replaced by the datetime
  --continuous_write CONTINUOUS_WRITE
                        Write progressively to output file (default: False)
  --limit LIMIT         Set a max number of pages per engine to load
  --engines [ENGINES [ENGINES ...]]
                        Engines to request (default: full list)
  --exclude [EXCLUDE [EXCLUDE ...]]
                        Engines to exclude (default: none)
  --fields [FIELDS [FIELDS ...]]
                        Fields to output to the CSV file (default: engine name
                        link); available fields are shown below
  --field_delimiter FIELD_DELIMITER
                        Delimiter for the CSV fields
  --mp_units MP_UNITS   Number of processing units (default: core number
                        minus 1)
[...]
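As a side note on the --output template above, here is a minimal sketch of how the $SEARCH and $DATE placeholders might expand. The 5-character truncation and the timestamp format are assumptions for illustration, not the tool's exact behavior:

from datetime import datetime

def expand_output_template(template, search):
    # $SEARCH -> first characters of the search string (truncation length assumed);
    # $DATE   -> current datetime (format assumed).
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return template.replace("$SEARCH", search[:5]).replace("$DATE", stamp)

print(expand_output_template("output_$SEARCH_$DATE.txt", "computer"))
# e.g. output_compu_20240101120000.txt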
To request all the engines for the word "computer":
GhostCrawler2 "computer"
To request all the engines except "Ahmia" and "Candle" for the word "computer":
GhostCrawler2 "computer" --exclude ahmia candle
To request only "Tor66", "DeepLink" and "Phobos" for the word "computer":
GhostCrawler2 "computer" --engines tor66 deeplink phobos
The same as above, but limiting the number of pages to load per engine to 3:
GhostCrawler2 "computer" --engines tor66 deeplink phobos --limit 3
Note that the list of supported engines (and their keys) is given in the script help (-h).
By default, the output file is written at the end of the process. The file is CSV formatted, with the following columns:
"engine","name of the link","url"
Features:
- Onion Crawler (.onion)
- Returns page title and address with a short description of the site
- Save links to database
- Get data from site
- Save crawl info to JSON file
- Crawl custom domains
- Check if the link is live (see the liveness-check sketch after this list)
- Built-in Updater
- Build a visual tree of link relationships that can be quickly viewed or saved to an image file
...(will be updated)
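As an illustration of the liveness check mentioned above, a request can be routed through the Tor SOCKS proxy. A minimal sketch, assuming requests with SOCKS support is installed (pip3 install requests[socks]) and Tor listening on the default 127.0.0.1:9050; this is not GhostCrawler's actual implementation:

import requests  # SOCKS support required: pip3 install requests[socks]

# socks5h:// makes the proxy resolve host names, which is required for .onion addresses.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def is_live(url, timeout=30):
    """Return True if the URL answers through the Tor SOCKS proxy."""
    try:
        response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
        return response.status_code < 500
    except requests.RequestException:
        return False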
Dependencies:
- Tor
- Python ^3.8
- Golang 1.19
- Poetry
(see requirements.txt for more details)
- https://github.com/KingAkeem/gotor (This service needs to be run in tandem with GhostCrawler)
Before you run GhostCrawler, make sure the following steps are completed:
- Run the tor service:
sudo service tor start
- Make sure your torrc is configured with SocksPort 9050 (Tor's default SOCKS port on localhost)
- Clone the repository:
git clone https://github.com/Pandya-mayur/GhostCrawler.git
cd GhostCrawler
- Open a new terminal and run:
cd gotor && go run cmd/main/main.go -server
- Install GhostCrawler's Python requirements using Poetry:
poetry install # to install dependencies
poetry run python run.py -u https://www.example.com --depth 2 -v # example of running command with poetry
poetry run python run.py -h # for help
usage: Gather and analyze data from Tor sites.

optional arguments:
  -h, --help            show this help message and exit
  --version             Show current version of TorBot.
  --update              Update TorBot to the latest stable version
  -q, --quiet
  -u URL, --url URL     Specify a website link to crawl
  -s, --save            Save results in a file
  -m, --mail            Get e-mail addresses from the crawled sites
  -p, --phone           Get phone numbers from the crawled sites
  --depth DEPTH         Specify max depth of crawler (default 1)
  --gather              Gather data for analysis
  -v, --visualize       Visualizes tree of data gathered.
  -d, --download        Downloads tree of data gathered.
  -e EXTENSION, --extension EXTENSION
                        Specify additional website extensions to the list
                        (.com, .org, etc.)
  -c, --classify        Classify the webpage using NLP module
  -cAll, --classifyAll  Classify all the obtained webpages using NLP module
  -i, --info            Info displays basic info of the scanned site
- NOTE: -u is mandatory for crawling
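Before launching a crawl, you can sanity-check that something is listening on the default Tor SOCKS port configured above. A minimal sketch using only the standard library (an open port shows Tor is reachable, not that the circuit is healthy):

import socket

def tor_port_open(host="127.0.0.1", port=9050, timeout=5.0):
    """Return True if the Tor SOCKS port accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("Tor SOCKS port reachable:", tor_port_open())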
Using Docker:
- Ensure that you have a Tor container running on port 9050.
- Build the image using the following command (in the root directory):
docker build -f docker/Dockerfile -t pandya-mayur/ghostcrawler .
- Run the container (make sure to link the Tor container as tor):
docker run --link tor:tor --rm -ti pandya-mayur/ghostcrawler
On Linux platforms, you can create an executable for GhostCrawler by using the install.sh script.
You will need to give the script the correct permissions using chmod +x install.sh.
Now you can run ./install.sh to create the GhostCrawler binary.
Run ./GhostCrawler to execute the program.
Roadmap:
- Visualization Module Revamp
- Implement BFS search for the webcrawler (see the sketch after this list)
- Use Golang service for concurrent webcrawling
- Improve stability (handle errors gracefully, expand test coverage, etc.)
- Randomize Tor Connection (Random Header and Identity)
- Keyword/Phrase search
- Social Media Integration
- Increase anonymity
- Improve performance (Done with gotor)
- Screenshot capture
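For context, the BFS roadmap item above amounts to exploring links level by level up to a maximum depth, as with the --depth flag. A minimal sketch of the idea in plain Python (requests and BeautifulSoup are assumed available; Tor plumbing is omitted for brevity, and this is not GhostCrawler's actual implementation):

from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def bfs_crawl(start_url, max_depth=1):
    """Visit pages level by level up to max_depth; return every URL seen."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue
        try:
            html = requests.get(url, timeout=30).text  # add Tor proxies= for .onion targets
        except requests.RequestException:
            continue
        for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

print(len(bfs_crawl("https://www.example.com", max_depth=2)))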
I would like to express my gratitude to all the team members of The Linux boys, and to all the open source projects that have made this tool possible and have made recon tasks easier to accomplish.