magoole/scanner

Magoole finder and crawler

Navigates the internet to find and crawl websites.

This repository uses DNS queries, brute-force-style domain search, crawling and other techniques to find websites, then crawls them and references them on Magoole.

Table of contents

Credits and acknowledgement

How does it work?

The Magoole scanner is divided into two parts:

  • The finder: collects a list of website URLs
  • The crawler: crawls and analyses website content

Finder

Finder files are located in the finder/ folder.

  1. Given a list of domain extensions, the finder brute-forces candidate domain names and checks each one.
  2. If a domain is found and a web server is running on it, the URL is added to a crawling queue.
  3. The finder then checks every subdomain of the domain and applies the same tests.
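
The brute-force step above can be sketched roughly as follows. This is a minimal illustration and not the code in finder/: the names ALPHABET, candidate_domains, resolves and find_domains are hypothetical, the character set is an assumption, and the web server check and subdomain pass are left out. It only tests whether a candidate name resolves through the system resolver.

```python
import itertools
import socket
import string

# Assumed character set for generated labels (hyphens are left out for simplicity).
ALPHABET = string.ascii_lowercase + string.digits


def candidate_domains(extensions, max_label_length):
    """Yield every label of 1..max_label_length characters, once per extension."""
    for length in range(1, max_label_length + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            label = "".join(chars)
            for ext in extensions:
                yield label + ext


def resolves(domain):
    """Return True if the system resolver finds an address for the domain."""
    try:
        socket.getaddrinfo(domain, 80)
        return True
    except socket.gaierror:
        return False


def find_domains(extensions, max_label_length, resolver=resolves):
    """Collect the candidates that resolve; these would feed the crawling queue."""
    return [d for d in candidate_domains(extensions, max_label_length)
            if resolver(d)]
```

The resolver is passed in as a parameter so the search can be tried without network access, for example find_domains([".fr"], 1, resolver=lambda d: d == "a.fr"). Note that the number of candidates grows as 36^n with the label length, which is why the real finder needs limits and threading.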

Crawler

Crawler files are located in the crawler/ folder.

  1. The crawler takes each queued website and performs HTTP requests to retrieve its HTML pages.
  2. It tokenizes the page content and indexes it using BM25.
  3. Meta tags are analysed so that images and other media can be referenced as well!
  4. It makes its results available through a database.
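
To make the BM25 step more concrete, here is a small pure-Python sketch of Okapi BM25 scoring over a handful of page texts. It is an illustration under assumptions, not the crawler's implementation: the BM25Index class, the tokenize helper and the parameter values k1=1.5 and b=0.75 are hypothetical choices, and the repository may rely on a library instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        # Per-document term frequencies and lengths (in tokens).
        self.term_counts = [Counter(tokenize(doc)) for doc in documents]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_length = (sum(self.lengths) / max(len(self.lengths), 1)) or 1.0
        # Number of documents containing each term.
        self.doc_freq = Counter()
        for counts in self.term_counts:
            self.doc_freq.update(counts.keys())

    def idf(self, term):
        """Inverse document frequency, in the always-positive '+1' variant."""
        n = len(self.term_counts)
        df = self.doc_freq.get(term, 0)
        return math.log((n - df + 0.5) / (df + 0.5) + 1)

    def score(self, query, doc_id):
        """BM25 score of one document for the query."""
        counts = self.term_counts[doc_id]
        norm = 1 - self.b + self.b * self.lengths[doc_id] / self.avg_length
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            total += self.idf(term) * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return total

    def search(self, query):
        """Return ids of matching documents, best score first."""
        scores = [(self.score(query, i), i) for i in range(len(self.term_counts))]
        return [i for s, i in sorted(scores, key=lambda p: -p[0]) if s > 0]
```

For example, with pages = ["Magoole crawler indexes web pages", "Cooking recipes for pasta", "The crawler follows links between pages"], BM25Index(pages).search("pasta") returns only the second page. When a term appears equally often in two pages, the shorter page ranks higher because of the length normalisation controlled by b.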

Try locally:

  1. Install the requirements

PyPI:

pip install -r requirements.txt

  2. Set up your MongoDB credentials or URL
    • If you have an account on the Magoole MongoDB, just run:
    touch .mongopass && echo "username:password" > .mongopass
    • Or modify the MongoDB server URL in finder/main.py and crawler/main.py, at the line client = pymongo.MongoClient(f"mongodb+srv://yourmongoserver")

  3. Run the Python file you want (finder/main.py or crawler/main.py)

cd /path/to/repo && python3 finder/main.py

  4. Happy scanning!
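
As a rough idea of how a "username:password" file could be turned into a connection string, here is a hedged sketch. The function name mongo_uri_from_file and the host cluster.example.net are hypothetical, and the way the repository actually reads .mongopass may differ. Credentials are percent-encoded because characters such as @ or / would otherwise break the URI.

```python
from pathlib import Path
from urllib.parse import quote_plus


def mongo_uri_from_file(path, host):
    """Build a mongodb+srv URI from a file holding 'username:password'."""
    line = Path(path).read_text(encoding="utf-8").strip()
    # Split on the first colon only, so the password may contain colons.
    username, _, password = line.partition(":")
    if not username or not password:
        raise ValueError("expected 'username:password' in " + str(path))
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"
```

The resulting string could then be passed to pymongo.MongoClient, for instance pymongo.MongoClient(mongo_uri_from_file(".mongopass", "cluster.example.net")), with the host replaced by your own server.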

You can modify finder/config.json to configure search parameters:

{
   "DNS": {
      "domain_max_length": 253,
      "subdomains": false,
      "records": ["A", "AAAA"],
      "nameservers": ["1.1.1.1"],
      "max_recursion": 10
   },

   "THREADING": {
      "enabled": true,
      "threads": 8
   },
   "DOMAIN_EXTENSIONS": [".fr", ".com", ".eu.org", ".tech", ".info", ".dev"]
}
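
The sketch below shows one way such a configuration could be consumed. It is an assumption about the meaning of the keys, not the finder's actual code: load_config and check_all are hypothetical helpers, and the reading of THREADING as "use this many worker threads when enabled" is a guess based on the key names.

```python
import json
from concurrent.futures import ThreadPoolExecutor

EXAMPLE_CONFIG = """
{
   "DNS": {
      "domain_max_length": 253,
      "subdomains": false,
      "records": ["A", "AAAA"],
      "nameservers": ["1.1.1.1"],
      "max_recursion": 10
   },
   "THREADING": {"enabled": true, "threads": 8},
   "DOMAIN_EXTENSIONS": [".fr", ".com", ".eu.org", ".tech", ".info", ".dev"]
}
"""


def load_config(text):
    """Parse the JSON text and reject an out-of-range domain length."""
    config = json.loads(text)
    # 253 characters is the longest domain name allowed in textual form.
    if not 1 <= config["DNS"]["domain_max_length"] <= 253:
        raise ValueError("domain_max_length must be between 1 and 253")
    return config


def check_all(domains, check, config):
    """Apply check() to every domain, in parallel when threading is enabled."""
    domains = list(domains)
    threading_cfg = config["THREADING"]
    workers = threading_cfg["threads"] if threading_cfg["enabled"] else 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(domains, pool.map(check, domains)))
```

In practice the text would come from finder/config.json, for example load_config(open("finder/config.json").read()), and check would be a function performing the DNS lookup for one domain.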
