NightCrawler v1.5

NightCrawler Web Scraper

NightCrawler is a tool developed on Python 2.7 (and 3.5) that we can use as a footprinting tool against our own web pages.

The tool tries to fetch:

Enumerates and stores on a list [CHILD_LINKS] all the links finded from a ROOT node and under the same ROOT domain
A list [EMAIL_ACCOUNTS] with all the email accounts that the parser finds on the visited pages
A list [TEL_NUMS] with all the telephone numbers that the parser have found

NightCrawler is Object Oriented (a Class), so you can reuse the code for make a better (more interesting?) program. The code at this point only follows links to web pages, not to binary files, but you can reuse the data that the "crawler" object generates through the Class attributes: CHILD_LINKS, BROKEN_LINKS, EMAIL_ACCOUNTS, TEL_NUMS

Any improvement and comments to the code would be apreciate it.

Usage:

You only need to change on line 18 of the code the base url parameter named ROOT (global variable):

ROOT = "http://www.google.com/"

Now just run the script: nightcrawler.py

Requires:

Module Requests: HTTP for Humans (can be installed with the "pip" command) http://docs.python-requests.org/en/master/

Experimental: - Added nc_gui.py as a GUI for nightcrawler.py (Not finished yet!)

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
2.7		2.7
3.5		3.5
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2.7

2.7

3.5

3.5

LICENSE

LICENSE

README.md

README.md

Repository files navigation

NightCrawler v1.5

NightCrawler Web Scraper

About

Releases

Packages

Languages

License

luckyr13/nightcrawler

Folders and files

Latest commit

History

Repository files navigation

NightCrawler v1.5

NightCrawler Web Scraper

About

Resources

License

Stars

Watchers

Forks

Languages