GitHub - serenity-valley/PyCrawler: A simple python web crawler

.gitignore
PyCrawler.py
README
__init__.py


usage: PyCrawler.py [-h] [--dbname DBNAME] [--followextern] [--verbose]
                    [--striphtml] [--downloadstatic]
                    starturl crawldepth


positional arguments:
  starturl          The root URL to start crawling from.
  crawldepth        Number of levels to crawl down to before quitting. Default
                    is 10.

optional arguments:
  -h, --help        show this help message and exit
  --dbname DBNAME   The db file to be created for storing crawl data.
  --followextern    Follow external links (disabled by default).
  --verbose         Be verbose while crawling.
  --striphtml       Strip HTML tags from crawled content.
  --downloadstatic  Download static content.
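
The interface above could be declared with Python's standard argparse module roughly as follows. This is a minimal sketch reconstructed from the help text only, not the actual contents of PyCrawler.py: the build_parser function name and the integer type of crawldepth are assumptions, and no default is set for --dbname because the help text does not state one.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage text (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="PyCrawler.py")

    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("starturl",
                        help="The root URL to start crawling from.")
    # Assumption: the depth is parsed as an integer.
    parser.add_argument("crawldepth", type=int,
                        help="Number of levels to crawl down to before quitting.")

    # Optional arguments. --dbname takes a value; the rest are on/off flags.
    parser.add_argument("--dbname",
                        help="The db file to be created for storing crawl data.")
    parser.add_argument("--followextern", action="store_true",
                        help="Follow external links (disabled by default).")
    parser.add_argument("--verbose", action="store_true",
                        help="Be verbose while crawling.")
    parser.add_argument("--striphtml", action="store_true",
                        help="Strip HTML tags from crawled content.")
    parser.add_argument("--downloadstatic", action="store_true",
                        help="Download static content.")
    return parser


if __name__ == "__main__":
    # Example arguments; http://example.com is a placeholder URL.
    args = build_parser().parse_args(["http://example.com", "3", "--verbose"])
    print(args.starturl, args.crawldepth, args.verbose)
```

With a parser like this, an invocation such as `python PyCrawler.py http://example.com 3 --verbose --dbname crawl.db` would crawl three levels deep from the placeholder URL and store results in crawl.db (an illustrative file name). Flags declared with store_true stay False unless passed, which matches the "disabled by default" wording for --followextern.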