
Web Crawler


Description

Simple web crawling package.

Starts crawling from the given domain_url. It visits every HTML link within the specified domain over HTTP/HTTPS, collects each page's title and links, and repeats the process for the collected links.

Returns a dictionary of dictionaries as follows:

{
    'http://0.0.0.0:8000': {
        'title': 'Index',
        'links': {'http://0.0.0.0:8000/example.html', 'http://0.0.0.0:8000/site.html'}
    },
    ...
}
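
The crawl is essentially a breadth-style traversal of same-domain links. The sketch below illustrates that idea using requests and BeautifulSoup; it is an assumption for illustration only, not the package's actual implementation, and the crawl function name is hypothetical.

# Illustrative sketch only -- assumes requests and beautifulsoup4 are installed;
# this is not the web_crawler package's internal code.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(domain_url):
    """Return {url: {'title': ..., 'links': {...}}} for pages under domain_url."""
    domain = urlparse(domain_url).netloc
    result = {}
    to_visit = {domain_url}

    while to_visit:
        url = to_visit.pop()
        if url in result:
            continue  # already collected
        try:
            response = requests.get(url, timeout=5)
        except requests.RequestException:
            continue  # skip unreachable pages
        soup = BeautifulSoup(response.text, 'html.parser')
        # keep only links that stay within the starting domain
        links = {
            urljoin(url, a['href'])
            for a in soup.find_all('a', href=True)
            if urlparse(urljoin(url, a['href'])).netloc == domain
        }
        result[url] = {
            'title': soup.title.string if soup.title else '',
            'links': links,
        }
        to_visit |= links - result.keys()  # queue newly discovered pages

    return result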

Installation

pip install git+https://github.com/myslak71/web_crawler.git

Usage

In scripts:

from web_crawler import site_map
site_map(url)
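
A slightly fuller script example, assuming site_map returns the mapping shown above; the URL here is only illustrative:

from web_crawler import site_map

# Crawl a locally served test site (example URL).
pages = site_map('http://0.0.0.0:8000')

for url, data in pages.items():
    print(url, '-', data['title'])
    for link in data['links']:
        print('    links to:', link)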

CLI:

$ web-crawler --url URL

OPTION        REQUIRED   DESCRIPTION
-u, --url     yes        Domain URL to start crawling from
-h, --help    no         Show help
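
For example, to start crawling a locally served site (the URL is illustrative):

$ web-crawler --url http://0.0.0.0:8000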
