# Rotto-Links-Scraper

A web crawler/scraper that, starting from a target seed URL, finds broken links on pages whose content matches the given keywords.
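
The keyword-filtered broken-link check can be sketched as follows. This is a minimal illustration, not the project's implementation. It assumes a page counts as relevant when any keyword appears in its text, and that a link is broken when checking it yields an HTTP status of 400 or above (or the request fails). `LinkExtractor`, `find_broken_links`, and the `get_status` hook are hypothetical names; the status lookup is passed in as a function so the logic runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag plus all character data on the page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def find_broken_links(page_url, html, keywords, get_status):
    """Return absolute URLs of broken links, but only if the page matches a keyword.

    get_status(url) is a hypothetical hook returning an HTTP status code,
    or None when the request failed; a real crawler would fetch the URL here.
    """
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered input
    text = " ".join(parser.text_parts).lower()
    # Ignore pages that mention none of the keywords (case-insensitive).
    if not any(kw.lower() in text for kw in keywords):
        return []
    broken = []
    for href in parser.links:
        url = urljoin(page_url, href)
        # Only HTTP(S) links can be checked; skip mailto:, javascript:, etc.
        if urlparse(url).scheme not in ("http", "https"):
            continue
        status = get_status(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken


# Usage with canned status codes instead of real HTTP requests.
statuses = {"https://example.com/ok": 200, "https://example.com/gone": 404}
page = '<p>python tutorials</p><a href="/ok">ok</a><a href="/gone">gone</a>'
print(find_broken_links("https://example.com/", page, ["python"], statuses.get))
# ['https://example.com/gone']
```

Injecting `get_status` keeps the filtering logic separate from networking, so the same function could be driven by real HTTP requests in a worker.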

## Installation

The following must be installed first:

  1. Redis
  2. Fabric
  3. Python 2.7+

## Instructions

  1. Install all dependencies listed in requirements.txt with the pip package manager:
    $ pip install -r requirements.txt
  1. Set the DATABASE_PATH environment variable in your shell config file (e.g. .bashrc or .zshrc):
    # your shell config file
    export DATABASE_PATH='/path/to/database/'
  1. Set the two environment variables holding the SMTP credentials used to send email to users:
    # your shell config file
    export SMTP_USER='smtp-username'
    export SMTP_PASSWORD='smtp-password'
  1. Set the LOGS_DIR environment variable to the directory where the app should save its logs:
    # your shell config file
    export LOGS_DIR='path/to/logs'
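
A small startup check can catch a missing variable before the app, dispatcher, or worker runs. The sketch below is an assumption, not code from this repository: `load_settings` and `REQUIRED_VARS` are hypothetical names, and only the four variable names come from the steps above.

```python
import os

# Variable names from the setup steps; everything else here is illustrative.
REQUIRED_VARS = ("DATABASE_PATH", "SMTP_USER", "SMTP_PASSWORD", "LOGS_DIR")


def load_settings(environ=None):
    """Read the required settings, failing fast if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Accepting an optional `environ` mapping lets the check be tested without touching the real environment. Note that edits to .bashrc or .zshrc only apply to new shells, or after running `source` on the file.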

## Commands

Note: Fabric must be installed before running the commands below.

To run the GUI app:

    $ fab app

To run the dispatcher:

    $ fab dispatcher

To run a worker:

    $ fab worker
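
This README does not describe how the dispatcher and workers cooperate. Since Redis is a prerequisite, one common arrangement is a dispatcher that pushes crawl jobs onto a shared queue and workers that pop and process them. The sketch below illustrates only that general pattern and is an assumption, not the project's design: it uses Python's in-process `queue.Queue` and threads as a stand-in for Redis and separate processes, and `dispatcher`, `worker`, `run`, and `process` are hypothetical names.

```python
import queue
import threading

STOP = None  # sentinel telling a worker to exit


def dispatcher(urls, jobs, num_workers):
    """Enqueue one job per URL, then one stop sentinel per worker."""
    for url in urls:
        jobs.put(url)
    for _ in range(num_workers):
        jobs.put(STOP)


def worker(jobs, results, process):
    """Take jobs until the stop sentinel arrives, recording each result."""
    while True:
        url = jobs.get()
        if url is STOP:
            break
        results.put((url, process(url)))


def run(urls, process, num_workers=3):
    """Start the workers, dispatch all URLs, and collect results as a dict."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results, process))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    dispatcher(urls, jobs, num_workers)
    for t in threads:
        t.join()
    # All workers have exited, so qsize() is now exact.
    return dict(results.get() for _ in range(results.qsize()))


print(run(["a", "bb", "ccc"], len, num_workers=2))
```

In a Redis-backed version, `jobs.put` and `jobs.get` would roughly correspond to the LPUSH and BRPOP list commands, which would let a separately started `fab dispatcher` process and several `fab worker` processes share one queue.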

## Developers

  1. Akshay Pratap Singh
  2. Sunny Gupta