my8100/scrapydweb

English | 简体中文

ScrapydWeb: Web app for Scrapyd cluster management, with support for Scrapy log analysis & visualization.

Scrapyd × ScrapydWeb × LogParser

Recommended Reading

  • How to efficiently manage your distributed web scraping projects
  • How to set up Scrapyd cluster on Heroku

Demo

scrapydweb.herokuapp.com

Features

  • Scrapyd Cluster Management
    • All Scrapyd JSON API endpoints supported
    • Group, filter and select any number of nodes
    • Execute commands on multiple nodes with just a few clicks
  • Scrapy Log Analysis
    • Stats collection
    • Progress visualization
    • Logs categorization
  • Enhancements
    • Auto packaging
    • Integrated with LogParser
    • Timer tasks
    • Monitor & Alert
    • Mobile UI
    • Basic auth for web UI

Getting Started

Prerequisites

โ— Make sure that ๐Ÿ”— Scrapyd has been installed and started on all of your hosts.

โ€ผ๏ธ Note that for remote access, you have to manually set 'bind_address = 0.0.0.0' in ๐Ÿ”— the configuration file of Scrapyd and restart Scrapyd to make it visible externally.

โฌ‡๏ธ Install

  • Use pip:
pip install scrapydweb

โ— Note that you may need to execute python -m pip install --upgrade pip first in order to get the latest version of scrapydweb, or download the tar.gz file from https://pypi.org/project/scrapydweb/#files and get it installed via pip install scrapydweb-x.x.x.tar.gz

  • Use git:
pip install --upgrade git+https://github.com/my8100/scrapydweb.git

Or:

git clone https://github.com/my8100/scrapydweb.git
cd scrapydweb
python setup.py install

โ–ถ๏ธ Start

  1. Start ScrapydWeb via command scrapydweb. (a config file would be generated for customizing settings at the first startup.)
  2. Visit http://127.0.0.1:5000 (It's recommended to use Google Chrome for a better experience.)
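The generated config file is a Python module, so customizing it means editing plain assignments. The fragment below is only a sketch of what such entries might look like: the setting names SCRAPYD_SERVERS, SCRAPYDWEB_BIND and SCRAPYDWEB_PORT and the node addresses are assumptions here, so check them against the file that your installed version actually generates.

```python
# Sketch of entries in the generated ScrapydWeb config file.
# Setting names are assumptions; confirm them against your generated file.

# Address and port that the ScrapydWeb web UI listens on.
SCRAPYDWEB_BIND = "127.0.0.1"
SCRAPYDWEB_PORT = 5000

# Scrapyd nodes to manage, one "host:port" string per node (hypothetical addresses).
SCRAPYD_SERVERS = [
    "127.0.0.1:6800",
    "192.168.0.101:6800",
]
```

After editing the file, restart ScrapydWeb so that the new values are picked up.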

๐ŸŒ Browser Support

The latest version of Google Chrome, Firefox, and Safari.

โœ”๏ธ Running the tests

View contents
$ git clone https://github.com/my8100/scrapydweb.git
$ cd scrapydweb

# To create isolated Python environments
$ pip install virtualenv
$ virtualenv venv/scrapydweb
# Or specify your Python interpreter: $ virtualenv -p /usr/local/bin/python3.7 venv/scrapydweb
$ source venv/scrapydweb/bin/activate

# Install dependent libraries
(scrapydweb) $ python setup.py install
(scrapydweb) $ pip install pytest
(scrapydweb) $ pip install coverage

# Make sure Scrapyd has been installed and started, then update the custom_settings item in tests/conftest.py
(scrapydweb) $ vi tests/conftest.py
(scrapydweb) $ curl http://127.0.0.1:6800

# '-x': stop on first failure
(scrapydweb) $ coverage run --source=scrapydweb -m pytest tests/test_a_factory.py -s -vv -x
(scrapydweb) $ coverage run --source=scrapydweb -m pytest tests -s -vv --disable-warnings
(scrapydweb) $ coverage report
# To create an HTML report, check out htmlcov/index.html
(scrapydweb) $ coverage html

๐Ÿ—๏ธ Built With

View contents

๐Ÿ“‹ Changelog

Detailed changes for each release are documented in the ๐Ÿ”— HISTORY.md.

๐Ÿ‘จโ€๐Ÿ’ป Author


my8100

๐Ÿ‘ฅ Contributors


Kaisla

ยฉ๏ธ License

This project is licensed under the GNU General Public License v3.0 - see the ๐Ÿ”— LICENSE file for details.