Async Python 3.6+ web scraping micro-framework based on asyncio.
Latest commit 205bae2 Mar 13, 2019

Overview

Ruia is an async web scraping micro-framework written with asyncio and aiohttp. It aims to make crawling URLs as convenient as possible.

Write less, run faster.

Features

  • Easy: Declarative programming
  • Fast: Powered by asyncio
  • Extensible: Middlewares and plugins
  • Powerful: JavaScript support
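
The "Fast: Powered by asyncio" point can be illustrated with a minimal sketch that uses only the standard library. It does not use Ruia's API: `fake_fetch`, `crawl`, and `run` are hypothetical names, and network latency is simulated with `asyncio.sleep` rather than real HTTP requests.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    """Stand-in for an HTTP request: wait `delay` seconds, return a fake body."""
    await asyncio.sleep(delay)
    return "<html>{}</html>".format(url)


async def crawl(urls):
    """Fetch all URLs concurrently; results come back in input order."""
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


def run(urls):
    """Drive the coroutine on a fresh event loop (works on Python 3.6+)."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(crawl(urls))
    finally:
        loop.close()


if __name__ == "__main__":
    # Placeholder URLs; nothing is actually requested over the network.
    urls = ["https://example.com/page/{}".format(i) for i in range(5)]
    started = time.perf_counter()
    pages = run(urls)
    elapsed = time.perf_counter() - started
    print(len(pages), "pages in", round(elapsed, 2), "seconds")
```

Because the simulated requests overlap while they wait, the whole batch should take roughly as long as one request rather than the sum of all of them.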

Installation

# For Linux & Mac
pip install -U ruia[uvloop]

# For Windows
pip install -U ruia

# For the latest features (install from GitHub)
pip install git+https://github.com/howie6879/ruia
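
The `uvloop` extra is offered only for Linux and Mac, so code that should run everywhere may want to fall back to the default asyncio loop when `uvloop` is missing. The sketch below shows one way to do that in your own scripts. It is not taken from Ruia's source, and `make_event_loop` is a hypothetical helper name.

```python
import asyncio


def make_event_loop():
    """Return a uvloop event loop if uvloop is installed, else a stock asyncio loop."""
    try:
        import uvloop  # present only if installed, e.g. via `ruia[uvloop]`
    except ImportError:
        return asyncio.new_event_loop()
    return uvloop.new_event_loop()


if __name__ == "__main__":
    loop = make_event_loop()
    try:
        # Run a trivial coroutine to confirm that the loop works.
        loop.run_until_complete(asyncio.sleep(0))
        print("event loop implementation:", type(loop).__module__)
    finally:
        loop.close()
```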

Tutorials

  1. Overview
  2. Installation
  3. Define Data Items
  4. Spider Control
  5. Request & Response
  6. Customize Middleware
  7. Write a Plugin

TODO

  • Cache responses for debugging, to avoid hitting request limits
  • Distributed crawling/scraping

Contribution

Ruia is still under development; feel free to open issues and pull requests:

  • Report or fix bugs
  • Request or publish plugins
  • Write or fix documentation
  • Add test cases

Thanks
