Async Python 3.6+ web scraping micro-framework based on asyncio.
Latest commit 1ae3d76 May 26, 2019



Overview

Ruia is an async web scraping micro-framework, written with asyncio and aiohttp, that aims to make crawling URLs as convenient as possible.

Write less, run faster.

Features

  • Easy: Declarative programming
  • Fast: Powered by asyncio
  • Extensible: Middlewares and plugins
  • Powerful: JavaScript support
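
To illustrate the "Fast: Powered by asyncio" point, here is a minimal sketch of concurrent fetching using only the standard library. It does not use Ruia's API: `fake_fetch`, `crawl`, and the `example.com` URLs are hypothetical names for illustration, and the network request is simulated with `asyncio.sleep`.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an HTTP request; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl(urls):
    """Run all fetches concurrently and return the pages in input order."""
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def main():
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    start = time.perf_counter()
    # An explicit event loop keeps the sketch usable on Python 3.6.
    loop = asyncio.new_event_loop()
    try:
        pages = loop.run_until_complete(crawl(urls))
    finally:
        loop.close()
    elapsed = time.perf_counter() - start
    print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
    return pages, elapsed


if __name__ == "__main__":
    main()
```

Because the simulated requests overlap, the total time should be close to the delay of a single request rather than the sum of all five.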

Installation

# For Linux & Mac
pip install -U ruia[uvloop]

# For Windows
pip install -U ruia

# Latest development version (from GitHub)
pip install git+https://github.com/howie6879/ruia
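
After installing, a quick way to confirm which packages are available is to ask Python whether they can be imported. This is a small sketch using the standard library only; `is_installed` is a hypothetical helper name, and the check assumes the packages are importable as `ruia` and `uvloop`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # uvloop is optional and is not expected to be present on Windows.
    for name in ("ruia", "uvloop"):
        print(name, "installed" if is_installed(name) else "missing")
```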

Tutorials

  1. Overview
  2. Installation
  3. Define Data Items
  4. Spider Control
  5. Request & Response
  6. Customize Middleware
  7. Write a Plugin

TODO

  • Cache for debugging, to reduce repeated requests
  • Distributed crawling/scraping

Contribution

Ruia is still under development; feel free to open issues and pull requests:

  • Report or fix bugs
  • Request or publish plugins
  • Write or fix documentation
  • Add test cases

Thanks
