A very simple example of using RethinkDB along with the Extraction library to build a simple crawler.

RethinkDB-Extraction

An extremely simple example of using RethinkDB in Python, along with the extraction module to create a database of crawled HTML pages and the extracted data.

Usage

There isn't much here beyond syntax examples. To run them, first install RethinkDB (the installation instructions are very easy to follow), and then do this:

git clone https://github.com/lethain/rethinkdb-extraction.git
cd rethinkdb-extraction
virtualenv .
. ./bin/activate
pip install -r requirements.txt
python rethinkdb_extraction/crawl.py

That will crawl a few pages and load them into a local RethinkDB instance.
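For readers who want a feel for what "load them into a local RethinkDB instance" involves, here is a minimal sketch. It is not the contents of crawl.py, which is not reproduced here. The names build_page_document, store_page, TitleParser, crawl_example and pages are all hypothetical, and the standard-library title parser is only a stand-in for the richer metadata the extraction module would produce. The import style assumes version 2.4 or later of the RethinkDB Python driver; older drivers use "import rethinkdb as r" instead.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (stand-in for extraction)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_page_document(url, html):
    """Build the dict to store: the raw HTML plus the data extracted from it."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url, "html": html, "title": parser.title.strip()}


def store_page(doc, db_name="crawl_example", table_name="pages"):
    """Insert one document into a local RethinkDB instance.

    Assumption: a server is listening on localhost:28015, the default
    client driver port. The database and table names are hypothetical.
    """
    # Imported lazily so the rest of the module works without the driver.
    from rethinkdb import RethinkDB  # driver 2.4+ import style

    r = RethinkDB()
    conn = r.connect(host="localhost", port=28015)
    try:
        # Create the database and table on first use.
        if db_name not in r.db_list().run(conn):
            r.db_create(db_name).run(conn)
        if table_name not in r.db(db_name).table_list().run(conn):
            r.db(db_name).table_create(table_name).run(conn)
        # insert() returns a summary dict that includes an "inserted" count.
        return r.db(db_name).table(table_name).insert(doc).run(conn)
    finally:
        conn.close()
```

With a server running, store_page(build_page_document(url, html)) would store one crawled page. build_page_document on its own needs neither the driver nor a server, so it can be tried in isolation.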