django-spider

a multi-threaded spider with a web interface

http://charlesleifer.com/media/images/photos/grass-spider.png

list of sessions for a site

http://charlesleifer.com/media/images/photos/spider_session.png

session detail

http://charlesleifer.com/media/images/photos/spider_detail.png

dependencies: django, httplib2, lxml, and django-utils (djutils)

running

first, make sure you pip install the requirements:

pip install httplib2
pip install lxml
pip install -e git+https://github.com/coleifer/django-utils.git#egg=djutils
pip install -e git+https://github.com/coleifer/django-spider.git#egg=spider

add djutils and spider to INSTALLED_APPS in your settings file, then run manage.py syncdb to create their database tables.
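as a rough sketch, the relevant part of a settings file might look like the fragment below. only the djutils and spider entries come from this README; the django.contrib apps listed are assumptions about a typical project that uses the admin, so adjust them to match your own.

```python
# settings.py (fragment) -- an illustrative sketch, not the project's own settings
INSTALLED_APPS = [
    # assumed: contrib apps commonly enabled when the admin is in use
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.admin',
    # from this README: the two apps the spider needs
    'djutils',
    'spider',
]
```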

add spider.urls to your root urlconf:

from django.conf import settings
from django.conf.urls.defaults import *
from django.contrib import admin

admin.autodiscover()

urlpatterns = patterns('',
    url(r'^admin/', include(admin.site.urls)),
    url(r'', include('spider.urls')),
)

make sure the media in the spider app is copied into your static media directory.
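if you would rather script the copy than do it by hand, a minimal sketch using only the standard library follows. the helper name copy_app_media and both directory arguments are hypothetical: this README does not say where the spider app keeps its media or where your static directory lives, so pass in whatever paths apply to your install.

```python
import os
import shutil


def copy_app_media(app_media_dir, static_dir):
    """Copy every file under app_media_dir into static_dir, keeping subfolders.

    Both paths are supplied by the caller; nothing here is specific to spider.
    Returns the sorted list of copied paths, relative to app_media_dir.
    """
    copied = []
    for root, _dirs, files in os.walk(app_media_dir):
        # path of the current folder relative to the media root ('.' at the top)
        rel = os.path.relpath(root, app_media_dir)
        target = os.path.normpath(os.path.join(static_dir, rel))
        if not os.path.isdir(target):
            os.makedirs(target)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(target, name))
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)
```

existing files with the same name in the static directory are overwritten, so point it at a directory where that is acceptable.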

start up the task queue:

# assumes your cwd is the root dir of your virtualenv
export DJANGO_SETTINGS_MODULE=mysite.settings
./bin/python ./src/djutils/djutils/queue/bin/consumer.py start -l ./logs/queue.log -p ./run/queue.pid