branch: master
Commits on Jun 29, 2015
  1. Merge pull request #5 from chrisv2/improve_auth

    improve authentication
Commits on Feb 8, 2013
  1. @chrisv2
  2. @chrisv2
Commits on Mar 28, 2011
  1. @acdha

    Quick hack for Django 1.3 staticfiles support

    acdha authored
    Django 1.3 introduces staticfiles, which will raise an ImproperlyConfigured
    exception if you request a static file when settings.DEBUG is false. This
    avoids wasting time serving static media files, but it would be nice if we could
    find an approach which would still allow us to check file existence.
Commits on Nov 4, 2010
  1. Added mailing list info.

Commits on Nov 3, 2010
  1. Small doc refactor.

  2. Small doc updates.

  3. Added support for authentication.

    Applies 2f7c79a3bd2ae6896b37 from test-utils.
Commits on Sep 24, 2010
  1. Fix docs.

Commits on Sep 10, 2010
  1. @acdha

    Bonehead typo removal

    acdha authored
    Let's hear it for this testing business. I hear it can save time…
  2. Fix imports.

  3. Fix

Commits on Sep 9, 2010
  1. @acdha

    Crawler: init request for accurate memory tracking

    acdha authored
    The very first URL requested will cause a big (2+MB) memory delta as some delayed loading happens. To avoid skewing memory usage reports the client will make a single initial request before the actual monitored spidering.
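A minimal sketch of the warm-up idea described above, not the crawler's actual code: one unmeasured request is made first, so any delayed loading is charged to it rather than to the first monitored URL. `RecordingClient`, `crawl_with_warmup`, and the `measure` callable are hypothetical names used only for illustration.

```python
class RecordingClient:
    """Hypothetical stand-in for a test client; records requested URLs."""

    def __init__(self):
        self.requests = []

    def get(self, url):
        self.requests.append(url)
        return "ok"


def crawl_with_warmup(client, urls, measure):
    """Request the first URL once unmeasured, then measure every URL."""
    client.get(urls[0])  # warm-up request, deliberately not measured
    deltas = {}
    for url in urls:
        before = measure()
        client.get(url)
        deltas[url] = measure() - before
    return deltas


client = RecordingClient()
# measure is a placeholder here; a real run would report memory usage.
deltas = crawl_with_warmup(client, ["/", "/a/"], measure=lambda: 0)
print(client.requests)
```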
  2. @acdha

    Crawler: added a no-parent option to avoid ascending

    acdha authored
    This allows faster runs: for example, `crawlurls --no-parent /subpage/` skips URLs which do not start with /subpage/
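The prefix check behind a no-parent option can be sketched as follows. This is an illustration under assumptions, not the crawler's implementation; `filter_no_parent` is a hypothetical helper name.

```python
def filter_no_parent(links, base_path):
    """Keep only links that start with base_path (never ascend above it)."""
    return [link for link in links if link.startswith(base_path)]


found = ["/subpage/", "/subpage/detail/", "/other/", "/"]
kept = filter_no_parent(found, "/subpage/")
print(kept)
```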
  3. @acdha

    Crawler: adjusted log level for link crawling

    acdha authored
    This avoids tons of console output by default
  4. @acdha
  5. @acdha

    Crawler: query_count log file

    acdha authored
  6. @acdha

    Crawler: guppy plugin now uses human-readable sizes

    acdha authored
    The CSV file will still display bytes but now the console output is
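One common way to turn a byte count into a human-readable size is sketched below. The function name `human_size`, the 1024-based units, and the one-decimal format are assumptions; the plugin's real formatting may differ.

```python
def human_size(num_bytes):
    """Format a byte count (possibly a negative delta) with a unit suffix."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if abs(size) < 1024.0:
            return "%.1f %s" % (size, unit)
        size /= 1024.0
    return "%.1f TB" % size


print(human_size(2 * 1024 * 1024))
```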
  7. @acdha

    Crawler: uniform system for saving output

    acdha authored
    This provides a simple --output-dir option which all plugins can use to save data as appropriate. It would still benefit from an easy way for plugins to have their own configuration when necessary.
    This introduces a set_output_dir() method on Plugin which subclasses may use to open log files or otherwise initialize their output system - see the guppy plugin for a simple example.
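A subclass overriding set_output_dir() might look roughly like this. The `Plugin` class here is a bare stand-in, and `UrlLogPlugin`, its `record()` method, the `urls.log` file name, and the method signature are all hypothetical; only the set_output_dir() name comes from the commit message.

```python
import os
import tempfile


class Plugin:
    """Stand-in for the crawler's Plugin base class (assumed interface)."""

    output_dir = None

    def set_output_dir(self, output_dir=None):
        self.output_dir = output_dir


class UrlLogPlugin(Plugin):
    """Hypothetical plugin that writes one line per crawled URL."""

    log_file = None

    def set_output_dir(self, output_dir=None):
        super().set_output_dir(output_dir)
        if output_dir:
            # Open the log only when an output directory was requested.
            self.log_file = open(os.path.join(output_dir, "urls.log"), "w")

    def record(self, url):
        if self.log_file:
            self.log_file.write(url + "\n")
            self.log_file.flush()


out = tempfile.mkdtemp()
plugin = UrlLogPlugin()
plugin.set_output_dir(out)
plugin.record("/subpage/")
```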
  8. @acdha

    Crawler: guppy plugin simplification

    acdha authored
    Since we now only load the guppy plugin when the user requested it, it's
    better just to toss an import error if we can't load the guppy module.
  9. @acdha

    Crawler: better plugin activation mechanism

    acdha authored
    To avoid everything needing to be listed in this introduces a simple change: more things are disabled by default and each plugin module has a PLUGIN attribute to simplify loading with --enable-plugins.
    Now enabled by default: time, pdb, urlconf
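Loading plugins through a module-level PLUGIN attribute could be sketched like this. The module name `demo_plugins_time`, the `load_plugins` helper, and the prefix scheme are assumptions made so the example is self-contained; the fake module is registered in `sys.modules` only to stand in for a real plugin module.

```python
import importlib
import sys
import types


class Time:
    """Hypothetical plugin class exposed by a plugin module."""


# Register a fake plugin module so the example runs without the real package.
fake_module = types.ModuleType("demo_plugins_time")
fake_module.PLUGIN = Time
sys.modules["demo_plugins_time"] = fake_module


def load_plugins(names, prefix="demo_plugins_"):
    """Import each named module and instantiate its PLUGIN attribute."""
    plugins = []
    for name in names:
        module = importlib.import_module(prefix + name)
        plugins.append(module.PLUGIN())
    return plugins


loaded = load_plugins(["time"])
print([type(p).__name__ for p in loaded])
```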
Commits on Sep 8, 2010
  1. @acdha

    crawler: disable sanitize module by default

    acdha authored
    BeautifulSoup is useful but slow. This allows you to avoid the cost
    unless you choose to use --enable-plugin=sanitize
  2. @acdha

    Crawler: added a query count plugin

    acdha authored
    This plugin logs at varying levels of severity the number of database
    queries executed by a view: crawlurls --enable-plugin=query_count
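Mapping a query count to a log severity can be sketched as below. The thresholds (10, 25, 50) and the names `THRESHOLDS`, `level_for_query_count`, and `log_query_count` are illustrative assumptions, not the plugin's actual values.

```python
import logging

# Assumed thresholds, highest first: (minimum query count, log level).
THRESHOLDS = [
    (50, logging.ERROR),
    (25, logging.WARNING),
    (10, logging.INFO),
]


def level_for_query_count(count):
    """Return the severity for a view that executed `count` queries."""
    for minimum, level in THRESHOLDS:
        if count >= minimum:
            return level
    return logging.DEBUG


def log_query_count(logger, url, count):
    """Log the query count for a URL at the matching severity."""
    logger.log(level_for_query_count(count), "%s ran %d queries", url, count)


logging.basicConfig(level=logging.DEBUG)
log_query_count(logging.getLogger("query_count"), "/subpage/", 30)
```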