Commits on Mar 8, 2012
Commits on Mar 7, 2012
Commits on Mar 6, 2012
  1. merge

    committed Mar 6, 2012
Commits on Jan 20, 2012
  1. Merge branch 'next'

    chriskite committed Jan 20, 2012
  2. bump version to 0.7.1

    chriskite committed Jan 20, 2012
  3. switch from robots to robotex

    chriskite committed Jan 20, 2012
  4. Merge pull request #44 from bernd/kyoto-filename-fix

    Fix default filename extension for the kyotocabinet adapter.
    chriskite committed Jan 20, 2012
  5. Merge branch 'next'

    chriskite committed Jan 20, 2012
  6. bump version to 0.7.0

    chriskite committed Jan 20, 2012
  7. add some contributors

    chriskite committed Jan 20, 2012
  8. URIs can contain invalid characters if they are built by the website using article titles or similar.

    Such links work when clicked in a browser, but they result in an invalid link when crawled by Anemone.
    
    I added URI.escape to avoid that. Tested on about 20 websites without issues.
    lpradovera committed with chriskite Apr 23, 2011
  9. Add support for base HTML element

    Add support in the Page's to_absolute method for the base HTML element.
    This way it can correctly convert relative links for a given page
    document.
    brutuscat committed with chriskite Nov 29, 2011
  10. Merge pull request #35 from spk/dev_dependencies

    Dev dependencies
    chriskite committed Jan 20, 2012
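
The two link-handling commits in the list above (escaping invalid characters in URIs, and honoring the base HTML element when converting relative links) can be illustrated together. The following is only a minimal sketch under stated assumptions, not Anemone's actual Page#to_absolute: the helper signature and the ESCAPER constant are hypothetical, the base href is passed in as a plain string instead of being read from the parsed document, and URI::RFC2396_Parser#escape stands in for the URI.escape call named in the commit message, since URI.escape was removed in Ruby 3.0.

```ruby
require 'uri'

# Hypothetical stand-in for the URI.escape call mentioned in the commit.
# RFC2396_Parser#escape percent-encodes characters that are not allowed
# in a URI, such as spaces.
ESCAPER = URI::RFC2396_Parser.new

# Hypothetical helper, not Anemone's real method.
# link      - href value found on the page (may be relative, may contain spaces)
# page_url  - absolute URL the page was fetched from
# base_href - value of the page's <base href="...">, or nil if absent
def to_absolute(link, page_url, base_href = nil)
  # The base href may itself be relative, so resolve it against the page URL.
  base = base_href ? URI.join(page_url, ESCAPER.escape(base_href)).to_s : page_url
  # Escape the link first so URI.join does not reject it as invalid.
  URI.join(base, ESCAPER.escape(link)).to_s
end
```

For instance, `to_absolute("My Article.html", "http://example.com/news/index.html")` resolves against the page URL, while passing a third argument such as `"/static/"` resolves the link against that base instead.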
Commits on Oct 4, 2011
  1. Add bson_ext 1.3.1 on dev dependencies

    Signed-off-by: Laurent Arnoud <laurent@spkdev.net>
    spk committed Oct 4, 2011
  2. Use gemspec on Gemfile and add development dependencies

    Signed-off-by: Laurent Arnoud <laurent@spkdev.net>
    spk committed Oct 4, 2011
Commits on Sep 4, 2011
  1. removed 'monitor' requirement

    pokey909 committed Sep 4, 2011
  2. Added some documentation and code cleanup

    External queue storage can be activated via new option :large_scale_crawl
    
    Signed-off-by: Alexander Lenhardt <alenhard@techfak.uni-bielefeld.de>
    pokey909 committed Sep 4, 2011
  3. Fixed issues:

    - OutOfMemory caused by large link/page queues. Added a thread-safe ExtQueue class that swaps to disk when too much memory is consumed.
    - Improved threading. Most worker threads sat idle when launched simultaneously.
    
    Signed-off-by: Alexander Lenhardt <alenhard@techfak.uni-bielefeld.de>
    pokey909 committed Sep 4, 2011
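
The idea of a thread-safe queue that swaps to disk, as described in the commit above, might look roughly like the sketch below. It is an illustration only and makes no claim about how ExtQueue is actually written: the class name SpillQueue, the item-count threshold (used here in place of measuring memory), and the length-prefixed Marshal record format are all assumptions.

```ruby
require 'tempfile'

# Hypothetical sketch of a FIFO queue that keeps at most max_in_memory
# items in RAM and appends the overflow to a temporary file.
class SpillQueue
  def initialize(max_in_memory = 1000)
    @max = max_in_memory
    @memory = []
    @mutex = Mutex.new
    @file = Tempfile.new('spill_queue')
    @file.binmode
    @read_pos = 0   # byte offset of the next unread record on disk
    @spilled = 0    # number of unread records on disk
  end

  def push(item)
    @mutex.synchronize do
      # Once anything is on disk, keep writing to disk to preserve FIFO order.
      if @spilled.zero? && @memory.size < @max
        @memory << item
      else
        data = Marshal.dump(item)
        @file.seek(0, IO::SEEK_END)
        @file.write([data.bytesize].pack('N')) # 4-byte length prefix
        @file.write(data)
        @spilled += 1
      end
    end
    self
  end

  # Returns the oldest item, or nil when the queue is empty (non-blocking).
  def pop
    @mutex.synchronize do
      return @memory.shift unless @memory.empty?
      return nil if @spilled.zero?

      @file.seek(@read_pos)
      len = @file.read(4).unpack('N').first
      item = Marshal.load(@file.read(len))
      @read_pos += 4 + len
      @spilled -= 1
      if @spilled.zero?
        # Everything on disk has been read; reclaim the space.
        @file.truncate(0)
        @read_pos = 0
      end
      item
    end
  end

  def size
    @mutex.synchronize { @memory.size + @spilled }
  end
end
```

A single Mutex guards both the in-memory array and the file, which keeps the sketch simple at the cost of serializing disk I/O across threads.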
Commits on Aug 31, 2011
  1. Temporary fix for OutOfMemory error.

    Occurs when crawling large sites.
    
    Issue: link_queue grows faster than threads consume links.
    
    Fix: Wait until threads have consumed enough links, then continue adding more to the queue.
    pokey909 committed Aug 31, 2011
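
The fix described above is a form of back-pressure: the producer pauses whenever the link queue gets too far ahead of the worker threads. The sketch below shows the same idea using Ruby's built-in SizedQueue, whose push blocks while the queue is full. This is a swapped-in technique chosen for illustration, not the code from the commit; the method name, the max_pending cap, and the :stop sentinel are hypothetical.

```ruby
# Hypothetical sketch: feed links to worker threads through a bounded queue.
# SizedQueue#push blocks when max_pending links are already waiting, so the
# queue cannot grow faster than the workers drain it.
def crawl_with_backpressure(links, max_pending: 3, worker_count: 2)
  link_queue = SizedQueue.new(max_pending)
  visited = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      # A real crawler would fetch the page here; this sketch only records it.
      while (link = link_queue.pop) != :stop
        visited << link
      end
    end
  end

  links.each { |link| link_queue.push(link) }    # blocks while the queue is full
  worker_count.times { link_queue.push(:stop) }  # one stop marker per worker
  workers.each(&:join)

  Array.new(visited.size) { visited.pop }
end
```

Because the workers run concurrently, the returned links are not guaranteed to be in their original order.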
Commits on Aug 12, 2011
  1. use bundler

    chriskite committed Aug 12, 2011
Commits on Apr 15, 2011
Commits on Mar 11, 2011
Commits on Mar 3, 2011
  1. add SQLite3 storage to README

    skojin committed Mar 3, 2011
  2. add SQLite3 storage

    skojin committed Mar 3, 2011
Commits on Feb 24, 2011
  1. Merge branch 'next'

    chriskite committed Feb 24, 2011
  2. bump version to 0.6.1

    chriskite committed Feb 24, 2011
Commits on Feb 22, 2011
Commits on Feb 17, 2011
  1. Merge branch 'next'

    chriskite committed Feb 17, 2011
  2. bump version to 0.6.0

    chriskite committed Feb 17, 2011