branch: master
Commits on Apr 8, 2015
  1. @ssesha

    Merge pull request #6 from ind9/CookieCrawlFix

    ssesha authored
    Fix for cookie-based crawling
Commits on Apr 7, 2015
  1. @manojlds

    Set cookie path

    manojlds authored
  2. @manojlds
  3. @addnab

    Fix for cookie-based crawling

    addnab authored
Commits on Mar 19, 2015
  1. @manojlds
  2. @manojlds

    Trusting the ssl certificate by default

    manojlds authored
    Back ported from main crawler4j
    
    https://github.com/yasserg/crawler4j/blob/70fe6f1942427d2c054b50ad8a924b0e6c4beba3/src/main/java/edu/uci/ics/crawler4j/fetcher/PageFetcher.java#L91
Commits on Dec 11, 2014
  1. @vinothkr

    Updating version

    vinothkr authored
  2. @vinothkr

    Ideally we should just reuse the client, but the crawler does need a different site-config (maybe change the contract). We can at least be good citizens and shut it down; it may reduce the number of CLOSE_WAITs.

    vinothkr authored
Commits on Dec 10, 2014
  1. @ashwanthkumar

    bumping the crawler4j version

    ashwanthkumar authored
    contains the workaround for meta refresh property
  2. @ashwanthkumar

    workaround for extracting meta refresh property

    ashwanthkumar authored
    For some reason, Tika's HTMLParser captures http-equiv as the name property in meta tags. Upgrading to the latest Tika didn't help either.
    
    Added a test for now; this still needs to be looked into.
Commits on Dec 9, 2014
  1. @ashwanthkumar
  2. @ashwanthkumar
  3. @ashwanthkumar
Commits on Sep 6, 2014
  1. @phoenix24
  2. @phoenix24
Commits on Aug 19, 2014
  1. @sattiwari

    Updated version

    sattiwari authored
  2. @sattiwari

    Revert "Revert "removed thread-pools/pools-client-manager from page-fetcher.""

    sattiwari authored
    This reverts commit 03072ac.
Commits on Aug 13, 2014
  1. @sattiwari

    Updating version

    sattiwari authored
  2. @sattiwari
Commits on Jul 10, 2014
  1. @sattiwari

    Updating version

    sattiwari authored
  2. @sattiwari

    Add link tag to end element

    sattiwari authored
  3. @sattiwari
  4. @sattiwari

    Updating version

    sattiwari authored
  5. @sattiwari
  6. @sattiwari
Commits on Jul 9, 2014
  1. @sattiwari

    Updating version

    sattiwari authored
  2. @sattiwari
Commits on May 14, 2014
  1. @ashwanthkumar
  2. @ashwanthkumar
Commits on Apr 30, 2014
  1. @phoenix24

    Merge pull request #1 from ind9/refactoring

    phoenix24 authored
    removed thread-pools/pools-client-manager from page-fetcher.
Commits on Apr 29, 2014
  1. @phoenix24

    bumped up crawler4j version.

    phoenix24 authored
  2. @phoenix24
Commits on Apr 25, 2014
  1. @phoenix24
Commits on Apr 23, 2014
  1. @phoenix24
  2. @phoenix24