Commits on Dec 25, 2012
Commits on Nov 22, 2012
  1. Jim Plush

    Merge pull request #58 from aurality/upstream

    jiminoc authored
    Use line.separator to split Stopwords
Commits on Oct 18, 2012
  1. Robbie Coleman

    raising our version to 2.1.22

    erraggy authored
  2. Robbie Coleman

    MAJOR: The changes included in this were made to significantly decrease the number of exceptions thrown as well as increase overall performance with image detection.

    erraggy authored
Commits on Oct 13, 2012
  1. Robbie Coleman

    Fixed backwards compatibility

    erraggy authored
  2. Robbie Coleman

    Added the ability to completely override the object responsible for retrieving HTML content for goose to parse.

    erraggy authored
Commits on Jul 10, 2012
  1. aurality

    Use line.separator from sys.props for newline

    aurality authored
    Windows converts new lines to \r\n, but the split() is done on "\n", so
    all stopwords have a carriage return at the end and all tests fail since
    none of the words are matched.
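
The failure described above can be reproduced with a small sketch. It is written in Java for illustration only (the project itself is Scala), and the class and method names are hypothetical. It also swaps in a regex split on `\r?\n` instead of the `line.separator` lookup the commit describes, since the regex gives the same result on any platform.

```java
import java.util.Arrays;
import java.util.List;

public class StopwordSplit {
    // Splitting on "\n" alone leaves a trailing '\r' on each word
    // when the text uses Windows (\r\n) line endings.
    static List<String> splitNaive(String text) {
        return Arrays.asList(text.split("\n"));
    }

    // Splitting on the regex \r?\n accepts both \n and \r\n endings.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\\r?\\n"));
    }

    public static void main(String[] args) {
        String windowsText = "the\r\nand\r\nof";
        // The naive split yields "and\r", so the lookup misses.
        System.out.println(splitNaive(windowsText).contains("and"));
        // The tolerant split yields "and", so the lookup succeeds.
        System.out.println(splitLines(windowsText).contains("and"));
    }
}
```

A stopword list loaded this way matches regardless of which operating system wrote the file.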
Commits on Jun 13, 2012
  1. Robbie Coleman

    Somehow having the generic scala.collection._ import causes a java.lang.AbstractMethodError for a call to Set.empty[String]. This removes the wildcard to fix it.

    erraggy authored
Commits on Jun 12, 2012
  1. Robbie Coleman

    Issue #56 - Content not extracted from "article" tag: Adding 'article' as a possible root element for article content

    erraggy authored
Commits on Apr 17, 2012
  1. Robbie Coleman

    Now the ImageUtils.fetchEntity will also wrap inner exceptions with the image source (URL) for better logging.

    erraggy authored
  2. Robbie Coleman

    Tightened up our usage of the apache http commons lib by replacing deprecated calls and reducing the default request retries from 3 to 0.

    erraggy authored

    Also improved error logging so that the URL is always written for any exception during a crawl.
  3. Robbie Coleman
Commits on Apr 13, 2012
  1. Robbie Coleman

    Merge pull request #53 from amir343/4620db9325f9e6a880cc2d8b64646bdf75b5dd82

    erraggy authored

    Hopefully the last pull request!

    Well... for at least this one feature anyway ;-)
Commits on Apr 11, 2012
  1. αmir μoulavi
Commits on Apr 5, 2012
  1. Robbie Coleman

    MEDIUM: From this point on, goose will only attempt to parse web responses with a 200 (OK) HTTP Status Code

    erraggy authored

    Also of note: The extraneous akka dependencies and repository have been removed from the pom file. #40 #47
Commits on Mar 22, 2012
  1. Robbie Coleman

    incrementing version

    erraggy authored
  2. Robbie Coleman

    Merge pull request #46 from john-kurkowski/master

    erraggy authored
    Canonicals must be non-empty
  3. John Kurkowski
Commits on Feb 24, 2012
  1. Robbie Coleman

    MEDIUM: Refactored all logging to run on singletons rather than instances, to reduce memory usage and possible memory leaks.

    erraggy authored
Commits on Feb 13, 2012
  1. Robbie Coleman

    Extending goose configuration to allow for custom ContentExtractor setting. Also moving to version 2.1.11

    erraggy authored
Commits on Jan 9, 2012
  1. Jim Plush

    Merge pull request #36 from sharat87/patch-1

    jiminoc authored
    Fix markdown formatting for bullet points in Readme.
  2. Jim Plush

    MINOR - adding a couple of domains for known image elements and upgrading how to pull a domain from a url, rolling to 2.1.10

    jiminoc authored
  3. Robbie Coleman

    Downgrading jsoup back to 1.5.2 due to memory leak in 1.6.1. Also upping goose to 2.1.9 to reflect this change.

    erraggy authored
Commits on Jan 7, 2012
  1. Robbie Coleman

    Revving up to version 2.1.8 to correct the fact that 2.1.7 was already released outside of github O_o.

    erraggy authored
  2. Robbie Coleman
  3. Robbie Coleman
Commits on Jan 2, 2012
  1. Shrikant Sharat
Commits on Dec 12, 2011
  1. Robbie Coleman

    versioning on up to 2.1.6

    erraggy authored
  2. Robbie Coleman
Commits on Nov 19, 2011
  1. Jim Plush
  2. Jim Plush
  3. Jim Plush
  4. Jim Plush

    MINOR - doing some code cleanup/refactoring, extracting out methods and converting vars to vals

    jiminoc authored
  5. Jim Plush

    MINOR - removing unused code

    jiminoc authored