Commits on Dec 25, 2012
Commits on Nov 22, 2012
  1. @jiminoc

    Merge pull request #58 from aurality/upstream

    Use line.separator to split Stopwords
    jiminoc committed Nov 22, 2012
Commits on Oct 18, 2012
  1. @erraggy

    raising our version to 2.1.22

    erraggy committed Oct 18, 2012
  2. @erraggy

    MAJOR: The changes included in this were made to significantly decrease the number of exceptions thrown as well as increase overall performance with image detection.
    erraggy committed Oct 18, 2012
Commits on Oct 13, 2012
  1. @erraggy

    Fixed backwards compatibility

    erraggy committed Oct 13, 2012
  2. @erraggy

    Added the ability to completely override the object responsible for retrieving HTML content for goose to parse.
    erraggy committed Oct 13, 2012
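
A rough sketch of what overriding the HTML retrieval object could look like. `HtmlFetcher`, `Extractor`, and `title` are hypothetical names chosen for illustration and are not goose's actual API; the title lookup is a naive string search that only serves to show the injected fetcher being used.

```java
// Hypothetical sketch: none of these names come from goose itself.
class FetcherOverrideSketch {
    // The pluggable object responsible for retrieving HTML for a URL.
    interface HtmlFetcher {
        String fetch(String url);
    }

    // A stand-in extractor that depends only on the fetcher interface.
    static class Extractor {
        private final HtmlFetcher fetcher;

        Extractor(HtmlFetcher fetcher) {
            this.fetcher = fetcher;
        }

        // Returns the text between <title> and </title>, or "" if absent.
        String title(String url) {
            String html = fetcher.fetch(url);
            int start = html.indexOf("<title>");
            int end = html.indexOf("</title>");
            if (start < 0 || end < start) {
                return "";
            }
            return html.substring(start + "<title>".length(), end);
        }
    }

    public static void main(String[] args) {
        // A canned fetcher replaces network access, e.g. in tests.
        Extractor extractor = new Extractor(url -> "<html><title>Hello</title></html>");
        System.out.println(extractor.title("http://example.com/"));
    }
}
```

Because the extractor only sees the interface, callers can supply a cached, proxied, or test fetcher without touching the parsing code.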
Commits on Jul 10, 2012
  1. @aurality

    Use line.separator from sys.props for newline

    Windows converts new lines to \r\n, but the split() is done on "\n", so
    all stopwords have a carriage return at the end and all tests fail since
    none of the words are matched.
    aurality committed Jul 10, 2012
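
The failure described in this commit can be reproduced with a small sketch. `splitWords` is a hypothetical helper, not the project's code; it only illustrates why splitting on `"\n"` leaves a trailing carriage return on files with Windows line endings, and how splitting on the actual separator avoids it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not goose's stopword loader.
class StopwordSplitSketch {
    // Splits the contents on a literal separator and drops empty entries.
    static List<String> splitWords(String contents, String separator) {
        List<String> words = new ArrayList<>();
        for (String word : contents.split(Pattern.quote(separator))) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String windowsFile = "the\r\nand\r\nof\r\n";
        // Splitting on "\n" alone keeps the '\r' at the end of each word.
        System.out.println(splitWords(windowsFile, "\n").get(0).endsWith("\r"));
        // Splitting on the separator the file really uses yields clean words.
        System.out.println(splitWords(windowsFile, "\r\n"));
        // The platform separator is available via System.lineSeparator().
        String sep = System.lineSeparator();
        System.out.println(splitWords("a" + sep + "b", sep));
    }
}
```

Note that the platform separator only helps when the file was written on the same platform; a regex such as `\r?\n` is a common alternative when the origin of the file is unknown.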
Commits on Jun 13, 2012
  1. @erraggy

    Somehow having the generic scala.collection._ import causes a java.lang.AbstractMethodError for a call to Set.empty[String]. This removes the wildcard to fix it.
    erraggy committed Jun 13, 2012
Commits on Jun 12, 2012
  1. @erraggy

    Issue #56 - Content not extracted from "article" tag: Adding 'article' as a possible root element for article content
    erraggy committed Jun 12, 2012
Commits on Apr 17, 2012
  1. @erraggy

    Now the ImageUtils.fetchEntity will also wrap inner exceptions with the image source (URL) for better logging.
    erraggy committed Apr 17, 2012
  2. @erraggy

    Tightened up our usage of the Apache HTTP commons lib by replacing deprecated calls and reducing the default request retries from 3 to 0.
    
    Also improved the logging on errors so that the URL should always be written for any exception during a crawl
    erraggy committed Apr 17, 2012
  3. @erraggy
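
The Apr 17, 2012 commits mention wrapping inner exceptions with the image source URL so that logs always show which resource failed. A minimal sketch of that pattern follows, assuming a hypothetical `Fetcher` interface and `ImageFetchException` type; it is not the code of `ImageUtils.fetchEntity` itself.

```java
import java.io.IOException;

// Hypothetical sketch of wrapping a low-level failure with the image URL.
class ImageFetchSketch {
    // Carries the image source in its message and keeps the original cause.
    static class ImageFetchException extends Exception {
        ImageFetchException(String imageSrc, Throwable cause) {
            super("Failed to fetch image: " + imageSrc, cause);
        }
    }

    // Stand-in for whatever actually performs the HTTP request.
    interface Fetcher {
        byte[] fetch(String url) throws IOException;
    }

    static byte[] fetchWithSource(Fetcher fetcher, String imageSrc)
            throws ImageFetchException {
        try {
            return fetcher.fetch(imageSrc);
        } catch (IOException e) {
            // Re-throw with the URL attached so the log line identifies the image.
            throw new ImageFetchException(imageSrc, e);
        }
    }
}
```

Keeping the original exception as the cause preserves the stack trace while the wrapper's message supplies the URL.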
Commits on Apr 13, 2012
  1. @erraggy

    Merge pull request #53 from amir343/4620db9325f9e6a880cc2d8b64646bdf75b5dd82
    
    Hopefully the last pull request!
    
    Well... for at least this one feature anyway ;-)
    erraggy committed Apr 13, 2012
Commits on Apr 11, 2012
  1. @amir343
Commits on Apr 5, 2012
  1. @erraggy

    MEDIUM: From this point on, goose will only attempt to parse web responses with a 200 (OK) HTTP Status Code
    
    Also of note: The extraneous akka dependencies and repository have been removed from the pom file. #40 #47
    erraggy committed Apr 5, 2012
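
The status-code rule from this commit can be sketched as a simple gate in front of the parser. `shouldParse` and `bodyToParse` are hypothetical names used for illustration; only `HttpURLConnection.HTTP_OK` is a real JDK constant.

```java
import java.net.HttpURLConnection;
import java.util.Optional;

// Hypothetical sketch: parse only responses that came back with 200 (OK).
class StatusGateSketch {
    static boolean shouldParse(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_OK;
    }

    // Returns the body only when the response is eligible for parsing.
    static Optional<String> bodyToParse(int statusCode, String body) {
        return shouldParse(statusCode) ? Optional.of(body) : Optional.empty();
    }
}
```

Under this rule, error pages such as a 404 or a 500 are never handed to the extractor.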
Commits on Mar 22, 2012
  1. @erraggy

    incrementing version

    erraggy committed Mar 22, 2012
  2. @erraggy

    Merge pull request #46 from john-kurkowski/master

    Canonicals must be non-empty
    erraggy committed Mar 22, 2012
  3. @john-kurkowski
Commits on Feb 24, 2012
  1. @erraggy

    MEDIUM: Refactored all Logging to be run on singletons and not instances to reduce memory usage and possible memory leaks.
    erraggy committed Feb 24, 2012
Commits on Feb 13, 2012
  1. @erraggy

    Extending goose configuration to allow for custom ContentExtractor setting. Also moving to version 2.1.11
    erraggy committed Feb 13, 2012
Commits on Jan 9, 2012
  1. @jiminoc

    Merge pull request #36 from sharat87/patch-1

    Fix markdown formatting for bullet points in Readme.
    jiminoc committed Jan 9, 2012
  2. @jiminoc

    MINOR - adding a couple of domains for known image elements and upgrading how to pull a domain from a url, rolling to 2.1.10
    jiminoc committed Jan 9, 2012
  3. @erraggy

    Downgrading jsoup back to 1.5.2 due to memory leak in 1.6.1. Also upping goose to 2.1.9 to reflect this change.
    erraggy committed Jan 9, 2012
Commits on Jan 7, 2012
  1. @erraggy

    Revving up to version 2.1.8 to correct the fact that 2.1.7 was already released outside of github O_o.
    erraggy committed Jan 7, 2012
  2. @erraggy
  3. @erraggy
Commits on Jan 2, 2012
  1. @sharat87
Commits on Dec 12, 2011
  1. @erraggy

    versioning on up to 2.1.6

    erraggy committed Dec 12, 2011
  2. @erraggy
Commits on Nov 19, 2011
  1. @jiminoc
  2. @jiminoc
  3. @jiminoc

    MINOR: Upgrading to JSOUP 1.6.1

    jiminoc committed Nov 19, 2011
  4. @jiminoc

    MINOR - doing some code cleanup/refactoring, extracting out methods and converting vars to vals
    jiminoc committed Nov 19, 2011
  5. @jiminoc

    MINOR - removing unused code

    jiminoc committed Nov 19, 2011