Juicer is a web API for extracting text, meta data and named entities from HTML "article" type pages.
Scala HTML Shell Makefile Ruby CSS Other
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
juicer-service/src
juicer-web/src
packaging
project
.gitignore
LICENSE.md
Procfile
README.markdown

README.markdown

Juicer

Juicer is a web API for extracting text, meta data and named entities from HTML "article" type pages.

For more info visit: http://juicer.herokuapp.com/

Install

On debian / ubuntu

Install from our PPA

sudo add-apt-repository ppa:juicer-server/juicer
sudo apt-get update
sudo apt-get install juicer

This will start juicer running as a service on port 9000, try http://localhost:9000

sudo service juicer start|stop|restart|status

You can run the server directly also:

usage: juicer-server [options]

options:
  -h, --help         Print this message and exit
  --version          Print program version and exit
  --port       PORT  Port to run on, default 9000
  --pidfile    FILE  Write process pid to pidfile
  --logfile    FILE  Write logs to logfile
  --daemonize        Daemonize and exit

See also man juicer-server

On Heroku

  • Clone the repo
  • Install the Heroku tools; be sure heroku is on your path
  • Type these commands inside the application's git clone:
    • heroku create -s cedar --buildpack http://github.com/heroku/heroku-buildpack-scala.git
    • git push heroku master
    • heroku open

Using a fat jar

Build a fat jar for use anywhere ...

  • Run sbt assembly
  • Run java -Xmx1g -jar juicer-web/target/scala-2.9.1/juicer-web-assembly-1.0.jar

Running the app (Developers)

  • Install sbt brew install sbt
  • Run sbt test to test the app (sbt must be sbt 0.11, not 0.7)
  • Run sbt stage to stage the app
  • Run JAVA_OPTS="$JAVA_OPTS -Xmx1g" juicer-web/target/start to run the server
  • Now open http://localhost:8080 in a browser