web app for parsing citations


# Install instructions from a SuSE 10.1 development environment

== Install required packages ==

wget http://superb-west.dl.sourceforge.net/sourceforge/crfpp/CRF++-0.47.tar.gz
tar xvzf CRF++-0.47.tar.gz
cd CRF++-0.47
./configure && make && sudo make install
cd ruby
ruby extconf.rb
make
sudo make install

If you are running Fedora, you will also have to register /usr/local/lib with
the dynamic linker (run these as root):
echo "/usr/local/lib" > /etc/ld.so.conf.d/default-i386.conf
/sbin/ldconfig
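
After the steps above, a quick way to confirm that the CRF++ Ruby binding was built and can find its shared library is to try loading it. This is only a sketch: it assumes the binding is loaded with require 'CRFPP', as in the test script bundled with CRF++, and the helper name crfpp_available? is made up for this example.

```ruby
# Returns true when the CRF++ Ruby binding can be loaded, false otherwise.
# Assumption: the binding built in CRF++-0.47/ruby is required as 'CRFPP'.
def crfpp_available?
  require 'CRFPP'
  true
rescue LoadError
  # Raised when the extension is missing or libcrfpp cannot be found.
  false
end

if crfpp_available?
  puts "CRF++ Ruby binding loaded"
else
  puts "CRF++ Ruby binding not found; re-check 'make install' and the library path"
end
```

If the check fails on Fedora, the library path step above is the first thing to revisit.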

sudo gem install -v=2.1 rails
sudo gem install ruby-postgres

== Configure Postgres ==

sudo gem install ruby-postgres

# as postgres

pg_ctl -D /var/lib/pgsql/data -l logfile start
/sbin/chkconfig postgresql on

# create user, answer no to all questions
createuser freecite

# then, in psql:
ALTER USER freecite PASSWORD 'changemetosomethingmoresecure';
CREATE DATABASE freecite_development;
CREATE DATABASE freecite_production;
CREATE DATABASE freecite_test;
GRANT ALL ON DATABASE "freecite_development" TO freecite;
GRANT ALL ON DATABASE "freecite_production" TO freecite;
GRANT ALL ON DATABASE "freecite_test" TO freecite;
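
Rails reads its connection settings from config/database.yml. The repo may already ship one; if not, the following sketch prints entries that match the user and databases created above. The adapter and host values are assumptions, and the helper name freecite_database_config is hypothetical.

```ruby
require 'yaml'

# Build one connection entry per environment, using the names created above.
# Assumptions: the "postgresql" adapter and a server on localhost.
def freecite_database_config(password)
  %w[development production test].each_with_object({}) do |env, config|
    config[env] = {
      'adapter'  => 'postgresql',
      'database' => "freecite_#{env}",
      'username' => 'freecite',
      'password' => password,
      'host'     => 'localhost'
    }
  end
end

# Print YAML that could be saved as config/database.yml.
puts freecite_database_config('changemetosomethingmoresecure').to_yaml
```

Use the same password you set with ALTER USER.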

== Train the model == 

# A trained model is checked in to the repo, so this step is not necessary 
# unless you are making local changes. 
> rake crfparser:train_model

== Using the API ==
POST to /citations/create with:
 * (required argument) citation or citation[] - a citation or array of
   citations to parse
 * Accept: text/xml or text/html

If accepting XML, returns the parsed citation string and resulting context
object in XML format.
If accepting HTML, returns the parsed citation string as spans.


# in ruby:
require 'net/http'

Net::HTTP.start('localhost', 3000) do |http|
  response = http.post('/citations/create',
    'citation=A. Bookstein and S. T. Klein,   Detecting content-bearing words by serial clustering,   Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval,   pp. 319327,   1995.',
    'Accept' => 'text/xml')

  puts "Code: #{response.code}"
  puts "Message: #{response.message}"
  puts "Body:\n #{response.body}"
end

# with curl
curl -H 'Accept: text/xml' -d "citation[]=Fielderman, A., Silvester, G., Gatsonis, C.A., Hoenig, J., Flynn, S. Prognostic significance of flow cytometric DNA analysis and proliferative index in stage I non-small cell lung cancer. American Review of Respiratory Disease, 1992; 146:707-710.&citation[]=Udvarhelyi, I.S., Gatsonis, C.A., Epstein, A.M., Pashos, C.L., Newhouse, J.P. and McNeil, B.J. Acute Myocardial Infarction in the Medicare population: process of care and clinical outcomes. Journal of the American Medical Association, 1992; 18:2530-2536." http://localhost:3000/citations/create
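
Citation text is full of characters such as "&", ";" and ":" that can clash with form encoding, so it is safer to let the library escape the values. The sketch below builds, but does not send, a request carrying several citations. It assumes a modern Ruby standard library, where set_form_data encodes an array value as a repeated key; the helper name build_citation_request and the sample strings are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST request for one or more citation strings.
# The path and Accept values are the ones used in the examples above.
def build_citation_request(citations, accept = 'text/xml')
  request = Net::HTTP::Post.new('/citations/create', 'Accept' => accept)
  # An array value is sent as a repeated, URL-escaped "citation[]" key.
  request.set_form_data('citation[]' => Array(citations))
  request
end

request = build_citation_request([
  'First citation text, 1992; 146:707-710.',
  'Second citation text & more, 1992.'
])
puts request.body

# To send it to a locally running server:
# response = Net::HTTP.start('localhost', 3000) { |http| http.request(request) }
```

Passing 'text/html' as the second argument asks for the span markup instead of XML.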

== Using the Web App ==

Enter a citation, or a list of citations separated by newlines.
The rest should be self-explanatory.
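
If you want to send the same kind of pasted, newline-separated text through the API instead, it first has to be split into individual citations. This is a sketch of one way to do that, not necessarily how the web app does it; split_citations is a hypothetical helper.

```ruby
# Split pasted text into one citation per non-blank line.
def split_citations(text)
  text.split(/\r?\n/).map(&:strip).reject(&:empty?)
end

pasted = "First citation text.\n\n  Second citation text.  \r\nThird citation text.\n"
split_citations(pasted).each_with_index do |citation, index|
  puts "#{index + 1}: #{citation}"
end
```

The resulting array can be passed as the citation[] values of an API request.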

== Welcome to Rails

Rails is a web-application and persistence framework that includes everything
needed to create database-backed web-applications according to the
Model-View-Controller pattern of separation. This pattern splits the view (also
called the presentation) into "dumb" templates that are primarily responsible
for inserting pre-built data in between HTML tags. The model contains the
"smart" domain objects (such as Account, Product, Person, Post) that hold all
the business logic and know how to persist themselves to a database. The
controller handles the incoming requests (such as Save New Account, Update
Product, Show Post) by manipulating the model and directing data to the view.

In Rails, the model is handled by what's called an object-relational mapping
layer entitled Active Record. This layer allows you to present the data from
database rows as objects and embellish these data objects with business logic
methods. You can read more about Active Record in its README.

The controller and view are handled by the Action Pack, which handles both
layers by its two parts: Action View and Action Controller. These two layers
are bundled in a single package due to their heavy interdependence. This is
unlike the relationship between the Active Record and Action Pack that is much
more separate. Each of these packages can be used independently outside of
Rails. You can read more about Action Pack in its README.

== Getting started

1. Start the web server: <tt>ruby script/server</tt> (run with --help for options)
2. Go to http://localhost:3000/ and get "Welcome aboard: You’re riding the Rails!"
3. Follow the guidelines to start developing your application

== Web servers

Rails uses the built-in web server in Ruby called WEBrick by default, so you don't
have to install or configure anything to play around. 

If you have lighttpd installed, though, it'll be used instead when running script/server.
It's considerably faster than WEBrick and suited for production use, but requires additional
installation and currently only works well on OS X/Unix (Windows users are encouraged
to start with WEBrick). We recommend version 1.4.11 and higher. You can download it from
the lighttpd project site.

If you want something that's halfway between WEBrick and lighttpd, we heartily recommend
Mongrel. It's a Ruby-based web server with a C-component (so it requires compilation) that
also works very well with Windows. See more at http://mongrel.rubyforge.org/.

But of course it's also possible to run Rails with the premier open source web server Apache.
To get decent performance, though, you'll need to install FastCGI. For Apache 1.3, you want
to use mod_fastcgi. For Apache 2.0+, you want to use mod_fcgid.

See http://wiki.rubyonrails.com/rails/pages/FastCGI for more information on FastCGI.

== Example for Apache conf

  <VirtualHost *:80>
    ServerName rails
    DocumentRoot /path/application/public/
    ErrorLog /path/application/log/server.log

    <Directory /path/application/public/>
      Options ExecCGI FollowSymLinks
      AllowOverride all
      Allow from all
      Order allow,deny
    </Directory>
  </VirtualHost>

NOTE: Be sure that CGIs can be executed in that directory as well. So ExecCGI
should be on and ".cgi" should respond. All requests from localhost go
through CGI, so no Apache restart is necessary for changes. All other requests
go through FCGI (or mod_ruby), which requires a restart to show changes.

== Debugging Rails

Have "tail -f" commands running on the server.log, production.log, and
test.log files. Rails will automatically write debugging and runtime
information to these files. Debugging info will also be shown in the browser
on requests from localhost.

== Breakpoints

Breakpoint support is available through the script/breakpointer client. This
means that you can break out of execution at any point in the code, investigate
and change the model, AND then resume execution! Example:

  class WeblogController < ActionController::Base
    def index
      @posts = Post.find_all
      breakpoint "Breaking out from the list"
    end
  end

So the controller will accept the action, run the first line, then present you
with an IRB prompt in the breakpointer window. Here you can do things like:

Executing breakpoint "Breaking out from the list" at .../webrick_server.rb:16 in 'breakpoint'

  >> @posts.inspect
  => "[#<Post:0x14a6be8 @attributes={\"title\"=>nil, \"body\"=>nil, \"id\"=>\"1\"}>, 
       #<Post:0x14a6620 @attributes={\"title\"=>\"Rails you know!\", \"body\"=>\"Only ten..\", \"id\"=>\"2\"}>]"
  >> @posts.first.title = "hello from a breakpoint"
  => "hello from a breakpoint"

...and even better is that you can examine how your runtime objects actually work:

  >> f = @posts.first 
  => #<Post:0x13630c4 @attributes={"title"=>nil, "body"=>nil, "id"=>"1"}>
  >> f.
  Display all 152 possibilities? (y or n)

Finally, when you're ready to resume execution, press CTRL-D.

== Console

You can interact with the domain model by starting the console through script/console. 
Here you'll have all parts of the application configured, just like it is when the
application is running. You can inspect domain models, change values, and save to the
database. Starting the script without arguments will launch it in the development environment.
Passing an argument will specify a different environment, like <tt>script/console production</tt>.

To reload your controllers and models after launching the console run <tt>reload!</tt>

== Description of contents

app
  Holds all the code that's specific to this particular application.

app/controllers
  Holds controllers that should be named like weblog_controller.rb for
  automated URL mapping. All controllers should descend from
  ApplicationController, which itself descends from ActionController::Base.

app/models
  Holds models that should be named like post.rb.
  Most models will descend from ActiveRecord::Base.

app/views
  Holds the template files for the view that should be named like
  weblog/index.rhtml for the WeblogController#index action. All views use eRuby
  syntax. This directory can also be used to keep stylesheets, images, and so on
  that can be symlinked to public.

app/helpers
  Holds view helpers that should be named like weblog_helper.rb.

app/apis
  Holds API classes for web services.

config
  Configuration files for the Rails environment, the routing map, the database, and other dependencies.

components
  Self-contained mini-applications that can bundle together controllers, models, and views.

db
  Contains the database schema in schema.rb. db/migrate contains the
  full sequence of Migrations for your schema.

lib
  Application specific libraries. Basically, any kind of custom code that doesn't
  belong under controllers, models, or helpers. This directory is in the load path.

public
  The directory available for the web server. Contains subdirectories for images, stylesheets,
  and javascripts. Also contains the dispatchers and the default HTML files.

script
  Helper scripts for automation and generation.

test
  Unit and functional tests along with fixtures.

vendor
  External libraries that the application depends on. Also includes the plugins subdirectory.
  This directory is in the load path.