Stop using bundler, to make loadpdf more standalone. Auto-install fastercsv for Ruby 1.8
commit 51d406b9d5a1d2924aca81e98f33500b8346b76b (1 parent: 7eab737)
jstray authored
Showing with 27 additions and 14 deletions.
  1. +0 −5 docloader/Gemfile
  2. +11 −7 loadpdf.sh
  3. +16 −2 preprocess.sh
docloader/Gemfile (5 lines changed)
@@ -1,5 +0,0 @@
-# A sample Gemfile
-source "https://rubygems.org"
-
-gem "rest-client"
-gem "json_pure"
loadpdf.sh (18 lines changed)
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
BASEDIR=`dirname $0`
RUBYDIR=$BASEDIR/docloader
@@ -13,12 +13,16 @@ then
exit
fi
-# do bundle install to download required gemfiles, if not already done
-if [ ! -f $RUBYDIR/Gemfile.lock ]; then
- pushd $RUBYDIR
- bundle install
- popd
-fi
+# install a gem if not already there
+function install_gem {
+ count=`gem list | grep $1 | wc -l`
+ if [ $count -ne 1 ]; then
+ gem install $1
+ fi
+}
+
+install_gem json_pure
+install_gem rest-client
# find pdf files in a directory, extract the text, convert to CSV
ruby -I $RUBYDIR $RUBYDIR/docloader.rb $1 -o $2.csv -r
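The `install_gem` helper counts `gem list` lines that contain the gem name anywhere, so `grep rest-client` would also match a gem such as `rest-client-components`. A gem whose name is a substring of an installed gem's name can therefore be skipped even though it is missing, and a name with two matching lines is reinstalled because of the `-ne 1` test. A minimal sketch of an exact-name check follows, assuming `gem list` prints one gem per line as `name (version, ...)`; `gem_installed` is a hypothetical helper. It also uses the `name() { ... }` form, since the `function` keyword is a bash/ksh extension that requires `#!/bin/bash` as written.

```shell
#!/bin/bash
# Sketch, not the commit's code. gem_installed is a hypothetical helper.
# Assumes `gem list` prints one gem per line as "name (version, ...)".

# Exit status 0 if some `gem list` line has exactly $1 as its first field.
gem_installed() {
  gem list 2>/dev/null | awk -v name="$1" '$1 == name { found = 1 } END { exit(found ? 0 : 1) }'
}

# Install gem $1 unless a gem with exactly that name is already present.
# name() { ... } is POSIX; `function name { ... }` is a bash/ksh extension.
install_gem() {
  if ! gem_installed "$1"; then
    gem install "$1"
  fi
}
```

Call sites stay the same, e.g. `install_gem json_pure` and `install_gem rest-client`.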
preprocess.sh (18 lines changed)
@@ -1,8 +1,22 @@
-#!/bin/sh
+#!/bin/bash
BASEDIR=`dirname $0`
RUBYDIR=$BASEDIR/preprocessing
+# install a gem if not already there
+function install_gem {
+ count=`gem list | grep $1 | wc -l`
+ if [ $count -ne 1 ]; then
+ gem install $1
+ fi
+}
+
+# install fastercsv for Ruby 1.8
+count=`ruby -v | grep "ruby 1.9" | wc -l`
+if [ $count -ne 1 ]; then
+ install_gem fastercsv
+fi
+
# Look for commonly occurring co-locations, and extract the top candidates.
# TODO: threshold for acceptance is hard-coded
ruby -I $RUBYDIR $RUBYDIR/find-bigrams.rb $1.csv $1-bigrams.csv
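The fastercsv check installs the gem whenever `ruby -v` does not mention `ruby 1.9`, so it would also fire on Ruby 2.x and later. A minimal sketch that compares the major and minor version numbers instead, assuming (as the commit's comment implies) that only Ruby older than 1.9 needs fastercsv; `needs_fastercsv` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch, not the commit's code. needs_fastercsv is a hypothetical helper.
# Assumption: only Ruby older than 1.9 needs the fastercsv gem, because
# Ruby 1.9's standard-library CSV is already based on FasterCSV.

# Exit status 0 if version string $1 (e.g. "1.8.7") is older than 1.9.
needs_fastercsv() {
  local major minor rest
  # Split "major.minor.patch" on dots; the patch level is ignored.
  IFS=. read -r major minor rest <<< "$1"
  [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 9 ]; }
}
```

In the script it could be called as `needs_fastercsv "$(ruby -e 'print RUBY_VERSION')" && install_gem fastercsv`, with `install_gem` defined as above in the same file.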
@@ -19,4 +33,4 @@ ruby -I $RUBYDIR $RUBYDIR/make-featurenames.rb $1-termlist.csv $1-featurenames.c
# Finally, extract URLs/document text for the Overview doc viewer window
ruby -I $RUBYDIR $RUBYDIR/make-urls.rb $1.csv $1-urls.csv
-
+