opacmo is a mash-up of biomedical objects linked to the open-access subset of PubMed Central.

opacmo

opacmo is the Open Access Mortar — a mash-up of biomedical objects linked to the open-access subset of PubMed Central.

Text-Mining

Running on a Desktop Computer

Text-mining the complete open access subset of PubMed Central can take a while on a single machine, but it can be done. Supported operating systems are Mac OS X and Linux.

mkdir opacmo_release ; cd opacmo_release
git clone git://github.com/joejimbo/opacmo.git
git clone git://github.com/joejimbo/bioknack.git
opacmo/make_opacmo.sh all 2>&1 | tee MAKE_OPACMO_LOG
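Since only Mac OS X and Linux are supported, a small guard can stop a run early on other systems. The following is a minimal sketch; `opacmo_os_supported` is a hypothetical helper and not part of opacmo.

```shell
# Sketch: succeed only on the operating systems listed as supported above.
# opacmo_os_supported is a hypothetical helper, not part of opacmo.
opacmo_os_supported() {
    # Use the kernel name passed as argument, or ask uname when none is given.
    kernel="${1:-$(uname -s)}"
    case "$kernel" in
        Darwin|Linux) return 0 ;;   # Darwin is the kernel name of Mac OS X
        *)            return 1 ;;
    esac
}
```

It could be combined with the pipeline call, for example `opacmo_os_supported && opacmo/make_opacmo.sh all 2>&1 | tee MAKE_OPACMO_LOG`.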

Running on an Oracle Grid Engine Cluster

Prepare a 'bundle' on a cluster node with internet access or on your desktop computer. The bundle can later be extracted and executed on an Oracle Grid Engine (formerly Sun Grid Engine) cluster.

mkdir opacmo_release ; cd opacmo_release
git clone git://github.com/joejimbo/opacmo.git
git clone git://github.com/joejimbo/bioknack.git
opacmo/make_opacmo.sh bundle 2>&1 | tee MAKE_OPACMO_BUNDLE_LOG

Bundling can take a very long time, because it involves downloading all open access publications of PubMed Central, downloading several biomedical databases/ontologies, and preprocessing the latter for the text-mining run. The bundle itself will be quite large too (>25G).
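Because the bundle alone exceeds 25 GB, it may be worth checking free disk space before starting. This is a minimal sketch: `has_free_gb` is a hypothetical helper, and any threshold passed to it is an assumption, since the space needed for the downloads themselves is not stated here.

```shell
# has_free_gb DIR GB: succeed if the file system holding DIR has at least
# GB gigabytes available. Hypothetical helper, not part of opacmo.
has_free_gb() {
    dir="$1"
    need_gb="$2"
    # df -Pk prints a header line and one data line; column 4 is the
    # available space in kilobytes.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example, `has_free_gb . 50 || echo "low on disk space" >&2` warns when fewer than 50 GB (an arbitrary figure) are free in the current directory.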

Now transfer the bundle to the cluster, log in to the cluster, extract the bundle, and continue processing there. The path chosen below must be readable and writable by all cluster nodes, which is usually the case for your home directory on the cluster or a designated shared mount point.

scp bundle.tar username@nodeX.yourdomain:/path/opacmo_release
ssh username@nodeX.yourdomain
cd /path/opacmo_release
tar xf bundle.tar
screen -DR opacmo_release
opacmo/make_opacmo.sh sge 2>&1 | tee MAKE_OPACMO_CLUSTER_LOG
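The read/write requirement on the release directory can be probed before extracting the bundle. The sketch below only tests the node it runs on; covering every node would mean running it as a job on each of them, which is not shown. `dir_is_rw` is a hypothetical helper.

```shell
# dir_is_rw DIR: succeed if DIR exists and a file can be created in it,
# read back and removed again. Hypothetical helper, not part of opacmo.
dir_is_rw() {
    dir="$1"
    [ -d "$dir" ] || return 1
    probe="$dir/.opacmo_probe_$$"
    # Write a probe file; silence the error message if writing is not allowed.
    { echo ok > "$probe" ; } 2>/dev/null || return 1
    [ "$(cat "$probe" 2>/dev/null)" = "ok" ]
    status=$?
    rm -f "$probe"
    return $status
}
```

A call such as `dir_is_rw /path/opacmo_release` returns a non-zero status when the directory is missing or not writable.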

Running opacmo/make_opacmo.sh sge requires up to 8 GB of memory for postprocessing tasks. You may need to request this amount explicitly on your cluster by logging in to a node with qrsh -l h_vmem=8G.
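Whether the current shell is actually allowed 8 GB can be checked from the value that `ulimit -v` reports. This sketch assumes that the grid engine exposes h_vmem as the shell's virtual memory limit, which depends on the cluster configuration; `limit_allows` and `NEED_KB` are hypothetical names.

```shell
# limit_allows LIMIT_KB NEED_KB: succeed if a `ulimit -v` value (kilobytes
# or the word "unlimited") permits NEED_KB kilobytes.
# Hypothetical helper, not part of opacmo.
limit_allows() {
    limit_kb="$1"
    need_kb="$2"
    [ "$limit_kb" = "unlimited" ] && return 0
    [ "$limit_kb" -ge "$need_kb" ]
}

# 8 GB expressed in kilobytes, the unit `ulimit -v` reports.
NEED_KB=$((8 * 1024 * 1024))
```

On a node, `limit_allows "$(ulimit -v)" "$NEED_KB" || echo "less than 8 GB allowed" >&2` prints a warning when the limit is too low.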

Specific output of grid engine jobs is written into the respective fork_* directories as opacmo.*.{e,o}*. The actual text-mining output is written into the directory opacmo_data.
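To find jobs that wrote something to standard error, the fork_* directories can be scanned for non-empty error files. This is a minimal sketch built on the file naming described above; `list_job_errors` is a hypothetical helper.

```shell
# list_job_errors [BASE]: print every non-empty grid engine error file
# (opacmo.*.e*) found in the fork_* directories below BASE (default: .).
# Hypothetical helper, not part of opacmo.
list_job_errors() {
    base="${1:-.}"
    for f in "$base"/fork_*/opacmo.*.e* ; do
        # -s is false for empty files and for an unmatched glob pattern.
        [ -s "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Running `list_job_errors` in the release directory lists the files to inspect; no output means that no job wrote to standard error.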

Running on Amazon Elastic Compute Cloud

Important: You are running opacmo on Amazon's Elastic Compute Cloud at your own financial risk. If you are not familiar with Amazon's billing procedures, do not use opacmo's cloud computing pipeline.

Installation and Configuration of EC2 API Tools

wget 'http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip'
unzip ec2-api-tools.zip
export PATH=$PATH:`pwd`/ec2-api-tools-1.6.0.0/bin
export EC2_HOME=`pwd`/ec2-api-tools-1.6.0.0
# See: https://portal.aws.amazon.com/gp/aws/securityCredentials
# AWS_ACCOUNT_ID looks like '1234-5678-9012'
# AWS_ACCESS_KEY looks like 'BUE920...', 20 characters
# AWS_SECRET_KEY looks like 'EsfW2R...', >20 characters
export AWS_ACCOUNT_ID=...
export AWS_ACCESS_KEY=...
export AWS_SECRET_KEY=...
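
Before launching any instances, it can help to confirm that the credentials are actually exported. The following sketch reports which of the variables are unset or empty; `missing_env` is a hypothetical helper and does not validate the credentials themselves.

```shell
# missing_env NAME...: print the name of every listed variable that is
# unset or empty. Hypothetical helper, not part of opacmo.
missing_env() {
    for name in "$@" ; do
        # Indirect lookup of the variable called $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
    return 0
}
```

For example, `missing_env AWS_ACCOUNT_ID AWS_ACCESS_KEY AWS_SECRET_KEY EC2_HOME` prints nothing when all four variables are set.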

Executing the Text-Mining Pipeline on EC2 instances

opacmo/make_opacmo.sh ec2spot | tee LOCAL_LOG

If you prefer the more expensive but more reliable "on-demand" instances, use the parameter ec2ondemand instead of ec2spot above.

Moving Text-Mining Results Off the Cloud

scp -i KEY.pem ec2-user@CACHE_INSTANCE_URL:/media/ephemeral0/{ftp/uploads/*,pipeline/*.tar,pipeline/{C*,V*}} .
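for archive in worker_opacmo_data_*.tar.gz ; do
    # The worker ID is the archive name without its prefix and extension.
    worker=$(basename "$archive" .tar.gz | sed 's/worker_opacmo_data_//')
    tar xzf "$archive"
    # Prefix each extracted batch directory with the worker ID, so that
    # batches from different workers do not overwrite each other.
    for batch in opacmo_data/[0-9]* ; do
        mv "$batch" "opacmo_data/${worker}_$(basename "$batch")"
    done
done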
tar xf cache_labels*.tar
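
The renaming in the loop above can be previewed without touching any files. This sketch only reproduces the name derivation; `batch_target` is a hypothetical helper, and the worker ID `w1` and batch `0042` in the example are made up.

```shell
# batch_target ARCHIVE BATCH: print the path that a batch directory from
# ARCHIVE is renamed to. Hypothetical helper, not part of opacmo.
batch_target() {
    # Strip the .tar.gz extension and the worker_opacmo_data_ prefix.
    worker=$(basename "$1" .tar.gz | sed 's/worker_opacmo_data_//')
    printf 'opacmo_data/%s_%s\n' "$worker" "$(basename "$2")"
}
```

With these inputs, `batch_target worker_opacmo_data_w1.tar.gz opacmo_data/0042` prints `opacmo_data/w1_0042`.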

Loading Text-Mining Results into Yoctogi

This step is optional and only of relevance when you are importing the text-mining results into a Yoctogi database.

scp -r -i KEY.pem ec2-user@CACHE_INSTANCE_URL:/media/ephemeral0/pipeline/dictionaries .
scp -i KEY.pem ec2-user@CACHE_INSTANCE_URL:/media/ephemeral0/pipeline/tmp/species tmp
opacmo/make_opacmo.sh tsv
opacmo/make_opacmo.sh yoctogi
sudo su postgres
opacmo/load_opacmo.sh

Java Installation on Mac OS X

Amazon's EC2 tools require Java, so a Java Runtime Environment must be installed on Mac OS X. The automatic installation process can be started by executing the following command:

/usr/bin/java

A dialog will pop up offering to install the Java Runtime Environment.

After installing the environment, the following shell variable needs to be set for the EC2 tools to work correctly:

export JAVA_HOME=`/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java_home`
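
The location of the java_home tool can differ between Mac OS X releases. The sketch below picks the first candidate that exists; `first_executable` is a hypothetical helper, and `/usr/libexec/java_home` is an assumed alternative location that is not mentioned in this README.

```shell
# first_executable PATH...: print the first argument that is an executable
# regular file, or fail if none is. Hypothetical helper, not part of opacmo.
first_executable() {
    for candidate in "$@" ; do
        if [ -f "$candidate" ] && [ -x "$candidate" ] ; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example use (commented out, since the paths only exist on Mac OS X):
# tool=$(first_executable /usr/libexec/java_home \
#     /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java_home) \
#     && export JAVA_HOME=$("$tool")
```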

Database & Web-Server Set-Up

PostgreSQL

Install PostgreSQL 8.3 or newer. Then, on a Debian distribution, run:

sudo su - postgres
createuser yoctogi
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
createdb yoctogi
load_opacmo.sh
psql yoctogi
ALTER USER yoctogi WITH PASSWORD 'yoctogi';
GRANT SELECT ON yoctogi TO yoctogi;
^D

The load_opacmo.sh command expects the Yoctogi TSV files that have been generated in the directory ./opacmo_data.
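
A quick check that the expected input exists can save a failed load. This sketch assumes that the generated Yoctogi files carry a `.tsv` extension, which is not confirmed here; `has_tsv` is a hypothetical helper.

```shell
# has_tsv [DIR]: succeed if DIR (default: ./opacmo_data) holds at least one
# non-empty *.tsv file. The .tsv extension is an assumption.
# Hypothetical helper, not part of opacmo.
has_tsv() {
    dir="${1:-./opacmo_data}"
    for f in "$dir"/*.tsv ; do
        # -s is false for empty files and for an unmatched glob pattern.
        [ -s "$f" ] && return 0
    done
    return 1
}
```

For example, `has_tsv || echo "no TSV files in ./opacmo_data" >&2` warns before the loader is started.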

lighttpd

Install and set up lighttpd. opacmo uses Yoctogi as its backend, which requires FastCGI support.

sudo apt-get install libfcgi-dev
sudo gem install ruby-fcgi

Workarounds

On Debian 5.0, a small workaround is needed to get FastCGI going.

cd /usr/include
sudo ln -s ruby-1.9.0 ruby-1.9.1

Acknowledgements

Contributors in alphabetical order:

  • Kenneth Chu. Beta Testing.
  • Miyuki Fukuma. Web-Design Consulting & CSS Coding.

Licenses

opacmo's source code is licensed under the MIT License. opacmo's art work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license.

The PNG files in

  • html/images/blue
  • html/images/cyan
  • html/images/grey_light
  • html/images/red

are part of the Iconic minimal set of icons by P.J. Onori. These icons are licensed under the Creative Commons Attribution-ShareAlike 3.0 United States license.

The spinner image html/images/ajax-loader.gif was created by ajaxload.info.
