Merge branch 'release/v3.2.0'
rautenberg committed Jul 13, 2012
2 parents 6dc3e74 + bc73e23 commit 9d0d964
Showing 1,225 changed files with 77,734 additions and 494 deletions.
5 changes: 4 additions & 1 deletion .gitmodules
@@ -3,4 +3,7 @@
url = git://github.com/amaunz/fminer2.git
[submodule "last-utils"]
path = last-utils
url = git://github.com/amaunz/last-utils.git
url = git://github.com/amaunz/last-utils.git
[submodule "bbrc-sample"]
path = bbrc-sample
url = git://github.com/amaunz/bbrc-sample
13 changes: 13 additions & 0 deletions ChangeLog
@@ -1,3 +1,16 @@
v4.0.0 2012-07-12
* bbrc-sample as submodule
* fminer feature datasets carry metadata (minfreq, nr_hits, ...)
* matching service now uses last-utils, calculates p-values
* fminer support for percentage and per-mil frequencies
* switch to opentox-ruby version 4.0.0

2012-04-20
* Dichotomy between nominal and numeric features removed, which allows for uniform handling of all descriptors
* Uniform interface /pc for feature generation (for PC descriptors)
* Uniform interface /fs for feature selection (using recursive feature elimination)
* min_sim for cosine similarity corrected

v3.1.0 2012-02-24
* lazar.rb: pc type parameter in model, cleaned all parameters, propositionalized learning only for SVM, switch for minimal training performance, removed conf_stdev
* fminer.rb: feature match service for datasets, also with number of hits
188 changes: 126 additions & 62 deletions README.md
@@ -3,66 +3,104 @@ OpenTox Algorithm

- An [OpenTox](http://www.opentox.org) REST Webservice
- Implements the OpenTox algorithm API for
- fminer
- lazar
- subgraph descriptor calculation (fminer)
- physico-chemical descriptor calculation (pc) for more than 300 descriptors
- feature selection (fs) using recursive feature elimination (rfe)
- See [opentox-ruby on maunz.de](http://opentox-ruby.maunz.de) for high-level workflow documentation

REST operations
---------------

Get a list of all algorithms GET / - URIs of algorithms 200
Get a representation of the GET /fminer/ - fminer representation 200,404
fminer algorithms
Get a representation of the GET /fminer/bbrc - bbrc representation 200,404
DESCRIPTION TYPE ADDRESS ARGUMENTS RETURN TYPE RETURN CODE
Get a representation of the GET /lazar - lazar representation 200,404
lazar algorithm
Get a list of all algorithms GET / - URIs of algorithms 200
Get a representation of the GET /fminer/ - fminer representation 200,404
fminer algorithms
Get a representation of the GET /fminer/bbrc - bbrc representation 200,404
bbrc algorithm
Get a representation of the GET /fminer/last - last representation 200,404
last algorithm
Get a representation of the GET /lazar - lazar representation 200,404
lazar algorithm
Get a representation of the GET /feature_selection - feature selection representation 200,404
feature selection algorithms
Get a representation of the GET /feature_selection/rfe - rfe representation 200,404
rfe algorithm


Create bbrc features POST /fminer/bbrc dataset_uri, URI for feature dataset 200,400,404,500
feature_uri,
[min_frequency=5 per-mil],
[feature_type=trees],
[backbone=true],
[min_chisq_significance=0.95],
[nr_hits=false]
Create last features POST /fminer/last dataset_uri, URI for feature dataset 200,400,404,500
feature_uri,
[min_frequency=8 %],
[feature_type=trees],
[nr_hits=false]
Create lazar model POST /lazar dataset_uri, URI for lazar model 200,400,404,500
[prediction_feature],
[feature_generation_uri],
[prediction_algorithm],
[feature_dataset_uri],
[pc_type=null],
[nr_hits=false (class. using wt. maj. vote), true (else)],
[min_sim=0.3 (nominal), 0.4 (numeric features)]
[min_train_performance=0.1]

Create selected features POST /feature_selection/rfe dataset_uri, URI for dataset 200,400,404,500
prediction_feature,
feature_dataset_uri,
[del_missing=false]

Get a representation of the GET /fminer/last - last representation 200,404
last algorithm
Get a representation of the GET /pc - URIs of algorithms 200,404
pc algorithms
Get a representation of the GET /pc/<name> - descriptor representation 200,404
pc algorithm <name>
Get a representation of the GET /fs - URIs of algorithms 200,404
fs algorithms
Get a representation of the GET /fs/rfe - rfe representation 200,404
rfe algorithm
Create lazar model POST /lazar dataset_uri, URI for lazar model 200,400,404,500
[prediction_feature],
[feature_generation_uri],
[feature_dataset_uri],
[prediction_algorithm],
[pc_type=null],
[lib=null],
[nr_hits=false (cl+wmv),
true (else)],
[min_sim=0.3 (nominal), 0.4
(numeric features)],
[min_train_performance=0.1]
Create bbrc features POST /fminer/bbrc dataset_uri, URI for feature dataset 200,400,404,500
prediction_feature,
[min_frequency=5 per-mil],
[feature_type=trees],
[backbone=true],
[min_chisq_significance=0.95],
[nr_hits=false]
Create last features POST /fminer/last dataset_uri, URI for feature dataset 200,400,404,500
prediction_feature,
[min_frequency=8 %],
[feature_type=trees],
[nr_hits=false]
Create features POST /pc/AllDescriptors dataset_uri, URI for dataset 200,400,404,500
[pc_type=constitutional,
topological,geometrical,
electronic,cpsa,hybrid],
[lib=cdk,joelib,openbabel]
Create feature POST /pc/<name> dataset_uri URI for dataset 200,400,404,500
Select features POST /fs/rfe dataset_uri, URI for dataset 200,400,404,500
prediction_feature,
feature_dataset_uri,
[del_missing=false]
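
The `min_frequency` defaults in the table are relative values (per-mil for bbrc, percent for last). As a rough illustration of the arithmetic only, the sketch below turns such a relative value into an absolute number of compounds. The helper name `absolute_min_frequency`, its unit symbols, and the choice to round up are assumptions made for this example; how the service itself parses and rounds the parameter is not shown here.

```ruby
# Hypothetical helper (not part of the service): converts a relative
# minimum frequency into an absolute number of compounds.
# Rounding up and the floor of 1 are assumptions of this sketch.
def absolute_min_frequency(value, unit, dataset_size)
  divisor = case unit
            when :percent then 100.0
            when :per_mil then 1000.0
            else raise ArgumentError, "unknown unit: #{unit.inspect}"
            end
  count = (value * dataset_size / divisor).ceil
  [count, 1].max # never require less than one supporting compound
end

puts absolute_min_frequency(5, :per_mil, 2000) # 5 per-mil of 2000 compounds
puts absolute_min_frequency(8, :percent, 250)  # 8 % of 250 compounds
```

With very small datasets the relative value can fall below one compound, which is why the sketch clamps the result to 1.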

Synopsis
--------

- prediction\_algorithm: One of "weighted\_majority\_vote" (default for classification), "local\_svm\_classification", "local\_svm\_regression" (default for regression). "weighted\_majority\_vote" is not applicable for regression.
- pc_type: Mandatory for feature dataset, one of [geometrical, topological, electronic, constitutional, hybrid, cpsa].
- nr_hits: Whether nominal features should be instantiated with their occurrence counts in the instances. One of "true", "false".
- min_sim: The minimum similarity threshold for neighbors. Numeric value in [0,1].
- min_train_performance: The minimum training performance for "local\_svm\_classification" (Accuracy) and "local\_svm\_regression" (R-squared). Numeric value in [0,1].
- del_missing: one of true, false
- *del_missing*: one of
- *true*
- *false*

- *feature\_type*: Type of subgraphs when no feature dataset is supplied, one of
- *trees*
- *paths*

- *lib*: Mandatory for feature datasets that do not contain appropriate feature metadata, one of
- *cdk*
- *openbabel*
- *joelib*

- *min_sim*: The minimum similarity threshold for neighbors. Numeric value in [0,1].

- *min_train_performance*: The minimum training performance for *local\_svm\_classification* (Accuracy) and *local\_svm\_regression* (R-squared). Numeric value in [0,1].

See http://www.maunz.de/wordpress/opentox/2011/lazar-models-and-how-to-trigger-them for a graphical overview.
- *nr_hits*: Whether nominal features should be instantiated with their occurrence counts in the instances. One of
- *true*
- *false*

- *pc_type*: Mandatory for feature datasets that do not contain appropriate feature metadata, one of
- *geometrical*
- *topological*
- *electronic*
- *constitutional*
- *hybrid*
- *cpsa*

- *prediction\_algorithm*: One of
- *weighted\_majority\_vote* (default for classification, n.a. for regression)
- *local\_svm\_classification*
- *local\_svm\_regression* (default for regression).
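
The allowed values listed above can be checked on the client side before a request is sent. The following is a minimal sketch of such a check, not code from this repository: the method name `check_params` and its error messages are hypothetical, and accepting comma-separated lists for `pc_type` and `lib` is an assumption.

```ruby
PREDICTION_ALGORITHMS = %w[weighted_majority_vote local_svm_classification local_svm_regression].freeze
PC_TYPES = %w[geometrical topological electronic constitutional hybrid cpsa].freeze
LIBS = %w[cdk openbabel joelib].freeze

# Hypothetical client-side check: returns a list of problems,
# which is empty when all supplied parameters look valid.
def check_params(params)
  errors = []

  algo = params["prediction_algorithm"]
  if algo && !PREDICTION_ALGORITHMS.include?(algo)
    errors << "unknown prediction_algorithm: #{algo}"
  end

  # Assumption: several values may be given as a comma-separated list.
  { "pc_type" => PC_TYPES, "lib" => LIBS }.each do |key, allowed|
    (params[key] || "").split(",").each do |value|
      errors << "unknown #{key}: #{value}" unless allowed.include?(value)
    end
  end

  # Both thresholds are documented as numeric values in [0,1].
  %w[min_sim min_train_performance].each do |key|
    next unless params.key?(key)
    value = Float(params[key]) rescue nil
    errors << "#{key} must be a number in [0,1]" unless value && value.between?(0.0, 1.0)
  end

  errors
end

p check_params("prediction_algorithm" => "local_svm_regression", "min_sim" => "0.4")
p check_params("pc_type" => "geometrical,foo", "min_sim" => "1.5")
```

Returning a list of messages instead of raising on the first problem lets a caller report every invalid parameter at once.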


Supported MIME formats
@@ -76,17 +114,39 @@ Examples

NOTE: http://webservices.in-silico.ch hosts the stable version that might not have complete functionality yet. **Please try http://ot-test.in-silico.ch** for latest versions.

### Get the OWL-DL representation of lazar

curl http://webservices.in-silico.ch/algorithm/lazar

### Get the OWL-DL representation of fminer

curl http://webservices.in-silico.ch/algorithm/fminer

### Get the OWL-DL representation of lazar
### Get the OWL-DL representation of pc

curl http://webservices.in-silico.ch/algorithm/lazar
curl http://webservices.in-silico.ch/algorithm/pc

### Get the OWL-DL representation of fs

curl http://webservices.in-silico.ch/algorithm/fs

* * *

The following creates datasets with backbone refinement class representatives or latent structure patterns, using supervised graph mining, see http://cs.maunz.de. These features can be used e.g. as structural alerts, as descriptors (fingerprints) for prediction models or for similarity calculations.
### Create lazar model

Creates a standard Lazar model with subgraph descriptors.

curl -X POST -d dataset_uri={dataset_uri} -d prediction_feature={feature_uri} -d feature_generation_uri=http://webservices.in-silico.ch/algorithm/fminer/bbrc http://webservices.in-silico.ch/test/algorithm/lazar

Creates a Lazar model with physico-chemical descriptors.

curl -X POST -d dataset_uri={dataset_uri} -d prediction_feature={feature_uri} -d feature_dataset_uri={feature_dataset_uri} http://webservices.in-silico.ch/test/algorithm/lazar

feature_uri specifies the dependent variable from the dataset.
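
The same kind of request can be prepared from Ruby with the standard library. The sketch below only builds the form-encoded POST and prints its body; the `example.org` dataset and feature URIs are placeholders invented for illustration, and the lines that would actually send the request are left commented out.

```ruby
require 'net/http'
require 'uri'

# Service address taken from the curl examples; the dataset and feature
# URIs below are placeholders and must be replaced with real ones.
service = URI("http://webservices.in-silico.ch/test/algorithm/lazar")

request = Net::HTTP::Post.new(service.path)
request.set_form_data(
  "dataset_uri"            => "http://example.org/dataset/1",
  "prediction_feature"     => "http://example.org/dataset/1/feature/1",
  "feature_generation_uri" => "http://webservices.in-silico.ch/algorithm/fminer/bbrc"
)

# Show the form-encoded body that would be sent.
puts request.body

# To send the request, uncomment:
# response = Net::HTTP.start(service.host, service.port) { |http| http.request(request) }
# puts response.code, response.body
```

For a model with physico-chemical descriptors, the `feature_generation_uri` entry would be replaced by a `feature_dataset_uri` entry, mirroring the second curl call.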

* * *

Creates subgraph descriptors with backbone refinement class representatives or latent structure patterns, using supervised graph mining, see http://cs.maunz.de. These features can be used e.g. as structural alerts, as descriptors (fingerprints) for prediction models or for similarity calculations.

### Create the full set of frequent and significant subtrees

@@ -101,30 +161,34 @@ backbone=false reduces BBRC mining to frequent and correlated subtree mining (mu

feature_uri specifies the dependent variable from the dataset.
Adding -d nr_hits=true produces frequency counts per pattern and molecule.
Please click [here](http://bbrc.maunz.de#usage) for more guidance on usage.
Click [here](http://bbrc.maunz.de#usage) for more guidance on usage.

### Create [LAST-PM](http://last-pm.maunz.de) descriptors, recommended for small to medium-sized datasets.

curl -X POST -d dataset_uri={dataset_uri} -d prediction_feature={feature_uri} -d min_frequency={min_frequency} http://webservices.in-silico.ch/algorithm/fminer/last

feature_uri specifies the dependent variable from the dataset.
Adding -d nr_hits=true produces frequency counts per pattern and molecule.
Please click [here](http://last-pm.maunz.de#usage) for more guidance on usage.
Click [here](http://last-pm.maunz.de#usage) for more guidance on usage.

* * *

### Create lazar model
* * *

Creates a standard Lazar model.
### Create a feature dataset of physico-chemical descriptors with CDK

curl -X POST -d dataset_uri={dataset_uri} -d prediction_feature={feature_uri} -d feature_generation_uri=http://webservices.in-silico.ch/algorithm/fminer/bbrc http://webservices.in-silico.ch/test/algorithm/lazar
curl -X POST -d dataset_uri={dataset_uri} -d lib=cdk http://webservices.in-silico.ch/test/algorithm/pc/AllDescriptors

[API documentation](http://rdoc.info/github/opentox/algorithm)
--------------------------------------------------------------
lib specifies the library to use.

* * *

### Create a feature dataset of selected features
curl -X POST -d dataset_uri={dataset_uri} -d prediction_feature_uri={prediction_feature_uri} -d feature_dataset_uri={feature_dataset_uri} -d del_missing=true http://webservices.in-silico.ch/test/algorithm/feature_selection/rfe
### Select features from a feature dataset

curl -X POST -d dataset_uri={dataset_uri} -d prediction_feature={feature_uri} -d feature_dataset_uri={feature_dataset_uri} http://webservices.in-silico.ch/test/algorithm/fs/rfe

feature_uri specifies the dependent variable from the dataset.

* * *

Copyright (c) 2009-2011 Christoph Helma, Martin Guetlein, Micha Rautenberg, Andreas Maunz, David Vorgrimmler, Denis Gebele. See LICENSE for details.

36 changes: 26 additions & 10 deletions application.rb
@@ -1,17 +1,33 @@
# Java environment setup (JAVA_HOME, JOELIB2, CLASSPATH)
ENV["JAVA_HOME"] = "/usr/lib/jvm/java-6-sun" unless ENV["JAVA_HOME"]
ENV["JOELIB2"] = File.join File.expand_path(File.dirname(__FILE__)),"java"
deps = []
deps << "#{ENV["JAVA_HOME"]}/lib/tools.jar"
deps << "#{ENV["JAVA_HOME"]}/lib/classes.jar"
deps << "#{ENV["JOELIB2"]}"
jars = Dir[ENV["JOELIB2"]+"/*.jar"].collect {|f| File.expand_path(f) }
deps = deps + jars
ENV["CLASSPATH"] = deps.join(":")

require 'rubygems'
# AM LAST: can include both libs, no problems
require File.join(File.expand_path(File.dirname(__FILE__)), 'libfminer/libbbrc/bbrc') # has to be included before openbabel, otherwise we have strange SWIG overloading problems
require File.join(File.expand_path(File.dirname(__FILE__)), 'libfminer/liblast/last') # has to be included before openbabel, otherwise we have strange SWIG overloading problems
require File.join(File.expand_path(File.dirname(__FILE__)), 'last-utils/lu.rb') # AM LAST
gem "opentox-ruby", "~> 3"


# fminer libs to be included before openbabel, otherwise strange SWIG overloading problems
require File.join(File.expand_path(File.dirname(__FILE__)), 'libfminer/libbbrc/bbrc')
require File.join(File.expand_path(File.dirname(__FILE__)), 'libfminer/liblast/last')
require File.join(File.expand_path(File.dirname(__FILE__)), 'last-utils/lu.rb')

gem "opentox-ruby", "~> 4"
require 'opentox-ruby'
require 'rjb'
require 'rinruby'


#require 'smarts.rb'
#require 'similarity.rb'
require 'openbabel.rb'
# main
require 'fminer.rb'
require 'lazar.rb'
require 'feature_selection.rb'
require 'fs.rb'
require 'pc.rb'

set :lock, true

@@ -23,7 +39,7 @@
#
# @return [text/uri-list] algorithm URIs
get '/?' do
list = [ url_for('/lazar', :full), url_for('/fminer/bbrc', :full), url_for('/fminer/last', :full), url_for('/feature_selection/rfe', :full) ].join("\n") + "\n"
list = [ url_for('/lazar', :full), url_for('/fminer/bbrc', :full), url_for('/fminer/bbrc/sample', :full), url_for('/fminer/last', :full), url_for('/fminer/bbrc/match', :full), url_for('/fminer/last/match', :full), url_for('/feature_selection/rfe', :full), url_for('/pc', :full) ].join("\n") + "\n"
case request.env['HTTP_ACCEPT']
when /text\/html/
content_type "text/html"