Merge pull request #2984 from iglesias/develop
Revert "added license files" (bringing README.md back to life)
karlnapf committed Feb 19, 2016
2 parents f828a96 + 576e9ad commit 46eace3
Showing 3 changed files with 213 additions and 32 deletions.
10 changes: 0 additions & 10 deletions AUTHORS.md

This file was deleted.

21 changes: 0 additions & 21 deletions LICENSE.md

This file was deleted.

214 changes: 213 additions & 1 deletion README.md
@@ -1 +1,213 @@
# ModernAnalytics
# The SHOGUN machine learning toolbox
-------------------------------------

Develop branch build status:

[![Build Status](https://travis-ci.org/shogun-toolbox/shogun.png?branch=develop)](https://travis-ci.org/shogun-toolbox/shogun)
[![Coverage Status](https://coveralls.io/repos/shogun-toolbox/shogun/badge.png?branch=develop)](https://coveralls.io/r/shogun-toolbox/shogun?branch=develop)

Buildbot: http://buildbot.shogun-toolbox.org/waterfall.

Quick links to this file:

* [Quickstart](doc/md/QUICKSTART.md)
* [Introduction](#introduction)
* [Interfaces](#interfaces)
* [Platforms](#platforms)
* [Contents](#contents)
* [Applications](#applications)
* [License](#license)
* [Download](#download)
* [References](#references)

Other links that may be useful:

* See [INSTALL](doc/md/INSTALL.md) for first steps on installation and running SHOGUN.
* See [README.developer](doc/md/README_developer.md) for the developer documentation.
* See [README.data](doc/md/README_data.md) for how to download example data sets accompanying SHOGUN.
* See [README.cmake](doc/md/README_cmake.md) for setting particular build options with SHOGUN and cmake.

## Introduction
---------------

SHOGUN focuses on large-scale kernel methods, especially Support Vector
Machines (SVMs) [1]. It provides a generic SVM object that interfaces to
several SVM implementations, among them the state-of-the-art LibSVM [3] and
SVMlight [4]. Each SVM can be combined with a variety of kernels. The toolbox
provides efficient implementations of the most common kernels, such as the
Linear, Polynomial, Gaussian and Sigmoid kernels, and also comes with a number
of recent string kernels, e.g. the Locality Improved [5], Fisher [6], TOP [7],
Spectrum [8] and Weighted Degree (with shifts) [9, 10, 11] kernels. For the
latter, the efficient LINADD [11] optimizations are implemented. SHOGUN also
lets you work with custom pre-computed kernels. One of its key features is
the *combined kernel*, a weighted linear combination of sub-kernels that need
not all work on the same domain. An optimal sub-kernel weighting can be
learned using Multiple Kernel Learning [12, 13, 17].

Currently, SVM 2-class classification and regression problems are supported.
SHOGUN also implements a number of linear methods, such as Linear Discriminant
Analysis (LDA), the Linear Programming Machine (LPM) and (Kernel) Perceptrons,
as well as algorithms to train hidden Markov models.

The input feature objects can be dense, sparse or strings, of type
int/short/double/char, and can be converted into different feature types.
Chains of *preprocessors* (e.g. subtracting the mean) can be attached to each
feature object, allowing for on-the-fly pre-processing.
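
To make the *combined kernel* idea concrete, here is a minimal sketch in plain Python. It does not use SHOGUN's API: the function names (`gaussian_kernel`, `spectrum_kernel`, `combined_kernel`), the Gaussian parameterisation and the hand-picked weights are assumptions chosen purely for illustration. Each example is a pair of a numeric vector and a string, so the two sub-kernels work on different domains.

```python
import math
from collections import Counter


def gaussian_kernel(x, y, width=2.0):
    """Gaussian kernel exp(-||x - y||^2 / width) on numeric vectors.

    The parameterisation by `width` is an assumption for this sketch.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


def spectrum_kernel(s, t, k=2):
    """Simple spectrum kernel on strings: inner product of k-mer counts."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(counts_s[m] * counts_t[m] for m in counts_s))


def combined_kernel(sub_kernels, weights):
    """Build k(a, b) = sum_i weights[i] * sub_kernels[i](a[i], b[i]).

    Each example is a tuple with one component per sub-kernel, so every
    sub-kernel only sees the part of the example from its own domain.
    """
    if len(sub_kernels) != len(weights):
        raise ValueError("need exactly one weight per sub-kernel")

    def kernel(a, b):
        return sum(w * k(ai, bi)
                   for w, k, ai, bi in zip(weights, sub_kernels, a, b))

    return kernel


# Two hypothetical examples: (numeric features, sequence).
a = ((0.0, 1.0), "ACGT")
b = ((1.0, 1.0), "CGTA")

kernel = combined_kernel([gaussian_kernel, spectrum_kernel], [0.5, 0.5])
print(kernel(a, b), kernel(a, a))
```

In this sketch the weights are fixed by hand; the point of Multiple Kernel Learning is to learn them from data instead.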

SHOGUN was initiated by Soeren Sonnenburg and Gunnar Raetsch (that's where the
name ShoGun comes from). It is now developed by a much larger team
(cf. [AUTHORS](doc/md/AUTHORS.md)) and would not have been possible without the
patches and bug reports contributed by various people, or without the authors
of the other machine learning packages that we utilize. See
[CONTRIBUTIONS](doc/md/CONTRIBUTIONS.md) for a detailed list.

## Interfaces
-------------

SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave,
Java, C#, Ruby, Lua and Python.

The following table depicts the status of each interface available in SHOGUN:

| Interface | Status |
|:----------------:|-----------------------------------------------------------|
|python\_modular | *mature* (no known problems) |
|octave\_modular | *mature* (no known problems) |
|java\_modular | *stable* (no known problems; not all examples are ported) |
|ruby\_modular | *stable* (no known problems; only few examples ported) |
|csharp\_modular | *stable* (no known problems; not all examples are ported) |
|lua\_modular | *alpha* (some examples work, string typemaps are unstable) |
|perl\_modular | *pre-alpha* (work in progress quality) |
|r\_modular | *pre-alpha* (only for the brave: SWIG does not properly handle reference counting, so use <br/> --disable-reference-counting to get it to work, but beware that it will leak memory; disabled by default) |
|octave\_static | *mature* (no known problems) |
|matlab\_static | *mature* (no known problems) |
|python\_static | *mature* (no known problems) |
|r\_static | *mature* (no known problems) |
|libshogun\_static | *mature* (no known problems) |
|cmdline\_static | *stable* (but some data types incomplete) |
|elwms\_static | the "eierlegendewollmilchsau" interface: a chimera that interfaces with python, octave, r and matlab in one file and, among other things, provides the run\_python command to run python code using the variables available in octave, r or matlab |

Visit http://www.shogun-toolbox.org/doc/en/current for further information.


## Platforms
------------

Debian GNU/Linux, Mac OSX and WIN32/CYGWIN are supported platforms (see
the [INSTALL](doc/md/INSTALL.md) file for generic and platform-specific installation instructions).

## Contents
-----------

The following directories are found in the source distribution.

- *src* - source code.
- *data* - data sets (required for some examples / applications); these need to be downloaded
  separately via the download site or with `git submodule update --init` from the root of the git checkout.
- *doc* - documentation (to be built using doxygen), ipython notebooks, and the PDF tutorial.
- *examples* - example files for all interfaces.
- *applications* - applications of SHOGUN.
- *benchmarks* - speed benchmarks.
- *tests* - unit and integration tests.
- *cmake* - cmake build scripts.

## Applications
---------------

We have successfully used this toolbox to tackle the following sequence
analysis problems: Protein Super Family classification [7],
Splice Site Prediction [9, 14, 15], Interpreting the SVM Classifier [12, 13],
Splice Form Prediction [9], Alternative Splicing [10] and Promoter
Prediction [16]. Some of them come with no fewer than 10
million training examples, others with 7 billion test examples.

## License
----------

SHOGUN is generally licensed under the GPL version 3 or any later version
(cf. [LICENSE](doc/md/LICENSE.md)), with code borrowed from various
GPL-compatible libraries (cf. [CONTRIBUTIONS](doc/md/CONTRIBUTIONS.md)).
The exceptions are the files classifier/svm/Optimizer.{cpp,h},
classifier/svm/SVM_light.{cpp,h}, regression/svr/SVR_light.{cpp,h}
and the kernel caching functions in kernel/Kernel.{cpp,h},
which are (C) Torsten Joachims and follow a different
licensing scheme (cf. [LICENSE\_SVMlight](doc/md/LICENSE_SVMlight.md)). See also
[LICENSE\_msufsort](doc/md/LICENSE_msufsort.md) and [LICENSE\_tapkee](doc/md/LICENSE_tapkee.md).

## Download
-----------

SHOGUN can be downloaded from http://www.shogun-toolbox.org and GitHub at
https://github.com/shogun-toolbox/shogun.

## References
-------------

[1] C. Cortes and V.N. Vapnik. Support-vector networks.
Machine Learning, 20(3):273-297, 1995.

[2] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
http://www.public.asu.edu/~jye02/Software/SLEP.

[3] C.C. Chang and C.J. Lin. Libsvm: Introduction and benchmarks.
Technical report, Department of Computer Science and Information
Engineering, National Taiwan University, Taipei, 2000.

[4] T. Joachims. Making large-scale SVM learning practical. In B. Schoelkopf,
C.J.C. Burges, and A.J. Smola, editors, Advances in Kernel Methods -
Support Vector Learning, pages 169-184, Cambridge, MA, 1999. MIT Press.

[5] A. Zien, G. Raetsch, S. Mika, B. Schoelkopf, T. Lengauer, and K.-R.
Mueller. Engineering Support Vector Machine Kernels That Recognize
Translation Initiation Sites. Bioinformatics, 16(9):799-807, September 2000.

[6] T.S. Jaakkola and D. Haussler. Exploiting generative models in
discriminative classifiers. In M.S. Kearns, S.A. Solla, and D.A. Cohn,
editors, Advances in Neural Information Processing Systems, volume 11,
pages 487-493, 1999.

[7] K. Tsuda, M. Kawanabe, G. Raetsch, S. Sonnenburg, and K.R. Mueller.
A new discriminative kernel from probabilistic models.
Neural Computation, 14:2397-2414, 2002.

[8] C. Leslie, E. Eskin, and W.S. Noble. The spectrum kernel: A string kernel
for SVM protein classification. In R.B. Altman, A.K. Dunker, L. Hunter,
K. Lauderdale, and T.E. Klein, editors, Proceedings of the Pacific
Symposium on Biocomputing, pages 564-575, Kaua'i, Hawaii, 2002.

[9] G. Raetsch and S. Sonnenburg. Accurate Splice Site Prediction for
Caenorhabditis Elegans, pages 277-298. MIT Press series on Computational
Molecular Biology. MIT Press, 2004.

[10] G. Raetsch, S. Sonnenburg, and B. Schoelkopf. RASE: recognition of
alternatively spliced exons in C. elegans. Bioinformatics,
21:i369-i377, June 2005.

[11] S. Sonnenburg, G. Raetsch, and B. Schoelkopf. Large scale genomic sequence
SVM classifiers. In Proceedings of the 22nd International Machine Learning
Conference. ACM Press, 2005.

[12] S. Sonnenburg, G. Raetsch, and C. Schaefer. Learning interpretable SVMs
for biological sequence classification. In RECOMB 2005, LNBI 3500,
pages 389-407. Springer-Verlag Berlin Heidelberg, 2005.

[13] G. Raetsch, S. Sonnenburg, and C. Schaefer. Learning Interpretable SVMs
for Biological Sequence Classification. BMC Bioinformatics, Special Issue
from NIPS workshop on New Problems and Methods in Computational Biology,
Whistler, Canada, 18 December 2004, 7(Suppl. 1):S9, March 2006.

[14] S. Sonnenburg. New methods for splice site recognition. Master's thesis,
Humboldt University, 2002. Supervised by K.R. Mueller, H.D. Burkhard and
G. Raetsch.

[15] S. Sonnenburg, G. Raetsch, A. Jagota, and K.R. Mueller. New methods for
splice-site recognition. In Proceedings of the International Conference on
Artificial Neural Networks, 2002. Copyright by Springer.

[16] S. Sonnenburg, A. Zien, and G. Raetsch. ARTS: Accurate Recognition of
Transcription Starts in Human. 2006.

[17] S. Sonnenburg, G. Raetsch, C. Schaefer, and B. Schoelkopf. Large Scale
Multiple Kernel Learning. Journal of Machine Learning Research, 2006.
K. Bennett and E.P. Hernandez, editors.

[![githalytics.com alpha](https://cruel-carlota.pagodabox.com/3e5ff04ff56513867eedb5b2f4261702 "githalytics.com")](http://githalytics.com/shogun-toolbox/shogun)
