Supplementary material for our paper submitted to SWAT4LS'16: Int. Workshop on Semantic Web Applications and Tools for Life Sciences

Studying the Cohesion Evolution of Genes Related to Chronic Lymphocytic Leukemia Using Semantic Similarity in Gene Ontology and Self-Organizing Maps

This repository contains the data and code associated with the following paper:

E. Kontopoulos, T. Moysiadis, M. Tsagiopoulou, S. Darányi, P. Wittek, N. Papakonstantinou, S. Ntoufa, G. Meditskos, K. Stamatopoulos, I. Kompatsiaris. (2016). Studying the Cohesion Evolution of Genes Related to Chronic Lymphocytic Leukemia Using Semantic Similarity in Gene Ontology and Self-Organizing Maps. Proceedings of SWAT4LS-16, 9th Semantic Web Applications and Tools for Life Sciences Conference, December 2016.

The outline of the processing steps is as follows:

  1. Index each of the consecutive time periods with Lucene.
  2. Build random indices.
  3. Train emergent self-organizing maps.

The rest of this README details these steps. The Java tools depend on lucene-core-4.10.3.jar, lucene-analyzers-common-4.10.3.jar, and SemanticVectors 5.8.


Indexing the time periods

Run the class conceptDrifts.PubMedAbstractIndexer once for each of the periods 1 to 3, setting the parameters in its main() function each time. This yields three index directories.

Generating the random indices

We build the random indices in the data folder.

java -Xmx40000m -luceneindexpath index1
mv termvectors2.bin termvectorsperiod-1.bin
mv docvectors2.bin docvectorsperiod-1.bin

java -Xmx40000m -luceneindexpath index2
mv termvectors2.bin termvectorsperiod-2.bin
mv docvectors2.bin docvectorsperiod-2.bin

java -Xmx40000m -luceneindexpath index3
mv termvectors2.bin termvectorsperiod-3.bin
mv docvectors2.bin docvectorsperiod-3.bin
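
The renaming matters because every build writes to the same two file names, so a later build would overwrite the output of an earlier one. A minimal sketch of a helper that performs the two renames for one period and stops if a file is missing. The function name rename_period_vectors is hypothetical and not part of the repository; it assumes termvectors2.bin and docvectors2.bin are in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): give the output of one
# index build a period-specific name so the next build cannot overwrite it.
# Assumes termvectors2.bin and docvectors2.bin are in the current directory.
rename_period_vectors() {
    period="$1"
    for kind in termvectors docvectors; do
        src="${kind}2.bin"
        dst="${kind}period-${period}.bin"
        if [ ! -f "$src" ]; then
            echo "missing $src; run the index build for period $period first" >&2
            return 1
        fi
        mv "$src" "$dst"
    done
}
```

Calling rename_period_vectors 1 right after the first build would replace the two mv commands for that period, and likewise for periods 2 and 3.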

We need to convert the term vectors to text format:

java -lucenetotext termvectorsperiod-1.bin termvectorsperiod-1.txt
java -lucenetotext termvectorsperiod-2.bin termvectorsperiod-2.txt
java -lucenetotext termvectorsperiod-3.bin termvectorsperiod-3.txt

Then we transform each random index into input files suitable for Somoclu and ESOM Tools:

java conceptDrifts.SvDense2Sparse termvectorsperiod-1.txt termvectorsperiod-1.svm termvectorsperiod-1.names
java conceptDrifts.SvDense2Sparse termvectorsperiod-2.txt termvectorsperiod-2.svm termvectorsperiod-2.names
java conceptDrifts.SvDense2Sparse termvectorsperiod-3.txt termvectorsperiod-3.svm termvectorsperiod-3.names
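
To illustrate what kind of conversion this step involves, here is a rough sketch in shell and awk. It is not the repository's conceptDrifts.SvDense2Sparse, and it rests on several assumptions: the text file starts with one header line beginning with "-", every other line has the form term|v1|v2|..., the sparse file holds zero-based index:value pairs for the non-zero components, and the names file simply lists one term per line. The layout that SvDense2Sparse, Somoclu, and ESOM Tools actually use may differ, so check their documentation before relying on this.

```shell
#!/bin/sh
# Hypothetical helper, NOT the repository's conceptDrifts.SvDense2Sparse.
# Usage: dense_to_sparse INPUT.txt OUTPUT.svm OUTPUT.names
# Assumed input: an optional header line starting with "-", then lines
# of the form term|v1|v2|...|vn.
dense_to_sparse() {
    awk -F'|' -v svm="$2" -v names="$3" '
        # Skip the assumed header line.
        NR == 1 && /^-/ { next }
        {
            print $1 > names                      # first field is the term
            line = ""
            for (i = 2; i <= NF; i++) {
                if ($i + 0 != 0) {                # keep non-zero components only
                    sep = (line == "") ? "" : " "
                    line = line sep (i - 2) ":" $i    # zero-based column index (assumed)
                }
            }
            print line > svm
        }
    ' "$1"
}
```

Line k of the names file and line k of the sparse file describe the same term, which is how the sketch keeps labels and vectors aligned.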

Training the emergent self-organizing maps

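We used Somoclu to train emergent self-organizing maps with a toroid topology. The maps for periods 2 and 3 are initialized with the codebook (.wts file) trained on the preceding period: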

somoclu -k 2 -m toroid -s 1 -x 176 -y 100 termvectorsperiod-1.svm termvectorsperiod-1
somoclu -k 2 -m toroid -c data/termvectorsperiod-1.wts -s 1 -x 176 -y 100 termvectorsperiod-2.svm termvectorsperiod-2
somoclu -k 2 -m toroid -c data/termvectorsperiod-2.wts -s 1 -x 176 -y 100 termvectorsperiod-3.svm termvectorsperiod-3
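
The three calls differ only in the period number and in the -c option, which passes the preceding period's codebook as the initial codebook. The sketch below makes that pattern explicit by printing the command for each period without running it. The function name somoclu_command is hypothetical; the options and paths are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the Somoclu command for one period.
# Nothing is executed; options and paths are taken from the commands above.
somoclu_command() {
    period="$1"
    init=""
    if [ "$period" -gt 1 ]; then
        # Later periods start from the codebook of the preceding period.
        prev=$((period - 1))
        init=" -c data/termvectorsperiod-${prev}.wts"
    fi
    echo "somoclu -k 2 -m toroid${init} -s 1 -x 176 -y 100 termvectorsperiod-${period}.svm termvectorsperiod-${period}"
}

for p in 1 2 3; do
    somoclu_command "$p"
done
```

Because each period depends on the .wts file of the one before it, the runs have to be executed in order rather than in parallel.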


This work was supported by the European Commission Seventh Framework Programme under Grant Agreement Number FP7-601138 PERICLES.