Updating doc

fikipollo · Apr 27, 2015 · c52480f
1 parent 1138b25
commit c52480f
Showing 12 changed files with 249 additions and 21 deletions.
diff --git a/docs/img/2_data-structure_1.jpg b/docs/img/2_data-structure_1.jpg
diff --git a/docs/img/2_data-structure_2.jpg b/docs/img/2_data-structure_2.jpg
diff --git a/docs/img/2_data-structure_3.jpg b/docs/img/2_data-structure_3.jpg
diff --git a/docs/img/3_app-structure_1.jpg b/docs/img/3_app-structure_1.jpg
diff --git a/docs/img/4_installation_1.jpg b/docs/img/4_installation_1.jpg
diff --git a/docs/installation.md → docs/installation/autoinstaller.md b/docs/installation.md → docs/installation/autoinstaller.md
diff --git a/docs/installation/install.md b/docs/installation/install.md
@@ -0,0 +1,54 @@
+# STATegra EMS Autoinstaller
+
+An auto-install bash script has been developed for UNIX environments.
+For automatic installation, download the script from this link and follow the installation steps.
+
+# Manual installation
+1.Download the latest version of the STATegra EMS binaries from this [link](http://bioinfo.cipf.es/stategraems/get-stategra-ems/) and extract the compressed file.
+
+2.Copy the WAR file (stategraems_app.war) into the webapps directory located in the Tomcat directory.
+If Tomcat is running, the WAR file should be automatically deployed, creating a new directory for the STATegra EMS application (*<tomcat_location>/webapps/stategraems_app*).
+
+**Note**: By default, STATegra EMS will be accessible at *http://yourservernameandport/stategraems_app*. To change this subdomain (i.e. stategraems_app), rename the WAR file before copying it into the webapps directory.
+
+3.If you decide to change the subdomain, edit the *ServerConfiguration.js* file (*<tomcat_location>/webapps/<new_subdomain_name>/resources/ServerConfiguration.js*) replacing the **SERVER_URL** value with the new subdomain name.
+
+4.STATegra EMS stores some files and images when users add new information to the system, so it is necessary to specify a location for those files.
+Whatever location you choose, do not forget to give the Tomcat user read/write access to it.
+
+By default, the location is */data*. If you want to keep the default, just create the following directory structure under the */data* directory:
+
+```
+/data
+|___stategra_ems_data
+        |___treatment_documents
+        |___SOP_documents
+```
+
+After that, copy the provided db_config.properties file into the /data/stategra_ems_data/ directory.
+
+5.If you decide to change this location, choose your own location and create the following directory structure:
+
+```
+<Your Location>
+|___stategra_ems_data
+        |___treatment_documents
+        |___SOP_documents
+```
+
+After that, copy the provided **db_config.properties** file into the *<Your Location>/stategra_ems_data* directory.
+Finally, edit the data_location.properties file (*<tomcat_location>/webapps/<subdomain_name>/conf/data_location.properties*), replacing the **data_location** value with the new location (without the stategra_ems_data directory).
+
+6.Run the provided SQL script (*databases.sql*) to install the STATegraEMS database.
+  The MySQL user running the script must have privileges to create databases and users.
+
+```
+$ mysql -u your_mysql_user -p < databases.sql
+```
+
+7.Finally, add the Administrator user to the STATegraEMS database, replacing *your_admin_password* with the password you choose.
+
+```
+$ echo "INSERT INTO STATegraDB.users VALUES('admin', SHA1('your_admin_password'),'');" | mysql -u your_mysql_user -p
+```
+
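Steps 4 and 5 of the manual installation can be collected into a small shell helper. The following is a minimal sketch, not part of the STATegra EMS distribution: the function name `prepare_ems_data`, its two arguments, and the `tomcat` owner name are assumptions made for illustration. It only creates the directory tree and copies `db_config.properties`; editing `data_location.properties` is still done by hand.

```shell
#!/bin/sh
# Sketch only: prepare_ems_data is a hypothetical helper, not shipped with STATegra EMS.
# Usage: prepare_ems_data <data_location> <path/to/db_config.properties>
prepare_ems_data() {
    data_location=$1
    db_config=$2
    ems_dir="$data_location/stategra_ems_data"

    # Create the directory tree described in steps 4 and 5.
    mkdir -p "$ems_dir/treatment_documents" "$ems_dir/SOP_documents" || return 1

    # Copy the provided database configuration into stategra_ems_data.
    cp "$db_config" "$ems_dir/" || return 1

    # Give the Tomcat user read/write access. The owner name "tomcat" is an
    # assumption; the step is skipped when no such user exists on this machine.
    if id tomcat >/dev/null 2>&1; then
        chown -R tomcat "$ems_dir" || echo "chown failed: rerun as root" >&2
    fi
}
```

For the default location this would be called as `prepare_ems_data /data ./db_config.properties`, typically as root so that the ownership change succeeds.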
diff --git a/docs/installation/system_requirements.md b/docs/installation/system_requirements.md
@@ -0,0 +1,117 @@
+# System requirements
+
+STATegra EMS was developed to run on UNIX systems. Although the software it depends on also works in other environments (e.g. Windows), STATegra EMS itself is not guaranteed to do so.
+To install STATegra EMS, you must first install its software dependencies. The following subsections outline the required steps, as they were followed to successfully install STATegra EMS on a Debian machine (Debian 7.1).
+
+<div class="imageContainer" style="text-align:center; font-size:10px; color:#898989" >
+    <img src="img/4_installation_1.jpg" title="System requirements."/>
+</div>
+
+## JDK
+
+STATegra EMS runs on Apache Tomcat (v7.0 or greater), which depends on Java.
+Compatible JDKs for many platforms (or links to where they can be found) are available at [http://www.oracle.com/technetwork/java/javase/downloads/index.html](http://www.oracle.com/technetwork/java/javase/downloads/index.html).
+
+As an example, these are the steps followed to install the JDK on the Debian machine.
+
+**1.Download the JDK** from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
+
+**2.Uncompress the JDK files** into the directory /opt/ and create a soft link
+
+```bash
+$ sudo tar xzvf jdk-7u25-linux-i586.tar.gz -C /opt/
+$ sudo ln -s /opt/jdk1.7.0_25 /opt/jdk
+```
+
+**3.Create soft links for the java binaries** in /usr/bin
+
+```bash
+$ sudo ln -s /opt/jdk/bin/java /usr/bin/
+$ sudo ln -s /opt/jdk/bin/javaw /usr/bin/
+```
+
+**4. Check if java is correctly installed.**
+
+```bash
+$ java -version
+```
+## Apache Tomcat (v7.0 or greater)
+Binary downloads of the Apache Tomcat server are available from [http://tomcat.apache.org/download-70.cgi](http://tomcat.apache.org/download-70.cgi).
+
+Many Tomcat installation guides are available online, so the process is not covered in depth here. Briefly, the following steps were performed.
+
+**1.Get Apache Tomcat 7***
+
+```bash
+$ wget http://apache.rediris.es/tomcat/tomcat-7/v7.0.57/bin/apache-tomcat-7.0.57.tar.gz
+$ sudo tar xzvf apache-tomcat-7.0.57.tar.gz -C /opt/
+$ sudo ln -s /opt/apache-tomcat-7.0.57 /opt/tomcat
+```
+
+2.Create the tomcat group and user.
+
+```bash
+# Run the commands in this block as root.
+$ groupadd tomcat
+$ useradd -g tomcat -d /opt/tomcat tomcat
+$ usermod -G www-data tomcat
+
+# Add Tomcat as a service (managed by init.d).
+$ nano /etc/init.d/tomcat
+
+# Add the following code to the file:
+
+#!/bin/sh
+#Tomcat auto-start
+#description: Auto-starts tomcat
+#processname: tomcat
+#pidfile: /var/run/tomcat.pid
+#this path should point to your JAVA_HOME Directory
+export JAVA_HOME=/opt/jdk
+case $1 in
+start)
+sh /opt/tomcat/bin/startup.sh
+;; 
+stop)
+sh /opt/tomcat/bin/shutdown.sh
+;; 
+restart)
+sh /opt/tomcat/bin/shutdown.sh
+sh /opt/tomcat/bin/startup.sh
+;;
+esac
+exit 0
+
+
+$ chmod 755 /etc/init.d/tomcat
+$ /etc/init.d/tomcat start
+```
+
+3.Enable Tomcat auto-start on boot
+
+```bash
+$ update-rc.d tomcat defaults
+```
+
+4.Modify Tomcat users file.
+Set the user and password for the Tomcat Manager Interface by adding the following lines, replacing the username and password with your own administration credentials.
+
+```bash
+$ nano /opt/tomcat/conf/tomcat-users.xml
+
+<tomcat-users>
+[...]
+<role rolename="tomcat"/> 
+<role rolename="manager-gui"/> 
+<user username="admin" password="adminpassword" roles="tomcat,manager-gui"/> 
+[...]
+</tomcat-users>
+```
+
+The Tomcat Manager Interface is now accessible at http://localhost:8080/ or via the server's current IP address on port 8080 (e.g. 172.24.76.218:8080).
+
+<p style="  font-size: 10px;">* STATegra EMS was developed and tested under Apache Tomcat Version 7, so it is not guaranteed to work on other Tomcat versions.</p>
+
+## MySQL 5
+STATegra EMS uses the MySQL 5 relational database management system for data storage.
+Binaries can be downloaded from [http://dev.mysql.com/downloads/mysql/](http://dev.mysql.com/downloads/mysql/).
+
+As with Tomcat, installing MySQL is outside the scope of this manual, so it is not explained here.
+
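Before moving on to the STATegra EMS installation itself, it can be useful to confirm that the dependencies described above are reachable from the shell. The following is a minimal sketch under stated assumptions: `have_cmd` is a hypothetical helper name, the `java` and `mysql` client binaries are assumed to be on the `PATH`, and Tomcat is assumed to live under `/opt/tomcat` as in the steps above.

```shell
#!/bin/sh
# Sketch only: have_cmd is a hypothetical helper, not part of STATegra EMS.
# Returns 0 when the named command is found on the PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the Java runtime and the MySQL client can be found.
for cmd in java mysql; do
    if have_cmd "$cmd"; then
        echo "found:   $cmd"
    else
        echo "missing: $cmd"
    fi
done

# Tomcat is not on the PATH in this setup, so look for its startup script
# in the assumed /opt/tomcat location instead.
if [ -x /opt/tomcat/bin/startup.sh ]; then
    echo "found:   tomcat"
else
    echo "missing: tomcat"
fi
```

A "missing" line only means the binary was not found where this sketch looks; installations that use other paths need the checks adjusted accordingly.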
diff --git a/docs/introduction/app_structure.md b/docs/introduction/app_structure.md
@@ -0,0 +1,11 @@
+# STATegra EMS: Application architecture
+The STATegra EMS was designed as a multiuser web application and is divided into two components: the SERVER SIDE application and the CLIENT SIDE web application (Figure 1).
+
+<div class="imageContainer" style="text-align:center; font-size:10px; color:#898989" >
+    <img src="img/3_app-structure_1.jpg" title="Figure 1. Overview of the STATegra EMS architecture."/>
+    <p class="imageLegend">Figure 1. Overview of the STATegra EMS architecture.</p>
+</div>
+
+The server side is responsible for keeping the data consistent and for controlling access to the stored information. It is built using Java Servlets and a MySQL relational database, and is shared by all clients. Although primarily designed and tested on Linux servers, the server EMS code could easily be adapted to other architectures thanks to the cross-platform nature of Java. Additionally, the server code was implemented using the Data Access Object design pattern in conjunction with the Data Transfer Object pattern. This provides an abstraction layer between the server application (servlets) and the MySQL database, which makes it easier to extend the application code with new features or to change the database model.
+
+The STATegra EMS client side was developed as a user-friendly and intuitive web application using Ext JS, a cross-browser JavaScript framework that provides powerful tools for building interactive web applications. The client side is based on the Model-View-Controller architecture pattern, which makes large client applications easier to organize, maintain and extend. Communication between the client and server sides is handled through AJAX requests using the HTTP GET and POST methods, with JavaScript Object Notation (JSON) for data exchange.
diff --git a/docs/introduction/data_structure.md b/docs/introduction/data_structure.md
@@ -0,0 +1,46 @@
+# STATegra EMS: The data structure
+
+The overall objective of the STATegra EMS is to serve as a logbook for high-throughput genomics projects performed at research labs by providing an easy-to-use tool for the annotation of experimental design, samples, measurements, and the analysis pipelines applied to the data. Experimental data and metadata are organized in the EMS around three major metadata modules (Figure 1): the Experiment module, which records experimental design information and associated samples; the Sample module, which collects all information on the biomaterial used; and the Analysis module, which contains analysis pipelines and results. Both the Sample and Analysis modules have been defined broadly to accommodate data from different types of omics experiments and still provide a common annotation framework. Commonly used standards for omics experimental data annotation were followed when defining data specifications to facilitate EMS interoperability. In particular, we leveraged MIAPE [16] for proteomics analysis annotation, the metabolomics guidelines proposed by [17] and [18], and MIAME [19] and MINSEQE [20] for sequencing experiments.
+
+<div class="imageContainer" style="text-align:center; font-size:10px; color:#898989" >
+    <img src="img/2_data-structure_1.jpg" title="Figure 1 Metadata Module structure in STATegra EMS. "/>
+    <p class="imageLegend">Figure 1 Metadata Module structure in STATegra EMS. The Sample module stores information of biological conditions, biological replicates and the associated analytical samples. The analysis module contains all analysis steps from raw to processed data. Both samples and analyses are associated to one or more experiments within the Experiment module.</p>
+</div>
+
+Sample and Analysis modules contain distinct Information Units (IUs), which are the basic elements of data input into the system and are connected by an experimental or analysis workflow. The Experiment Module is a wrapper of Samples and Analyses with one single data input form.
+
+1. **Experiment module**: The experiment is the central unit of information of the STATegra EMS. An Experiment is defined by some scientific goals and a given experimental design that addresses these goals. This design implies a number of biological samples and an array of omics measurements, which are assigned to the Experiment.
+
+2. **Sample module**. This section hosts the information about biological conditions and their associated biological replicates and analytical samples. The IUs of this module are:  
+
+    *Biological Condition*. These are defined by the experimental design and consist of a given biological material such as the organism, cell type, tissue, etc. and, when applicable, an experimental condition such as treatment, dose or time-point for time-series samples.
+
+    *Biological Replicate*. Each Biological Condition is assessed by using one or more biological replicates that may or may not correspond to the same experimental batch. The Biological Replicate stems directly from Biological Condition by adding a replicate number and, if applicable, a batch number.
+
+    *Experimental Batch*. Frequently, when an experiment is composed of a large number of samples, only some of them can be generated at the same time. These samples correspond to the same batch. Batch information is relevant to identify systematic sources of noise that might affect all samples within the batch.
+
+    *Analytical Sample*. Omics experiments analyze molecular components of biological samples using a given experimental protocol, and the resulting analytical sample is ready to be measured by high-throughput techniques. For example, an RNA-seq analytical sample is obtained after using a cytosolic mRNA extraction protocol. Similarly for metabolomics, different analytical samples can be obtained by applying multiple extraction protocols that target distinct metabolic compounds.
+
+
+
+3. **Analysis module**. The Analysis module stores high-throughput molecular data obtained by the omics technologies and the data generated after processing of the primary raw data files. In contrast to the Sample module, where only metadata is stored, the Analysis module also stores pointers to data files. The Analysis module consists of three data IUs and one logical IU:
+
+    *Raw Data*. These files contain the data as produced by the omics equipment, for example fastq files in the case of sequencing experiments and NMR .raw files in the case of metabolomics experiments. The Raw Data IU also contains detailed information on the experimental protocol applied to the analytical sample, e.g., the library preparation protocol followed in an RNA-seq experiment or the NMR analysis characteristics in the case of metabolomics.
+
+    *Intermediate Data*. This IU covers all processing steps from raw data to processed data. Different omics experiments might require zero, one, or several intermediate steps. For example, in the case of RNA-seq, the mapping to a reference genome that produces a bam file constitutes an intermediate step. ChIP-seq will generally have two intermediate steps consisting of read mapping and peak calling.
+
+    *Processed Data*. The Processed Data IU contains the final processing step, which results in a data file containing the final signal values for the omics assay.
+
+    *Analysis*. The STATegra EMS includes an additional IU, the Analysis, which is constructed by connecting some of the previous data IUs to define a data processing workflow. Figure 2 shows a generic representation of the workflow elements used in sequencing data analyses. An Analysis starts from a raw data file obtained from a particular analytical sample, continues through one or several intermediate data files covering different processing steps (such as trimming, mapping, filtering, merging, etc.), and ends in a processed data file that contains the signal values of the omics features. Alternatively, an Analysis can take a processed data file as input and apply additional processing steps to produce higher-level processed data. For example, in DNase-seq analysis, a primary workflow would be to call DNase hypersensitivity regions (DHR) by applying a peak-calling algorithm to a BAM file of mapped reads (Figure 3a), whereas a secondary Analysis could involve merging DHR.bed files from N different samples to obtain a set of consolidated regions and then counting the number of reads of each sample in the consolidated region set to generate a per-sample signal value file (Figure 3b).
+
+    In terms of data consistency, a unique Analysis ID is always associated with one Processed Data ID and describes the set of steps involved in obtaining that particular processed data. Moreover, an Analysis is always associated with one or more Experiments and, since the Analysis workflow can be traced back to raw data and its associated analytical samples, the Analysis provides the link between the Experiment and the Sample modules. By default, when a new Analysis is created, it is assigned to the currently active Experiment. Figure 5 shows the data input window of the Analysis module. The central panel displays the input form for the different analysis steps, while a graphical representation of the workflow at the bottom makes it easy to monitor the elements and structure of the Analysis.
+
+<div class="imageContainer" style="text-align:center; font-size:10px; color:#898989" >
+    <img src="img/2_data-structure_2.jpg" title="Figure 2. STATegra EMS analysis workflow components."/>
+    <p class="imageLegend">Figure 2. STATegra EMS analysis workflow components. The workflow is linked to an analytical sample object and consists of raw, intermediate and processed data IUs.</p>
+</div>
+
+<div class="imageContainer" style="text-align:center; font-size:10px; color:#898989" >
+    <img src="img/2_data-structure_3.jpg" title="Figure 3. Example of primary and secondary workflow for a DNase-seq analysis."/>
+    <p class="imageLegend">Figure 3. Example of primary and secondary workflow for a DNase-seq analysis. Primary workflow (a) involves calling DNase hypersensitivity regions (DHR) by applying a peak-calling algorithm to a BAM file of mapped reads whereas secondary workflow (b) involves merging of DHR.bed files from different samples to obtain a set of consolidated regions and then counting the number of reads of each sample in the consolidated region set to generate a per-sample signal value file.</p>
+</div>