liyao001 committed May 22, 2017
1 parent 166f725 commit 7fae9d5
source/cluster.rst
==================
2. Run BioQueue web server.
3. Login to BioQueue and open ``Settings``.
4. Click ``Cluster Settings`` on the page and fill in the form. By default, the value of ``Cluster engine`` is ``Run on local / cloud`` and all cluster options are disabled. Once you choose a cluster engine (for example, TorquePBS), BioQueue's cluster mode will be activated. To turn it off, change the cluster engine back to ``Run on local / cloud``.
5. Click ``Save changes`` to save your changes.
6. *Start the queue*.

In the ``Cluster Settings`` section, we provide several options. Here is a more
detailed explanation of each.

+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Option |Default |Description |
+========================+==============+==============================================================================================================================+
|CPU cores for single job|1 |Specify the number of virtual processors (a physical core on the node or an "execution slot") per node requested for this job.|
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Physical memory         |No limit      |Maximum amount of physical memory used by any single process of the job.                                                     |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Virtual memory |No limit |Maximum amount of virtual memory used by all concurrent processes in the job. |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Destination |Default server|Defines the destination of the job. The destination names a queue, a server, or a queue at a server. |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Wall-time |No limit |Maximum amount of real time during which the job can be in the running state. |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+

For example, when BioQueue submits a job to a cluster managed by TorquePBS, the options defined above are translated into Torque parameters like this:

1. -l ppn: the number of CPU cores BioQueue predicts the job will take; if the prediction model has not been generated, the ppn option equals ``CPU cores for single job``.
2. -l mem: the physical memory BioQueue predicts the job will use; if the prediction model has not been generated, the mem option equals ``Physical memory``.
3. -l vmem: the virtual memory BioQueue predicts the job will use; if the prediction model has not been generated, the vmem option equals ``Virtual memory``.
4. -q: the destination defined in ``Destination``. For example, if the cluster has five queues (high, middle, low, FAT_HIGH and BATCH), the destination should be one of them.
5. -l walltime: the maximum amount of real time during which the job can be in the running state, as defined in ``Wall-time``. For example, 24:00:00.
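As an illustration of this translation (a sketch only — the helper function and the settings dict below are hypothetical, while the ``ppn``/``mem``/``vmem``/``walltime`` option names come from Torque), the settings could be turned into a ``qsub`` argument list like this:

```python
def build_qsub_args(settings):
    """Translate BioQueue-style cluster settings into Torque qsub options.

    `settings` is a hypothetical dict mirroring the options above; missing
    values mean "no limit" and are simply omitted from the command line.
    """
    args = ['qsub']
    args += ['-l', 'nodes=1:ppn=%d' % settings.get('cpu', 1)]
    if settings.get('mem'):
        args += ['-l', 'mem=%s' % settings['mem']]
    if settings.get('vmem'):
        args += ['-l', 'vmem=%s' % settings['vmem']]
    if settings.get('queue'):
        args += ['-q', settings['queue']]
    if settings.get('walltime'):
        args += ['-l', 'walltime=%s' % settings['walltime']]
    return args

print(' '.join(build_qsub_args(
    {'cpu': 4, 'mem': '8G', 'queue': 'BATCH', 'walltime': '24:00:00'})))
```

Options left at "No limit" simply never reach the scheduler, which is why only ``CPU cores for single job`` has a hard default of 1.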

How to develop new cluster plugins for BioQueue
-----------------------------------------------

To submit a job, a cluster plugin should implement a ``submit_job`` function, which submits the job to a DRM. The prototype of the function is::

submit_job(protocol, job_id, job_step, cpu=0, mem='', vrt_mem='', queue='', log_file='', wall_time='', workspace='')

:param protocol: string, the command the job needs to run, like "wget http://www.a.com/b.txt"
:param job_id: int, the job's id in BioQueue, like 1
:param job_step: int, the step's order in the protocol, like 0
:param cpu: int, the number of CPU cores the job will use
:param mem: string, allocated physical memory, e.g. 64G
:param vrt_mem: string, allocated virtual memory, e.g. 64G
:param queue: string, the job queue
:param log_file: string, path to the log file
:param wall_time: string, the wall-time limit, e.g. 24:00:00
:param workspace: string, the initial directory of the job; all output files should be stored in this folder, or users will not be able to see them
:return: int, the job's id in the cluster on success, otherwise 0

*Note: BioQueue will assign '' to both mem and vrt_mem if the user doesn't
define the max amount of physical memory or virtual memory a job can use and
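For TorquePBS, a minimal ``submit_job`` skeleton might look like the following. This is a sketch under assumptions: only the function signature comes from the text above; the ``build_qsub_command`` helper, the job-naming scheme, and feeding the command to ``qsub`` via stdin are illustrative, not BioQueue's actual code.

```python
import subprocess

def build_qsub_command(job_id, job_step, cpu, mem, vrt_mem,
                       queue, log_file, wall_time, workspace):
    """Assemble the qsub invocation for one protocol step (pure, testable)."""
    cmd = ['qsub', '-N', 'bioqueue_%s_%s' % (job_id, job_step)]
    if workspace:
        cmd += ['-d', workspace]              # initial working directory
    if cpu:
        cmd += ['-l', 'nodes=1:ppn=%d' % cpu]
    if mem:
        cmd += ['-l', 'mem=%s' % mem]
    if vrt_mem:
        cmd += ['-l', 'vmem=%s' % vrt_mem]
    if queue:
        cmd += ['-q', queue]
    if wall_time:
        cmd += ['-l', 'walltime=%s' % wall_time]
    if log_file:
        cmd += ['-o', log_file, '-j', 'oe']   # merge stderr into the log
    return cmd

def submit_job(protocol, job_id, job_step, cpu=0, mem='', vrt_mem='',
               queue='', log_file='', wall_time='', workspace=''):
    """Submit `protocol` (a shell command) to Torque.

    Returns the numeric job id reported by qsub on success, else 0.
    """
    cmd = build_qsub_command(job_id, job_step, cpu, mem, vrt_mem,
                             queue, log_file, wall_time, workspace)
    try:
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _ = proc.communicate(protocol.encode())
        if proc.returncode != 0:
            return 0
        # qsub prints an id like "12345.headnode"; keep the number
        return int(out.decode().strip().split('.')[0])
    except (OSError, ValueError):
        return 0
```

Keeping command assembly separate from submission makes the plugin testable on a machine that has no scheduler installed.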
A plugin should also implement a ``query_job_status`` function, so that BioQueue can ask the DRM about a job's status. The prototype of the function is::

query_job_status(job_id)

:param job_id: int, job id in the cluster
:return: int, job status

If the job has completed, the function should return 0. If the job is running,
it should return 1. If the job is queuing, it should return 2.

Finally, the plugin should implement a ``cancel_job`` function, so that users can cancel a job. The prototype of the function is::

cancel_job(job_id)

:param job_id: int, the job's id in the cluster
:return: int, 1 on success, otherwise 0
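Continuing the hypothetical TorquePBS sketch, ``query_job_status`` and ``cancel_job`` can be thin wrappers around ``qstat`` and ``qdel``. The state letters (``C``, ``R``, ``Q``, …) are Torque's ``qstat`` codes; the parsing helper and the choice to report unknown states as completed are assumptions of this sketch:

```python
import subprocess

# Torque qstat state letters mapped to BioQueue status codes
_STATE_MAP = {
    'C': 0, 'E': 0,          # completed / exiting
    'R': 1,                  # running
    'Q': 2, 'H': 2, 'W': 2,  # queued, held, waiting
}

def status_from_qstat_line(line):
    """Parse one qstat job line, e.g.
    '12345.node  step.sh  alice  00:00:01  R  batch'
    (the state letter is the second-to-last column)."""
    return _STATE_MAP.get(line.split()[-2], 0)

def query_job_status(job_id):
    try:
        out = subprocess.check_output(['qstat', str(job_id)])
        return status_from_qstat_line(out.decode().strip().splitlines()[-1])
    except (OSError, subprocess.CalledProcessError):
        # qstat also fails for jobs that have left the queue;
        # this sketch simply reports them as completed
        return 0

def cancel_job(job_id):
    try:
        subprocess.check_call(['qdel', str(job_id)])
        return 1
    except (OSError, subprocess.CalledProcessError):
        return 0
```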

3. Share the plugin with everyone
+++++++++++++++++++++++++++++++++
source/faq.rst
==============

And users can use an FTP client such as `FileZilla <https://filezilla-project.org/>`_.

How to update BioQueue
----------------------
We bring new features and bug fixes with each new release, so we recommend keeping your instance as up to date as possible. If you have a BioQueue repository and want to update it, there are several ways to do so.

1. Run update.py in worker folder
+++++++++++++++++++++++++++++++++
We provide a Python script named ``update.py`` in the ``worker`` folder, which will check for updates to both BioQueue's source code and its dependent packages::

python worker/update.py

Also, for Linux/Unix users, the BioQueue update service can run in the background by running ``update_daemon.py`` instead of ``update.py``::

python worker/update_daemon.py start

This service will check for updates every day.

2. Click Update button in ``Settings`` page
+++++++++++++++++++++++++++++++++++++++++++
We also provide an update button on the ``Settings`` page; when you click it, BioQueue will call ``update.py`` to update your instance.

3. git pull
+++++++++++
You can also use the ``git pull`` command to update BioQueue's source code, but this command won't update the dependent packages!

4. NOTE
+++++++
The update service relies on git, so please make sure that you have installed git and that you cloned BioQueue from GitHub.

Use BioQueue with Apache in Production Environment
--------------------------------------------------
Note: For virtualenv users, please replace ``/usr/lib/python2.7/dist-packages`` with the corresponding ``site-packages`` path of your virtual environment.

Cannot install MySQL-python?
----------------------------
By default, BioQueue uses a Python package called MySQL-python to connect to the MySQL server. However, it may be hard to install, especially for non-root users. The alternative solution is to use PyMySQL (a pure-Python MySQL client). We provide a Python script in BioQueue's ``deploy`` folder to help you complete the switch, so for most users the following command should be enough to solve the problem::

    python deploy/switch_from_MySQLdb_to_PyMySQL.py

However, if you want to do it yourself, here is the protocol:

1. Remove ``MySQL-python==1.2.5`` from ``prerequisites.txt`` in the ``deploy`` folder.
2. Copy the Python code below and paste it at the beginning of ``manage.py`` and ``worker >> __init__.py``.
3. Rerun ``install.py``.

Code::

    try:
        import pymysql
        pymysql.install_as_MySQLdb()
    except ImportError:
        pass

Turn on E-mail Notification
---------------------------
source/getstarted.rst
=====================

BioQueue can store data in SQLite, which means users can set up BioQueue without a MySQL server.

Since BioQueue is written in Python 2.7, please make sure that you have installed Python and pip. The following instructions are for Ubuntu 14.04, but can be used as guidelines for other Linux flavors::

sudo apt-get install build-essential
sudo apt-get install python-dev
sudo apt-get install python-pip

First of all, clone the project from GitHub (or you can download BioQueue by opening the following link in a browser)::

    git clone https://github.com/liyao001/BioQueue.git
    Or
    wget https://github.com/liyao001/BioQueue/zipball/master

**NOTE: Downloading an archive rather than using git makes it more difficult to stay up to date with the BioQueue code, because there is no simple way to update the copy.**

Then navigate to the project's directory and run the ``install.py`` script (all dependent Python packages will be installed automatically)::

cd BioQueue
source/protocol.rst
===================

To create a new protocol, you can either click the ``Create Protocol`` button at the

In BioQueue, you need to enter the software name ("hisat2") into the ``Software`` textbox, and then enter "-x ucsc_hg19.fastq -1 reads_1.fastq -2 reads_2.fastq -S alns.sam -t 16" into the ``Parameter`` textbox.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26297208/7698b000-3f04-11e7-9636-66ccb820449f.png

Actually, a typical protocol contains many steps, so you can click the ``Add Step`` button to add more steps. After you have added all steps, click the ``Create Protocol`` button.

And if you want to analyze one more sample (for example Samples), you just need

Otherwise, you will have to create a new protocol.


Mapping Files Between Steps
---------------------------
In most cases, a step describes how to create output files from input files. Since a protocol usually consists of many steps, making the mapping of files flexible enough is a very important issue, so BioQueue provides three file-mapping methods.

The first method is to write the file name directly. For some tools, the output files have standard names. One example is `STAR <https://github.com/alexdobin/STAR>`_: when it finishes mapping RNA-seq reads, it produces these files:
The third method is to use the **“Output”** family wildcards.

More About Wildcards
--------------------
There are two main types of wildcards in BioQueue: pre-defined wildcards and user-defined wildcards (experimental variables and reference). Below is a table of pre-defined wildcards:

+---------------+------------------------------------------------------------------------------+
|ThreadN |The number of CPUs in the running node. |
+---------------+------------------------------------------------------------------------------+


Now, let’s have a look at the user-defined wildcards. As mentioned before, BioQueue suggests using wildcards to denote experimental variables in protocols, such as sample names. This type of user-defined wildcard needs to be assigned as a ``Job parameter`` when creating jobs. In bioinformatics, some data can be cited in different protocols, such as the reference genome or the GENCODE annotation. So, in BioQueue, biological data that may be used in multiple protocols is called a “reference”. This is the other type of user-defined wildcard, and it is defined on the ``Reference`` page.

.. image:: https://cloud.githubusercontent.com/assets/17058337/21838125/37c77d4e-d80b-11e6-8a3f-795ec896a824.png
The usage of references can greatly reduce the redundancy of protocols.

Note: Don't forget to add braces when you use a reference in any of your protocols, like ``{HG38}``!
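To make the wildcard mechanics concrete, here is a minimal sketch of how brace substitution could work (illustrative only — this is not BioQueue's actual implementation; the wildcard names are taken from the examples above):

```python
import re

def render_step(parameter, context):
    """Replace {Wildcard} tokens in a step's parameter string.

    Unknown wildcards are left untouched, so a missing reference
    or job parameter is easy to spot in the rendered command.
    """
    def replace(match):
        return str(context.get(match.group(1), match.group(0)))
    return re.sub(r'\{([A-Za-z]\w*(?::\d+)?)\}', replace, parameter)

rendered = render_step(
    '-x {HG38} -1 {InputFile:1} -2 {InputFile:2} -S alns.sam -t {ThreadN}',
    {'HG38': '/data/ref/hg38', 'InputFile:1': 'reads_1.fastq',
     'InputFile:2': 'reads_2.fastq', 'ThreadN': 16})
print(rendered)
```

In a real protocol, the values behind ``{InputFile:N}`` would come from the previous step's outputs and ``{ThreadN}`` from the running node, as described in the table above.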

Create a Protocol with Ease
---------------------------
To help biologists create protocols with ease, we provide auxiliary functions that cover the entire process.

1. Knowledge Base
+++++++++++++++++
We have set up a knowledge base on our open platform, so when users need usage information about a certain piece of software, they can click the ``How to use the software?`` button.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26296755/ac4335c4-3f02-11e7-96fd-459005631ec2.gif

2. Autocomplete
+++++++++++++++
We provide an autocomplete widget that suggests pre-defined wildcards and user-defined references. Here is a demo:

.. image:: https://cloud.githubusercontent.com/assets/17058337/26296868/262db83c-3f03-11e7-80e1-b421e2180dc0.gif

In the demo, {HISAT2_HG38} is a user-defined reference, which refers to the path of the hg38 indexes for HISAT2, while {InputFile:1}, {InputFile:2} and {ThreadN} are pre-defined wildcards.


Edit Steps
----------
When you need to change the parameters of a certain step, click ``Edit Protocol`` in the sidebar, move the mouse to the ``Operation`` column of the protocol's row, and click the ``Edit Protocol`` label.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26282377/2b41de5e-3e43-11e7-8dd2-d185217d9fba.gif

When the steps table shows up, click a step's parameter to edit it. Once you click anywhere else on the page, your changes will be saved automatically.



Share Protocol With Peer
------------------------
We know the importance of making computational analysis in life sciences:
So, protocols written in BioQueue can be shared with a peer who is using the same BioQueue instance.

To share a protocol with a peer, you need to open the ``Edit protocol`` page, and choose ``Share`` in the ``Operation`` column.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26297301/e41b2ff4-3f04-11e7-94d2-bc4a1175c7e6.gif

Then enter the username of the peer you want to share with, and click ``Share with a peer``.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26297266/afea3a7c-3f04-11e7-863a-95eea9afaba8.png

To share a protocol with the public, open the same dialog and click the ``Build a sharable protocol`` button; a protocol file will then be generated. You can publish this protocol on the `BioQueue Open Platform <http://open.bioqueue.org>`_ or any other web forum.
