liyao001 committed May 22, 2017
1 parent 166f725 commit 7fae9d5
source/cluster.rst
==================
2. Run BioQueue web server.
3. Login to BioQueue and open ``Settings``.
4. Click ``Cluster Settings`` on the page and fill in the form. By default, the value of ``Cluster engine`` is ``Run on local / cloud`` and all cluster options are disabled. Once you choose a cluster engine (for example, TorquePBS), BioQueue's cluster mode will be activated. To turn it off, change the cluster engine back to ``Run on local / cloud``.
5. Click ``Save changes`` to save your changes.
6. *Start the queue*.

In the ``Cluster Settings`` section, we provide several options. Here is a more
detailed explanation of each.

+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Option |Default |Description |
+========================+==============+==============================================================================================================================+
|CPU cores for single job|1 |Specify the number of virtual processors (a physical core on the node or an "execution slot") per node requested for this job.|
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Physical memory         |No limit      |Maximum amount of physical memory used by any single process of the job.                                                     |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Virtual memory |No limit |Maximum amount of virtual memory used by all concurrent processes in the job. |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Destination |Default server|Defines the destination of the job. The destination names a queue, a server, or a queue at a server. |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+
|Wall-time |No limit |Maximum amount of real time during which the job can be in the running state. |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------+

For example, when BioQueue submits a job to a cluster managed by TorquePBS, the options defined above are translated into Torque parameters like this:

1. -l ppn: the number of CPU cores BioQueue predicts the job will take; if the prediction model has not been generated, the ppn option equals ``CPU cores for single job``.
2. -l mem: the physical memory BioQueue predicts the job will use; if the prediction model has not been generated, the mem option equals ``Physical memory``.
3. -l vmem: the virtual memory BioQueue predicts the job will use; if the prediction model has not been generated, the vmem option equals ``Virtual memory``.
4. -q: the destination defined in ``Destination``. For example, if the cluster has five queues (high, middle, low, FAT_HIGH and BATCH), the destination should be one of them.
5. -l walltime: the maximum amount of real time during which the job can be in the running state, as defined in ``Wall-time``. For example, 24:00:00.
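As an illustration of this translation (a sketch only — the helper function and the settings dict below are hypothetical, while the ``ppn``/``mem``/``vmem``/``walltime`` option names come from Torque), the settings could be turned into a ``qsub`` argument list like this:

```python
def build_qsub_args(settings):
    """Translate BioQueue-style cluster settings into Torque qsub options.

    `settings` is a hypothetical dict mirroring the options above; missing
    values mean "no limit" and are simply omitted from the command line.
    """
    args = ['qsub']
    args += ['-l', 'nodes=1:ppn=%d' % settings.get('cpu', 1)]
    if settings.get('mem'):
        args += ['-l', 'mem=%s' % settings['mem']]
    if settings.get('vmem'):
        args += ['-l', 'vmem=%s' % settings['vmem']]
    if settings.get('queue'):
        args += ['-q', settings['queue']]
    if settings.get('walltime'):
        args += ['-l', 'walltime=%s' % settings['walltime']]
    return args

print(' '.join(build_qsub_args(
    {'cpu': 4, 'mem': '8G', 'queue': 'BATCH', 'walltime': '24:00:00'})))
```

Options left at "No limit" simply never reach the scheduler, which is why only ``CPU cores for single job`` has a hard default of 1.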

How to develop new cluster plugins for BioQueue
-----------------------------------------------

To submit a job, a cluster plugin should implement a ``submit_job`` function, which submits the job to a DRM. The prototype of the function is::

submit_job(protocol, job_id, job_step, cpu=0, mem='', vrt_mem='', queue='', log_file='', wall_time='', workspace='')

:param protocol: string, the command the job needs to run, like "wget http://www.a.com/b.txt"
:param job_id: int, the job's id in BioQueue, like 1
:param job_step: int, the step's order in the protocol, like 0
:param cpu: int, the number of CPU cores the job will use
:param mem: string, allocated physical memory, e.g. 64G
:param vrt_mem: string, allocated virtual memory, e.g. 64G
:param queue: string, the job queue
:param log_file: string, path to the log file
:param wall_time: string, the wall-time limit, e.g. 24:00:00
:param workspace: string, the initial directory of the job; all output files should be stored in this folder, or users will not be able to see them
:return: int, the job's id in the cluster on success, otherwise 0

*Note: BioQueue will assign '' to both mem and vrt_mem if the user doesn't
define the max amount of physical memory or virtual memory a job can use and
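For TorquePBS, a minimal ``submit_job`` skeleton might look like the following. This is a sketch under assumptions: only the function signature comes from the text above; the ``build_qsub_command`` helper, the job-naming scheme, and feeding the command to ``qsub`` via stdin are illustrative, not BioQueue's actual code.

```python
import subprocess

def build_qsub_command(job_id, job_step, cpu, mem, vrt_mem,
                       queue, log_file, wall_time, workspace):
    """Assemble the qsub invocation for one protocol step (pure, testable)."""
    cmd = ['qsub', '-N', 'bioqueue_%s_%s' % (job_id, job_step)]
    if workspace:
        cmd += ['-d', workspace]              # initial working directory
    if cpu:
        cmd += ['-l', 'nodes=1:ppn=%d' % cpu]
    if mem:
        cmd += ['-l', 'mem=%s' % mem]
    if vrt_mem:
        cmd += ['-l', 'vmem=%s' % vrt_mem]
    if queue:
        cmd += ['-q', queue]
    if wall_time:
        cmd += ['-l', 'walltime=%s' % wall_time]
    if log_file:
        cmd += ['-o', log_file, '-j', 'oe']   # merge stderr into the log
    return cmd

def submit_job(protocol, job_id, job_step, cpu=0, mem='', vrt_mem='',
               queue='', log_file='', wall_time='', workspace=''):
    """Submit `protocol` (a shell command) to Torque.

    Returns the numeric job id reported by qsub on success, else 0.
    """
    cmd = build_qsub_command(job_id, job_step, cpu, mem, vrt_mem,
                             queue, log_file, wall_time, workspace)
    try:
        proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _ = proc.communicate(protocol.encode())
        if proc.returncode != 0:
            return 0
        # qsub prints an id like "12345.headnode"; keep the number
        return int(out.decode().strip().split('.')[0])
    except (OSError, ValueError):
        return 0
```

Keeping command assembly separate from submission makes the plugin testable on a machine that has no scheduler installed.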
A plugin should also implement a ``query_job_status`` function, so that BioQueue can ask the DRM about a job's status. The prototype of the function is::

query_job_status(job_id)

:param job_id: int, job id in the cluster
:return: int, job status

If the job has completed, the function should return 0. If the job is running,
it should return 1. If the job is queuing, it should return 2.

Finally, the plugin should implement a ``cancel_job`` function, so that users can cancel a job. The prototype of the function is::

cancel_job(job_id)

:param job_id: int, the job's id in the cluster
:return: int, 1 on success, otherwise 0
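Continuing the hypothetical TorquePBS sketch, ``query_job_status`` and ``cancel_job`` can be thin wrappers around ``qstat`` and ``qdel``. The state letters (``C``, ``R``, ``Q``, …) are Torque's ``qstat`` codes; the parsing helper and the choice to report unknown states as completed are assumptions of this sketch:

```python
import subprocess

# Torque qstat state letters mapped to BioQueue status codes
_STATE_MAP = {
    'C': 0, 'E': 0,          # completed / exiting
    'R': 1,                  # running
    'Q': 2, 'H': 2, 'W': 2,  # queued, held, waiting
}

def status_from_qstat_line(line):
    """Parse one qstat job line, e.g.
    '12345.node  step.sh  alice  00:00:01  R  batch'
    (the state letter is the second-to-last column)."""
    return _STATE_MAP.get(line.split()[-2], 0)

def query_job_status(job_id):
    try:
        out = subprocess.check_output(['qstat', str(job_id)])
        return status_from_qstat_line(out.decode().strip().splitlines()[-1])
    except (OSError, subprocess.CalledProcessError):
        # qstat also fails for jobs that have left the queue;
        # this sketch simply reports them as completed
        return 0

def cancel_job(job_id):
    try:
        subprocess.check_call(['qdel', str(job_id)])
        return 1
    except (OSError, subprocess.CalledProcessError):
        return 0
```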

3. Share the plugin with everyone
+++++++++++++++++++++++++++++++++
source/faq.rst
==============

And users can use an FTP client such as `FileZilla <https://filezilla-project.org/>`_.

How to update BioQueue
----------------------
We bring new features and bug fixes with each new release, so we recommend keeping your instance as up to date as possible. If you have a BioQueue repository and want to update it, there are several ways to do so.

1. Run update.py in worker folder
+++++++++++++++++++++++++++++++++
We provide a Python script named ``update.py`` in the ``worker`` folder, which will check for updates to both BioQueue's source code and its dependent packages::

python worker/update.py

Also, for Linux/Unix users, the BioQueue update service can run in the background by running ``update_daemon.py`` instead of ``update.py``::

python worker/update_daemon.py start

This service will check for updates every day.

2. Click Update button in ``Settings`` page
+++++++++++++++++++++++++++++++++++++++++++
We also provide an update button on the ``Settings`` page; when you click it, BioQueue will call ``update.py`` to update your instance.

3. git pull
+++++++++++
You can also use the ``git pull`` command to update BioQueue's source code, but this command won't update the dependent packages!

4. NOTE
+++++++
The update service relies on git, so please make sure that you have installed git and that you cloned BioQueue from GitHub.

Use BioQueue with Apache in Production Environment
--------------------------------------------------
Note: For virtualenv users, please replace ``/usr/lib/python2.7/dist-packages`` with the corresponding ``site-packages`` path of your virtual environment.

Cannot install MySQL-python?
----------------------------
By default, BioQueue uses a Python package called MySQL-python to connect to the MySQL server. However, it may be hard to install, especially for non-root users. The alternative solution is to use PyMySQL (a pure-Python MySQL client). We provide a Python script in BioQueue's ``deploy`` folder to help you complete the switch, so for most users the following command should be enough to solve the problem::

    python deploy/switch_from_MySQLdb_to_PyMySQL.py

However, if you want to do it yourself, here is the protocol:

1. Remove ``MySQL-python==1.2.5`` from ``prerequisites.txt`` in the ``deploy`` folder.
2. Copy the Python code below and paste it at the beginning of ``manage.py`` and ``worker >> __init__.py``.
3. Rerun ``install.py``.

Code::

    try:
        import pymysql
        pymysql.install_as_MySQLdb()
    except ImportError:
        pass

Turn on E-mail Notification
---------------------------
source/getstarted.rst
=====================

BioQueue can store data in SQLite, which means users can set up BioQueue without a MySQL server.

Since BioQueue is written in Python 2.7, please make sure that you have installed Python and pip. The following instructions are for Ubuntu 14.04, but can be used as guidelines for other Linux flavors::

sudo apt-get install build-essential
sudo apt-get install python-dev
sudo apt-get install python-pip

First of all, clone the project from GitHub (or you can download BioQueue by opening the following link in a browser)::

    git clone https://github.com/liyao001/BioQueue.git
    Or
    wget https://github.com/liyao001/BioQueue/zipball/master

**NOTE: Downloading an archive rather than using git makes it more difficult to stay up to date with the BioQueue code, because there is no simple way to update the copy.**

Then navigate to the project's directory and run the ``install.py`` script (all dependent Python packages will be installed automatically)::

cd BioQueue
source/protocol.rst
===================

To create a new protocol, you can either click the ``Create Protocol`` button at the

In BioQueue, you need to enter the software name ("hisat2") into the ``Software`` textbox, and then enter "-x ucsc_hg19.fastq -1 reads_1.fastq -2 reads_2.fastq -S alns.sam -t 16" into the ``Parameter`` textbox.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26297208/7698b000-3f04-11e7-9636-66ccb820449f.png

Actually, a typical protocol contains many steps, so you can click the ``Add Step`` button to add more steps. After you have added all steps, click the ``Create Protocol`` button.

And if you want to analyze one more sample (for example Samples), you just need

Otherwise, you will have to create a new protocol.


Mapping Files Between Steps
---------------------------
In most cases, a step describes how to create output files from input files. Since a protocol usually consists of many steps, making the mapping of files flexible enough is a very important issue, so BioQueue provides three file-mapping methods.

The first method is to write the file name directly. For some tools, the output files have standard names. One example is `STAR <https://github.com/alexdobin/STAR>`_: when it finishes mapping RNA-seq reads, it produces these files:
The third method is to use the **“Output”** family wildcards.

More About Wildcards
--------------------
There are two main types of wildcards in BioQueue: pre-defined wildcards and user-defined wildcards (experimental variables and reference). Below is a table of pre-defined wildcards:

+---------------+------------------------------------------------------------------------------+
|ThreadN |The number of CPUs in the running node. |
+---------------+------------------------------------------------------------------------------+


Now, let’s have a look at the user-defined wildcards. As mentioned before, BioQueue suggests using wildcards to denote experimental variables in protocols, such as sample names. This type of user-defined wildcard needs to be assigned as a ``Job parameter`` when creating jobs. In bioinformatics, some data can be cited in different protocols, such as the reference genome or the GENCODE annotation. So, in BioQueue, biological data that may be used in multiple protocols is called a “reference”. This is the other type of user-defined wildcard, and it is defined on the ``Reference`` page.

.. image:: https://cloud.githubusercontent.com/assets/17058337/21838125/37c77d4e-d80b-11e6-8a3f-795ec896a824.png
The usage of references can greatly reduce the redundancy of protocols.

Note: Don't forget to add braces when you use a reference in any of your protocols, like ``{HG38}``!
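To make the wildcard mechanics concrete, here is a minimal sketch of how brace substitution could work (illustrative only — this is not BioQueue's actual implementation; the wildcard names are taken from the examples above):

```python
import re

def render_step(parameter, context):
    """Replace {Wildcard} tokens in a step's parameter string.

    Unknown wildcards are left untouched, so a missing reference
    or job parameter is easy to spot in the rendered command.
    """
    def replace(match):
        return str(context.get(match.group(1), match.group(0)))
    return re.sub(r'\{([A-Za-z]\w*(?::\d+)?)\}', replace, parameter)

rendered = render_step(
    '-x {HG38} -1 {InputFile:1} -2 {InputFile:2} -S alns.sam -t {ThreadN}',
    {'HG38': '/data/ref/hg38', 'InputFile:1': 'reads_1.fastq',
     'InputFile:2': 'reads_2.fastq', 'ThreadN': 16})
print(rendered)
```

In a real protocol, the values behind ``{InputFile:N}`` would come from the previous step's outputs and ``{ThreadN}`` from the running node, as described in the table above.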

Create a Protocol with Ease
---------------------------
To help biologists create protocols with ease, we provide auxiliary functions that cover the entire process.

1. Knowledge Base
+++++++++++++++++
We have set up a knowledge base on our open platform, so when users need usage information about a certain piece of software, they can click the ``How to use the software?`` button.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26296755/ac4335c4-3f02-11e7-96fd-459005631ec2.gif

2. Autocomplete
+++++++++++++++
We provide an autocomplete widget that suggests pre-defined wildcards and user-defined references. Here is a demo:

.. image:: https://cloud.githubusercontent.com/assets/17058337/26296868/262db83c-3f03-11e7-80e1-b421e2180dc0.gif

In the demo, {HISAT2_HG38} is a user-defined reference, which refers to the path of the hg38 indexes for HISAT2, while {InputFile:1}, {InputFile:2} and {ThreadN} are pre-defined wildcards.


Edit Steps
----------
When you need to change the parameters of a certain step, click ``Edit Protocol`` in the sidebar, move the mouse to the ``Operation`` column of the protocol's row, and click the ``Edit Protocol`` label.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26282377/2b41de5e-3e43-11e7-8dd2-d185217d9fba.gif

When the steps table shows up, click a step's parameter to edit it. Once you click anywhere else on the page, your changes will be saved automatically.



Share Protocol With Peer
------------------------
We know the importance of making computational analysis in life sciences:
So, protocols written in BioQueue can be shared with a peer who is using the same BioQueue instance.

To share a protocol with a peer, you need to open the ``Edit protocol`` page, and choose ``Share`` in the ``Operation`` column.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26297301/e41b2ff4-3f04-11e7-94d2-bc4a1175c7e6.gif

Then enter the username of the peer you want to share with, and click ``Share with a peer``.

.. image:: https://cloud.githubusercontent.com/assets/17058337/26297266/afea3a7c-3f04-11e7-863a-95eea9afaba8.png

To share a protocol with the public, open the same dialog and click the ``Build a sharable protocol`` button; a protocol file will then be generated. You can publish this protocol on the `BioQueue Open Platform <http://open.bioqueue.org>`_ or any other web forum.
