Introduction

CWig is a format and toolkit for storing and analysing genome-wide density signal data. CWig files are compact and support fast access operations. It was developed as an alternative to the bigWig format from UCSC. The project aims to provide flexible and convenient tools to support visualization and analysis.

Installation

Downloads

Binary executables of the CWig tools are available for Windows (win32, win64) and Linux (i686, x86_64). CWig is open-source software.

Source code is available at Bitbucket. Other binary builds are provided upon request. (Please e-mail us for the release candidate code.)

Requirements

CWig can be compiled and used on Windows, Linux and macOS. CWig's core features require the following software and libraries: a C++11 compiler (e.g. gcc 4.6+), the Boost library, and CMake. The Python API requires SWIG and the Python library. The Java API requires SWIG and the Java JNI library. The bigWig2cwig tool requires the UCSC jksrc library. Remote access over HTTPS requires the OpenSSL library.

Build

Download and extract the source to a directory. From that directory, create a build directory, use cmake to generate a Makefile, and compile the package. For example:
mkdir build
cd build
cmake ..
make

If the build succeeds, the executables are produced in ./bin and the libraries are placed in ./lib.

Documentation

Command-line tools

CWig provides the following command-line tools:

  • bedgraph2cwig, bigWig2cwig, cwig2bedgraph: convert cwig files to and from other formats.
  • cwigSummary, cwigSummaryBatch: query a cwig file.
  • cwigInfo: check a cwig file and print basic information about it.

Create cwig file from bedgraph file

# bedgraph2cwig (input_bedGraph_file) (output_cwig_file)
$ bedgraph2cwig data.bedGraph data.cwig
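
For readers who generate the input themselves, the following Python sketch writes bedGraph text in the four-column layout (chromosome, start, end, value) that also appears in the lst output of cwigSummary further down. The function name format_bedgraph, the sample intervals and the output file name are illustrative assumptions, not part of CWig. Whether bedgraph2cwig needs sorted input or accepts track header lines is not stated in this document.

```python
def format_bedgraph(intervals):
    """Render (chrom, start, end, value) tuples as tab-separated bedGraph lines."""
    lines = []
    for chrom, start, end, value in intervals:
        if start >= end:
            raise ValueError(f"empty or reversed interval: {chrom}:{start}-{end}")
        # str() of a Python float is lossless, so values round-trip exactly.
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample intervals, used only to show the layout.
    sample = [
        ("chr1", 320867, 320872, 0.00079),
        ("chr1", 320872, 320877, 0.00089),
    ]
    with open("data.bedGraph", "w") as handle:
        handle.write(format_bedgraph(sample))
```

The resulting file can then be passed to bedgraph2cwig as in the command above.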

Create cwig file from bigWig file

# bigWig2cwig (input_bigWig_file) (output_cwig_file)
$ bigWig2cwig data.bigWig data.cwig

Decompress cwig file to bedGraph file

# cwig2bedGraph (input_cwig_file) (output_bedGraph_file)
$ cwig2bedGraph data.cwig data.bedGraph

Note that we do not provide a utility to convert a cwig file to a bigWig file directly. However, a bedGraph file can be converted to a bigWig file using the UCSC jksrc tools, i.e. the bedGraphToBigWig program.

Query cwig file

The cwigSummary command-line tool lets the user query regions of a cwig file. The syntax of the command is:
# cwigSummary [avg|cov|min|max|lst|map] (cwig_file)
#             (chromosome_name) (start_position) (end_position)
#             [window_count=1]
The input file name is given by the cwig_file parameter. Note that cwig_file can be located on a web server, e.g. http://yourdomain.com/path/data.cwig. The query regions are specified by four parameters: chromosome_name, start_position, end_position, and window_count. The utility divides the range from start_position to end_position on the chromosome into window_count regions, performs the query on each region, and returns the results. For example:
$ cwigSummary avg data.cwig chr1 100000 500000 4
0.00023571  0.000276345  0.000120334  9.97036e-05
The first number is the average of the values in the file for the region [100000, 200000) on chromosome "chr1". The second number is the average for the range [200000, 300000), etc.
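
The window arithmetic in this example can be sketched in Python as follows. This is a minimal illustration, not CWig's implementation: the name split_windows is hypothetical, and the way a remainder is spread when the range length is not a multiple of window_count is an assumption, since the behaviour of the tool in that case is not described here.

```python
def split_windows(start, end, window_count=1):
    """Split the half-open range [start, end) into window_count consecutive windows."""
    if window_count < 1 or end <= start:
        raise ValueError("need window_count >= 1 and start < end")
    length = end - start
    # Integer arithmetic keeps the boundaries exact. When the length is not a
    # multiple of window_count, window sizes differ by at most one base here
    # (an assumption about remainder handling, not documented CWig behaviour).
    bounds = [start + (length * i) // window_count for i in range(window_count + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    for window_start, window_end in split_windows(100000, 500000, 4):
        print(f"[{window_start}, {window_end})")
```

For the query above, the sketch yields the four windows [100000, 200000), [200000, 300000), [300000, 400000) and [400000, 500000), one per output value.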

The supported queries are:
  • avg query: returns the average signal value for each region.
  • cov query: returns the coverage (percentage of bases that have data) for each region.
  • min query: returns the minimum signal value for each region.
  • max query: returns the maximum signal value for each region.
  • lst query: returns a list of the bedGraph intervals that intersect the region from start_position to end_position. (The window_count value is ignored.)
  • map query: returns the signal value of each base inside the region. (The window_count value is ignored.)

More examples:
avg query example:
$ cwigSummary avg data.cwig chr1 100000 500000 4
0.0174  0.0784  0.02679  0.01059
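
When the tool is driven from a script, the call and its output can be handled along these lines. This is a sketch under assumptions: the helper names are hypothetical, the argument order is copied from the syntax line above, and it assumes that cwigSummary is on PATH and prints only the whitespace-separated window values to standard output, as in the example.

```python
import subprocess


def build_summary_command(query, cwig_file, chrom, start, end, window_count=1):
    """Build the argument list in the order given by the cwigSummary syntax line."""
    return ["cwigSummary", query, cwig_file, chrom, str(start), str(end), str(window_count)]


def parse_window_values(output):
    """Parse whitespace-separated numbers, one per window, into floats."""
    return [float(token) for token in output.split()]


def run_avg(cwig_file, chrom, start, end, window_count=1):
    """Run an avg query; requires the cwigSummary executable to be on PATH."""
    cmd = build_summary_command("avg", cwig_file, chrom, start, end, window_count)
    # Assumption: the values are written to stdout and nothing else is printed.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_window_values(result.stdout)
```

Calling run_avg("data.cwig", "chr1", 100000, 500000, 4) would then return a list of four floats, one per window.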

lst query example:

$ cwigSummary lst data.cwig chr1 300000 330000
chr1	258729	320867	0
chr1	320867	320872	0.00079
chr1	320872	320877	0.00089
chr1	320877	320883	0.001
             ...
chr1	324138	324139	0.00089
chr1	324139	333869	0

map query example:

$ cwigSummary map data.cwig chr1 324125 324135
0.0012 0.0012 0.0012 0.0012 0.00109 0.00109 0.00109 0.00109 0.001 0.001
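
The difference between the lst and map queries can be modelled on plain bedGraph intervals. The Python sketch below is not how CWig stores or queries data; it only mirrors the observable behaviour of the two examples above. The function names and sample intervals are hypothetical, intervals are taken to be half-open and on a single chromosome, and returning NaN for bases without data is an assumption.

```python
import math


def lst_query(intervals, start, end):
    """Return the intervals that overlap [start, end), unclipped, as in the lst example."""
    return [(s, e, v) for s, e, v in intervals if s < end and e > start]


def map_query(intervals, start, end):
    """Return one value per base in [start, end); NaN marks bases without data (assumption)."""
    values = [math.nan] * (end - start)
    for s, e, v in lst_query(intervals, start, end):
        # Clip each overlapping interval to the query range before filling.
        for pos in range(max(s, start), min(e, end)):
            values[pos - start] = v
    return values


if __name__ == "__main__":
    # Hypothetical (start, end, value) intervals on one chromosome.
    intervals = [(0, 5, 0.0), (5, 8, 1.5), (8, 20, 0.25)]
    print(lst_query(intervals, 6, 10))
    print(map_query(intervals, 6, 10))
```

As in the lst example, the returned intervals may extend beyond the query range, whereas map always returns exactly end_position - start_position values.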

Batch query cwig file

cwigSummaryBatch is similar to cwigSummary, but the input regions are read from a query file (one region per line). The syntax is:
# cwigSummaryBatch [avg|cov|min|max|lst|map] (cwig_file) (query_file) [window_count=1]

For example, the query_file may look like:

chr1 100 200
chr2 200 300
chrX 300 400

It is recommended to use cwigSummaryBatch rather than calling cwigSummary multiple times, especially when the cwig file is not on the same computer, because this avoids repeating the initial loading time.
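
A query file in the layout shown above can be produced from a list of regions with a few lines of Python. This is a minimal sketch: the helper names and the file name regions.txt are hypothetical, the single-space separator is copied from the example, and the argument order follows the cwigSummaryBatch syntax line.

```python
def format_query_file(regions):
    """Render (chrom, start, end) tuples as one space-separated region per line."""
    return "".join(f"{chrom} {start} {end}\n" for chrom, start, end in regions)


def build_batch_command(query, cwig_file, query_file, window_count=1):
    """Build the argument list in the order given by the cwigSummaryBatch syntax line."""
    return ["cwigSummaryBatch", query, cwig_file, query_file, str(window_count)]


if __name__ == "__main__":
    regions = [("chr1", 100, 200), ("chr2", 200, 300), ("chrX", 300, 400)]
    with open("regions.txt", "w") as handle:
        handle.write(format_query_file(regions))
    # Print the command instead of running it, since the tool may not be installed.
    print(" ".join(build_batch_command("avg", "data.cwig", "regions.txt")))
```

All regions are then answered by a single cwigSummaryBatch call, so the cwig file is loaded only once.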

Print basic information of cwig file

This command loads a cwig file and checks its integrity. It also prints basic information about the file.
# cwigInfo (cwig_file)

API

All the functions of the command-line tools can be used directly through the API. We provide a native C++ API. Python and Java APIs are available through SWIG wrappers. To use the API, include the cwig sub-folder and use the classes in the cwig.h and chrfmt.h headers. Detailed class documentation will be available soon.

Experiments

The dataset used in the paper can be downloaded from the UCSC ENCODE database. The bigWig files can be converted to cwig files using the conversion tools. To measure the query time of cwig, please use the cwigSummaryBatch tool. To query bigWig files in batch, you can use our modification of bigWigSummaryBatch.

Information

License

CWig is released under the LGPL License.
Copyright (c) 2012-2014 Do Huy Hoang and Sung Wing-Kin

Citation

Do, H.H. and Sung, W.K. CWig: Compressed representation of Wiggle/BedGraph format. Bioinformatics, 2014.

Acknowledgements

 

Contacts

Hoang or ksung