more on datahub and tracks
lidaof committed Oct 10, 2018
1 parent 9b0c5fa commit abafe77
Showing 2 changed files with 202 additions and 31 deletions.
146 changes: 127 additions & 19 deletions docs/datahub.rst
Expand Up @@ -77,7 +77,7 @@ Example bigWig track
"color": "blue"
}
}
Example methylC track
----------------------

Expand Down Expand Up @@ -108,27 +108,132 @@ Example categorical track
"name": "ChromHMM",
"url": "https://egg.wustl.edu/d/hg19/E017_15_coreMarks_dense.gz",
"options": {
"category": {
"1": {"name": "Active TSS", "color": "#ff0000"},
"2": {"name": "Flanking Active TSS", "color": "#ff4500"},
"3": {"name": "Transcr at gene 5' and 3'", "color": "#32cd32"},
"4": {"name": "Strong transcription", "color": "#008000"},
"5": {"name": "Weak transcription", "color": "#006400"},
"6": {"name": "Genic enhancers", "color": "#c2e105"},
"7": {"name": "Enhancers", "color": "#ffff00"},
"8": {"name": "ZNF genes & repeats", "color": "#66cdaa"},
"9": {"name": "Heterochromatin", "color": "#8a91d0"},
"10": {"name": "Bivalent/Poised TSS", "color": "#cd5c5c"},
"11": {"name": "Flanking Bivalent TSS/Enh", "color": "#e9967a"},
"12": {"name": "Bivalent Enhancer", "color": "#bdb76b"},
"13": {"name": "Repressed PolyComb", "color": "#808080"},
"14": {"name": "Weak Repressed PolyComb", "color": "#c0c0c0"},
"15": {"name": "Quiescent/Low", "color": "#ffffff"}
}
"category": {
"1": {"name": "Active TSS", "color": "#ff0000"},
"2": {"name": "Flanking Active TSS", "color": "#ff4500"},
"3": {"name": "Transcr at gene 5' and 3'", "color": "#32cd32"},
"4": {"name": "Strong transcription", "color": "#008000"},
"5": {"name": "Weak transcription", "color": "#006400"},
"6": {"name": "Genic enhancers", "color": "#c2e105"},
"7": {"name": "Enhancers", "color": "#ffff00"},
"8": {"name": "ZNF genes & repeats", "color": "#66cdaa"},
"9": {"name": "Heterochromatin", "color": "#8a91d0"},
"10": {"name": "Bivalent/Poised TSS", "color": "#cd5c5c"},
"11": {"name": "Flanking Bivalent TSS/Enh", "color": "#e9967a"},
"12": {"name": "Bivalent Enhancer", "color": "#bdb76b"},
"13": {"name": "Repressed PolyComb", "color": "#808080"},
"14": {"name": "Weak Repressed PolyComb", "color": "#c0c0c0"},
"15": {"name": "Quiescent/Low", "color": "#ffffff"}
}
}
}
Supported options: backgroundColor_, color_, color2_, yScale_, yMax_, and yMin_
Supported options: backgroundColor_, color_, color2_, yScale_, yMax_, and yMin_.

Example longrange track
-----------------------

.. code-block:: json

    {
        "type": "longrange",
        "name": "ES-E14 ChIA-PET",
        "url": "https://egg.wustl.edu/d/mm9/GSE28247_st3c.gz"
    }

Example bigInteract track
-------------------------

.. code-block:: json

    {
        "type": "biginteract",
        "name": "test bigInteract",
        "url": "https://epgg-test.wustl.edu/dli/long-range-test/interactExample3.inter.bb"
    }

Example repeatmasker track
--------------------------

.. code-block:: json

    {
        "type": "repeatmasker",
        "name": "RepeatMasker",
        "url": "https://vizhub.wustl.edu/public/mm10/rmsk16.bb"
    }

Example geneAnnotation track
----------------------------

.. code-block:: json

    {
        "type": "geneAnnotation",
        "name": "refGene",
        "genome": "mm10"
    }

.. note:: Please specify the ``genome`` attribute for gene annotation tracks.

Example bigbed track
--------------------

.. code-block:: json

    {
        "type": "bigbed",
        "name": "test bigbed",
        "url": "https://vizhub.wustl.edu/hubSample/hg19/bigBed1"
    }

Example bed track
-----------------

.. code-block:: json

    {
        "type": "bed",
        "name": "mm10 bed",
        "url": "https://epgg-test.wustl.edu/d/mm10/mm10_cpgIslands.bed.gz"
    }

Example HiC track
-----------------

.. code-block:: json

    {
        "type": "hic",
        "name": "test hic",
        "url": "https://epgg-test.wustl.edu/dli/long-range-test/test.hic",
        "options": {
            "displayMode": "arc"
        }
    }

Example genomealign track
-------------------------

.. code-block:: json

    {
        "name": "hg19 to mm10 alignment",
        "type": "genomealign",
        "metadata": {
            "genome": "mm10"
        }
    }

Example Ruler track
--------------------

.. code-block:: json

    {
        "type": "ruler",
        "name": "Ruler"
    }

Track properties
----------------
Expand All @@ -148,6 +253,9 @@ type
* repeatmasker
* geneAnnotation
* genomealign
* longrange
* bigInteract
* ruler

.. note:: ``type`` is case insensitive.
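
Because ``type`` is matched without regard to case, a datahub can be checked for unrecognized types before it is submitted. The following is a minimal sketch, not part of the browser itself: it assumes the datahub is a JSON array of track objects, and the ``KNOWN_TYPES`` set and the ``unknown_track_types`` helper are hypothetical names built from the track types mentioned on this page (the ``bigwig`` and ``methylc`` spellings are assumed), so the set may be incomplete.

.. code-block:: python

    import json

    # Track types collected from the examples on this page. This set is an
    # assumption and may not cover every type the browser accepts.
    KNOWN_TYPES = {
        "bigwig", "methylc", "categorical", "longrange", "biginteract",
        "repeatmasker", "geneannotation", "bigbed", "bed", "hic",
        "genomealign", "ruler",
    }


    def unknown_track_types(datahub_text):
        """Return the ``type`` values in a datahub JSON string that are not recognized."""
        tracks = json.loads(datahub_text)
        return [
            track.get("type", "")
            for track in tracks
            # Lower-case before comparing, since ``type`` is case insensitive.
            if track.get("type", "").lower() not in KNOWN_TYPES
        ]


    example = (
        '[{"type": "bigInteract", "name": "a"},'
        ' {"type": "HIC", "name": "b"},'
        ' {"type": "wiggle", "name": "c"}]'
    )
    print(unknown_track_types(example))  # ['wiggle']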

87 changes: 75 additions & 12 deletions docs/tracks.rst
Expand Up @@ -39,13 +39,14 @@ The following sections introduce the track types that the browser supports.

Binary track file formats like bigWig_ and HiC_ can be used directly with the browser.

bedGraph_, methylC_, categorical_, and bed_ track files need to
bedGraph_, methylC_, categorical_, longrange_ and bed_ track files need to
be `compressed by bgzip and indexed by tabix`_ for use by the browser.
The resulting index file with suffix ``.tbi`` needs to be located
at the same URL as the ``.gz`` file.

Bed-like format track files need to be sorted before submission. For example, if we have a track file named ``track.bedgraph``
we can use the generic Linux ``sort`` command, the ``bedSort`` tool from UCSC, or the ``sort-bed`` command from BEDOPS. Here is an example command using each of the three methods::
we can use the generic Linux ``sort`` command, the ``bedSort`` tool from UCSC, or the ``sort-bed`` command from BEDOPS.
Here is an example command using each of the three methods::

# Using Linux sort
sort -k1,1 -k2,2n track.bedgraph > track.bedgraph.sorted
Expand All @@ -64,7 +65,8 @@ The two files must be in the same directory. Obtain the URL to "track.bedgraph.s

.. _`compressed by bgzip and indexed by tabix`: http://www.htslib.org/doc/tabix.html

SAM files first need to be compressed to BAM_ files. BAM_ files need to be coordinate sorted and indexed for use by the browser.
SAM files first need to be compressed to BAM_ files. BAM_ files need to be coordinate sorted and
indexed for use by the browser.
The resulting index file with suffix ``.bai`` needs to be located
at the same URL as the ``.bam`` file.
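
The index-location rule above can be sketched as a small helper that derives the URL where the index is expected. This is only an illustration: ``expected_index_url`` is a hypothetical function, and it assumes the index name is formed by appending ``.tbi`` or ``.bai`` to the track file name, which is what ``tabix`` and ``samtools index`` produce by default.

.. code-block:: python

    def expected_index_url(track_url):
        """Return the index URL expected next to a track file, or None."""
        if track_url.endswith(".gz"):
            # bgzip-compressed, tabix-indexed text formats
            return track_url + ".tbi"
        if track_url.endswith(".bam"):
            # coordinate-sorted, indexed BAM files
            return track_url + ".bai"
        # Other formats are not covered by this sketch.
        return None


    print(expected_index_url("https://egg.wustl.edu/d/mm9/GSE28247_st3c.gz"))
    # https://egg.wustl.edu/d/mm9/GSE28247_st3c.gz.tbi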

Expand All @@ -82,7 +84,8 @@ Here is an example command::
Annotation Tracks
-----------------

Annotation tracks represent genomic features or intervals across the genome. Popular examples include SNP files, CpG Island files, and blacklisted regions.
Annotation tracks represent genomic features or intervals across the genome.
Popular examples include SNP files, CpG Island files, and blacklisted regions.

bed
~~~
Expand All @@ -97,13 +100,21 @@ Example lines are below::
chr9 3036420 3036660 Blacklist_157 . +

Every line must consist of at least 3 fields separated by the ``Tab`` delimiter. The required fields from
left to right are ``chromosome``, ``start position`` (0-based), and ``end position`` (not included). A fourth (optional) column is reserved for the name of the interval and the sixth column (optional) is reserved for the strand. All other columns are ignored, but can be present in the file.
left to right are ``chromosome``, ``start position`` (0-based), and ``end position`` (not included).
A fourth (optional) column is reserved for the name of the interval and the sixth column (optional)
is reserved for the strand. All other columns are ignored, but can be present in the file.
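
As a rough illustration of the column rules above, the sketch below parses one bed line. The function name ``parse_bed_line`` and the dictionary keys are hypothetical choices for this example and are not part of the browser.

.. code-block:: python

    def parse_bed_line(line):
        """Parse one tab-delimited bed line into a dictionary."""
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            raise ValueError("a bed line needs at least 3 tab-separated fields")
        return {
            "chromosome": fields[0],
            "start": int(fields[1]),  # 0-based
            "end": int(fields[2]),    # not included in the interval
            # Optional columns: name (4th) and strand (6th).
            "name": fields[3] if len(fields) > 3 else None,
            "strand": fields[5] if len(fields) > 5 else None,
        }


    record = parse_bed_line("chr9\t3036420\t3036660\tBlacklist_157\t.\t+")
    # With a 0-based start and an excluded end, the length is end - start.
    print(record["end"] - record["start"])  # 240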

.. image:: _static/Bed_format_with_different_columns.png

.. note:: The display of a bed file differs by how many columns are provided in the file (see image above). The simplest, 3 column, format just displays blocks for each interval. The four column format displays the name of each element over each interval. If the sixth column is provided in the file then ``>>>`` or ``<<<`` will be displayed over each interval to represent strand information.
.. note:: The display of a bed file depends on how many columns are provided in the file
   (see image above). The simplest, 3-column format just displays a block for
   each interval. The 4-column format displays the name of each element over its interval.
   If the sixth column is provided in the file, then ``>>>`` or ``<<<`` will be displayed over
   each interval to represent strand information.

.. _UCSC bed: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
.. _`UCSC bed`: https://genome.ucsc.edu/FAQ/FAQformat.html#format1

This format needs to be compressed by bgzip and indexed by tabix for submission as a track. See `Prepare track files`_.

Numerical Tracks
----------------
Expand All @@ -119,7 +130,7 @@ bigWig
``bigWig`` is a popular format to represent numerical values over genomic coordinates.
Please check the `UCSC bigWig`_ page to learn more about this format.

.. _UCSC bigWig: https://genome.ucsc.edu/goldenpath/help/bigWig.html
.. _`UCSC bigWig`: https://genome.ucsc.edu/goldenpath/help/bigWig.html

bedGraph
~~~~~~~~
Expand All @@ -143,8 +154,10 @@ left to right are ``chromosome``, ``start position`` (0-based), ``end position``

.. _UCSC bedGraph: https://genome.ucsc.edu/goldenpath/help/bedgraph.html

This format needs to be compressed by bgzip and indexed by tabix for submission as a track. See `Prepare track files`_.

Read Alignment BAM Tracks
----------------
-------------------------

BAM
~~~~
Expand All @@ -154,7 +167,6 @@ Please check the `Samtools Documentation`_ page to learn more about this format

.. _Samtools Documentation: https://samtools.github.io/hts-specs/SAMv1.pdf


Methylation tracks
------------------

Expand All @@ -181,22 +193,48 @@ Each line contains 7 fields separated by Tab. The fields are
``methylation context`` (CG, CHG, CHH etc.), ``methylation value``, ``strand``,
and ``read depth``.

This format needs to be compressed by bgzip and indexed by tabix for submission as a track. See `Prepare track files`_.

Categorical tracks
------------------

Categorical tracks represent genomic bins for different categories. The most popular
example is the representation of ChromHMM data, which indicates which regions are likely enhancers, likely promoters, etc.
Other uses for the track include the display of different types of methylation (DMRs, DMVs, LMRs, UMRs, etc.) or even peaks colored by tissue type.
Other uses for the track include the display of different types of methylation
(DMRs, DMVs, LMRs, UMRs, etc.) or even peaks colored by tissue type.

categorical
~~~~~~~~~~~
The ``categorical`` track uses the first three columns of the standard `bed`_ format (``chromosome``, ``start position`` (0-based), and ``end position`` (not included)) with the addition of a 4th column indicating the category type which can be a string or number::

The ``categorical`` track uses the first three columns of the standard `bed`_ format
(``chromosome``, ``start position`` (0-based), and ``end position`` (not included))
with the addition of a 4th column indicating the category type which can be a string or number::

chr1 start1 end1 category1
chr2 start2 end2 category2
chr3 start3 end3 category3
chr4 start4 end4 category4

.. important:: When you use numbers like 1, 2 and 3 as category names, write them as
   strings in the ``category`` attribute of ``options`` in the datahub definition.
   See the example below:

.. code-block:: json

    {
        "type": "categorical",
        "name": "ChromHMM",
        "url": "https://egg.wustl.edu/d/hg19/E017_15_coreMarks_dense.gz",
        "options": {
            "category": {
                "1": {"name": "Active TSS", "color": "#ff0000"},
                "2": {"name": "Flanking Active TSS", "color": "#ff4500"},
                "3": {"name": "Transcr at gene 5' and 3'", "color": "#32cd32"}
            }
        }
    }

This format needs to be compressed by bgzip and indexed by tabix for submission as a track. See `Prepare track files`_.
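
If the category table lives in a script, the string-key requirement can be enforced when the track entry is generated. The sketch below is one possible way to do that; ``categorical_track`` is a hypothetical helper, and only the ``type``, ``name``, ``url`` and ``options.category`` fields shown in the example above are assumed.

.. code-block:: python

    import json


    def categorical_track(name, url, categories):
        """Build a categorical track entry.

        ``categories`` maps a category id (int or str) to a (label, color) pair.
        """
        return {
            "type": "categorical",
            "name": name,
            "url": url,
            "options": {
                "category": {
                    # Convert every id to a string, as the datahub expects.
                    str(key): {"name": label, "color": color}
                    for key, (label, color) in categories.items()
                }
            },
        }


    track = categorical_track(
        "ChromHMM",
        "https://egg.wustl.edu/d/hg19/E017_15_coreMarks_dense.gz",
        {1: ("Active TSS", "#ff0000"), 2: ("Flanking Active TSS", "#ff4500")},
    )
    print(json.dumps(track, indent=4))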

Long range chromatin interaction
--------------------------------

Expand All @@ -207,3 +245,28 @@ HiC
~~~

To learn more about the HiC format please check https://github.com/aidenlab/juicer/wiki/Data.

longrange
~~~~~~~~~

The ``longrange`` track is a `bed`_-like file format. Each row contains four columns, from left to right:
``chromosome``, ``start position`` (0-based), ``end position`` (not included), and the interaction target
with its score, in the format ``chr2:333-444,55``. For example, if interval "chr1:111-222" interacts with
interval "chr2:333-444" with a score of 55,
the following two lines represent this interaction::

chr1 111 222 chr2:333-444,55
chr2 333 444 chr1:111-222,55

.. important:: Be sure to make **TWO** records for a pair of interacting loci,
   one record for each locus.

This format needs to be compressed by bgzip and indexed by tabix for submission as a track. See `Prepare track files`_.
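
The two-record convention described above can be sketched as follows. ``longrange_records`` is a hypothetical helper written for this example; it only formats the lines, and the output still needs to be sorted, compressed by bgzip and indexed by tabix.

.. code-block:: python

    def longrange_records(locus_a, locus_b, score):
        """Return the two longrange lines for one pair of interacting loci.

        Each locus is a (chromosome, start, end) tuple.
        """
        def line(source, target):
            chrom, start, end = source
            t_chrom, t_start, t_end = target
            return f"{chrom}\t{start}\t{end}\t{t_chrom}:{t_start}-{t_end},{score}"

        # One record per locus, each pointing at the other.
        return [line(locus_a, locus_b), line(locus_b, locus_a)]


    for record in longrange_records(("chr1", 111, 222), ("chr2", 333, 444), 55):
        print(record)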

bigInteract
~~~~~~~~~~~

The bigInteract format from UCSC can also be used in the browser. For more details about
this format, please check the `UCSC bigInteract format`_ page.

.. _`UCSC bigInteract format`: https://genome.ucsc.edu/goldenPath/help/interact.html
