Genome U-Plot is a JavaScript tool for visualizing chromosomal abnormalities in the human genome using a U-shaped layout.
Whole Genome U-Plot. Visible are the 24 human chromosomes arranged in a U-shape, the cytobands, the chromosome junctions and the copy number variations (CNVs). The axes at the bottom right of the graph apply to the chromosomes on the right side of the plot.
Node.js is an open-source, cross-platform JavaScript runtime environment for developing a diverse variety of server tools and applications.
We use Node for basically everything in this project, so we are going to need it. Please visit the download page for macOS or Windows binaries, or the package manager installations page for Linux distributions.
In this project we used Node.js v6.10.0 LTS.
If you need the flexibility to use multiple versions of Node, check out NVM or Windows NVM.
NPM is the default package manager for Node and is installed automatically alongside it. Package managers are used to install and manage packages (modules of code that you or someone else wrote). This project depends on a lot of packages, but we manage them with Yarn, another package manager.
Yarn is a Node.js package manager which is much faster than NPM, has offline support, and fetches dependencies more predictably.
To install Yarn, use NPM and run:
> $ npm install --global yarn
Open a command shell, change to the project directory, and install the project dependencies using:
> $ yarn install
To start the application, use:
> $ yarn start
Then, using a modern browser, visit:
http://localhost:8000/GenomePlot.html?sampleId=LNCAP
A sample (LNCAP) with all required files is provided in the public/data directory:
LNCAP/LNCAP_alts_comprehensive.csv (Sample Rearrangements)
LNCAP/LNCAP_cnvIntervals.csv (Sample Copy Number Variation - Intervals)
LNCAP/LNCAP_genomePlot_cnv30.json (Sample Copy Number Variation - Raw Frequency)
LNCAP/LNCAP_visualization.json (Sample Definition)
In order to run the application against a different sample (e.g. MY_SAMPLE), create the same directory and file structure, replacing LNCAP with MY_SAMPLE throughout. Finally, remember to use your sample name in the sampleId URL parameter of the app.
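As a rough illustration of the layout described above, the following sketch lists the files expected for a sample and builds the matching URL. The helper name `sampleLayout` is hypothetical and not part of the project. The file name patterns, directory, and port are taken from the LNCAP example.

```javascript
// Hypothetical helper (not part of Genome U-Plot): derives the expected
// file layout and URL for a sample ID, following the LNCAP example.
function sampleLayout(sampleId) {
  const dir = 'public/data/' + sampleId;
  return {
    files: [
      dir + '/' + sampleId + '_alts_comprehensive.csv',
      dir + '/' + sampleId + '_cnvIntervals.csv',
      dir + '/' + sampleId + '_genomePlot_cnv30.json',
      dir + '/' + sampleId + '_visualization.json'
    ],
    // Port 8000 is the one used in the LNCAP URL.
    url: 'http://localhost:8000/GenomePlot.html?sampleId=' +
      encodeURIComponent(sampleId)
  };
}

console.log(sampleLayout('MY_SAMPLE'));
```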
- A Human Genome Assembly GRCh38 cytobands reference file is provided with the visualization (public/reference/cytobands/hg38/cytoBand.json). However, if you want to use your own, you may download and uncompress a definition file from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/cytoBand.txt.gz. Then you must convert the file to JSON of the following form:
[
{
"chrom": "chr1",
"chromStart": 0,
"chromEnd": 2300000,
"gieStain": "gneg",
"name": "p36.33"
}, {
"chrom": "chr1",
"chromStart": 2300000,
"chromEnd": 5300000,
"gieStain": "gpos25",
"name": "p36.32"
},
...
]
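The conversion can be sketched in Node as follows. This is a minimal example, not the script used by the authors. It assumes the standard UCSC cytoBand.txt layout (tab-separated, no header, columns in the order chrom, chromStart, chromEnd, name, gieStain). The function name `cytoBandTextToJson` is hypothetical.

```javascript
// Sketch: convert UCSC cytoBand.txt content into the array-of-objects
// form shown above. Assumed column order (tab-separated, no header):
// chrom, chromStart, chromEnd, name, gieStain.
function cytoBandTextToJson(text) {
  return text
    .split(/\r?\n/)
    .filter(line => line.trim() !== '')
    .map(line => {
      const fields = line.split('\t');
      return {
        chrom: fields[0],
        chromStart: parseInt(fields[1], 10),
        chromEnd: parseInt(fields[2], 10),
        gieStain: fields[4],
        name: fields[3]
      };
    });
}

// Example input: the two chr1 bands from the listing above.
const exampleText = 'chr1\t0\t2300000\tp36.33\tgneg\n' +
  'chr1\t2300000\t5300000\tp36.32\tgpos25\n';
console.log(JSON.stringify(cytoBandTextToJson(exampleText), null, 2));
```

For a real file, the text could be read with `fs.readFileSync('cytoBand.txt', 'utf8')` and the result written with `fs.writeFileSync`. The UCSC file may also list contigs other than the 24 chromosomes; whether those entries need to be filtered out for the visualization is not covered here.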
A sample-specific JSON file must be provided (as in LNCAP/LNCAP_visualization.json):
{
"fileFormatVersion": 1,
"altsComprehensive": "sampleId_alts_comprehensive.csv",
"cnvBinned30KJson": "sampleId_genomePlot_cnv30.json",
"cnvIntervals": "sampleId_cnvIntervals.csv"
}
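A sample definition of this shape could be generated rather than written by hand. The sketch below assumes that `sampleId` in the file names stands for the actual sample name, as in the LNCAP files. `buildSampleDefinition` is a hypothetical helper, not a function provided by the project.

```javascript
// Hypothetical helper: builds the sample definition for a given sample ID,
// using the key names and file name patterns shown above.
function buildSampleDefinition(sampleId) {
  return {
    fileFormatVersion: 1,
    altsComprehensive: sampleId + '_alts_comprehensive.csv',
    cnvBinned30KJson: sampleId + '_genomePlot_cnv30.json',
    cnvIntervals: sampleId + '_cnvIntervals.csv'
  };
}

// JSON.stringify always emits quoted keys, so the output is valid JSON.
console.log(JSON.stringify(buildSampleDefinition('MY_SAMPLE'), null, 2));
```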
In order to visualize chromosomal rearrangements, a CSV file is required (as in LNCAP/LNCAP_alts_comprehensive.csv), and the following columns of integers must be supplied:
Nassoc,chrA,chrB,posA,posB
where Nassoc is the number (integer) of fragments supporting the event.
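Before loading a rearrangements file, it may help to check that the columns listed above are present and hold integers. The sketch below is one way to do that, not the application's own parser. It assumes a header row and plain comma-separated values without quoted fields, and `parseRearrangements` is a hypothetical name.

```javascript
// Sketch: parse a rearrangements CSV and check the required integer columns.
// Assumes a header row and no quoted or escaped fields.
const REQUIRED_COLUMNS = ['Nassoc', 'chrA', 'chrB', 'posA', 'posB'];

function parseRearrangements(csvText) {
  const lines = csvText.split(/\r?\n/).filter(l => l.trim() !== '');
  if (lines.length === 0) {
    throw new Error('Empty file');
  }
  const header = lines[0].split(',').map(h => h.trim());
  const missing = REQUIRED_COLUMNS.filter(c => header.indexOf(c) === -1);
  if (missing.length > 0) {
    throw new Error('Missing column(s): ' + missing.join(', '));
  }
  return lines.slice(1).map((line, i) => {
    const cells = line.split(',');
    const row = {};
    REQUIRED_COLUMNS.forEach(col => {
      const raw = (cells[header.indexOf(col)] || '').trim();
      if (!/^-?\d+$/.test(raw)) {
        throw new Error('Row ' + (i + 1) + ': ' + col + ' is not an integer');
      }
      row[col] = parseInt(raw, 10);
    });
    return row;
  });
}

// Illustrative values only, not taken from the LNCAP sample.
console.log(parseRearrangements('Nassoc,chrA,chrB,posA,posB\n12,1,5,1000,2000\n'));
```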
In order to visualize copy number, two files of a specific format must be supplied. The first is a file (as in LNCAP/LNCAP_genomePlot_cnv30.json) with the raw frequency data from a 30000 bin moving window.
The second file contains the copy number state information; a CSV file (as in LNCAP/LNCAP_cnvIntervals.csv) with the following columns must be supplied:
chr,start,end,cnvState,nrd
where cnvState is one of 1 (loss), 2 (normal) or 3 (gain), and nrd is a floating-point value corresponding to the Normalized Read Depth score, a quantitative measure of how far the CNV deviates from the calculated normal level (nrd = 2.0).
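The constraints on cnvState and nrd can be checked with a short script. This is a sketch under the same assumptions as any simple CSV reader (header row, no quoted fields). The names `CNV_STATES` and `parseCnvIntervals` are hypothetical, and the `chr` column is kept as a string because its exact encoding is not specified here.

```javascript
// Sketch: parse a CNV intervals CSV and validate cnvState and nrd.
// cnvState codes as described above: 1 (loss), 2 (normal), 3 (gain).
const CNV_STATES = { 1: 'loss', 2: 'normal', 3: 'gain' };
const CNV_COLUMNS = ['chr', 'start', 'end', 'cnvState', 'nrd'];

function parseCnvIntervals(csvText) {
  const lines = csvText.split(/\r?\n/).filter(l => l.trim() !== '');
  if (lines.length === 0) {
    throw new Error('Empty file');
  }
  const header = lines[0].split(',').map(h => h.trim());
  CNV_COLUMNS.forEach(col => {
    if (header.indexOf(col) === -1) {
      throw new Error('Missing column: ' + col);
    }
  });
  return lines.slice(1).map((line, i) => {
    const cells = line.split(',').map(c => c.trim());
    const get = name => cells[header.indexOf(name)];
    const cnvState = parseInt(get('cnvState'), 10);
    if (!Object.prototype.hasOwnProperty.call(CNV_STATES, cnvState)) {
      throw new Error('Row ' + (i + 1) + ': cnvState must be 1, 2 or 3');
    }
    const nrd = parseFloat(get('nrd'));
    if (isNaN(nrd)) {
      throw new Error('Row ' + (i + 1) + ': nrd is not a number');
    }
    return {
      chr: get('chr'),
      start: parseInt(get('start'), 10),
      end: parseInt(get('end'), 10),
      cnvState: cnvState,
      label: CNV_STATES[cnvState],
      nrd: nrd
    };
  });
}

// Illustrative values only, not taken from the LNCAP sample.
console.log(parseCnvIntervals('chr,start,end,cnvState,nrd\n1,100,200,3,2.8\n'));
```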
In order to run the application against a sample that is stored in a VCF file, we provide an R script, vcftoUplot.R (which resides in the public/data directory). The script was tested with R-3.3.3 and requires the R package VariantAnnotation, which will be installed automatically if not present. The script takes a VCF file as input (tested with VCF v4.1 and v4.2) and produces the file structure hierarchy required by the Genome U-Plot in order to visualize the sample. Finally, remember to use your sample name in the sampleId URL parameter of the app.
Given a VCF sample file NA12878.vcf (provided in the public/data directory), run:
Rscript vcftoUplot.R NA12878.vcf
This will produce the following directory hierarchy:
NA12878/
├── NA12878_alts_comprehensive.csv
└── NA12878_visualization.json
Then, using a modern browser visit:
http://localhost:8000/GenomePlot.html?sampleId=NA12878
Note: For this particular example you should use the "Filter on # of Frags" GUI option in order to reduce the number of visualized chromosomal abnormalities. You can also uncheck the "Line width to # Frags" option to decouple the line thickness from the number of fragments supporting the event.
Note II: The Human Genome Assembly GRCh38 is assumed.
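The effect of a fragment-count filter can be pictured with the sketch below. It is not the application's code: it assumes the filter keeps events supported by at least a given number of fragments (the Nassoc column), which may differ from the actual GUI behavior. `filterByFragments` and the sample values are hypothetical.

```javascript
// Assumption: the filter keeps events supported by at least `minFrags`
// fragments; the application's actual rule may differ.
function filterByFragments(events, minFrags) {
  return events.filter(e => e.Nassoc >= minFrags);
}

// Illustrative events only, not taken from the NA12878 sample.
const events = [
  { Nassoc: 2, chrA: 1, chrB: 5, posA: 1000, posB: 2000 },
  { Nassoc: 15, chrA: 3, chrB: 3, posA: 500, posB: 90000 }
];
console.log(filterByFragments(events, 10));
```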
If you want to use Genome U-Plot in commercial settings, please contact us.
Gaitatzes AG, Johnson SH, Smadbeck JB, Vasmatzis G. Genome U-Plot: a whole genome visualization. Bioinformatics. 2017 Dec 21. https://doi.org/10.1093/bioinformatics/btx829