In [None]:
## This code chunk was hidden in the original document, but was executed in the background
knitr::opts_chunk$set(results="hide", message=FALSE, warning=FALSE, fig.show="hide", echo=TRUE)

<h1>
Changing genomic coordinate systems with rtracklayer::liftOver
</h1>
The liftOver facilities developed alongside the UCSC Genome Browser track infrastructure can transform data held in GRanges objects from one genome build to another. We illustrate this with a snapshot of the NHGRI GWAS catalog, which, as of Oct. 31 2014, is distributed with coordinates defined on genome build GRCh38 (UCSC name: hg38).

<h2>
Setup: The NHGRI GWAS catalog as an hg38-based GRanges
</h2>

In [1]:
## This code chunk was hidden in the original document, but was executed in the background
library(gwascat)
if (!exists("cur")) load("cur.rda")


``` r
library(gwascat)
cur = makeCurrentGwascat()  # result varies by day
```

In [2]:
cur

gwasloc instance with 17865 records and 35 attributes per record.
Extracted:  2014-10-31 
Genome:  GRCh38 
Excerpt:
GRanges object with 5 ranges and 35 metadata columns:
      seqnames               ranges strand | Date.Added.to.Catalog  PUBMEDID
         <Rle>            <IRanges>  <Rle> |           <character> <integer>
  [1]       17 [79831041, 79831041]      * |            10/22/2014  24528284
  [2]        5 [31766326, 31766326]      * |            10/22/2014  24528284
  [3]       11 [13107616, 13107616]      * |            10/22/2014  24528284
  [4]       10 [94922089, 94922089]      * |            10/22/2014  24528284
  [5]       10 [94922089, 94922089]      * |            10/22/2014  24528284
      First.Author        Date             Journal
       <character> <character>         <character>
  [1]         Ji Y  08/01/2014 Br J Clin Pharmacol
  [2]         Ji Y  08/01/2014 Br J Clin Pharmacol
  [3]         Ji Y  08/01/2014 Br J Clin Pharmacol
  [4]         Ji Y  08/01/2014 Br J 

<h2>
Resource: The chain file for hg38 to hg19 transformation
</h2>
The transformation to hg19 coordinates is defined by a chain file provided by UCSC. rtracklayer::import.chain will bring the data into R.

In [None]:
library(rtracklayer)
ch = import.chain("hg38ToHg19.over.chain")
ch
str(ch[[1]])
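The call to import.chain above expects an uncompressed hg38ToHg19.over.chain in the working directory. One way to obtain it is sketched below in base R. The URL is an assumption about where UCSC currently serves the file, and `fetch_chain` is a hypothetical helper written for this illustration, not part of rtracklayer.

```r
# Assumption: UCSC still serves the gzipped chain file at this location.
chain_url <- "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz"

# Hypothetical helper: download a gzipped text file and write it uncompressed.
fetch_chain <- function(url, dest) {
  gz <- tempfile(fileext = ".gz")
  on.exit(unlink(gz), add = TRUE)      # remove the temporary download on exit
  download.file(url, gz, mode = "wb", quiet = TRUE)
  con <- gzfile(gz, open = "rt")       # read the compressed file as text
  lines <- readLines(con)
  close(con)
  writeLines(lines, dest)
  invisible(dest)
}
```

Calling `fetch_chain(chain_url, "hg38ToHg19.over.chain")` once, before import.chain, should place the file where the cell above looks for it.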

Some more details about the chain data structure are available in the import.chain man page:

<pre>
   A chain file essentially details many local alignments, so it is
   possible for the "from" ranges to map to overlapping regions in
   the other sequence. The "from" ranges are guaranteed to be
   disjoint (but do not necessarily cover the entire "from"
   sequence).
</pre>
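To make the description above concrete, here is a toy sketch in base R. The `blocks` table and the `lift_pos` helper are hypothetical and greatly simplified; they do not reflect how rtracklayer stores or applies a chain. Each row stands for one ungapped alignment block, and a position inside a block is shifted by that block's offset.

```r
# Toy "chain": disjoint "from" blocks, each with a constant shift to the target.
blocks <- data.frame(
  from_start = c(100, 300),
  from_end   = c(199, 399),
  offset     = c(-50, 25)
)

# Hypothetical helper: lift one position, or return NA if no block covers it.
lift_pos <- function(pos, blocks) {
  hit <- which(pos >= blocks$from_start & pos <= blocks$from_end)
  if (length(hit) == 0) return(NA_integer_)
  as.integer(pos + blocks$offset[hit[1]])
}

# 150 lies in the first block, 250 in the gap, 350 in the second block.
lifted <- vapply(c(150, 250, 350), lift_pos, integer(1), blocks = blocks)
lifted
```

Because the "from" blocks are disjoint, at most one block covers any position. Because they need not cover the whole sequence, a position that falls in a gap (250 here) has no image in the target assembly.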
<h2>
Action: liftOver
</h2>
The liftOver function will create a GRangesList.

In [None]:
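# The chain file uses UCSC-style sequence names ("chr17"), whereas cur uses
# NCBI-style names ("17"), so the styles must match before lifting.
seqlevelsStyle(cur) = "UCSC"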
cur19 = liftOver(cur, ch)
class(cur19)
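Counting the ranges in each list element shows which inputs mapped once, several times, or not at all. The sketch below uses a small hand-built GRangesList rather than the real result; the element names and coordinates are made up for illustration and are not catalog entries.

```r
library(GenomicRanges)

# Hand-built stand-in for a liftOver result: one list element per input range.
toy <- GRangesList(
  rsA = GRanges("chr1", IRanges(100, 100)),               # mapped once
  rsB = GRanges(),                                        # not mapped
  rsC = GRanges("chr2", IRanges(c(5, 900), width = 1))    # mapped twice
)

n_mapped <- elementNROWS(toy)        # number of ranges in each element
unmapped <- names(n_mapped)[n_mapped == 0]
table(n_mapped)
unmapped
```

On the real object, `table(elementNROWS(cur19))` would give the same kind of summary, provided it is run while cur19 is still a GRangesList.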

We unlist and coerce to the gwaswloc class, a convenient form for the GWAS catalog with its many mcols fields.

In [None]:
cur19 = unlist(cur19)
genome(cur19) = "hg19"
cur19 = new("gwaswloc", cur19)
cur19

We see that the translation leads to a loss of some loci.

In [None]:
length(cur)-length(cur19)
setdiff(cur$SNPs, cur19$SNPs)

It may be interesting to <a href="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=687289">follow up</a> some of the losses.