<h1>
Changing genomic coordinate systems with rtracklayer::liftOver
</h1>
The liftOver facilities developed in conjunction with the UCSC browser track infrastructure are available for transforming data held in GRanges objects. This is illustrated here with an image of the NHGRI GWAS catalog, which, as of Oct. 31, 2014, is distributed with coordinates defined on genome build GRCh38 (UCSC name hg38).

<h2>
Setup: The NHGRI GWAS catalog as an hg38-based GRanges
</h2>

``` r
library(gwascat)
cur = makeCurrentGwascat()  # result varies by day
```

In [1]:
cur


<h2>
Resource: The chain file for hg38 to hg19 transformation
</h2>
The transformation to hg19 coordinates is defined by a chain file provided by UCSC. rtracklayer::import.chain will bring the data into R.
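The call to import.chain that follows is given an uncompressed file name, while UCSC distributes chain files gzip-compressed. A minimal sketch of fetching and decompressing the file with base R is shown here. The URL is the usual UCSC download location and is an assumption (it is not stated in this document), and `fetch_chain` is a hypothetical helper name.

```r
# Assumed UCSC download location for the hg38 -> hg19 chain file.
chain_url  <- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz"
chain_gz   <- basename(chain_url)
chain_file <- sub("\\.gz$", "", chain_gz)   # "hg38ToHg19.over.chain"

# Hypothetical helper: download and decompress only if the file is missing.
fetch_chain <- function(url = chain_url, dest = chain_file) {
  if (file.exists(dest)) return(invisible(dest))
  gz <- paste0(dest, ".gz")
  download.file(url, gz, mode = "wb")
  con <- gzfile(gz, "rt")          # read the gzip stream as text
  on.exit(close(con))
  writeLines(readLines(con), dest) # write the plain-text chain file
  invisible(dest)
}

# fetch_chain()  # uncomment to download (requires network access)
```

The helper is defined but not called, so nothing is downloaded until the last line is uncommented.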

In [2]:
library(rtracklayer)
ch = import.chain("hg38ToHg19.over.chain")
ch
str(ch[[1]])


Chain of length 25
names(25): chr22 chr21 chr19 chr20 chrY chr18 ... chr6 chr5 chr4 chr3 chr2 chr1

Formal class 'ChainBlock' [package "rtracklayer"] with 6 slots
  ..@ ranges  :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. ..@ start          : int [1:6842] 16367189 16386933 16386970 16387001 16387128 16395491 16395528 16395841 16395860 16395956 ...
  .. .. ..@ width          : int [1:6842] 19744 36 31 112 8362 36 312 18 95 33 ...
  .. .. ..@ NAMES          : NULL
  .. .. ..@ elementType    : chr "integer"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ offset  : int [1:6842] -480662 -480702 -480702 -480726 -480726 -480726 -480726 -480726 -480726 -480726 ...
  ..@ score   : int [1:1168] -1063867308 68830488 21156147 20814926 7358950 3927744 2928210 991419 880681 802146 ...
  ..@ space   : chr [1:1168] "chr22" "chr14" "chr22" "chr21" ...
  ..@ reversed: logi [1:1168] FALSE FALSE FALSE FALSE FALSE FALSE ...
  ..@ length  : int [1:1168] 1124 1280 173 465 398 110 43 173 342 84 ...


Some more details about the chain data structure are available in the import.chain man page:

<pre>
   A chain file essentially details many local alignments, so it is
   possible for the "from" ranges to map to overlapping regions in
   the other sequence. The "from" ranges are guaranteed to be
   disjoint (but do not necessarily cover the entire "from"
   sequence).
</pre>
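The idea of disjoint "from" ranges that need not cover the whole sequence can be sketched with a toy model in base R. This is a simplified illustration, not rtracklayer's implementation: the block values are hypothetical, the `shift` column and its sign convention (target = source + shift) are assumptions of this sketch rather than the `offset` slot shown above, and strand reversal is ignored.

```r
# Toy chain blocks for one chromosome (hypothetical values).
# Each block covers [start, start + width - 1] in the source assembly
# and moves every position it covers by a constant amount.
blocks <- data.frame(
  start = c(100L, 300L),
  width = c(50L, 20L),
  shift = c(10L, -5L)   # assumed convention: target = source + shift
)

lift_position <- function(pos, blocks) {
  end <- blocks$start + blocks$width - 1L
  hit <- which(pos >= blocks$start & pos <= end)
  # Source blocks are disjoint, so at most one block can match.
  if (length(hit) == 0L) return(NA_integer_)
  pos + blocks$shift[hit]
}

# Position 200 falls in the gap between the two blocks and cannot be lifted.
lifted <- vapply(c(120L, 200L, 310L), lift_position, integer(1), blocks = blocks)
lifted
```

Positions that fall between blocks come back as `NA`, which is the toy analogue of a locus that has no counterpart in the target build.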
<h2>
Action: liftOver
</h2>
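The liftOver function returns a GRangesList with one element per input range, since a range may map to zero, one, or several ranges in the target build.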
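To see how such a list can be inspected, the sketch below builds a small GRangesList by hand, with elements of length one, zero, and two, then counts them with elementNROWS and flattens the list with unlist. The names and coordinates are hypothetical toy values, not real liftOver output, and the GenomicRanges package is assumed to be installed.

```r
library(GenomicRanges)

# Toy stand-in for a lifted result: one list element per input range.
lifted <- GRangesList(
  snpA = GRanges("chr1", IRanges(100, width = 1)),           # one target range
  snpB = GRanges(),                                           # no target range
  regC = GRanges("chr2", IRanges(c(500, 900), width = 50))   # two target ranges
)

n_hits   <- elementNROWS(lifted)       # number of target ranges per input
unmapped <- names(n_hits)[n_hits == 0]
split_up <- names(n_hits)[n_hits > 1]

# unlist() drops empty elements and keeps one row per target range.
flat <- unlist(lifted)
length(flat)
```

Checking the element counts before flattening is one way to record which inputs were dropped or split, since that information is no longer explicit after unlist.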

In [3]:
seqlevelsStyle(cur) = "UCSC"  # necessary
cur19 = liftOver(cur, ch)
class(cur19)


We unlist and coerce to the gwaswloc class, a convenient form for the GWAS catalog with its many mcols fields.

In [4]:
cur19 = unlist(cur19)
genome(cur19) = "hg19"
cur19 = new("gwaswloc", cur19)
cur19


We see that the translation leads to a loss of some loci.
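The two comparisons used for this measure different things: a difference in lengths counts lost records, while setdiff on SNP identifiers lists the distinct SNPs that no longer appear. A small base R sketch with hypothetical identifiers shows the distinction, under the assumption that a SNP can appear in more than one catalog record.

```r
# Hypothetical SNP identifiers before and after lifting.
snps_before <- c("rs1", "rs1", "rs2", "rs3", "rs3", "rs3")
snps_after  <- c("rs1", "rs1", "rs2")   # every rs3 record was dropped

records_lost <- length(snps_before) - length(snps_after)  # counts records
snps_lost    <- setdiff(snps_before, snps_after)          # distinct SNP ids

records_lost
snps_lost
```

In this toy case three records are lost but only one distinct SNP, so the two numbers should not be expected to agree.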

In [5]:
length(cur)-length(cur19)
setdiff(cur$SNPs, cur19$SNPs)


It may be interesting to <a href="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=687289">follow up</a> some of the losses.