Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
106 lines (83 sloc) 2.53 KB
title author date output layout toc
Translating addresses between genome builds
Vince
March 19, 2015
html_document
page
true
library(knitr)
opts_chunk$set(fig.path=paste0("figure/", sub("(.*).Rmd","\\1",basename(knitr:::knit_concord$get('infile'))), "-"))
suppressWarnings({
suppressMessages({
suppressPackageStartupMessages({
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(Biostrings)
library(GenomicRanges)
library(IRanges)
library(ph525x)
library(Homo.sapiens)
library(rtracklayer)
})
})
})

Translating addresses between genome builds: liftOver

The rtracklayer package includes an interface to the liftOver utilities developed for the UCSC genome browser. The idea is that a collection of local alignments can be defined and used to remap coordinates from one reference build to another.

We can illustrate this with gene addresses created for hg38, the current reference build. We want to translate them for comparison to addresses asserted for hg19.

Acquiring a chain file

Address translation between reference builds can be specified using a chain format file. Two ways of getting the chain file are:

Direct manual acquisition

You can get it from the following URL, and use gunzip on your system to uncompress in your home dir, if you would like to emulate the commands below.

"ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz"

Acquisition through AnnotationHub

This is fully programmatic but may involve acquiring and caching a metadata database with the AnnotationHub package.

library(AnnotationHub)
ah = AnnotationHub()
q1 = query(ah, c("chain")) # list all resources with 'chain' in metadata
q1
q2 = query(ah, c("chain", "hg38ToHg19")) # the one we want
ch = ah[[names(q2)]]
library(rtracklayer)
# following only if you do not use AnnotationHub
# ch = import.chain("~/hg38ToHg19.over.chain")
ch
str(ch[[1]])

Let's get the addresses for genes on chromosome 1 in hg38.

library(TxDb.Hsapiens.UCSC.hg38.knownGene)
tx38 = TxDb.Hsapiens.UCSC.hg38.knownGene
seqlevels(tx38) = "chr1"
g1_38 = genes(tx38)

Now execute the liftOver:

g1_19L = liftOver(g1_38, ch)

The result is a list of GRanges, one for each translation event.

g1_19L

Verification of accuracy of translation is covered in exercises.