Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summarise FACETS results to gene-level results #48

Closed
charlottekyng opened this issue Aug 11, 2015 · 4 comments
Closed

Summarise FACETS results to gene-level results #48

charlottekyng opened this issue Aug 11, 2015 · 4 comments

Comments

@charlottekyng
Copy link
Contributor

#### turn segmented copy number data to gene-based copy number with findOverlaps
## define HomDel as TCN=0, loss as TCN<ploidy, gain as TCN>ploidy, amp as TCN>=ploidy+4
## where ploidy= mode of TCN
### some variant of the below, also need one for the breast panel, IMPACT310 and exome

genes <- read.delim("/home/ngk1/share/reference/IMPACT410_genes_for_copynumber.txt", as.is=T)

genesGR <- GRanges(seqnames=genes$chromosome, 
        ranges=IRanges(as.numeric(genes$start_position), as.numeric(genes$end_position)),
        mcols=genes[,c("order", "Cyt", "hgnc_symbol")])

facets_files <- dir("facets", pattern="txt", full=T)

mm <- do.call("cbind", lapply(facets_files, function(f) {
    tab <- read.delim(f, as.is=T)
    tab$chrom[which(tab$chrom==23)] <- "X"

    tabGR <- GRanges(seqnames=tab$chrom, 
        ranges=IRanges(as.numeric(tab$loc.start), as.numeric(tab$loc.end)),
        mcols=tab[,-c(1:4)])

    fo <- findOverlaps(tabGR, genesGR)
    rr <- ranges(fo, ranges(tabGR), ranges(genesGR))
    df <- cbind(as.data.frame(fo), as.data.frame(rr))

    df <- cbind(df, mcols(genesGR)[df$subjectHits,], mcols(tabGR)[df$queryHits,])

#when genes span multiple segments
    oo <- tapply(df$mcols.cnlr.median, df$subjectHits, function(x){which.max(abs(x))})
    oo <- oo[match(1:409, names(oo))]
    oo[which(is.na(oo))] <- 1

    df <- df[unlist(lapply(1:409, function(x) { which(df$mcols.order==x)[oo[which(names(oo)==x)]]})),]

    ploidy <- table(df$mcols.tcn)
    ploidy <- as.numeric(names(ploidy)[which.max(ploidy)])

    df$GL <- 0
    df$GL[which(df$mcols.tcn<ploidy)] <- -1
    df$GL[which(df$mcols.tcn==0)] <- -2
    df$GL[which(df$mcols.tcn>ploidy)] <- 1
    df$GL[which(df$mcols.tcn>=ploidy+4)] <- 2

    df <- df[match(genes$order, df$mcols.order),]
    df$GL
}))
colnames(mm) <- facets_files
mm <- cbind(genes, mm)
write.table(mm, file="GL.txt", sep="\t", row.names=F, na="", quote=F)
@raylim
Copy link
Collaborator

raylim commented Aug 13, 2015

just fyi, github supports markdown (three single quotes are used to delimit code)

@charlottekyng
Copy link
Contributor Author

This isn't done yet. Still need reference files for the breast panel v2 and exome and defaulting the optList$geneLocFile to the appropriate reference.

@charlottekyng
Copy link
Contributor Author

Please fix this - I have 4 projects waiting for this fix. Thanks very much.

@charlottekyng
Copy link
Contributor Author

Sorry i have to re-open this - the number of rows in the geneCN.txt file is smaller than the number of genes. Please print genes that cannot be assigned to segments. If there are genes for which positions cannot be retrieved, please check if they are the official gene names.

@charlottekyng charlottekyng reopened this Aug 18, 2015
@raylim raylim closed this as completed in 338e075 Aug 18, 2015
@raylim raylim reopened this Aug 18, 2015
@raylim raylim closed this as completed in e311572 Aug 18, 2015
@raylim raylim reopened this Aug 18, 2015
@raylim raylim closed this as completed in 0cd3837 Aug 18, 2015
raylim added a commit that referenced this issue Aug 18, 2015
raylim added a commit that referenced this issue Aug 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants