Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highlight multiple different vectors and colour them differently? #38

Closed
evad80 opened this issue Apr 8, 2016 · 5 comments
Closed

Highlight multiple different vectors and colour them differently? #38

evad80 opened this issue Apr 8, 2016 · 5 comments

Comments

@evad80
Copy link

evad80 commented Apr 8, 2016

Hi, I'm not sure if this is the right place to post this type of question.

I am a biologist. I ran an association analysis, and I got back a set of SNPs of interest (called sig_snps). However, some of these SNPs are associated with body size (called sig_snps_but_assoc_with_body_size), and I want to show the difference between the set of SNPs that are significant, and the set of SNPs that are also significant with body size (if they are also associated with body size, they are less interesting).

On my manhattan plot zoomed in one on chromosome, I want to have three sets of coloured points: the non-associated SNPs, the SNPs associated with the trait, and the SNPs associated with the trait, but also associated with body size.

I wrote this code:

library("qqman")
table <-read.table("Original.linear",header=T)
data <-as.data.frame(table)
sig_snps <-scan("SigSNPsAfterBodySize.snps","")
sig_snps_but_assoc_with_body_size <-scan("SigSNPsButAssocWithBodySize","")
pdf("Test.pdf")
manhattan(subset(data,CHR == 1),highlight = c(sig_snps_after_body_size,sig_snps_but_assoc_with_body_size), main = "Chromosome 1", cex= 2.5, ylab = expression(paste("Association with Cancer Mortality (-",log[10],"[P])")),cex.lab = 1.2, cex.axis=1.5, cex.main = 1.2,suggestiveline=F,genomewideline=F)

As you can see, to achieve this, I tried to make a vector for the two sets of SNPs that I wanted coloured (sig SNPs, and sig SNPs but associated with body size). As you can see from the output attached:
Test.pdf

The command ran without error, but didn't work.
So I have two questions:

  1. Is what I'm doing possible,if the SNP is in the "sig_snps" object colour it red. If the SNP is in the "sig_snps_but_assoc_with_body_size", colour it blue. Else, colour it black.
  2. I saw that I can change the source code colours here: "If you'd like to change the color of the highlight or the suggestive/genomewide lines, you'll need to modify the source code. Search for col="blue", col="red", or col="green3" to modify the suggestive line, genomewide line, and highlight colors, respectively."

My problem is: I'm not a programmer, and also I'm not sure how I would change them to include two different colours for my above problem.

If someone could help with this code I would appreciate it!

Thanks
Eva

@stephenturner
Copy link
Owner

This is the place, and thanks for posting. I'll keep this issue open for now, but in all honesty, it won't be a feature for a while. The general approach would be to pass in two lists: one list of SNP rsid vectors, and one list of colors. The SNP vectors in the elements of the first list would be highlighted by the list of colors in list 2. That's the general strategy if you or anyone else wants to take a shot at implementing.

@evad80
Copy link
Author

evad80 commented Apr 8, 2016

Thanks for the quick reply! I would love to but unfortunately it would be over my head. But I really appreciate the response; at least now I know it's not something that I'm just not doing properly with the existing code.

@evermane
Copy link

evermane commented Jun 1, 2016

Sorry if this is too late to be useful, but I wanted to do a similar thing, I think. I modified the source code so that I could highlight two sets of snps in the same manhattan plot and changed the color so that one set appeared green (highlight1) and the other (highlight2) appeared blue. The command to run this modified code is the same as before except the highlight part. For example:

manhattan(data, highlight1=set1, highlight2=set2)

Here is the modified code:

manhattan1<-function (x, chr = "CHR", bp = "BP", p = "P", snp = "SNP", col = c("gray10",
"gray60"), chrlabs = NULL, suggestiveline = -log10(1e-05),
genomewideline = -log10(5e-08), highlight1 = NULL, highlight2 = NULL, logp = TRUE,
...)
{
CHR = BP = P = index = NULL
if (!(chr %in% names(x)))
stop(paste("Column", chr, "not found!"))
if (!(bp %in% names(x)))
stop(paste("Column", bp, "not found!"))
if (!(p %in% names(x)))
stop(paste("Column", p, "not found!"))
if (!(snp %in% names(x)))
warning(paste("No SNP column found. OK unless you're trying to highlight."))
if (!is.numeric(x[[chr]]))
stop(paste(chr, "column should be numeric. Do you have 'X', 'Y', 'MT', etc? If so change to numbers and try again."))
if (!is.numeric(x[[bp]]))
stop(paste(bp, "column should be numeric."))
if (!is.numeric(x[[p]]))
stop(paste(p, "column should be numeric."))
d = data.frame(CHR = x[[chr]], BP = x[[bp]], P = x[[p]])
if (!is.null(x[[snp]]))
d = transform(d, SNP = x[[snp]])
d <- subset(d, (is.numeric(CHR) & is.numeric(BP) & is.numeric(P)))
d <- d[order(d$CHR, d$BP), ]
if (logp) {
d$logp <- -log10(d$P)
}
else {
d$logp <- d$P
}
d$pos = NA
d$index = NA
ind = 0
for (i in unique(d$CHR)) {
ind = ind + 1
d[d$CHR == i, ]$index = ind
}
nchr = length(unique(d$CHR))
if (nchr == 1) {
options(scipen = 999)
d$pos = d$BP/1e+06
ticks = floor(length(d$pos))/2 + 1
xlabel = paste("Chromosome", unique(d$CHR), "position(Mb)")
labs = ticks
}
else {
lastbase = 0
ticks = NULL
for (i in unique(d$index)) {
if (i == 1) {
d[d$index == i, ]$pos = d[d$index == i, ]$BP
}
else {
lastbase = lastbase + tail(subset(d, index ==
i - 1)$BP, 1)
d[d$index == i, ]$pos = d[d$index == i, ]$BP +
lastbase
}
ticks = c(ticks, (min(d[d$CHR == i, ]$pos) + max(d[d$CHR ==
i, ]$pos))/2 + 1)
}
xlabel = "Chromosome"
labs <- unique(d$CHR)
}
xmax = ceiling(max(d$pos) * 1.03)
xmin = floor(max(d$pos) * -0.03)
def_args <- list(xaxt = "n", bty = "n", xaxs = "i", yaxs = "i",
las = 1, pch = 20, xlim = c(xmin, xmax), ylim = c(0,
ceiling(max(d$logp))), xlab = xlabel, ylab = expression(-log10))
dotargs <- list(...)
do.call("plot", c(NA, dotargs, def_args[!names(def_args) %in%
names(dotargs)]))
if (!is.null(chrlabs)) {
if (is.character(chrlabs)) {
if (length(chrlabs) == length(labs)) {
labs <- chrlabs
}
else {
warning("You're trying to specify chromosome labels but the number of labels != number of chromosomes.")
}
}
else {
warning("If you're trying to specify chromosome labels, chrlabs must be a character vector")
}
}
if (nchr == 1) {
axis(1, ...)
}
else {
axis(1, at = ticks, labels = labs, ...)
}
col = rep(col, max(d$CHR))
if (nchr == 1) {
with(d, points(pos, logp, pch = 20, col = col[1], ...))
}
else {
icol = 1
for (i in unique(d$index)) {
with(d[d$index == unique(d$index)[i], ], points(pos,
logp, col = col[icol], pch = 20, ...))
icol = icol + 1
}
}
if (suggestiveline)
abline(h = suggestiveline, col = "blue")
if (genomewideline)
abline(h = genomewideline, col = "red")
if (!is.null(highlight1)) {
if (any(!(highlight1 %in% d$SNP)))
warning("You're trying to highlight1 SNPs that don't exist in your results.")
d.highlight1 = d[which(d$SNP %in% highlight1), ]
with(d.highlight1, points(pos, logp, col = "green3", pch = 20,
...))
}
if (!is.null(highlight2)) {
if (any(!(highlight2 %in% d$SNP)))
warning("You're trying to highlight2 SNPs that don't exist in your results.")
d.highlight2 = d[which(d$SNP %in% highlight2), ]
with(d.highlight2, points(pos, logp, col = "dodgerblue", pch = 20,
...))
}
}

@stephenturner
Copy link
Owner

Thanks @evermane!

@clagiamba
Copy link

Hi,

I am trying to solve the same issue but with many different colors, I feel like the code is almost all there but I am missing something because legend and categories don't quite match...

This is the part I am adding:

##############

Highlight2: highlight snps from a second character vector

keyword2 = "group"

if (!is.null(highlight2)) {
if (any(!(highlight2 %in% d$SNP))) warning("You're trying to highlight SNPs that don't exist in your results.")
d.highlight2=d[which(d$SNP %in% highlight2), ]
if (!is.null(x[[keyword2]])) d.highlight2=transform(d.highlight2, keyword2=x[[keyword2]])

jColors <- with(d.highlight2,
data.frame(keyword2 = levels(as.factor(d.highlight2$keyword2)),
highlight.col = rainbow(length(levels(as.factor(d.highlight2$keyword2)))), name = 'highlight2'))
d.highlight2$highlight.col = jColors[match(d.highlight2$keyword2, jColors$keyword2), "highlight.col"]
d.highlight2$highlight.col = factor(d.highlight2$highlight.col)

with(d.highlight2, points(pos, logp, col=levels(d.highlight2$highlight.col), pch=20, ...)) 

# with(d.highlight2, points(pos, logp, col=rainbow(length(levels(as.factor(d.highlight2$keyword2)))), pch=20, ...)) 
# rainbow(length(levels(as.factor(d.highlight2$keyword2))))
legend = TRUE
# if (legend)  legend("topleft", legend=levels(as.factor(d.highlight2[,"keyword2"])), text.col=seq_along(levels(as.factor(d.highlight2[,"keyword2"]))), cex=0.5,ncol=1,horiz=F)
if (legend)  {
legend(x = 'topleft', 
   legend = levels(d.highlight2$keyword2),
   col = levels(d.highlight2$highlight.col), pch = par("pch"), bty = 'n', xjust = 1)
   }

}
##############

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants