Plotting large (400 taxa) trees with VisualizeMatching #124

ammaraziz · 2024-08-20T08:44:29Z

Thanks for the fantastic R package, it's been great to work with.

One issue I am facing is that I'd like to compare 2 trees with identical taxa. When using VisualizeMatching to compare these two trees the result is not great:

Could you provide advice on how to plot large trees?

Thanks,

Ammar

The text was updated successfully, but these errors were encountered:

ms609 · 2024-08-20T08:49:53Z

These are large trees! What is your desired outcome in visualizing the matching? Presumably inspecting all 400 matched clades is going to yield limited information, or at the very least be difficult to interpret. Is there a subset of clades you are particularly interested in matching, for example?

ammaraziz · 2024-08-20T12:47:25Z

I'd like to see which internal branches are incongruent between a known good tree and another tree. I am following this example here: https://ms609.github.io/TreeDist/articles/using-distances.html#testing-similarity-to-a-known-tree

For big trees there are a few things that would be great to control:

Remove the border around the node number
Control over font size
Control over taxa labels (on/off)
Control over showing high matching score

I hacked together a solution that gets about 80% of the way there by modifying VisualizeMatching. However, I got stuck around the last point to declutter the tree.

Here's the modified function:

function(Func, tree1, tree2, setPar = TRUE, precision = 3L, 
                 Plot = plot.phylo, matchZeros = TRUE, plainEdges = FALSE, 
                 edge.width = 1, edge.color = "black", cex = cex, 
                 show.edgelabels = TRUE, ...) 
{
    splits1 <- as.Splits(tree1)
    edge1 <- tree1$edge
    child1 <- edge1[, 2]
    nTip <- attr(splits1, "nTip")
    splitEdges1 <- vapply(as.integer(rownames(splits1)), function(node) which(child1 == 
                                                                                  node), integer(1))
    splits2 <- as.Splits(tree2, tipLabels = tree1)
    edge2 <- tree2$edge
    child2 <- edge2[, 2]
    splitEdges2 <- vapply(as.integer(rownames(splits2)), function(node) which(child2 == 
                                                                                  node), integer(1))
    matching <- Func(tree1, tree2, reportMatching = TRUE)
    pairings <- attr(matching, "matching")
    scores <- attr(matching, "pairScores")
    pairScores <- signif(mapply(function(i, j) scores[i, j], 
                                seq_along(pairings), pairings), precision)
    adjNo <- c(0.5, -0.2)
    adjVal <- c(0.5, 1.1)
    faint <- "#aaaaaa"
    if (setPar) {
        origPar <- par(mfrow = c(1, 2), mar = rep(0.5, 4))
        on.exit(par(origPar))
    }
    LabelUnpaired <- function(splitEdges, unpaired) {
        if (any(unpaired)) {
            edgelabels(text = expression("-"), edge = splitEdges[unpaired], 
                       frame = "n", col = faint, adj = adjNo)
            edgelabels(text = "0", edge = splitEdges[unpaired], 
                       frame = "n", col = faint, cex = cex, adj = adjVal)
        }
    }
    if (plainEdges) {
        Plot(tree1, edge.width = edge.width, edge.color = edge.color, 
             ...)
    }
    else {
        Normalize <- function(scores, na.rm = FALSE) {
            if (length(scores) == 0) 
                return(scores)
            if (na.rm) {
                scores <- scores[!is.na(scores)]
            }
            else {
                scores[is.na(scores)] <- 0
            }
            if (any(scores < 0)) {
                stop("Negative scores not supported")
            }
            if (max(scores) == 0) 
                return(scores)
            if (min(scores) == max(scores)) 
                return(rep(1L, length(scores)))
            scores/max(scores)
        }
        OtherRootEdge <- function(splitNodes, edge) {
            parent <- edge[, 1]
            child <- edge[, 2]
            rootEdges <- which(parent == min(parent))
            rootChildren <- child[rootEdges]
            treeIsRooted <- length(rootChildren) < 3
            if (treeIsRooted) {
                splitEdges <- vapply(splitNodes, match, table = child, 
                                     0)
                got <- rootChildren %in% splitNodes
                if (any(got)) {
                    if (sum(got) != 1) {
                        warning("Unexpected polytomy")
                    }
                    c(score = as.integer(which(splitNodes %in% 
                                                   rootChildren[got])), edge = rootEdges[!got])
                }
                else {
                    c(score = NA_integer_, edge = NA_integer_)
                }
            }
            else {
                c(score = NA_integer_, edge = NA_integer_)
            }
        }
        edgeColPalette <- sequential_hcl(n = 256L, palette = "Viridis")
        EdgyPlot <- function(tree, splits, edge, splitEdges, 
                             normalizedScores, ...) {
            splitNodes <- as.integer(names(splits))
            ore <- OtherRootEdge(splitNodes, edge)
            if (length(normalizedScores) && !is.na(ore[1])) {
                ns <- c(normalizedScores, normalizedScores[ore["score"]])
                se <- c(splitEdges, ore[2])
            }
            else {
                ns <- normalizedScores
                se <- splitEdges
            }
            edge.width <- rep(1, nrow(edge))
            edge.width[se] <- 1 + (10 * ns)
            edge.color <- rep("black", nrow(edge))
            edge.color[se] <- edgeColPalette[1 + ceiling(255 * 
                                                             ns)]
            Plot(tree, edge.width = edge.width, edge.color = edge.color, 
                 ...)
        }
        EdgyPlot(tree1, splits1, edge1, splitEdges1, Normalize(pairScores), 
                 ...)
    }
    paired1 <- if (matchZeros) {
        !is.na(pairings)
    }
    else {
        !is.na(pairings) & pairScores > 0
    }
    palette <- qualitative_hcl(sum(paired1), c = 42, l = 88)
    pairedPairScores <- pairScores[paired1]
    pairLabels <- seq_len(sum(paired1))
    
    if (any(pairLabels)) {
        if (show.edgelabels){
            edgelabels(text = pairLabels, edge = splitEdges1[paired1], 
                       frame = "n", bg = palette, adj = adjNo, cex = cex)
            edgelabels(text = pairedPairScores, edge = splitEdges1[paired1], 
                       frame = "n", adj = adjVal, cex = cex, col = ifelse(pairedPairScores, 
                                                                          "black", faint))
        }
    }

    LabelUnpaired(splitEdges1, !paired1)
    paired2 <- seq_along(splitEdges2) %in% pairings[paired1]
    pairNames2 <- pairings[paired1]
    if (plainEdges) {
        Plot(tree2, edge.width = edge.width, edge.color = edge.color, cex=cex,
             ...)
    }
    else {
        EdgyPlot(tree2, splits2[[pairNames2]], edge2, splitEdges2[pairNames2], 
                 Normalize(pairedPairScores, na.rm = TRUE), ...)
    }
    if (any(pairLabels)) {
        if (show.edgelabels){
            edgelabels(text = pairLabels, edge = splitEdges2[pairNames2], 
                       frame = 'n', bg = palette, adj = adjNo, cex = cex)
            edgelabels(text = pairedPairScores, edge = splitEdges2[pairNames2], 
                       frame = "n", adj = adjVal, cex = cex, col = ifelse(pairedPairScores, 
                                                                          "black", faint))
        }
    }
    LabelUnpaired(splitEdges2, !paired2)
    invisible()
}

I can submit a pull request with the above changes (with some modification to be more customisable) if you think it's acceptable.

ammaraziz · 2024-08-20T12:49:40Z

Alternatively, is there a way to find which internal branches differ (have low score) between two trees and which taxa are affected?

ms609 · 2024-08-22T14:31:58Z

#125 follows up on your suggestions by:

Implementing edge.cex and value.cex parameters within VisualizeMatching(), which support a FALSE value to suppress these labels entirely;
Invisibly returning the matching in VisualizeMatching()
Adding the scores of individually matched splits to the attributes of the returned matching object.

Thus you can review the contributions of individual matched splits with

matching <- VisualizeMatching(SharedPhylogeneticInfo, tree1, tree2)
attr(matching, "matchedScores")

I hope this helps you to make the best use of the function!
Thanks for your suggestions.

ammaraziz · 2024-08-27T08:53:51Z

Amazing thank you. Once it's merged I'll give it a go.

Feel free to close this issue.

ms609 mentioned this issue Aug 22, 2024

Control over VisualizeMatching() #125

Merged

ms609 closed this as completed in #125 Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plotting large (400 taxa) trees with VisualizeMatching #124

Plotting large (400 taxa) trees with VisualizeMatching #124

ammaraziz commented Aug 20, 2024

ms609 commented Aug 20, 2024

ammaraziz commented Aug 20, 2024 •

edited

Loading

ammaraziz commented Aug 20, 2024

ms609 commented Aug 22, 2024

ammaraziz commented Aug 27, 2024

Plotting large (400 taxa) trees with VisualizeMatching #124

Plotting large (400 taxa) trees with VisualizeMatching #124

Comments

ammaraziz commented Aug 20, 2024

ms609 commented Aug 20, 2024

ammaraziz commented Aug 20, 2024 • edited Loading

ammaraziz commented Aug 20, 2024

ms609 commented Aug 22, 2024

ammaraziz commented Aug 27, 2024

ammaraziz commented Aug 20, 2024 •

edited

Loading