Skip to content

Commit

Permalink
Working towards a knittable version of the vignette
Browse files Browse the repository at this point in the history
More changes to ensure that the vignette steps run and the Rmd file knits
on all computers. Commented out some code referencing local repo directories
and modified a few functions to help with export and dependencies. Also
updated species names in two files to make them compatible with the trees.
Remaining issue:
-plotTreeHighlightBranches prevents knitting. The function is set to export
in plottingFuncs and not within the if(F) loop. Still troubleshooting.
  • Loading branch information
Wynn Meyer committed Aug 9, 2018
1 parent 808ed21 commit 0f9dedc
Show file tree
Hide file tree
Showing 6 changed files with 39 additions and 67 deletions.
Binary file modified .DS_Store
Binary file not shown.
6 changes: 5 additions & 1 deletion R/RERfuncs.R
Expand Up @@ -216,7 +216,11 @@ naresidCPP=function(data, mod, weights=NULL){
out
}

#' @keywords internal
#' Provides names for paths/RERs representing terminal branches for plotting
#' Originally an internal function but necessary for the vignette/walk-through
#' @param masterTree The master tree used for analysis
#' @return Names corresponding to the paths/RERs for terminal branches
#' @export
namePathsWSpecies=function(masterTree){
mat=transformMat(masterTree)
n=length(masterTree$tip.label)
Expand Down
2 changes: 1 addition & 1 deletion R/plottingFuncs.R
@@ -1,6 +1,6 @@
#comment everything out

#require(dplyr)
require(dplyr)
require(ggplot2)

if(F){
Expand Down
Binary file modified data/logAdultWeightcm.RData
Binary file not shown.
1 change: 1 addition & 0 deletions inst/extdata/MarineTreeBinCommonNames_noCGM.txt
@@ -0,0 +1 @@
(((((((((((((Aardvark:0,(Tenrec:0,Elephant_shrew:0):0):0,(Elephant:0,Manatee:1):0):0,Armadillo:0):0,((Opossum:0,(Tasmanian_devil:0,Wallaby:0):0):0,Platypus:0):0):0,((((((Alpaca:0,Bactrian_camel:0):0,((((Tibetan_antelope:0,(Goat:0,Sheep:0):0):0,Cow:0):0,(Dolphin:1,Killer_whale:1):1):0,Pig:0):0):0,(Horse:0,Rhinoceros:0):0):0,((Brown_bat:0,(Microbat:0,Myotis_bat:0):0):0,(Flying_fox:0,Megabat:0):0):0):0,(Cat:0,(Dog:0,((Ferret:0,(Seal:1,Walrus:1):0):0,Panda:0):0):0):0):0,((Hedgehog:0,Shrew:0):0,Star_nosed_mole:0):0):0):0,(((((((Brush_tailed_rat:0,Chinchilla:0):0,Guinea_pig:0):0,Naked_mole_rat:0):0,Squirrel:0):0,((((Chinese_hamster:0,Golden_hamster:0):0,Vole:0):0,(Mouse:0,Rat:0):0):0,Jerboa:0):0):0,(Pika:0,Rabbit:0):0):0,Chinese_tree_shrew:0):0):0,Bushbaby:0):0,(Marmoset:0,Squirrel_monkey:0):0):0,((Baboon:0,(Crab_eating_macaque:0,Rhesus_macaque:0):0):0,Green_monkey:0):0):0,Gibbon:0):0,Orangutan:0):0,Gorilla:0):0,Chimp:0,Human:0);
97 changes: 32 additions & 65 deletions vignettes/RERconvergeAnalysisFullWalkthrough.Rmd
Expand Up @@ -110,20 +110,23 @@ Here is the basic method, with the recommended settings:
#if(!exists('AdultWeightLog')) {
# load(paste(rerpath,'/data/logAdultWeightcm.RData',sep='')) #The variable saved in this .RData file is 'AdultWeightLog.' #Check to verify that this loads as 'AdultWeightLog' and not as 'logAdultWeightcm'.
#}
data(RERconverge::logAdultWeightcm)
#data(RERconverge::logAdultWeightcm)
data("logAdultWeightcm") #this loads whereas the previous does not
mamRERw=RERconverge::getAllResiduals(toyTrees,useSpecies=names(logAdultWeightcm),
transform = "sqrt", weighted = T, scale = T)
```

The first output of this function tells you that the cutoff is set to 3.2e-05. This means that any branches shorter than this will be excluded from the analysis. It then prints out i= 1 ... 200, showing the progress as it estimates correlations for sets of gene trees.

The plots generated by this function show the heteroscedasticity in the original data (on the left) and the data after transformation and weighted regression (on the right). The x-axis displays bins of branch lengths on the tree, and the y-axis is the (log-scaled) variance in these branch lengths across trees. As you can see by comparing the right plot to the left plot, transforming and performing a weighted regression reduces the relationship between the mean branch length (x-axis) and the variance in branch length (y-axis).

Now that we have RERs, we can visualize these for any given gene using the `plotRers` function. Here is an example.

```{r fig1, fig.height=9,fig.width=10, warning = FALSE, cache = TRUE, tidy = FALSE}
if (!exists('plotTreeHighlightBranches')) {
repopath = '~/repos/RERconverge' #modify for other machines
suppressMessages(source(paste(repopath,'/R/plottingFuncs.R',sep='')))
}
#if (!exists('plotTreeHighlightBranches')) {
# repopath = '~/repos/RERconverge' #modify for other machines
# suppressMessages(source(paste(repopath,'/R/plottingFuncs.R',sep='')))
#}
#make average and gene tree plots
noneutherians <- c("Platypus","Wallaby","Tasmanian_devil","Opossum")
par(mfrow=c(1,2))
Expand Down Expand Up @@ -153,7 +156,7 @@ Now we will associate variation in these RERs with variation in a **binary trait
1) Provide a binary trait tree file. This should be a file in Newick format with branch lengths zero for background branches and one for foreground branches. An example is provided in *extdata/MarineTreeBinCommonNames.txt*. This tree must have the same topology as the master tree or a subset of the master tree.

```{r fig3, fig.height=9,fig.width=10, warning = FALSE, cache = TRUE}
marineb=read.tree(paste(rerpath,"/extdata/MarineTreeBinCommonNames.txt",sep=""))
marineb=read.tree(paste(rerpath,"/extdata/MarineTreeBinCommonNames_noCGM.txt",sep=""))
marinebrooted = root(marineb,outgroup=noneutherians)
mb1 = marineb
mb1$edge.length = c(rep(1,length(mb1$edge.length)))
Expand Down Expand Up @@ -225,15 +228,15 @@ Note that, in this tree, the branch representing the ancestor of the killer whal
<!-- Determine whether to include the following. It is based on the implementation
of the interactive tool to select foreground branches using RShiny. -->

3) Use the interactive branch selection tool (not currently a part of the main RERconverge package; email the developers if you wish to use it). The following should open a plot of the master tree. When the GUI opens, select the marine foreground branches (Walrus, Seal, Killer whale, Dolphin, Killer whale-Dolphin ancestor, and Manatee), and click 'End selection.'
3) Use the interactive branch selection tool (not currently a part of the main RERconverge package; email the developers if you wish to use it). *If you have it installed*, the following should open a plot of the master tree. When the GUI opens, select the marine foreground branches (Walrus, Seal, Killer whale, Dolphin, Killer whale-Dolphin ancestor, and Manatee), and click 'End selection.'

```{r, eval = FALSE}
marineb3=selectforegroundbranches(toyTrees$masterTree)
```

<!-- If including the interactive selection tool.... -->

<!-- You can double check that the tree you created by manual selection (3) matches the binary tree you provided in (1) (it should) using the following code: -->
You can double check that the tree you created by manual selection (3) matches the binary tree you provided in (1) (it should) using the following code:

```{r, eval = FALSE}
marineb3=selectforegroundbranches(toyTrees$masterTree)
Expand Down Expand Up @@ -316,22 +319,34 @@ head(logAdultWeightcm)
```


We must convert my trait vector to paths comparable to the paths in the RER matrix. To do that, I can use the function 'char2Paths' as shown here:
We must convert the trait vector to paths comparable to the paths in the RER matrix. To do that, we can use the function 'char2Paths' as shown here:

```{r}
if (!exists('char2Paths')) {
repopath = '~/repos/RERconverge' #modify for other machines
suppressMessages(source(paste(repopath,'/R/RERfuncs.R',sep='')))
}
#if (!exists('char2Paths')) {
# repopath = '~/repos/RERconverge' #modify for other machines
# suppressMessages(source(paste(repopath,'/R/RERfuncs.R',sep='')))
#}
charpaths=char2Paths(logAdultWeightcm, toyTrees)
```

I am using "metric diff", which means that branch lengths are assigned to my trait tree based on the difference in trait values on the nodes connected to that branch. The "char2Paths" function creates a paths vector with length equal to the number of columns in the RER matrix. The phylogenetic relationships represented in the "char2Paths" output are the same as those represented in the RER matrix.
We are using `metric diff`, which means that branch lengths are assigned to the trait tree based on the difference in trait values on the nodes connected to that branch.

The function tells us that there is one species in the master tree that is not present in the trait tree: the cape golden mole.

The `char2Paths` function creates a paths vector with length equal to the number of columns in the RER matrix. The phylogenetic relationships represented in the "char2Paths" output are the same as those represented in the RER matrix.

Finally, we can perform our ultimate analysis to find correlations between the rate of evolution of a genomic element (encoded in the RER matrix) and the rate of change of a phenotype (encoded in charpaths). The final output is the list of input genes with relevant statistics. As input, I provide the RER matrix and trait path. I set method="p" to use a Pearson correlation, which is the most appropriate option for continuous traits. I set min.pos=0 (min.pos for "minimum positive") so no foreground species are required; correlation statistics will be calculated even if all species have negative RER values. Finally, winsorize=3 pulls the outermost three points in all four directions to the center of the data before calculating correlations to mitigate the effects of outlier points.
Finally, we can perform our ultimate analysis to find correlations between the rate of evolution of a genomic element (encoded in the RER matrix) and the rate of change of a phenotype (encoded in charpaths) using `correlateWithContinuousPhenotype`. The final output is the list of input genes with relevant statistics. As input, we provide the RER matrix and trait path.
<!-- I set method="p" to use a Pearson correlation, which is the most appropriate option for continuous traits. I set min.pos=0 (min.pos for "minimum positive") so no foreground species are required; correlation statistics will be calculated even if all species have negative RER values. -->

<!-- getAllCor is now mostly an internal function called by the wrapper scripts correlateWithBinaryPhenotype and correlateWithContinuousPhenotype. The user has to make fewer choices about the parameters, since they are set to the defaults. -->
This function uses the following input variables (all the options set here are also the defaults):

* `min.sp`: the minimum number of species in the gene tree for that gene to be included in the analysis. The default is 10, but you may wish to modify it depending upon the number of species in your master tree.
* `winsorize`: pulls the outermost three (or whatever number you set) points in all four directions to the center of the data before calculating correlations to mitigate the effects of outlier points.

```{r}
res=RERconverge::getAllCor(mamRERw, charpaths, method = "p", min.pos = 0, winsorize = 3)
#res=RERconverge::getAllCor(mamRERw, charpaths, method = "p", min.pos = 0, winsorize = 3)
res=correlateWithContinuousPhenotype(mamRERw, charpaths, min.sp = 10, winsorize = 3)
head(res[order(res$P),])
```

Expand All @@ -351,55 +366,7 @@ y=mamRERw['TTN',]
# repopath = '~/repos/RERconverge' #modify for other machines
# suppressMessages(source(paste(repopath,'/R/RERfuncs.R',sep='')))
# }
getAncestors=function(tree, nodeN){
if(is.character(nodeN)){
nodeN=which(tree$tip.label==nodeN)
}
im=which(tree$edge[,2]==nodeN)
if(length(im)==0){
return()
}
else{
anc=tree$edge[im,1]
return(c(anc, getAncestors(tree, anc)))
}
}
transformMat=function(tree){
nA=length(tree$tip.label)+tree$Nnode
matIndex=matrix(nrow=nA, ncol=nA)
mat=matrix(nrow=nrow(tree$edge), ncol=0)
index=1
for ( i in 1:nA){
ia=getAncestors(tree,i)
if(length(ia)>0){
thisindex=double()
ia=c(i,ia)
for (k in 2:length(ia)){
j=ia[k]
thisindex=c(thisindex, which(tree$edge[,2]==ia[k-1]&tree$edge[,1]==ia[k]))
vals=rep(0, nrow(mat))
vals[thisindex]=1
mat=cbind(mat, vals)
}
}
}
mat
}
namePathsWSpecies=function(masterTree){
mat=transformMat(masterTree)
n=length(masterTree$tip.label)
#these are the tip edges in the master tree
iim=match(1:n, masterTree$edge[,2])
#each column in the mat is composed of at most one tip edge
tip.edge=apply(mat[iim,],2,function(x){if(max(x)>0){which(x==1)} else{NA}})
return(masterTree$tip[tip.edge])
}
pathnames=namePathsWSpecies(toyTrees$masterTree)
pathnames=namePathsWSpecies(toyTrees$masterTree) #what about the excluded species from useSpecies?
names(y)=pathnames
plot(x,y, cex.axis=1, cex.lab=1, cex.main=1, xlab="Weight Change", ylab="Evolutionary Rate", main="Gene ADSS Pearson Correlation",pch=19, cex=1, xlim=c(-2,2))
text(x,y, labels=names(y), pos=4)
Expand Down

0 comments on commit 0f9dedc

Please sign in to comment.