# Specific leaf area and leaf endopolyploidy

James Seery (jseery@mail.uoguelph.ca)

Load packages for phylogenetic regression (caper) and function scripts.

In [None]:
tryCatch(    # To run caper, load this package dependency (install it first if it is missing).
    library(mvtnorm),
    error = function(e) { install.packages("mvtnorm", repos = "http://cran.utstat.utoronto.ca/"); library(mvtnorm) })
tryCatch(    # ... and this one too.
    library(ape),
    error = function(e) { install.packages("ape", repos = "http://cran.utstat.utoronto.ca/"); library(ape) })
tryCatch(    # Now load caper package.
    library(caper),
    error = function(e) { install.packages("caper", repos = "http://cran.utstat.utoronto.ca/"); library(caper) })

Load phylogenies: (1) The Angiosperm Phylogeny Group (APG) tree is resolved only to family level, so each family is a polytomy. (2) The Zanne tree is resolved to species level but does not include every species.

In [2]:
setwd("../../Raw_data")

apg.tree = read.tree("APG_phylo/Webb_ages_pruned_unrooted.nwk")
apg.tree = makeLabel(apg.tree) # Give blank node labels unique names; duplicate (blank) names cause an error in comparative.data()
z.tree = read.tree("Zanne_phylo/Zanne_pruned.nwk")
z.tree = makeLabel(z.tree)

Loading required package: ape
Loading required package: MASS
Loading required package: mvtnorm


Load and compile the relevant datasets: (1) specific leaf area (SLA); (2) relative leaf water content (RWC); (3) leaf chlorophyll content (CC); (4) flow cytometry data on genome size and leaf endopolyploidy; and (5) growth form (GF). Endopolyploidy is represented in two ways: (1) leaf endoreduplication index (EI) and (2) mean leaf ploidy (MeanC).

TODO: Export this into functions held in Raw_data.

In [None]:
SLA_RWC = read.csv("SLA_RWC.csv")
CC = read.csv("CC.csv")
Flow_cytometry = read.csv("Flow_cytometry.csv")
GF = read.csv("GF.csv")
names(GF) = c("Species", "Lifespan", "Ann_per", "GH", "Wood_herb")

# Turn raw data into something sensible
SLA_RWC$SLA = SLA_RWC$LeafArea/(10000*SLA_RWC$DriedMass) # Standardize SLA to square metres per gram
SLA_RWC$RWC = (SLA_RWC$FreshMass - SLA_RWC$DriedMass)/SLA_RWC$DriedMass
Flow_cytometry$EI = rowMeans(cbind(Flow_cytometry$EI.FL2, Flow_cytometry$EI.FL3), na.rm=TRUE) # Get mean EI across the cytometer's two detectors: FL2 and FL3
Flow_cytometry$MeanC = rowMeans(cbind(Flow_cytometry$MeanC.FL2, Flow_cytometry$MeanC.FL3), na.rm=TRUE)
GF$A_HP_WP = character(length(GF$Species))
GF$A_HP_WP[GF$Wood_herb == 'H'] = "HP"
GF$A_HP_WP[GF$Ann_per == 'A'] = "A" # This command must come after the former. The former command incorrectly assigns "HP" to annual species.
GF$A_HP_WP[GF$Wood_herb == 'W'] = "WP"

# Get species-level mean of each trait
mean.SLA_RWC = aggregate(data.frame(SLA = SLA_RWC$SLA, RWC = SLA_RWC$RWC), # Use the RWC column computed above
                         by=list(Species = SLA_RWC$Species), mean, na.rm=TRUE)
mean.CC = aggregate(list(CC = CC$CC),
                    by=list(Species = CC$Species), mean, na.rm=TRUE)
mean.Flow_cytometry = aggregate(data.frame(Genome.size = Flow_cytometry$Genome.size,
                                           EI = Flow_cytometry$EI,
                                           MeanC = Flow_cytometry$MeanC,
                                           Day = Flow_cytometry$Day),
                     by=list(Species = Flow_cytometry$Species), mean, na.rm=TRUE)

# Merge into one data frame
data_partA = merge(mean.SLA_RWC,
                   list(Species = GF$Species, Wood_herb = GF$Wood_herb, A_HP_WP = GF$A_HP_WP),
                   by="Species", all.x=TRUE)
data_partB = merge(mean.Flow_cytometry,
                   mean.CC,
                   by="Species", all=TRUE)
data = merge(data_partA,
             data_partB,
             by="Species", all.x=TRUE)

write.csv(data, file="SLA+endopolyploidy.csv")
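
As a quick sanity check of the unit handling and the aggregate-then-merge pattern above, here is a minimal sketch on made-up numbers. It assumes that LeafArea is recorded in cm² and masses in grams (which is what the factor of 10000 implies); the `toy` data frames and all of their values are hypothetical, not taken from the raw data.

```r
# Toy leaf measurements (hypothetical values); LeafArea assumed to be in cm^2, masses in g
toy = data.frame(Species   = c("Sp_a", "Sp_a", "Sp_b"),
                 LeafArea  = c(50, 70, 30),
                 FreshMass = c(1.0, 1.4, 0.5),
                 DriedMass = c(0.25, 0.35, 0.10))

# Same formulas as in the cell above
toy$SLA = toy$LeafArea / (10000 * toy$DriedMass)            # m^2 per g
toy$RWC = (toy$FreshMass - toy$DriedMass) / toy$DriedMass   # water mass per unit dry mass

# Species-level means
toy.mean = aggregate(toy[, c("SLA", "RWC")],
                     by = list(Species = toy$Species), mean, na.rm = TRUE)

# Left join: all.x=TRUE keeps species without a growth-form record (filled with NA)
toy.GF = data.frame(Species = "Sp_a", Wood_herb = "H")
toy.merged = merge(toy.mean, toy.GF, by = "Species", all.x = TRUE)
print(toy.merged)
```

Because the final merge in the cell above also uses all.x=TRUE, species that have flow cytometry or chlorophyll data but no SLA measurement do not appear in the compiled table.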

Subset data for (1) only herbaceous species and (2) only endopolyploid species.

In [None]:
data.herb = subset(data, data$Wood_herb == 'H')
data.endo = subset(data, data$EI >= 0.1)
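
Note that subset() drops rows for which the condition evaluates to NA, so species lacking a growth-form record or an EI measurement are excluded from the respective subset. A small sketch with hypothetical values illustrates this:

```r
# Hypothetical mini data set; NA marks a missing record or measurement
toy = data.frame(Species   = c("Sp_a", "Sp_b", "Sp_c", "Sp_d"),
                 Wood_herb = c("H", "W", NA, "H"),
                 EI        = c(0.05, 0.40, 0.90, NA))

toy.herb = subset(toy, Wood_herb == "H")   # Sp_c has no growth form and is dropped
toy.endo = subset(toy, EI >= 0.1)          # Sp_d has no EI and is dropped

print(toy.herb$Species)
print(toy.endo$Species)
```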

Combine the data and phylogenies into an object of type comparative.data.

In [None]:
z.data = comparative.data(phy=z.tree, data=data, names.col="Species", vcv=TRUE) # vcv=TRUE will do some pre-processing for the regression by calculating the variance-covariance matrix
z.data.herb = comparative.data(phy=z.tree, data=data.herb, names.col="Species", vcv=TRUE)
z.data.endo = comparative.data(phy=z.tree, data=data.endo, names.col="Species", vcv=TRUE)
data$Species = tolower(data$Species)
apg.data = comparative.data(phy=apg.tree, data=data, names.col="Species", vcv=TRUE)
data.herb$Species = tolower(data.herb$Species)
apg.data.herb = comparative.data(phy=apg.tree, data=data.herb, names.col="Species", vcv=TRUE)
data.endo$Species = tolower(data.endo$Species)
apg.data.endo = comparative.data(phy=apg.tree, data=data.endo, names.col="Species", vcv=TRUE)
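
comparative.data() matches the names.col column against the tree's tip labels, and anything without a match is left out of the combined object, so it can be worth checking the overlap beforehand. The sketch below does this in base R. Here `tips` stands in for `apg.tree$tip.label` and is assumed to be lower case (as the tolower() calls above suggest); all names are hypothetical.

```r
# Stand-in for apg.tree$tip.label (hypothetical, assumed lower case)
tips = c("acer_rubrum", "zea_mays", "poa_annua")
# Stand-in for data$Species (hypothetical)
species = c("Acer_rubrum", "Zea_mays", "Bellis_perennis")

matched.raw   = species %in% tips            # Case-sensitive comparison
matched.lower = tolower(species) %in% tips   # Comparison after lower-casing

print(sum(matched.raw))                      # Number of matches before tolower()
print(sum(matched.lower))                    # Number of matches after tolower()
print(setdiff(tolower(species), tips))       # Species with no tip in the tree
print(setdiff(tips, tolower(species)))       # Tips with no row in the data
```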

Decide how to transform each data column.

In [None]:
attach(data)
op = par(mfrow = c(2,2)) # Display 2x2 plots

hist(SLA, breaks=12)
qqnorm(SLA)
qqline(SLA)
hist(log(SLA), breaks=12)
qqnorm(log(SLA))
qqline(log(SLA))

par(op) # Reset plot display
detach(data)
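
The same histogram and Q-Q pair is drawn for every candidate transformation. One way to cut the repetition is a small helper; this is only a sketch, and `compare_transforms` is a hypothetical name, not a function from this notebook or from any package. The usage example runs on simulated data rather than the real SLA values.

```r
# Draw a histogram and a normal Q-Q plot for each transformation of x.
# `transforms` is a named list of functions; non-finite results (e.g. log(0)) are dropped.
compare_transforms = function(x, transforms, breaks = 12) {
    op = par(mfrow = c(length(transforms), 2))   # One row of plots per transformation
    on.exit(par(op))                             # Reset plot display, even after an error
    for (nm in names(transforms)) {
        y = transforms[[nm]](x)
        y = y[is.finite(y)]
        hist(y, breaks = breaks, main = nm, xlab = nm)
        qqnorm(y, main = paste("Q-Q:", nm))
        qqline(y)
    }
    invisible(names(transforms))
}

# Usage with simulated right-skewed data
set.seed(1)
x = rlnorm(100)
compare_transforms(x, list(raw = identity, log = log))
```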

In [None]:
attach(data) # Re-attach: the previous cell detached data
hist(MeanC, breaks=12)
qqnorm(MeanC)
qqline(MeanC)
hist(log(MeanC), breaks=12)
qqnorm(log(MeanC))
qqline(log(MeanC))
detach(data)


In [None]:
attach(data) # The columns of data are referred to directly throughout this cell
hist(sqrt(MeanC), breaks=12) # worse
hist(exp(MeanC), breaks=12) # even worse
hist(1/MeanC, breaks=12) # Ughh
hist(1/sqrt(MeanC), breaks=12) # still crap
hist(MeanC^2, breaks=12) # worse than untransformed
hist(MeanC^(-2), breaks=12) # Better. crap like log transform though
hist(MeanC^(-3), breaks=12) # Similar. crap like log transform though
hist(MeanC^(-4), breaks=12) # Better. crap like log transform though
hist(MeanC^(-5), breaks=12) # Best?
qqnorm(MeanC^(-5))
qqline(MeanC^(-5)) # Nah. log is best. Maybe just use EI.

hist(EI, breaks=12)
qqnorm(EI)
hist(log(EI), breaks=12) # Great! !!!!!!!!!!!!!!!
qqnorm(log(EI))
qqline(log(EI)) # not perfect
hist(sqrt(EI), breaks=12) # worse
hist(exp(EI), breaks=12) # even worse
hist(1/EI, breaks=12) # Ughh
hist(1/sqrt(EI), breaks=12) # Good. Not as good as log
qqnorm(1/sqrt(EI))
qqline(1/sqrt(EI)) # Similar in quality to log?
hist(EI^2, breaks=12) # worse than untransformed
hist(EI^3, breaks=12) # Better. crap like log transform though
hist(EI^4, breaks=12) # Similar. crap like log transform though

hist(Genome.size, breaks=12)
qqnorm(Genome.size)
hist(log(Genome.size), breaks=12) # much better
qqnorm(log(Genome.size))
qqline(log(Genome.size)) # Not perfect but pretty good.
hist(sqrt(Genome.size), breaks=12) # Worse
hist(exp(Genome.size), breaks=12) # terrible
hist(1/Genome.size, breaks=12) # heavier tails than log
hist(1/sqrt(Genome.size), breaks=12) # neat! kinda multimodal though
qqnorm(1/sqrt(Genome.size))
qqline(1/sqrt(Genome.size)) # Even better than log. !!!!!!!!!
hist(Genome.size^2)
hist(Genome.size^3) # Terrible
hist(Genome.size^(-2))
hist(Genome.size^(-3)) # Also terrible

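# NOTE: VcVw, Gh2 and Dur2 are not columns of data as compiled above.
# Assumption: they are earlier names for RWC, Wood_herb and annual/perennial duration.
VcVw = data$RWC
Gh2 = data$Wood_herb
Dur2 = ifelse(data$A_HP_WP == "A", "A", ifelse(data$A_HP_WP %in% c("HP", "WP"), "P", NA))
hist(VcVw, breaks=12)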
qqnorm(VcVw)
hist(log(VcVw), breaks=12) # Tails are too heavy.... ok..
qqnorm(log(VcVw))
qqline(log(VcVw)) # Ok.
hist(sqrt(VcVw), breaks=12) # Worse
hist(exp(VcVw), breaks=12) # terrible
hist(1/VcVw, breaks=12) # quite bad
hist(1/sqrt(VcVw), breaks=12) # Bimodal... Try different growth forms !!!!!!!!!!!!!!!
hist(1/sqrt(VcVw[Gh2 == "H"]), breaks=12) # Unimodal
hist(1/sqrt(VcVw[Gh2 == "W"]), breaks=12) # Mostly unimodal
hist(1/sqrt(VcVw[Dur2 == "A"]), breaks=12) # Multimodal garbage
hist(1/sqrt(VcVw[Dur2 == "P"]), breaks=12) # Bimodal. Separate these into woody and herbaceous
hist(1/sqrt(VcVw[Dur2 == "P" & Gh2 == "H"]), breaks=12) # Unimodal
hist(1/sqrt(VcVw[Dur2 == "P" & Gh2 == "W"]), breaks=12) # Unimodal
# It seems that when an inverse square root transformation is applied to VcVw, there is a bimodality mostly caused by the woody/herbaceous growth habit.
# Try this with log too
hist(log(VcVw[Gh2 == "H"]), breaks=12) # Mostly unimodal but biased
hist(log(VcVw[Gh2 == "W"]), breaks=12) # Mostly unimodal but biased
# qqplots
qqnorm(1/sqrt(VcVw[Gh2 == "H"]))
qqline(1/sqrt(VcVw[Gh2 == "H"])) # Excellent
qqnorm(log(VcVw[Gh2 == "H"]))
qqline(log(VcVw[Gh2 == "H"])) # Not as good.
qqnorm(1/sqrt(VcVw[Gh2 == "W"]))
qqline(1/sqrt(VcVw[Gh2 == "W"])) # Ok. not great
qqnorm(log(VcVw[Gh2 == "W"]))
qqline(log(VcVw[Gh2 == "W"])) # Ok. not great
# Even though log works better for woody species, I'm going to choose inverse square root because it looks better for all species and herbaceous species which we have more of.
hist(VcVw^2)
hist(VcVw^3) # Terrible
hist(VcVw^(-2))
hist(VcVw^(-3)) # Also terrible

hist(Day)

plot(log(SLA) ~ log(MeanC))
plot(1/SLA ~ log(MeanC))
plot(1/sqrt(SLA) ~ log(MeanC)) # Log and fancy transform for SLA are similar really...

plot(log(SLA) ~ log(EI)) # Beauty! !!!!
plot(1/SLA ~ log(EI))
plot(1/sqrt(SLA) ~ log(EI)) # Fancy transform just as good

plot(log(SLA) ~ log(Genome.size)) # Best !!!!
plot(log(SLA) ~ I(1/sqrt(Genome.size))) # Terrible. I() protects the arithmetic on the right-hand side of the formula

plot(log(VcVw) ~ log(EI))

plot(1/sqrt(VcVw) ~ log(EI)) # not better

# Transformations shortlisted from the plots above:
# log(SLA)
# 1/SLA
# 1/sqrt(SLA)
# log(MeanC)
# log(EI)
# log(Genome.size)
# 1/sqrt(VcVw)
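
Judging transformations by eye is subjective. As a complementary check (a technique not used in this notebook), the Shapiro-Wilk statistic W from shapiro.test() can rank candidate transformations numerically, with values closer to 1 indicating a closer fit to normality. The sketch below uses simulated log-normal data, not the real traits, and the `candidates` list is only an illustration of the transformations tried above.

```r
set.seed(42)
x = rlnorm(200, meanlog = 0, sdlog = 0.8)   # Simulated right-skewed, positive trait

candidates = list(raw      = identity,
                  log      = log,
                  sqrt     = sqrt,
                  inv      = function(v) 1/v,
                  inv.sqrt = function(v) 1/sqrt(v))

# Shapiro-Wilk W for each transformation
W = sapply(candidates, function(f) unname(shapiro.test(f(x))$statistic))
print(round(sort(W, decreasing = TRUE), 3))

best = names(which.max(W))   # Transformation with the highest W
print(best)
```

W summarizes the whole distribution in a single number, so it is best read alongside the histograms and Q-Q plots rather than in place of them.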