-   [Quantifying in-host patterns](#chap:c14)
    -   [The experiments](#the-experiments)
    -   [Data](#data)
    -   [PCA of FIV data](#pca-of-fiv-data)
    -   [LDA of FIV data](#lda-of-fiv-data)
    -   [MANOVA of FIV day 59 data](#manova-of-fiv-day-59-data)
    -   [PCA of Mouse malaria](#sec:c16mal)
    -   [FDA of Mouse malaria](#fda-of-mouse-malaria)

Quantifying in-host patterns
============================

The experiments
---------------

In addition to the mouse malaria data discussed in sections \[sec:c6mal\] and \[sec:c13rep\], we consider a co-infection study of FIV in cats . The experiment showed that disease in cats caused by infection with a virulent feline immunodeficiency virus (FIV$_f$) can be attenuated by prior infection with a strain of lower pathogenicity from cougars. The data are from twenty cats that were experimentally infected with two strains of FIV: the virulent house cat (*Felis*) strain (FIV$_f$) and a mild wild cougar (*Puma*) strain (FIV$_p$). On day 0, 10 cats were infected with FIV$_p$ and 10 were sham inoculated. On day 28, five cats from each group were inoculated with FIV$_f$ and the other ten cats were again sham inoculated. This resulted in four treatment groups: C (control, only sham inoculations), P (FIV$_p$ on day 0 and sham on day 28), F (sham on day 0, FIV$_f$ on day 28) and D (dual infection, FIV$_p$ on day 0 and FIV$_f$ on day 28). A variety of cytokines and cell counts thought to relate to protective immunity were measured approximately every 7 days. Details of the experiment can be found in .

Data
----

For the FIV analysis we focus on the multivariate measures on days 31 and 59 to create two data sets, `Day31` and `Day59`, taken 3 and 31 days, respectively, after the second inoculation (FIV$_f$ infection). We strip columns 1, 14, 15 and 16, which are extraneous or were not measured on these days, remove rows with missing values (using `na.omit`), and make sure that each row is labeled with the correct animal Id (using `dimnames`).

In [None]:
data(fiv)
Day31 = fiv[fiv$Day == 31, ]
dimnames(Day31)[[1]] = Day31$Id
Day31 = na.omit(Day31[, -c(1, 14, 15, 16)])
Day59 = fiv[fiv$Day == 59, ]
dimnames(Day59)[[1]] = Day59$Id
Day59 = na.omit(Day59[, -c(1, 14, 15, 16)])
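A minimal sketch on a made-up data frame (the `toy` object and its Id, Day, CD4 and IL_10 values are hypothetical, not the FIV data) illustrates why the row names are set before `na.omit` is called: `na.omit` drops every row that contains a missing value but keeps the row names of the rows it retains, so in this toy example each surviving row still carries its animal Id after the Id column itself has been dropped.

```r
# Hypothetical long table: three cats sampled on days 31 and 59
toy <- data.frame(Id = c("c1", "c2", "c3", "c1", "c2", "c3"),
                  Day = c(31, 31, 31, 59, 59, 59),
                  CD4 = c(1.2, NA, 0.8, 1.0, 0.9, 0.7),
                  IL_10 = c(3.1, 2.7, 2.9, 2.2, 2.5, 2.9))
d31 <- toy[toy$Day == 31, ]
# Label rows by animal before any rows are removed
rownames(d31) <- d31$Id
# Keep only the measurements, then drop incomplete rows (cat c2 lacks CD4)
d31 <- na.omit(d31[, c("CD4", "IL_10")])
print(d31)  # rows c1 and c3 remain, still labeled by Id
```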

For the malaria analysis we likewise strip the extraneous columns 1, 3, 4, 7, 8, 10 and 11 and focus on the red blood cell count (RBC).

In [None]:
data(SH9)
SH9RBC = SH9[, -c(1, 3, 4, 7, 8, 10, 11)]

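In addition to the `long` format used for the repeated-measures analysis (section \[sec:c13rep\]), we need ‘wide’-formatted data (denoted by a trailing `w`, as in `SH9RBCw`) for the principal component analysis (PCA) and the related functional data analysis (FDA) we will use to study the dynamics. We make the wide-formatted data using `reshape`. Because `reshape` spreads every column other than the id and time variables across days, `Treatment` is duplicated once per day; `-seq(4,50,by=2)` strips these extraneous copies, keeping only the first. Finally, `names(SH9RBCw)[2] = "Treatment"` restores the plain name of column 2, which `reshape` had suffixed with the first day.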

In [None]:
SH9RBCw = reshape(SH9RBC, idvar = "Ind2",
     direction = "wide", timevar = "Day")
SH9RBCw = SH9RBCw[,-seq(4,50,by = 2)]
names(SH9RBCw)[2] = "Treatment"
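The column bookkeeping after `reshape` is easier to see on a tiny example. The sketch below assumes `SH9RBC` has roughly the layout `Ind2`, `Treatment`, `Day`, `RBC`, and uses a hypothetical two-mouse, three-day data set with those names. Because `Treatment` is not declared as constant, `reshape` creates one `Treatment.<day>` column per day, interleaved with the `RBC.<day>` columns.

```r
# Hypothetical long-format data: two mice, three sampling days
long <- data.frame(Ind2 = rep(c("m1", "m2"), each = 3),
                   Treatment = rep(c("AQ", "CB"), each = 3),
                   Day = rep(0:2, times = 2),
                   RBC = c(9.1, 8.0, 7.2, 9.0, 6.5, 4.1))
wide <- reshape(long, idvar = "Ind2", direction = "wide", timevar = "Day")
# Ind2, Treatment.0, RBC.0, Treatment.1, RBC.1, Treatment.2, RBC.2
print(names(wide))
ndays <- length(unique(long$Day))
# Drop the per-day copies of Treatment (columns 4, 6, ..., 2 * ndays)
wide <- wide[, -seq(4, 2 * ndays, by = 2)]
names(wide)[2] <- "Treatment"
print(wide)
```

With 25 sampling days the wide table has 2 * 25 + 1 = 51 columns, so the duplicates sit at columns 4, 6, ..., 50, consistent with the `-seq(4,50,by = 2)` above. Passing `v.names = "RBC"` to `reshape` should avoid the duplicates altogether, provided `Treatment` is constant within each mouse.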

PCA of FIV data
---------------

The FIV data contain counts of various effector cells (lymphocytes, neutrophils, CD4, CD8B and CD25), virus load (provirus and overall viremia) and measurements of a number of cytokines (IFN$\gamma$, IL-4, IL-10, IL-12, TNF-$\alpha$). The goal of the experiment was to elucidate which immunological conditions best distinguish severe from attenuated infections. The `ade4` package has refined statistical and graphical methods for exploring multivariate patterns. Following the ‘French protocol’ , as implemented in `ade4`, biplot-like decompositions are referred to as ‘duality diagrams’ (reflecting the dual representation of variables as arrows and observations as points); hence the name `dudi.pca` for principal component analysis. We use the `dudi.pca` function to elaborate on the biplot. By providing an explicit ‘group’ annotation we can add group means as well as group ellipses (which reflect within-group variability) to the biplot using the `s.class` function. The `add.scatter.eig` function adds a bar plot of the eigenvalues to the bottom-right corner, showing the relative importance of each PCA axis (fig. \[fig:pca31\]).

In [None]:
require(ade4)
#select 5 axes
pca31=dudi.pca(Day31[,1:11], scannf = FALSE, nf = 5)
groups = Day31$Treatment
s.arrow(dfxy = pca31$co[,1:2]*8, ylim = c(-7,9), 
    sub = "Day 31", possub = "topleft", csub = 2)
s.class(dfxy = pca31$li[,1:2], fac = groups, cellipse = 2, 
    axesell = FALSE, cstar = 0 , col = c(2:5), add.plot = TRUE)
add.scatter.eig(pca31$eig, xax = 1, yax = 2, 
    posi = "bottomright")
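The eigenvalue bars drawn by `add.scatter.eig` are the variances of the principal axes. Because `dudi.pca` standardizes every variable by default (`scale = TRUE`), they should equal the eigenvalues of the correlation matrix, which sum to the number of variables; each bar's share of the total is then that axis's share of the standardized variability. The base-R sketch below checks this on simulated data, using `prcomp(..., scale. = TRUE)` as a stand-in for `dudi.pca`; the data and dimensions are made up.

```r
# Simulated stand-in for the immunological variables (hypothetical data)
set.seed(1)
n <- 20
p <- 5
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- x[, 1] + rnorm(n, sd = 0.3)  # build in one strong correlation
x[, 3] <- 100 * x[, 3]                 # very different units; scaling removes this
# Correlation-matrix PCA two ways
eig <- eigen(cor(x), symmetric = TRUE)$values
pc <- prcomp(x, scale. = TRUE)
# Relative importance of each axis (what the eigenvalue bars convey)
prop <- eig / sum(eig)
print(round(prop, 3))
```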

By day 59, patterns are starting to resolve: the treatment groups begin to separate, with FIV$_f$-infected cats having the lowest white blood cell counts (fig. \[fig:pca59\]).

In [None]:
pca59 = dudi.pca(Day59[,1:11], scannf = FALSE, nf = 5)
groups = Day59$Treatment
s.arrow(dfxy = pca59$co[,1:2]*8, ylim = c(-7,9), 
    sub = "Day 59",  possub = "topleft", csub = 2)
s.class(dfxy = pca59$li[,1:2], fac = groups, cellipse = 2, 
    axesell = FALSE, cstar = 0 , col = c(2:5), add.plot = TRUE)
add.scatter.eig(pca59$eig, xax = 1, yax = 2, 
    posi = "bottomright")

LDA of FIV data
---------------

In contrast to PCA, which broadly explores the overall variability in the multivariate data, discriminant analysis explicitly considers ‘group membership’ (such as experimental treatment or other types of grouping) and asks which linear combinations of the response variables (akin to the components in PCA) best discriminate among groups. The `MASS` package provides the `lda` function for such analyses. Since the variables are measured on very different scales, we standardize each one (to zero mean and unit variance) prior to the analysis by applying the `scale` function to each of the first eleven columns of the data set.

In [None]:
require(MASS)
Day31sc = Day31
Day31sc[, 1:11] = apply(Day31[, 1:11], 2, scale)
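Standardizing mainly aids interpretation: the group assignments from linear discriminant analysis are unchanged by rescaling the predictors, but the coefficients in `$scaling` become comparable across variables measured in different units. The sketch below uses R's built-in `iris` data as a stand-in for the FIV measurements (an assumption for illustration only) to check both points.

```r
library(MASS)
# Standardize the numeric columns, mirroring Day31sc above
iris_sc <- iris
iris_sc[, 1:4] <- apply(iris[, 1:4], 2, scale)
print(round(colMeans(iris_sc[, 1:4]), 10))       # all 0
print(round(apply(iris_sc[, 1:4], 2, sd), 10))   # all 1
# Fit LDA to raw and to standardized predictors
fit_raw <- lda(Species ~ ., data = iris)
fit_sc <- lda(Species ~ ., data = iris_sc)
# Share of observations assigned to the same group by both fits
agree <- mean(predict(fit_raw)$class == predict(fit_sc)$class)
print(agree)
```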

The `lda` function takes a formula of the form `group ~ predictors` as its first argument. The LDA plot depicts the discrimination among the groups along the discriminant axes (fig. \[fig:lda31\]).

In [None]:
lda31 = lda(Treatment ~ CD4 + CD8B + CD25 + FAS + 
     IFNg + IL_10 + IL_12 + lymphocyte + neutrophils +
     TNF_a, data = Day31sc)
plot(lda31)

Figure \[fig:lda31\] shows how discriminant axis 1 clearly discriminates between the Dual (D) / Cougar (P) and the Control (C) / Feline (F) groups. Axis 2 separates the Dual (D) group from the Cougar (P) group. Axis 3 provides imperfect separation between the Control (C) group and the Feline (F) group. We can further check how the predicted LDA group assignments compare to the true treatment groupings:

In [None]:
pr = predict(lda31, method = "plug-in")$class
table(pr, Day31sc$Treatment)

For the most part the discrimination is good, but as figure \[fig:lda31\] suggests, there is some difficulty in discriminating between the C and F groups on day 31: there is one misclassification among the groups.
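The table above compares predictions with the very cats used to fit the model, which tends to be optimistic, particularly with at most five cats per group and ten predictors. `lda` can instead return leave-one-out cross-validated assignments via `CV = TRUE`; for the FIV data the analogous step would be to add `CV = TRUE` to the `lda(Treatment ~ ..., data = Day31sc)` call. The sketch below, using `iris` as a stand-in rather than the FIV data, contrasts the two error rates.

```r
library(MASS)
fit <- lda(Species ~ ., data = iris)
# Resubstitution: predict the same observations used for fitting
resub <- predict(fit, method = "plug-in")$class
# Leave-one-out: each observation is classified by a fit without it
loo <- lda(Species ~ ., data = iris, CV = TRUE)$class
print(table(predicted = resub, true = iris$Species))
print(table(predicted = loo, true = iris$Species))
print(c(resubstitution = mean(resub != iris$Species),
        leave_one_out = mean(loo != iris$Species)))
```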

To see how the group-informed LDA ordination differs from the PCA, we can represent the LDA analysis as a biplot (fig. \[fig:lda31b\]). The first two statements in the code below compute the coordinates of each cat along the first two LDA axes so that they can be plotted with the `ade4` functions. The discrimination is largely along LDA axis one.

In [None]:
ld1 = as.matrix(Day31sc[,attr(lda31$terms,
    "term.labels")])%*%matrix(lda31$scaling[,1], ncol = 1)
ld2 = as.matrix(Day31sc[,attr(lda31$terms,
    "term.labels")])%*%matrix(lda31$scaling[,2], ncol = 1)
groups = Day31$Treatment

#proportion of between-group variance per axis (squared singular values)
contribs = lda31$svd^2/sum(lda31$svd^2)
s.arrow(dfxy = lda31$scaling[,1:2], sub = "Day 31", 
     possub = "topleft", csub = 2)
s.class(dfxy = cbind(ld1, ld2)*2.5, fac = groups, 
    cellipse = 2,  axesell = FALSE, cstar = 0, 
    col = c(2:5), add.plot = TRUE)
add.scatter.eig(contribs, xax = 1, yax = 2, 
    posi = "bottomright")
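The hand-computed `ld1` and `ld2` multiply the data by the columns of `lda31$scaling`. `predict` for `lda` fits returns similar scores in `$x` but first centres the data on the prior-weighted group means, so the two versions should differ only by a constant shift along each axis; that moves the origin of the biplot, not the relative positions of the cats. Also, `print` for `lda` fits reports the ‘proportion of trace’ as `svd^2/sum(svd^2)`, the squared singular values being between-to-within-group variance ratios. A check on `iris` (a stand-in, not the FIV data):

```r
library(MASS)
fit <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = iris)
vars <- attr(fit$terms, "term.labels")
X <- as.matrix(iris[, vars])
manual <- X %*% fit$scaling[, 1:2]   # same construction as ld1, ld2 above
scores <- predict(fit)$x[, 1:2]      # predict() centres before projecting
shift <- manual - scores
# Range of the shift within each axis: ~0, i.e. a constant offset per axis
print(apply(shift, 2, function(d) diff(range(d))))
# Proportion of between-group variance carried by each discriminant axis
trace_prop <- fit$svd^2 / sum(fit$svd^2)
print(trace_prop)
```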

We repeat the analysis for the data from day 59 and find that discrimination among all four groups is very good by this time (fig. \[fig:lda59\]). Linear discriminant 1 (LD1) separates treatment C from D/F and P, and LD2 separates F from the other treatments.

In [None]:
Day59sc = Day59
Day59sc[,1:11] = apply(Day59[,1:11],2,scale)
lda59  =  lda(Treatment ~ CD4 + CD8B + CD25 + FAS + 
   IFNg + IL_10 + IL_12 + lymphocyte + neutrophils + 
   TNF_a, data = Day59sc)
pr = predict(lda59, method = "plug-in")$class
table(pr, Day59sc$Treatment)

ld1 = as.matrix(Day59sc[,attr(lda59$terms,
   "term.labels" )])%*%matrix(lda59$scaling[,1], ncol = 1)
ld2 = as.matrix(Day59sc[,attr(lda59$terms,
   "term.labels" )])%*%matrix(lda59$scaling[,2], ncol = 1)
groups = Day59$Treatment

#proportion of between-group variance per axis (squared singular values)
contribs  =  lda59$svd^2/sum(lda59$svd^2)
s.arrow(dfxy = lda59$scaling[,1:2], sub = "Day 59", 
   possub = "topleft", csub = 2)
s.class(dfxy = cbind(ld1, ld2), fac = groups, cellipse = 2, 
   axesell = FALSE, cstar = 0 , col = c(2:5), add.plot = TRUE)
add.scatter.eig(contribs, xax = 1, yax = 2, 
   posi = "bottomright")

The severe disease (treatment `F`) is associated with reduction in counts of several cell types and modulation of the expression of various cytokines.

MANOVA of FIV day 59 data
-------------------------

In addition to the exploratory analysis provided by PCA and LDA, we may also want to perform a formal multivariate test of differences among our treatment groups. The most traditional approach is multivariate analysis of variance (MANOVA). The `summary` method for `manova` fits offers several test statistics (`"Pillai"`, `"Wilks"`, `"Hotelling-Lawley"` and `"Roy"`); the Hotelling-Lawley trace generalizes Hotelling's $T^2$, the multivariate version of the t-test. According to the R help pages, the Pillai-Bartlett statistic is recommended by and is the default. The test rests on several assumptions (including multivariate normality and equal covariance matrices across groups).

In [None]:
options(width=50)
Y = cbind(Day59sc$CD4, Day59sc$CD8B, Day59sc$CD25, 
   Day59sc$FAS, Day59sc$IFNg, Day59sc$IL_10, 
   Day59sc$IL_12, Day59sc$lymphocyte, 
   Day59sc$neutrophils, Day59sc$TNF_a)
X = Day59$Treatment
mova59 = manova(Y~X)
summary(mova59, test = "Pillai")
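Pillai's trace has a compact matrix form. With $H$ the between-group (hypothesis) sums-of-squares-and-products matrix and $E$ the residual one, $V = \mathrm{tr}\{H(H+E)^{-1}\} = \sum_i \lambda_i/(1+\lambda_i)$, where the $\lambda_i$ are the eigenvalues of $E^{-1}H$. The sketch below recomputes it from the `SS` component returned by `summary` for a `manova` fit, using `iris` rather than the FIV data as an illustrative stand-in.

```r
fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
              data = iris)
s <- summary(fit, test = "Pillai")
H <- s$SS$Species    # between-group SSCP matrix
E <- s$SS$Residuals  # within-group (residual) SSCP matrix
pillai_by_hand <- sum(diag(H %*% solve(H + E)))
print(c(by_hand = pillai_by_hand, summary = s$stats["Species", "Pillai"]))
```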

PCA of Mouse malaria
--------------------

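A preliminary PCA of the RBC time series reveals that the fate of the animals completely dominates the patterns, since the RBC counts of animals that died were scored as 0 after death (fig. \[fig:pcarbc\]). Because all 25 variables are RBC counts in the same units, we use an unscaled, covariance-based PCA (`scale = FALSE`).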

In [None]:
require(ade4)
dead = ifelse(SH9RBCw[,27]==0, "dead", "alive")
pcaRBC = dudi.pca(SH9RBCw[,3:27], scale = FALSE, 
   scannf = FALSE, nf = 5)
s.arrow(dfxy = pcaRBC$co[,1:2]*3, xlim = c(-10, 10), 
   ylim = c(-5,5), sub = "RBC", possub = "topleft", csub = 2)
s.class(dfxy = pcaRBC$li[,1:2]*.3, fac = as.factor(dead), 
   cellipse = 2, axesell = FALSE, cstar = 0 , 
   col = c(2:7), add.plot = TRUE)
add.scatter.eig(pcaRBC$eig, xax = 1, yax = 2, 
   posi = "bottomright")

We therefore omit the 11 animals that died and redo the analysis (but note that these were non-random with respect to treatment; the dead were 7 CB, 2 AT, 1 BC, 0 AQ and 0 control) [1].

In [None]:
SH9RBCw2 = SH9RBCw[dead=="alive",]
groups = SH9RBCw2$Treatment
pcaRBC = dudi.pca(SH9RBCw2[,3:27], scale = FALSE, scannf  =  
   FALSE, nf  =  5)
s.arrow(dfxy = pcaRBC$co[,1:2]*3, xlim = c(-4,9), 
   ylim = c(-5,5), sub = "RBC", possub = "topleft", csub = 2)
s.class(dfxy = pcaRBC$li[,1:2]*.3, fac = groups, cellipse = 2, 
   axesell = FALSE, cstar = 0 , col = c(2:7), add.plot = TRUE)
add.scatter.eig(pcaRBC$eig, xax = 1, yax = 2, 
   posi = "bottomright")

All the arrows are pointing in the same direction for axis one (fig. \[fig:pcarbc2\]). This axis is therefore broadly a ‘means’ effect, meaning that individuals with more positive axis one scores tend overall to have more RBCs. Clearly the main driver of this variation is control versus treatment animals. There is further some level of separation among the treatment animals along axis two, with BC generally having negative values.
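Why same-signed loadings imply a ‘means’ effect can be checked on simulated series: if every day loads on axis one with the same sign, the axis-one score is essentially a weighted sum of an individual's counts across all days. The sketch below simulates hypothetical RBC-like trajectories in which each mouse differs mainly in its overall level, runs an unscaled PCA with `prcomp(..., scale. = FALSE)` as a stand-in for `dudi.pca(..., scale = FALSE)`, and compares the axis-one scores with each mouse's mean count.

```r
# Hypothetical series: each mouse has its own overall level plus daily noise
set.seed(42)
n_mice <- 40
n_days <- 25
level <- rnorm(n_mice, mean = 0, sd = 3)
x <- matrix(level, n_mice, n_days) +
  matrix(rnorm(n_mice * n_days, sd = 0.5), n_mice, n_days)
pc <- prcomp(x, scale. = FALSE)  # unscaled PCA, as for the RBC data
load1 <- pc$rotation[, 1]
same_sign <- all(sign(load1) == sign(load1[1]))
r <- cor(pc$x[, 1], rowMeans(x))
print(same_sign)  # every day loads with the same sign
print(r)          # close to +1 or -1 (the sign of a PC is arbitrary)
```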

FDA of Mouse malaria
--------------------

We can get deeper insight into the differences revealed by the PCA by recognizing that the mouse data are ‘functional’ in nature. That is, we can think of each RBC time series as sampled along a curve through time, and ask how each curve can be generated by adding or subtracting underlying component curves. Generally speaking, this multivariate approach is referred to as functional data analysis .

While specialized packages exist, we can treat our PCA as a simple FDA by considering the loadings along each axis to comprise a component time series (a so-called ‘empirical orthogonal function’, EOF), and the score for each individual as a weight of how much of that EOF to add or subtract to reconstitute the data. Figure \[fig:fdarbc\] depicts the loadings of axes one and two as EOFs along the top row. The bottom row shows how adding or subtracting these EOFs (corresponding to positive or negative scores) modulates the shape of the overall average curve among all experimental animals.

In [None]:
par(mfrow = c(1,2))
#Gets the experimental days
day = unique(SH9$Day)
#Calculate the average time series
avg = apply(SH9RBCw2[,3:27], 2, mean)
plot(day, avg, type = "b", ylim = range(SH9RBCw[,3:27]), 
   ylab  =  "RBC", xlab = "Day")
title("Mean +/- 1 SD eof 1")
lines(day, avg+1*pcaRBC$co[,1], col = 2, 
   type = "b", pch = "+")
lines(day, avg-1*pcaRBC$co[,1], col = 2, 
   type = "b", pch = "-")
plot(day, avg, type = "b", ylim = range(SH9RBCw[,3:27]), 
     ylab  =  "RBC", xlab = "Day")
title("Mean +/- 1 SD eof 2")
lines(day, avg+1*pcaRBC$co[,2], col = 2, 
   type = "b", pch = "+")
lines(day, avg-1*pcaRBC$co[,2], col = 2, 
   type = "b", pch = "-")
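The EOF view amounts to writing each individual's curve as the mean curve plus a score-weighted sum of loading curves, $x_i(t) \approx \bar{x}(t) + \sum_{k=1}^{K} s_{ik}\,\phi_k(t)$. In `ade4`, `co` holds the unit-length loadings multiplied by the square root of the corresponding eigenvalue, so `avg+pcaRBC$co[,1]` and `avg-pcaRBC$co[,1]` shift the mean curve by one standard deviation of the axis-one scores. The base-R sketch below reproduces this on simulated anemia-like curves (the curve shape, `depth` and `timing` are hypothetical) using `prcomp`, whose `sdev` uses an $n-1$ divisor rather than `ade4`'s $n$, and shows that the reconstruction error falls as more EOFs are added.

```r
# Hypothetical curves: a dip in RBC whose depth and timing vary by mouse
set.seed(7)
day <- 0:20
n <- 30
depth <- rnorm(n, mean = 4, sd = 1)
timing <- rnorm(n, mean = 10, sd = 2)
x <- t(sapply(seq_len(n), function(i)
  9 - depth[i] * exp(-(day - timing[i])^2 / 8))) +
  matrix(rnorm(n * length(day), sd = 0.1), n, length(day))
pc <- prcomp(x, scale. = FALSE)
avg <- colMeans(x)
# Mean curve +/- 1 SD along EOF 1
upper1 <- avg + pc$sdev[1] * pc$rotation[, 1]
lower1 <- avg - pc$sdev[1] * pc$rotation[, 1]
# Rebuild every curve from the mean plus the first k score-weighted EOFs
reconstruct <- function(k) {
  approx <- pc$x[, 1:k, drop = FALSE] %*% t(pc$rotation[, 1:k, drop = FALSE])
  sweep(approx, 2, avg, "+")
}
# Residual sum of squares for k = 1..5 EOFs (never increases with k)
rss <- sapply(1:5, function(k) sum((x - reconstruct(k))^2))
print(round(rss, 3))
full_err <- max(abs(x - reconstruct(ncol(pc$x))))  # ~0 with all EOFs
```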

The analysis offers some interesting insights. As previously suggested, axis one measures overall anemia: animals with positive scores experience less anemia. Axis two is more interesting, as it reveals that the second most important pattern broadly distinguishes animals that have peak anemia before day 10 (negative scores; mostly individuals infected with the BC clone) from the other, more slowly progressing infections (positive scores) that have peak anemia around day 15. To confirm our interpretation we plot the actual time series for the ten most extreme mice (the five lowest and the five highest scores) along the EOF1 and EOF2 axes (fig. \[fig:fdarbc2\]).

In [None]:
par(mfrow = c(1,2))
n = nrow(SH9RBCw2)  #number of surviving mice
#Axis 1: five lowest (black) and five highest (red, dashed) scores
so = order(pcaRBC$li[,1])
plot(day, t(SH9RBCw2[so[1],3:27]), type = "l", ylab = "RBC", 
     xlab = "Day", ylim = range(SH9RBCw2[,3:27]))
for (i in 1:5) lines(day, t(SH9RBCw2[so[i],3:27]))
for (i in (n-4):n) lines(day, t(SH9RBCw2[so[i],3:27]), 
     col = 2, lty = 2)
#Axis 2: the same selection along the second axis
so = order(pcaRBC$li[,2])
plot(day, t(SH9RBCw2[so[1],3:27]), type = "l", ylab = "RBC", 
     xlab = "Day", ylim = range(SH9RBCw2[,3:27]))
for (i in 1:5) lines(day, t(SH9RBCw2[so[i],3:27]))
for (i in (n-4):n) lines(day, t(SH9RBCw2[so[i],3:27]), 
     col = 2, lty = 2)

[1] An approach that uses all available data would be to code the RBC counts of dead animals as `NA` and do a PCA with missing data using nonlinear iterative partial least squares (`nipals`), as done by .