# Longevity phenotypes in mice of various ages

In this notebook we consider a [mouse study](https://www.nature.com/articles/s41467-022-34515-y) that examines how phenotypes change as mice age. 


## Data

The study data are available [here](https://data.mendeley.com/preview/ypz9zyc9rp?a=09b16f74-4581-48f7-94af-469e01757949), but you do not need to download them from that link, because the code below reads a prepared version of the data.  Instead, download the prepared data as a json file from [Github](https://github.com/kshedden/case_studies/tree/main/mouse_aging), and change the path in the loading cell below to point to the location of the file on your system.  

If you are curious about how the data were prepared, see the [pool.py](https://github.com/kshedden/case_studies/blob/main/mouse_aging/pool.py) Python script. Lists of all phenotypes with their abbreviations (used below) and brief descriptions of the phenotypes are in the supplementary data 2 file linked in the [paper](https://www.nature.com/articles/s41467-022-34515-y), direct link [here](https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-022-34515-y/MediaObjects/41467_2022_34515_MOESM4_ESM.xlsx).

## Scientific aims

The overarching aim of this study is to understand how phenotypes change with age in mice, either naturalistically or following an intervention.  Phenotypes that change with age are referred to as _age specific phenotypes_, or "ASPs".  The authors claim to perform "deep phenotyping", by which they mean that a large number of phenotypes (around two hundred) are assessed. This is an open-ended exploratory study that does not test a pre-specified hypothesis.  The researchers considered all phenotypes in an "unbiased" manner for changes over the mouse lifespan, and in response to intervention.

This study has an [observational](https://en.wikipedia.org/wiki/Observational_study) component in which changes in some phenotypes occur naturalistically over the mouse lifespan.  It also includes an interventional component with three independent interventions: two genetic manipulations and a dietary intervention (calorie restriction).  

The authors used both univariate and multivariate methods in their study.  In univariate analyses, a single phenotype was considered in relation to age, and (if present) intervention group assignment. 

## Study design and analytic methods

This study considers phenotypic change over the mouse lifespan. Since some of the phenotypes of interest can only be assessed after sacrificing the mouse, it is implemented as a [cross sectional](https://en.wikipedia.org/wiki/Cross-sectional_study) rather than a [longitudinal](https://en.wikipedia.org/wiki/Longitudinal_study) study.  In the observational component of the study, the authors collected data at 6 distinct ages during the mouse lifespan (from 3-26 months), with around 15 independent mice assessed at each age. In the interventional component of the study, only two time points were considered.  In both cases, mice observed at different ages are mutually independent.  

The authors emphasize that at the first age (3 months) the mice are considered to be very young, and are not yet subject to any aging effects.  The effect of an intervention can be limited to the older ages, or alternatively can affect mice at all ages roughly equally.  Although both effects are interesting, the authors argue that the interventions that specifically impact older mice are more likely to translate to human therapies, or to reveal important mechanisms underlying aging.

All mice in this study are male.  Current NIH guidance strongly advocates for sex balanced designs (this study was conducted in Europe).

A large number of phenotypes are measured.  Some, such as heart rate, can be measured on a living mouse, while others, such as organ weights, can only be measured upon sacrifice of the animal -- this is why the design is cross sectional rather than longitudinal (a longitudinal design would make repeated measures on the same mice). The authors make a number of comments about their study design choices in the Discussion section of their paper.

An important practical consideration is whether the phenotypes approximately change linearly over time, or if the phenotype changes are more complex than can be described in linear form.  These two patterns of effects can be distinguished by considering models that are either additive, or that include a time by intervention group interaction.

We will use the following libraries in our analysis:

In [None]:
import pandas as pd
import json
import gzip
import re
import io
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.anova import anova_lm
import string
import prince
from scipy.stats import distributions
from statsmodels.stats.multitest import local_fdr

The next cell loads the data from a compressed json "blob".  The data take the form of a dictionary that associates the figures of the paper with various data tables.

Each data table has two or three columns, giving the age and the recorded value for one phenotype, along with the intervention group assignment if an intervention was performed.

In [None]:
with gzip.open("mouse_data.json.gz") as gg:
    md = json.load(gg)

We can count the number of datasets present in each section of the paper.

In [None]:
{k: len(md[k]) for k in md.keys()}

Here is an example of one of the datasets (in text form):

In [None]:
md["Figure2_phenotypes"]["tibia_length"]

Here is what the data look like after extracting to a data frame:

In [None]:
pd.read_csv(io.StringIO(md["Figure2_phenotypes"]["tibia_length"]), sep=",").head(10)

The study data are stored as a large collection of small datasets.  The function below extracts one dataset and does some basic preprocessing of it.

In [None]:
def get_data(dset, vname, convert_age=True, standardize=False):
  """
  Extract the data for phenotype 'vname' from data collection 'dset'.  
  If 'convert_age' is True, the age is converted to numeric values.  If
  'standardize' is True, the phenotype values are standardized to zero 
  mean and unit standard deviation.
  """
  da = pd.read_csv(io.StringIO(md[dset][vname]), sep=",")
  # Standardize the variable names since they will appear in formulas
  vname1 = vname.replace(" ", "_")
  # Prefix names that start with a digit (raw string avoids an invalid escape warning)
  vname1 = "n"+vname1 if re.match(r"^\d", vname1) is not None else vname1
  vname1 = vname1.translate(str.maketrans('', '', string.punctuation.replace("_", "")))
  da = da.rename({vname: vname1}, axis=1)
  da[vname1] = pd.to_numeric(da[vname1], errors="coerce")
  if convert_age:
    da["age"] = pd.to_numeric(da["age"].str.replace("_mo", ""))
  if standardize:
    da[vname1] = (da[vname1] - da[vname1].mean()) / da[vname1].std()
  return vname1, da.dropna()

Here is what the data look like after pre-processing:

In [None]:
get_data("Figure2_phenotypes", "tibia_length")[1]

We also have the option to convert the measurements to Z-scores:

In [None]:
get_data("Figure2_phenotypes", "tibia_length", standardize=True)[1].head(10)
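
To make the name cleaning inside `get_data` concrete, here is a minimal standalone sketch of the same three steps.  The helper name `clean_name` and the example label are hypothetical and are not taken from the study data.

```python
import re
import string

def clean_name(vname):
    """Hypothetical helper mirroring the name cleaning done in get_data."""
    # Replace spaces so the name is a single token in a formula
    out = vname.replace(" ", "_")
    # Formula variable names cannot start with a digit, so prefix with "n"
    if re.match(r"^\d", out) is not None:
        out = "n" + out
    # Drop all punctuation except underscores
    drop = string.punctuation.replace("_", "")
    return out.translate(str.maketrans("", "", drop))

print(clean_name("5HT level (ng/ml)"))  # made-up label, for illustration only
print(clean_name("tibia_length"))       # already clean, so it is unchanged
```

Names that are already valid identifiers pass through unchanged, which is why most phenotype abbreviations look the same before and after cleaning.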

# Analysis of observational data

## Visualization

As a form of [exploratory data analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis), the next cell uses [boxplots](https://en.wikipedia.org/wiki/Box_plot) to visualize a few of the phenotypes from the "figure 2" set of experimental results (this is an observational component of the study, looking at associations between age and naturally-varying phenotypes).  We also print a frequency table for the number of observations (independent mice) at each age in each dataset.  

The boxplots reflect one of the primary research aims of the study, which is to identify age-related changes in phenotypes.  The median line in each box should follow the trajectory of age-related change for each phenotype, and the heights of the boxes should reflect the degree of dispersion around the central value.

In [None]:
for vname in ["Body_mass_NMR", "HR"]:
  _, da = get_data("Figure2_phenotypes", vname)
  c = da.groupby("age").count().sort_index()
  print(c)
  plt.figure()
  sns.boxplot(da, x="age", y=vname)

## Formal statistical analyses

In their paper, the authors claim that 59% of phenotypes have a [statistically significant](https://en.wikipedia.org/wiki/Statistical_significance) association with age.  The underlying analysis uses a nonparametric one-way ANOVA procedure ([Kruskal-Wallis](https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_test)) for (semi) quantitative phenotypes, and [Fisher's exact test](https://en.wikipedia.org/wiki/Fisher%27s_exact_test) for binary phenotypes.  Below we do a simplified version of this analysis, using for simplicity the Kruskal-Wallis ANOVA procedure for all phenotypes.  This analysis does not impose a mean structure model, such as assuming linear change of the phenotype with age.

The results are quite similar to those reported in the publication (57% versus 59%).

In [None]:
n0, n1 = 0, 0
nobs = []
for vname in md["Figure2_phenotypes"].keys():
  vname1, da = get_data("Figure2_phenotypes", vname)
  ga = da.groupby("age").groups
  dx = [da[vname1][g].values for g in ga.values()]
  kr = stats.kruskal(*dx)
  n0 += 1
  n1 += kr.pvalue < 0.05
  nobs.append(da.shape[0])
(n0, n1, n1/n0)

The graph below shows the distribution of the number of observations (across all ages) per phenotype.

In [None]:
nobs = np.asarray(nobs)
plt.hist(nobs)
plt.xlabel("Number of observations")
plt.ylabel("Number of phenotypes")
np.median(nobs)

We can also assess relationships using Pearson correlation.  This approach is more sensitive to linear change but has little or no power to detect symmetric "U-shaped" relationships.  In this case the results of the Pearson correlation analysis are almost identical to the results of the "nonparametric" approach.

In [None]:
n0, n1 = 0, 0
for vname in md["Figure2_phenotypes"].keys():
  vname1, da = get_data("Figure2_phenotypes", vname)
  r = np.corrcoef(da["age"], da[vname1])[0,1]
  se = 1/np.sqrt(da.shape[0])
  n0 += 1
  n1 += np.abs(r) > 2*se
(n0, n1, n1/n0)
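
The contrast between the two approaches can be illustrated with a small synthetic example (the values below are made up and are not study data).  The phenotype follows a symmetric U-shape across five age groups, so the Pearson correlation is essentially zero while the Kruskal-Wallis test detects the group differences.

```python
import numpy as np
from scipy import stats

# Synthetic data: five age groups with six "mice" in each group
ages = np.repeat([1, 2, 3, 4, 5], 6)
offsets = np.tile([-0.25, -0.15, -0.05, 0.05, 0.15, 0.25], 5)
y = (ages - 3.0) ** 2 + offsets  # symmetric U-shape centered at age 3

# Pearson correlation only captures the linear trend
r = np.corrcoef(ages, y)[0, 1]

# Kruskal-Wallis compares the distributions across the age groups
groups = [y[ages == a] for a in np.unique(ages)]
kw = stats.kruskal(*groups)

print("Pearson r:", r)
print("Kruskal-Wallis p-value:", kw.pvalue)
```

Phenotypes of this kind would be counted as age-related by the Kruskal-Wallis analysis but not by the correlation analysis.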

We can take a [large scale inference](https://www.cambridge.org/core/books/largescale-inference/A0B183B0080A92966497F12CE5D12589) perspective and consider how the Kruskal-Wallis and Pearson correlation approaches relate across the large collection of phenotypes considered here.

In [None]:
rr = []
for vname in md["Figure2_phenotypes"].keys():
  vname1, da = get_data("Figure2_phenotypes", vname)

  # Kruskal-Wallis analysis
  ga = da.groupby("age").groups
  dx = [da[vname1][g].values for g in ga.values()]
  kr = stats.kruskal(*dx)
    
  # Convert the Kruskal-Wallis statistic to a Z-score
  ng = len(ga.values()) # number of age groups
  zstat = distributions.norm().ppf(distributions.chi2(ng-1).cdf(kr.statistic))

  # Pearson correlation analysis
  r = np.corrcoef(da["age"], da[vname1])[0,1]
  se = 1/np.sqrt(da.shape[0])
    
  rr.append([vname1, kr.statistic, kr.pvalue, zstat, r/se])

rr = pd.DataFrame(rr, columns=["pheno", "KWstat", "KWpval", "KWZ", "PCZ"])

plt.grid(True)
sns.scatterplot(rr, x="KWZ", y="PCZ")

In [None]:
rr.head()
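
The conversion from a chi-square statistic to a Z-score matches tail probabilities between the two distributions.  As a hedged aside, the sketch below compares the `ppf(cdf(...))` form used above with an algebraically equivalent form based on survival functions, which stays finite for very large statistics.  The function names are hypothetical and the statistics are made-up values.

```python
import numpy as np
from scipy import stats

def chi2_to_z(stat, df):
    """Z-score with the same upper tail probability as a chi-square statistic."""
    return stats.norm.isf(stats.chi2.sf(stat, df))

def chi2_to_z_naive(stat, df):
    """Same mapping via the CDF; the CDF rounds to 1 for large statistics."""
    return stats.norm.ppf(stats.chi2.cdf(stat, df))

# For moderate statistics the two forms agree
z_mod, z_mod_naive = chi2_to_z(10.0, 5), chi2_to_z_naive(10.0, 5)
# For very large statistics the CDF form overflows to infinity
z_big, z_big_naive = chi2_to_z(200.0, 5), chi2_to_z_naive(200.0, 5)

print(z_mod, z_mod_naive)
print(z_big, z_big_naive)
```

Whether this matters in practice depends on how extreme the largest Kruskal-Wallis statistics are.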

These are the phenotypes that have strong evidence for differences under the Kruskal-Wallis analysis and minimal evidence for differences under the Pearson correlation analysis.

In [None]:
ii = (rr.KWZ > 2) & (np.abs(rr.PCZ) < 1) 
rq = rr[ii]
for vname in rq["pheno"]:
    _, da = get_data("Figure2_phenotypes", vname)
    plt.figure()
    sns.boxplot(da, x="age", y=vname)

Although 107 (around 58%) of the phenotypes show age dependence based on having a p-value smaller than 0.05, this set potentially includes many false positives since we are testing multiple hypotheses.  The traditional way to address this is to control the Family-Wise Error Rate (FWER).  The Bonferroni method achieves this and is easy to use.  However it can be quite conservative when the data corresponding to different hypotheses are not independent.  As shown below, if we control the FWER, then less than half of the phenotypes remain "significant".

In [None]:
ntest = rr.shape[0]
print((rr["KWpval"] < 0.05).sum())
print((rr["KWpval"] < 0.05/ntest).sum())
print((ntest*rr["KWpval"] < 0.05).sum())
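
The last two counts printed above are always equal, because comparing each p-value to 0.05/m is the same as comparing the Bonferroni-adjusted p-value m*p to 0.05.  A minimal sketch with made-up p-values:

```python
import numpy as np

# Made-up p-values for illustration (not from the study)
pvals = np.array([0.0001, 0.004, 0.012, 0.03, 0.2, 0.6])
alpha = 0.05
m = len(pvals)

# Form 1: compare each p-value to a stricter threshold
reject_threshold = pvals < alpha / m

# Form 2: inflate each p-value (capped at 1) and use the usual threshold
p_adjusted = np.minimum(m * pvals, 1.0)
reject_adjusted = p_adjusted < alpha

print(reject_threshold.sum(), reject_adjusted.sum())
```

Capping the adjusted p-values at 1 does not change any decisions, but keeps them interpretable as probabilities.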

False Discovery Rates (FDR) are a popular method to assess the results of large numbers of hypothesis tests.  The FDR is calculated at a threshold t, and considers a hypothesis to be a "discovery" if its test statistic exceeds t (usually we do two-sided analysis, so we consider whether the magnitude of the test statistic exceeds t).  The FDR is the proportion of discoveries that are false, so the number of discoveries belongs in the denominator of the FDR, and the numerator of the FDR is an estimate of the number of false discoveries that would be made for a given number of tests m and a given test statistic threshold t.  If our test statistic is a Z-score and follows a standard normal distribution under the null, this numerator can be easily estimated (or bounded).

Below we calculate FDR values for a series of thresholds t.  These FDR values pertain to the null hypotheses that each phenotype is unrelated to age (i.e. follows the same distribution at each age).  We use as a conservative estimator the ratio of the expected number of false discoveries to the number of discoveries.  

In [None]:
for t in [1, 2, 3, 4, 5, 10]:
    n_expected = rr.shape[0]*(1 - stats.norm.cdf(t))
    n_observed = (rr["KWZ"] > t).sum()
    fdr = n_expected / n_observed
    print([t, n_expected, n_observed, fdr])
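
The same estimator can be wrapped in a small function that also handles thresholds at which there are no discoveries.  This is only a sketch: the name `tail_fdr` is hypothetical, and the Z-scores are a synthetic mixture of 900 null values and 100 non-null values, not study results.

```python
import numpy as np
from scipy import stats

def tail_fdr(z, t):
    """Conservative one-sided FDR estimate at threshold t (assumes p0 = 1)."""
    z = np.asarray(z)
    n_expected = len(z) * stats.norm.sf(t)   # expected false discoveries
    n_observed = int((z > t).sum())          # actual discoveries
    if n_observed == 0:
        return np.nan                        # FDR is undefined with no discoveries
    return min(1.0, n_expected / n_observed)

# Synthetic Z-scores: 900 standard normal quantiles plus 100 values at 4
null_z = stats.norm.ppf((np.arange(900) + 0.5) / 900)
alt_z = np.full(100, 4.0)
z = np.concatenate([null_z, alt_z])

for t in [1, 2, 3, 10]:
    print(t, tail_fdr(z, t))
```

In this synthetic setting the estimated FDR falls quickly as the threshold rises, since nearly all large Z-scores come from the non-null group.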

Efron devised the "local" false discovery rate, which is the ratio of the null density of the test statistics (scaled by the proportion of true nulls) to the density of the observed statistics.  The local FDR is $p_0f_0(z)/f(z)$, where $p_0$ is the proportion of true null hypotheses, $f_0$ is the density of null Z-scores, and $f$ is the density of observed Z-scores.  A conservative estimate of the local FDR is obtained by setting $p_0=1$.  In the next cell we estimate $f$ using a histogram and assume that $f_0$ is standard normal.

In [None]:
plt.hist(rr["KWZ"], bins=20, density=True);
x = np.linspace(-3, 6, 100)
y = np.exp(-x**2/2)/np.sqrt(2*np.pi)
plt.plot(x, y, "-")
plt.xlabel("KW Z-score")
plt.ylabel("Density")
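
To connect the histogram to the formula, the following sketch computes a crude binned local FDR as $f_0(z)/f(z)$ with $p_0=1$, using the histogram heights as the estimate of $f$.  The function name `hist_local_fdr` and the synthetic Z-scores are assumptions made for illustration, and this is not the estimator implemented in statsmodels.

```python
import numpy as np
from scipy import stats

def hist_local_fdr(z, bins=30):
    """Binned local FDR: null density over histogram density, capped at 1."""
    dens, edges = np.histogram(z, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    f0 = stats.norm.pdf(centers)              # standard normal null density
    lfdr = np.full(len(centers), np.nan)      # undefined where the bin is empty
    ok = dens > 0
    lfdr[ok] = np.minimum(f0[ok] / dens[ok], 1.0)
    return centers, lfdr

# Synthetic Z-scores: 900 null values and 100 non-null values between 3.5 and 4.5
null_z = stats.norm.ppf((np.arange(900) + 0.5) / 900)
alt_z = np.linspace(3.5, 4.5, 100)
z = np.concatenate([null_z, alt_z])

centers, lfdr = hist_local_fdr(z)
near0 = np.argmin(np.abs(centers))        # bin closest to Z = 0
near4 = np.argmin(np.abs(centers - 4))    # bin closest to Z = 4
print(lfdr[near0], lfdr[near4])
```

Bins near zero have a local FDR close to 1, while bins in the far right tail have a local FDR close to 0.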

The actual local FDR approach uses "Lindsey's method" instead of a histogram to estimate $f$.  There is also an "empirical null" strategy that allows the data to inform the value of $f_0$, but we do not consider that further here.  Below we see that under the standard of local FDR, we can be confident that 63 of the phenotypes are age-specific.  This is modestly more than under FWER control.

In [None]:
lfdr = local_fdr(rr["KWZ"])
np.sum(lfdr < 0.05)

The following plot shows the relationship between the Kruskal-Wallis Z-score and the local FDR, which is specific to this dataset.

In [None]:
plt.plot(rr["KWZ"], lfdr, "o")
plt.xlabel("KW Z-score")
plt.ylabel("Local FDR")

## Analysis of interventional data

Now we turn to diet, which is one of the interventional components of the study.  Mice were on either a calorie-restricted or regular diet.  In these experiments, there were two age groups denoted "young" and "old".

We begin by using boxplots to explore the distributions for a few phenotypes.

In [None]:
for vname in ["ST", "Body_mass_NMR", "HR"]:
    vname1, da = get_data("Figure5_phenotypes", vname, convert_age=False, standardize=False)
    da["group"] = ["%s:%s" % (a, b) for a,b in zip(da["age"], da["diet"])]
    plt.figure()
    sns.boxplot(da, x="group", y=vname1)

For more formal analysis, we can use [regression](https://en.wikipedia.org/wiki/Regression_analysis) and [analysis of variance](https://en.wikipedia.org/wiki/Analysis_of_variance) to consider the combined association of age and diet type in relation to each phenotype.  Here we use [ordinary least squares (OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares) to fit the models since we want effect estimates and not only significance levels.  We fit a "saturated" model for each phenotype, which has an intercept, main effects for age and diet, and an [interaction](https://en.wikipedia.org/wiki/Interaction_(statistics)) between age and diet.  

In interpreting the results of the analysis, we consider the evidence for additive and non-additive associations between the intervention variable (diet) and the outcome phenotype.  Age is either a [nuisance variable](https://en.wikipedia.org/wiki/Nuisance_variable) or a [moderator](https://en.wikipedia.org/wiki/Moderation_(statistics)) (also known as a _modifier_).  

Since age and diet happen to both be binary here, this is a _two-way ANOVA_, specifically a _2x2 layout_.  The additive model can be parameterized in three degrees of freedom and the non-additive model has four degrees of freedom.  We focus on two [Wald tests](https://en.wikipedia.org/wiki/Wald_test) - one testing for an additive contribution of diet, and one testing for non-additivity of the age and diet effects.  We have minimal interest in the main effect of age so do not formally test that effect below.
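
The parameterization of the 2x2 layout can be checked with a small synthetic example.  The cell means below are made up, and the model is fit with plain least squares rather than a formula interface.  With "old" and "control" as the reference levels, the four coefficients of the saturated model are contrasts of the four cell means, and the interaction is the diet effect in young mice minus the diet effect in old mice.

```python
import numpy as np

# Made-up cell means for the 2x2 layout
cell_means = {("old", "control"): 1.0, ("young", "control"): 1.5,
              ("old", "restricted"): 0.2, ("young", "restricted"): 1.4}
noise = np.array([-0.3, -0.1, 0.1, 0.3])  # symmetric, so cell means are exact

rows, y = [], []
for (age, diet), mu in cell_means.items():
    young = float(age == "young")
    restricted = float(diet == "restricted")
    for e in noise:
        # Columns: intercept, young, restricted, young x restricted
        rows.append([1.0, young, restricted, young * restricted])
        y.append(mu + e)
X, y = np.array(rows), np.array(y)

# Saturated model: four parameters for four cells
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

diet_old = cell_means[("old", "restricted")] - cell_means[("old", "control")]
diet_young = cell_means[("young", "restricted")] - cell_means[("young", "control")]
print(beta)
print(diet_old, diet_young - diet_old)
```

Dropping the last column of `X` gives the additive model with three parameters, which forces the diet effect to be the same in both age groups.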

There are over 200 phenotypes so we first calculate all of the relevant statistics and store them in a dataframe.

In [None]:
anovas, models, vnames, prev = {}, {}, [], []
for vname in md["Figure5_phenotypes"].keys():
    vname1, da = get_data("Figure5_phenotypes", vname, convert_age=False, standardize=True)
    mm = sm.OLS.from_formula("%s ~ age * diet" % vname1, da)
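    # Use a name other than 'rr' so the results dataframe above is not overwritten
    res = mm.fit()
    # Skip phenotypes whose design matrix is nearly singular (huge condition number)
    mx = res.model.exog
    _,s,_ = np.linalg.svd(mx,0)
    if s[0]/s[-1] > 100000:
        print("Skipping %s" % vname)
        continue
    aa = anova_lm(res)
    anovas[vname1] = aa
    models[vname1] = res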
    vnames.append(vname1)
    prev.append([(da["age"] == "young").mean(), (da["diet"]=="restricted").mean()])

In [None]:
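# 'all_data' is not defined above; we assume it refers to the Figure 5 collection
all_data = md["Figure5_phenotypes"]
print([x for x in all_data.keys() if "mass" in x.lower()])
print(all_data["Body_mass_NMR"])
print(all_data["HR"])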

Here is the result of one of the model fits above, specifically that for the heart rate (HR) phenotype.  The notation "T.young" indicates that this is the coefficient estimate for the indicator of a mouse being young, which (implicitly) tells us that the reference level for the age variable is "old". Similarly, the diet coefficient corresponds to the "restricted" diet and the reference level is "control" (an _ad lib_ diet).

In [None]:
models["HR"].summary()

The next cell organizes all of the regression findings into a dataframe.  This dataframe has one row for each phenotype.

In [None]:
zscores = [models[v].params / models[v].bse for v in vnames]
young_main = [models[v].params["age[T.young]"] for v in vnames]
young_main_z = [z["age[T.young]"] for z in zscores]
young_main_se = [models[v].bse["age[T.young]"] for v in vnames]
restricted_main = [models[v].params["diet[T.restricted]"] for v in vnames]
restricted_main_z = [z["diet[T.restricted]"] for z in zscores]
restricted_main_se = [models[v].bse["diet[T.restricted]"] for v in vnames]
interaction = [models[v].params["age[T.young]:diet[T.restricted]"] for v in vnames]
interaction_z = [z["age[T.young]:diet[T.restricted]"] for z in zscores]
interaction_se = [models[v].bse["age[T.young]:diet[T.restricted]"] for v in vnames]
nobs = [models[v].nobs for v in vnames]
prev = np.asarray(prev)

effects = pd.DataFrame({"Variable": vnames,
                        "nobs": nobs,
                        "young_main": young_main, 
                        "young_main_z": young_main_z,
                        "young_main_se": young_main_se,
                        "restricted_main": restricted_main, 
                        "restricted_main_z": restricted_main_z,
                        "restricted_main_se": restricted_main_se,
                        "interaction": interaction, 
                        "interaction_z": interaction_z,
                        "interaction_se": interaction_se,
                        "age_sd": np.sqrt(prev[:, 0] * (1 - prev[:, 0])),
                        "diet_sd": np.sqrt(prev[:, 1] * (1 - prev[:, 1]))})

# Standardized effects
effects["young_main_s"] = effects["young_main"] * effects["age_sd"]
effects["restricted_main_s"] = effects["restricted_main"] * effects["diet_sd"]
effects["restricted_main_s_se"] = effects["restricted_main_se"] * effects["diet_sd"]
effects["interaction_s"] = effects["interaction"] * effects["age_sd"] * effects["diet_sd"]
effects["interaction_s_se"] = effects["interaction_se"] * effects["age_sd"] * effects["diet_sd"]

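The statistics computed above are t-statistics.  The next cell maps them to Z-scores with the same tail probability, using the residual degrees of freedom of each model (the number of observations minus the four model parameters).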

In [None]:
for v in ["young_main_z", "restricted_main_z", "interaction_z"]:
    dof = effects["nobs"] - 4
    effects[v] = stats.distributions.norm().ppf(stats.distributions.t(df=dof).cdf(effects[v]))
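
The effect of this mapping is easiest to see on single values.  In the sketch below (the function name `t_to_z` is hypothetical), a t-statistic with few degrees of freedom is pulled toward zero, while with many degrees of freedom it is essentially unchanged.

```python
from scipy import stats

def t_to_z(tval, dof):
    """Z-score with the same cumulative probability as a t-statistic."""
    return stats.norm.ppf(stats.t.cdf(tval, df=dof))

z_small = t_to_z(2.5, 10)    # few degrees of freedom: shrinks toward zero
z_large = t_to_z(2.5, 1e6)   # many degrees of freedom: nearly unchanged
print(z_small, z_large)
```

This puts phenotypes with different sample sizes on a common scale before they are compared against fixed Z-score thresholds.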

In [None]:
effects

### Visualization of Z-scores

The function below creates a dot-plot of the statistically standardized effects for each phenotype, considering one particular term in the model (i.e. one of the two main effects, or the interaction).  These "statistically standardized effects" are point estimates divided by the corresponding standard errors.  

The vertical lines correspond to commonly-applied decision thresholds for Z-scores.  The dashed lines correspond to a "nominal" threshold corresponding to an unadjusted p-value being less than 0.05. The dotted lines correspond to thresholds corrected for multiple hypothesis testing using the [Bonferroni method](https://en.wikipedia.org/wiki/Bonferroni_correction), which controls the family-wise error rate.

In [None]:
def gen_anova_z_plot(term):

  # Set up the plot
  plt.figure().set_figheight(30)
  plt.yticks(fontsize=6);
  plt.axvline(x=0, color="grey")

  sns.stripplot(data=effects.sort_values(term), y="Variable", x=term)

  # Multiplier for confidence intervals with 95% simultaneous (family-wise) coverage. 
  t = stats.distributions.norm().ppf(1 - 0.025/effects.shape[0])
    
  plt.axvline(x=-t, color="grey", ls="dotted")
  plt.axvline(x=t, color="grey", ls="dotted")
    
  # 95% CI's (no consideration of multiple coverage)
  plt.axvline(x=-2, color="grey", ls="dashed")
  plt.axvline(x=2, color="grey", ls="dashed")
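
The dotted threshold depends on the number of tests.  The following sketch (with a hypothetical helper `bonferroni_z`) shows how the two-sided Bonferroni critical value grows with the number of phenotypes.

```python
from scipy import stats

def bonferroni_z(m, alpha=0.05):
    """Two-sided Z threshold with family-wise error rate alpha over m tests."""
    return stats.norm.ppf(1 - alpha / (2 * m))

for m in [1, 20, 200]:
    print(m, bonferroni_z(m))
```

With a single test this reduces to the familiar value near 1.96, which the plotting function approximates by 2 for the dashed lines.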

The plot below shows the Z-scores for the diet main effect with respect to each phenotype.  Since there is an interaction term in the model, these effects correspond to the difference between diet groups (restricted minus control) when fixing age at "old" (the reference level of age).

In [None]:
gen_anova_z_plot("restricted_main_z")

The plot below shows the estimated interaction effects for each phenotype.  Since "old" is the reference level for age, each interaction term can be interpreted as the diet effect (restricted minus control) in the young age group minus the same diet effect in the old age group.  That is, it is a "difference of differences".  Note that none of them reach the threshold required for [family-wise](https://en.wikipedia.org/wiki/Family-wise_error_rate) error control.  This is consistent with the authors' claims that interventions seldom change the rate of change of phenotypes with respect to aging, although some of the phenotypes do change their overall level in response to the intervention.

In [None]:
gen_anova_z_plot("interaction_z")

### Visualization of standardized effects

Next we visualize the same effects using a different approach.  A "standardized effect" refers here to the effect of one covariate relative to its standard deviation (above we considered Z-scores which are the effects relative to their [standard error](https://en.wikipedia.org/wiki/Standard_error)). Standardized effects are a measure of the [size of an effect](https://en.wikipedia.org/wiki/Effect_size), not its statistical significance.  In order to convey information about statistical significance in these plots, we also include 95% [confidence intervals](https://en.wikipedia.org/wiki/Confidence_interval) (adjusted for multiple comparisons) for each standardized effect.  In the plots below, the orange dots are the point estimates and the blue dots are the lower and upper limits of 95% confidence intervals.

Z-scores are dimension free, meaning that their values are unaffected by rescaling either the independent or dependent variables.  Standardized effects are dimension-free with respect to rescaling of the covariate but not with respect to the outcome.  However here we have Z-scored the outcomes so the standardized effects are also dimension free.  In these data, the age covariate is quantitative, but it can only take on two distinct values.  The diet variable is categorical, but has been dummy-coded.  We can present standardized effects for covariates that are binary indicators, but it is important to remember that the standardization is with respect to the frequency of the categories, not with respect to a quantitatively measured value.
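
For a binary indicator with category frequency $p$, the standard deviation is $\sqrt{p(1-p)}$, which is the quantity stored in the `age_sd` and `diet_sd` columns above.  A small sketch with a made-up indicator and a made-up coefficient:

```python
import numpy as np

# Made-up indicator: 6 of 20 mice are in the category coded as 1
x = np.array([1] * 6 + [0] * 14, dtype=float)
p = x.mean()

# Standard deviation of a 0/1 indicator, from the category frequency
sd_x = np.sqrt(p * (1 - p))
print(sd_x, np.std(x))  # the two values agree

# Standardized effect: hypothetical regression coefficient times the covariate SD
coef = 0.8
std_effect = coef * sd_x
print(std_effect)
```

The standardized effect is largest, for a given coefficient, when the two categories are equally frequent.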

In [None]:
def gen_anova_seffect_plot(term):
  plt.figure().set_figheight(30)
  ee = effects.copy()
  t = stats.distributions.norm().ppf(1 - 0.025/effects.shape[0])
  ee["lcb"] = ee[term] - t*ee["%s_se" % term]
  ee["ucb"] = ee[term] + t*ee["%s_se" % term]
  ee = ee[["Variable", "lcb", "ucb", term]]
  ee = ee.set_index("Variable").stack().reset_index()
  ee.columns = ["Variable", "hue", term]
  ee["hue"] = ee["hue"].replace({"lcb": "cb", "ucb": "cb"})
  ee = ee.sort_values(term)
  sns.stripplot(data=ee, y="Variable", x=term, hue="hue")
  plt.axvline(x=0, color="grey")
  plt.gca().legend().set_visible(False)

First we plot the standardized main effects and their 95% confidence intervals.  These are the main effects (or "simple effects") for diet (restricted minus control) in the older mice.  The heart rate, HDL, cholesterol, etc. are lower in mice on restricted diets than mice on ad lib diets.

In [None]:
gen_anova_seffect_plot("restricted_main_s")

Next we look at standardized "difference of differences", none of which are significantly different from zero.

In [None]:
gen_anova_seffect_plot("interaction_s")

## Multivariate analyses

Next we use principal component analysis and biplots to understand the joint behavior of the phenotypes in aging mice.

First we define a function that builds a single dataframe containing all phenotypes.  These are the phenotypes from the observational component of the study that are shown in figure 2 of the paper.  Since there are repeated observations (multiple independent mice) for each phenotype at each age, we summarize the repeated values here, using the mean and the standard deviation.

In [None]:
def get_all_data(dset):
    ages, pheno_mean, pheno_sd = [], [], []
    kyx = md[dset].keys()
    for k in kyx:
        vname, da = get_data(dset, k, standardize=True)
        da = da.sort_values("age")
        # String names avoid the pandas deprecation warning for passing numpy functions
        dg = da.groupby("age")[da.columns[1]].agg(["mean", "std"])
        ages.append(dg.index)
        pheno_mean.append(dg["mean"].values)
        pheno_sd.append(dg["std"].values)
    ages = ages[0]
    d1 = np.vstack(pheno_mean)
    d2 = np.vstack(pheno_sd)
    d1 = pd.DataFrame(d1, index=kyx, columns=["%d_mn" % a for a in ages])
    d2 = pd.DataFrame(d2, index=kyx, columns=["%d_sd" % a for a in ages])
    return pd.concat((d1, d2), axis=1)
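
The core of this function is a group-wise summary that turns a long table (one row per mouse) into one row per phenotype.  A minimal sketch for a single made-up phenotype:

```python
import pandas as pd

# Made-up measurements for one phenotype at three ages
da = pd.DataFrame({"age": [3, 3, 3, 14, 14, 14, 26, 26, 26],
                   "pheno": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.0, 2.0, 2.0]})

# Mean and standard deviation within each age group
dg = da.groupby("age")["pheno"].agg(["mean", "std"])

# Lay the summaries out as one row, labeled like the columns built above
row = pd.concat([dg["mean"].rename(lambda a: "%d_mn" % a),
                 dg["std"].rename(lambda a: "%d_sd" % a)])
print(row)
```

Stacking one such row per phenotype gives the phenotype-by-summary table used in the multivariate analysis.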

Below is an analogous function for the interventional data:

In [None]:
def get_all_data_intervention(dset):
    ages, diets, pheno_mean, pheno_sd = [], [], [], []
    kyx = md[dset].keys()
    for k in kyx:
        vname, da = get_data(dset, k, convert_age=False, standardize=True)
        da = da.sort_values("age")
        # String names avoid the pandas deprecation warning for passing numpy functions
        dg = da.groupby(["age", "diet"])[da.columns[2]].agg(["mean", "std"])
        dg = dg.reset_index()
        dg = dg.sort_values(["age", "diet"])
        ages.append(dg["age"])
        diets.append(dg["diet"])
        pheno_mean.append(dg["mean"].values)
        pheno_sd.append(dg["std"].values)
    ages = ages[0]
    diets = diets[0]
    d1 = np.vstack(pheno_mean)
    d2 = np.vstack(pheno_sd)
    cols = ["%s_%s_mn" % (x, y) for (x, y) in zip(ages, diets)]
    d1 = pd.DataFrame(d1, index=kyx, columns=cols)
    cols = ["%s_%s_sd" % (x, y) for (x, y) in zip(ages, diets)]
    d2 = pd.DataFrame(d2, index=kyx, columns=cols)
    return pd.concat((d1, d2), axis=1)

Below are the aggregated summary statistics for the interventional study.

In [None]:
dci = get_all_data_intervention("Figure5_phenotypes")
dci.head()

Next we use the function constructed above to obtain a dataframe.

In [None]:
dset = "Figure2_phenotypes"
dc = get_all_data(dset)

# Remove a few phenotypes for which the SD cannot be computed (due to n=1)
dc = dc.loc[pd.notnull(dc).all(1), :]

The top few rows of the dataframe are shown below.  Columns ending in "mn" contain means and columns ending in "sd" contain standard deviations.

In [None]:
dc.head()

Below we create a biplot of the phenotype means.  This is an interactive plot so you can use the mouse to identify the ages (green squares) and phenotypes (blue points).

In [None]:
dcmn = dc.loc[:, [not("sd" in x) for x in dc.columns]]
pca = prince.PCA(n_components=2)
pca = pca.fit(dcmn)
pca.plot(dcmn)

Unsurprisingly, the mouse ages (green squares) are arranged in accordance with their numeric values.  In this case, the younger ages (3-8) are on the right side of the plot, the older ages (20-26) are on the left side of the plot, and the middle age (14) is at the top of the plot.

Two phenotypes that fall close together will generally have similar values.  For example, the points "DisTTot" and "WholeAverSpeed" fall near each other in the cell (2, 3) x (1, 2), and their values, shown below, are quite similar.

In [None]:
dcmn.loc[["DisTTot", "WholeAverSpeed"]]

Now we can identify some points that fall close to the 3-month and 5-month points, such as NK and ST110.  These variables peak in the early ages.

In [None]:
dcmn.loc[["NK", "ST110"]]

Phenotypes near the top of the plot peak during middle ages:

In [None]:
dcmn.loc[["Fat_mass_NMR", "B2"]]

Phenotypes that fall on the left side of the plot peak during the later ages.

In [None]:
dcmn.loc[["CD8", "PLT", "Kidney_weight"]]
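
For readers who want to see what lies behind a biplot, the sketch below computes row scores and column loadings using a plain singular value decomposition of a column-centered matrix.  The data are random numbers standing in for a phenotype-by-age table.  This is a generic PCA sketch, not the `prince` implementation, which may apply additional rescaling, so the numbers need not match the plots above.

```python
import numpy as np

# Synthetic stand-in for the table of means: 12 "phenotypes" by 6 "ages"
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))

# Center each column, then factor the centered matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

row_scores = U[:, :2] * s[:2]   # one point per phenotype (rows)
col_loadings = Vt[:2].T         # one point per age (columns)
explained = s**2 / np.sum(s**2) # share of variance for each component

print(row_scores.shape, col_loadings.shape)
print(explained[:2])
```

Rows with similar profiles across the columns receive similar scores, which is why phenotypes with similar age trajectories fall near each other in the biplot.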

Above we only considered the mean value of each phenotype within each age group.  We can also consider the dispersion, as measured by the standard deviation (SD).

In [None]:
pca = prince.PCA(n_components=2)
pca = pca.fit(dc)
pca.plot(dc)

Phenotypes whose scores fall in the upper right corner of the plot, such as the three shown below, have distinct age-specific patterns of mean levels (peaking at 20 months), and distinct patterns of variation (also peaking at 20 months).

In [None]:
dc.loc[["Ekrea", "AP", "conc_IL6"]]