# Are the half-lives of small and large intestine macrophage subpopulations different and what are they?

Administration of tamoxifen allows the Cre enzyme to translocate into the nucleus, where it irreversibly induces YFP expression in CX3CR1+ cells, strongly labeling CX3CR1+ gut macrophages. Adult animals 8–9 wk of age were given tamoxifen orally for five consecutive days to ensure robust and irreversible induction of YFP. Any cells developing from blood monocytes after tamoxifen is withdrawn will express the Cre enzyme, but it stays latent in the cytoplasm, so these cells remain YFP– (Shaw et al. 2018).

YFP+ blood monocytes are replaced rapidly by YFP- monocytes from the bone marrow.
As the resident YFP+ macrophages are depleted from the gut, they will be replaced by YFP- macrophages developing from YFP- blood monocytes. 

The proportion of cells in each subpopulation expressing YFP has been measured over 20 weeks. By fitting a model to their decay curves we can estimate the rate $r$ at which each subpopulation is lost from the gut and, by a simple transformation ($t_{1/2} = \ln 2 / r$), estimate its half-life.

Starting with the small intestine, determine if the decay rates of the three subpopulations of macrophages differ significantly, estimate their decay rates (including 95% confidence intervals), and estimate their half-lives. 

The data are in the file `Data/yfp_SI.csv`.

### Import plotting and analysis modules

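Make sure to run this code cell to import the necessary modules.

```python
import pandas as pd                        # data frames and CSV reading
import seaborn as sns                      # plotting
from numpy import log                      # natural log, usable inside model formulas
from statsmodels.formula.api import ols    # ordinary least squares with R-style formulas
```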

### Read in the YFP data for the small intestine and print it to see what it looks like

There are five columns in the dataset:

- **intestine**: small or large intestine
- **cell_type**: cell type, one of CD4- Tim4-, CD4+ Tim4- and CD4+ Tim4+
- **week**: weeks post tamoxifen treatment
- **mouse**: mouse ID
- **yfp**: the proportion of that cell type expressing YFP
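
A minimal sketch of reading and inspecting the table with pandas. So that the cell runs anywhere, a few invented rows in an in-memory CSV stand in for `Data/yfp_SI.csv`; the mouse IDs and values are made up and are not the real measurements. In the notebook you would pass the file path to `pd.read_csv` instead, as shown in the comment.

```python
import io

import pandas as pd

# Invented rows with the same five columns as Data/yfp_SI.csv (NOT real data)
example_csv = io.StringIO(
    "intestine,cell_type,week,mouse,yfp\n"
    "small,CD4- Tim4-,0,m01,0.50\n"
    "small,CD4+ Tim4-,0,m01,0.55\n"
    "small,CD4+ Tim4+,0,m01,0.60\n"
    "small,CD4- Tim4-,4,m02,0.15\n"
    "small,CD4+ Tim4-,4,m02,0.35\n"
    "small,CD4+ Tim4+,4,m02,0.58\n"
)

# In the notebook: yfp_SI = pd.read_csv('Data/yfp_SI.csv')
yfp_SI = pd.read_csv(example_csv)

print(yfp_SI)                              # the whole table
print(yfp_SI.dtypes)                       # check that week and yfp were read as numbers
print(yfp_SI['cell_type'].value_counts())  # number of observations per cell type
```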

## Plot the data of all cell types

Next, plot the untransformed data as line and scatter plots. This is a first check to see what the data look like.

It's okay to copy, paste and adapt the code from the lecture notebook. Click [here](lecture.ipynb#now-compare-drugs-A-and-B) to go to the correct code cell.

## Eye-ball the data and think what it means

Before proceeding with the formal analysis, have a look at these data closely and think about what's happening. Here are some things to think about.

1. Which cell types are decaying faster and which slower?
2. Are some cell types decaying at the same rate?
3. Can you take a guess at the half-lives?
4. Why don't the curves all start at 1 (i.e., 100% of each cell type expressing YFP)?
5. Do you think these curves exhibit exponential decay?

## Do small intestine macrophages decay exponentially? 

The next thing to check is that the YFP+ macrophage subpopulations are actually decaying exponentially. 

Remember from the lecture that to do this you have to log-transform the response variable (i.e., the proportion of YFP+ cells) and plot it against time. If a cell type decays exponentially, its points should fall on a straight line.

Again, it's okay to copy, paste and adapt the code from the lecture notebook.

Does this look like exponential decay?

## Fit a linear model

Having convinced yourself that YFP+ macrophages decay exponentially, you need to fit a linear model to the log-transformed data to:

1. Estimate decay rates
2. Test if cell types differ significantly in their decay rates
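
A minimal sketch of the fit with `statsmodels`, on simulated stand-in data with invented rates (not the real data). The formula `np.log(yfp) ~ week * cell_type` gives each cell type its own intercept and slope on the log scale; with the notebook's own `from numpy import log`, writing `log(yfp)` in the formula works the same way.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# SIMULATED stand-in for Data/yfp_SI.csv: invented rates (per week), weeks and noise
rng = np.random.default_rng(0)
rates = {'CD4- Tim4-': 0.30, 'CD4+ Tim4-': 0.10, 'CD4+ Tim4+': 0.02}
yfp_SI = pd.DataFrame([
    {'intestine': 'small', 'cell_type': cell, 'week': week, 'mouse': f'w{week}m{i}',
     'yfp': 0.6 * np.exp(-r * week + rng.normal(0, 0.1))}
    for cell, r in rates.items()
    for week in [0, 2, 4, 8, 12, 16, 20]
    for i in range(3)
])

# week * cell_type expands to Intercept + cell_type + week + week:cell_type,
# i.e. a separate intercept and slope for each cell type
fit = ols('np.log(yfp) ~ week * cell_type', data=yfp_SI).fit()
print(fit.summary())
```

By default the alphabetically first cell type is the reference, so its slope is the `week` row and the other rows are differences from it.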

## Think about the output of the linear model

Make sure you understand the output of the linear model. Each row corresponds to an estimated model parameter and a statistical test (a *t*-test) of whether that estimate is significantly different from zero.



Column name | Description
:-- | :--
column 1 | The name of the parameter being estimated
coef | The estimated value of that parameter
std err | The standard error of the estimate (its uncertainty)
t | The *t*-statistic ("coef" divided by "std err")
P>\|t\| | The probability (*p*-value) of obtaining a *t*-statistic at least as extreme as the one observed, assuming that the parameter is actually zero
[0.025 0.975] | The lower and upper bounds of the 95% confidence interval of the estimate

If the 95% CI of a parameter (for example "Intercept") contains zero, then its *p*-value will be greater than 0.05, and the parameter is not significantly different from zero at the 5% level.

Conversely, if the 95% CI does not contain zero, then the *p*-value will be less than 0.05, and the parameter is significantly different from zero.

Note that the standard error and the 95% CI are both measures of uncertainty in the estimated parameter value. Either or both can be reported, but at least one should be, to show how certain (or uncertain) you are of the estimate.
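
The columns of the summary table are also available on the fitted results object (`params`, `bse`, `tvalues`, `pvalues` and `conf_int()` in statsmodels). The sketch below, on simulated stand-in data with invented rates, rebuilds the table as a DataFrame and checks the two relationships described above: *t* is coef divided by std err, and a 95% CI excludes zero exactly when *p* < 0.05.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# SIMULATED stand-in for Data/yfp_SI.csv: invented rates (per week), weeks and noise
rng = np.random.default_rng(0)
rates = {'CD4- Tim4-': 0.30, 'CD4+ Tim4-': 0.10, 'CD4+ Tim4+': 0.02}
yfp_SI = pd.DataFrame([
    {'intestine': 'small', 'cell_type': cell, 'week': week, 'mouse': f'w{week}m{i}',
     'yfp': 0.6 * np.exp(-r * week + rng.normal(0, 0.1))}
    for cell, r in rates.items()
    for week in [0, 2, 4, 8, 12, 16, 20]
    for i in range(3)
])
fit = ols('np.log(yfp) ~ week * cell_type', data=yfp_SI).fit()

ci = fit.conf_int(alpha=0.05)  # column 0 = lower bound, column 1 = upper bound
table = pd.DataFrame({
    'coef': fit.params,
    'std err': fit.bse,
    't': fit.tvalues,
    'P>|t|': fit.pvalues,
    '[0.025': ci[0],
    '0.975]': ci[1],
})
print(table.round(4))

# t is coef / std err
t_matches = np.allclose(table['t'], table['coef'] / table['std err'])
# A 95% CI that excludes zero goes with p < 0.05, and vice versa
ci_excludes_zero = (table['[0.025'] > 0) | (table['0.975]'] < 0)
ci_agrees_with_p = (ci_excludes_zero == (table['P>|t|'] < 0.05)).all()
print(t_matches, ci_agrees_with_p)
```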

The model parameters describe the straight lines in the log-linear plot of log-proportion of YFP+ cells against time. You need two pieces of information to describe a straight line: the *y*-intercept, where the line crosses the *y*-axis, and the slope (or gradient) of the line. There are three lines in your log-linear plot, one for each cell type, so there are six parameters to estimate in total and six rows in the summary table of the linear model.

See if you can answer the following questions.

1. Which rows are *y*-intercept parameters?
2. Which rows are slope parameters?
3. What is the reference cell type?
4. Which rows correspond to the reference cell type?
5. Which rows are estimates of the differences between a cell type and the reference cell type?
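
Once you have answered these, you can check your reading of the table with a sketch like the one below. It assumes the default treatment coding of the statsmodels formula interface (the alphabetically first cell type is the reference, and the other rows are differences from it) and coefficient names of the form `cell_type[T.<level>]` and `week:cell_type[T.<level>]`, as printed in the summary. Each reconstructed line is compared with a separate straight-line fit to that cell type alone; with a full interaction model the point estimates should agree. The data are a simulated stand-in with invented rates.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# SIMULATED stand-in for Data/yfp_SI.csv: invented rates (per week), weeks and noise
rng = np.random.default_rng(0)
rates = {'CD4- Tim4-': 0.30, 'CD4+ Tim4-': 0.10, 'CD4+ Tim4+': 0.02}
yfp_SI = pd.DataFrame([
    {'intestine': 'small', 'cell_type': cell, 'week': week, 'mouse': f'w{week}m{i}',
     'yfp': 0.6 * np.exp(-r * week + rng.normal(0, 0.1))}
    for cell, r in rates.items()
    for week in [0, 2, 4, 8, 12, 16, 20]
    for i in range(3)
])
fit = ols('np.log(yfp) ~ week * cell_type', data=yfp_SI).fit()

levels = sorted(yfp_SI['cell_type'].unique())  # default level order for a text column
reference = levels[0]                           # 'CD4+ Tim4+' for these labels

lines = {}
for level in levels:
    intercept = fit.params['Intercept']  # ln(y0) of the reference cell type
    slope = fit.params['week']           # -r of the reference cell type
    if level != reference:
        # The remaining rows are differences from the reference line
        intercept += fit.params[f'cell_type[T.{level}]']
        slope += fit.params[f'week:cell_type[T.{level}]']
    lines[level] = (intercept, slope)

# Compare with a separate straight-line fit to each cell type on its own
separate = {}
for level in levels:
    alone = ols('np.log(yfp) ~ week', data=yfp_SI[yfp_SI['cell_type'] == level]).fit()
    separate[level] = (alone.params['Intercept'], alone.params['week'])
    print(level, np.round(lines[level], 4), np.round(separate[level], 4))
```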

## Check the fit of the model by looking at the residuals

Always check the residuals against the explanatory variables to make sure they do not show any structure. If they do, you are probably fitting the wrong model.
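
A sketch of the two residual plots, on simulated stand-in data with invented rates (not the real data). `fit.resid` is indexed like the input rows, so it can be attached to the DataFrame directly; the column name `resid` is chosen here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from statsmodels.formula.api import ols

# SIMULATED stand-in for Data/yfp_SI.csv: invented rates (per week), weeks and noise
rng = np.random.default_rng(0)
rates = {'CD4- Tim4-': 0.30, 'CD4+ Tim4-': 0.10, 'CD4+ Tim4+': 0.02}
yfp_SI = pd.DataFrame([
    {'intestine': 'small', 'cell_type': cell, 'week': week, 'mouse': f'w{week}m{i}',
     'yfp': 0.6 * np.exp(-r * week + rng.normal(0, 0.1))}
    for cell, r in rates.items()
    for week in [0, 2, 4, 8, 12, 16, 20]
    for i in range(3)
])
fit = ols('np.log(yfp) ~ week * cell_type', data=yfp_SI).fit()

resid = yfp_SI.assign(resid=fit.resid)  # residuals line up with the rows by index

fig, (ax_week, ax_type) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
sns.scatterplot(data=resid, x='week', y='resid', hue='cell_type', ax=ax_week)
sns.stripplot(data=resid, x='cell_type', y='resid', ax=ax_type)
for ax in (ax_week, ax_type):
    ax.axhline(0, color='grey', linestyle='--')  # residuals should scatter evenly around zero

# Each cell type has its own intercept, so its residuals average to zero
print(resid.groupby('cell_type')['resid'].mean())
```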

What do you conclude about the model fit?

Residuals against week look okay. They are spread between -2 and +2 as we would expect and there is no major trend or deviation across weeks.

Residuals against cell type are okay for CD4- Tim4- and CD4+ Tim4- cells. However, the residuals for CD4+ Tim4+ have a narrower distribution, between -1 and +1. This is probably because the proportion of these cells is close to zero, causing the observations to bunch up. Nevertheless, this is unlikely to affect the linear model fit and the parameter estimates much.

## Calculate the estimated decay rate and half-life for each macrophage subpopulation

Using the values in the model fit summary table, calculate the estimated decay rates and half-lives.

The linear model estimates the uncertainties in the decay rates: these are the standard errors and 95% CIs. Unfortunately, because half-life is inversely proportional to the decay rate ($t_{1/2} = \ln 2 / r$), its uncertainty is not as simple to calculate from the model output, so we won't do that here.

> Hint: You may want to change the reference cell type in your linear model. This is so you can
>
>    1. get the correct standard errors for each macrophage subpopulation
>    2. get the simplest description of which decay rates differ significantly from other decay rates 
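
A sketch of the hint: refit the model with each cell type in turn as the reference, so that cell type's slope is the `week` row, with its own standard error and 95% CI. It changes the reference by reordering the categories of a pandas `Categorical` (the first category becomes the reference); patsy's `C(cell_type, Treatment(reference=...))` in the formula is another way. The decay rate is the negated slope, so the CI bounds swap as well. The data are a simulated stand-in with invented rates, and the helper `decay_table` is hypothetical, not part of any library.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# SIMULATED stand-in for Data/yfp_SI.csv: invented rates (per week), weeks and noise
rng = np.random.default_rng(0)
rates = {'CD4- Tim4-': 0.30, 'CD4+ Tim4-': 0.10, 'CD4+ Tim4+': 0.02}
yfp_SI = pd.DataFrame([
    {'intestine': 'small', 'cell_type': cell, 'week': week, 'mouse': f'w{week}m{i}',
     'yfp': 0.6 * np.exp(-r * week + rng.normal(0, 0.1))}
    for cell, r in rates.items()
    for week in [0, 2, 4, 8, 12, 16, 20]
    for i in range(3)
])

def decay_table(df):
    """Hypothetical helper: decay rate r (per week), its std err and 95% CI, and
    half-life ln(2)/r (weeks) per cell type, refitting with each as reference."""
    levels = sorted(df['cell_type'].unique())
    rows = []
    for ref in levels:
        order = [ref] + [lvl for lvl in levels if lvl != ref]
        # The first category of a pandas Categorical becomes the reference level
        data = df.assign(cell_type=pd.Categorical(df['cell_type'], categories=order))
        fit = ols('np.log(yfp) ~ week * cell_type', data=data).fit()
        slope_low, slope_high = fit.conf_int().loc['week']
        rate = -fit.params['week']  # the slope of the reference line is -r
        rows.append({
            'cell_type': ref,
            'rate': rate,
            'std_err': fit.bse['week'],
            'rate_ci_low': -slope_high,   # negating the slope flips the interval
            'rate_ci_high': -slope_low,
            'half_life': np.log(2) / rate,
        })
    return pd.DataFrame(rows).set_index('cell_type')

result = decay_table(yfp_SI)
print(result.round(3))
```

The same refits also show which differences are significant: with a given reference, the `week:cell_type[T.<level>]` rows test each other cell type's slope against it.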

## Summarise your findings in words

# Decay rates and half-lives of macrophages in the large intestine

Now do the same analysis for large intestine macrophages. The data are in the file `Data/yfp_LI.csv`.

> Summarise your findings

# Does macrophage turnover occur at the same rate in the small and large intestine?

The proportions of macrophages expressing YFP in the small and large intestines over time are in the file `Data/yfp_both.csv`. The CD4- Tim4-, CD4+ Tim4- and CD4+ Tim4+ subpopulations have been pooled to give a total.

Determine if the macrophage decay rates are significantly different between the small and large intestines.

There are four columns in the dataset:

- **intestine**: small or large intestine
- **week**: weeks post tamoxifen treatment
- **mouse**: mouse ID
- **yfp**: the proportion of macrophages expressing YFP
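
A sketch of the comparison, using the same kind of model with `intestine` in place of `cell_type`. The `week:intestine` interaction row is the difference in slopes between the two intestines, so its *p*-value tests whether the decay rates differ. The data are a simulated stand-in for `Data/yfp_both.csv` with invented rates, not the real measurements.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# SIMULATED stand-in for Data/yfp_both.csv: invented rates (per week), weeks and noise
rng = np.random.default_rng(2)
rates = {'small': 0.10, 'large': 0.20}
yfp_both = pd.DataFrame([
    {'intestine': gut, 'week': week, 'mouse': f'w{week}m{i}',
     'yfp': 0.6 * np.exp(-r * week + rng.normal(0, 0.1))}
    for gut, r in rates.items()
    for week in [0, 2, 4, 8, 12, 16, 20]
    for i in range(3)
])

# Separate intercept and slope for each intestine ('large' is the reference here)
fit = ols('np.log(yfp) ~ week * intestine', data=yfp_both).fit()
print(fit.summary())

# Difference in slopes (small minus large) and its test
term = [name for name in fit.params.index if name.startswith('week:intestine')][0]
print(term, fit.params[term], fit.pvalues[term], fit.conf_int().loc[term].tolist())
```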

> Summarise your findings

---

# The maths of exponential decay (ignore if you want to)

### Why a plot of $\ln y$ against time $t$ is a straight line for exponential decay (and growth)

Exponential decline is described by the equation

$$y(t) = y_0 e^{-rt}$$

where, for example, $y(t)$ is the proportion of YFP+ macrophages at time $t$, $y_0$ is the proportion of YFP+ macrophages at time $t=0$, and $r$ is the decay rate. Taking natural logs ($\ln=\log_e$) on both sides and using the laws of logs to simplify:

$$
\begin{align}
\ln y(t) &= \ln\left(y_0 e^{-rt}\right) \\
\ln y(t) &= \ln y_0 + \ln e^{-rt} \\
\ln y(t) &= \ln y_0 -rt \ln e \\
\ln y(t) &= \ln y_0 -rt \\
\end{align}
$$

Let's compare the terms of this equation with those of the equation of a straight line:

$$Y=c+mX$$

where $c$ is the *Y*-intercept (the value of $Y$ when $X=0$) and $m$ is the slope or gradient of the line. By comparing terms in this equation with the last line of the derivation above, $\ln y(t) = \ln y_0 - rt$, we can see that $Y=\ln y(t)$, $c=\ln y_0$, $m=-r$ and $X=t$.

So if YFP+ macrophages decay exponentially, plotting the log of YFP+ proportion against time will result in a straight line. The $y$-intercept of this line will be the log of YFP+ proportion at time zero, and the gradient of this line will be the negative rate of decay, i.e., $-r$.
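
A quick numerical check of this argument, with arbitrary illustrative values of $y_0$ and $r$: generate noise-free exponential decay, fit a straight line to $(t, \ln y)$ with `np.polyfit`, and recover $m=-r$ and $c=\ln y_0$.

```python
import numpy as np

y0, r = 0.8, 0.25          # arbitrary illustrative values
t = np.arange(0, 21)       # time points 0, 1, ..., 20
y = y0 * np.exp(-r * t)    # exact exponential decay, no noise

# Fit Y = c + m X to (t, ln y); polyfit returns the highest power first: [m, c]
m, c = np.polyfit(t, np.log(y), 1)
print(m, c, np.log(y0))    # m should equal -r and c should equal ln(y0)
```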

### Why $\frac{\ln 2}{r}$ is the half-life $t_{1/2}$

Take the equation of exponential decay:

$$y(t) = y_0 e^{-rt}$$

Let $t_{1/2}$ be the time it takes the proportion of YFP+ macrophages to halve, that is

$$\frac{y(t+t_{1/2})}{y(t)} = \frac{1}{2}$$

Substituting the equation for exponential decay into this equation gives

$$\frac{y_0 e^{-r(t+t_{1/2})}}{y_0 e^{-rt}} = \frac{1}{2}$$

Cancelling the common factors $y_0$ and $e^{-rt}$ gives

$$e^{-rt_{1/2}} = \frac{1}{2}$$

Taking natural logs of both sides, and using $\ln\frac{1}{2} = -\ln 2$:

$$-rt_{1/2} = -\ln2$$

Therefore,

$$t_{1/2} = \frac{\ln2}{r}$$
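
A numerical check, with arbitrary illustrative values: after one half-life $\ln 2 / r$ the proportion has halved, whatever the starting time $t$.

```python
import numpy as np

y0, r = 0.8, 0.25            # arbitrary illustrative values
t_half = np.log(2) / r       # about 2.77 time units

def y(t):
    """Exponential decay y(t) = y0 * exp(-r t)."""
    return y0 * np.exp(-r * t)

# The ratio y(t + t_half) / y(t) is 1/2 for any starting time t
ratios = [y(t + t_half) / y(t) for t in (0.0, 3.0, 10.0)]
print(t_half, ratios)
```

With $r$ in per week, as in the model fits above, $t_{1/2}$ is in weeks.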