<h1>Dementia unfolding in the HRS cohort, 1992-2012</h1>

Alzheimer's Disease and related dementias (ADRD) are tricky, to put it mildly. Unlike high blood pressure and diabetes, ADRD has no established biomarker. A doctor's diagnosis is the gold standard, but obtaining a diagnosis requires considerable effort. In a survey, it is an open question whether a person with ADRD could report their status accurately. Sometimes proxy interviews, for example with spouses or children, are possible, but not always. 

Public surveys can inform us about ADRD through questions that probe cognition. Ken Langa and colleagues have pioneered a method that infers likely dementia &#8212; and cognitive impairment, often called CIND for "cognitive impairment, no dementia" &#8212; using questions asked in the U.S. Health and Retirement Study (HRS) and other surveillance instruments. [Edwards et al., 2020](https://alz-journals-onlinelibrary-wiley-com.libproxy.berkeley.edu/doi/full/10.1002/alz.12102) use the work of Langa et al., which appears as reference 23 in their paper.

Many aspects of ADRD remain mysterious. A question for research is the degree to which human companionship might be protective against the onset of ADRD. A natural challenge is that cognitive impairment could also be a factor that weakens social ties and potentially marital partnerships, creating reverse causality.

Below is an extract I have drawn from the HRS, with records from wave 1 in 1992 linked to wave 11 in 2012, 20 years later. I have conditioned on:

* Being in a couple household (married, spouse absent, or partnered) in wave 1
* Part of the original HRS cohort
* Present in wave 1 in 1992
* Aged between 50y and 59y in wave 1
* Present in wave 11 in 2012
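
To make the sample restrictions concrete, here is a minimal sketch of how such a filter might look in base R. The data frame `toy` and all of its column names (`couple_w1`, `hrs_cohort`, `age_w1`, `in_w1`, `in_w11`) are hypothetical stand-ins, not the actual RAND HRS variables used to build the extract.

```r
# Toy data with hypothetical column names (NOT the real RAND HRS variables)
toy <- data.frame(
  id         = 1:6,
  couple_w1  = c(1, 1, 0, 1, 1, 1),                    # 1 = couple household in wave 1
  hrs_cohort = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE), # original HRS cohort
  age_w1     = c(52, 61, 55, 54, 59, 50),              # age in wave 1
  in_w1      = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),  # present in wave 1
  in_w11     = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)  # present in wave 11
)

# Apply all five conditions at once
keep <- with(toy,
             couple_w1 == 1 & hrs_cohort & in_w1 &
               age_w1 >= 50 & age_w1 <= 59 & in_w11)

analytic <- toy[keep, ]
analytic$id
```

Each row survives only if it satisfies every condition, which is why the analytic sample describes survivors who were partnered and in their fifties in 1992.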

Some of the people originally observed in 1992 who were either deceased or in nonresponse by 2012 might also have had ADRD. How to adjust for that in a simple way is unclear, and the adjustment is probably unnecessary here. The results that emerge from studying these data are salient only for survivors, but that is a reasonable group to examine.

Here are the variables in the dataset and their descriptions. Many are drawn from the RAND HRS file, where the naming convention is that the prefix `r1` for example means a measure for the respondent in wave 1. The cognitive functioning variables from Langa et al. (reference 23 in [Edwards et al., 2020](https://alz-journals-onlinelibrary-wiley-com.libproxy.berkeley.edu/doi/full/10.1002/alz.12102)) have the year at the end instead.

```
hhidpn           hhid + pn (numeric)
r11grandkids     R11 tot num grandkids, self/sp/dcsd-sp
r1mstat          r1mstat:w1 r marital status
r11mstat         r11mstat:w11 r marital status
h1cpl            h1cpl:w1 whether couple hhold
h11cpl           h11cpl:w11 whether couple hhold
ragender         ragender: r gender
rabyear          rabyear: r birth year
raedyrs          raedyrs: r years of education
r11agey_m        r11agey_m:w11 r age (years) at ivw midmon
r11demene        r11demene:w11 r ever reported dementia
h1atota          h1atota:w1 total of all assets--cross-wave
r1work           r1work:w1 r working for pay
h11child         h11child:w11 number of living children r/p
raraceth         Race/ethnicity: WNH, BNH, Hispanic
cogtot27_imp2012 2012: TICS-m 27-point scale
cogfunction2012  2012: Cognition Category:1=Normal, 2=CIND, 3=Demented
r11widowed       R11 widowed
cf2012_cd        cogfunction2012 registers CIND or dementia
cf2012_d         cogfunction2012 registers dementia
```

The "official" doctor diagnosis of ADRD is measured by the variable `r11demene`, which is based on self-reported diagnoses. ("Has a doctor ever told you that you have Alzheimer’s disease or dementia?") About 2.2% of this sample of 3,710 people reports an ADRD diagnosis.

The Langa et al. measures suggest a higher prevalence than the self-reported diagnoses do. The 27-point score `cogtot27_imp2012` is collapsed into the 3-category measure `cogfunction2012`, which shows 6.1% with likely ADRD and another 19.2% with likely CIND. The binary variables `cf2012_d` and `cf2012_cd` flag likely dementia and likely CIND or dementia, respectively.
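
As a rough illustration of how a 27-point score maps to the three categories and the two binary flags, here is a sketch on made-up scores. The cut points (0-6 demented, 7-11 CIND, 12-27 normal) are an assumption based on the commonly cited classification for self-respondents; the actual `cogfunction2012` variable also involves imputation and separate rules for proxy interviews, which this sketch ignores.

```r
# Made-up scores on the 27-point scale (illustration only, not HRS data)
cogtot27 <- c(7, 14, 12, 19, 17, 13, 5, 10, 27, 0)

# Assumed cut points: 0-6 -> 3 (Demented), 7-11 -> 2 (CIND), 12-27 -> 1 (Normal)
cogfunction <- ifelse(cogtot27 <= 6, 3,
                      ifelse(cogtot27 <= 11, 2, 1))

# Binary flags analogous to cf2012_d and cf2012_cd
cf_d  <- as.integer(cogfunction == 3)   # likely dementia
cf_cd <- as.integer(cogfunction >= 2)   # likely CIND or dementia

# Shares in each category, and the mean of each flag
prop.table(table(cogfunction))
mean(cf_d)
mean(cf_cd)
```

Because the flags are coded 0/1, their means are the sample shares, which is how the 6.1% and 25.3% figures can be read straight off `cf2012_d` and `cf2012_cd`.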

<hr>

In [1]:
library(haven)

In [2]:
hrs92_12_cog <- read_dta("data/hrs92_12_cog.dta")

In [3]:
head(hrs92_12_cog)

hhidpn,r11grandkids,r1mstat,r11mstat,h1cpl,h11cpl,ragender,rabyear,raedyrs,r11agey_m,r11demene,h1atota,r1work,h11child,raraceth,cogtot27_imp2012,cogfunction2012,r11widowed,cf2012_cd,cf2012_d
<dbl>,<dbl>,<dbl+lbl>,<dbl+lbl>,<dbl+lbl>,<dbl+lbl>,<dbl+lbl>,<dbl>,<dbl+lbl>,<dbl>,<dbl+lbl>,<dbl>,<dbl+lbl>,<dbl>,<dbl+lbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
3010,13,1,1,1,1,1,1936,12,76,NA(m),155000,1,5,1,7,2,0,1,0
3020,13,1,1,1,1,2,1938,16,73,0,155000,1,5,1,14,1,0,0,0
10013010,0,1,5,1,0,1,1938,12,74,0,104000,1,1,1,12,1,0,0,0
10038010,4,1,1,1,1,1,1936,16,76,0,1028000,1,2,1,19,1,0,0,0
10059020,6,1,1,1,1,2,1935,16,77,0,890000,0,3,1,17,1,0,0,0
10075020,2,1,7,1,0,2,1937,8,75,0,155100,1,1,3,13,1,1,0,0


<hr>

Here is a simple model of average rates of likely CIND or ADRD in 2012:

In [6]:
cog_reg1 <- lm(cf2012_cd ~ h11cpl 
               + ragender 
               + factor(raraceth)
               + factor(rabyear),
               data = hrs92_12_cog
              )
summary(cog_reg1)


Call:
lm(formula = cf2012_cd ~ h11cpl + ragender + factor(raraceth) + 
    factor(rabyear), data = hrs92_12_cog)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.6515 -0.2398 -0.1600  0.3523  0.9153 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          0.41874    0.04960   8.443  < 2e-16 ***
h11cpl              -0.09273    0.01582  -5.860 5.02e-09 ***
ragender            -0.03580    0.01410  -2.540  0.01113 *  
factor(raraceth)2    0.28902    0.02164  13.354  < 2e-16 ***
factor(raraceth)3    0.25329    0.02371  10.684  < 2e-16 ***
factor(raraceth)4    0.04994    0.04970   1.005  0.31507    
factor(rabyear)1933 -0.02026    0.04929  -0.411  0.68115    
factor(rabyear)1934 -0.02046    0.04816  -0.425  0.67105    
factor(rabyear)1935 -0.08002    0.04799  -1.667  0.09553 .  
factor(rabyear)1936 -0.08839    0.04729  -1.869  0.06169 .  
factor(rabyear)1937 -0.10729    0.04696  -2.285  0.02238 *  
factor(rabyear)1938 -0.12692    0.04707  -2.6

Above, the left-hand-side variable is a 0/1 indicator for likely CIND or ADRD, so each coefficient is a difference in prevalence. The average rate of likely CIND or ADRD in the sample is about 25%:

In [8]:
mean(hrs92_12_cog$cf2012_cd)

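The apparent protective effect of being in a couple household (married or partnered) in 2012 is fairly large: the coefficient on `h11cpl` is about -0.09, or 9 percentage points, which is roughly a third of the prevalence. Given the reverse-causality concern raised earlier, this is best read as an association rather than a causal effect.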
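
The "about a third" figure is simple arithmetic on numbers already quoted above. A quick check, using the `h11cpl` estimate from the regression output and the two prevalence shares reported earlier:

```r
coef_cpl   <- -0.09273        # h11cpl estimate from the regression output
prevalence <- 0.061 + 0.192   # likely ADRD share + likely CIND share

# Size of the coefficient relative to the overall prevalence
relative_size <- abs(coef_cpl) / prevalence
round(relative_size, 2)
```

The ratio comes out a little above one third.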

Does controlling for years of education change anything? What about other variables?
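
One way to explore that question is to refit the model with `raedyrs` added and compare the `h11cpl` coefficient across the two fits. The sketch below runs on simulated data (the data frame `sim` is invented and its outcome is pure noise), so it shows only the mechanics, not what the HRS extract would yield.

```r
# Simulated stand-in for the extract (NOT real HRS data)
set.seed(1)
n <- 500
sim <- data.frame(
  h11cpl   = rbinom(n, 1, 0.7),
  ragender = sample(1:2, n, replace = TRUE),
  raraceth = sample(1:3, n, replace = TRUE),
  rabyear  = sample(1933:1942, n, replace = TRUE),
  raedyrs  = sample(8:17, n, replace = TRUE)
)
sim$cf2012_cd <- rbinom(n, 1, 0.25)   # outcome unrelated to the regressors

# Baseline specification, mirroring the model above
base_fit <- lm(cf2012_cd ~ h11cpl + ragender + factor(raraceth) + factor(rabyear),
               data = sim)

# Same specification plus years of education
edu_fit <- update(base_fit, . ~ . + raedyrs)

# Compare the couple-household coefficient across the two fits
c(base = unname(coef(base_fit)["h11cpl"]),
  with_educ = unname(coef(edu_fit)["h11cpl"]))
```

With the real data, replacing `sim` with `hrs92_12_cog` would show whether the `h11cpl` estimate shrinks once education is held fixed.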

<div style="text-align: right"> <span style="font-family:Papyrus; ">And they lived happily ever after. The End.</span></div>