Vitamin A Deficiency: Start Vivarium modeling section #149

NathanielBlairStahn · 2020-03-05T19:00:27Z

It would be great to get some feedback about the following:

Is my description of the "propensity model" accurate? What about the different terms I used to refer to it ("exposure model" and "prevalence-only model")?
Does my rationale for using this model sound accurate?
What needs to go in the data tables and what should the table format be?

beatrixkh · 2020-03-05T19:12:15Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+the population with serum retinol concentration <0·7 μmol/L. Like iron
+deficiency, the cause VAD is also a population attributable fraction (PAF) of 1
+cause with the VAD risk factor. That is, 100% of the VAD cases are attributable
+to the VAD risk factor. VAD Risk exposure and VAD cause prevalence data are the


Does this mean that all data that goes into the GBD exposure model is the same as the data that goes into the GBD prevalence model, or that everyone with VAD exposure also has the VAD cause? (i.e., is this about GBD model inputs or outputs?)

Definitely address. PAF of one can mean many distinct things.

This means the GBD outputs are the same. There is only one VAD model, not separate models for the risk and cause. I can add some wording to clarify this.

beatrixkh · 2020-03-05T21:53:28Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+of vitamin A deficiency at each time step to determine whether the simulant has
+VAD during that time step. Each simulant's propensity is assigned only once, but
+the underlying prevalence distribution can change throughout the course of the
+simulation, which may result in a change in the simulant's vitamin A status.


Even if the underlying prevalence distribution stayed the same over the course of the simulation, why wouldn't the simulants' VAD statuses randomly hop around, if there were multiple "solutions" to fit one set of propensities and a prevalence distribution? (Or do we not care if this happens, because on average the population looks the same?)

oh i just read your algorithm; nevermind!

collijk · 2020-03-05T22:09:55Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+
+In Global Burden of Disease (GBD) 2017, VAD exposure definition is proportion of
+the population with serum retinol concentration <0·7 μmol/L. Like iron
+deficiency, the cause VAD is also a population attributable fraction (PAF) of 1


I would not bring iron deficiency into this model. Iron is a more complex example of a PAF-of-1 relationship, and pointing readers there is not going to help them understand this one.

Ok, I can edit this.

aflaxman

V&V: this model should get prevalence and YLDs right (meaning the prevalence and YLDs in the sim should match that in GBD). It will not necessarily get incidence and remission right, and we should take a look at how wrong it is, to note in the limitations section.

Limitations: In addition to probably not getting incidence and remission of VAD right, this model has a particular implication about who does not get VAD. GBD has estimated that the prevalence of VAD is around 30% and the duration until remission is around 1 year. GBD has not estimated what fraction of the population will have ever had VAD over a time longer than a year, however. Will most kids have experienced VAD by the time they are five? Or are the same 60% cycling in and out of VAD to maintain the 30% prevalence and 1 year duration? Probably something in between these extremes, but we have no data on this yet, and we don't have guidance from GBD about how to do it. So it is hard to even know how wrong our model is when we don't get remission right, let alone how much it matters for quantifying the impact of LSFF.
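To make the two extremes concrete, here is a rough back-of-the-envelope sketch. It assumes (this is an assumption, not something GBD provides) a two-state model in steady state with constant rates, a remission rate of 1 per person-year and 30% prevalence, and compares it with the exposure model under a constant prevalence. All variable names are hypothetical.

```python
import math

prevalence = 0.30      # approximate VAD prevalence quoted above
remission_rate = 1.0   # per person-year, i.e. a duration of roughly 1 year
years = 5.0

# Steady state of a two-state (S <-> I) model with constant rates (assumed):
# incidence_rate * (1 - prevalence) == remission_rate * prevalence
incidence_rate = remission_rate * prevalence / (1.0 - prevalence)

# Rate-based model: a simulant has never had VAD after `years` only if they
# start without VAD and experience no incident event during that period.
never_had_rate_model = (1.0 - prevalence) * math.exp(-incidence_rate * years)
ever_had_rate_model = 1.0 - never_had_rate_model

# Exposure (propensity) model with a constant prevalence: the same simulants
# (those with propensity below the prevalence) have VAD the whole time.
ever_had_exposure_model = prevalence

print(f"rate-based model, ever had VAD within {years:g} years: {ever_had_rate_model:.2f}")
print(f"exposure model, ever had VAD within {years:g} years:   {ever_had_exposure_model:.2f}")
```

Under these assumptions the rate-based model has roughly 92% of simulants experiencing VAD within five years, versus 30% for the exposure model with an unchanging prevalence. The truth is presumably somewhere between the two.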

collijk · 2020-03-07T07:26:52Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+  * - Measure
+    - ID
+    - Data source
+  * - Remission


Incidence and remission are measures for the cause.

I thought we decided not to use them at all.

Adding a todo to address this.

collijk · 2020-03-07T07:29:15Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

 :ref:`lower respiratory infections <2017_cause_lower_respiratory_infections>`,
-:ref:`diarrhoeal diseases <2017_cause_diarrhea>`, :ref:`measles
+:ref:`diarrhoeal diseases <2017_cause_diarrhea>`, and :ref:`measles
 <2017_cause_measles>`. The relative risks for these causes appear in Table 4 on


I think I would put a big note here that LRI is dropped in the most recent round of GBD due to insufficient causal criteria and mention it again in your limitations. It's worth pointing out to the client.

Adding a note about this.

collijk · 2020-03-07T07:31:17Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+  :widths: 15 13 15 15
+  :header-rows: 1
+
+  * - Cause


You're welcome to include the actual data means and uncertainties here, but unless you want to do something other than use the draw-level RRs from GBD, put the rei_id and cause_id associated with the risk-outcome pair you're using in the table.

Adding a todo to reformat the RR table.

collijk · 2020-03-07T07:34:51Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+    - 3.65 (2.23 - 5.97)
+    - No (only one study)
+
+The above relative risks for GBD 2017 should be interpreted as rate ratios for


This is good. I think you should have the interpretation as a column in the table though. A single risk factor may have different kinds of effects on different outcomes and should be specified pairwise. Also include the numerator and denominator or a link to what rate ratio means (I think we made a glossary in the documentation).

Adding a todo to reformat the RR table.

collijk · 2020-03-07T07:37:29Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

 Vivarium Modeling Strategy
 --------------------------

+We will use an **exposure model** (or **prevalence-only model** or **propensity


I think exposure model (vs. cause model) is an appropriate way to refer to this. We should put a description of this in the general risk documentation.

Ok, this brings up another point I wanted to check in about: Does this mean the VAD documentation should not include a cause model diagram because the underlying algorithm is different from a standard cause model? Or is a cause model diagram still useful?

I think a simple SIS diagram actually still makes sense in terms of representing what states a simulant can be in (I = has VAD / S = does not have VAD) and what transitions are possible (incidence = simulant develops VAD due to change in exposure distribution / remission = simulant recovers from VAD due to change in exposure distribution).

Is it useful to have a diagram to visualize this, or would that be confusing because it's not a rate-based model?

I think a diagram with risk categories that map into disease states makes sense. I would hesitate to put transitions on the diagram (or if we put them, we need to be super clear that they're implicit and based on resampling risk exposure).

collijk · 2020-03-07T07:38:11Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

 Vivarium Modeling Strategy
 --------------------------

+We will use an **exposure model** (or **prevalence-only model** or **propensity
+model**) for a vitamin A deficiency, in which each simulant is initialized with a "propensity" for vitamin A deficiency, and the simulant's vitamin A status is determined by comparing this


I don't know why we call this propensity except out of habit. I think it might be clearer just to call it a percentile.

For use with our standard sampling technique: https://en.wikipedia.org/wiki/Inverse_transform_sampling

Ugh, it's not a percentile -- it's the inverse of the percentile/quantile under the quantile function, i.e. the probability of lying below the corresponding quantile. I can't find a standard terminology for this. Perhaps "continuous rank statistic"?

See https://en.wikipedia.org/wiki/Quantile and https://en.wikipedia.org/wiki/Quantile_function

Man, I've been thinking about percentile wrong for a while. My stats professors would be ashamed.

There really isn't a name for this. Huh.

Aha! Apparently it's called the percentile rank (or presumably quantile rank if we're talking about values in [0,1] instead of [0,100]).

So if Q is the quantile function and p is a probability, then q = Q(p) is a quantile with quantile rank p, i.e. a p-quantile. Or in reverse, if F is the CDF and q is an arbitrary value of the random variable, then the probability p = F(q) is the quantile rank of q, so q is a p-quantile. (I'm ignoring some subtleties about nonuniqueness when F is discontinuous or has constant pieces.)
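A small sketch of the relationship between Q and F described above, using a standard normal distribution purely as an illustrative example (the choice of distribution and the variable names are assumptions, not part of the VAD model):

```python
from statistics import NormalDist

# Arbitrary continuous distribution chosen for illustration only.
dist = NormalDist(mu=0.0, sigma=1.0)

p = 0.3               # a probability, i.e. a quantile rank
q = dist.inv_cdf(p)   # Q(p): the p-quantile
p_back = dist.cdf(q)  # F(q): the quantile rank of q, which recovers p

print(q, p_back)
```

For a continuous, strictly increasing CDF the round trip recovers p up to floating-point error; the nonuniqueness subtleties mentioned above do not arise in this example.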

I've been confused about this too. But it's the statistics professors' fault! For not laying out the definitions clearly.

collijk · 2020-03-07T07:51:46Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+model**) for a vitamin A deficiency, in which each simulant is initialized with a "propensity" for vitamin A deficiency, and the simulant's vitamin A status is determined by comparing this
+propensity to the overall VAD exposure/prevalence in the population.
+Such
+propensity/exposure models have been used in Vivarium for other risk factors and


They are used for all other risk models. It is the standard. They are infrequently used for cause models because we usually trust the dynamic disease parameters more or we care about counting cases.

Adding a todo to reword this.

collijk · 2020-03-07T07:52:25Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+
+In more detail, the basic strategy is to initialize each simulant with a
+propensity score distributed uniformly in [0,1], then compare this propensity
+score with the (location/age/sex/year/intervention-status)-dependent prevalence


Use exposure throughout.

What do you mean? Do you mean I should call this the "(location/age/sex/year/intervention-status)-dependent exposure distribution," which in this case happens to represent prevalence of VAD?

Yeah, I think in a risk-based model, we should explicitly use exposure since the disease prevalence is derived from it. As opposed to a PAF of 1 model of VitA in which the disease model was the source of truth and the risk exposure was derived from prevalence.

Added some wording and a todo to reword everything to ensure consistent terminology.

I guess I was thinking about this from the GBD perspective - my impression is that they ran a DisMod model to get the cause data, and just copied the prevalence data over to the risk factor to get exposure.

Makes sense. As long as you're consistent, it seems fine either way.

collijk · 2020-03-07T07:54:29Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+remission data for vitamin A deficiency, but only *prevalence* (which is the
+same as the exposure data for the VAD risk factor). The rationale for this approach is twofold:
+
+1.  We want to guarantee that the simulated baseline prevalence of vitamin A


We are only doing this because of option 2. We expect the cause version of the model to get incidence and remission correct in addition to getting prevalence correct.

However, it is more important for us to use the intervention data correctly than it is to get the dynamic parameters of the disease correct. Details about the limitations and the expected impact to be found in .

With regard to (1), there's also the concern mentioned in @aflaxman's comment above: If we use the incidence and remission data from GBD, then basically the whole population will end up getting VAD if we run the simulation for 5 years, but we suspect this would not be representative of reality.

I added a todo to clarify this.

collijk · 2020-03-07T07:56:51Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+
+  Explain why the prevalence-only model is a reasonable strategy, citing
+  incidence, remission, and prevalence data, as well as expert opinions about
+  VAD. (Perhaps this explanation should come later, e.g. in the Assumptions and


Assumptions and limitations is right, I think. That's where I'd look for the justification.

collijk · 2020-03-07T08:05:45Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+
+1.  **Initialize:** When simulant :math:`i` enters the simulation (e.g. at the
+    start of the simulation or at the time step when the simulant is born),
+    assign the simulant a random number :math:`v_i \sim


I think we should standardize the notation for these things. I follow scipy conventions in the code and I think they're reasonable:

x_i - the random variable (the exposure)
q_i - the percentile or propensity of the exposure x_i in the distribution (called q because the inverse cdf is the quantile function, though also called the percent point function).

Initialize should also describe how we actually get x_i.

@collijk In this case x_i would be a binary variable, "has VAD"/"does not have VAD", correct? How would you initialize this before following the procedure in the "Update" step? I assumed the "update" part would first happen during the same time step as initialization.

The procedure is the same as in update, but it has to happen before a time step takes place. The value of other attributes the simulant is initialized with (e.g. whether or not they are receiving vitamin a fortification) may be dependent on their initial vitamin a deficiency status. This is a fencepost error (https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error). All attributes of a simulant must be assigned an initial value before the first time step starts or you introduce extremely hairy issues into the order you must update simulant state each time step.

Here's the procedure:

Initialization:

Sample the propensity and store it for all time.

Use the attributes that the initial distribution is conditional on (age, sex, year) to construct the probability mass function (pmf).

Use the propensity to sample from the pmf and assign the initial status.

Update:

Your procedure looks good here.
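A minimal sketch of the initialization and update procedure for the binary case, assuming a made-up prevalence lookup. The function names, the dictionary layout, and the prevalence values are all hypothetical and do not come from Vivarium:

```python
import random


def vad_prevalence(age, sex, year):
    """Hypothetical stand-in for the exposure (prevalence) lookup; values are made up."""
    return 0.30 if age < 5 else 0.10


def has_vad(propensity, age, sex, year):
    # Binary pmf [p, 1 - p]: the simulant has VAD when propensity < p.
    return propensity < vad_prevalence(age, sex, year)


def initialize_simulant(rng, age, sex, year):
    propensity = rng.random()  # uniform on [0, 1), stored for all time
    return {
        "propensity": propensity,
        "age": age,
        "sex": sex,
        # The initial status is assigned before the first time step takes place.
        "has_vad": has_vad(propensity, age, sex, year),
    }


def update_simulant(simulant, year, step_size_years=1.0):
    # Same comparison as at initialization; the propensity is never resampled.
    simulant["age"] += step_size_years
    simulant["has_vad"] = has_vad(
        simulant["propensity"], simulant["age"], simulant["sex"], year
    )


rng = random.Random(0)
simulant = initialize_simulant(rng, age=3.0, sex="female", year=2020)
for year in (2021, 2022, 2023):
    update_simulant(simulant, year)
```

Because the propensity is fixed, a simulant's status only changes when the underlying prevalence for their subpopulation changes, for example when they age into a group with a different prevalence.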

collijk · 2020-03-07T08:23:56Z

docs/source/gbd2017_models/risk_attributable_causes/vitamin_a_deficiency/index.rst

+    c)  If :math:`v_i < p_\text{VAD}(\text{subpop}(i,t))`, the simulant has
+        vitamin A deficiency on the next time step; otherwise, they don't.
+
+To address a point of potential confusion in the above algorithm, note that a


Just leaving notes. I think this section is great. I think we want to pull it out into the general risk model for the standard way to sample from categorical distributions. In order to do so, we have to generalize this section a bit. Technically what we do is start with a distribution

p = [P_1, P_2, ..., P_N, 1 - (P_1 + ... + P_N)]

where P_j is the probability that an individual is in category j (the last entry is the probability of the remaining category).

We then take the cumulative sum over the distribution

p_cum = [P_1, P_1 + P_2, ..., P_1 + ... + P_N, 1]

to form the right bounds of a partition of the interval [0, 1] with each subinterval mapping to a risk exposure category with probability p_cum[right] - p_cum[left].

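We then select the exposure category k as the smallest index for which q_i < p_cum[k] (the first True entry, which is what an argmax over the boolean array q_i < p_cum returns) and assign that category as x_i.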

This means the interpretation of propensity is dependent on the sort order of the categories. The default sort order is worst to best with respect to risk effect. We have not had to deal with unordered categorical risks.
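The partition described above can be sketched with the standard library as follows. This is an illustration under assumptions, not the Vivarium implementation: `sample_category` is a hypothetical helper and the probabilities are made up.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_category(q, pmf):
    """Return the index of the category selected by propensity q in [0, 1).

    `pmf` lists the category probabilities in sort order (e.g. worst to best).
    """
    p_cum = list(accumulate(pmf))
    p_cum[-1] = 1.0  # guard against floating-point drift in the final right bound
    # bisect_right returns the smallest k such that q < p_cum[k].
    return bisect_right(p_cum, q)


pmf = [0.1, 0.2, 0.3, 0.4]       # made-up probabilities for four categories
q_i = random.Random(0).random()  # propensity, uniform on [0, 1)
x_i = sample_category(q_i, pmf)
```

With this ordering, a low propensity always maps to the first (worst) category, so reordering the categories changes which simulants land in which category even though the population-level distribution is unchanged.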


collijk

Just a general comment here: this PR is too big and stayed open too long. We need to come up with some strategies and guidance about putting these things together and getting fast reviews.

NathanielBlairStahn added 4 commits March 4, 2020 19:32

add intro paragraph for vivarium modeling strategy

473ad9c

add explanation of how to determine vitamin A status

265e830

edits to VAD status algorithm

a78a6a0

clean up line breaks, add link to RR table, add some more details

fbc2624

NathanielBlairStahn requested review from aflaxman, KjellSwedin, collijk, SantoniS, yongqx, kiranklc, plinds, alibow, XiaoluQianUW, yaqi7, beatrixkh and Ninicorn March 5, 2020 19:00

beatrixkh reviewed Mar 5, 2020


collijk reviewed Mar 5, 2020


aflaxman approved these changes Mar 5, 2020


NathanielBlairStahn added 2 commits March 6, 2020 18:21

Merge branch 'master' into vad

aa0a0b4

clarify PAF-of-1 relationship

be448d8

collijk reviewed Mar 7, 2020


NathanielBlairStahn added 5 commits March 7, 2020 12:43

be more precise in GBD strategy intro

faf52ae

add missing data in risk restrictions table, clarify RR data source

ff81d37

add Abie's notes about limitations and v&v; note dual interpretability of RR's

c82612e

add discussion of VAD status algorithm

b9155da

address several of James's comments in PR 149

933c9ef

yaqi7 approved these changes Mar 12, 2020


NathanielBlairStahn added 2 commits March 13, 2020 10:21

minor edits

b7940d3

add todos to clarify algorithm for exposure model

0b8c4e7

collijk approved these changes Mar 16, 2020


NathanielBlairStahn merged commit 2754b54 into master Mar 16, 2020

NathanielBlairStahn deleted the vad branch March 16, 2020 22:20
