Allow negative-valued SI (COVID-19 modelling) #90

Open
giulianonetto opened this issue Mar 23, 2020 · 9 comments

@giulianonetto

giulianonetto commented Mar 23, 2020

Hello, Dr. Cori!

First of all, thank you for this amazing package. It has been super helpful - not only with modelling itself but also with learning epidemiology concepts. I am no expert, so excuse me for any misunderstandings.

Model SI as Gaussian - COVID-19 report

This recent preprint has reported negative-valued serial intervals, which I understand as the infectee showing/noticing symptoms before the infector. For this reason, the authors moved from a Gamma model for the SI to a Gaussian model. So being able to assume a distribution on the entire real line for the SI seems like a good feature for EpiEstim.

Failed attempt:

While I was able to use estimate_R (with method='parametric_si') with their reported mean and sd for the SI, I noticed it uses a discretization of a gamma distribution to set the probabilities at each SI value.

Given the importance of COVID-19 and the evidence of negative-valued SIs, I wonder if it would be possible to follow EpiEstim's strategy but assuming

$$SI \sim N(\mu, \sigma^2)$$

instead of

$$SI \sim Gamma(k, \theta)$$

I actually tried to override EpiEstim::discr_si with a naive implementation for the discretization of a Gaussian r.v.:

discr_si <- function(s, mu, sigma) {
    pnorm(s + 0.5, mu, sigma) - pnorm(s - 0.5, mu, sigma)
}

but my results were totally incorrect.
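One possible reason the override misbehaves, offered as an assumption rather than a diagnosis: EpiEstim documents its `si_distr` vector as starting at a serial interval of zero, with that first probability equal to zero, whereas a discretized Gaussian puts mass on zero and negative intervals. The sketch below uses illustrative values for `mu` and `sigma` (not the preprint's estimates) to show how much mass falls at or below zero, then builds a truncated and renormalized vector. `discr_gaussian_si` and the truncation step are hypothetical helpers; truncation discards exactly the negative intervals of interest, so it is a crude workaround at best.

```r
# Hypothetical helper: mass a Gaussian assigns to the unit interval around s.
discr_gaussian_si <- function(s, mu, sigma) {
  pnorm(s + 0.5, mu, sigma) - pnorm(s - 0.5, mu, sigma)
}

mu <- 4        # illustrative mean, NOT taken from the preprint
sigma <- 4.75  # illustrative sd, NOT taken from the preprint

s <- -30:30
p <- discr_gaussian_si(s, mu, sigma)

# Probability mass on serial intervals of zero days or less.
mass_nonpositive <- sum(p[s <= 0])

# Crude workaround: keep only s >= 1, renormalize, and prepend P(SI = 0) = 0.
w_pos <- p[s >= 1] / sum(p[s >= 1])
si_distr <- c(0, w_pos)

print(mass_nonpositive)
print(sum(si_distr))
```

With these illustrative parameters, a substantial share of the Gaussian mass sits at or below zero, which the positive-lag vector cannot represent.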

Sorry for the long description.

Thank you very much!

Best wishes

Giuliano

@jlopezper

Hi @giulianonetto. Have you managed to sort this out? I'm facing the same situation.

@giulianonetto
Author

giulianonetto commented Mar 31, 2020

Hi @jlopezper! No, unfortunately, I have not. What I did was assume a Gamma distribution nonetheless, as it is quite common even in COVID-19 studies.

The one thing about that paper, though, is that their reported mean and sd for the serial interval don't carry the same meaning as the ones reported by other papers that do not assume Gaussian distributions. For instance, if you have mean=3 and sd=3, a Gamma has about 80% of its mass between 0 and 5, while a Gaussian has only about 60%. The discrepancy can be visualized in the plot below (blue is Gamma, black is Gaussian).

image
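The comparison can be checked numerically with base R. This is a minimal sketch that moment-matches a Gamma to the stated mean and sd (shape = (mean/sd)^2, scale = sd^2/mean); the variable names are illustrative.

```r
m <- 3  # mean of the serial interval
s <- 3  # standard deviation of the serial interval

# Moment matching for the Gamma distribution.
shape <- (m / s)^2
scale <- s^2 / m

# Probability mass between 0 and 5 under each model.
gamma_mass <- pgamma(5, shape = shape, scale = scale) -
  pgamma(0, shape = shape, scale = scale)
gauss_mass <- pnorm(5, mean = m, sd = s) - pnorm(0, mean = m, sd = s)

# Mass the Gaussian places on negative serial intervals.
gauss_negative <- pnorm(0, mean = m, sd = s)

print(round(c(gamma = gamma_mass, gaussian = gauss_mass,
              gaussian_negative = gauss_negative), 3))
# gamma ~ 0.811, gaussian ~ 0.589, gaussian_negative ~ 0.159
```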

For now, I assumed SI ~ Gamma with mean 4.89 and sd = 1.48, which seems to span most previous estimates (listed in the paper which assumed the Gaussian). The figure below shows the params, quantiles, and the histogram of the distribution I am currently assuming (sorry I mean Serial Interval distribution in the plot title).

image
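For reference, a sketch of how the stated mean and sd translate into Gamma parameters, and how they might be passed to EpiEstim. The incidence vector is made up for illustration, and the `estimate_R` call only runs if the package is installed. Note that EpiEstim's own parametric discretization may differ from a plain Gamma, so the quantiles below describe the plain Gamma only.

```r
si_mean <- 4.89
si_sd <- 1.48

# Moment matching: shape and rate of a Gamma with this mean and sd.
shape <- (si_mean / si_sd)^2
rate <- si_mean / si_sd^2

# Quantiles of the plain (unshifted, continuous) Gamma.
q <- qgamma(c(0.025, 0.5, 0.975), shape = shape, rate = rate)
print(round(q, 2))

# Optional: pass the same mean and sd to EpiEstim, if available.
if (requireNamespace("EpiEstim", quietly = TRUE)) {
  incid <- c(1, 2, 3, 5, 8, 13, 21, 30, 41, 55, 70, 88, 100, 120)  # made-up counts
  res <- EpiEstim::estimate_R(
    incid,
    method = "parametric_si",
    config = EpiEstim::make_config(list(mean_si = si_mean, std_si = si_sd))
  )
  print(head(res$R))
}
```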

It is not perfect, but it has generated seemingly good results with Brazil's data. Also, using method="uncertain_si" didn't make much difference in my case.

Hope it helps! Any better solutions or alternative ideas would be much appreciated as well!

@jlopezper

jlopezper commented Apr 1, 2020

Thank you @giulianonetto for your detailed response!

I'm going with a Gamma distribution as well but with mean = 7.5 and sd = 3.4 (based on this and this). Comparing it with the Normal distribution proposed in that paper, both distributions overlap by about 50%.

image

Since I'm a complete novice in this matter, I'm not sure that this assumption is correct, although the results I'm obtaining with data from Spain seem reasonable.

Thank you again.

@giulianonetto
Author

giulianonetto commented Apr 1, 2020

That's reassuring. We do believe our SI distribution might be a bit closer to zero than it should eventually be, but we wanted to partially follow the other studies cited here.

My only concern is that both studies you pointed out seem to be based on this one, which estimated the SI using only 6 infector-infectee pairs. Notice their confidence interval for the mean SI ranges from 5.3 to 19. I am not entirely sure whether that's just how it is, but it was one of the reasons we decided to be a bit skeptical about that particular study. In our case, it led to a most-recent Rt, using Brazil's data, of over 3, while the resulting R0 would be over 5, which seems a little above average (see table 1). Of course, that can be my personal bias again; these are just the decisions we made in such an uncertain scenario.

Thank you very much for your response!

It is important to know that we are not alone in all this, and that our doubts are shared.

@robin-thompson
Collaborator

Hi both,

Thanks very much for using EpiEstim :-)

Happy to think more about this, but my immediate thought is that implementing negative serial intervals in this method is likely to be challenging, since it will involve summing over numbers of cases both in the past and in the future? As a result, doing real-time estimation would be hard (given unknown numbers of future cases)?

Another issue will be that the method assumes that individuals cannot generate new cases on the same day (i.e. the serial interval cannot be zero). That allows the renewal equation model to predict the number of cases today based on cases on all previous days - thereby allowing estimation of the reproduction number.
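The constraint can be seen in a minimal sketch of the renewal-equation sum, in which the expected number of cases on day t is R_t times a weighted sum of earlier incidence. `total_infectiousness` is a hypothetical helper written for illustration, not EpiEstim's internal code; `w[1]` is assumed to hold P(SI = 0) and is set to zero, so only strictly earlier days contribute.

```r
# Hypothetical helper: Lambda_t = sum over s >= 1 of incid[t - s] * P(SI = s).
# w[1] holds P(SI = 0), w[2] holds P(SI = 1), and so on.
total_infectiousness <- function(incid, w) {
  sapply(seq_along(incid), function(t) {
    s <- seq_len(min(t - 1, length(w) - 1))  # available positive lags
    sum(incid[t - s] * w[s + 1])             # empty sum is 0 on day 1
  })
}

incid <- c(1, 2, 4, 8)        # made-up daily case counts
w <- c(0, 0.5, 0.3, 0.2)      # made-up SI distribution with P(SI = 0) = 0

lambda <- total_infectiousness(incid, w)
print(lambda)
# 0.0 0.5 1.3 2.8

# Expected cases on day t would then be R_t * lambda[t].
```

A negative lag s would require `incid[t - s]` for a day after t, which has not been observed yet in real time.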

The thing I'm not sure about is whether these assumptions can be relaxed... To do that, it would be necessary to work through the underlying method here (https://www.sciencedirect.com/science/article/pii/S1755436519300350) and adapt it, but I suspect this would be a substantial adaptation (and perhaps worthy of a paper in its own right!) It would certainly be more straightforward to simply implement the gamma distribution you mentioned, if you think that isn't too dubious an assumption.

I will think more about this, but wanted to share my initial thoughts!

Thanks!
Robin

@giulianonetto
Author

giulianonetto commented Apr 2, 2020

Hi Dr. Thompson!

Thank you very much for your response. It seems to me that reporting negative-valued serial intervals arises from a limitation of the SI as a proxy for the time between infection (TBI) events, as shown in the figure below.

The lines start when the person is infected, and arrowheads show when the person starts feeling the symptoms.

image

In the "normal" case, the SI serves fine as an approximation to TBI events. If the infector takes a bit longer to notice the symptoms though, while the infectee feels them more quickly upon infection, the serial interval becomes a negatively-biased estimator for the TBI events. Of course, the infectee can never be infected before the infector, even though such a "lack of synchronization" is quite possible in terms of symptom onset.

It feels like negative-valued SIs still carry too much important information to be simply thrown out - this should not happen so often if there is no asymptomatic transmission, for instance. It might indicate that the TBI is in fact not super high. However, I wonder if there is a way to account for such a negative bias? Maybe using incubation period estimates to correct them in some way?
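A small simulation can illustrate the mechanism. By definition, SI = TBI + incubation(infectee) - incubation(infector), so the SI can be negative even though the TBI is always positive. All distributions and parameters below are made up for illustration and are not estimates for COVID-19.

```r
set.seed(1)
n <- 1e5

# Time between infection events: strictly positive (illustrative Gamma).
tbi <- rgamma(n, shape = 4, rate = 1)

# Independent incubation periods for infector and infectee (illustrative lognormal).
inc_infector <- rlnorm(n, meanlog = 1.6, sdlog = 0.4)
inc_infectee <- rlnorm(n, meanlog = 1.6, sdlog = 0.4)

# Serial interval: difference between the two symptom-onset times.
si <- tbi + inc_infectee - inc_infector

frac_negative <- mean(si < 0)
print(frac_negative)             # share of pairs with a negative SI
print(c(mean(tbi), mean(si)))    # means nearly agree; the spread does not
print(c(sd(tbi), sd(si)))
```

Under these assumptions the two incubation periods cancel on average, so the mean SI tracks the mean TBI, while the SI is more dispersed and takes negative values for some pairs.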

Sorry if this is pure nonsense, I am truly interested in digging deeper.

Thank you all very much!

@robin-thompson
Collaborator

Hi, Yes - you are right. In theory it is possible to back-calculate infection times from the times of symptom appearance. And then use the inferred times of infection for your analyses. I know some work has been done in this direction before, but I imagine it leads to considerable uncertainty (given the width of the incubation period). Might be worth doing more of a literature search in that direction - from memory, I think Christophe Fraser talks about back-calculation of infection times in his paper "Estimating Individual and Household Reproduction Numbers in an Emerging Epidemic". Thanks! Robin
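As a rough sketch of the back-calculation idea, one could subtract sampled incubation periods from observed onset days and summarize the resulting infection times. This is a naive Monte Carlo illustration, not the method from the paper mentioned above; the onset days and the lognormal incubation parameters are made up.

```r
set.seed(42)

onset_day <- c(10, 12, 12, 15, 18)  # made-up symptom-onset days, one per case
n_draws <- 1000

# One row per draw, one column per case: sampled incubation periods (illustrative).
inc <- matrix(
  rlnorm(length(onset_day) * n_draws, meanlog = 1.6, sdlog = 0.4),
  nrow = n_draws
)

# Infection day = onset day - incubation period, column by column.
infection_day <- sweep(-inc, 2, onset_day, "+")

# 95% interval and median of the inferred infection day for each case.
ci <- apply(infection_day, 2, quantile, probs = c(0.025, 0.5, 0.975))
print(round(ci, 1))
```

The width of each interval reflects the spread of the assumed incubation period, which is the source of the uncertainty noted above.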

@t-pollington
Contributor

Note the most up-to-date preprint of the negative SI distribution was posted on medRxiv on 27 April.

@t-pollington
Contributor

Dear @giulianonetto,

Are you currently using open-access epi data? Does it measure the date of notification or onset? I'm currently searching for date of onset data.

Kind regards, Tim.
