Unreasonably high value of instantaneous reproduction number estimation? #3

Closed
caijun opened this issue Nov 1, 2016 · 5 comments

caijun commented Nov 1, 2016

Hi Anne Cori,

I am using EpiEstim to estimate the instantaneous (case) reproduction number during the post-pandemic period of the 2009 pandemic influenza A (H1N1) in mainland China. The EstimateR function successfully estimated R(t); however, the maximum estimate of R(t) is 47.5, which is so large that I don't think it makes sense. Could you help me understand why such a large estimate of R(t) could be produced? Thank you very much.

> rm(list = ls())
> library(EpiEstim)
> 
> load(url("http://tonytsai.name/confirmed_post-pdm_dec.rda"))
> 
> # instantaneous reproduction number estimation for post-pandemic --------------------
> # using ParametricSI method
> # the instantaneous reproduction number can be estimated after May 2nd, 2010
> x <- EstimateR(dec$cases, T.Start = 2:359, T.End = 8:365, method = "ParametricSI", 
+                Mean.SI = 2.6, Std.SI = 1.3, plot = TRUE, leg.pos = xy.coords(1, 3))
> max(x$R$`Mean(R)`)
[1] 47.54329

image

Collaborator

annecori commented Nov 2, 2016

Dear Tony,
As you can see, the high estimates of R fall towards the end of the year you are considering. Looking closely at your data, there are no incident cases between 07/04/2011 and 19/04/2011, so a period of 10 days with no cases at all. The distribution of the generation time has a very thin tail, with almost no weight after day 7. This means that the 3 incident cases that appear on 07/04/2011 (and the cases that appeared before) are almost no longer infectious on 19/04/2011; to explain the 14 new cases that appear on 19/04/2011, you therefore need an extremely high reproduction number.
It is worth remembering that, at the moment, the method does not account for imported or missing cases, so it tries to explain the 14 new cases by transmission from previously observed cases, which is probably not realistic here.
I hope this helps?
Best wishes
Anne
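
The argument above can be sketched in base R. This is only an illustration under stated assumptions: the incidence vector is a toy series (3 cases, 11 days without cases, then 14 cases), not the actual dataset, and the serial interval is discretised from a plain gamma distribution with the Mean.SI and Std.SI used in the issue, which is not necessarily the discretisation EpiEstim applies internally. The names `w`, `lambda` and `naive_R` are hypothetical.

```r
# Gamma serial interval with the mean and sd used in the issue
mean_si <- 2.6
sd_si   <- 1.3
shape <- (mean_si / sd_si)^2
rate  <- mean_si / sd_si^2

# Assumed discretisation: w[s] = P(s - 1 < SI <= s), renormalised to sum to 1
max_lag <- 30
w <- diff(pgamma(0:max_lag, shape = shape, rate = rate))
w <- w / sum(w)

# Probability mass left after day 7 (the "thin tail")
tail_after_7 <- sum(w[8:max_lag])

# Toy incidence: 3 cases, 11 days with no cases, then 14 cases
incidence <- c(3, rep(0, 11), 14)
t_now <- length(incidence)

# Total infectiousness on the last day: sum over lags s of I[t - s] * w[s]
lags <- 1:(t_now - 1)
lambda <- sum(incidence[t_now - lags] * w[lags])

# Single-day ratio of new cases to infectiousness (no prior, no time window)
naive_R <- incidence[t_now] / lambda

print(c(tail_after_7 = tail_after_7, lambda = lambda, naive_R = naive_R))
```

The single-day ratio here is deliberately crude. The call in the issue averages over 7-day windows (T.Start = 2:359, T.End = 8:365), so the package's estimate is less extreme, but the sketch shows why the denominator is close to zero after a long gap in cases.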

Author

caijun commented Nov 2, 2016

Hi Anne, thanks for your detailed explanation. I agree with you that trying to explain the 14 new cases by transmission from previously observed cases is probably not realistic. However, for the same dataset, I also calculated the case reproduction number, and the results are presented in the following figure. The case reproduction number estimate around 19/04/2011 is approximately 4.6, which is reasonable based on my knowledge. I wonder why the instantaneous reproduction number could be 10 times higher than the case reproduction number at the same time? I double-checked Figure 1 in your paper "A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics" and found that the ranges of the instantaneous and case reproduction number estimates are very similar across different diseases. Does such a large instantaneous reproduction number estimate make sense? Or does it affect the comparison between the two reproduction number estimates?

# case reproduction number estimation for post-pandemic -------------------------
# using the Wallinga and Teunis method
WT(dec$cases, T.Start = 1:359, T.End = 7:365, method = "ParametricSI", Mean.SI = 2.6, 
   Std.SI = 1.3, plot = TRUE, nSim = 100)

image

Collaborator

annecori commented Nov 2, 2016

Dear Tony,

Actually, I am not surprised. There is a tight relationship between the case reproduction number (Rc) and the instantaneous reproduction number (Ri), which states that Rc is the average / integral of Ri over one generation (see this paper http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000758 ). Therefore Rc is a "smoothed" version of Ri. Here your Ri is absurdly high for only one day, I think; before and after, it has reasonable values, so when you average that, it will give you a much lower value for Rc.

Such large differences between Ri and Rc are particularly noticeable when you have very high variability in Ri, because then the mean (Rc) differs a lot from the actual values (Ri). When Ri is smoother, as in the examples in my paper, Rc and Ri are more similar to one another.

In deciding whether to look at Ri or Rc, you have to think about the meaning of each one. Ri tells you what transmissibility is on that time step, or equivalently how many secondary cases would be generated, on average, by each case if transmissibility remained in the future as it is on this time step. On the other hand, Rc tells you how many secondary cases will eventually be produced by each infected individual, and this accounts for the current transmissibility (really high in your example) but also future transmissibility (much lower in your example). So they are really different in nature.

Please let me know if this is clear and if you agree; I am happy to discuss further if that's helpful!
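
The smoothing effect described above can be sketched in base R, taking Rc(t) as the serial-interval-weighted average of Ri over the following days, Rc(t) = sum over s of Ri(t + s) * w[s]. This is an illustration under assumptions, not the package's WT method: the Ri series is hypothetical (1 everywhere, with a single one-day spike of 47.5), and the weights `w` come from a plain gamma discretisation with the Mean.SI and Std.SI from the issue.

```r
# Gamma serial interval with the mean and sd used in the issue
mean_si <- 2.6
sd_si   <- 1.3
shape <- (mean_si / sd_si)^2
rate  <- mean_si / sd_si^2

# Assumed discretisation: w[s] = P(s - 1 < SI <= s), renormalised to sum to 1
max_lag <- 15
w <- diff(pgamma(0:max_lag, shape = shape, rate = rate))
w <- w / sum(w)

# Hypothetical instantaneous reproduction number: flat at 1, one-day spike
n_days <- 60
Ri <- rep(1, n_days)
Ri[30] <- 47.5

# Rc(t) = sum_s Ri(t + s) * w[s], for days with a full forward window
t_idx <- 1:(n_days - max_lag)
Rc <- sapply(t_idx, function(t) sum(Ri[t + 1:max_lag] * w))

print(c(max_Ri = max(Ri), max_Rc = max(Rc)))
```

Because no single serial-interval weight is close to 1, the one-day spike in Ri is spread over several days of Rc, so the peak of Rc is much lower than the peak of Ri.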

Author

caijun commented Nov 2, 2016

It is clearer now and I understand. Thank you very much, Anne. Could you help fix issue #2?

Collaborator

annecori commented Nov 3, 2016

Just worked on issue #2 this morning - found what's wrong - will solve it asap and get back to you.

annecori closed this as completed Nov 3, 2016