## Detecting rapidly spreading SARS-CoV-2 variants with case data
D. Karlen (+ S. Otto)/ Feb 14, 2021

Rapidly spreading SARS-CoV-2 variants dramatically affect the growth rate of detected cases in afflicted jurisdictions.
This report demonstrates that the nature of the transition to higher growth,
due to the introduction of the B.1.1.7 variant, is
significantly different, both in magnitude and shape, from transitions resulting from changes in general public behaviour.
As a result, the presence of the variant can be detected using population-level case data, after it has become dominant, without genomic screening or specialized PCR tests.
With the case data, the transmission advantage of the variant can be accurately determined, by directly comparing the infection trajectory for the two strains under the same conditions.

In this preliminary report, case data timeseries from England and Israel are examined. 
The PCR tests used in England were able to identify the presence of the rapidly spreading variant, B.1.1.7,
and that variant became responsible for half of daily cases by early December.
By using case data, we estimate that the efficiency for the PCR tests to identify a sample as the B.1.1.7 variant is 70-80%.

The same growth pattern is seen in the case data time series from Israel, suggesting that the B.1.1.7 variant was
responsible for half of daily cases at the beginning of November, and would be responsible for nearly all cases in December and
beyond.
Genomic screenings indicate that the variant represents less than 50% of samples collected in early January 2021,
and so further investigations are needed to understand this apparent contradiction.

The case data shows that the transmission advantage of the variant is substantially larger than previous estimates. The US CDC assessment suggested that the variant would grow from 0.5% to 50% of cases in about 75 days. That corresponds to the variant having a daily growth rate larger than that of the original strains by 7.3% per day. The case data from England and Israel shows that the variant has a daily growth rate advantage of about 13%. With that advantage, the variant would grow from 0.5% to 50% of cases in under 45 days.
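The arithmetic behind these figures can be checked with a short sketch. It assumes that both strains grow exponentially and that "a daily growth rate advantage of x%" means the variant's daily growth factor is (1 + x) times that of the original strains, so that the odds of a case being the variant are multiplied by (1 + x) each day. That interpretation, and the function names, are assumptions made for illustration; they are not taken from the pypm framework.

```python
import math


def odds(fraction):
    """Convert a fraction of cases into odds (variant : original)."""
    return fraction / (1.0 - fraction)


def advantage_from_days(frac_start, frac_end, days):
    """Daily growth advantage implied by a rise in variant share over `days`.

    Assumption: the variant-to-original odds grow by a constant factor
    (1 + advantage) each day.
    """
    odds_ratio = odds(frac_end) / odds(frac_start)
    return odds_ratio ** (1.0 / days) - 1.0


def days_to_reach(frac_start, frac_end, daily_advantage):
    """Days needed for the variant share to rise from frac_start to frac_end."""
    odds_ratio = odds(frac_end) / odds(frac_start)
    return math.log(odds_ratio) / math.log(1.0 + daily_advantage)


# CDC scenario: 0.5% to 50% of cases in about 75 days
cdc_advantage = advantage_from_days(0.005, 0.5, 75)

# Advantage of about 13% per day, as estimated from the case data
days_at_13_percent = days_to_reach(0.005, 0.5, 0.13)

print(f"implied daily advantage: {100 * cdc_advantage:.1f}%")
print(f"days at 13% advantage: {days_at_13_percent:.0f}")
```

Under this interpretation the 75-day scenario gives an advantage close to the 7.3% quoted above, and a 13% advantage gives a transition time below 45 days.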

### Method
To interpret case data time series, the [pypm](http://www.pypm.ca) population modelling framework is used. Discrete-time difference equations characterize an infection cycle, and time delays are included to represent the propagation of populations from infected to contagious to symptomatic to tested and finally to reported. These calculations are done either in terms of expectation values (to define model expectations) or in terms of simulated data using negative binomial, binomial, and multinomial distributions (to evaluate the standard deviations of parameter estimators). The model is tuned to case data by adjusting the transmission rate of the original strains at a few dates, to model changes in public behaviour. The boundary dates for the periods of constant transmission rate are determined in the same fit that estimates the transmission rates.
In modelling jurisdictions in North America, Europe, and Israel, the typical length of periods with constant transmission rate is 1-2 months.

The variant is modelled by including a second infection cycle whose contagious population propagates to both the regular reporting and the variant reporting populations.

### Data sources
Publicly available case data time series were used from [England](https://www.gov.uk/government/publications/investigation-of-novel-sars-cov-2-variant-variant-of-concern-20201201) and [Israel](https://data.gov.il/dataset/covid-19). 
Both sources report weekly data, and for the purposes of analysis, each weekly total was spread equally across the days of that week.
The data from England includes the number of cases each week that have been identified as the B.1.1.7 variant through their normal PCR testing, and these are divided into the 9 regions of England.
Case and first-dose vaccination data from Israel, divided into age groups, are used to study the introduction of the
variant and to assess the immunity that the Pfizer vaccine provides the older age groups against this variant.
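The equal spreading of weekly totals over days, mentioned above, can be sketched as follows. The function name and the example numbers are hypothetical, and this is an illustration of the idea rather than the code used in the analysis.

```python
def spread_weekly_to_daily(weekly_totals, days_per_week=7):
    """Spread each weekly total equally over the days of that week.

    Returns a list of daily values whose sum equals the sum of the
    weekly totals.
    """
    daily = []
    for total in weekly_totals:
        daily.extend([total / days_per_week] * days_per_week)
    return daily


# Hypothetical weekly case counts, for illustration only
example_daily = spread_weekly_to_daily([70, 140])
print(example_daily)
```

The daily values are constant within each week, so the resulting series has steps at week boundaries; the weekly averages plotted in the figures below correspond to these per-day values.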

### UK findings - case data only

Identifying and measuring the effect of the introduction of a rapidly spreading variant using case data time series alone is challenging if there are large reporting anomalies or significant changes in public behaviour during the period in which the variant is becoming dominant. For three of the regions of England, the data appears to be suitable for estimating the impact on the growth of cases, which can then be compared with the direct measurement using the PCR data.

The figures below show the average daily cases for each week (green points) along with the model expectations (green curves) for the tuned models for those regions in which the effect of the introduction of the variant is clearly separated from public behaviour changes.
The orange dashed curve shows the model expectation for reported cases arising from the original strains, and the red dashed
curve shows that for the B.1.1.7 variant.
The vertical lines show the fitted break-points, where the transmission rate is changed to a new value.

#### Linear scale
![uk-case-linear](uk-caselinear.png)

#### Log scale
![uk-case-log](uk-caselog.png) 

The long trough between the decline phase and rapid growth phase is a distinctive marker for the introduction of a rapidly spreading variant.
It is presumed that public behaviour is unchanged between the vertical grey lines.
The table below shows the daily decline and growth rates (% per day), as inferred from the transmission rates in that central period of constant public behaviour. The difference characterizes the transmission advantage of the B.1.1.7 variant, compared to the original strains.
(Error analysis to follow...)

Region | original strains | B.1.1.7 | difference
---|---|---|---
North West | -5.3 +/- | 9.3 +/- | 14.6 +/-
West Midlands | -2.8 +/- | 9.8 +/- | 12.6 +/-
Yorkshire and Humber | -6.3 +/- | 8.7 +/- | 15.0 +/-
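The trough described above arises whenever a declining population of cases is overtaken by a small but growing one. A minimal sketch, using the North West rates from the table and treating them as constant daily multiplicative changes, illustrates this. The starting values (990 original-strain cases and 10 variant cases per day) are hypothetical, and the sketch ignores the delays and reporting steps of the full model.

```python
def two_strain_cases(n_days, original_0, variant_0, original_rate, variant_rate):
    """Daily expected cases for two strains with constant daily rates.

    Rates are fractional changes per day (e.g. -0.053 for -5.3% per day).
    Returns a list of (original, variant) tuples, one per day.
    """
    return [
        (original_0 * (1.0 + original_rate) ** day,
         variant_0 * (1.0 + variant_rate) ** day)
        for day in range(n_days)
    ]


# North West rates from the table; starting values are hypothetical
cases = two_strain_cases(60, 990.0, 10.0, -0.053, 0.093)
totals = [original + variant for original, variant in cases]

# Day on which total cases are lowest (the bottom of the trough)
trough_day = totals.index(min(totals))

# First day on which the variant produces more cases than the original strains
crossover_day = next(day for day, (o, v) in enumerate(cases) if v > o)

print(f"trough on day {trough_day}, variant dominant from day {crossover_day}")
```

In this sketch the minimum in total cases occurs a few days before the variant becomes responsible for half of the cases, which is why the total can still appear flat while the variant is already growing quickly.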

### UK findings - comparison with PCR identification

The following figures compare the model expectations with the average daily variant cases (brown points) for each week.
The red dashed curves, the same as for the first plots, show the expected number of cases arising from the variant.
The solid brown curves show the expected number of reported variant cases, which accounts for possible inefficiency
or data reporting delays.
These effects are estimated by fitting the model with two free parameters (efficiency and delay) to the variant cases time series.

#### Linear scale
![uk-var-case-linear](uk-var_caselinear.png)

#### Log scale
![uk-var-case-log](uk-var_caselog.png) 

The following table shows the multiplicative efficiency and additional delay estimated from these data. From the figures it appears that the efficiency for identification of the B.1.1.7 variant was lower in the first half of December.

Region | efficiency | delay (days)
---|---|---
North West | 0.75 | 1.7
West Midlands | 0.84 | 0.7
Yorkshire and Humber | 0.65 | 4.4
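One way to picture the two parameters is as a scaling and a time shift applied to the expected variant case curve. The sketch below applies a multiplicative efficiency and a delay in days to a daily series, using linear interpolation for fractional delays. How the delay is implemented in the fitted model is not stated here, so the interpolation, the function name, and the example series are assumptions made for illustration.

```python
import math


def reported_variant_cases(variant_cases, efficiency, delay_days):
    """Scale and delay a daily series of expected variant cases.

    variant_cases: expected variant cases per day (index = day number).
    efficiency: fraction of variant cases identified as the variant.
    delay_days: additional reporting delay; fractional values are
        handled by linear interpolation between neighbouring days.
    Days earlier than the delay have no reported variant cases.
    """
    last = len(variant_cases) - 1
    reported = []
    for day in range(len(variant_cases)):
        source = day - delay_days
        if source < 0:
            reported.append(0.0)
            continue
        lower = int(math.floor(source))
        upper = min(lower + 1, last)
        weight = source - lower
        value = (1.0 - weight) * variant_cases[lower] + weight * variant_cases[upper]
        reported.append(efficiency * value)
    return reported


# Hypothetical expected variant cases per day, for illustration only
expected = [0.0, 10.0, 20.0, 30.0, 40.0]
shifted = reported_variant_cases(expected, efficiency=0.5, delay_days=1.5)
print(shifted)
```

With an efficiency of 1 and a delay of 0 the function returns the input series unchanged, which corresponds to the red dashed and solid brown curves coinciding.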


### Israel data findings

The data from Israel appears to be well described by a model with a very long period of constant public behaviour, between lockdown \#2 (September 18) and lockdown \#3 (January 8). During that period a rapidly spreading variant
was introduced and grew to produce half of the daily cases at the beginning of November.
Very similar results are seen in each age group.

The figures below show the average daily cases for each week (green points) along with the model expectations (green curves).
The model includes the effect of vaccination immunity. The latest data point is not included in the analysis, as it
suggests a departure from the expected infection trajectory.

#### Younger groups: Linear scale
![il-case-younger-linear](il-case-younger-linear.png)

#### Younger groups: Log scale
![il-case-younger-log](il-case-younger-log.png)

#### Older groups: Linear scale
![il-case-older-linear](il-case-older-linear.png)

#### Older groups: Log scale
![il-case-older-log](il-case-older-log.png)


The table below shows the daily decline and growth rates (% per day), as inferred from the transmission rates
for the two strains. The standard deviations of the estimators are smaller, thanks to the long period
of constant public behaviour. The difference between these two rates yields values that are similar to those
found during the introduction of the B.1.1.7 variant into England, shown above.
This suggests that the growth is due to the B.1.1.7 variant and that it became well established in Israel
more than a month before it became established in the UK.
(Error analysis is preliminary)

age|original strains | variant | difference
---|---|---|---
IL|-6.8 +/-  0.0| 5.9 +/-  0.0|12.7
19|-6.2 +/-  0.1| 6.8 +/-  0.0|13.0
20|-6.9 +/-  0.1| 6.2 +/-  0.1|13.1
30|-7.1 +/-  0.1| 5.6 +/-  0.1|12.7
40|-7.2 +/-  0.1| 5.5 +/-  0.1|12.7
50|-7.0 +/-  0.1| 5.3 +/-  0.1|12.3
60|-8.7 +/-  0.3| 5.1 +/-  0.2|13.8
70|-8.7 +/-  0.4| 4.7 +/-  0.3|13.4
80|-8.2 +/-  0.5| 5.3 +/-  0.3|13.5

### Comparison of shape with public behaviour change

Show examples comparing data to models accounting for the change by implementing a change in transmission rate.

### Comparison of magnitude with public behaviour change

Show distribution of the difference in growth rates seen when growth increased in the US over the past year.
No increases of the magnitude seen in the UK or Israel have been observed in the US.
The B.1.1.7 variant is not yet dominant in the US.