In [69]:
import sys

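# sys.path entries must be directories, not file names. This assumes
# parse_website.py and analyze.py sit in the same directory as this notebook.
sys.path.append('.')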

In [70]:
import numpy as np
import pandas as pd
from tabulate import tabulate

import parse_website as p
import analyze as a



# Pennsylvania Covid-19 Testing and New Cases Growth Rates

### An exploration of the relationship between the reported growth rates of Covid-19 cases and the corresponding growth rates of tests performed.

The last few days have seen news stories reporting an apparent slowing in the growth rate of new Covid-19 cases in Pennsylvania, and in the USA at large for that matter.  Officials at all levels of government have spoken of it _(find some links)_, and some have gone so far as to claim that _[paraphrasing, find quote]"we are no longer experiencing exponential growth"._   Of course, we have seen temporary dips in cases [before](cite Italy, maybe New York?), and most officials are careful to emphasize that this data is preliminary and uncertain and that we must therefore continue with our rigorous social distancing and lockdown policies.  But I wondered whether these claimed reductions in growth rates, however qualified and tentative, might in fact be illusory:  whether they might be a function of a reduction in _testing_, rather than of a reduction in the spread of the disease itself.

To see whether my hypothesis held water, I turned to the [PA Coronavirus Update Archive](https://www.health.pa.gov/topics/disease/coronavirus/Pages/Archives.aspx), which keeps a daily record of the number of negative test results, positive test results, and deaths due to Covid-19 in Pennsylvania (among other data).  I couldn't find a proper dataset, so I wrote a Python program to scrape the page for the last couple of weeks of relevant data; it can be found in `parse_website.py` (imported as `p`).  Here is the compiled data.  The counts are cumulative totals, with the most recent day in row 0.

In [71]:
data = {'negatives': np.array(p.negative_tests), 
        'positives': np.array(p.positive_tests), 
        'deaths': np.array(p.deaths)}
df = pd.DataFrame(data)
df

Unnamed: 0,negatives,positives,deaths
0,76719,14559,240
1,70874,12980,162
2,66261,11510,150
3,60013,10017,136
4,53695,8420,102
5,47698,7016,90
6,42427,5805,74
7,37645,4843,63
8,33777,4087,48
9,30061,3394,38


One critical piece of information is the positive rate, that is, the share of all tests to date that came back positive.  It can be calculated by:

In [72]:
rate_of_positive = data['positives']/(data['negatives'] + data['positives'])
df['positive_rate'] = rate_of_positive
df

Unnamed: 0,negatives,positives,deaths,positive_rate
0,76719,14559,240,0.159502
1,70874,12980,162,0.154793
2,66261,11510,150,0.147999
3,60013,10017,136,0.143039
4,53695,8420,102,0.135555
5,47698,7016,90,0.12823
6,42427,5805,74,0.120356
7,37645,4843,63,0.113985
8,33777,4087,48,0.107939
9,30061,3394,38,0.10145


The trend here is quite clear, if not particularly dramatic: in the rows shown, the cumulative positive rate rises from about 10% (row 9, the oldest shown) to about 16% (row 0, the most recent).   We will also want to know the daily tally of new cases.  We could make the same calculation for new deaths each day, but for now we will focus on cases.


In [73]:
new_cases = [p1 - p2 for p1, p2 in zip(df['positives'][:-1], df['positives'][1:])]
new_cases.append(np.nan)

df['new_cases'] = new_cases
df

Unnamed: 0,negatives,positives,deaths,positive_rate,new_cases
0,76719,14559,240,0.159502,1579.0
1,70874,12980,162,0.154793,1470.0
2,66261,11510,150,0.147999,1493.0
3,60013,10017,136,0.143039,1597.0
4,53695,8420,102,0.135555,1404.0
5,47698,7016,90,0.12823,1211.0
6,42427,5805,74,0.120356,962.0
7,37645,4843,63,0.113985,756.0
8,33777,4087,48,0.107939,693.0
9,30061,3394,38,0.10145,643.0


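Note the appending of a single `np.nan` to fill out this column: the oldest day in the data has no earlier day to subtract.   Next, we compute the growth rate, defined here as each day's new cases divided by the previous day's cumulative positives.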


In [74]:
nominal_growth_rate = [p1 / p2 for p1, p2 in zip(df['new_cases'][:-1], df['positives'][1:])]
nominal_growth_rate.append(np.nan)

df['nominal_growth_rate'] = nominal_growth_rate
df


Unnamed: 0,negatives,positives,deaths,positive_rate,new_cases,nominal_growth_rate
0,76719,14559,240,0.159502,1579.0,0.121649
1,70874,12980,162,0.154793,1470.0,0.127715
2,66261,11510,150,0.147999,1493.0,0.149047
3,60013,10017,136,0.143039,1597.0,0.189667
4,53695,8420,102,0.135555,1404.0,0.200114
5,47698,7016,90,0.12823,1211.0,0.208613
6,42427,5805,74,0.120356,962.0,0.198637
7,37645,4843,63,0.113985,756.0,0.184977
8,33777,4087,48,0.107939,693.0,0.204184
9,30061,3394,38,0.10145,643.0,0.233733
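The same two columns can also be computed without explicit loops. The following is a minimal sketch, not the code in `parse_website.py` or `analyze.py`: it assumes the newest-first row order used above and hard-codes only the first three displayed values of `positives`, so its last row is `NaN` simply because the sample is cut short.

```python
import pandas as pd

# First three displayed rows only (newest first); the full series
# comes from parse_website.py in the notebook itself.
positives = pd.Series([14559, 12980, 11510])

# diff(-1) subtracts the next row (the previous day) from each row,
# leaving NaN in the last row, which has no earlier day.
new_cases = positives.diff(-1)

# shift(-1) lines each row up with the previous day's cumulative total,
# matching the definition used for nominal_growth_rate above.
nominal_growth_rate = new_cases / positives.shift(-1)

print(pd.DataFrame({
    "positives": positives,
    "new_cases": new_cases,
    "nominal_growth_rate": nominal_growth_rate,
}))
```

Because `diff` and `shift` pad with `NaN` on their own, no manual `append(np.nan)` is needed in this form.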


One can certainly see why people are saying the growth rate has gone down!  In fact there seem to be two distinct dips in the growth rate, one occurring at the end of March (roughly two weeks after drastic measures began to be taken in the first counties to be affected), and another occurring just a few days ago.   However, let us run the same calculations for the daily number of tests being performed.

In [75]:
total_tests = df['negatives'] + df['positives']

new_tests = [p1 - p2 for p1, p2 in zip(total_tests[:-1], total_tests[1:])]
new_tests.append(np.nan)

testing_growth_rate = [p1 / p2 for p1, p2 in zip(new_tests[:-1], total_tests[1:])]
testing_growth_rate.append(np.nan)

df['new_tests'], df['testing_growth_rate'] = [new_tests, testing_growth_rate]
df

Unnamed: 0,negatives,positives,deaths,positive_rate,new_cases,nominal_growth_rate,new_tests,testing_growth_rate
0,76719,14559,240,0.159502,1579.0,0.121649,7424.0,0.088535
1,70874,12980,162,0.154793,1470.0,0.127715,6083.0,0.078217
2,66261,11510,150,0.147999,1493.0,0.149047,7741.0,0.110538
3,60013,10017,136,0.143039,1597.0,0.189667,7915.0,0.127425
4,53695,8420,102,0.135555,1404.0,0.200114,7401.0,0.135267
5,47698,7016,90,0.12823,1211.0,0.208613,6482.0,0.134392
6,42427,5805,74,0.120356,962.0,0.198637,5744.0,0.135191
7,37645,4843,63,0.113985,756.0,0.184977,4624.0,0.122121
8,33777,4087,48,0.107939,693.0,0.204184,4409.0,0.131789
9,30061,3394,38,0.10145,643.0,0.233733,5450.0,0.194608


Now the data is telling a rather different story, to my eyes.  There is a pretty clear (and obvious) connection between the number of tests performed and the number of new cases reported.  The earlier dip in new cases around the end of March comes hand in hand with a corresponding dip in the number of tests performed, as does the dip that has been on the tip of so many tongues the last few days.

Now, I am curious as to _how_ closely these two numbers correspond.  One way to find out is to consider how the rates of change themselves are changing, that is, the relative day-over-day change in each growth rate.  Thus:


In [76]:
cases_growth_rate_change = [(p / q) - 1 for p, q in zip(nominal_growth_rate[:-1], nominal_growth_rate[1:])]
cases_growth_rate_change.append(np.nan)

tests_growth_rate_change = [(p / q) - 1 for p, q in zip(testing_growth_rate[:-1], testing_growth_rate[1:])]
tests_growth_rate_change.append(np.nan)

df['cases_rate_change'] = cases_growth_rate_change
df['tests_rate_change'] = tests_growth_rate_change

In [78]:
df2 = df[['negatives', 'positives', 'new_cases',
          'nominal_growth_rate', 'testing_growth_rate',
          'cases_rate_change', 'tests_rate_change']]

df2

Unnamed: 0,negatives,positives,new_cases,nominal_growth_rate,testing_growth_rate,cases_rate_change,tests_rate_change
0,76719,14559,1579.0,0.121649,0.088535,-0.047499,0.131916
1,70874,12980,1470.0,0.127715,0.078217,-0.14312,-0.292401
2,66261,11510,1493.0,0.149047,0.110538,-0.214169,-0.132522
3,60013,10017,1597.0,0.189667,0.127425,-0.052203,-0.057975
4,53695,8420,1404.0,0.200114,0.135267,-0.040742,0.00651
5,47698,7016,1211.0,0.208613,0.134392,0.050222,-0.00591
6,42427,5805,962.0,0.198637,0.135191,0.07385,0.107023
7,37645,4843,756.0,0.184977,0.122121,-0.094068,-0.073357
8,33777,4087,693.0,0.204184,0.131789,-0.126423,-0.322798
9,30061,3394,643.0,0.233733,0.194608,-0.027354,-0.05229
