Correct for changing testing volume by using data from rt.live #1

michaelmunie · 2020-06-24T19:20:50Z

I really like your idea, but I thought one thing that could be improved was getting data that corrects for changing testing volume. The site www.rt.live provides this, and the code below, specifically get_dates_positives_totals_and_deaths_rt_live(), can be simply substituted for get_DNC(). The new predictions are similar, but slightly lower, and the regression is nicer.

Also,

Thank you for sharing your code!
You might want to refactor it a bit and perhaps upgrade to Python 3 as well. It was difficult to work with.
Thanks again, and hopefully this is helpful.
-Mike

import csv

tmp = sys.argv
try:
    state = tmp[1]
except IndexError:
    state = "FL"
    print 'Entering debug mode with state' + state


def date_to_datetime(my_date):
    if isinstance(my_date, (str,)):
        my_date = int(my_date.replace('-',''))

    if isinstance(my_date, (int,)) :
        return datetime(int(my_date/10000), int(my_date/100) % 100, int(my_date) % 100)
    else:
        raise NotImplementedError


def rt_live_str_to_float(my_str):
    if my_str == '':
        return 0.0
    else:
        return float(my_str)


def get_dates_positives_totals_and_deaths_rt_live(my_state, max_days=99,
                                                  num_days_to_use_for_aligning_positive_adjusted=7):
    # Download data from rt.live, which corrects cases for testing volume
    with requests.Session() as s:
        download = s.get('https://d14wlfuexuxgcm.cloudfront.net/covid/rt.csv')
    decoded_content = download.content.decode('utf-8')
    rt_live_csv_reader = csv.reader(decoded_content.splitlines(), delimiter=',')
    rt_live_rows = list(rt_live_csv_reader)

    my_state_index = rt_live_rows[0].index('region')
    my_date_index = rt_live_rows[0].index('date')
    my_positive_index = rt_live_rows[0].index('positive_test_adjusted_raw')
    my_positive_no_adjustment_index = rt_live_rows[0].index('positive')
    my_test_index = rt_live_rows[0].index('new_tests')
    my_death_index = rt_live_rows[0].index('new_deaths')

    my_dates = list()
    my_positives = list()
    my_totals = list()
    my_deaths = list()
    my_positive_no_adjustment = list()

    rt_live_rows_sorted_body = rt_live_rows.copy()
    rt_live_rows_sorted_body = rt_live_rows_sorted_body[1:]
    rt_live_rows_sorted_body.sort()

    for rt_live_row in rt_live_rows_sorted_body: #Make sure to skip the header row.
        this_date = date_to_datetime(rt_live_row[my_date_index])
        if rt_live_row[my_state_index] == my_state or (my_state == "US" and this_date not in my_dates):
            my_dates.append(this_date)
            my_positives.append(rt_live_str_to_float(rt_live_row[my_positive_index]))
            my_positive_no_adjustment.append(rt_live_str_to_float(rt_live_row[my_positive_no_adjustment_index]))
            my_deaths.append(rt_live_str_to_float(rt_live_row[my_death_index]))
            my_totals.append(rt_live_str_to_float(rt_live_row[my_test_index]))
        #TODO: fix this
        elif my_state == "US": # And we haven't seen this date before
            my_index = my_dates.index(this_date)
            my_positives[my_index] += rt_live_str_to_float(rt_live_row[my_positive_index])
            my_positive_no_adjustment[my_index] += rt_live_str_to_float(rt_live_row[my_positive_no_adjustment_index])
            my_deaths[my_index] += rt_live_str_to_float(rt_live_row[my_death_index])
            my_totals[my_index] += rt_live_str_to_float(rt_live_row[my_test_index])

    my_cumulative_deaths = [0.0] * len(my_deaths)
    for my_cd_i, daily_new_deaths in enumerate(my_deaths):
        my_cumulative_deaths[my_cd_i] = my_cumulative_deaths[max(my_cd_i - 1, 0)] + daily_new_deaths

    for my_list in my_dates, my_positives, my_totals, my_cumulative_deaths, my_positive_no_adjustment:
        my_list.reverse()
        num_to_pop = max(0, len(my_list) - max_days)
        for popper in range(num_to_pop):
            my_list.pop()

    # Align most recent x days
    positives_total = 0
    positives_no_adjustment_total = 0
    for jj in range(num_days_to_use_for_aligning_positive_adjusted):
        positives_total += my_positives[jj]
        positives_no_adjustment_total += my_positive_no_adjustment[jj]

    factor_to_multiply_adjusted_positives_so_they_approximate_tested_positives = positives_no_adjustment_total / positives_total

    adjusted_positives = [int(x * factor_to_multiply_adjusted_positives_so_they_approximate_tested_positives) for x in my_positives]
    adjusted_totals = [int(x * factor_to_multiply_adjusted_positives_so_they_approximate_tested_positives) for x in my_totals]

    return my_dates, adjusted_positives, adjusted_totals, my_cumulative_deaths```

The text was updated successfully, but these errors were encountered:

qjhong · 2020-06-25T16:03:59Z

Mike: Thanks for your suggestion! I do not agree with them that this correction is valid.

As I understand, this approach of P/T (P:positive, T:total test) is valid only when sampling is random. This is not true in reality, and it could lead to unreasonable conclusions. For example, their correction suggests that current positive number is smaller than that in April in CA. I do not agree. Another counter example is that imagine a state can manipulate and reduce P/T by forcefully increasing T, while the real positive population is independent of T. In other words, imagine we have T and then intentionally expand it by another T. In the second T, people are less likely to show symptoms, tend to be relatively healthy, so P/T becomes smaller.

I believe a reasonable correction is P/(T^a), where 0<a<1. We don't know the real value, but a<1. In my model a=0.25. At least this effect is partially taken into account in my model.

michaelmunie mentioned this issue Jun 24, 2020

Case counts corrected for testing volume qjhong/covid19#1

Closed

qjhong closed this as completed Jun 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correct for changing testing volume by using data from rt.live #1

Correct for changing testing volume by using data from rt.live #1

michaelmunie commented Jun 24, 2020

qjhong commented Jun 25, 2020

Correct for changing testing volume by using data from rt.live #1

Correct for changing testing volume by using data from rt.live #1

Comments

michaelmunie commented Jun 24, 2020

qjhong commented Jun 25, 2020