Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct for changing testing volume by using data from rt.live #1

Closed
michaelmunie opened this issue Jun 24, 2020 · 1 comment
Closed

Comments

@michaelmunie
Copy link

I really like your idea, but I thought one thing that could be improved was getting data that corrects for changing testing volume. The site www.rt.live provides this, and the code below, specifically get_dates_positives_totals_and_deaths_rt_live(), can be simply substituted for get_DNC(). The new predictions are similar, but slightly lower, and the regression is nicer.

Also,

  1. Thank you for sharing your code!
  2. You might want to refactor it a bit and perhaps upgrade to Python 3 as well. It was difficult to work with.
  3. Thanks again, and hopefully this is helpful.
    -Mike
import csv

tmp = sys.argv
try:
    state = tmp[1]
except IndexError:
    state = "FL"
    print 'Entering debug mode with state' + state


def date_to_datetime(my_date):
    if isinstance(my_date, (str,)):
        my_date = int(my_date.replace('-',''))

    if isinstance(my_date, (int,)) :
        return datetime(int(my_date/10000), int(my_date/100) % 100, int(my_date) % 100)
    else:
        raise NotImplementedError


def rt_live_str_to_float(my_str):
    if my_str == '':
        return 0.0
    else:
        return float(my_str)


def get_dates_positives_totals_and_deaths_rt_live(my_state, max_days=99,
                                                  num_days_to_use_for_aligning_positive_adjusted=7):
    # Download data from rt.live, which corrects cases for testing volume
    with requests.Session() as s:
        download = s.get('https://d14wlfuexuxgcm.cloudfront.net/covid/rt.csv')
    decoded_content = download.content.decode('utf-8')
    rt_live_csv_reader = csv.reader(decoded_content.splitlines(), delimiter=',')
    rt_live_rows = list(rt_live_csv_reader)

    my_state_index = rt_live_rows[0].index('region')
    my_date_index = rt_live_rows[0].index('date')
    my_positive_index = rt_live_rows[0].index('positive_test_adjusted_raw')
    my_positive_no_adjustment_index = rt_live_rows[0].index('positive')
    my_test_index = rt_live_rows[0].index('new_tests')
    my_death_index = rt_live_rows[0].index('new_deaths')

    my_dates = list()
    my_positives = list()
    my_totals = list()
    my_deaths = list()
    my_positive_no_adjustment = list()

    rt_live_rows_sorted_body = rt_live_rows.copy()
    rt_live_rows_sorted_body = rt_live_rows_sorted_body[1:]
    rt_live_rows_sorted_body.sort()

    for rt_live_row in rt_live_rows_sorted_body: #Make sure to skip the header row.
        this_date = date_to_datetime(rt_live_row[my_date_index])
        if rt_live_row[my_state_index] == my_state or (my_state == "US" and this_date not in my_dates):
            my_dates.append(this_date)
            my_positives.append(rt_live_str_to_float(rt_live_row[my_positive_index]))
            my_positive_no_adjustment.append(rt_live_str_to_float(rt_live_row[my_positive_no_adjustment_index]))
            my_deaths.append(rt_live_str_to_float(rt_live_row[my_death_index]))
            my_totals.append(rt_live_str_to_float(rt_live_row[my_test_index]))
        #TODO: fix this
        elif my_state == "US": # And we haven't seen this date before
            my_index = my_dates.index(this_date)
            my_positives[my_index] += rt_live_str_to_float(rt_live_row[my_positive_index])
            my_positive_no_adjustment[my_index] += rt_live_str_to_float(rt_live_row[my_positive_no_adjustment_index])
            my_deaths[my_index] += rt_live_str_to_float(rt_live_row[my_death_index])
            my_totals[my_index] += rt_live_str_to_float(rt_live_row[my_test_index])

    my_cumulative_deaths = [0.0] * len(my_deaths)
    for my_cd_i, daily_new_deaths in enumerate(my_deaths):
        my_cumulative_deaths[my_cd_i] = my_cumulative_deaths[max(my_cd_i - 1, 0)] + daily_new_deaths

    for my_list in my_dates, my_positives, my_totals, my_cumulative_deaths, my_positive_no_adjustment:
        my_list.reverse()
        num_to_pop = max(0, len(my_list) - max_days)
        for popper in range(num_to_pop):
            my_list.pop()

    # Align most recent x days
    positives_total = 0
    positives_no_adjustment_total = 0
    for jj in range(num_days_to_use_for_aligning_positive_adjusted):
        positives_total += my_positives[jj]
        positives_no_adjustment_total += my_positive_no_adjustment[jj]

    factor_to_multiply_adjusted_positives_so_they_approximate_tested_positives = positives_no_adjustment_total / positives_total

    adjusted_positives = [int(x * factor_to_multiply_adjusted_positives_so_they_approximate_tested_positives) for x in my_positives]
    adjusted_totals = [int(x * factor_to_multiply_adjusted_positives_so_they_approximate_tested_positives) for x in my_totals]

    return my_dates, adjusted_positives, adjusted_totals, my_cumulative_deaths```
@qjhong
Copy link
Owner

qjhong commented Jun 25, 2020

Mike: Thanks for your suggestion! I do not agree with them that this correction is valid.

As I understand, this approach of P/T (P:positive, T:total test) is valid only when sampling is random. This is not true in reality, and it could lead to unreasonable conclusions. For example, their correction suggests that current positive number is smaller than that in April in CA. I do not agree. Another counter example is that imagine a state can manipulate and reduce P/T by forcefully increasing T, while the real positive population is independent of T. In other words, imagine we have T and then intentionally expand it by another T. In the second T, people are less likely to show symptoms, tend to be relatively healthy, so P/T becomes smaller.

I believe a reasonable correction is P/(T^a), where 0<a<1. We don't know the real value, but a<1. In my model a=0.25. At least this effect is partially taken into account in my model.

@qjhong qjhong closed this as completed Jun 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants