<a href="https://colab.research.google.com/github/oughtinc/ergo/blob/notebooks-readme/assorted-predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

In [None]:
!pip install --quiet poetry  # Fixes https://github.com/python-poetry/poetry/issues/532
!pip install --quiet git+https://github.com/oughtinc/ergo.git@submit_mixture
!pip install --quiet pendulum seaborn requests
!pip install --quiet torch

In [None]:
%load_ext google.colab.data_table

In [None]:
import ergo
import pendulum
import pandas
import seaborn
import requests
import re
import numpy as np
from scipy import stats

# Set up Metaculus API

In [None]:
metaculus = ergo.Metaculus(username="ought", password=passwords["ought"], api_domain="pandemic")
# metaculus = ergo.Metaculus(username="oughttest", password=passwords["oughttest"], api_domain="pandemic")

# Utils

In [None]:
def will_x_happen_prediction(model_uncertainty, start, p_at_start, end):
  """Decay p_at_start linearly to 0 between start and end, then add a constant model_uncertainty."""
  start_to_end = (end - start).days
  days_remaining = (end - pendulum.now()).days
  proportion_time_remaining = days_remaining / start_to_end
  model_prediction = p_at_start * proportion_time_remaining

  return model_prediction + model_uncertainty

# should be 1: no time has elapsed since start, so the result is p_at_start + model_uncertainty = 0.5 + 0.5
will_x_happen_prediction(0.5, pendulum.now(), 0.5, pendulum.datetime(2020,7,7))
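
The helper reads the clock through `pendulum.now()`, so its output changes from day to day. As a minimal sketch for checking the decay by hand, the hypothetical variant below takes the current date as an explicit argument and uses only the standard library. The function name `decayed_prediction` and the `now` parameter are assumptions, not part of the notebook. The inputs are the 0.05 and 0.08 values and the 2020-04-04 to 2020-04-19 window used for the shutdown question further down.

```python
from datetime import date

def decayed_prediction(model_uncertainty, start, p_at_start, end, now):
    """Hypothetical variant of will_x_happen_prediction with an explicit `now`."""
    start_to_end = (end - start).days
    days_remaining = (end - now).days
    # Fraction of the question window that is still ahead of us.
    proportion_time_remaining = days_remaining / start_to_end
    return p_at_start * proportion_time_remaining + model_uncertainty

start, end = date(2020, 4, 4), date(2020, 4, 19)
at_start = decayed_prediction(0.05, start, 0.08, end, now=start)
at_end = decayed_prediction(0.05, start, 0.08, end, now=end)
print(at_start, at_end)
```

On the first day the result is `p_at_start + model_uncertainty`, and it falls linearly to `model_uncertainty` on the closing date, so the prediction never drops below `model_uncertainty`.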

# Questions

## JSON

In [None]:
question_infos = [
    {
        "name": "What will the Seattle Police Department report as the total number of criminal offenses in March 2020?",
        "id": 3924
    },
    {
        "name": "What will Washington state’s Department of Revenue report as the 2020 Q1 gross business income?",
        "id": 3923
    },
    {
        "name": "Will the US federal government shut down all non-essential services by 2020-04-19?",
        "id": 3921
    },
    {
        "name": "Will the Emergency Telework Act (S.3561) become law by 4/25/20?",
        "id": 3918
    },
    {
        "name": "By May 1 will there be an iOS or Android app that shares an individual's COVID-19 infection status with more than 1M other users?",
        "id": 3915
    },
    {
        "name": "By June 1, how many tests for COVID-19 will have been administered in the US?",
        "id": 3916
    },
    {
        "name": "[short fuse] How many total confirmed deaths of novel coronavirus will be reported in the state of New York by April 2nd?",
        "id": 3934
    },
    {
        "name": "What will be the US unemployment rate for March 2020?",
        "id": 3922
    },
    {
        "name": "How many days will the city of New Orleans spend under lockdown between 2020-03-25 and 2020-04-15?",
        "id": 3930
    },
    {
        "name": "Will Florida go under lockdown between 2020-03-25 and 2020-04-25?",
        "id": 3926
    }
]

## Dataframe

In [None]:
questions = [metaculus.get_question(question_info["id"], name=question_info["name"]) for question_info in question_infos]
ergo.MetaculusQuestion.to_dataframe(questions)

# Models

### 0. What will the Seattle Police Department report as the total number of criminal offenses in March 2020?

https://pandemic.metaculus.com/questions/3924/what-will-the-seattle-police-department-report-as-the-total-number-of-criminal-offenses-in-march-2020/

In [None]:
def crime_model():
  feb_total = 5090 # https://www.seattle.gov/police/information-and-data/crime-dashboard
  mar_difference_multiplier = ergo.lognormal_from_interval(0.3, 2)
  prediction = feb_total * mar_difference_multiplier
  ergo.tag(prediction, "mar_total")

crime_samples = ergo.run(crime_model, 1000)

questions[0].samples = crime_samples.mar_total

# questions[0].show_submission(questions[0].samples)

#### Apr 4
* We're already predicting a broader range than the community on this one, and our median is close to the community median. I don't see any obvious reason to change anything
* TODO:
  * This one is ripe for reference-class comparison, which I haven't done at all:
    * what does data from other states, or early data from WA, show re: impact of coronavirus on crime?
    * what impact did other catastrophes have on crime in the past?
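
To see what the multiplier in `crime_model` implies, here is a rough standard-library sketch. It assumes that `ergo.lognormal_from_interval(low, high)` treats `low` and `high` as the 5th and 95th percentiles of a lognormal distribution. That reading is an assumption about ergo and should be checked against the library. The helper name `lognormal_params_from_interval` is hypothetical.

```python
import math
import random

Z_90 = 1.6448536269514722  # 95th percentile of the standard normal

def lognormal_params_from_interval(low, high):
    """Return (mu, sigma) of log(X), assuming [low, high] is a central 90% interval."""
    log_low, log_high = math.log(low), math.log(high)
    mu = (log_low + log_high) / 2
    sigma = (log_high - log_low) / (2 * Z_90)
    return mu, sigma

mu, sigma = lognormal_params_from_interval(0.3, 2)
median_multiplier = math.exp(mu)  # geometric mean of the interval ends
feb_total = 5090                  # February figure used in crime_model
median_mar_total = feb_total * median_multiplier

# Draw samples the way the model would, under the assumption stated above.
rng = random.Random(0)
samples = [feb_total * rng.lognormvariate(mu, sigma) for _ in range(1000)]
print(round(median_multiplier, 3), round(median_mar_total))
```

Under this assumption the median multiplier is sqrt(0.3 * 2), about 0.77, so the median forecast sits below the February total even though the interval extends up to 2x.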

### 1. What will Washington state’s Department of Revenue report as the 2020 Q1 gross business income?

https://pandemic.metaculus.com/questions/3923/what-will-washington-states-department-of-revenue-report-as-the-2020-q1-gross-business-income/

In [None]:
wa_df = pandas.read_csv("https://gist.githubusercontent.com/brachbach/5dc01125a44ce28e067a2dddb18f8a02/raw/12b16deecef9848c1f75432cf9aca5b61b1fd26a/WAGrossBusiness.csv")
wa_df["Total Gross"] = wa_df["Total Gross"].apply(lambda x: int(re.sub(r'[\$,]', '', x)))  # strip "$" and "," before converting
wa_df["year"] = wa_df["Year"].apply(lambda x: int(x.split()[0]))
quarters = wa_df[wa_df["Year"].str.contains("Quarter")]
recent = quarters[quarters["year"] >= 2017]
recentish = quarters[quarters["year"] >= 2008]
worst_quarters = recentish.nsmallest(10, "Total Gross")

In [None]:
recentish

In [None]:
worst_quarters

In [None]:
def wa_rev_model():
  proxy_gross = ergo.random_choice(recent["Total Gross"].to_list())
  # fuzzed to be more likely to be worse than normal rather than better
  fuzzed = ergo.normal_from_interval(proxy_gross * 0.5, proxy_gross * 1.2)
  ergo.tag(fuzzed, "mar_gross")

wa_rev_samples = ergo.run(wa_rev_model, 1000)

questions[1].samples = wa_rev_samples.mar_gross

# questions[1].show_submission(questions[1].samples)

* IMO our prediction is grossly overconfident
  * why do I think so:
    * much more confident than the community
    * my dumb model can't provide that much certainty
    * I think that the prediction for how things would go in the absence of coronavirus should be relatively close to the 90th percentile of our estimate, which it is not
  * TODO:
    * I think for now, just increase the fuzz factor
* The model predicts that revenue will turn out to have been much lower than expected for Q1 2020. It's not obvious why that'd be true
  * stay-at-home order was not issued until 23 Mar, i.e. almost the end of the quarter
  * TODO:
    * switch to basically assuming that Q1 2020 will be a normal quarter, not an abnormally bad one
    * figure out what my view on this should really be:
      * what exactly is "gross business income"?
      * how much should we expect it to have been hurt by corona before stay-at-home order?

### 2. Will the US federal government shut down all non-essential services by 2020-04-19?

https://pandemic.metaculus.com/questions/3921/will-the-us-federal-government-shut-down-all-non-essential-services-by-2020-04-19/

#### Thinking about the question

- As far as what people usually think of as "government shutdowns", seems like all have been caused by budgetary brinksmanship: https://www.thebalance.com/government-shutdown-3305683

- Seems like no budget showdowns coming up before the question expires: http://www.crfb.org/blogs/upcoming-congressional-fiscal-policy-deadlines

- So really we're just talking about coronavirus

- I feel like this might resolve ambiguously: "The president or other federal official formally announces a government shutdown"

- Also it's unclear whether no work _with pay_ would count as a "furlough" or not, per: "And / or at least 200,000 federal employees are furloughed for at least 1 week"

- Not much on Google about this possibility: https://www.google.com/search?q=us+federal+government+shutdown+coronavirus&tbm=nws

- Also not much about state shutdowns/furloughs

- So I think this is pretty unlikely, predicting 15%

In [None]:
shutdown_prediction = will_x_happen_prediction(0.05, pendulum.datetime(2020,4,4), 0.08, pendulum.datetime(2020,4,19))

questions[2].binary_prediction = shutdown_prediction

In [None]:
shutdown_prediction

#### Apr 4
* Now several days later, not seeing any signs of this becoming more likely
  * so we should lower p
  * ~~write some simple code to do this automatically going forward~~

### 3. Will the Emergency Telework Act (S.3561) become law by 4/25/20?

https://pandemic.metaculus.com/questions/3918/will-the-emergency-telework-act-s3561-become-law-by-42520/

#### Thinking about the question

- looking at the question and the current comment, it seems like the thing to do is to model the procedure of bills becoming laws and then use that model to predict whether this one will
  - the first step is to build/find a qualitative model of the steps from bill to law
  - As the comment points out, it might require something really unusual for this to become law given that Congress is currently out of session, should look into that first
- What do I find from a quick Google News search on this? 
  - nothing from less than a week ago: https://www.google.com/search?q=Emergency+Telework+Act&source=lnms&tbm=nws

##### Qualitative model of bill to law

[This](https://www.zerotothree.org/resources/728-how-a-bill-becomes-a-law) seems like a good overall summary

And [this](https://www.usa.gov/how-laws-are-made) seems to have similar content

1.   Various things happen that we don't care about because they already happened for this bill, then: the bill gets introduced into the Senate and referred to a committee: ["Read twice and referred to the Committee on Homeland Security and Governmental Affairs."](https://www.congress.gov/bill/116th-congress/senate-bill/3561/all-actions?overview=closed&KWICView=false)
2. Committee delegates to subcommittee or submits to floor
3. debate on floor
4. votes in full House and Senate (by this point there may be separate bills in the Senate and House; not sure how likely that is for this bill, vs. just having one version)
5. conference committee between House and Senate
6. full House and Senate agree to the version that came out of conference committee
7. President signs
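
The steps above can be sketched as a chain of conditional probabilities: the bill becomes law only if it clears every remaining stage before the deadline. The stage names follow the list; the probabilities are placeholders chosen only to make the example run, not estimates, and `p_becomes_law` is a hypothetical helper.

```python
from math import prod

# Placeholder conditional probabilities of clearing each remaining stage in time.
# These numbers are NOT estimates; replace them with reference-class data.
remaining_stages = {
    "committee sends bill to floor": 0.5,
    "floor debate completes": 0.5,
    "passes House and Senate": 0.5,
    "conference committee agrees": 0.5,
    "both chambers accept conference version": 0.5,
    "President signs": 0.5,
}

def p_becomes_law(stage_probabilities):
    """Probability of clearing all stages, each conditional on the previous ones."""
    return prod(stage_probabilities)

p_law = p_becomes_law(remaining_stages.values())
print(p_law)
```

Because the stages multiply, even moderately likely steps compound into a small overall probability when several of them remain.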

##### Prediction
On further reflection, just seems unlikely to me that this will become law by the specified date given that it doesn't seem to have moved forward in the last week. Assigning 15% probability for now, may go back and build the model later


In [None]:
telework_prediction = will_x_happen_prediction(0.05, pendulum.datetime(2020,4,4), 0.08, pendulum.datetime(2020,4,25))

questions[3].binary_prediction = telework_prediction

telework_prediction

#### Apr 4
* any new news on this?
  * no
  * so that's a slight further negative update, I'll move us down from 0.15 to 0.13
* TODO:
  * reference class forecasting. What's the right reference class of bills? How likely are they to become law? How long does it take them to become law?
    * use this to build a model of the progress of the Emergency Telework Act towards becoming law, use that model to estimate p(becomes law) by 25 Apr

### 4. By May 1 will there be an iOS or Android app that shares an individual's COVID-19 infection status with more than 1M other users?

https://pandemic.metaculus.com/questions/3915/by-may-1-will-there-be-an-ios-or-android-app-that-shares-an-individuals-covid-19-infection-status-with-more-than-1m-other-users/

#### Thinking about the question
* Seems really hard to think about
* Some related stuff:
  * Apple has a coronavirus screening app: https://www.apple.com/covid19/ (does not seem to share your status)
  * A friend of mine who values digital privacy but also other things posted on FB: "Question: Under current employment law, what are the rules about whether you can require seeing the results of a medical test before hiring someone?". This suggests to me that people would likely be open to an app like this, despite the sort of panoptic weirdness of it
  * China:
    * https://www.nytimes.com/2020/03/01/business/china-coronavirus-surveillance.html
      * app assigns you a coronavirus risk assessment that's sent to the government
      * doesn't report this to other users, so doesn't count
  * Should look into what dating apps are doing/considering
    * [didn't find too much](https://www.google.com/search?q=dating+app+share+coronavirus+status&source=lnms&sa=X&ved=0ahUKEwjZyIv9-MXoAhWYJzQIHYx-DGEQ_AUIDSgA&biw=1200&bih=1809&dpr=1)
  * Seems likely that the app would start somewhere other than the US. East Asia seems particularly likely
* One way to make a breakthrough on this question would be to learn about an app/feature that's in development
* thinking about reference class, or at least just similar things -- what sort of info about themselves do users share with each other in this way?
  * current prediction: 55%. Would have been at like 70% but surprised that we haven't already seen this come out of East Asia
  * Can you report HIV or other STI status:
    * on Grindr:
      * [Yes](https://www.npr.org/sections/thetwo-way/2018/04/03/599069424/grindr-admits-it-shared-hiv-status-of-users)
      * not sure about other STIs
    * on Fetlife?
      * seems like no, but that you also can't enter much other info for your profile
    * on Tinder? 
      * No
  * on OKCupid, you can answer a question about how you're feeling about coronavirus



In [None]:
app_prediction = will_x_happen_prediction(0.05, pendulum.datetime(2020,4,5), 0.55, pendulum.datetime(2020,5,1))

questions[4].binary_prediction = app_prediction

app_prediction

#### 4 Apr update notes
* checked the comments, nothing new + interesting
* added in the consideration that you can answer on OKC re: how you're feeling about Coronavirus, which increased my p to 0.6

### 5. By June 1, how many tests for COVID-19 will have been administered in the US?

https://pandemic.metaculus.com/questions/3916/by-june-1-how-many-tests-for-covid-19-will-have-been-administered-in-the-us/

Metaculus says that this is the best data source, seems reasonable: https://covidtracking.com/api/us/daily

In [None]:
daily_corona = pandas.read_csv("https://covidtracking.com/api/us/daily.csv")
daily_corona["date"] = daily_corona["date"].apply(lambda x: pendulum.parse(str(x)))
daily_corona = daily_corona.sort_values("date")

In [None]:
daily_corona

In [None]:
# pass x and y by keyword; recent seaborn versions no longer accept them positionally
seaborn.pointplot(x=daily_corona["date"], y=daily_corona["totalTestResults"], color="red")
seaborn.pointplot(x=daily_corona["date"], y=daily_corona["totalTestResultsIncrease"], color="green")

Original notes: Looks like there have been a constant number of new tests per day lately, so just assume that will continue to be the case

In [None]:
def tests_model():
    latest_date = pendulum.instance(daily_corona.iloc[-1]["date"])
    total_tests = daily_corona.iloc[-1]["totalTestResults"]
    last_five_days = daily_corona.nlargest(5,"date")
    avg_recent_daily_tests = last_five_days["totalTestResultsIncrease"].mean()

    tests_per_day_high = ergo.normal_from_interval(avg_recent_daily_tests * 1, avg_recent_daily_tests * 10)
    tests_per_day_likely = ergo.lognormal_from_interval(avg_recent_daily_tests * 0.2, avg_recent_daily_tests * 2)
    tests_per_day = ergo.random_choice([tests_per_day_high, tests_per_day_likely, tests_per_day_likely, tests_per_day_likely])
    end_date = pendulum.date(2020,6,1)
    for i in range((end_date - latest_date).days + 1):
      date = latest_date.add(days=i)
      total_tests = total_tests + tests_per_day
      ergo.tag(total_tests, date.format('YYYY/MM/DD'))

test_samples = ergo.run(tests_model, 1000)

questions[5].samples = test_samples["2020/06/01"]

# questions[5].show_submission(questions[5].samples)
    

In [None]:
test_samples

In [None]:
questions[5].submit_from_samples(test_samples["2020/06/01"])

#### Apr 4 updates
* ~~increase number of samples to run to 1000, 100 might have been too few~~
* My prediction distribution is way too narrow:
  * I think so because:
    * A priori -- there's no way I'm that confident
    * compared to the community prediction
  * What I'm going to do:
    * ~~instead of sampling tests_today every day, sample tests_per_day for the whole duration. I had already identified this as a way to make the model make more sense, and it will also lead to a wider distribution~~
    * TODO: do this in some more sophisticated way where I can more clearly express my guesses for how many tests will be administered over the whole interval, compared to now. Seems good to do, planning to come back to this
* tests per day looks slightly less constant now
  * TODO: consider modeling tests_per_day based on something other than average tests over the past few days -- fit a quadratic or do a linear regression or something. Will skip this for now but maybe come back to it later
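
The change described above, sampling one `tests_per_day` for the whole period instead of a fresh value each day, widens the forecast because daily errors no longer average out. A small standard-library sketch with placeholder numbers (the mean, spread, and horizon below are illustrative, not taken from the data, and both function names are hypothetical):

```python
import random
import statistics

def total_independent(rng, days, mean, stdev):
    """Draw a new daily rate every day and add them up."""
    return sum(rng.gauss(mean, stdev) for _ in range(days))

def total_shared(rng, days, mean, stdev):
    """Draw one daily rate and keep it for the whole period."""
    return days * rng.gauss(mean, stdev)

rng = random.Random(0)
days, mean, stdev = 60, 100_000, 30_000  # placeholder values
independent = [total_independent(rng, days, mean, stdev) for _ in range(2000)]
shared = [total_shared(rng, days, mean, stdev) for _ in range(2000)]

spread_independent = statistics.stdev(independent)
spread_shared = statistics.stdev(shared)
print(round(spread_independent), round(spread_shared))
```

The spread of the independent version grows with the square root of the number of days, while the shared version grows linearly, so over 60 days the shared forecast is roughly sqrt(60), about 7.7, times wider.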


### 6. [short fuse] How many total confirmed deaths of novel coronavirus will be reported in the state of New York by April 2nd?

In [None]:
# from this metaculus comment: https://pandemic.metaculus.com/questions/3934/short-fuse-how-many-total-confirmed-deaths-of-novel-coronavirus-will-be-reported-in-the-state-of-new-york-by-april-2nd/#comment-25503
supposed_covid_timeseries = pandas.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")
ny = supposed_covid_timeseries[supposed_covid_timeseries["state"].str.contains("New York")]

In [None]:
ny

In [None]:
# manually fit an exponential curve
# pass x and y by keyword; recent seaborn versions no longer accept them positionally
seaborn.pointplot(x=ny["date"], y=ny["deaths"], color="red")
seaborn.pointplot(x=ny["date"], y=[1.28**x for x in range(0,len(ny["date"]))])

In [None]:
def death_model():
  fit_base = 1.28
  # mar 29 is day 29, so apr 2 is day 33
  day_number = 33
  fuzzed_base = ergo.lognormal_from_interval(fit_base - 0.01, fit_base + 0.01)
  ergo.tag(fuzzed_base**day_number, "ny_deaths")

death_samples = ergo.run(death_model, 1000)

questions[6].samples = death_samples.ny_deaths
# questions[6].show_submission(questions[6].samples)
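
The base of 1.28 above was fit by eye. As a hedged alternative, the sketch below fits the base by least squares on the log of the counts and then shows how sensitive the day-33 value is to the base. The series is synthetic (generated from `1.28**x`), not the New York data, and `fit_exponential_base` is a hypothetical helper.

```python
import math

def fit_exponential_base(counts):
    """Least-squares fit of log(count) = a + b * day; returns exp(b), the daily growth factor.

    All counts must be positive, since the fit works on their logarithms.
    """
    days = range(len(counts))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs)) / sum(
        (x - mean_x) ** 2 for x in days
    )
    return math.exp(slope)

synthetic = [1.28 ** day for day in range(30)]  # synthetic stand-in for the deaths column
fitted_base = fit_exponential_base(synthetic)

# Sensitivity of the day-33 projection to a +/- 0.01 change in the base.
day_number = 33
projections = {base: base ** day_number for base in (1.27, 1.28, 1.29)}
print(round(fitted_base, 4), {b: round(v) for b, v in projections.items()})
```

A change of 0.01 in the base moves the day-33 projection by roughly a quarter, which is why the width of the interval in `death_model` matters so much.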

### 7. What will be the US unemployment rate for March 2020?
https://pandemic.metaculus.com/questions/3922/what-will-be-the-us-unemployment-rate-for-march-2020/

In [None]:
monthly_unemployment_wide = pandas.read_csv("https://gist.githubusercontent.com/brachbach/d966ef4221215bdd58a5067802ded0be/raw/4a944a28eb9f222023396020913aabd507538060/monthly_unemployment_wide.csv")
# monthly_unemployment_wide

In [None]:
months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
unemployment = pandas.melt(monthly_unemployment_wide, id_vars=["Year"],
                           value_vars=["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],
                           var_name="month",
                           value_name="percent_unemployment")

unemployment["month"] = unemployment["month"].apply(lambda name: months.index(name) + 1)
unemployment = unemployment.sort_values(by=["Year", "month"])
unemployment["monthly_diff"] = unemployment["percent_unemployment"].diff()
# unemployment

In [None]:
biggest_monthly_increase = unemployment.nlargest(5, "monthly_diff")
biggest_monthly_increase

In [None]:
def unemployment_model():
  proxy_delta = ergo.random_choice(biggest_monthly_increase["monthly_diff"].to_list())
  fuzzed_delta = ergo.normal_from_interval(proxy_delta * 0.5, proxy_delta * 2)
  feb_rate = 3.5
  ergo.tag(feb_rate + fuzzed_delta, "mar_unemployment")

unemployment_samples = ergo.run(unemployment_model, 1000)

questions[7].samples = unemployment_samples.mar_unemployment
# questions[7].show_submission(questions[7].samples)

### 8. How many days will the city of New Orleans spend under lockdown between 2020-03-25 and 2020-04-15?

https://pandemic.metaculus.com/questions/3930/how-many-days-will-the-city-of-new-orleans-spend-under-lockdown-between-2020-03-25-and-2020-04-15/

Predicting in the [lockdown model](https://colab.research.google.com/drive/1BRplIkEvySIWLDfL2m-I2-69ES725Hnv)

### 9. Will Florida go under lockdown between 2020-03-25 and 2020-04-25?

https://pandemic.metaculus.com/questions/3926/will-florida-go-under-lockdown-between-2020-03-25-and-2020-04-25/

#### Thinking through the question
* Which other states are currently not under lockdown?
  * https://www.usatoday.com/story/news/nation/2020/03/21/coronavirus-lockdown-orders-shelter-place-stay-home-state-list/2891193001/
    * lists 34 states (`$('h2').length`)
  * https://www.wsj.com/articles/a-state-by-state-guide-to-coronavirus-lockdowns-11584749351
    * lists 31 states (`$("h6").length`), but actually Florida is on there so not all are true lockdowns
* Hmm, resolution may well be ambiguous. Florida is already doing some partial lockdown:
  * https://www.wsj.com/articles/florida-unlike-other-hard-hit-states-avoids-broad-coronavirus-lockdown-11585560601
  * https://www.wsj.com/articles/a-state-by-state-guide-to-coronavirus-lockdowns-11584749351 (reports that some parts of Florida have closed down business, but not all)
  * what if e.g. the Governor orders the businesses that ordinarily serve most Floridians to shut down, but doesn't close rural ones, and doesn't order people to stay home?

Doesn't seem crazy to me to think that the Governor will maintain the current not-quite-a-lockdown policy

In [None]:
questions[9].binary_prediction = 0.55

# Submit predictions

Convert samples to Metaculus distributions and visualize:

In [None]:
for question in questions:
  if question.type == "binary":
    print(f"Prediction for {question}: {question.binary_prediction}")
  elif question.type == "continuous":
    try:
      question.show_submission(question.samples)
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
      print(f"No submission or submission can't be shown for {question}")
    print("\n\n")
  else:
    raise ValueError("Unknown question type!")
  print("\n\n")

If everything looks good, submit the predictions!

In [None]:
def submit_all():
  for question in questions:
    if question.type == "binary":
      try:
        r = question.submit(question.binary_prediction)
        print(f"Submitted for {question.name}")
        print(f"https://pandemic.metaculus.com{question.page_url}")
      except requests.exceptions.HTTPError as e:
        print(f"Couldn't make prediction for {question.name} -- maybe this question is now closed? See error below.")
        print(e)
    elif question.type == "continuous":
      try:
        question.samples
      except Exception:  # no samples were assigned for this question
        print(f"No predictions for {question}")
        continue

      try:
        r = question.submit_from_samples(question.samples)
        print(f"Submitted for {question.name}")
        print(f"https://pandemic.metaculus.com{question.page_url}")
      except requests.exceptions.HTTPError as e:
        print(f"Couldn't make prediction for {question.name} -- maybe this question is now closed? See error below.")
        print(e)
    else:
      raise ValueError("Unknown question type!")
      

submit_all()

# TODO
1. consider automatically pulling all open questions and matching them to models
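
One possible shape for this, sketched in plain Python: keep a registry from Metaculus question id to model function, then walk the open questions and report which ones have no model. The ids are taken from `question_infos`; the registry name, the stub models, and `match_models` are hypothetical, and fetching the list of open questions is left out because the notebook does not show a call for it.

```python
def crime_model_stub():
    """Stand-in for crime_model; the real model lives earlier in the notebook."""
    return "mar_total"

def tests_model_stub():
    """Stand-in for tests_model."""
    return "2020/06/01"

# Hypothetical registry: question id -> model function.
MODEL_REGISTRY = {
    3924: crime_model_stub,
    3916: tests_model_stub,
}

def match_models(open_question_ids, registry):
    """Split open question ids into those with a registered model and those without."""
    matched = {qid: registry[qid] for qid in open_question_ids if qid in registry}
    unmatched = [qid for qid in open_question_ids if qid not in registry]
    return matched, unmatched

matched, unmatched = match_models([3924, 3923, 3916], MODEL_REGISTRY)
print(sorted(matched), unmatched)
```

Reporting the unmatched ids separately makes it obvious which open questions still need a model before anything is submitted.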