# Using Python to Access the Geopolitical Forecasting Challenge 2 API


## Introduction

Participation in Geopolitical Forecasting (GF) Challenge 2 is conducted through an API provided by Cultivate Labs. This notebook briefly demonstrates some ways to interact with the API through Python. These examples assume that you are familiar with Python 2 or 3, and have installed the [Requests](http://docs.python-requests.org/en/master/) library (which is included in distributions such as Anaconda). (NOTE: You may use these examples as part of your Challenge solution, but it is important to note that this code is meant primarily as a reference, and should not be assumed to be bug-free, or particularly efficient. We make no guarantees or warranties that this code will correctly retrieve or submit data to the API -- you are responsible for verifying that your forecasts are submitted correctly).

Prior to participating in GF Challenge 2, you must register at [HeroX](https://www.herox.com/IARPAGFChallenge2), and then register on the Cultivate platform using the URLs provided. Upon registering on the Cultivate platform, you will be able to generate an API key for the staging (test) instance, and production (competition) instance of the API. This API key is unique to you, and you should ensure that it is not shared with others.  This API key is used to identify and authenticate your requests and submissions to the API.  (*Note that API keys from the first GF Challenge have been deactivated; you will need a new key to participate in GF Challenge 2*.)

You will find complete [API documentation on the Cultivate Labs site](https://cultivate-hfc.github.io/gfc-api-docs/) once you receive your API key. This notebook does not provide an exhaustive overview of the entire API. Instead, it is designed to highlight some key considerations for implementing a GF Challenge 2 API client. It is not a substitute for a thorough understanding of the API documentation. This document describes some concepts related to the challenge, however, it is not an authoritative source of challenge rules. You are responsible for reviewing and understanding the official GF Challenge 2 Rules document available on the HeroX site.

All code in this document is provided using the [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).

## General Details

There are two separate but similar APIs for GF Challenge 2 on the Cultivate Labs platform.  The **Staging** platform (URL https://api.gfc-staging.com) is provided for Solvers to get familiar with the procedures for accessing the data and submitting forecasts in a practice mode.  The **Production** platform (URL https://api.iarpagfchallenge.com) is the platform for the competition.  The two platforms use separate API keys.  It is recommended that you use the Staging platform as you work through the code examples in this notebook.

### Using your API Key

All calls to the API must include your API key in the request headers in the following form:

`headers['Authorization'] = 'Bearer ' + secret_token`

So, one can submit a GET request to the API this way:

In [1]:
MY_STAGING_SECRET = "a0a4d8f1a34501875101650ee2d60a2dc9dd27c38f0e1262a57ce7e1af1743a1" # Insert your API token here

In [2]:
STAGING_URL = "https://api.iarpagfchallenge.com"

import requests

secret_token = MY_STAGING_SECRET
server = STAGING_URL
url = server + '/api/v1/questions' # The endpoint to retrieve questions
headers = {'Authorization':'Bearer ' + secret_token}
params = {"training_data": "true"} # More to come on this in a moment

result = requests.get(url, headers=headers, params=params) 

if result.ok:
    j = result.json() # This will be the content you are interested in.
    ifp_count = len(j["questions"])
    print("We retrieved {} IFPs".format(ifp_count))
else:
    print('PROBLEM:', result.status_code, result.text)

We retrieved 10 IFPs


If your request is successful the variable `j` will hold a Python dict of the form {"questions": List of dicts} where each dict will contain the data for a single IFP.

When POSTing forecasts, things look pretty much the same, except it would be formatted with the `.post()` method, and the parameters would be submitted as `json`:

`result = requests.post(url, headers=headers, json=params)`

### Passing parameters to the API

Typically, you'll want to provide specific parameters when making your API calls, (e.g., asking for human forecasts made against a specific forecasting question, requesting information that has been updated since you last checked).  To do that, you will pass a set of parameters in python dictionary form.  The dictionary structure will be identical to the examples provided in the Cultivate API documentation.

As an example, you may want to receive all human forecasts made against question number 5 since May 20, 2019. You would set your params dictionary as:

`params = {'question_id':5, 'created_after':'2019-05-20T00:00:00.000Z'}`

The full set of required and optional input parameters are listed in the Cultivate API documentation.

### Recieving Responses

When a GET or POST request is successfully executed, you will receive a JSON formatted response which can be accessed as the response's `.json()` object.

#### Paging

All paginated endpoints will also include 2 pagination-related response headers: `X-Total-Page-Count` and `X-Total-Record-Count`. `X-Total-Page-Count` contains the total number of pages available for your request, while `X-Total-Record-Count` contains the total number of records that will be included across all of those pages.

You can access these through the response's `headers` dictionary:

```
result = requests.get(url, headers=headers, params=params)

if result.ok:
    totalPages = int(resp.headers.get('X-Total-Page-Count',0))
```

By default, results are returned for the first page (page 0), updating the `params` dictionary to include `params['page'] = 1` would get the next page.

## Primary API Endpoints

### Retrieving Questions

Individual Forecasting Problems (IFPs) are questions about future events that solvers forecast against, and can be retrieved from the `questions` API endpoint. Each IFP includes, among other fields, an `id` a `description`, a set of `answers`, and a collection of `metadata` as well as starting and ending dates. Questions can be retrieved using a GET request with optional parameters.  The `status` parameter (values: active, closed, all) describes whether an IFP is open for forecast submission.  The several date parameters `created_before`, `created_after`, `updated_before`, and `updated_after` limit the response based on when questions were updated or created. The `training_data` parameter is new to GF Challenge 2 and is used to retrieve IFPs that were part of the HFC program but are not in GF Challenge 2 and which are available for Solvers to train their methods.  (More below).  The `answers` are a list of mutually exclusive, and collectively exhaustive options describing possible IFP outcomes. Forecasts must specify probabilities for each possible outcome that sum to 1.0 (except for binary questions (e.g., yes/no) for which only one option is presented, with the other option being calculated as 1 - Option A).

The `metadata` that is returned includes `Domain`, `Topic`, and location information. In GF Challenge 2 the metadata will only be populated as an IFP is closed.  (In the first GF Challenge this metadata was provided when an IFP was launched.)  This metadata is provided *post hoc* to allow Solvers to evaluate their techniques for identifying the region or subject of an IFP but does not figure into scoring in any way, as there are no Domain or Region prizes in GF Challenge 2.  Refer to the [GF Challenge 2 Rules document](NEEDS URL ONCE PUBLISHED) and the API documentation to ensure that you accurately retrieve and handle the relevant fields. 

Each IFP also includes a list of `clarifications`. This will be populated if there is additional guidance regarding the IFP issued. This can include situations where terms are further defined, or sources of resolution are changed. You should regularly check for updates to this field.

### Retrieving Individual and Aggregate Human Forecasts

As part of the Challenge, solvers will have access to forecasts made by a crowd of human forecasters. These forecasts will be made available in two forms: individual and aggregate.  The individual forecasts are made available through the `prediction_sets` API end point, while the aggregate forecasts are provided through the `consensus_histories` end point. The consensus is calculated using an aggregation algorithm called logit. The logit aggregation method is an extremizing method that uses a weighted geometric mean to aggregate forecasts. Forecaster weights are calculated based on 3 factors: historical accuracy, the frequency with which the forecaster updates his or her forecasts, and whether the forecaster completed a training course. The `consensus_histories` data also serve as the baseline against which Solvers will be compared. More details on the baseline and scoring can be found in the official GF Challenge 2 Rules.

Individual forecasts contain the probability forecasts, including a `question_id` that maps to the `id` in from the `questions` endpoint and `membership_guid` which uniquely identifies the human forecaster. Each forecast will include a list of `predictions` reflecting the probabilities that person assigned to each possible answer. The `forecasted_probability` for each answer represents that person's beliefs for each answer.

Consensus aggregations contain a `question_id` and `answer_id` pair that identify a particular answer option. The `normalized_value` for a particular answer reflects the score such that all answers to a particular question will sum to 1.0. This consensus is updated any time a human forecaster makes a new forecast against an IFP. The `consensus_histories` API end point contains a list of these updated consensus scores. The most recent consensus for a particular question reflects the current crowd consensus at that time. NOTE: With the exception of the first time you query this endpoint, you should NOT retrieve `consensus_histories` without specifying a `created_after` parameter.  

### Training Data (New in GF Challenge 2)

There are two collections of non-competition training data that are available for Solvers to train and backtest their methods.  The training data on the Staging platform includes IFPs, individual human forecasts, and consensus histories for 86 IFPs between December 2018 and March 2019.  All of these IFPs are closed and have been resolved.  The training data on the Production platform consists of these elements for HFC IFPs that launched before the start of the GF Challenge 2 competition and which are not included in the Challenge.  Depending on when you access the Production training data you may have a mixture of open and closed IFPs, especially early in the Challenge.

Please note that `membership_guid` values for the Staging training data will not be reused in GF Challenge 2.  However the `membership_guid` values in the Production platform training data will map to the same forecasters during the competition and are suitable for determining forecaster attributes such as accuracy and update frequency.

### Submitting Forecasts

Forecast submission is done through an HTTP POST. Your forecast must include the `question_id`, an `external_predictor_attributes.method_name`, and `external_predictions_attributes`: a list of dictionaries each containing an `answer_id` and `value` for each of the forecast question's possible alternatives. The sum of the values must equal 1.0. 

Each Solver is allocated 40 methodological "slots." These slots can be used to represent different strategies for weighting data sources, different algorithms, etc. The `method_name` parameter is used to identify which slot a forecast should be associated with. `method_name` can be up to 50 characters, and will be held constant throughout the challenge (i.e., you cannot add a 41st method). You will be scored on a per `method_name` basis, with only your best performing approach being considered for each prize category. For more details, review the GF Challenge 2 rules.

A forecast submission can look like:

```
params = {"external_prediction_set": {
        "question_id": 123,
        "external_predictor_attributes": {
            "method_name": "red"
            },
        "external_predictions_attributes": [
            {"value": 0.6, "answer_id": 431},
            {"value": 0.35, "answer_id": 432},
            {"value": 0.05, "answer_id": 433}
            ]
          }}
```

A successful submission of a forecast to the submission endpoint will return a json summary of the submission, including the time of submission. A failure (e.g., incorrect `answer_id` or `value`s that don't total to 1.0, or attempting to create a 41st `method_name`) will result in json describing the error. It is advisable to inspect the resulting json to ensure it reflects the intended forecast.

## Putting it All Together

We can implement API access into a single Python class so we can make GET and POST requests in a consistent fashion.  Below, we define a `GfcApi` class that allows us to specify a server and API token one time and access all the API endpoints.

In [3]:
import requests
import time
import datetime

from pprint import pprint #This is just to make things look pretty...

#Make this python 2 and 3 compliant
from __future__ import print_function

class GfcApi(object):
    """
        An example class for interacting with the Geopolitical Forecasting Challenge 2
        API.  Note that this code is for reference purposes, no warranties are expressed
        or implied.  
    """
    def __init__(self,token,server,proxy=None,verbose=False):
        """
            Create an instance of an API client. This assumes you have an OAuth token.
            
            Arguments
            
            REQUIRED
            token - <string> - The secret API token assigned when registering on the 
                               Cultivate platform
            
            server - <string> - The beginning of the server url in the form:
                                https://api.XXXXXXX.com.  This is described in the
                                Cultivate API documentation
            
            OPTIONAL
            proxy - <dictionary> - If you are behind a proxy server, you can specify the details
                                   in the form: 
                                       proxy = {'http': http_proxy,
                                                'https': https_proxy,
                                                'ftp': ftp_proxy}
                                   where an individual entry might be [ip address:port]. See the
                                   requests library documentation for more details.
                Default: None
                
            verbose - <boolean> - If true, we print GET and POST request URLs and params
                Default: False
            
        """
        
        self.token = token
        self.server = server
        self.proxy = proxy
        self.verbose = verbose
        
        self.sess = requests.session()
        self.rate_limit_delay = 1 #seconds between subsequent API calls
        self.last_call_time = 0.0 
        self.set_urls()
    
    def set_urls(self):
        if not self.server.endswith('/'):
            self.server += '/'
    
        self.api_base = self.server + 'api/v1/'
    
        self.consensus_histories_url = self.server + 'aggregation/api/v1/control/consensus_histories'
        self.external_prediction_sets_url = self.api_base + 'external_prediction_sets'
        self.prediction_sets_url = self.api_base + 'control/prediction_sets'
        self.questions_url = self.api_base + 'questions'

    def get_questions(self, status=None, created_before=None, created_after=None,
                      sort='published_at', updated_before=None, updated_after=None,
                      training_data=False):
        """
            This function retrieves Individual Forecasting Problems (IFPs).

            Optional Inputs:
            status - <string> - IFP status
                    Possible Values:
                        'active' - only return questions that are currently open for forecasting
                        'closed' - return all resolved or otherwised closed questions
                        'all'    - return all active and closed questions
                    Default Value:
                        'active'

            created_before - <datetime> - returns only questions created before this time

            created_after - <datetime> - returns only questions created after this time
            
            sort - <string> - Sort order of returned questions
                    Possible Values:
                        'published_at'
                        'ends_at'
                        'resolved_at'
                        'prediction_sets_count'
                    Default Value:
                        'published_at'
            
            updated_before - <datetime> - returns only questions updated before this time
            
            updated_after - <datetime> - returns only questions updated after this time
            
            training_data - <boolean> - returns only the questions for the training data, 
                otherwise returns competition questions.  Default False.
                    
             Output:
            JSON representation of a list of Individual Forecasting Problems
        """
        
        url = self.questions_url
        section = 'questions'
        params={}
        
        if created_before:
            params['created_before'] = created_before.isoformat()
        if created_after:
            params['created_after'] = created_after.isoformat()
        if created_before:
            params['updated_before'] = updated_before.isoformat()
        if created_after:
            params['updated_after'] = updated_after.isoformat()
        if status:
            params['status'] = status
        if sort:
            params['sort'] = sort
        if training_data:
            params['training_data'] = 'true'
        
        return self._get_pages(url=url,section=section,params=params)
    
    def get_human_forecasts(self, question_id=None, created_before=None, created_after=None,
                           updated_before=None, updated_after=None, training_data=False):

        """
            This function retrieves the stream of human forecasts against IFPs.

            Optional Inputs:
            question_id - <integer> - returns predictions for a single question
                    Default Value:
                        None

            created_before - <datetime> - returns only predictions created before this time

            created_after - <datetime> - returns only predictions created after this time
            
            updated_before - <datetime> - returns only predictions updated before this time
            
            updated_after - <datetime> - returns only predictions updated after this time

            training_data - <boolean> - returns only the human forecasts for the training data, 
                    otherwise returns competition human forecasts.  Default is False.
                                        
             Output:
            JSON representation of a list of human forecasts
        """
        
        url = self.prediction_sets_url
        section = 'prediction_sets'
        params={}
        
        if created_before:
            params['created_before'] = created_before.isoformat()
        if created_after:
            params['created_after'] = created_after.isoformat()
        if updated_before:
            params['updated_before'] = updated_before.isoformat()
        if updated_after:
            params['updated_after'] = updated_after.isoformat()
        if question_id:
            params['question_id'] = question_id
        if training_data:
            params['training_data'] = "true"

       
        return self._get_pages(url=url,section=section,params=params)        
    
    def get_consensus_histories(self, question_id=None, created_before=None, created_after=None,
                           updated_before=None, updated_after=None, training_data=False):

        """
            This function retrieves the consensus of human forecasts against IFPs.

            NOTE: You need to include some date constraints after your first use of this API. 
            Always utilize the created_after parameter to pull only those records that have 
            been created since you last accessed the API. Do not attempt to pull every 
            record/page of the history.

            Optional Inputs:
            
            question_id - <integer> - returns only predictions made about a specific IFP
                Default Value
                    None

            created_before - <datetime> - returns only predictions created before this time

            created_after - <datetime> - returns only predictions created after this time
            
            updated_before - <datetime> - returns only predictions updated before this time
            
            updated_after - <datetime> - returns only predictions updated after this time
                    
            training_data - <boolean> - returns only the consensus histories for the training data, 
                 otherwise returns competition consensus histories.  
                 Default False.
                    
             Output:
            JSON representation of a list of human forecasts
        """
        
        if (not created_before) and (not created_after) and (not updated_before) and (not updated_after):
            print("After your first query, use a date constraint (created_before/after or",\
                  "updated_before/after) to get consensus history. Old values won't change")
        
        url = self.consensus_histories_url
        section = 'consensus_histories'
        params={}
        
        if question_id:
            params['question_id'] = str(question_id)
        if created_before:
            params['created_before'] = created_before.isoformat()
        if created_after:
            params['created_after'] = created_after.isoformat()
        if updated_before:
            params['updated_before'] = updated_before.isoformat()
        if updated_after:
            params['updated_after'] = updated_after.isoformat()
        if training_data:
            params['training_data'] = "true"

        
        return self._get_pages(url=url,section=section,params=params)   
    
    def submit_forecast(self,question_id,method_name,predictions):
        """
            Submit probabilistic forecasts against a question.
            
            Required Parameters
            
            question_id - <integer> - The question_id of the IFP being forecast against
            
            method_name - <string> - The name of one of your 25 forecasting methods. Up to 50 chars
                             NOTE: This is used to track and score your forecasting methods. You
                             are responsible for keeping track of your named methods. Using a new
                             method_name will automatically add a new method - unless you have
                             already created 25 methods. In that case, you'll get an error message
                             in the response.
                             
            predictions - <list> - A list of Dictionaries in the form .
                                                   {'answer_id': <Integer>, 'value': <Decimal>}
                            
                          If the question is binary (exactly two possible answers), you only submit a
                          prediction for one possible answer, with the other being equal to 1 minus
                          your prediction for option A.
                          
                          NOTE: The set of values in the forecast must equal exactly 1.0 or you will 
                          receive an error message in the response.
            
     RESPONSE
     The json response will either summarize your forecast to this question, or it will contain an 
     error message indicating why it wasn't accepted.  You are responsible for recieving and reviewing
     the response to ensure that your forecast was accepted, and reflects your intentions.  You can 
     resubmit forecasts to a particular IFP repeatedly over the course of a forecast day, and each
     new submission will replace older submissions for scoring purposes. Review the GF Challenge
     Rules for details on forecast submission and scoring.
        """
    
        url = self.external_prediction_sets_url
        
        params={'external_prediction_set':{'question_id':question_id,
                                          'external_predictor_attributes':
                                           {'method_name':method_name},
                                           'external_predictions_attributes':predictions}
                }
        
        return self._post(url,params)
    
    def _forecast_template(self,ifp):
        """
            A tiny little helper function to create the basis for the predictions parameter in
            the submit_forecast function.  You pass an IFP from the questions API into this 
            function and receive a list of 'answer_id' and 'value' dictionaries that are
            needed to submit a forecast.  
            
            NOTE: This sets the forecast probability (value field) to None, which will result in an error upon
            submission.  You must set this to a legal probability.
        """
        
        output = [{'answer_id':a['id'],'value':None} for a in ifp['answers']]
        return output
    
    def _get_pages(self,url,params,section):
        
        """
            This function uses _get to make authenticated calls to the
            relevant API endpoints with the user-provided parameters.
            
            This function handles paging through results, and returns only the list from
            the resulting json result(s).
            
            The 'url' and 'params' describe the API query, the 'section' is the key in the
            returned json that contains the list of query results (e.g., 'questions').
        """
        if self.verbose:
            print('Get Pages for {}'.format(url))
            print(params)
        page = 1
        maxPage = 1
        
        all_results = []
        this_batch = []
        while page <= maxPage: 
            
            params['page']=page
            resp = self._get(url=url,params=params)
            maxPage = int(resp.headers.get('X-Total-Page-Count',0))
            try:
                results=resp.json()
            except:
                results=None
            if isinstance(results,(list,dict)):
                if 'errors' in results:
                    print(results['errors'])
                    return results
                
                this_batch = results[section]
                all_results.extend(this_batch)

                page+=1
            else:
                if self.verbose:
                    print("PROBLEM")
                return results

        return all_results                
        
    def _get(self,url,params):
        """
            A helper function that handles authentication and rate limiting.
            
            Given a URL and a set of parameters, this function calls the Cultivate API
            and returns the json response.
        """
        
        while time.time() < self.last_call_time + self.rate_limit_delay:
            if self.verbose:
                print("{}: Sleeping".format(time.ctime()))
            time.sleep(1)
        
        headers={'Authorization':'Bearer ' + self.token} #This is needed to authenticate

        if self.verbose:
            print("{}: GETTING {}".format(time.ctime(),url))
            safeHeaders = {k:v for k,v in headers.items() if k!='Authorization'}
            safeHeaders['Authorization']="Bearer <shhhhhh it's a secret>"
            print("\tHeaders: {}".format(safeHeaders))
            print("\tArgs: {}".format(params))
        resp = self.sess.get(url, headers=headers, params=params, proxies=self.proxy)
                                                                                         
        self.last_call_time = time.time()
        return resp
    
    def _post(self,url,params):
        """
            A helper function that handles authentication.
            
            Given a URL and a set of parameters, this function submits a POST to the 
            Cultivate API and returns the json response.
            
            Output
            JSON response describing the forecast or indicating an error.

        """

        while time.time() < self.last_call_time + self.rate_limit_delay:
            if self.verbose:
                print("{}: Sleeping".format(time.ctime()))
            time.sleep(1)
        
        headers={'Authorization':'Bearer ' + self.token} #This is needed to authenticate

        if self.verbose:
            print("{}: POSTING {}".format(time.ctime(),url))
            safeHeaders = {k:v for k,v in headers.items() if k!='Authorization'}
            safeHeaders['Authorization']="Bearer <shhhhhh it's a secret>"
            print("\tHeaders: {}".format(safeHeaders))
            print("\tArgs: {}".format(params))
        resp = self.sess.post(url, headers=headers, json=params, proxies=self.proxy) 
                                                                                         
        self.last_call_time = time.time()
        
        return resp.json()

We can invoke this class by specifying our secret_token and server.

One strategy for token management is to create a dictionary of server instances with the server address and API token like:

In [4]:
# Put your tokens in place of <STAGING API KEY> and <PRODUCTION API KEY> below.
secrets = {'staging':
              {'key':'<STAGING API KEY>',
               'server':'https://api.gfc-staging.com'},
           'production': 
              {'key':MY_STAGING_SECRET,
               'server':'https://api.iarpagfchallenge.com'}}

We can create an instance of the `GfcApi` class thusly:

In [5]:
instance='production'
gf=GfcApi(secrets[instance]['key'],secrets[instance]['server'],verbose=True)

Once we create an instance of the `GfcApi` class, we retrieve Individual Forecasting Problems (IFPs). We could limit our queries of IFPs based on several criteria:

- Question status (active or not)
- Date of creation or update (useful for finding clarifications)
- Training data (true or false)

The code snippet below illustrates using an instance of the GfcApi class to retrieve IFPs from the staging platform and to summarize them by extracting some of the fields from the IFP content.  In this example no filters were applied for `status`, creation or update dates, or `training_data`, so the query retrieves all non-training data IFPs on the platform.


In [6]:
ifps=gf.get_questions(training_data=True)
print("We've downloaded {} IFPs\n".format(len(ifps)))

for ifp in ifps:
    print("IFP {}: {}".format(ifp['id'],ifp['name']))
    print("Description: {}".format(ifp['description']))
    print("Starts: {}, Ends: {}".format(ifp['starts_at'],ifp['ends_at']))
    print("Options:")
    for answer in ifp['answers']:
        print(' ({}) {}'.format(answer['id'],answer['name']))
        
    if ifp['clarifications']:
        print('Clarifications:')
        print(ifp['clarifications'])
    print("")    


Get Pages for https://api.iarpagfchallenge.com/api/v1/questions
{'sort': 'published_at', 'training_data': 'true'}
Wed Jun 12 16:16:11 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a secret>"}
	Args: {'sort': 'published_at', 'training_data': 'true', 'page': 1}
Wed Jun 12 16:16:12 2019: Sleeping
Wed Jun 12 16:16:13 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a secret>"}
	Args: {'sort': 'published_at', 'training_data': 'true', 'page': 2}
Wed Jun 12 16:16:14 2019: Sleeping
Wed Jun 12 16:16:15 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a secret>"}
	Args: {'sort': 'published_at', 'training_data': 'true', 'page': 3}
Wed Jun 12 16:16:16 2019: Sleeping
Wed Jun 12 16:16:17 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a sec

In [7]:
ifps=gf.get_questions(training_data=False)
print("We've downloaded {} IFPs\n".format(len(ifps)))

for ifp in ifps:
    print("IFP {}: {}".format(ifp['id'],ifp['name']))
    print("Description: {}".format(ifp['description']))
    print("Starts: {}, Ends: {}".format(ifp['starts_at'],ifp['ends_at']))
    print("Options:")
    for answer in ifp['answers']:
        print(' ({}) {}'.format(answer['id'],answer['name']))
        
    if ifp['clarifications']:
        print('Clarifications:')
        print(ifp['clarifications'])
    print("")    


Get Pages for https://api.iarpagfchallenge.com/api/v1/questions
{'sort': 'published_at'}
Wed Jun 12 16:16:27 2019: Sleeping
Wed Jun 12 16:16:28 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a secret>"}
	Args: {'sort': 'published_at', 'page': 1}
Wed Jun 12 16:16:29 2019: Sleeping
Wed Jun 12 16:16:30 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a secret>"}
	Args: {'sort': 'published_at', 'page': 2}
Wed Jun 12 16:16:30 2019: Sleeping
Wed Jun 12 16:16:31 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a secret>"}
	Args: {'sort': 'published_at', 'page': 3}
Wed Jun 12 16:16:33 2019: Sleeping
Wed Jun 12 16:16:34 2019: GETTING https://api.iarpagfchallenge.com/api/v1/questions
	Headers: {'Authorization': "Bearer <shhhhhh it's a secret>"}
	Args: {'sort': 'published_at', 'page': 4}
Wed Jun 12 16:1

In [8]:
pprint(sorted([(i["id"], i["name"]) for i in ifps]))

[(1930,
  'Will the PITF Worldwide Atrocities Dataset record an event perpetrated by a '
  'state actor in Brazil (BRA) that starts between 18 August 2019 and 18 '
  'September 2019?'),
 (1935,
  'Will ACLED record any riots in Thailand between 7 June 2019 and 7 July '
  '2019?'),
 (1940, 'What will be the daily closing price of gold on 10 July 2019 in USD?'),
 (1945,
  'Will Sudan execute or be targeted in a national military attack between 16 '
  'May 2019 and 30 June 2019?'),
 (1950,
  'How many Total Registered Syrian Refugees will be reported by the UNHCR on '
  '15 July 2019?'),
 (1955,
  'What will be the price of casual labor (unskilled, daily, without food) in '
  'Somalia in the Elwak market for June 2019?'),
 (1960,
  'Will there be a significant day-over-day increase in worldwide search '
  'interest in the term "ISIS" reported by Google Trends between 1 November '
  '2019 and 29 November 2019?'),
 (1965,
  'Will Julian Assange leave the UK between 16 May 2019 and 15 July 2

In [9]:
pprint([i for i in ifps if not i["active?"]][1])

IndexError: list index out of range

We can retrieve human forecasts. If we'd like, we can limit them to a particular `question_id`, and can constrain the creation or update dates. In this notebook we will set the `training_data` flag to `True` to ensure that there are human forecasts to retrieve from the staging platform.  To limit the response we will only look at the last portion of the training data by passing a value for `created_after`

In [None]:
preds=gf.get_human_forecasts(training_data=False, created_after=datetime.datetime(2019,5,16,0,0,0))
if 'errors' in preds:
    print("We ran into a problem:")
    print(preds)
else:
    print("Retrieved {} human forecasts".format(len(preds)))

In [None]:
q_id = 1930
q_preds = [p for p in preds if p["question_id"] == q_id]
print(len(q_preds))

In [None]:
from collections import Counter
Counter([p["question_id"] for p in preds])

Let's look at an item in the human forecast stream. The `question_id` links us to the `get_questions()` results. The `membership_guid` is the unique identifier for a human forecaster, and will remain consistent throughout the Challenge.

Each item in the `predictions` list includes the `answer_id` for that alternative, which aligns to the `get_questions()` output, and a `forecasted_probability` which indicates the human forecaster's submitted probability for that alternative.

Here is an example prediction from the training data available on the staging platform.  Examining the elements of the forecast:

- `created_at`: the timestamp of the forecast.
- `id`: a unique identifier for the forecast.
- `membership_guid`: a unique identifier for the forecaster.
- `predictions`:  The IFP has 5 answers and the forecast includes probabilistic forecasts for all of the answers (multinomial or ordinal IFPs) or for the "Yes" answer (binary IFPs), found in the `forecasted_probability` field.  These forecasts add up to 1.0 (values are 0, 0, 0.5, 0.5, 0) for multinomial or ordinal IFPs.  For binary IFPs the forecast will be between 0 and 1.0 inclusive and the implicit forecast for the "No" answer is 1 - the forecast.
- `question_id`: A unique identifier for the IFP.
- `question_name`: The name of the IFP.
- `rationale`: Forecaster entered text, explaining the reasoning behind the forecast.
- `updated_at`: The timestamp the forecast was updated, generally the same as the `created_at` timestamp.


```
{'created_at': '2019-03-12T18:00:03.675Z',
 'id': 64797,
 'membership_guid': '4662cf7dbb06d39017571d84ea88d6f1d69100a8',
 'predictions': [{'answer_id': 8161,
                  'answer_name': 'Less than $1,240',
                  'final_probability': 0.0,
                  'forecasted_probability': 0.0,
                  'id': 154343,
                  'made_after_correctness_known': False,
                  'starting_probability': 0.0},
                 {'answer_id': 8162,
                  'answer_name': 'More than $1,240 but less than $1,290, '
                                 'inclusive',
                  'final_probability': 0.2,
                  'forecasted_probability': 0.5,
                  'id': 154344,
                  'made_after_correctness_known': False,
                  'starting_probability': 0.2},
                 {'answer_id': 8163,
                  'answer_name': 'Between $1,290 and $1,330',
                  'final_probability': 0.55,
                  'forecasted_probability': 0.5,
                  'id': 154345,
                  'made_after_correctness_known': False,
                  'starting_probability': 0.55},
                 {'answer_id': 8164,
                  'answer_name': 'More than $1,330 but less than $1,380, '
                                 'inclusive',
                  'final_probability': 0.1,
                  'forecasted_probability': 0.0,
                  'id': 154346,
                  'made_after_correctness_known': False,
                  'starting_probability': 0.1},
                 {'answer_id': 8165,
                  'answer_name': 'More than $1,380',
                  'final_probability': 0.0,
                  'forecasted_probability': 0.0,
                  'id': 154347,
                  'made_after_correctness_known': False,
                  'starting_probability': 0.0}],
 'question_id': 2995,
 'question_name': 'What will be the daily closing price of gold on 13 March '
                  '2019 in USD?',
 'rationale': 'This is really down to the last minute. The price has been '
              'teetering just around the 1290 mark for several days. It will '
              'either be slightly above or slightly below 1290 by tomorrow, '
              'which is the end of the time period.',
 'updated_at': '2019-03-12T18:00:03.675Z'}
 ```

We can retrieve the baseline consensus forecasts using `get_consensus_histories()`. As described in the API documentation, and above, after your first call to this API endpoint, you should constrain your requests using something like `created_after` while storing and tracking older values locally. Note that we're using `datetime.datetime()` objects to specify the `created` and `updated` parameters. You can limit this request by `question_id` if desired.

In [None]:
cons = gf.get_consensus_histories(created_after=datetime.datetime(2019,5,15,12,0,0),
                                  updated_before=datetime.datetime(2019,5,16),
                                  training_data=False) 
print("retrieved {} consensus scores".format(len(cons)))

Let's look at these results. Note that each item in the list represents a single answer -- unlike an item in the `get_human_forecasts()` results where each entry represents the predictions for each possible answer for a single IFP.

The `normalized_value` scores for all the answers to a single IFP for a specific `consensus_at` time will add up to 1.0.

Here's an example consensus history from the training data on the staging platform.

- `answer_id`: Identifier for the answer, matches the questions API.
- `consensus_at`, `created_at`, `updated_at`: Timestamp for the consensus creation, computation, and update.  Typically the same or very close in time.
- `decay_args`, `decay_method`, `strategy`, `weighting_settings`, `method_name`: "Under the hood" parameters for the method of aggregating individual forecasts to produce the consensus.  Solvers can ignore these.
- `id`: The identifier for this consensus history.
- `normalized_value`: **The baseline method forecast**, The consensus forecast value for this answer.
- `prediction_set_id`:  Collective identifier for the answer consensus elements corresponding to the consensus for an IFP.
- `question_id`: The identifier for the IFP
- `value`: The raw output from the aggregator, **not to be used as a probability value**

```
{'answer_id': 7015,
 'consensus_at': '2019-03-13T14:28:40.843Z',
 'created_at': '2019-03-13T14:28:41.951Z',
 'decay_args': {'percent': 0.468},
 'decay_method': 'Aggregation::Decay::PercentRecent',
 'id': 107249,
 'method_name': '1-WeightedLogit-PercentRecent',
 'normalized_value': 0.00166729,
 'prediction_set_id': 64780,
 'question_id': 2595,
 'strategy': 'Aggregation::Strategies::Logit',
 'updated_at': '2019-03-13T14:28:41.951Z',
 'value': 0.013881,
 'weighting_settings': {'count_of_closed_questions_answered_requirement': 35,
                        'enabled': True,
                        'minimum_closed_questions_to_enable_accuracy_weighting': 10,
                        'percentage_of_closed_questions_answered_requirement': 0.5}}
```