<a href="https://colab.research.google.com/github/joe-segal/metaculus-bot-forecaster/blob/main/JMS_Metaculus_Bot_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLM Forecasting Bot
*Created by Kirill and Tom, revised by Ryan*

The code below is used for making LLM-powered forecasts in Metaculus' [AI benchmarking competition](https://www.metaculus.com/project/ai-benchmarking-pilot/).

Specifically, it does the following:

* Gets questions from the project page using the Metaculus API
* Gets four separate forecasts from the LLM, three independently and the fourth assessing the reasoning of the first three and producing its own.
* Predicts on the Metaculus questions and shares a comment describing the reasoning of the fourth LLM forecast.

Features and options:

* Allows you to choose whether it repredicts on questions it has already forecast or skips them.
* Allows for the use of [Perplexity search](https://www.perplexity.ai/) for additional research, using a prompt formed by an LLM.
    * *Previously this allowed for the use of pre-computed Perplexity results, but this is no longer supported by Metaculus.*
* Can be used with an automated workflow via GitHub Actions to monitor the project for new open questions and forecast on them when they appear.
    * See [this GitHub repo](https://github.com/ryooan/metaculus-bot-forecaster) for how to set this up.

### 🚨🚨🚨 Warning 🚨🚨🚨

**You are responsible for monitoring the costs of your implementation, especially if using the automated GitHub workflow. Cost estimates computed by this notebook are rough estimates only. Make sure to check and monitor how much you are spending and the funds in your relevant accounts.**

## Getting Started

### Make a Metaculus Bot Account
The first step will be to make a Metaculus bot account. Instructions for how to do this have likely already been provided to you, either via a page on Metaculus or at an event.

### Secrets and Tokens

You need to set secrets 1) and 2) in order to make forecasts. Secret 3) is only necessary if you will be using Perplexity research.

1) METACULUS_TOKEN (you can find it or create it here - https://www.metaculus.com/admin/authtoken/tokenproxy/, or ask Metaculus to share it with you).

2) OPENAI_API_KEY (you can find it here: https://platform.openai.com/settings/profile?tab=api-keys).

3) PERPLEXITY_API_KEY (you can generate an API key here: https://docs.perplexity.ai/docs/getting-started).

These secrets can be set in your Google Colab account using the key icon in the left sidebar.

*See the [GitHub repo](https://github.com/ryooan/metaculus-bot-forecaster) for the special instructions needed to set your secrets in GitHub if you intend to use the automated GitHub Action.*

*Note: previously this also used a QUESTIONS_API_KEY which got some precomputed Perplexity results stored by Metaculus, but this is no longer supported by Metaculus.*
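
Before running the rest of the notebook, it can help to confirm which secrets are actually visible. The following is a minimal sketch and not part of the notebook: the helper name `missing_secrets` is hypothetical, and it assumes the secrets have been exported as environment variables (as the GitHub Actions path in the Setup code does) rather than read from Colab's `userdata`.

```python
import os

# Secret names as used by the Setup code. PERPLEXITY_API_KEY only matters
# when Perplexity research is enabled.
REQUIRED_SECRETS = ("METACULUS_TOKEN", "OPENAI_API_KEY")
OPTIONAL_SECRETS = ("PERPLEXITY_API_KEY",)


def missing_secrets(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_secrets(REQUIRED_SECRETS)
if missing:
    print("Missing required secrets:", ", ".join(missing))
else:
    print("All required secrets are set.")

if missing_secrets(OPTIONAL_SECRETS):
    print("PERPLEXITY_API_KEY is not set; Perplexity research will not work.")
```

In Colab the same check could be run against a dict built from `userdata.get(...)` calls by passing it as `env`.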

### Setting Inputs

Once your tokens are set correctly, you can proceed to the [Inputs section](https://colab.research.google.com/drive/1_P7_QNJiJyWBY2qCVu2-_8gVPD1X7mX3?authuser=2#scrollTo=6cbruBaVtaZh). That should be the only section most users will need. More advanced users can edit the [Setup](https://colab.research.google.com/drive/1_P7_QNJiJyWBY2qCVu2-_8gVPD1X7mX3?authuser=2#scrollTo=tNl_mbJaX60R) and [Code](https://colab.research.google.com/drive/1_P7_QNJiJyWBY2qCVu2-_8gVPD1X7mX3?authuser=2#scrollTo=k8vtze4SXtR3) sections if desired, but this is not recommended unless you have coding experience.

## Setup

It is recommended that you do not edit these cells unless you have coding experience.



---



In [1]:
from IPython.display import HTML, display

def set_css(*args, **kwargs):
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

#according to this link the above wraps the output text: https://stackoverflow.com/questions/58890109/line-wrapping-in-collaboratory-google-results

In [2]:
!pip install openai
!pip install tiktoken
import datetime
import json
import os
import requests
import tiktoken
import re

from openai import OpenAI

#use the below to detect if it's being run in google colab, if it's not this skips an error
def in_colab():
    try:
        import google.colab
        return True
    except ImportError:
        return False

if in_colab():
    from google.colab import userdata


In [3]:
#the below is used to get secrets when using github actions to automate
#initialize
metaculus_token = None
#questions_api_key = None
perplexity_api_key = None

# Function to load secrets from the specified path
def load_secrets(secrets_path):
    try:
        with open(secrets_path, 'r') as secrets_file:
            secrets = json.loads(secrets_file.read())
            for k, v in secrets.items():
                os.environ[k] = v
    except Exception as e:
        print(f"Error loading secrets from {secrets_path}: {e}")

# Main code block
try:
    if 'secretsPath' in globals():
        print(f"secretsPath exists: {secretsPath}")
        load_secrets(secretsPath)

        metaculus_token = os.environ['METACULUS_TOKEN']
        #questions_api_key = os.environ['QUESTIONS_API_KEY']
        perplexity_api_key = os.environ['PERPLEXITY_API_KEY']
    else:
        raise NameError("secretsPath not defined")
except NameError:
    print("Loading secrets from userdata (Google Colab)")
    metaculus_token = userdata.get('METACULUS_TOKEN')
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
    #questions_api_key = userdata.get('QUESTIONS_API_KEY')
    perplexity_api_key = userdata.get('PERPLEXITY_API_KEY')
except KeyError as e:
    print(f"Missing required environment variable: {e}")

Loading secrets from userdata (Google Colab)


## Inputs

The cell below contains all of the main settings that you can change. See comments in the cell for an explanation of each. Modify them as you see fit and then run all of the cells below to forecast.

*You can press Ctrl+F10 to run all of the cells after the selected one.*
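
Among the settings in the cell below are four forecaster weights, which its comments describe as producing a weighted average of the forecasts. How the notebook applies them is not shown in this section, so the following is only one plausible reading: it assumes forecasts are percentages, that a failed forecast is represented as `None`, and that the remaining weights should be renormalised. The function name `weighted_average` is hypothetical.

```python
def weighted_average(forecasts, weights):
    """Weighted mean of percent forecasts, skipping any that are None.

    Weights of the remaining forecasts are renormalised so they still sum to 1.
    Returns None if every forecast is missing.
    """
    pairs = [(f, w) for f, w in zip(forecasts, weights) if f is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(f * w for f, w in pairs) / total_weight


# Default weights from the Inputs cell: forecasters 1-3 at 0.2, forecaster 4 at 0.4.
weights = [0.2, 0.2, 0.2, 0.4]
print(weighted_average([10.0, 20.0, 30.0, 40.0], weights))  # roughly 28
print(weighted_average([None, 20.0, None, 40.0], weights))  # roughly 33.3
```

Giving forecaster 4 twice the weight of the others pulls the result toward the forecast that has already seen the other three.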

In [4]:
PROJECT_ID = 3349  # 3129 is the ID of the AI Benchmarking Pilot project. We kindly ask you not to forecast on any public tournaments or public questions in general
MAX_QUESTIONS_TO_FORECAST = 25  # Set to a small number for testing, or to 1_000_000 to forecast on all available questions
REPREDICT = True # If False, skip questions the bot has already predicted on. If True, repredict on all open questions, even those with previous predictions.
SUBMIT_FORECASTS = True # If False, forecast but don't submit results to the Metaculus platform. If True, forecast and submit results.
USE_PERPLEXITY_RECENT = True # If True, search Perplexity for the most recent news on the subject, using a search question written by an LLM from the forecasting question.
QUESTION_IDS_TO_FORECAST = None # Set to None to disable custom filtering by ID. Set to a list of IDs to forecast only on selected questions, e.g. [24191, 24190, 24189]
#QUESTION_IDS_TO_FORECAST = [26390]
USE_CONFIDENCE = True
#USE_METACULUS_PROXY_ENDPOINT = False
OPEN_AI_MODEL = 'gpt-4-turbo' # Used for the single run when NUM_FORECASTS_TO_AVERAGE is 1
PERPLEXITY_MODEL = 'llama-3-sonar-large-32k-online'
NUM_FORECASTS_TO_AVERAGE = 3 # 1 runs once with OPEN_AI_MODEL; larger values cycle through MODELS_TO_RUN in order
METACULUS_SUPPORTED_MODELS = ['gpt-4o'] # Models in MODELS_TO_RUN that are called through the Metaculus proxy endpoint instead of OpenAI directly
MODELS_TO_RUN = ['gpt-4o', 'gpt-4', 'gpt-4-turbo']

# Use this to run your own question, not pulled from Metaculus
ANSWER_MANUAL_QUESTION = False
#manual_question = "Will Pfizer's GLP-1 trial agent danuglipron be approved by the FDA for weight loss by January 1 2028?"
manual_question = "Will a nuclear weapon be detonated as an act of war by Sept 30, 2024?"
manual_question_dict = {
  'id': None,
  'title': manual_question,
  'description': manual_question,
  'resolution_criteria': manual_question,
  'fine_print': manual_question
  }
if ANSWER_MANUAL_QUESTION:
  SUBMIT_FORECASTS = False

# The forecaster weights are used to produce a weighted average of the forecasts. You can adjust these weights to favor a certain forecaster/prompt more heavily.
# Weights should sum to 1.
forecaster1_weight = 0.2
forecaster2_weight = 0.2
forecaster3_weight = 0.2
forecaster4_weight = 0.4

# Prompts
# The prompts used are below, these can be edited to hone your LLM forecasts.
# Here is a glossary of the variables that can be inserted and used in the prompts.
# [[title]]: This is the question text, what shows up at the top of the Metaculus question page
# [[resolution_criteria]]: This is the resolution criteria section of the question, excluding fine print
# [[fine_print]]: This is the fine print section of the question.
# [[background]]: This is the background section of the resolution criteria.
# [[today]]: The current date
# [[forecaster1]] through [[forecaster3]]: These are the LLM outputs of forecasters 1 through 3, fed into the prompt of forecaster 4 for it to assess in its own forecast.
# [[summary_report]]: This is research from Perplexity. If USE_PERPLEXITY_RECENT is True, this is the Perplexity response to the search question that the LLM writes from prompt_template_get_perplexity_question.
                  # If USE_PERPLEXITY_RECENT is False, it is left empty. (Pre-computed Perplexity research stored by Metaculus is no longer supported.)

# prompt_template_get_perplexity_question is used to ask the LLM what question it should ask Perplexity, if using USE_PERPLEXITY_RECENT
prompt_template_get_perplexity_question = """
You're being asked the following forecasting question:

The question is:
[[title]]

And it has these specific resolution details:
[[resolution_criteria]]

Fine print:
[[fine_print]]

To get the latest news that will help you forecast on the question, you need to ask your web search tool one question that would be most valuable to help you forecast
on this question. The question should be posed so that the web search tool will provide you with the most recent information, including the latest information on the progress toward
the criteria in the forecasting question being met. You don't want to influence it to give more negative or positive spin on the information, so make sure to phrase the question in a neutral, objective wording.  Please complete the sentence below with the most valuable question to ask:

"What is the most recent news available and prior occurrences related to the possibility of, and any indicators or evidence of. . . "
"""

# Prompt 1 is used by forecaster 1, with QUESTION_INFO_TEMPLATE appended to the end
prompt_template_v1 = """
You are a professional forecaster interviewing for a job. The interviewer is also a professional forecaster, with a strong track record of accurate forecasts of the future.
They will ask you a question, and your task is to provide the most accurate forecast you can. To do this, you evaluate past data and trends carefully, make use of comparison classes
of similar events, take into account base rates about how past events unfolded, and outline the best reasons for and against any particular outcome.
You know that great forecasters don't just forecast according to the "vibe" of the question and the considerations.
Instead, they think about the question in a structured way, recording their reasoning as they go,
and they always consider multiple perspectives that usually give different conclusions, which they reason about together.
You can't know the future, and the interviewer knows that, so you do not need to hedge your uncertainty, you are simply trying to give
the most accurate numbers that will be evaluated when the events later unfold.

"""

# Prompt 2 is used by forecaster 2, with QUESTION_INFO_TEMPLATE appended to the end
prompt_template_v2 = """
You are a professional forecaster being tested on your skill. You are being asked a question for which you must provide the most accurate forecast you can.
Reason step by step about the best forecast, carefully considering the information you are presented with and the potential that some of the
background information and information provided by your research assistant may be out of date or conflicting.
You also know that a key forecasting practice is to consider the outside view, thinking carefully about reference classes and base rates and how often similar events
have happened in the past.

"""

# Prompt 3 is used by forecaster 3, with QUESTION_INFO_TEMPLATE appended to the end
prompt_template_v3 = """
You are a professional forecaster trying your best to produce an accurate forecast for the question below. You have a keen eye for detail, and know that
the resolution criteria and fine print of a question often contain important edge cases that should be considered. You also know that rookie forecasters
tend to be underconfident, and so your forecasts should be confident and extremize some of the way toward 1% or 99%. You're also aware that forecasting questions tend
to be biased, so when forecasting questions ask about things that would be a departure from the norm or from business as usual, those things tend not to happen.
You will use that knowledge to make the best forecast you can, and describe your reasoning step by step.

"""

# Forecaster 4 uses prompt 4 parts 1 and 2, with QUESTION_INFO_TEMPLATE inserted between part 1 and part 2
final_prompt_part1 = """
You are a professional forecaster trying your best to produce an accurate forecast for the question below.

"""

final_prompt_part2 = """
Now that you know what the question asks and some relevant background and research, your job is to make the best forecast you can. You know that examining the reasoning of other
forecasters is an excellent way to improve your own forecast. Below I have provided the reasoning from three other forecasters who predicted on the same question.
Examine their reasoning and use it to inform your own, using your expertise as a forecaster to assess which reasoning seems strongest and which seems flawed,
as well as which reasoning seems to incorporate the most accurate information about base rates and historic reference classes. Construct your own reasoning and forecast,
describing your reasoning step by step and incorporating the strongest arguments from the other forecasters in a way that improves your own reasoning. First produce a
one sentence summary of the reasoning of each forecaster (repeating the final probability each predicted), then describe your forecast.

Forecaster A:
[[forecaster1]]

Forecaster B:
[[forecaster2]]

Forecaster C:
[[forecaster3]]
"""

# QUESTION_INFO_TEMPLATE is used with the above prompts to share the details about the question with the LLM.
QUESTION_INFO_TEMPLATE = """

The question is:
[[title]]

Here are details about how the outcome of the question will be determined, make sure your forecast is consistent with these:
[[resolution_criteria]]

Here is the question's fine print that you need to be consistent with in your forecast:
[[fine_print]]

Here is some background of the question, though note that some of the details may be out of date:
[[background]]

Your research assistant provides the following information that is likely more up to date:
[[summary_report]]

Today is [[today]].

Describe your reasoning step by step and give your final probability forecast in the last line of your response along with an assessment of your confidence in the accuracy of this forecast, both values as a percentage probability or confidence in exactly this form: "Probability: XX.X% - Confidence: XX.X%", where XX.X is a number between 0 and 100, with at most one decimal place.
"""


## Functions

---



Getting questions (only binaries)

Getting them 10 at a time, you can change offset to "scroll" through them
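
The offset-based paging described above can be sketched without any network access. This is an illustration only: `paginate` and `fake_fetch` are hypothetical names, and the response shape (a `results` list plus a `next` field that is empty on the last page) is assumed from how the cells below read the API response.

```python
def paginate(fetch_page, limit=10):
    """Yield items page by page until the response reports no further page.

    `fetch_page(limit, offset)` is a hypothetical callable returning a dict
    shaped like {"results": [...], "next": url_or_None}.
    """
    page_number = 0
    while True:
        page = fetch_page(limit=limit, offset=page_number * limit)
        yield from page.get("results") or []
        if not page.get("next"):
            break
        page_number += 1


def fake_fetch(limit, offset, total=25):
    """Stand-in for the HTTP call: serves integers 0..total-1 in pages."""
    items = list(range(total))[offset:offset + limit]
    has_more = offset + limit < total
    return {"results": items, "next": "more" if has_more else None}


print(len(list(paginate(fake_fetch))))  # prints 25
```

Because the generator is lazy, a caller that stops early (for example after reaching a maximum question count) never requests the remaining pages.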

In [5]:
url = "https://www.metaculus.com/api2/questions/"

params = {
    "has_group": "false",
    "order_by": "-activity",
    "forecast_type": "binary",
    "project": PROJECT_ID,
    "status": "open", # can change this to 'closed' for testing where you're not submitting a forecast, otherwise leave as open
    "type": "forecast",
    "title-and-description-only": "true",
}

In [6]:
def yield_all_questions():
  if ANSWER_MANUAL_QUESTION:
    yield manual_question_dict
    return

  limit = 10 # This is a page limit, not question limit
  n = 0
  new_questions_found = False

  while True:
    offset = n * limit
    response = requests.get(
        url,
        params={**params, "limit": limit, "offset": offset},
        headers={"Authorization": f"Token {metaculus_token}"}
    )
    response.raise_for_status()
    questions = response.json().get("results")

    # if repredict is true it will skip to the else and predict on all the questions
    # if repredict is false it will see if "my_predictions" is empty or not for each question, and only predict on questions without a prediction
    if not REPREDICT:
        for question in questions:
            question_id = question['id']

            guess_response = requests.get(
                f"{url}{question_id}/",
                headers={"Authorization": f"Token {metaculus_token}"}
            )
            guess_response.raise_for_status()

            if not guess_response.json().get("my_predictions"):
                new_questions_found = True
                yield question
    else:
        new_questions_found = True
        yield from questions

    if not response.json().get("next"):
      break
    n += 1

  if not new_questions_found:
    print("No new questions to predict on.")

In [7]:
def extract_probability(s):
    s = s.replace("**","")
    # Find every "Probability: NN.N%" occurrence (bold markers were stripped above)
    matches = re.findall(r'Probability[:\s]+(?:\*\*)?[\s]*(\d+\.?\d*)%', s)
    if matches:
        # Use the last occurrence, which should be the final answer line
        return float(matches[-1]), None
    else:
        # No probability found
        return None, None

def extract_probability_and_confidence(s):
    s = s.replace("**","")
    # Take the last "Probability: NN.N%" and "Confidence: NN.N%" values; raises IndexError if either is missing
    matches = re.findall(r'Probability[:\s]+(?:\*\*)?[\s]*(\d+\.?\d*)%', s)
    probability = float(matches[-1])
    matches = re.findall(r'Confidence[:\s]+(?:\*\*)?[\s]*(\d+\.?\d*)%', s)
    confidence = float(matches[-1])
    return probability, confidence
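
To see what these patterns pick up, here is a small self-contained sketch run against a made-up response. It is a variant, not the notebook's function: the helper name `extract_percent` is hypothetical, and unlike `extract_probability_and_confidence` it returns `None` instead of raising when a value is missing.

```python
import re


def extract_percent(label, text):
    """Return the last '<label>: NN.N%' value in text as a float, or None."""
    cleaned = text.replace("**", "")  # drop Markdown bold markers first
    matches = re.findall(re.escape(label) + r"[:\s]+(\d+\.?\d*)%", cleaned)
    return float(matches[-1]) if matches else None


sample = (
    "Base rates suggest roughly 10%.\n"
    "**Probability: 12.5% - Confidence: 70%**"
)
print(extract_percent("Probability", sample))  # prints 12.5
print(extract_percent("Confidence", sample))  # prints 70.0
print(extract_percent("Probability", "no forecast here"))  # prints None
```

The bare "10%" in the first line is ignored because it is not preceded by the label, which is why the prompts ask for the exact "Probability: XX.X% - Confidence: XX.X%" form.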

In [8]:
# Fills [[key]] placeholders in a prompt template. [[ ]] is used instead of {} because text such as LLM output can contain literal braces (e.g. when formatting code).
def replace_keys(text, key_dict, delimiter='[[', end_delimiter=']]'):
    pattern = re.compile(re.escape(delimiter) + '(.*?)' + re.escape(end_delimiter))
    def replace(match):
        key = match.group(1)
        return key_dict.get(key, match.group(0))  # Return the original if key not found
    return pattern.sub(replace, text)
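
A short usage sketch shows the behavior on a made-up template: known keys are filled in, unknown keys are left in place, and literal braces pass through untouched. The function body is copied from the cell above so the example runs on its own; the template and values are invented for illustration.

```python
import re


def replace_keys(text, key_dict, delimiter="[[", end_delimiter="]]"):
    pattern = re.compile(re.escape(delimiter) + "(.*?)" + re.escape(end_delimiter))

    def replace(match):
        key = match.group(1)
        return key_dict.get(key, match.group(0))  # leave unknown keys as-is

    return pattern.sub(replace, text)


template = 'Question: [[title]]\nToday is [[today]].\nJSON like {"a": 1} is untouched. [[unknown]]'
filled = replace_keys(template, {"title": "Will it rain?", "today": "2024-07-01"})
print(filled)
```

With `str.format`, the same template would fail on the `{"a": 1}` fragment, which is the case the double-bracket delimiters avoid.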

In [9]:
def predict(question_id, prediction_percentage):
  prediction_decimal = float(prediction_percentage) / 100.0
  url = f"https://www.metaculus.com/api2/questions/{question_id}/predict/"
  response = requests.post(
      url,
      json={
        "prediction": prediction_decimal
      },
      headers={"Authorization": f"Token {metaculus_token}"},
  )
  response.raise_for_status()
  print(f"Successfully predicted {prediction_percentage}% ({prediction_decimal}) on question {question_id}")
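
`predict` sends the forecast as a decimal, so a percent value such as 42 becomes 0.42. The run functions in this notebook also carry a commented-out clamp to the 3 to 97 percent range "to prevent extreme forecasts". A small sketch of both steps together, assuming you want the clamp to be optional; `to_prediction_payload` and its `low`/`high` parameters are hypothetical names, not part of the notebook:

```python
def to_prediction_payload(prediction_percentage, low=None, high=None):
    """Convert a percent forecast to the decimal payload, optionally clamped."""
    value = float(prediction_percentage)
    if low is not None:
        value = max(value, low)
    if high is not None:
        value = min(value, high)
    return {"prediction": value / 100.0}


print(to_prediction_payload(99.5, low=3, high=97))  # clamped to 97 percent
print(to_prediction_payload(42))  # no clamping requested
```

Clamping trades a little accuracy on near-certain questions for protection against a confidently wrong forecast.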


In [10]:
def formulate_comment(prediction_json):
  comment_blocks = []
  if "reasoning_base_rate" in prediction_json:
    comment_blocks.append("## Base rate estimation")
    comment_blocks.append(prediction_json["reasoning_base_rate"])
  if "reasoning_reference_classes" in prediction_json:
    comment_blocks.append("## Reference classes")
    comment_blocks.append(prediction_json["reasoning_reference_classes"])
  if "reasoning_other" in prediction_json:
    comment_blocks.append("## Additional")
    comment_blocks.append(prediction_json["reasoning_other"])
  return "\n".join(comment_blocks) if comment_blocks else "No reasoning provided"

In [11]:
def comment(question_id, comment_text):

  # For submit_type, choose "S" to post a regular comment and "N" for a private one. Tournament submissions should be private comments.
  url = "https://www.metaculus.com/api2/comments/"
  response = requests.post(
    url,
    json={
      "comment_text":comment_text,"submit_type":"N","include_latest_prediction":True,"question":question_id
    },
    headers={"Authorization": f"Token {metaculus_token}"},
  )
  response.raise_for_status()
  print("Comment Success!")

In [12]:
def estimate_pricing(input_text, output_text, model):
  # Count tokens with the tokenizer that tiktoken associates with `model`
  encoding = tiktoken.encoding_for_model(model)
  input_len = len(encoding.encode(input_text))
  output_len = len(encoding.encode(output_text))

  print("Input text is approx %s tokens." % input_len)
  print("Output text is approx %s tokens." % output_len)


  # Hard-coded for now, maybe make it smarter later.
  # GPT-4o prices in $ per token, applied whatever `model` is, so costs for other models are rough.
  gpt4o_input_pricing = 5 / 1_000_000
  gpt4o_output_pricing = 15 / 1_000_000

  input_cost = input_len * gpt4o_input_pricing
  output_cost = output_len * gpt4o_output_pricing
  total_cost = input_cost + output_cost

  return input_cost, output_cost, total_cost
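
As a worked example of the arithmetic, the sketch below applies the same hard-coded rates ($5 per million input tokens, $15 per million output tokens) to given token counts, without calling the tokenizer. The function name `cost_from_token_counts` is hypothetical, and the rates are simply the ones in the cell above, which may no longer match current pricing.

```python
def cost_from_token_counts(input_tokens, output_tokens,
                           input_price_per_million=5.0,
                           output_price_per_million=15.0):
    """Return (input_cost, output_cost, total_cost) in dollars."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# Example: a 2,000-token prompt with a 500-token reply.
input_cost, output_cost, total_cost = cost_from_token_counts(2_000, 500)
print(f"${input_cost:.4f} + ${output_cost:.4f} = ${total_cost:.4f}")
# prints "$0.0100 + $0.0075 = $0.0175"
```

Each question involves several such calls (one per forecaster, plus the Perplexity question), so per-question cost is a multiple of this figure.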

In [13]:
def run_llm(today, client, template_values, prompt_template, summary_report, model, looking_for_probability=True):

  title = template_values["title"]
  resolution_criteria = template_values["resolution_criteria"]
  background = template_values["description"]
  if template_values["fine_print"]:
    fine_print = template_values["fine_print"]
  else:
    fine_print = "none"

  prompt_dict = {
      "title": title,
      "summary_report": summary_report,
      "today": today,
      "background": background,
      "fine_print": fine_print,
      "resolution_criteria": resolution_criteria,
  }

  prompt_text = replace_keys(prompt_template, prompt_dict)

  #print("Here is the prompt used:")
  #print("prompt_text")
  #print(prompt_text)
  #print("")

  chat_completion = client.chat.completions.create(
    model=model,
    messages=[
      {
        "role": "user",
        "content": prompt_text
      }
    ]
  )

  gpt_text = chat_completion.choices[0].message.content

  #estimate cost
  input_cost, output_cost, total_cost = estimate_pricing(prompt_text, gpt_text, model)

  if looking_for_probability:
    # Pull the final "Probability: XX.X% - Confidence: XX.X%" values out of the response
    try:
      probability_match, confidence_match = extract_probability_and_confidence(gpt_text)
    except Exception as e:
      print('Error extracting numbers from text:')
      print(gpt_text)
      raise ValueError("Could not extract probability and confidence from the LLM response") from e

    # Compare against None so that a legitimate 0% is not mistaken for a missing value
    if probability_match is not None:
        probability = float(probability_match)
        #probability = min(max(probability, 3), 97) # To prevent extreme forecasts
    else:
        probability = None
        print("No probability found in the text! Skipping!")
    if confidence_match is not None:
        confidence = float(confidence_match)
    else:
        confidence = None
        print("No confidence found in the text! Skipping!")
  else:
    probability = None
    confidence = None
  return probability, confidence, gpt_text, input_cost, output_cost, total_cost


def run_llm_metaculus(today, template_values, prompt_template, summary_report, model, looking_for_probability=True):

  title = template_values["title"]
  resolution_criteria = template_values["resolution_criteria"]
  background = template_values["description"]
  if template_values["fine_print"]:
    fine_print = template_values["fine_print"]
  else:
    fine_print = "none"

  prompt_dict = {
      "title": title,
      "summary_report": summary_report,
      "today": today,
      "background": background,
      "fine_print": fine_print,
      "resolution_criteria": resolution_criteria,
  }

  prompt_text = replace_keys(prompt_template, prompt_dict)

  #print("Here is the prompt used:")
  #print("prompt_text")
  #print(prompt_text)
  #print("")

  url = "https://www.metaculus.com/proxy/openai/v1/chat/completions"
  auth_token = metaculus_token

  # Define the payload as a dictionary
  payload = {
      "model": model,
      "messages": [
          { "role": "user", "content": prompt_text },
      ]
  }

  # Set the headers
  headers = {
      "Content-Type": "application/json",
      "Authorization": f"Token {auth_token}"
  }

  # Make the POST request
  response = requests.post(url, headers=headers, data=json.dumps(payload))

  # Check if the request was successful
  if response.status_code == 200:
      print("Request was successful!")

      # Parse the response JSON
      result = response.json()

      # Extract the desired text
      gpt_text = result["choices"][0]["message"]["content"]

      # Print the extracted content
      #print(gpt_text)

  else:
      print(f"Request failed with status code: {response.status_code}")
      print(response.text)
      raise ValueError

  #estimate cost
  input_cost, output_cost, total_cost = estimate_pricing(prompt_text, gpt_text, model)

  if looking_for_probability:
    # Pull the final "Probability: XX.X% - Confidence: XX.X%" values out of the response
    try:
      probability_match, confidence_match = extract_probability_and_confidence(gpt_text)
    except Exception as e:
      print('Error extracting numbers from text:')
      print(gpt_text)
      raise ValueError("Could not extract probability and confidence from the LLM response") from e

    # Compare against None so that a legitimate 0% is not mistaken for a missing value
    if probability_match is not None:
        probability = float(probability_match)
        #probability = min(max(probability, 3), 97) # To prevent extreme forecasts
    else:
        probability = None
        print("No probability found in the text! Skipping!")
    if confidence_match is not None:
        confidence = float(confidence_match)
    else:
        confidence = None
        print("No confidence found in the text! Skipping!")
  else:
    probability = None
    confidence = None
  return probability, confidence, gpt_text, input_cost, output_cost, total_cost

In [14]:
def call_perplexity(perplexity_prompt, perplexity_api_key):

  from openai import OpenAI

  YOUR_API_KEY = perplexity_api_key

  messages = [
      {
          "role": "system",
          "content": (
              "You are an artificial intelligence assistant and you need to "
              "engage in a helpful, detailed, polite conversation with a user.  "
              "Please supply relevant objective and balanced information and status regarding the following questions the user will pose to you.  "
              "You may also supply expert opinions (and cite the source of) if you find some that you think are important to include, "
              "but you are not to editorialize very much yourself about this information.  "
              "You are supporting an analysis team who are already experts in this area, "
              "but need to get a quick yet thorough update on any recent news developments and the overall current status of any key factors relating to this question.  "
              "Questions will usually be stated in terms something that WILL happen, or that something WILL NOT happen - either way it is asked, be sure to provide recent information on why the event or condition MAY happen and why it MAY NOT happen."

          ),
      },
      {
          "role": "user",
          "content": (
              perplexity_prompt
          ),
      },
  ]

  perplexity_client = OpenAI(api_key=YOUR_API_KEY, base_url="https://api.perplexity.ai")

  # chat completion without streaming
  response = perplexity_client.chat.completions.create(
      model=PERPLEXITY_MODEL,
      messages=messages,
  )

  content = response.choices[0].message.content

  print("Generated research from perplexity:")
  print(content)
  print("")

  # get token and cost estimate

  # currently using the GPT tokenizer with a 1.3 multiplier. Hacky and wrong, but rough estimate.
  # See here for 1.3 factor estimate source: https://github.com/continuedev/continue/issues/878

  perplexity_token_pricing = 1/1_000_000
  perplexity_cost_fixed = 5/1_000

  multiplier = 1.3
  encoding = tiktoken.encoding_for_model(OPEN_AI_MODEL)
  input_text = perplexity_prompt
  output_text = content
  input_len = len(encoding.encode(input_text)) * multiplier
  output_len = len(encoding.encode(output_text)) * multiplier

  input_cost = input_len * perplexity_token_pricing
  output_cost = output_len * perplexity_token_pricing
  fixed_cost = perplexity_cost_fixed  # flat per-request fee
  total_cost = input_cost + output_cost + fixed_cost

  print(f"Total perplexity call cost: ${total_cost}")
  print("")

  return content, total_cost

In [15]:
def clean_gpt_turbo_markdown(text: str) -> str:
  match = re.search(r"```[\w]+\s+(.*?)\s+```", text, re.DOTALL)
  if match:
    cleaned_text = match.group(1).strip()
  else:
    cleaned_text = text
  return cleaned_text
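
A quick sketch of what this helper does to a made-up response. The function is restated compactly so the example runs on its own, with the three-backtick fence built at runtime purely so the example can sit inside a fenced block; the sample strings are invented. Note that the pattern requires a language tag after the opening fence, so an untagged fence is returned unchanged.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime


def clean_gpt_turbo_markdown(text: str) -> str:
    # Same pattern as the cell above: opening fence, language tag, body, closing fence
    match = re.search(FENCE + r"[\w]+\s+(.*?)\s+" + FENCE, text, re.DOTALL)
    return match.group(1).strip() if match else text


tagged = "Here you go:\n" + FENCE + 'json\n{"p": 0.4}\n' + FENCE + "\nDone."
untagged = FENCE + '\n{"p": 0.4}\n' + FENCE

print(clean_gpt_turbo_markdown(tagged))  # prints {"p": 0.4}
print(clean_gpt_turbo_markdown(untagged) == untagged)  # prints True
```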

## Execution
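
When `NUM_FORECASTS_TO_AVERAGE` is greater than 1, the loop in the cell below picks a model for each run by cycling through `MODELS_TO_RUN`, and routes models listed in `METACULUS_SUPPORTED_MODELS` through the Metaculus proxy. The selection logic can be sketched in isolation as follows; `plan_runs` and the endpoint labels are illustrative names only, and the model lists are the defaults from the Inputs cell.

```python
MODELS_TO_RUN = ["gpt-4o", "gpt-4", "gpt-4-turbo"]
METACULUS_SUPPORTED_MODELS = ["gpt-4o"]


def plan_runs(num_runs, models, proxy_models):
    """Return (run_number, model, endpoint) for each run, cycling through models."""
    plan = []
    for q in range(num_runs):
        model = models[q % len(models)]  # wraps around when num_runs > len(models)
        endpoint = "metaculus-proxy" if model in proxy_models else "openai"
        plan.append((q + 1, model, endpoint))
    return plan


for run in plan_runs(4, MODELS_TO_RUN, METACULUS_SUPPORTED_MODELS):
    print(run)
```

With four runs and three models, the fourth run wraps back to the first model in the list.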

In [16]:
today = datetime.datetime.now().strftime("%Y-%m-%d")
#if USE_METACULUS_PROXY_ENDPOINT:
#  client = OpenAI(base_url='https://www.metaculus.com/proxy/openai/v1/chat/completions')
#else:
#  client = OpenAI()
openai_client = OpenAI()

model = OPEN_AI_MODEL
# ENABLE_PERPLEXITY_RESEARCH = False  # Previously this could be set to True to get and use pre-computed Perplexity research results for the question, and to False otherwise.
# However, Metaculus no longer supports pre-computed Perplexity research, so it has been permanently set to False here to skip over that step.

prompt_templates = [prompt_template_v1 + QUESTION_INFO_TEMPLATE, prompt_template_v2 + QUESTION_INFO_TEMPLATE, prompt_template_v3 + QUESTION_INFO_TEMPLATE]

forecasted_count = 0

for question_to_forecast_dict in yield_all_questions():
  if forecasted_count >= MAX_QUESTIONS_TO_FORECAST:
    break
  if QUESTION_IDS_TO_FORECAST is not None and question_to_forecast_dict["id"] not in QUESTION_IDS_TO_FORECAST:
    continue

  final_forecasts = []

  for q in range(0, NUM_FORECASTS_TO_AVERAGE):

    print("Now Forecasting Question:")
    print(question_to_forecast_dict["id"], question_to_forecast_dict["title"])
    print("")
    print("Run %s" % (q+1))
    print("")

    if NUM_FORECASTS_TO_AVERAGE>1:
      model = MODELS_TO_RUN[q % len(MODELS_TO_RUN)]
      if model in METACULUS_SUPPORTED_MODELS:
        print("Using Metaculus endpoint for model: %s" % model)
        USE_METACULUS_PROXY_ENDPOINT = True
      else:
        print("Using OpenAI endpoint for model: %s" % model)
        USE_METACULUS_PROXY_ENDPOINT = False

    #define perplexity research to use
    perplexity_total_cost = "N/A"

    #if ENABLE_PERPLEXITY_RESEARCH:
    #  summary_report = get_perplexity_research(question_to_forecast["id"])
    #else:
    #  #summary_report = "No results found, please use your own knowledge and judgement to forecast"
    #  summary_report = ""
    summary_report = "" # ?

    # set summary_report to the perplexity recent search if enabled
    if USE_PERPLEXITY_RECENT:
      print("Getting Perplexity recent data...")
      print ("")
      #print("run_llm(...):")
      #print("today", today)
      #print("client", client)
      #print("question_to_forecast_dict", question_to_forecast_dict)
      #print("prompt_template_get_perplexity_question", prompt_template_get_perplexity_question)
      #print("summary_report", summary_report)
      #print("model", model)

      #get prompt completion for use with perplexity
      #probability, confidence, perplexity_recent_prompt_completion, completion_input_cost, completion_output_cost, completion_total_cost = run_llm(today, client, question_to_forecast_dict, prompt_template_get_perplexity_question, summary_report, model, looking_for_probability=False)
      if USE_METACULUS_PROXY_ENDPOINT:
        probability, confidence, perplexity_recent_prompt_completion, completion_input_cost, completion_output_cost, completion_total_cost = run_llm_metaculus(today, question_to_forecast_dict, prompt_template_get_perplexity_question, summary_report, model, looking_for_probability=False)
      else:
        probability, confidence, perplexity_recent_prompt_completion, completion_input_cost, completion_output_cost, completion_total_cost = run_llm(today, openai_client, question_to_forecast_dict, prompt_template_get_perplexity_question, summary_report, model, looking_for_probability=False)
      #print("perplexity_recent_prompt_completion", perplexity_recent_prompt_completion)
      #print("completion_input_cost", completion_input_cost)
      #print("completion_output_cost", completion_output_cost)
      #print("completion_total_cost", completion_total_cost)
      #print("")

      perplexity_recent_prompt = perplexity_recent_prompt_completion if ("What is the most recent news available and prior occurrences related to".lower() in perplexity_recent_prompt_completion.lower()) else ("What is the most recent news available and prior occurrences related to the possibility of, and any indicators or evidence of " + perplexity_recent_prompt_completion)

      print(f"The completed question posed to perplexity reads: {perplexity_recent_prompt}")
      #get recent news from perplexity
      #print("call_perplexity(...)")
      perplexity_content, perplexity_cost = call_perplexity(perplexity_recent_prompt, perplexity_api_key)

      summary_report = perplexity_content
      perplexity_total_cost = completion_total_cost + perplexity_cost

    # need to iterate through prompts here
    all_forecasts = []
    overall_cost = 0

    i = 0
    for prompt_template in prompt_templates:
      i+=1
      print("Running initial prompt %s" % i)
      #print("run_llm(...)")
      if USE_METACULUS_PROXY_ENDPOINT:
        try: # Allow one retry, in case the first call fails because the numerical result came back in an unexpected format
          probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm_metaculus(today, question_to_forecast_dict, prompt_template, summary_report, model)
        except Exception:
          probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm_metaculus(today, question_to_forecast_dict, prompt_template, summary_report, model)
      else:
        try:
          probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm(today, openai_client, question_to_forecast_dict, prompt_template, summary_report, model)
        except Exception:
          probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm(today, openai_client, question_to_forecast_dict, prompt_template, summary_report, model)
      all_forecasts.append((probability, confidence, gpt_text, input_cost, output_cost, total_cost))
      overall_cost += total_cost
      print(f"Output Reasoning:")
      print(gpt_text)
      print("")
      print("Extracted probability percentage value (0-100): %s" % probability)
      #print(f"Input cost: ${input_cost}")
      #print(f"Output cost: ${output_cost}")
      #print(f"Total cost: ${total_cost}")
      #print(f"Perplexity search costs (including LLM prompt completion): ${completion_total_cost + perplexity_cost}")
      print("")
      print("~~~~ NEXT PROMPT ~~~~")
      print("")

    final_prompt_part2_dict = {
        "forecaster1": all_forecasts[0][2],
        "forecaster2": all_forecasts[1][2],
        "forecaster3": all_forecasts[2][2],
    }

    formatted_final_prompt_part2 = replace_keys(final_prompt_part2, final_prompt_part2_dict)

    print("+++++++++++ FINAL PROMPT (4) +++++++++++++++++")

    final_prompt_template = final_prompt_part1 + QUESTION_INFO_TEMPLATE + formatted_final_prompt_part2

    print("Running final prompt...")

    # Run the summary forecaster on final_prompt_template (not the last initial prompt_template left over from the loop above)
    if USE_METACULUS_PROXY_ENDPOINT:
      try: # Allow one retry, in case the first call fails because the numerical result came back in an unexpected format
        probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm_metaculus(today, question_to_forecast_dict, final_prompt_template, summary_report, model)
      except Exception:
        probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm_metaculus(today, question_to_forecast_dict, final_prompt_template, summary_report, model)
    else:
      try:
        probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm(today, openai_client, question_to_forecast_dict, final_prompt_template, summary_report, model)
      except Exception:
        probability, confidence, gpt_text, input_cost, output_cost, total_cost = run_llm(today, openai_client, question_to_forecast_dict, final_prompt_template, summary_report, model)
    #print("")
    #print(gpt_text)
    print("")

    print("Final prompt's individual (non-weighted) extracted probability percentage value (0-100): %s" % probability)
    print("")

    if USE_CONFIDENCE:
      total_confidence = 0
      for forecast in all_forecasts:
        total_confidence += forecast[1]
      total_confidence += 2.0*confidence
      forecaster1_weight = round(all_forecasts[0][1]/total_confidence,4)
      forecaster2_weight = round(all_forecasts[1][1]/total_confidence,4)
      forecaster3_weight = round(all_forecasts[2][1]/total_confidence,4)
      forecaster4_weight = round(2.0*confidence/total_confidence,4)

    else:
      forecaster1_weight = 0.2
      forecaster2_weight = 0.2
      forecaster3_weight = 0.2
      forecaster4_weight = 0.4

    weighted_forecast = forecaster1_weight*float(all_forecasts[0][0]) + forecaster2_weight*float(all_forecasts[1][0]) + forecaster3_weight*float(all_forecasts[2][0]) + forecaster4_weight*float(probability)
    weighted_forecast = round(weighted_forecast,1)
    overall_cost = overall_cost + total_cost

    #create summary strings for comments:
    header_string = f"""
    *This forecast is produced in stages. Several prompts first make independent forecasts, then a summary forecaster reviews all of those forecasts and rationales and assesses how convincing they are before giving its own forecast. A weighted forecast is then produced by combining the individual and summary forecasts, weighted by each forecaster's reported confidence, with the summary forecaster counted twice. The individual forecasts are reasoned from knowledge stored inside the latest OpenAI GPT models, supplemented with recent information from targeted Perplexity search results.*

    * *Main LLM Model used: {model}*
    * *Weighted formula: ({forecaster1_weight})({all_forecasts[0][0]}% [Forecaster A]) + ({forecaster2_weight})({all_forecasts[1][0]}% [Forecaster B]) + ({forecaster3_weight})({all_forecasts[2][0]}% [Forecaster C]) + ({forecaster4_weight})({probability}% [Summary Forecaster])*
    * *** FINAL WEIGHTED FORECAST: {weighted_forecast}%***

    ---

    """
    print('Rationale to submit:')
    print("")
    print(header_string)
    print("")

    if SUBMIT_FORECASTS and weighted_forecast is not None:
      predict(question_to_forecast_dict["id"], float(weighted_forecast))
      comment(question_to_forecast_dict["id"], header_string + "PERPLEXITY INFO\n\n" + summary_report + "\n\n---\n\n" + "SUMMARY FORECASTER RATIONALE\n\n" + gpt_text)
    final_forecasts.append(float(weighted_forecast))

    print(f"Output Reasoning:")
    print(gpt_text)
    print("")
    print("FINAL WEIGHTED FORECAST:")
    print(f"{weighted_forecast}%")
    print("")
    print(f"Overall cost was: ${overall_cost}")
    print("")
    print("################ NEXT QUESTION #################")
    print("")

  if NUM_FORECASTS_TO_AVERAGE>1:
    #average the forecasts and make a final one
    assert len(final_forecasts) == NUM_FORECASTS_TO_AVERAGE
    final_forecast = round(sum(final_forecasts)/len(final_forecasts),1)
    final_comment = "Averaging %s independently generated forecasts (%s) generated on different main LLM models (%s) to get a final forecast value of **%s%%**." % (NUM_FORECASTS_TO_AVERAGE, ", ".join([("%s%%" % f) for f in final_forecasts]), ", ".join(MODELS_TO_RUN), final_forecast)
    print("AVERAGED FINAL FORECAST: %s%%" % final_forecast)
    print("FINAL COMMENT:")
    print(final_comment)
    print("")
    if SUBMIT_FORECASTS:
      predict(question_to_forecast_dict["id"], float(final_forecast))
      comment(question_to_forecast_dict["id"], final_comment)
    print("################ NEXT QUESTION #################")
    print("")


  forecasted_count += 1

print("Finished forecasting on %s questions." % forecasted_count)

NameError: name 'USE_METACULUS_PROXY_ENDPOINT' is not defined
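
The confidence weighting used in the execution cell can be read in isolation as the sketch below. The function name `combine_forecasts` and its parameters are hypothetical, introduced only for illustration. It assumes confidences are non-negative numbers and probabilities are percentages, and it skips the notebook's step of rounding each weight to four decimals before combining.

```python
from typing import Sequence


def combine_forecasts(
    probabilities: Sequence[float],
    confidences: Sequence[float],
    summary_probability: float,
    summary_confidence: float,
    summary_multiplier: float = 2.0,
) -> float:
    """Hypothetical helper: confidence-weighted average of forecasts, in percent."""
    # The summary forecaster's confidence is counted twice (multiplier 2.0),
    # mirroring the 2.0*confidence term in the notebook.
    weights = [float(c) for c in confidences]
    weights.append(summary_multiplier * float(summary_confidence))
    total = sum(weights)
    if total <= 0:
        raise ValueError("total confidence must be positive")
    values = [float(p) for p in probabilities] + [float(summary_probability)]
    weighted = sum(w / total * v for w, v in zip(weights, values))
    return round(weighted, 1)


# Equal confidences reduce to the fixed 0.2 / 0.2 / 0.2 / 0.4 weights.
example = combine_forecasts([30, 40, 50], [5, 5, 5], 60, 5)
print(example)  # 48.0
```

With equal confidences the weights match the fallback used when `USE_CONFIDENCE` is off. The explicit check on the total guards against a division by zero if every reported confidence is 0.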