Tooling for benchmarking, deploying and monitoring agents for prediction market applications.
Install the project dependencies with `poetry`, using Python >=3.10:

```bash
python3.10 -m pip install poetry
python3.10 -m poetry install
python3.10 -m poetry shell
```
Create a `.env` file in the root of the repo with the following variables:

```bash
MANIFOLD_API_KEY=...
BET_FROM_ADDRESS=...
BET_FROM_PRIVATE_KEY=...
OPENAI_API_KEY=...
```

Note: deploying and monitoring agents using GCP requires that you set up the gcloud CLI (see here for installation instructions), and use `gcloud auth login` to authorize.
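The variables above need to be available in the process environment at runtime. A minimal sketch of loading them by hand (in practice you would likely use `python-dotenv` or the library's own settings handling; this only illustrates the expected `KEY=VALUE` file format):

```python
import os


def load_dotenv(path: str = ".env") -> None:
    """Parse simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # Don't override variables already set in the environment.
            os.environ.setdefault(key.strip(), value.strip())
```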
Create a benchmarkable agent by subclassing the `AbstractBenchmarkedAgent` base class, and plug your agent's research and prediction functions into the `predict` method.
Use the `Benchmarker` class to compare your agent's predictions against the 'wisdom of the crowd' on a set of markets from your chosen prediction market platform.
For example:
```python
import prediction_market_agent_tooling.benchmark.benchmark as bm
from prediction_market_agent_tooling.markets.markets import MarketType, get_binary_markets

benchmarker = bm.Benchmarker(
    markets=get_binary_markets(limit=10, market_type=MarketType.MANIFOLD),
    agents=[...],
)
benchmarker.run_agents()
md = benchmarker.generate_markdown_report()
```
This produces a markdown report comparing agents:
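Conceptually, comparing an agent against the crowd means applying a proper scoring rule to each side's probability once the market resolves. A hypothetical sketch using the Brier score (the report's actual metrics may differ, and the data here is made up for illustration):

```python
def brier_score(p_predicted: float, outcome: bool) -> float:
    """Squared error between a predicted probability and the realized outcome."""
    return (p_predicted - float(outcome)) ** 2


# Hypothetical data: (agent probability, crowd/market probability, resolved outcome).
markets = [
    (0.9, 0.7, True),
    (0.2, 0.4, False),
    (0.6, 0.5, True),
]

agent_score = sum(brier_score(p_agent, y) for p_agent, _, y in markets) / len(markets)
crowd_score = sum(brier_score(p_crowd, y) for _, p_crowd, y in markets) / len(markets)
# Lower Brier score is better: here the agent's probabilities beat the crowd's.
```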
Create a deployable agent by subclassing the `DeployableAgent` base class, and implementing the `answer_binary_market` method.

For example, deploy an agent that randomly picks an outcome:
```python
import random

from prediction_market_agent_tooling.deploy.agent import DeployableAgent
from prediction_market_agent_tooling.markets.agent_market import AgentMarket


class DeployableCoinFlipAgent(DeployableAgent):
    def answer_binary_market(self, market: AgentMarket) -> bool | None:
        return random.choice([True, False])


DeployableCoinFlipAgent().deploy_gcp(...)
```
To deploy a Safe manually for a given agent, run the script below:

```bash
poetry run python scripts/create_safe_for_agent.py --from-private-key <YOUR_AGENT_PRIVATE_KEY> --salt-nonce 42
```
This prints the address of the newly created Safe in the terminal, which can then be copied over to the deployment configuration (e.g. Terraform).

Note that `salt_nonce` can be passed so that the Safe is created deterministically for each agent: if the same `salt_nonce` is used again, the script will not create a new Safe, but will instead output the address of the previously created one.
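This determinism comes from CREATE2-style deployment, where the contract address is a pure function of the deployer, the init code, and the salt. A toy illustration of the idea (not the real EVM derivation, which uses keccak256 over a specific byte layout; this is just a hash-based sketch):

```python
import hashlib


def toy_safe_address(agent_address: str, salt_nonce: int) -> str:
    """Derive a pseudo-address deterministically from the agent address and salt."""
    digest = hashlib.sha256(f"{agent_address}:{salt_nonce}".encode()).hexdigest()
    # A 20-byte (40 hex char) address-shaped string.
    return "0x" + digest[:40]


# The same agent and salt always yield the same address...
a1 = toy_safe_address("0xAgent", 42)
a2 = toy_safe_address("0xAgent", 42)
# ...while a different salt yields a different address.
a3 = toy_safe_address("0xAgent", 43)
```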
Monitor the performance of the agents deployed to GCP, as well as meta-metrics of the prediction market platforms they are deployed to.
This runs as a Streamlit app on a localhost server, executed with:

```bash
PYTHONPATH=. streamlit run examples/monitor/monitor.py
```

which launches in the browser:
The following market platforms are supported:

- Manifold
- AIOmen
- Polymarket (benchmarking only; deploying and monitoring are TODO)
See the Issues for ideas of things that need fixing or implementing. Or come up with your own :D.
We use `mypy` for static type checking, and `isort`, `black` and `autoflake` for linting. These all run as steps in CI.