⚡ Autonomous Strategy Code Review Using AI Agents and LLM-as-a-Judge:

Can AI agents review and refine trading strategy code on their own? Yes, and it’s easier than you think. Let's see how you can build this with the new OpenAI Agents SDK.



👉 In this experiment, I built a simple multi-agent loop where:

- One agent generates a Python trading strategy 💡  

- A second agent evaluates it like a strict reviewer 👨‍⚖️  

- If the result is not “successful,” it provides feedback  

- The generation agent revises its code accordingly  

- The loop runs until the strategy passes or the maximum number of iterations is reached ✅  
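The loop described above comes down to a small piece of control flow. Here is a minimal, SDK-free sketch of it, assuming the generator and the judge are plain Python callables. `refine_loop`, `fake_generate` and `fake_evaluate` are hypothetical names used only for illustration, and the stubs stand in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Score = Literal["successful", "needs_refinement", "unsuccessful"]


@dataclass
class EvaluationFeedback:
    feedback: str
    score: Score


def refine_loop(
    generate: Callable[[list[str]], str],
    evaluate: Callable[[str], EvaluationFeedback],
    request: str,
    max_refinements: int = 2,
) -> tuple[str, int]:
    """Generate, judge and refine until the judge answers "successful"
    or the refinement budget is used up. Returns (final_code, refinements_used)."""
    history = [request]  # plain strings stand in for chat messages
    refinements = 0
    while True:
        code = generate(history)
        verdict = evaluate(code)
        if verdict.score == "successful" or refinements >= max_refinements:
            return code, refinements
        # Feed the judge's critique back to the generator as a new turn
        history += [code, f"Feedback: {verdict.feedback}"]
        refinements += 1


# Hypothetical stubs standing in for the two LLM agents
def fake_generate(history: list[str]) -> str:
    return f"strategy v{len(history) // 2 + 1}"


def fake_evaluate(code: str) -> EvaluationFeedback:
    # Mimic the judge's "never pass on the first try" rule
    if code.endswith("v1"):
        return EvaluationFeedback("Add risk metrics.", "needs_refinement")
    return EvaluationFeedback("Looks good.", "successful")


print(refine_loop(fake_generate, fake_evaluate, "mean reversion strategy"))
# ('strategy v2', 1)
```

With `max_refinements=2`, the sketch allows at most three generations, which matches the notebook's `max_iteration = 2` combined with its `>` check.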



I used an “**LLM-as-a-Judge**” setup to automate the evaluation process.





👉 Key Highlights:

- Agents communicate through a shared message history

- The evaluator uses 3 scoring levels: `"successful"`, `"needs_refinement"`, `"unsuccessful"`  

- Feedback is passed back to the generator for continuous improvement  

- Built with the OpenAI Agents SDK (the `openai-agents` package, imported as `agents`)  
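To make the message flow concrete, here is a simplified sketch of how the conversation grows between rounds and how the three scores can be checked. It assumes plain `{"role": ..., "content": ...}` dictionaries. The items the SDK actually returns from `to_input_list()` carry extra fields, and `next_history` is a hypothetical helper, not part of the SDK.

```python
from typing import Literal, get_args

# The three judge labels used in this notebook
Score = Literal["successful", "needs_refinement", "unsuccessful"]
VALID_SCORES = frozenset(get_args(Score))


def next_history(history: list[dict], strategy: str, score: str, feedback: str) -> list[dict]:
    """Append the generator's reply and, unless the judge passed it, a feedback turn."""
    if score not in VALID_SCORES:
        raise ValueError(f"unexpected score: {score!r}")
    updated = history + [{"role": "assistant", "content": strategy}]
    if score != "successful":
        # The critique goes back in as a user turn, like input_items.append(...) in the notebook
        updated.append({"role": "user", "content": f"Feedback: {feedback}"})
    return updated


history = [{"role": "user", "content": "Propose a mean reversion strategy in Python"}]
history = next_history(history, "<strategy code v1>", "needs_refinement", "Add risk metrics.")
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```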





👉 Use case shown:

A mean reversion trading strategy for the equity market, implemented in Python and refined through autonomous feedback cycles.
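For readers new to the idea, here is a minimal sketch of one common way to implement mean reversion: Bollinger Band signals. It is written in pure Python so it runs without market data. `bollinger_signals` is a hypothetical name, and the defaults (a 20-bar window, 2 standard deviations) are conventional choices assumed here, not a recommendation.

```python
from statistics import mean, stdev


def bollinger_signals(closes: list[float], window: int = 20, num_std: float = 2.0) -> list[int]:
    """Return one signal per bar: +1 (buy), -1 (sell) or 0 (no signal)."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    signals = []
    for i, price in enumerate(closes):
        if i + 1 < window:
            signals.append(0)  # not enough history for a full window yet
            continue
        recent = closes[i + 1 - window : i + 1]  # window ending at (and including) bar i
        mid = mean(recent)
        band = num_std * stdev(recent)  # sample std, same as pandas' rolling().std() default
        if price < mid - band:
            signals.append(1)   # stretched below the mean: bet on a move back up
        elif price > mid + band:
            signals.append(-1)  # stretched above the mean: bet on a move back down
        else:
            signals.append(0)
    return signals


print(bollinger_signals([10, 10, 10, 10, 6], window=5, num_std=1.5))  # [0, 0, 0, 0, 1]
```

Because each signal uses the same bar's close, a backtest built on it should act on the next bar to avoid look-ahead bias.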





👉 Key Takeaways:

The refinement improved the strategy implementation by incorporating additional computations, such as risk metrics.
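Which risk metrics a refined implementation adds depends on the run. As an assumption, the sketch below computes two common ones from per-period returns: an annualized Sharpe ratio (assuming 252 trading periods per year) and the maximum drawdown. `sharpe_ratio` and `max_drawdown` are hypothetical helper names.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns: list[float], periods_per_year: int = 252, risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio from per-period simple returns.

    `risk_free` is the risk-free rate per period (not per year).
    """
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


def max_drawdown(returns: list[float]) -> float:
    """Worst peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


print(max_drawdown([0.10, -0.50, 0.20]))  # -0.5
```

`sharpe_ratio` raises an error for fewer than two returns or for a perfectly flat return series, since the standard deviation is then undefined or zero.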

In [None]:
!pip install openai-agents -q

In [12]:
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

import os
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

In [19]:
import asyncio
from dataclasses import dataclass
from typing import Literal

from agents import Agent, ItemHelpers, Runner, TResponseInputItem, trace

In [30]:
from agents import set_tracing_export_api_key
# Export traces (viewable in the OpenAI dashboard) using the same API key
set_tracing_export_api_key(os.environ["OPENAI_API_KEY"])

In [31]:
strategy_generator = Agent(
    name="strategy_generator",
    instructions=(
        # Adjacent string literals are concatenated, so each one ends with a space
        "You generate a sophisticated trading strategy implementation in Python, based on the user's request. "
        "If any feedback is provided, use it to improve the strategy and propose new Python code that incorporates it. "
        "Be very careful to provide accurate trading strategies with precise calculations, for example when computing moving averages or triggering buy/sell signals."
    ),
)

@dataclass
class EvaluationFeedback:
    feedback: str
    score: Literal["successful", "needs_refinement", "unsuccessful"]


# The judge returns structured output, parsed into EvaluationFeedback
strategy_evaluator = Agent[None](
    name="strategy_evaluator",
    instructions=(
        "You evaluate a Python trading strategy implementation and decide if it's good enough. "
        "If it's not good enough, you provide feedback on what needs to be improved. "
        # Forces at least one refinement round
        "Never give it a 'successful' on the first try."
    ),
    output_type=EvaluationFeedback,
)


In [21]:
import nest_asyncio
nest_asyncio.apply()

In [37]:
async def main() -> None:
    msg = input("User's request:")
    input_items: list[TResponseInputItem] = [{"content": msg, "role": "user"}]

    latest_strategy: str | None = None
    max_iteration = 2  # maximum number of refinement rounds
    nbr_iterations = 0

    # We'll run the entire workflow in a single trace
    with trace("LLM as a judge"):
        while True:
            # 1) Generate (or regenerate) the strategy from the conversation so far
            strategy_generator_result = await Runner.run(
                strategy_generator,
                input_items,
            )

            input_items = strategy_generator_result.to_input_list()
            latest_strategy = ItemHelpers.text_message_outputs(strategy_generator_result.new_items)

            print("\033[92m**************************Trading strategy implemented:**************************\033[0m")
            print(latest_strategy)

            # 2) Ask the judge to score the latest strategy
            print("\n\n\033[92m**************************Running Evaluation:**************************\033[0m")
            strategy_evaluator_result = await Runner.run(strategy_evaluator, input_items)
            result: EvaluationFeedback = strategy_evaluator_result.final_output

            print(f"\033[94mEvaluator score: {result.score}\033[0m")
            print("\n")

            if result.score == "successful":
                print("\033[92mStrategy implementation is good ==> stopping the iteration.\033[0m")
                break

            print("\033[94mFeedback:\033[0m")
            print(result.feedback)

            # 3) Stop before announcing a refinement that will never run
            nbr_iterations += 1
            if nbr_iterations > max_iteration:
                print("\033[91mReached max_iteration ==> stopping the iteration.\033[0m")
                break

            # 4) Feed the judge's critique back to the generator
            print("\n\n\033[92m**************************Running Refinement:**************************\033[0m")
            print("\033[94mRe-running with feedback for improvements:\033[0m")
            input_items.append({"content": f"Feedback: {result.feedback}", "role": "user"})

    print("\n\n\033[92m**************************Final trading strategy implementation:**************************\033[0m")
    print(latest_strategy)


if __name__ == "__main__":
    asyncio.run(main())

# Example request: Propose a mean reversion trading strategy implementation in Python for equity market

User's request:Propose a mean reversion trading strategy implementation in Python for equity market
**************************Trading strategy implemented:**************************
Certainly! Here's a basic implementation of a mean reversion trading strategy using the Bollinger Bands approach. This strategy assumes that prices will revert to the mean over time.

To start, ensure you have necessary packages installed:

```bash
pip install pandas numpy matplotlib yfinance
```

Now, let's write the strategy:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf

# Download historical data from Yahoo Finance
def get_data(ticker, start_date, end_date):
    data = yf.download(ticker, start=start_date, end=end_date)
    return data

# Calculate the Bollinger Bands
def calculate_bollinger_bands(data, window=20, num_std_dev=2):
    data['Rolling Mean'] = data['Close'].rolling(window=window).mean()
    data['Rolling Std Dev'] = data['Cl