# Evaluation Agent

In this notebook, we create an agent with an evaluator persona. We build on the evaluation approach established in [coolstore evaluation](./01.coolstore.ipynb) and attempt to feed the evaluation output back to the fix generator to produce a better fix.

We follow the same evaluation methodology, but we improve how information is provided to the evaluator. Additional functions lint and re-analyze the source code so that the evaluation has more to work with.

### Goals

* Attempt to improve the evaluation itself by incorporating additional information about the updated file

* Experiment with an evaluator agent and understand whether it helps improve the quality of responses


## Pre-requisites

We use autogen to create agents. Install autogen in this environment:

```sh
pip install pyautogen
```

To run through the code snippets in this notebook, use Kai's virtualenv created in the [base dir](../../../).

We use models provided by IBM, AWS Bedrock and OpenAI. Configure the models you want to use by updating the `MODELS` variable in the following cell:

In [95]:
# some shorthands we can use in our experiments for different models
META_LLAMA_70b = 'meta-llama/llama-3-70b-instruct'
META_LLAMA_13b = 'meta-llama/llama-2-13b-chat'
MIXTRAL = 'mistralai/mixtral-8x7b-instruct-v01'
GPT_4 = "gpt-4o"
GPT_3 = "gpt-3.5-turbo"
CLAUDE_SONNET = "anthropic.claude-3-5-sonnet-20240620-v1:0"

# models used in experiments
MODELS = [
    # META_LLAMA_13b,
    # META_LLAMA_70b,
    # MIXTRAL,
    GPT_3,
    # GPT_4,
    # CLAUDE_SONNET,
]

Depending on the models you configure above, make sure the matching keys are present in the `.env` file in Kai's base dir:

```sh
export GENAI_KEY=<your-ibm-key>
export OPENAI_API_KEY=<your-openai-key>
```

For Claude Sonnet, we use AWS Bedrock. Configure your AWS keys in the default AWS profile for authentication. This file is usually located at `~/.aws/credentials`.

Once you have configured the `.env` file, run the following cell to load the keys into the notebook environment:

In [None]:
%load_ext dotenv
%dotenv
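
Before building any model config, it can help to confirm that the keys were actually loaded. The following is a minimal sketch, not part of Kai: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the only assumption taken from this notebook is the pair of variable names shown in the `.env` example above.

```python
import os

# Hypothetical mapping from a provider name to the environment variable it needs.
# The variable names are the ones used in the `.env` example above.
REQUIRED_KEYS = {
    "ibm": "GENAI_KEY",
    "openai": "OPENAI_API_KEY",
}

def missing_keys(providers, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [REQUIRED_KEYS[p] for p in providers if not env.get(REQUIRED_KEYS[p])]

# An explicit environment is passed here so the result does not depend on the machine.
example_env = {"OPENAI_API_KEY": "placeholder-value"}
print(missing_keys(["ibm", "openai"], env=example_env))
```

Calling `missing_keys(["openai"])` with no `env` argument checks the real process environment instead.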

### Creating autogen config

Now we create some LLM configs to use with autogen.

In [93]:
import os

def AUTOGEN_CONFIGS():
    confs = []
    for model in MODELS:
        conf = {
            "model": model,
        }
        match model:
            case model if model in [GPT_4, GPT_3]:
                conf["api_key"] = os.environ.get("OPENAI_API_KEY")
                conf["tags"] = ['openai']
            case model if model in [MIXTRAL, META_LLAMA_13b, META_LLAMA_70b]:
                conf["api_key"] = os.environ.get("GENAI_KEY")
                conf["base_url"] = "https://bam-api.res.ibm.com"
                conf["tags"] = ['ibm']
        confs.append(conf)
    return confs
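
Each config carries a `tags` list, which makes it possible to select configs per provider later on. Below is a small sketch of that idea in plain Python. `filter_by_tag` is a hypothetical helper, and `sample_configs` is hand-written data shaped like the dicts built above, with placeholder keys. Note that a model matching neither `case` (such as `CLAUDE_SONNET`) ends up with only a `"model"` entry and no tags.

```python
def filter_by_tag(configs, tag):
    """Keep only the configs whose "tags" list contains `tag`."""
    return [c for c in configs if tag in c.get("tags", [])]

# Hand-written stand-ins shaped like the output of AUTOGEN_CONFIGS().
sample_configs = [
    {"model": "gpt-3.5-turbo", "api_key": "placeholder", "tags": ["openai"]},
    {
        "model": "mistralai/mixtral-8x7b-instruct-v01",
        "api_key": "placeholder",
        "base_url": "https://bam-api.res.ibm.com",
        "tags": ["ibm"],
    },
    # No case matched this model, so it has no "tags" entry.
    {"model": "anthropic.claude-3-5-sonnet-20240620-v1:0"},
]

openai_only = filter_by_tag(sample_configs, "openai")
print([c["model"] for c in openai_only])
```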

### Load test data

We start off by loading into memory a static code analysis report generated for the [Coolstore](https://github.com/konveyor-ecosystem/coolstore) app.

We will use two files from this analysis in our experiments. The first is easier to fix: its only incident involves updating an import statement. The second is relatively harder to fix: some of its incidents require import changes, while others require a combination of an API change, import removal and updating the contents of a function.

In [103]:
# first we load incidents into memory and group by files for better access
import sys
import copy
sys.path.append('../../../kai')
from datetime import datetime
from kai.models.report import Report
from kai.evaluation import BenchmarkExample
from kai.service.incident_store import Application

output_file = './analysis_output.yaml'
report = Report.load_report_from_file(output_file)
files = report.get_impacted_files()

# we filter out file paths that belong to dependencies
to_delete = [k for k in files if k.startswith('root/.m2')]
for d in to_delete:
    del files[d]

# printing file names and incidents in each file
for f in files: print(len(files[f]), f)

examples: dict[str, BenchmarkExample] = {}
for f in files:
    original_content = ""
    expected_content = ""
    with open(f"./data/apps/coolstore/javaee/{f}", "r") as fl: original_content = fl.read()
    if os.path.exists(f"./data/apps/coolstore/quarkus/{f}"): 
        with open(f"./data/apps/coolstore/quarkus/{f}", "r") as fl: expected_content = fl.read()
    examples[f] = BenchmarkExample(
        application=Application(
            application_name="coolstore",
            current_branch="main",
            repo_uri_local="./data/apps/coolstore/javaee",
            generated_at=datetime.strptime("24/05/09 19:32:00", "%y/%m/%d %H:%M:%S"),
            repo_uri_origin="https://github.com/konveyor-ecosystem/coolstore",
            current_commit="aa"
        ),
        expected_file=expected_content,
        incidents=files[f],
        original_file=original_content,
        name=os.path.basename(f),
        report=report,
    )

EXAMPLE_EASY = examples['src/main/java/com/redhat/coolstore/model/ShoppingCart.java']
hard_example = examples['src/main/java/com/redhat/coolstore/service/ShoppingCartOrderProcessor.java']
ex = copy.deepcopy(hard_example)
ex.incidents = ex.incidents[:6] + ex.incidents[7:8]
EXAMPLE_HARD = copy.deepcopy(ex)

1 src/main/webapp/WEB-INF/web.xml
12 pom.xml
10 src/main/java/com/redhat/coolstore/model/Order.java
6 src/main/java/com/redhat/coolstore/model/OrderItem.java
5 src/main/webapp/WEB-INF/beans.xml
8 src/main/resources/META-INF/persistence.xml
6 src/main/java/com/redhat/coolstore/model/InventoryEntity.java
1 src/main/java/com/redhat/coolstore/model/ShoppingCart.java
6 src/main/java/com/redhat/coolstore/persistence/Resources.java
9 src/main/java/com/redhat/coolstore/rest/CartEndpoint.java
8 src/main/java/com/redhat/coolstore/rest/OrderEndpoint.java
3 src/main/java/com/redhat/coolstore/rest/ProductEndpoint.java
4 src/main/java/com/redhat/coolstore/rest/RestApplication.java
8 src/main/java/com/redhat/coolstore/service/CatalogService.java
2 src/main/java/com/redhat/coolstore/service/InventoryNotificationMDB.java
8 src/main/java/com/redhat/coolstore/service/OrderService.java
15 src/main/java/com/redhat/coolstore/service/OrderServiceMDB.java
3 src/main/java/com/redhat/coolstore/service/ProductSe
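
The hard example is built by deep-copying the original and keeping a subset of its incidents with `incidents[:6] + incidents[7:8]`. The sketch below shows what that slice selects, using a hypothetical `Example` dataclass in place of `BenchmarkExample` and ten made-up incident labels (the real incident count for the file is not assumed here).

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Example:
    """Hypothetical stand-in for BenchmarkExample with only the fields we need."""
    name: str
    incidents: list = field(default_factory=list)

# Ten made-up incidents, labelled by index so the selection is easy to read.
original = Example(
    "ShoppingCartOrderProcessor.java",
    [f"incident-{i}" for i in range(10)],
)

subset = copy.deepcopy(original)
# Keep indices 0..5 and index 7; index 6 and everything from 8 on are dropped.
subset.incidents = subset.incidents[:6] + subset.incidents[7:8]

print(subset.incidents)
print(len(original.incidents))  # the deep copy leaves the original untouched
```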

The following cell contains some common code we need to run our experiments. Run it before moving forward.

In [74]:
import re
from jinja2 import Environment, FileSystemLoader

class KaiFixResponseTransform:
    def __init__(self, input_file: str, incidents: list[dict]):
        self._input_file = input_file
        self._incidents = incidents

    def apply_transform(self, messages: list[dict]) -> list[dict]:
        edited = copy.deepcopy(messages)
        for message in edited:
            if isinstance(message["content"], str):
                message["content"] = self._parse_updated_file(message.get("content", ""))
            elif isinstance(message["content"], list):
                msgs = message.get("content", [])
                for item in msgs:
                    if item["type"] == "text":
                        item["text"] = self._parse_updated_file(item.get("text", ""))
        return edited
    
    def _parse_updated_file(self, msg: str) -> str:
        updated_file = ''
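        # [Ff] matches either case; the optional (?:\w*\n) skips a language tag such as "java" after the opening fence
        match = re.search(r'## Updated [Ff]ile\s+```(?:\w*\n)?([\s\S]*?)```', msg)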
        if match: 
            updated_file = match.group(1).strip()
        return get_prompt("evaluate.prompt.jinja", 
                { "incidents": self._incidents, "input_file": self._input_file, "output_file": updated_file })

    def get_logs(self, pre_transform_messages: list[dict], post_transform_messages: list[dict]) -> tuple[str, bool]:
        return "", True

def get_prompt(sub_path: str, params: dict) -> str:
    # render a Jinja template from the agent-prompts directory with the given parameters
    template_env = Environment(loader=FileSystemLoader(searchpath="./templates/agent-prompts/"))
    return template_env.get_template(sub_path).render(params)
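
The transform above relies on pulling the code out of the `## Updated File` section of a model response. Here is a standalone sketch of that extraction step. `extract_updated_file` and the sample response are hypothetical, and the pattern is a variant that also skips an optional language tag after the opening fence.

```python
import re

# The fence is built programmatically so this example contains no literal fence.
FENCE = "`" * 3

# Find the "## Updated File" heading, skip an optional language tag,
# and capture everything up to the closing fence.
UPDATED_FILE_RE = re.compile(
    r"## Updated [Ff]ile\s+" + FENCE + r"(?:\w*\n)?([\s\S]*?)" + FENCE
)

def extract_updated_file(response: str) -> str:
    """Return the code inside the Updated File section, or "" if it is absent."""
    match = UPDATED_FILE_RE.search(response)
    return match.group(1).strip() if match else ""

# A made-up response following the output structure requested from the fix agent.
sample = (
    "## Reasoning\n"
    "Replace the javax import with its jakarta equivalent.\n\n"
    "## Updated File\n"
    + FENCE + "java\n"
    "import jakarta.inject.Inject;\n"
    + FENCE + "\n"
)

print(extract_updated_file(sample))
```

Returning an empty string when the section is missing mirrors the behavior of `_parse_updated_file`, which still renders the evaluation prompt with an empty output file in that case.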

### Creating an agent to generate a fix

First, we create an agent that takes an input file and a list of issues identified in that file. It uses the LLMs listed in the `MODELS` variable to generate an updated file intended to fix the issues.

When generating fixes, we will try to keep the prompts as close to Kai as possible.
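
The next cell wires the fix-generation agent to an evaluator agent so that evaluation feedback drives further fix attempts. The control flow can be sketched without autogen as follows. Everything here is hypothetical: `run_feedback_loop` and the two stub functions stand in for the agents, and the turn counting is a simplification rather than autogen's exact `max_turns` accounting.

```python
from typing import Callable

def run_feedback_loop(
    generate_fix: Callable[[str], str],
    evaluate_fix: Callable[[str], str],
    prompt: str,
    max_turns: int = 4,
) -> tuple[str, int]:
    """Alternate generation and evaluation until the evaluator says TERMINATE.

    Returns the last generated fix and the number of generation turns used.
    """
    fix = ""
    message = prompt
    for turn in range(1, max_turns + 1):
        fix = generate_fix(message)
        feedback = evaluate_fix(fix)
        # Case-insensitive check, so "TERMINATE" and "terminate" both stop the loop.
        if "terminate" in feedback.lower():
            return fix, turn
        message = feedback  # unresolved issues become the next request
    return fix, max_turns

# Stub "agents": the generator only uses jakarta once the feedback mentions it.
def stub_generate(message: str) -> str:
    if "jakarta" in message:
        return "import jakarta.ejb.Stateless;"
    return "import javax.ejb.Stateless;"

def stub_evaluate(fix: str) -> str:
    if "javax." in fix:
        return "# Issue\nReplace the javax.ejb import with jakarta.ejb"
    return "TERMINATE"

final_fix, turns = run_feedback_loop(
    stub_generate, stub_evaluate, "Fix the incidents in this file"
)
print(turns, final_fix)
```

Capping the loop with `max_turns` keeps an evaluator that never approves from running indefinitely.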

In [106]:
from autogen import ConversableAgent, UserProxyAgent, GroupChatManager, GroupChat
from autogen.agentchat.contrib.capabilities import transform_messages, transforms

parse_kai_fix = transform_messages.TransformMessages(transforms=[KaiFixResponseTransform(EXAMPLE_HARD.original_file, EXAMPLE_HARD.incidents)])

llm_config = { 
    "config_list": AUTOGEN_CONFIGS(), 
    "seed": 42,    
}

fix_generation_agent = ConversableAgent(
    name="fix-gen-agent",
    description="Generates fixes for issues",
    system_message="""You are an expert in migrating JavaEE applications to Quarkus. 
You will be given a JavaEE file which is to be migrated to Quarkus.
You will be given static code analysis information highlighting issues in the file.
You will fix the highlighted issues and briefly reason through what changes are required and why.
Remember you will only fix the issues that are highlighted in the input.
Pay attention to changes you make to imports we need to consider.
When updating or adding annotations or using classes, make sure you correctly import them.
After you have shared your step by step thinking, you will provide a full output of the updated file.
Structure your output exactly in Markdown format such as:

## Reasoning
Write the step by step reasoning in this markdown section. Do not include code in this section, please only include text.

## Updated File
```java
// Write the updated file for Quarkus in this section. If the file should be removed, make the content of the updated file a comment explaining it should be removed.
```
""",
    llm_config=llm_config,
    human_input_mode="NEVER",
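    # compare in lowercase: an uppercase "TERMINATE" can never appear in a lowercased string
    is_termination_msg=lambda msg: "terminate" in (msg.get("content") or "").lower(),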
    # max_consecutive_auto_reply=1,
)

evaluation_agent = ConversableAgent(
    name="eval-agent",
    description="Evaluates source code",
    system_message="""You are an expert in judging the source code migrated from JavaEE to Quarkus. 
You will be given a JavaEE file as input file which was to be migrated to Quarkus.
You will also be given a list of issues identified in the input file which need to be fixed in order to make the file compatible with Quarkus.
The issues identified will contain brief summary describing how it should be fixed.
Finally, you will be given an updated file which attempts to fix the issues in the input file.
You will compare the two files and evaluate whether all of the issues identified in the input file are fixed in the updated file.
Make sure that issues are fixed exactly as described. Do not consider alternate solution as valid.
For every issue that is not fixed, you will provide a brief summary describing how to fix it.
Only include the issues that were not fixed in your output. 
If no issues are found, output TERMINATE.
If one or more issues are found, format your output such as:

# Issue
// Issue Message from original input here
// Your suggested fix here
""",
    llm_config=llm_config,
    human_input_mode="NEVER",
    # max_consecutive_auto_reply=1,
)

parse_kai_fix.add_to_agent(evaluation_agent)

# render the fix prompt once and reuse it as the opening message
input_prompt = get_prompt("fix.prompt.jinja", params={"incidents": EXAMPLE_HARD.incidents, "src_file_contents": EXAMPLE_HARD.original_file})

response = evaluation_agent.initiate_chat(recipient=fix_generation_agent, max_turns=4, message=input_prompt)

# fix_generation_agent.send()

eval-agent (to fix-gen-agent):

## Input File

```java
package com.redhat.coolstore.service;

import java.util.logging.Logger;
import javax.ejb.Stateless;
import javax.annotation.Resource;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.Topic;

import com.redhat.coolstore.model.ShoppingCart;
import com.redhat.coolstore.utils.Transformers;

@Stateless
public class ShoppingCartOrderProcessor  {

    @Inject
    Logger log;


    @Inject
    private transient JMSContext context;

    @Resource(lookup = "java:/topic/orders")
    private Topic ordersTopic;

    
  
    public void  process(ShoppingCart cart) {
        log.info("Sending order from processor: ");
        context.createProducer().send(ordersTopic, Transformers.shoppingCartToJson(cart));
    }



}

```

## Issues


### Incident 0
Line number: 5
Message: Replace the `javax.annotation` import statement with `jakarta.annotation`

### Incident 1
Line number: 4
Message: Replace the `javax.ejb` imp