# Coolstore Evaluation

This notebook uses LLMs to generate fixes for issues identified in the [Coolstore](https://github.com/konveyor-ecosystem/coolstore) app, across different kinds of incidents. The incidents vary in how complex their fixes are. For each incident, we evaluate the responses under two conditions: with and without supplemental information from the analysis. We run each prompt at least 3 times before drawing conclusions.
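
The experiment design above can be sketched as a simple run matrix. This is a minimal sketch, not code from this notebook: `experiment_plan`, `CONDITIONS` and `MIN_RUNS` are hypothetical names, and the model keys are the shorthands defined later in the config cell.

```python
from itertools import product

# Model keys as defined in the config cell; the condition names are hypothetical labels
MODELS = ["ibm-granite", "ibm-mixtral", "ibm-llama-13b", "ibm-llama-70b"]
CONDITIONS = ["without_analysis", "with_analysis"]
MIN_RUNS = 3  # each prompt is run at least three times


def experiment_plan(models: list[str], conditions: list[str], runs: int = MIN_RUNS) -> list[tuple[str, str, int]]:
    """One (model, condition, run_index) entry per LLM call for a single incident."""
    return list(product(models, conditions, range(runs)))


plan = experiment_plan(MODELS, CONDITIONS)
print(len(plan))  # 4 models x 2 conditions x 3 runs = 24
```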

The incidents used in this experiment can be found [here](./analysis_output.yaml).

To compare against the expected output, we use the already modernized Quarkus version of the Coolstore app, which is in the `quarkus` branch of the repo [here](https://github.com/konveyor-ecosystem/coolstore/tree/quarkus).
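
To see what the expected fix for a file looks like, one option is to diff the Java EE version against its Quarkus counterpart. The sketch below uses Python's `difflib` and assumes both branches are checked out side by side (the notebook later clones them to `./data/apps/coolstore/javaee` and `./data/apps/coolstore/quarkus`). `expected_change` is a hypothetical helper and the Java snippets are illustrative only.

```python
import difflib


def expected_change(original: str, expected: str, name: str = "file") -> str:
    """Unified diff from the Java EE version of a file to its Quarkus version."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        expected.splitlines(keepends=True),
        fromfile=f"javaee/{name}",
        tofile=f"quarkus/{name}",
    )
    return "".join(diff)


# illustrative inputs; in the notebook these would be read from the same relative path in each checkout
before = "import javax.inject.Inject;\n\npublic class A {}\n"
after = "import jakarta.inject.Inject;\n\npublic class A {}\n"
print(expected_change(before, after, "A.java"))
```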

To evaluate the responses, we use different approaches depending on the complexity of the fix in question. For easy fixes, we use [evaluation.py](../../kai/evaluation.py), which is based on edit distance. For more complex fixes, we explore using another LLM as the evaluator. To measure the consistency of responses, we use the standard deviation of the edit distance across repeated runs.
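
As a rough illustration of the two metrics, the sketch below computes a character-level Levenshtein distance and then the mean and standard deviation over repeated runs. It is an assumption, not the implementation in `evaluation.py`, which may normalize the score or work at a different granularity (the notebook stores its result as `response.similarity`). `edit_distance` and `consistency` are hypothetical names.

```python
import statistics


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def consistency(expected: str, responses: list[str]) -> tuple[float, float]:
    """Mean and sample standard deviation of edit distance over repeated runs (needs >= 2 runs)."""
    distances = [edit_distance(expected, r) for r in responses]
    return statistics.mean(distances), statistics.stdev(distances)


print(edit_distance("kitten", "sitting"))  # 3
print(consistency("abc", ["abc", "abd", "xbd"]))
```

A low standard deviation means the model produces similarly close answers across runs, even if the mean distance is high.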

_You will need to activate a virtualenv and use it to run the snippets in this notebook._

In [1]:
# first we group incidents by file

import sys
sys.path.append('../../../kai')
from kai.models.report import Report

output_file = './analysis_output.yaml'
report = Report.load_report_from_file(output_file)
files = report.get_impacted_files()

# filter out file paths that belong to dependencies in the local Maven repository
to_delete = [k for k in files if k.startswith('root/.m2')]
for d in to_delete:
    del files[d]

# print the number of incidents in each file, followed by the file name
for f in files:
    print(len(files[f]), f)

1 src/main/webapp/WEB-INF/web.xml
12 pom.xml
10 src/main/java/com/redhat/coolstore/model/Order.java
6 src/main/java/com/redhat/coolstore/model/OrderItem.java
5 src/main/webapp/WEB-INF/beans.xml
8 src/main/resources/META-INF/persistence.xml
6 src/main/java/com/redhat/coolstore/model/InventoryEntity.java
1 src/main/java/com/redhat/coolstore/model/ShoppingCart.java
6 src/main/java/com/redhat/coolstore/persistence/Resources.java
9 src/main/java/com/redhat/coolstore/rest/CartEndpoint.java
8 src/main/java/com/redhat/coolstore/rest/OrderEndpoint.java
3 src/main/java/com/redhat/coolstore/rest/ProductEndpoint.java
4 src/main/java/com/redhat/coolstore/rest/RestApplication.java
8 src/main/java/com/redhat/coolstore/service/CatalogService.java
2 src/main/java/com/redhat/coolstore/service/InventoryNotificationMDB.java
8 src/main/java/com/redhat/coolstore/service/OrderService.java
15 src/main/java/com/redhat/coolstore/service/OrderServiceMDB.java
3 src/main/java/com/redhat/coolstore/service/ProductSe

From the files listed above, we will focus on the following files in our experiments:

* src/main/java/com/redhat/coolstore/model/ShoppingCart.java
* src/main/java/com/redhat/coolstore/model/InventoryEntity.java
* src/main/java/com/redhat/coolstore/service/CatalogService.java
* src/main/java/com/redhat/coolstore/service/ShippingService.java
* src/main/java/com/redhat/coolstore/service/ShoppingCartOrderProcessor.java

These files also appear in our demo scenario, which can be found [here](https://github.com/konveyor/kai/blob/main/docs/scenarios/demo.md).
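
A small sketch for narrowing the incident map to these files. `FOCUS_FILES` and `select_focus_files` are hypothetical names; they assume `files` is the dict of file path to incidents built in the first cell. Because the listing above is truncated, the helper reports focus files that have no entry instead of raising a `KeyError`.

```python
# The focus files listed above (paths relative to the Coolstore repo root)
FOCUS_FILES = [
    "src/main/java/com/redhat/coolstore/model/ShoppingCart.java",
    "src/main/java/com/redhat/coolstore/model/InventoryEntity.java",
    "src/main/java/com/redhat/coolstore/service/CatalogService.java",
    "src/main/java/com/redhat/coolstore/service/ShippingService.java",
    "src/main/java/com/redhat/coolstore/service/ShoppingCartOrderProcessor.java",
]


def select_focus_files(files: dict, focus: list[str]) -> tuple[dict, list[str]]:
    """Split the focus list into (files that have incidents, files with no entry in `files`)."""
    selected = {path: files[path] for path in focus if path in files}
    missing = [path for path in focus if path not in files]
    return selected, missing

# in the notebook: focus, missing = select_focus_files(files, FOCUS_FILES)
```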

In [2]:
# now we will get our test data
import os
from git import Repo

def ensure_dirs(path: str):
    # create the directory (and any parents) if it does not already exist
    os.makedirs(path, exist_ok=True)

def clone_coolstore(branch: str, path: str):
    # shallow-clone a single branch of Coolstore; skip if it was already cloned
    if os.path.isdir(path) and os.listdir(path):
        return
    try:
        Repo.clone_from("https://github.com/konveyor-ecosystem/coolstore",
            depth=1, single_branch=True, branch=branch, to_path=path)
    except Exception as e:
        print(f"fatal error cloning branch '{branch}' into {path}: {e}")
        raise

ensure_dirs("./data/apps/coolstore/")
clone_coolstore("quarkus", "./data/apps/coolstore/quarkus")  # expected (modernized) code
clone_coolstore("main", "./data/apps/coolstore/javaee")      # original Java EE code

In [3]:
# now we will create data required for evaluation
from datetime import datetime
from kai.evaluation import BenchmarkExample, evaluate
from kai.service.incident_store import Application

examples = {}
for f in files:
    examples[f] = BenchmarkExample(
        application=Application(
            application_name="coolstore",
            current_branch="main",
            repo_uri_local="./data/apps/coolstore/javaee",
            generated_at=datetime.strptime("24/05/09 19:32:00", "%y/%m/%d %H:%M:%S"),
            repo_uri_origin="https://github.com/konveyor-ecosystem/coolstore",
            current_commit="aa"
        ),
        expected_file=f"./data/apps/coolstore/quarkus/{f}",
        incidents=files[f],
        original_file=f"./data/apps/coolstore/javaee/{f}",
        name=os.path.basename(f),
        report=report,
    )

In [4]:
from jinja2 import Template

CONFIG_BASE_PATH = "./data/configs/"
OUTPUT_BASE_PATH = "./data/outputs/"

ensure_dirs(CONFIG_BASE_PATH)
ensure_dirs(OUTPUT_BASE_PATH)

templ = Template("""
trace_enabled = true
demo_mode = false
log_dir = "$pwd/logs"
file_log_level = "debug"
log_level = "info"

[models]
provider = "{{ model_provider }}"
template = "{{ prompt_template }}"

[models.args]
model_id = "{{ model_id }}"
{% if max_tokens != "" %}
parameters.max_new_tokens = "{{ max_tokens }}"
{% endif %}

[incident_store]
solution_detectors = "naive"
solution_producers = "text_only"

[incident_store.args]
provider = "postgresql"
host = "127.0.0.1"
database = "kai"
user = "kai"
password = "dog8code"
""")

# some shorthands we can use in our experiments for different models
IBM_LLAMA_13b = 'ibm-llama-13b'
IBM_LLAMA_70b = 'ibm-llama-70b'
IBM_MIXTRAL = 'ibm-mixtral'
IBM_GRANITE = 'ibm-granite'
GPT_4 = "gpt-4"
GPT_3 = "gpt-3"

# model_provider: { model_id: {parameter: val}}
models = {
    "ChatIBMGenAI": {
        "meta-llama/llama-3-70b-instruct": {"max_tokens": "2048", "key": IBM_LLAMA_70b},
        "meta-llama/llama-2-13b-chat": {"max_tokens": "1536", "key": IBM_LLAMA_13b},
        "mistralai/mixtral-8x7b-instruct-v01": {"key": IBM_MIXTRAL},
        "ibm/granite-13b-chat-v2": {"key": IBM_GRANITE},
    },
    "ChatOpenAI": {
        "gpt-3.5-turbo": {"key": GPT_3},
        "gpt-4": {"key": GPT_4},
    },
}

configs = {}

# create configs for all models with different parameters;
# we will use these as needed in our experiments.
# prompt_template is rendered as the literal "{{ prompt_template }}" so that each
# result is itself a Template that ensure_config() fills in later
for model_provider, model_ids in models.items():
    for model_id, parameters in model_ids.items():
        configs[parameters.get("key", "")] = Template(templ.render(
            model_provider=model_provider,
            model_id=model_id,
            max_tokens=parameters.get("max_tokens", ""),
            prompt_template="{{ prompt_template }}",
        ))

In [26]:
# this is common code we will use to evaluate response of LLM for one example
import signal
import requests
import threading
import subprocess
from kai.models.kai_config import KaiConfig
from kai.routes.get_incident_solutions_for_file import (
    PostGetIncidentSolutionsForFileParams,
)

# helper function to send requests to Kai service
def _generate_fix(params: PostGetIncidentSolutionsForFileParams):
    headers = {"Content-type": "application/json", "Accept": "text/plain"}
    response = requests.post(
        "http://0.0.0.0:8080/get_incident_solutions_for_file",
        data=params.model_dump_json(),
        headers=headers,
        timeout=3600,
    )
    return response

# send a request to the Kai service, retrying on non-200 responses and request errors;
# returns None if every attempt fails
def generate_fix(params: PostGetIncidentSolutionsForFileParams):
    attempts = 6
    for i in range(attempts):
        try:
            response = _generate_fix(params)
            if response.status_code == 200:
                return response
            print(f"[{params.file_name}] Received status code {response.status_code}")
        except requests.exceptions.RequestException as e:
            print(f"[{params.file_name}] Received exception: {e}")
        remaining = attempts - i - 1
        if remaining:
            print(f"[{params.file_name}] Failed to get a '200' response from the server. Retrying {remaining} more times")
    print(f"[{params.file_name}] Failed to get a '200' response from the server. Parameters = {params}")
    return None

# write a Kai config for the given model and experiment to a known location, then parse it back
def ensure_config(model_key: str, experiment_key: str, prompt_template: str) -> tuple[str, KaiConfig]:
    ensure_dirs(f"{CONFIG_BASE_PATH}{experiment_key}")
    config_path = f"{CONFIG_BASE_PATH}{experiment_key}/{model_key}.toml"
    config = configs[model_key].render(prompt_template=prompt_template)
    with open(config_path, "w") as f:
        f.write(config)
    config_parsed = KaiConfig.model_validate_filepath(config_path)
    return config_path, config_parsed

# run the Kai service and its Postgres db in background threads; this is used when we generate
# fixes through the Kai service instead of calling evaluate() from evaluation.py.
# returns the started processes so they can be cleaned up with kill()
def ensure_kai_service(output_path: str) -> list:
    processes = []
    def run_command(cmd: str, stdout):
        p = subprocess.Popen(cmd, shell=True, cwd="../../",
            env=os.environ.copy(), stdout=stdout, stderr=stdout)
        processes.append(p)
        p.wait()
    ensure_dirs(output_path)
    postgres_log = open(f"{output_path}/postgres.log", "w+")
    db_thread = threading.Thread(target=run_command, args=("DROP_TABLES=true POSTGRES_RUN_ARGS=--rm make run-postgres", postgres_log))
    kai_log = open(f"{output_path}/kai.log", "w+")
    kai_thread = threading.Thread(target=run_command, args=("make run-server", kai_log))
    db_thread.start()
    # note: load-data starts right after the db thread, without waiting for Postgres to be ready
    subprocess.run(["make", "load-data"], cwd="../../")
    kai_thread.start()
    return processes

# helper function to stop the Kai service and db processes (SIGINT first, then SIGTERM)
def kill(processes: list):
    for p in processes:
        p.send_signal(signal.SIGINT)
        p.send_signal(signal.SIGTERM)

# this function runs evaluate() from evaluation.py, which compares the LLM response with the
# expected output; the score it reports (response.similarity) is saved to the edit_distance file
def run_evaluate_for_example(model_key: str, experiment_key: str, prompt_template: str, example: BenchmarkExample):
    config_path, config_parsed = ensure_config(model_key, experiment_key, prompt_template)
    full_response = evaluate(configs={config_path: config_parsed}, examples={example.original_file: example})
    response = full_response[(example.original_file, config_path)]
    output_path = f"{OUTPUT_BASE_PATH}{experiment_key}/{model_key}"
    ensure_dirs(output_path)
    with open(f"{output_path}/llm_response", "w") as f:
        f.write(response.llm_result)
    with open(f"{output_path}/edit_distance", "w") as f:
        f.write(f"{response.similarity}")

# this function sends an example as a query to the Kai service and returns the LLM response
def run_kai_response(model_key: str, experiment_key: str, prompt_template: str, example: BenchmarkExample):
    config_path, config_parsed = ensure_config(model_key, experiment_key, f"../notebooks/evaluation/{prompt_template}")
    subprocess.run(["cp", config_path, "../../kai/config.toml"], check=True)
    output_path = f"{OUTPUT_BASE_PATH}{experiment_key}/{model_key}"
    processes = ensure_kai_service(output_path=output_path)
    response = None
    try:
        with open(example.original_file, "r") as f:
            file_contents = f.read()
        params = PostGetIncidentSolutionsForFileParams(
            application_name=example.application.application_name,
            file_contents=file_contents,
            file_name=example.original_file,
            include_llm_results=False,
        )
        response = generate_fix(params)
        print(response)
    except Exception as e:
        print(f"failed to get a response from the Kai service: {e}")
    finally:
        kill(processes)
    return response
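
`generate_fix` retries back-to-back, while `ensure_kai_service` starts the server in a background thread, so early attempts can fail before the server is listening. One option is a retry helper with exponential backoff. This is a hypothetical sketch (`retry_with_backoff` is not part of Kai); the delays are arbitrary, and `sleep` is injectable so the helper can be tested without waiting.

```python
import time
from typing import Any, Callable


def retry_with_backoff(
    call: Callable[[], Any],
    is_ok: Callable[[Any], bool] = lambda result: result is not None,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    attempts: int = 6,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `call` until `is_ok(result)` holds, waiting base_delay * 2**i (capped) between attempts.

    Returns the first acceptable result, or None if all attempts fail.
    """
    for i in range(attempts):
        try:
            result = call()
            if is_ok(result):
                return result
            print(f"attempt {i + 1}/{attempts}: unacceptable result {result!r}")
        except retry_on as e:
            print(f"attempt {i + 1}/{attempts}: raised {e!r}")
        if i < attempts - 1:  # no need to wait after the last attempt
            sleep(min(base_delay * 2 ** i, max_delay))
    return None
```

With the helpers above it could be called as `retry_with_backoff(lambda: _generate_fix(params), is_ok=lambda r: r.status_code == 200, retry_on=(requests.exceptions.RequestException,))`.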

In [7]:
# make sure GENAI_KEY / OPENAI_API_KEY are set, e.g. in a .env file loaded by %dotenv below

%load_ext dotenv
%dotenv
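
A quick way to fail early when a key is missing. This is a sketch under an assumption: that the IBM provider reads `GENAI_KEY` and that LangChain's `ChatOpenAI` reads `OPENAI_API_KEY`; your provider setup may need additional variables. `REQUIRED_KEYS` and `missing_keys` are hypothetical names.

```python
import os
from collections.abc import Mapping

# Assumed mapping from model provider (as used in the config template) to required env vars
REQUIRED_KEYS = {
    "ChatIBMGenAI": ["GENAI_KEY"],
    "ChatOpenAI": ["OPENAI_API_KEY"],
}


def missing_keys(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `provider` that are unset or empty in `env`."""
    return [key for key in REQUIRED_KEYS.get(provider, []) if not env.get(key)]


# after %dotenv: an empty list means the IBM key is available
print(missing_keys("ChatIBMGenAI"))
```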

## Zero-Shot with No Analysis Information

In this section, we evaluate performance on an easy fix that only requires updating import statements. We run it with different models and do not include any analysis information in the prompt. The prompt we use can be found [here](./templates/zero_shot/example1.jinja). Since the fix is easy, we use edit distance to measure the accuracy of the responses.
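
Because this fix only touches import statements, edit distance can be complemented by comparing the import sets directly. The sketch below is a hypothetical helper (`java_imports`, `import_diff`), not part of `evaluation.py`; it uses a regular expression, so imports inside block comments would also be counted.

```python
import re

# matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.m;" at the start of a line
IMPORT_RE = re.compile(r"^\s*import\s+(static\s+)?([\w.]+(?:\.\*)?)\s*;", re.MULTILINE)


def java_imports(source: str) -> set[str]:
    """Return the imported names in a Java source file, prefixed with 'static ' when applicable."""
    return {("static " if m.group(1) else "") + m.group(2) for m in IMPORT_RE.finditer(source)}


def import_diff(expected: str, generated: str) -> dict[str, set[str]]:
    """Imports the expected file has but the generated one lacks, and vice versa."""
    exp, gen = java_imports(expected), java_imports(generated)
    return {"missing": exp - gen, "unexpected": gen - exp}


expected = "import jakarta.jms.Message;\nimport jakarta.inject.Inject;\n"
generated = "import javax.jms.Message;\nimport jakarta.inject.Inject;\n"
print(import_diff(expected, generated))
# {'missing': {'jakarta.jms.Message'}, 'unexpected': {'javax.jms.Message'}}
```

An empty `missing` and `unexpected` pair is a strong signal that an import-only fix is correct, even when whitespace or formatting differences inflate the edit distance.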

In [31]:
# the only fixes needed in this file are import replacements, so it is an easy example
example = examples['src/main/java/com/redhat/coolstore/service/InventoryNotificationMDB.java']

# run the example against each IBM-hosted model through the Kai service
for model in [IBM_GRANITE, IBM_MIXTRAL, IBM_LLAMA_13b, IBM_LLAMA_70b]:
    # alternative: generate the fix with evaluate() from evaluation.py instead of the Kai service
    # run_evaluate_for_example(
    #     model_key=model, experiment_key="zero_shot_easy",
    #     prompt_template="./templates/zero_shot/template1.jinja", example=example)

    response = run_kai_response(model, "zero_shot_easy", "./templates/zero_shot/template1.jinja", example)

FileNotFoundError: [Errno 2] No such file or directory: './data/outputs/zero_shot_easy/ibm-granite/postgres.log'