# Few Shot: JMS -> SmallRye Reactive - jms-to-reactive-quarkus-00010

> Work with an LLM to generate a fix for the rule: jms-to-reactive-quarkus-00010

We include a solved example in the prompt to see whether it yields better-quality results.

##### Sample Applications Used
* Two sample apps from [JBoss EAP Quickstarts](https://github.com/jboss-developer/jboss-eap-quickstarts/tree/7.4.x) were chosen: helloworld-mdb and cmt
    * [helloworld-mdb](https://github.com/savitharaghunathan/helloworld-mdb)
    * [cmt](https://github.com/konveyor-ecosystem/cmt)

##### Using Custom Rules for JMS to SmallRye Reactive
* Rules were developed by Juanma [@jmle](https://github.com/jmle)
    * Rules originally from: https://github.com/jmle/rulesets/blob/jms-rule/default/generated/quarkus/05-jms-to-reactive-quarkus.windup.yaml


In [None]:
# Install local kai package in the current Jupyter kernel
import sys

!{sys.executable} -m pip install -e ../../../

In [1]:
# Bring in the common deps
import os
import pprint

from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOpenAI
from langchain.chains import LLMChain

pp = pprint.PrettyPrinter(indent=2)


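def get_java_in_result(text: str):
    # Return the body of the first java-fenced block in the response.
    # maxsplit=1 keeps the unpacking valid when the response has several fences.
    _, after = text.split("```java", 1)
    return after.split("```", 1)[0]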


def write_output_to_disk(
    out_dir,
    formatted_template,
    example_javaee_file_contents,
    example_quarkus_file_contents,
):
    try:
        os.makedirs(out_dir, exist_ok=True)
    except OSError as error:
        print(f"Error creating directory {out_dir}: {error}")
        raise error

    # Save the template to disk
    template_path = os.path.join(out_dir, "template.txt")
    with open(template_path, "w") as f:
        f.truncate()
        f.write(formatted_template)
    print(f"Saved template to {template_path}")

    # Save the original source code
    original_file_path = os.path.join(out_dir, "original_file.java")
    with open(original_file_path, "w") as f:
        f.truncate()
        f.write(example_javaee_file_contents)
    print(f"Saved original java file to {original_file_path}")

    # Save the Quarkus version we already have in Git to use as a comparison
    known_quarkus_file_path = os.path.join(out_dir, "quarkus_version_from_git.java")
    with open(known_quarkus_file_path, "w") as f:
        f.truncate()
        f.write(example_quarkus_file_contents)
    print(f"Saved Quarkus version from Git to {known_quarkus_file_path}")
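
Model responses often wrap the migrated code in a java-fenced block, which is what `get_java_in_result` pulls out. The following is a minimal sketch of the same idea using a regular expression, returning `None` when no fence is present instead of raising. The name `extract_java_block` and the sample response are hypothetical and not part of this notebook.

```python
import re
from typing import Optional

# Three backticks, built indirectly so this example does not contain a literal fence
FENCE = "`" * 3


def extract_java_block(text: str) -> Optional[str]:
    """Return the body of the first java-fenced block in text, or None."""
    pattern = re.escape(FENCE) + r"java[ \t]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


# Hypothetical model response used only for illustration
sample = f"Here is the fix:\n{FENCE}java\npublic class Foo {{}}\n{FENCE}\nDone."
print(extract_java_block(sample))
print(extract_java_block("no code here"))
```

Returning `None` lets the caller decide how to handle a response without code, rather than failing inside the helper.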

# Create a Prompt
## Step 1: Create a Prompt Template


In [2]:
template_with_solved_example_and_analysis = """
    # Java EE to Quarkus Migration
    You are an AI Assistant trained on migrating enterprise JavaEE code to Quarkus.
    I will give you an example of a JavaEE file and you will give me the Quarkus equivalent.

    To help you update this file to Quarkus I will provide you with static source code analysis information
    highlighting an issue which needs to be addressed. I will also provide you with an example of how a similar
    issue was solved in the past via a solved example.  You can refer to the solved example for a pattern of
    how to update the input Java EE file to Quarkus.

    Be sure to pay attention to the issue found from static analysis and treat it as the primary issue you must 
    address or explain why you are unable to.

    Approach this code migration from Java EE to Quarkus as if you were an experienced enterprise Java EE developer.
    Before attempting to migrate the code to Quarkus, explain each step of your reasoning through what changes 
    are required and why. 

    Pay attention to changes you make and impacts to external dependencies in the pom.xml as well as changes 
    to imports we need to consider.

    As you make changes that impact the pom.xml or imports, be sure you explain what needs to be updated.
    
    {format_instructions}
    
### Input:
    ## Issue found from static code analysis of the Java EE code which needs to be fixed to migrate to Quarkus
    Issue to fix:  "{analysis_message}"

    ## Solved Example Filename
    Filename: "{solved_example_file_name}"

    ## Solved Example Git Diff 
    This diff of the solved example shows what changes we made in the past to address a similar problem.
    Please consider this heavily in your response.
    ```diff
    {solved_example_diff}
    ```

    ## Input file name
    Filename: "{src_file_name}"

    ## Input line number where the issue first appears in the Java EE source code below
    Line number: {analysis_lineNumber}
    
    ## Input source code file contents for "{src_file_name}"
    ```java 
    {src_file_contents}
    ```
    """

# Step 2: Gather the data we want to inject into the prompt
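
Every `{name}` placeholder in the template must be matched by a value gathered in this step, otherwise formatting fails with a `KeyError`. As a small sketch, the standard library's `string.Formatter` can list the placeholders so that missing values are caught early. The helper `placeholder_names`, the shortened `demo_template`, and the values in `demo_args` are hypothetical; only the placeholder names come from the template above.

```python
from string import Formatter


def placeholder_names(template: str) -> set:
    """Collect the field names used as {placeholders} in a format string."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Shortened stand-in for the full prompt template (illustration only)
demo_template = (
    'Issue to fix:  "{analysis_message}"\n'
    'Filename: "{src_file_name}"\n'
    "Line number: {analysis_lineNumber}\n"
)

# Invented values; the real ones come from the analysis report and Git
demo_args = {
    "analysis_message": "demo message",
    "src_file_name": "Example.java",
    "analysis_lineNumber": 12,
}

missing = placeholder_names(demo_template) - set(demo_args)
print("Missing values:", sorted(missing))
print(demo_template.format(**demo_args))
```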

In [3]:
from kai.report import Report
from kai.scm import GitDiff

# Static code analysis was run prior and committed to this repo
path_cmt_analysis = "./analysis_report/cmt/output.yaml"
path_helloworld_analysis = "./analysis_report/helloworld-mdb-quarkus/output.yaml"

cmt_report = Report.load_report_from_file(path_cmt_analysis)
helloworld_report = Report.load_report_from_file(path_helloworld_analysis)

# We are limiting our experiment to a single rule for this run
ruleset_name = "custom-ruleset"
rule = "jms-to-reactive-quarkus-00010"
cmt_rule_data = cmt_report.report[ruleset_name]["violations"][rule]
helloworld_rule_data = helloworld_report.report[ruleset_name]["violations"][rule]

# We are expecting to find 1 impacted file in CMT and 2 impacted files in HelloWorld
assert len(cmt_rule_data["incidents"]) == 1, "Expected 1 incident in CMT"
assert len(helloworld_rule_data["incidents"]) == 2, "Expected 2 incidents in HelloWorld"

# Setup access to look at the Git repo source code for each sample app
cmt_src_path = "../../../kai_solution_server/samples/sample_repos/cmt"
cmt_diff = GitDiff(cmt_src_path)

# Ensure we checked out the 'quarkus' branch of the CMT repo
cmt_diff.checkout_branch("quarkus")

helloworld_src_path = "../../../kai_solution_server/samples/sample_repos/helloworld-mdb-quarkus"
helloworld_diff = GitDiff(helloworld_src_path)

# We want to be sure the 'quarkus' branch of both repos has been checked out
# so it's available to consult
cmt_diff.checkout_branch("quarkus")
helloworld_diff.checkout_branch("quarkus")
# Resetting to 'main' for HelloWorld as that represents the initial state of the repo
helloworld_diff.checkout_branch("main")

# Now we can extract the info we need
## We will assume:
##   - HelloWorld is our source application to migrate, so we use its 'main' branch, which maps to Java EE
##   - CMT serves as our solved example, so we consult its 'quarkus' branch

hw_entry = helloworld_rule_data["incidents"][0]
src_file_name = helloworld_report.get_cleaned_file_path(hw_entry["uri"])
src_file_contents = helloworld_diff.get_file_contents(src_file_name, "main")
analysis_message = hw_entry["message"]
analysis_lineNumber = hw_entry["lineNumber"]

cmt_entry = cmt_rule_data["incidents"][0]
solved_example_file_name = cmt_report.get_cleaned_file_path(cmt_entry["uri"])
# solved_file_contents = cmt_diff.get_file_contents(solved_example_file_name, "quarkus")

start_commit_id = cmt_diff.get_commit_from_branch("main").hexsha
end_commit_id = cmt_diff.get_commit_from_branch("quarkus").hexsha

solved_example_diff = cmt_diff.get_patch_for_file(
    start_commit_id, end_commit_id, solved_example_file_name
)

Reading report from ./analysis_report/cmt/output.yaml
Reading report from ./analysis_report/helloworld-mdb-quarkus/output.yaml
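
The lookups in this step walk a nested structure: ruleset name, then `violations`, then the rule ID, then `incidents`, where each incident carries `uri`, `message`, and `lineNumber`. The sketch below mimics that shape with a plain dictionary. The data in `demo_report` is invented and `incidents_for` is a hypothetical helper, not part of `kai.report`. Note that checking an exact count needs an equality comparison; `assert len(x), 1` would treat `1` as the failure message and pass for any non-empty list.

```python
# Hypothetical report data shaped like the lookups used in this step
demo_report = {
    "custom-ruleset": {
        "violations": {
            "jms-to-reactive-quarkus-00010": {
                "incidents": [
                    {
                        "uri": "file:///tmp/src/ExampleMDB.java",
                        "message": "demo message",
                        "lineNumber": 12,
                    },
                ]
            }
        }
    }
}


def incidents_for(report: dict, ruleset: str, rule: str) -> list:
    """Return the incidents for one rule, or an empty list if the rule is absent."""
    violations = report.get(ruleset, {}).get("violations", {})
    return violations.get(rule, {}).get("incidents", [])


found = incidents_for(demo_report, "custom-ruleset", "jms-to-reactive-quarkus-00010")
assert len(found) == 1, f"expected 1 incident, found {len(found)}"
print(found[0]["uri"], found[0]["lineNumber"])
```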


# Step 3: Run the prompt through the LLM and write the output to disk

In [4]:
import caikit_tgis_langchain
import traceback

from langchain_community.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from typing import List

from langchain_core.messages import HumanMessage, SystemMessage
from genai import Client, Credentials
from genai.extensions.langchain.chat_llm import LangChainChatInterface
from genai.schema import (
    DecodingMethod,
    ModerationHAP,
    ModerationParameters,
    TextGenerationParameters,
    TextGenerationReturnOptions,
)


class UpdatedFile(BaseModel):
    file_name: str = Field(description="File name of updated file")
    file_contents: str = Field(description="Contents of the updated file")


class Response(BaseModel):
    reasoning: List[str] = Field(description="Process Explanation")
    updated_files: List[UpdatedFile] = Field(
        description="List containing updated files"
    )


response_parser = PydanticOutputParser(pydantic_object=Response)

# This run includes the source file, the analysis info, and the solved example
template_args = {
    "src_file_name": src_file_name,
    "src_file_contents": src_file_contents,
    "analysis_message": analysis_message,
    "analysis_lineNumber": analysis_lineNumber,
    "solved_example_file_name": solved_example_file_name,
    "solved_example_diff": solved_example_diff,
    "format_instructions": response_parser.get_format_instructions(),
}

formatted_prompt = template_with_solved_example_and_analysis.format(**template_args)

# For comparison, we look up the source file as it appears on the 'quarkus' branch;
# it serves as the reference 'answer' written manually by a knowledgeable developer
src_file_from_quarkus_branch = helloworld_diff.get_file_contents(
    src_file_name, "quarkus"
)

### Choose one of the options below to run against
### Uncomment the provider, one model, and the llm

## OpenAI: Specify a model and make sure OPENAI_API_KEY is exported
# provider = "openai"
# model = "gpt-3.5-turbo"
# model = "gpt-3.5-turbo-16k"
# model = "gpt-4-1106-preview"
# llm = ChatOpenAI(temperature=0.1,
#                 model_name=model,
#                 streaming=True)

## Oobabooga: text-gen uses the OpenAI API so it works similarly.
## We load a model and specify temperature on the server so we don't need to specify them.
## We've had decent results with Mistral-7b and Codellama-34b-Instruct
## Just set the URL for the server and make sure OPENAI_API_KEY is exported
# provider = "text-gen"
# model = "CodeLlama-7b-Instruct-hf"
# model = "CodeLlama-13b-Instruct-hf"
# model = "CodeLlama-34b-Instruct-hf"
# model = "CodeLlama-70b-Instruct-hf"
# model = "starcoder"
# model = "starcoder2-3b"
# model = "starcoder2-7b"
# model = "starcoder2-15b"
# model = "WizardCoder-15B-V1.0"
# model = "Mistral-7B-v0.1"
# model = "Mixtral-8x7B-v0.1"
# llm = ChatOpenAI(openai_api_base="https://text-gen-api-text-gen.apps.ai.migration.redhat.com/v1",
#                 temperature=0.1,
#                 streaming=True)

## OpenShift AI:
## We need to make sure caikit-nlp-client is installed, as well as
## caikit_tgis_langchain.py: https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/examples/notebooks/langchain/caikit_tgis_langchain.py
## Then set the model_id and server url
## But we are having issues with responses: https://github.com/opendatahub-io/caikit-nlp-client/issues/95
# provider = "openshift-ai"
# model = "codellama-7b-hf"
# llm = caikit_tgis_langchain.CaikitLLM(inference_server_url="https://codellama-7b-hf-predictor-kyma-workshop.apps.ai.migration.redhat.com",
#                                      model_id="CodeLlama-7b-hf",
#                                      streaming=True)

## IBM Models:
## Ensure GENAI_KEY is exported. Change the model if desired.
# provider = "ibm"
# model = "ibm/granite-13b-instruct-v1"
# model = "ibm/granite-13b-instruct-v2"
# model = "ibm/granite-20b-5lang-instruct-rc"
# model = "ibm/granite-20b-code-instruct-v1"
# model = "ibm/granite-20b-code-instruct-v1-gptq"
# model = "ibm-mistralai/mixtral-8x7b-instruct-v01-q"
# model = "codellama/codellama-34b-instruct"
# model = "codellama/codellama-70b-instruct"
# model = "kaist-ai/prometheus-13b-v1"
# model = "mistralai/mistral-7b-instruct-v0-2"
# model = "thebloke/mixtral-8x7b-v0-1-gptq"
# llm = LangChainChatInterface(
#    client=Client(credentials=Credentials.from_env()),
#    model_id=model,
#    parameters=TextGenerationParameters(
#        decoding_method=DecodingMethod.SAMPLE,
#        max_new_tokens=4096,
#        min_new_tokens=10,
#        temperature=0.1,
#        top_k=50,
#        top_p=1,
#        return_options=TextGenerationReturnOptions(input_text=False, input_tokens=True),
#    ),
#    moderations=ModerationParameters(
#        # Threshold is set to very low level to flag everything (testing purposes)
#        # or set to True to enable HAP with default settings
#        hap=ModerationHAP(input=True, output=False, threshold=0.01)
#    ),
# )

prompt = PromptTemplate.from_template(template_with_solved_example_and_analysis)
chain = LLMChain(llm=llm, prompt=prompt, output_parser=response_parser)

model = model.replace("/", "_")
output_dir = "output/" + provider + "_" + model + "/few_shot_pydantic/"
write_output_to_disk(
    output_dir,
    formatted_prompt,
    src_file_contents,
    src_file_from_quarkus_branch,
)

try:
    result = chain.invoke(template_args)
    with open(output_dir + "result.txt", "a") as f:
        f.write("### Reasoning:\n")
        for step in result["text"].reasoning:
            f.write(step)
        f.write("\n")
        for i, updated_file in enumerate(result["text"].updated_files, start=1):
            f.write("### Updated file " + str(i) + "\n")
            f.write(updated_file.file_name + ":")
            f.write(updated_file.file_contents)
    print("Saved result in " + output_dir + "result.txt")
except Exception as e:
    with open(output_dir + "traceback.txt", "a") as f:
        f.write(str(e))
        f.write(traceback.format_exc())
    print(
        "Something went wrong. A traceback was saved in " + output_dir + "traceback.txt"
    )

Saved template to output/openai_gpt-3.5-turbo-16k/few_shot_pydantic/template.txt
Saved original java file to output/openai_gpt-3.5-turbo-16k/few_shot_pydantic/original_file.java
Saved Quarkus version from Git to output/openai_gpt-3.5-turbo-16k/few_shot_pydantic/quarkus_version_from_git.java
Saved result in output/openai_gpt-3.5-turbo-16k/few_shot_pydantic/result.txt
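
`PydanticOutputParser` asks the model, through `format_instructions`, for a JSON object matching `Response` and then validates the reply. The sketch below uses only the standard library to illustrate the shape of reply this step expects; it is not LangChain's implementation. `UpdatedFileSketch`, `ResponseSketch`, `parse_response`, and the sample reply are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class UpdatedFileSketch:
    file_name: str
    file_contents: str


@dataclass
class ResponseSketch:
    reasoning: List[str]
    updated_files: List[UpdatedFileSketch]


def parse_response(raw: str) -> ResponseSketch:
    """Turn a JSON reply into typed objects; raises KeyError if a field is missing."""
    data = json.loads(raw)
    files = [UpdatedFileSketch(**item) for item in data["updated_files"]]
    return ResponseSketch(reasoning=list(data["reasoning"]), updated_files=files)


# Invented reply, standing in for what a model might return
raw_reply = json.dumps(
    {
        "reasoning": ["Step one.", "Step two."],
        "updated_files": [
            {"file_name": "Example.java", "file_contents": "class Example {}"}
        ],
    }
)

parsed = parse_response(raw_reply)
print(parsed.reasoning)
print(parsed.updated_files[0].file_name)
```

A reply that is not valid JSON, or that lacks one of the fields, fails at parse time, which is why Step 3 wraps the chain call in a `try` block and saves a traceback.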


# Step 4: Print results

In [None]:
print("### Reasoning:\n")
for step in result["text"].reasoning:
    print(step)

print("\n")

for i, updated_file in enumerate(result["text"].updated_files, start=1):
    print("### Updated file " + str(i) + "\n")
    print(updated_file.file_name + ":")
    print(updated_file.file_contents)
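
Steps 3 and 4 format the same fields twice, once for `result.txt` and once for the notebook output. A minimal sketch of a single helper that both could share is shown here. `render_result` is a hypothetical name, `SimpleNamespace` objects stand in for the parsed `Response`, and the layout follows the notebook's output only loosely.

```python
from types import SimpleNamespace


def render_result(response) -> str:
    """Format the reasoning steps and updated files as one text block."""
    lines = ["### Reasoning:", ""]
    lines.extend(response.reasoning)
    lines.append("")
    for number, updated in enumerate(response.updated_files, start=1):
        lines.append(f"### Updated file {number}")
        lines.append(f"{updated.file_name}:")
        lines.append(updated.file_contents)
    return "\n".join(lines)


# Invented stand-in for the parsed model response
demo_response = SimpleNamespace(
    reasoning=["Step one.", "Step two."],
    updated_files=[
        SimpleNamespace(file_name="Example.java", file_contents="class Example {}")
    ],
)

rendered = render_result(demo_response)
print(rendered)
```

The same string could then be passed to both `print` and `f.write`, keeping the saved file and the displayed output in sync.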