# Few Shot: JPA - Rule: persistence-to-quarkus

> This notebook will work with an LLM to generate a fix for JPA-related persistence rules found in 'cmt' and 'bmt'. We will use 'bmt' as the solved example.

##### Sample Applications Used
* 2 sample apps from [JBoss EAP Quickstarts](https://github.com/jboss-developer/jboss-eap-quickstarts/tree/7.4.x) were chosen: cmt & bmt. 
* Christopher May [christophermay07](https://github.com/christophermay07) migrated these applications to Quarkus in https://github.com/christophermay07/quarkus-migrations.  
* We have created new repositories that contain the original Java EE state in the 'main' branch and Chris' Quarkus migration in the 'quarkus' branch 
    * [bmt](https://github.com/konveyor-ecosystem/bmt)
    * [cmt](https://github.com/konveyor-ecosystem/cmt)

##### Using Custom Rules for JPA
* Rules were developed by Juanma [@jmle](https://github.com/jmle)
    * Rules originally from: https://github.com/jmle/kai-examples/blob/main/analyzer-lsp-rules/01-persistence-to-quarkus.windup.yaml
    * Specific rule this notebook will fix: `persistence-to-quarkus-00000`
            The persistence API of JavaEE provides all sorts of utilities for accessing databases in an object oriented manner (object-relational mapping and such). It is 100% supported, but still needs to be migrated.

            The Persistence API in JavaEE has the concept of datasources and persistence units. These are abstractions of the underlying databases for ORM purposes, and they need to be configured. In JavaEE, XML files are used for this, but in Quarkus all this configuration can (and should) be moved to a single, centralized *.properties file.

            This rule simply looks for the existence of the XML configuration files.
    * `persistence-to-quarkus-00011`
            This is a rule to remove an unneeded (and illegal) `@Produces` annotation on the `EntityManager`. We want to see if the LLM will suggest adding the Qualifier, as explained in the message.
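
To make the XML-to-properties move described above concrete, here is a minimal Python sketch that is not part of the rule itself. It reads `<property>` entries from a `persistence.xml` string and maps two Hibernate settings to Quarkus keys. The mapping table, the function name `persistence_xml_to_properties`, and the sample XML are illustrative assumptions; the Quarkus key names should be checked against the Quarkus version in use.

```python
import xml.etree.ElementTree as ET

# Assumed, non-exhaustive mapping from Hibernate property names to Quarkus keys
HIBERNATE_TO_QUARKUS = {
    "hibernate.hbm2ddl.auto": "quarkus.hibernate-orm.database.generation",
    "hibernate.show_sql": "quarkus.hibernate-orm.log.sql",
}


def persistence_xml_to_properties(xml_text: str) -> dict:
    """Translate known <property> entries into Quarkus-style key/value pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter():
        # Drop any XML namespace: '{ns}property' -> 'property'
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "property":
            name = elem.get("name")
            if name in HIBERNATE_TO_QUARKUS:
                result[HIBERNATE_TO_QUARKUS[name]] = elem.get("value")
    return result


# Hypothetical sample input, not taken from cmt or bmt
sample_xml = """<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
  <persistence-unit name="primary">
    <properties>
      <property name="hibernate.hbm2ddl.auto" value="create-drop"/>
      <property name="hibernate.show_sql" value="false"/>
    </properties>
  </persistence-unit>
</persistence>"""

props = persistence_xml_to_properties(sample_xml)
for key, value in props.items():
    print(f"{key}={value}")
```

Properties that are not in the mapping are silently skipped here, which is one reason a table-driven translation is unlikely to replace the LLM step for real applications.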
    
    


In [1]:
# Install local kai package in the current Jupyter kernel
import sys

!{sys.executable} -m pip install -e ../../

Obtaining file:///Users/jmatthews/git/jwmatthews/kai
  Preparing metadata (setup.py) ... done
Installing collected packages: kai
  Attempting uninstall: kai
    Found existing installation: kai 0.0.1
    Uninstalling kai-0.0.1:
      Successfully uninstalled kai-0.0.1
  Running setup.py develop for kai
Successfully installed kai-0.0.1


In [2]:
# Bring in the common deps
import os
import pprint

from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

pp = pprint.PrettyPrinter(indent=2)


def get_java_in_result(text: str):
    """Return the first java or xml code block that follows '## Updated File'."""
    languages = ["java", "xml"]
    for lang in languages:
        try:
            # Both splits live inside the try: unpacking raises ValueError
            # when a marker is missing (expected 2, got 1)
            _, updated_file = text.split("## Updated File", 1)
            _, after = updated_file.split(f"```{lang}", 1)
            return after.split("```")[0]
        except ValueError:
            print(f"Didn't find a result for language '{lang}'")
    print(f"Unable to find any Java or XML in the result:\n {text}")
    raise ValueError("Unable to parse result")


def write_output_to_disk(
    out_dir,
    formatted_template,
    example_javaee_file_contents,
    example_quarkus_file_contents,
    result,
):
    try:
        os.makedirs(out_dir, exist_ok=True)
    except OSError as error:
        print(f"Error creating directory {out_dir}: {error}")
        raise error

    # Save the template to disk
    template_path = os.path.join(out_dir, "template.txt")
    with open(template_path, "w") as f:
        f.truncate()
        f.write(formatted_template)
    print(f"Saved template to {template_path}")

    # Save the result
    result_path = os.path.join(out_dir, "result.txt")
    with open(result_path, "w") as f:
        f.truncate()
        f.write(result)
    print(f"Saved result to {result_path}")

    # Extract the updated java code and save it
    updated_file_contents = get_java_in_result(result)
    updated_file_path = os.path.join(out_dir, "updated_file.java")
    with open(updated_file_path, "w") as f:
        f.truncate()
        f.write(updated_file_contents)
    print(f"Saved updated java file to {updated_file_path}")

    # Save the original source code
    original_file_path = os.path.join(out_dir, "original_file.java")
    with open(original_file_path, "w") as f:
        f.truncate()
        f.write(example_javaee_file_contents)
    print(f"Saved original java file to {original_file_path}")

    # Save the Quarkus version we already have in Git to use as a comparison
    known_quarkus_file_path = os.path.join(out_dir, "quarkus_version_from_git.java")
    with open(known_quarkus_file_path, "w") as f:
        f.truncate()
        f.write(example_quarkus_file_contents)
    print(f"Saved Quarkus version from Git to {known_quarkus_file_path}")
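
The helpers above only look for `java` or `xml` blocks and always write `updated_file.java`. As a hedged sketch of one alternative, the snippet below captures whatever language tag the fence carries and picks a file extension from it. The names `extract_updated_block` and `EXTENSIONS`, and the sample response, are hypothetical and not part of kai.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block

# Assumed mapping from fence language to file extension
EXTENSIONS = {"java": ".java", "xml": ".xml", "properties": ".properties"}

# A fence opener with a language tag, optional trailing spaces, then the body
FENCE_RE = re.compile(FENCE + r"(\w+)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_updated_block(text: str):
    """Return (language, code) for the first fenced block after '## Updated File'."""
    _, marker, tail = text.partition("## Updated File")
    if not marker:
        raise ValueError("No '## Updated File' section found")
    match = FENCE_RE.search(tail)
    if match is None:
        raise ValueError("No fenced code block found after '## Updated File'")
    return match.group(1), match.group(2)


# Hypothetical LLM response following the output format requested in the prompt
sample = (
    "## Reasoning\nMove the settings into a properties file.\n\n"
    "## Updated File\n"
    f"{FENCE}properties\nquarkus.datasource.db-kind=h2\n{FENCE}\n"
)

lang, code = extract_updated_block(sample)
file_name = "updated_file" + EXTENSIONS.get(lang, ".txt")
print(file_name)
print(code)
```

Returning the language alongside the code lets the caller decide the output file name, instead of assuming every fix is an edit to an existing Java file.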

# Create a Prompt
## Step #1:  Create a Prompt Template


In [3]:
template_with_solved_example_and_analysis = """
    # Java EE to Quarkus Migration
    You are an AI Assistant trained on migrating enterprise JavaEE code to Quarkus.
    I will give you an example of a JavaEE file and you will give me the Quarkus equivalent.

    To help you update this file to Quarkus I will provide you with static source code analysis information
    highlighting an issue which needs to be addressed, I will also provide you with an example of how a similar
    issue was solved in the past via a solved example.  You can refer to the solved example for a pattern of
    how to update the input Java EE file to Quarkus.

    Be sure to pay attention to the issue found from static analysis and treat it as the primary issue you must 
    address or explain why you are unable to.

    Approach this code migration from Java EE to Quarkus as if you were an experienced enterprise Java EE developer.
    Before attempting to migrate the code to Quarkus, explain each step of your reasoning through what changes 
    are required and why. 

    Pay attention to changes you make and impacts to external dependencies in the pom.xml as well as changes 
    to imports we need to consider.

    As you make changes that impact the pom.xml or imports, be sure you explain what needs to be updated.
    
    After you have shared your step by step thinking, provide a full output of the updated file:

    # Input information
    ## Issue found from static code analysis of the Java EE code which needs to be fixed to migrate to Quarkus
    Issue to fix:  "{analysis_message}"

    ## Solved Example Filename
    Filename: "{solved_example_file_name}"

    ## Solved Example Git Diff 
    This diff of the solved example shows what changes we made in past to address a similar problem.
    Please consider this heavily in your response.
    ```diff
    {solved_example_diff}
    ```

    ## Input file name
    Filename: "{src_file_name}"

    ## Input Line number of the issue first appearing in the Java EE code source code example below
    Line number: {analysis_lineNumber}
    
    ## Input source code file contents for "{src_file_name}"
    ```java 
    {src_file_contents}
    ```

    # Output Instructions
    Structure your output in Markdown format such as:

    ## Reasoning
    Write the step by step reasoning in this markdown section.
    If you are unsure of a step or reasoning, clearly state you are unsure and why.

    ## Updated File
    ```java
        Write the updated file for Quarkus in this section
    ```
   
    """

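
Because the template is filled with `str.format`, every `{placeholder}` needs a matching argument, and any literal brace in the template text would have to be doubled. The sketch below uses a shortened toy template rather than the full prompt to list the placeholders and check them against the supplied arguments. The helper name `placeholder_names` and the toy values are assumptions for illustration.

```python
from string import Formatter


def placeholder_names(template: str) -> set:
    """Collect the {field} names that str.format will expect."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Shortened stand-in for the real prompt template
toy_template = 'Issue to fix: "{analysis_message}"\nFilename: "{src_file_name}"\n'
toy_args = {
    "analysis_message": "Move persistence.xml settings",
    "src_file_name": "persistence.xml",
}

# Any name listed here would make str.format raise KeyError
missing = placeholder_names(toy_template) - set(toy_args)
formatted = toy_template.format(**toy_args)
print(sorted(placeholder_names(toy_template)), missing)
```

Running a check like this before calling the LLM surfaces a missing argument as an explicit list rather than a `KeyError` in the middle of a run.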
## Step #2: Find the common incidents we see in both applications

In [4]:
from kai.report import Report

ruleset_name = "kai/quarkus"


def find_common_rules(report1, report2, ruleset_name):
    common_rules = set(report1.report[ruleset_name]["violations"].keys()) & set(
        report2.report[ruleset_name]["violations"].keys()
    )
    return common_rules


# Static code analysis was run prior and committed to this repo
path_cmt_analysis = "./analysis_report/cmt/output.yaml"
path_bmt_analysis = "./analysis_report/bmt/output.yaml"

# We will assume that cmt is the app we want to migrate
# We will assume that bmt serves as our solved example

cmt_report = Report.load_report_from_file(path_cmt_analysis)
bmt_report = Report.load_report_from_file(path_bmt_analysis)

common_rule_names = find_common_rules(cmt_report, bmt_report, ruleset_name)
common_rule_names

Reading report from ./analysis_report/cmt/output.yaml
Reading report from ./analysis_report/bmt/output.yaml


{'persistence-to-quarkus-00000'}
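
The intersection above can be tried without the analysis files. This sketch uses plain dictionaries shaped like `report.report` (ruleset name, then `violations`, then rule IDs); the `hypothetical-rule-*` IDs and the function name `common_violations` are made up for illustration. Unlike `find_common_rules`, it returns an empty set when a ruleset is missing from either report instead of raising `KeyError`.

```python
def common_violations(report_a: dict, report_b: dict, ruleset: str) -> set:
    """Return rule IDs that appear under `ruleset` in both reports."""
    a = report_a.get(ruleset, {}).get("violations", {})
    b = report_b.get(ruleset, {}).get("violations", {})
    # dict key views support set intersection directly
    return a.keys() & b.keys()


# Toy stand-ins for the parsed cmt and bmt reports
toy_cmt = {
    "kai/quarkus": {
        "violations": {"persistence-to-quarkus-00000": {}, "hypothetical-rule-a": {}}
    }
}
toy_bmt = {
    "kai/quarkus": {
        "violations": {"persistence-to-quarkus-00000": {}, "hypothetical-rule-b": {}}
    }
}

common = common_violations(toy_cmt, toy_bmt, "kai/quarkus")
print(common)
```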

## Step #3: Define a function to extract the analysis incident metadata

In [5]:
def get_incident_data(report, ruleset_name, rule):
    return report.report[ruleset_name]["violations"][rule]
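
For orientation, the sketch below shows the shape this notebook relies on in a violation entry: an `incidents` list whose items carry a `uri`, a `message`, and sometimes a `lineNumber`. The data is a toy example, and `strip_source_prefix` is a hypothetical stand-in for path cleaning, not kai's `get_cleaned_file_path`.

```python
# Prefix seen on incident URIs in this notebook's analysis output
SOURCE_PREFIX = "file:///tmp/source-code/"


def strip_source_prefix(uri: str) -> str:
    """Hypothetical path cleaning: drop the analyzer's source-root prefix."""
    return uri[len(SOURCE_PREFIX):] if uri.startswith(SOURCE_PREFIX) else uri


# Toy violation entry; the second incident and both messages are made up
toy_violation = {
    "incidents": [
        {
            "uri": "file:///tmp/source-code/src/main/resources/META-INF/persistence.xml",
            "message": "toy message",
        },
        {
            "uri": "file:///tmp/source-code/src/main/java/Example.java",
            "message": "toy message",
            "lineNumber": 12,
        },
    ]
}

# lineNumber is optional, so fall back to an empty string when it is absent
summary = [
    (strip_source_prefix(incident["uri"]), incident.get("lineNumber", ""))
    for incident in toy_violation["incidents"]
]
print(summary)
```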

## Step #4: Setup access to the source code in both applications via Git Repos


In [6]:
from kai.scm import GitDiff

##
# The App to MIGRATE will be cmt
cmt_src_path = "../../kai_solution_server/samples/sample_repos/cmt"
cmt_diff = GitDiff(cmt_src_path)
# Ensure we checked out the 'quarkus' branch of the CMT repo
cmt_diff.checkout_branch("quarkus")

###
# SOLVED EXAMPLE will be bmt
###
# Setup access to look at the Git repo source code for each sample app
bmt_src_path = "../../kai_solution_server/samples/sample_repos/bmt"
bmt_diff = GitDiff(bmt_src_path)
# We check out the quarkus branch so it's available for us to consult, but we plan to use 'main'
bmt_diff.checkout_branch("quarkus")
# Ensure we checked out the 'main' branch of the bmt repo
bmt_diff.checkout_branch("main")

# Now we can extract the info we need
## We will assume:
##   - cmt will be our source application to migrate, so we will use its 'main' branch, which maps to Java EE
##   - bmt will serve as our solved example, so we will consult its 'quarkus' branch

## Step #5: Define a function to generate our prompt


In [7]:
def create_template_args_from_incident(
    cmt_report, bmt_report, cmt_diff, bmt_diff, ruleset_name, rule_name
):
    cmt_incident = get_incident_data(cmt_report, ruleset_name, rule_name)
    bmt_incident = get_incident_data(bmt_report, ruleset_name, rule_name)

    # We assume there is at least one incident for each rule;
    # we use only the first incident and ignore any extras
    cmt_entry = cmt_incident["incidents"][0]
    print(f"Using cmt incident: {cmt_entry}")
    src_file_name = cmt_report.get_cleaned_file_path(cmt_entry["uri"])
    src_file_contents = cmt_diff.get_file_contents(src_file_name, "main")
    analysis_message = cmt_entry["message"]
    ###
    # Note that for a match of 'persistence-to-quarkus-00000' we will NOT have a lineNumber
    ###
    analysis_lineNumber = cmt_entry.get("lineNumber", "")

    bmt_entry = bmt_incident["incidents"][0]
    solved_example_file_name = bmt_report.get_cleaned_file_path(bmt_entry["uri"])

    start_commit_id = bmt_diff.get_commit_from_branch("main").hexsha
    end_commit_id = bmt_diff.get_commit_from_branch("quarkus").hexsha

    solved_example_diff = bmt_diff.get_patch_for_file(
        start_commit_id, end_commit_id, solved_example_file_name
    )

    template_args = {
        "src_file_name": src_file_name,
        "src_file_contents": src_file_contents,
        "analysis_message": analysis_message,
        "analysis_lineNumber": analysis_lineNumber,
        "solved_example_file_name": solved_example_file_name,
        "solved_example_diff": solved_example_diff,
    }
    return template_args
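
The `solved_example_diff` argument is a patch between the 'main' and 'quarkus' versions of one file. To show roughly what kind of text ends up in the prompt, this sketch builds a unified diff with the standard library's `difflib` from two toy strings. It is not how `GitDiff.get_patch_for_file` works, and the file name and contents are made up.

```python
import difflib

# Toy before/after contents; in the notebook these would be the file on 'main' and on 'quarkus'
before = "line one\nold setting\n"
after = "line one\nnew setting\n"

patch = "".join(
    difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a/example.txt",
        tofile="b/example.txt",
    )
)
print(patch)
```

Lines starting with `-` come from the original version and lines starting with `+` from the migrated one, which is the pattern the prompt asks the LLM to weigh heavily.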

## Step #6: Run through all the common rule violations

In [8]:
model_name = "gpt-4-1106-preview"
# model_name = "gpt-3.5-turbo-16k"
llm = ChatOpenAI(temperature=0.1, model_name=model_name)

# common_rule_names computed in earlier cell
for rule_name in common_rule_names:
    print(f"Processing rule: {rule_name}")
    template_args = create_template_args_from_incident(
        cmt_report,
        bmt_report,
        cmt_diff,
        bmt_diff,
        ruleset_name,
        rule_name,
    )
    formatted_prompt = template_with_solved_example_and_analysis.format(**template_args)
    prompt = PromptTemplate.from_template(template_with_solved_example_and_analysis)
    chain = LLMChain(llm=llm, prompt=prompt)
    result = chain.run(template_args)

    src_file_name = template_args["src_file_name"]
    src_file_contents = template_args["src_file_contents"]
    # For comparison, we look up what the same source file looks like on the 'quarkus' branch of cmt.
    # This is the 'answer' that was produced manually by a knowledgeable developer.
    # src_file_name comes from the cmt report, so we read it from the cmt repo (not bmt).
    src_file_from_quarkus_branch = cmt_diff.get_file_contents(src_file_name, "quarkus")

    output_dir = f"output/{model_name}/cmt/{ruleset_name}/{rule_name}/few_shot/"
    write_output_to_disk(
        output_dir,
        formatted_prompt,
        src_file_contents,
        src_file_from_quarkus_branch,
        result,
    )
    print(f"Completed processing of rule: {rule_name}")

    ###
    # TODO: We have some work to do on extracting the result. In this example we
    # remove the original XML file and create a new application.properties file.
    # The parsing logic we have been using for these examples is too simple for this case,
    # i.e., this isn't a case of updating an existing Java file.
    ###



Processing rule: persistence-to-quarkus-00000
Using cmt incident: {'uri': 'file:///tmp/source-code/src/main/resources/META-INF/persistence.xml', 'message': 'It is recommended to move persistence related configuration from an XML file to a properties one.\n This allows centralization of the configuration in Quarkus. Check the link for more information.\n \n \n Datasource and persistence configurations in XML can be substituted with a single centralized properties file. Here is an example of a translation:\n \n The following datasource configuration:\n ```\n <datasources xmlns="http://www.jboss.org/ironjacamar/schema"\n xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n xsi:schemaLocation="http://www.jboss.org/ironjacamar/schema http://docs.jboss.org/ironjacamar/schema/datasources_1_0.xsd">\n <!-- The datasource is bound into JNDI at this location. We reference\n this in META-INF/persistence.xml -->\n <datasource jndi-name="java:jboss/datasources/TasksJsfQuickstartDS"\n pool-name="t

Should we consider iterating over fixes one at a time?
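
One way to explore that question is to flatten the violations into one work item per incident, rather than taking only the first incident of each rule. The sketch below does this on toy data. The name `iter_incidents` and the sample URIs are hypothetical, and how each item would be turned into its own prompt is left open.

```python
from typing import Dict, Iterator, Tuple


def iter_incidents(violations: Dict[str, dict]) -> Iterator[Tuple[str, int, dict]]:
    """Yield (rule_name, index, incident) for every incident of every rule."""
    for rule_name in sorted(violations):
        incidents = violations[rule_name].get("incidents", [])
        for index, incident in enumerate(incidents):
            yield rule_name, index, incident


# Toy violations mapping shaped like report.report[ruleset_name]["violations"]
toy_violations = {
    "persistence-to-quarkus-00000": {
        "incidents": [{"uri": "file:///a.xml"}, {"uri": "file:///b.xml"}]
    },
}

work_items = list(iter_incidents(toy_violations))
for rule_name, index, incident in work_items:
    print(rule_name, index, incident["uri"])
```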
