# AI tool to generate unit tests for the provided Java code

Here we build a Gradio app that uses frontier models to generate unit tests for Java code. For testing purposes I used the *cheaper* versions of the models, not the ones the leaderboards rank as the best.

In [0]:
# imports

import os
from dotenv import load_dotenv
from openai import OpenAI
import google.generativeai as genai
import anthropic
import gradio as gr

In [0]:
# environment

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
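A quick sanity check can catch a missing key before the first API call fails. The sketch below assumes a hypothetical helper, `missing_keys` (not part of the original notebook), that takes a mapping such as `os.environ` and reports which of the three variable names are unset or empty.

```python
import os

REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]

def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]

# Hypothetical usage: report problems once, right after load_dotenv().
for name in missing_keys(os.environ):
    print(f"{name} is not set; the matching model will fail")
```

Passing the mapping in as an argument, rather than reading `os.environ` inside the helper, keeps the check easy to try out with a plain dictionary.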

In [0]:
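# clients (the keys are the ones read from the environment above)

openai = OpenAI(api_key=openai_api_key)
claude = anthropic.Anthropic(api_key=anthropic_api_key)
genai.configure(api_key=google_api_key)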

In [0]:
OPENAI_MODEL = "gpt-4o-mini"
CLAUDE_MODEL = "claude-3-haiku-20240307"
GEMINI_MODEL = 'gemini-2.0-flash-lite'

In [0]:
system_message = "You are an assistant that generates unit tests for Java code. "
system_message += "Generate one JUnit 5 test class with all the relevant test cases in it."

In [0]:
def user_prompt_for(code):
    user_prompt = "Generate unit tests for this Java code.\n\n"
    user_prompt += code
    return user_prompt

In [0]:
test_code = """
package com.hma.kafkaproducertest.rest;

import com.hma.kafkaproducertest.model.TestDTO;
import com.hma.kafkaproducertest.producer.TestProducer;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api")
public class TestController {

    private final TestProducer producer;

    public TestController(TestProducer producer) {
        this.producer = producer;
    }

    @PostMapping("/event")
    public TestDTO triggerKafkaEvent(@RequestBody TestDTO payload) {
        producer.sendMessage(payload, "test");
        return payload;
    }

}

"""

In [0]:
def stream_gpt(code):
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt_for(code)}
    ]
    stream = openai.chat.completions.create(
        model=OPENAI_MODEL,
        messages=messages,
        stream=True
    )
    # Yield the accumulated text so the UI always shows the full response so far
    result = ""
    for chunk in stream:
        result += chunk.choices[0].delta.content or ""
        yield result
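
Note that the generator yields the accumulated text rather than each individual chunk, because Gradio replaces the output component's value with every yielded value. A minimal sketch of the same pattern with no API call, where `accumulate` and `fake_chunks` are hypothetical stand-ins for the streaming loop and the streamed deltas:

```python
def accumulate(chunks):
    """Yield the running concatenation of `chunks`, treating None as empty."""
    result = ""
    for chunk in chunks:
        result += chunk or ""
        yield result

# Hypothetical deltas; a real stream can contain None for content-free chunks.
fake_chunks = ["@Test", None, " void", " sendsMessage() {}"]
snapshots = list(accumulate(fake_chunks))
print(snapshots[-1])
```

Each snapshot is a complete prefix of the final answer, so the Markdown component never shows a fragment on its own.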

In [0]:
def stream_claude(code):
    response = ""
    # messages.stream() returns a context manager; text_stream yields text deltas
    with claude.messages.stream(
        model=CLAUDE_MODEL,
        max_tokens=2000,
        system=system_message,
        messages=[
            {"role": "user", "content": user_prompt_for(code)},
        ],
    ) as stream:
        for text in stream.text_stream:
            response += text or ""
            yield response

In [0]:
def stream_gemini(code):
    gemini = genai.GenerativeModel(
        model_name=GEMINI_MODEL,
        system_instruction=system_message
    )
    stream = gemini.generate_content(user_prompt_for(code), stream=True)
    result = ""
    for chunk in stream:
        result += chunk.text or ""
        yield result

In [0]:
def generate_tests(code, model):
    # Dispatch on the dropdown label and stream the chosen model's output
    if model == "GPT":
        result = stream_gpt(code)
    elif model == "Claude":
        result = stream_claude(code)
    elif model == "Gemini":
        result = stream_gemini(code)
    else:
        raise ValueError(f"Unknown model: {model}")
    yield from result
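
The `if`/`elif` chain could alternatively be written as a lookup table, which keeps the model labels and the dispatch in one place. This is only a sketch of that alternative: `STREAMERS`, `generate_tests_with`, and the `fake_stream_*` functions are hypothetical stand-ins so the example runs without API keys. In the notebook, the table would hold `stream_gpt`, `stream_claude`, and `stream_gemini`.

```python
def fake_stream_gpt(code):
    yield "// GPT tests for: " + code

def fake_stream_claude(code):
    yield "// Claude tests for: " + code

# Maps the dropdown label to the generator function that serves it.
STREAMERS = {"GPT": fake_stream_gpt, "Claude": fake_stream_claude}

def generate_tests_with(code, model, streamers=STREAMERS):
    """Stream results from the generator registered under `model`."""
    try:
        streamer = streamers[model]
    except KeyError:
        raise ValueError(f"Unknown model: {model}") from None
    yield from streamer(code)

print(list(generate_tests_with("class A {}", "Claude")))
```

Because this is a generator function, the `ValueError` surfaces when the result is first iterated, not when the function is called. With such a table, `list(STREAMERS)` could also supply the choices for the dropdown.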

In [0]:
with gr.Blocks() as ui:
    with gr.Row():
        original_code = gr.Textbox(label="Java code:", lines=10, value=test_code)
        generated_code = gr.Markdown(label="Unit tests:")
    with gr.Row():
        model = gr.Dropdown(["GPT", "Claude", "Gemini"], label="Select model", value="GPT")
        generate = gr.Button("Generate tests")

    generate.click(generate_tests, inputs=[original_code, model], outputs=[generated_code])

ui.launch(inbrowser=True)

In [0]:
ui.close()

## Conclusion

The models lack some information because `TestDTO` is not defined in the code provided as input.

Results:
- Gemini: Generates a well-constructed test class with multiple test cases covering scenarios with valid and invalid inputs. It makes assumptions about the content of `TestDTO` and notes them in a comment.
- Claude: Takes a similar approach to the unknown format of `TestDTO`, but adds no comment about the assumptions made. The test cases are structured differently, and they don't cover any case of invalid input, which in my opinion is an important test for a REST endpoint.
- GPT: While the other two generated *real* unit tests using the Mockito extension, GPT generated a *WebMvc* test. The other two relied on the equality implementation of `TestDTO`, while GPT checks each field in the response separately. As this type of test spins up the application context, the test won't run without additional configuration. In addition, some imports are missing from the test file.

It comes down to personal preference, but I would give the point to Gemini for this one.
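
One way to close the information gap noted above would be to pass the source of supporting classes such as `TestDTO` along with the class under test. The following is a sketch only: `user_prompt_with_context` and the `TestDTO` body shown are hypothetical (the real `TestDTO` is not part of this notebook), and whether this improves the generated tests has not been evaluated here.

```python
def user_prompt_with_context(code, supporting_sources=()):
    """Build the user prompt, appending any supporting class sources."""
    prompt = "Generate unit tests for this Java code.\n\n" + code
    if supporting_sources:
        prompt += "\n\nSupporting classes, for reference only (do not generate tests for them):\n\n"
        prompt += "\n\n".join(supporting_sources)
    return prompt

# Hypothetical DTO; the fields are made up for illustration.
example_dto = "public record TestDTO(String id, String message) {}"
print(user_prompt_with_context("public class TestController {}", [example_dto]))
```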