# Fake comment correction using Gemma 3

A few years ago, I had a dispute with an external developer who flat-out refused to write any documentation in the code. No matter how often I brought it up, he insisted that documentation was a waste of time and that “clean code should be self-explanatory.”

At some point, the discussion escalated. In a mixture of defiance and sarcasm, he said that if we ever imposed a rule requiring code comments, he would just paste lyrics from Helene Fischer songs into the comments because, in his words, “you wouldn't be able to check anyway.”

Well… now I can.

I've created the following demo, which uses a local Large Language Model to detect all kinds of fake comments and then tries to generate better comments for the affected source files.

*Please don't take the original story and this analysis too seriously ;-)*

**Serious background information:**

This is a demo of human-in-the-loop modernization scripts. You can query your codebase (here with Tree-sitter) or other artifacts for potential issues, review the analysis results, and then apply a transformation recipe that (in the best case) addresses all identified problems. While this example is quite simple, it provides a solid starting point for developing your own automated migration scripts using Large Language Models. Have fun!

## Finding class comments

### Create file list

In [1]:
import glob
import os

root_dir = "/mnt/c/dev/repos/spring-petclinic_joa_llm"

java_files = glob.glob(os.path.join(root_dir, "**/*.java"), recursive=True)
java_files[:5]

['/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/PetclinicInitializer.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/ActualSubdomain.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/ActualTechnicalAspect.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/BoundedContext.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/MixedBoundedContexts.java']

### Find all relevant comments with Tree-sitter

In [2]:
import tree_sitter_java as tsjava
from tree_sitter import Language, Parser
import pandas as pd

parser = Parser(Language(tsjava.language()))

comments = []

def walk_tree(node):
    """Yield all nodes in the subtree rooted at the given node."""
    yield node
    for child in node.children:
        yield from walk_tree(child)

def extract_class_comments_from_code(code, path):
    """Return one entry per class declaration that has a preceding block comment."""
    tree = parser.parse(code.encode('utf-8'))
    root_node = tree.root_node

    # Collect all block comment nodes (in document order)
    comment_nodes = [n for n in walk_tree(root_node) if n.type == 'block_comment']

    class_comments = []

    for node in walk_tree(root_node):
        if node.type != 'class_declaration':
            continue

        # Find the nearest preceding block comment (not necessarily adjacent to the class)
        preceding_comments = [c for c in comment_nodes if c.end_byte < node.start_byte]
        if not preceding_comments:
            continue
        comment_node = preceding_comments[-1]

        class_comments.append({
            'path': path,
            'comment': comment_node.text.decode('utf-8'),
            'type': comment_node.type,
            'start': comment_node.start_point.row,  # 0-based row of the first comment line
            'end': comment_node.end_point.row,      # 0-based row of the last comment line
        })

    return class_comments

# Now loop over the list of files
for path in java_files:
    if not os.path.isfile(path):
        continue
    with open(path, 'r', encoding='utf-8') as f:
        code = f.read()
    comments.extend(extract_class_comments_from_code(code, path))

# Show first 5 entries
pd.DataFrame(comments).head()

| | path | comment | type | start | end |
|---|---|---|---|---|---|
| 0 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * In Servlet 3.0+ environments, this cla...` | block_comment | 32 | 42 |
| 1 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Simple JavaBean domain object with an ...` | block_comment | 22 | 27 |
| 2 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Simple JavaBean domain object adds a n...` | block_comment | 21 | 27 |
| 3 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Simple JavaBean domain object represen...` | block_comment | 36 | 43 |
| 4 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Simple JavaBean domain object represen...` | block_comment | 22 | 26 |
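
The lookup above takes the last block comment that ends before a class declaration. If a class has no Javadoc of its own, that may be a license header or a comment that belongs to something else. The following is a minimal sketch of a plausibility check, assuming 0-based row numbers like the ones stored in `end` and assuming that the start row of the class declaration is recorded as well (the notebook does not do this). The names `comment_is_adjacent` and `class_start_row` are hypothetical.

```python
def comment_is_adjacent(source_lines, comment_end_row, class_start_row):
    """Return True if only blank lines or annotations separate comment and class."""
    between = source_lines[comment_end_row + 1:class_start_row]
    return all(
        line.strip() == "" or line.strip().startswith("@")
        for line in between
    )


# Hypothetical Java source used only for illustration
java_source = """/* License header */
package demo;

/**
 * Describes a pet.
 */
@Entity
public class Pet {
}
""".splitlines()

print(comment_is_adjacent(java_source, 0, 7))  # license header vs. class -> False
print(comment_is_adjacent(java_source, 5, 7))  # Javadoc vs. class -> True
```

Entries that fail such a check could be shown to the reviewer as "class without its own comment" instead of being scored as if the comment belonged to the class.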


## Assessing the comments

### Set up and configure local LLM

In [3]:
from openai import OpenAI

base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434") + "/v1"

# The OpenAI client requires an API key; Ollama ignores its value
client = OpenAI(base_url=base_url, api_key="ollama")


def ask_llm_for_text(prompt):

    response = client.completions.create(
      model="gemma3:27b",
      prompt=prompt
    )

    return response.choices[0].text
    
def ask_llm_for_number(prompt):
    return float(ask_llm_for_text(prompt))


def ask_llm_for_markup_content(prompt):
    content = ask_llm_for_text(prompt).strip()
    lines = content.splitlines()

    if lines and lines[0].startswith("```") and lines[-1].strip() == "```":
        # Return everything in between
        return "\n".join(lines[1:-1]).strip()

    return content
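
`ask_llm_for_number` passes the raw answer straight to `float()`, which raises a `ValueError` as soon as the model wraps the number in any extra text. A more defensive variant could look like the sketch below. The helper name `parse_indicator` is hypothetical, and the fallback of `-1.0` simply mirrors the "default to -1" convention used in the prompt further down.

```python
import re


def parse_indicator(text, default=-1.0):
    """Extract the first number from an LLM answer, or return the default."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match is None:
        return default
    value = float(match.group())
    # Accept only the documented range; everything else counts as "not evaluated"
    return value if 0.0 <= value <= 1.0 else default


print(parse_indicator("0.35"))             # 0.35
print(parse_indicator("Indicator: 0.1\n"))  # 0.1
print(parse_indicator("I cannot tell."))    # -1.0
```

Taking the first number is a simplification: an answer that mentions several numbers would need a stricter rule, or a retry.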

### Define the prompt template for the indicator

In [4]:
prompt_template_indicator = """

Tell me, using an indicator from 0.0 to 1.0, whether the comment is a usable Java comment, a bad comment, or a fake comment.

- A usable comment provides a clear and meaningful description of what the code is supposed to do. The indicator should be close to 1.0.
- A bad comment gives only a vague or incomplete explanation of the code, or contains nothing more than author or version tags. The indicator should be around 0.5 +/- 0.15.
- A fake comment contains text that has nothing to do with the code, versions or mentioned authors in the comments at all. The indicator should be close to 0.0.

Be aware that some developers might get very creative when it comes to hiding fake comments.

Return only the numeric value. Default to -1 if you are unable to evaluate the comment."""
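
For the review step, it can help to map the numeric indicator back to the three categories named in the prompt. The sketch below does that. The prompt only says "close to 1.0", "around 0.5 +/- 0.15" and "close to 0.0", so the exact cut-offs of 0.35 and 0.65 are assumptions, and `classify_indicator` is a hypothetical helper.

```python
def classify_indicator(indicator):
    """Map an indicator value to a coarse label for review."""
    if indicator < 0.0 or indicator > 1.0:
        return "not evaluated"  # e.g. the -1 fallback from the prompt
    if indicator > 0.65:
        return "usable"
    if indicator >= 0.35:
        return "bad"
    return "fake"


for value in (0.0, 0.1, 0.35, 0.9, -1):
    print(value, classify_indicator(value))
```

With these cut-offs a value of exactly 0.35 lands in the "bad" band, so the thresholds are worth tuning against a few hand-labelled comments.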

### Call the LLM

In [5]:
from tqdm import tqdm

for entry in tqdm(comments):
    prompt = "\n\n".join([prompt_template_indicator, entry['comment'], entry['path']])
    indicator = ask_llm_for_number(prompt)
    entry['indicator'] = indicator

100%|███████████████████████████████████████████████████████████████████████████████████| 57/57 [01:48<00:00,  1.90s/it]


### Show top 5 findings

In [6]:
result = pd.DataFrame(comments)
top5_fakes = result.sort_values(by='indicator').head()
top5_fakes

| | path | comment | type | start | end | indicator |
|---|---|---|---|---|---|---|
| 10 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Atemlos durch die Nacht\n * Bis ein ne...` | block_comment | 23 | 31 | 0.0 |
| 6 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Models a {@link Vet Vet's} specialty (...` | block_comment | 23 | 36 | 0.1 |
| 5 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Simple business object representing a ...` | block_comment | 40 | 48 | 0.35 |
| 27 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * Don't hate me. I'm not the designer.\n */` | block_comment | 2 | 4 | 0.35 |
| 25 | /mnt/c/dev/repos/spring-petclinic_joa_llm/src/... | `/**\n * I load nothing, but this will take lon...` | block_comment | 2 | 4 | 0.35 |
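
Sorting by indicator in ascending order would also put any `-1` ("unable to evaluate") results at the very top, ahead of the real fakes. A small sketch of a stricter selection follows, assuming a plain list of dictionaries shaped like `comments`. The function name `select_fake_candidates` and the threshold of 0.4 are hypothetical.

```python
def select_fake_candidates(entries, threshold=0.4):
    """Return evaluated entries at or below the threshold, lowest indicator first."""
    candidates = [
        entry for entry in entries
        if 0.0 <= entry["indicator"] <= threshold
    ]
    return sorted(candidates, key=lambda entry: entry["indicator"])


# Made-up sample data for illustration only
sample = [
    {"path": "A.java", "indicator": 0.9},
    {"path": "B.java", "indicator": -1.0},  # model could not evaluate this comment
    {"path": "C.java", "indicator": 0.35},
    {"path": "D.java", "indicator": 0.0},
]

for entry in select_fake_candidates(sample):
    print(entry["path"], entry["indicator"])
```

Selecting by threshold instead of a fixed top 5 also avoids rewriting acceptable comments in a codebase that contains fewer than five fakes.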


### List the fake comments for the human in the loop

In [7]:
for i, entry in top5_fakes.iterrows():
    print(f"Fake comment in file '{entry['path']}': \n\n {entry['comment']}\n\n\n")

Fake comment in file '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/repository/PetType.java': 

 /**
 * Atemlos durch die Nacht
 * Bis ein neuer Tag erwacht
 * Atemlos einfach raus
 * Deine Augen ziehen mich aus
 * 
 * @author Juergen Hoeller
 *         Can be Cat, Dog, Hamster...
 */



Fake comment in file '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/model/Specialty.java': 

 /**
 * Models a {@link Vet Vet's} specialty (for example, dentistry).
 *
 *
 * A d d N
 * t u i a
 * e r e c
 * m c   h
 * l h   t
 * o
 * s
 *
 * @author Juergen Hoeller
 */



Fake comment in file '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/model/Pet.java': 

 /**
 * Simple business object representing a pet.
 * 
 * Verlieben, verloren, vergessen, verzeih’n!
 *
 * @author Ken Krebs
 * @author Juergen Hoeller
 * @author Sam Brannen
 */



Fake comment in file '/mnt

## Correcting comments

### Define the prompt template for the correction

In [8]:
prompt_template_correction = """

Below is a Java source file and its file name.
Ignore any existing comments on the Java type.
Don't just repeat and explain the existing code in the class.
Instead, write a high-quality, first-class JavaDoc comment that accurately describes the file's contents.
Return only the JavaDoc comment compliant part as plain text."""

### Fix comments

In [9]:
for i, entry in top5_fakes.head().iterrows():
    path = entry['path']
    start_idx = entry['start']
    end_idx = entry['end'] + 1

    # Read the original file
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
        code = "".join(lines)

    # Build the prompt
    correction_prompt = "\n\n".join([
        prompt_template_correction.strip(),
        code,
        f"File name: {os.path.basename(path)}"
    ])

    # Ask LLM for corrected JavaDoc
    corrected_comment = ask_llm_for_markup_content(correction_prompt)

    # Add line breaks if not already there
    replacement_text = corrected_comment.strip().splitlines()

    # Replace lines in the file
    lines[start_idx:end_idx] = [line + '\n' for line in replacement_text]

    # Write the result back to the same file
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
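
The loop above writes straight to disk and leaves the review to `git diff`. To keep the human in the loop before anything is written, the replacement can first be applied in memory and shown as a unified diff. This is a minimal sketch using `difflib` from the standard library. The function `preview_replacement` and the sample lines are hypothetical, and the slice semantics match `lines[start_idx:end_idx]` from the cell above.

```python
import difflib


def preview_replacement(lines, start_idx, end_idx, replacement_text, path="file"):
    """Return (new_lines, unified diff text) without touching the file system."""
    new_lines = list(lines)
    new_lines[start_idx:end_idx] = [line + "\n" for line in replacement_text]
    diff = difflib.unified_diff(
        lines, new_lines, fromfile=f"a/{path}", tofile=f"b/{path}"
    )
    return new_lines, "".join(diff)


# Made-up file content for illustration only
original = [
    "package demo;\n",
    "/**\n",
    " * Atemlos durch die Nacht\n",
    " */\n",
    "class PetType {}\n",
]
replacement = ["/**", " * Describes the type of a pet.", " */"]

new_lines, diff_text = preview_replacement(original, 1, 4, replacement, "PetType.java")
print(diff_text)
```

If several comments in the same file are replaced, the stored `start` and `end` rows shift after the first edit, so processing the entries of one file from bottom to top keeps the remaining row numbers valid.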

## Checking the changes

In [10]:
%%bash
cd /mnt/c/dev/repos/spring-petclinic_joa_llm
git diff

diff --git a/src/main/java/org/springframework/samples/petclinic/model/Pet.java b/src/main/java/org/springframework/samples/petclinic/model/Pet.java
index fa60ff7..294e627 100644
--- a/src/main/java/org/springframework/samples/petclinic/model/Pet.java
+++ b/src/main/java/org/springframework/samples/petclinic/model/Pet.java
@@ -39,9 +39,11 @@ import org.springframework.samples.petclinic.architecture.BoundedContext;
 import org.springframework.samples.petclinic.repository.PetType;
 
 /**
- * Simple business object representing a pet.
- * 
- * Verlieben, verloren, vergessen, verzeih’n!
+ * Represents a pet within the veterinary clinic system.
+ *
+ * This class encapsulates information about a pet, including its birth date, type, owner, and a record of its visits.
+ * It is a core entity within the Pet Management bounded context.  The class is persisted as a database entity,
+ * and provides methods to manage associated visits.  Visits are sorted by date when retrieved.
  *
  * @author Ke