# Fake comment detector using Gemma 3

A few years ago, I had a dispute with an external developer who flat-out refused to write any documentation in the code. No matter how often I brought it up, he insisted that documentation was a waste of time and that “clean code should be self-explanatory.”

At some point, the discussion escalated. In a mixture of defiance and sarcasm, he said that if we ever imposed a rule requiring code comments, he would just paste lyrics from Helene Fischer songs into the comments – because, in his words, “you wouldn't be able to check anyway.”

Well… now I can.

I've created the following analysis, which uses a local large language model to detect all kinds of fake comments.

*Please don't take the original story or this analysis too seriously ;-)*

## Create file list

In [1]:
import glob
import os

root_dir = "/mnt/c/dev/repos/spring-petclinic_joa_llm"

java_files = glob.glob(os.path.join(root_dir, "**/*.java"), recursive=True)
java_files[:5]

['/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/PetclinicInitializer.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/ActualSubdomain.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/ActualTechnicalAspect.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/BoundedContext.java',
 '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/architecture/MixedBoundedContexts.java']

## Find all comments with Tree-sitter

In [3]:
import tree_sitter_java as tsjava
from tree_sitter import Language, Parser

comments = []

parser = Parser(Language(tsjava.language()))

def is_comment(node):
    # tree-sitter-java has two comment node types: block_comment ("/* ... */")
    # and line_comment ("// ..."); only block comments are analyzed here
    return node.type == "block_comment"

def is_license_header(node):
    # treat a comment that starts on the first line of a file as the license header
    return node.start_point.row == 0
    
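def walk(node):
    # collects every non-license block comment below `node`;
    # `path` is the module-level variable set by the file loop below
    if is_comment(node) and not is_license_header(node):
        # Tree-sitter offsets count bytes, not characters, so take the node's
        # own bytes instead of slicing the decoded string (the two drift apart
        # as soon as the file contains a non-ASCII character)
        text = node.text.decode("utf8")

        comment_entry = {}
        comment_entry['path'] = path
        comment_entry['comment'] = text
        comment_entry['type'] = node.type
        comments.append(comment_entry)

    for child in node.children:
        walk(child)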


for path in java_files:
    with open(path, "r", encoding="utf-8") as f:
        source_code = f.read()
    tree = parser.parse(bytes(source_code, "utf8"))
    root_node = tree.root_node   

    walk(root_node)

comments[:5]

[{'path': '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/PetclinicInitializer.java',
  'comment': '/**\n * In Servlet 3.0+ environments, this class replaces the traditional {@code web.xml}-based approach in order to configure the\n * {@link ServletContext} programmatically.\n * <p/>\n * Create the Spring "<strong>root</strong>" application context.<br/>\n * Register a {@link DispatcherServlet} and a {@link DandelionServlet} in the servlet context.<br/>\n * For both servlets, register a {@link CharacterEncodingFilter}, a {@link DandelionFilter} an a {@link DatatablesFilter}.\n * <p/>\n *\n * @author Antoine Rey\n */',
  'type': 'block_comment'},
 {'path': '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/PetclinicInitializer.java',
  'comment': '/**\n     * Spring profile used to choose the persistence layer implementation.\n     * <p/>\n     * When using Spring jpa, use: jpa\n     * When using 
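
Tree-sitter reports node positions as byte offsets into the UTF-8 encoded source. The following is a minimal sketch, independent of the notebook and using a made-up one-line Java snippet, of why it matters whether those offsets are applied to the bytes or to the decoded string once a comment contains a non-ASCII character.

```python
# Hypothetical Java snippet; the curly apostrophe occupies 3 bytes in UTF-8.
source_code = "/** verzeih’n */ class Pet {}"
source_bytes = source_code.encode("utf8")

# Byte offsets of the comment, as a byte-oriented parser would report them.
start_byte = source_bytes.index(b"/**")
end_byte = source_bytes.index(b"*/") + len(b"*/")

# Slicing the decoded string with byte offsets overshoots by two characters.
wrong = source_code[start_byte:end_byte]
# Slicing the bytes first and decoding afterwards yields exactly the comment.
right = source_bytes[start_byte:end_byte].decode("utf8")

print(repr(wrong))
print(repr(right))
```

With only ASCII content both slices are identical, which is why this kind of drift tends to go unnoticed until a comment contains umlauts or typographic quotes.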

## Set up and configure local LLM

In [4]:
from openai import OpenAI

base_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434") + "/v1"

# Ollama ignores the API key, but the OpenAI client requires one to be set
client = OpenAI(base_url=base_url, api_key="ollama")

def ask_llm(prompt):
    response = client.completions.create(
        model="gemma3:27b",
        prompt=prompt
    )

    return float(response.choices[0].text)
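
Calling `float()` directly on the reply works as long as the model returns nothing but a number. As a hedged sketch of a more forgiving alternative, the hypothetical helper `parse_indicator` below (not part of the original notebook) extracts the first number from the reply and falls back to the -1 sentinel that the prompt further down asks for. The accepted range of 0.0 to 1.0 is taken from that prompt; the regular expression is an assumption about what replies may look like.

```python
import re

# Matches the first integer or decimal number in a string, e.g. "0.3" or "-1".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def parse_indicator(reply, default=-1.0):
    """Return the first number found in `reply`, or `default` if unusable."""
    match = NUMBER.search(reply)
    if match is None:
        return default
    value = float(match.group())
    # Accept only the range the prompt asks for, plus the -1 sentinel.
    if value == -1.0 or 0.0 <= value <= 1.0:
        return value
    return default

print(parse_indicator(" 0.3\n"))
print(parse_indicator("Indicator: 0.85"))
print(parse_indicator("I cannot evaluate this comment."))
```

A helper like this could wrap `response.choices[0].text`, so that a single chatty reply does not abort a long run with a `ValueError`.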

## Define the prompt

In [5]:
prompt = """

Tell me, using an indicator from 0.0 to 1.0, whether the comment is a usable Java comment, a bad comment, or a fake comment.

- A usable comment provides a clear and meaningful description of what the code is supposed to do. The indicator should be close to 1.0.
- A bad comment gives only a vague or incomplete explanation of the code, or contains nothing more than author tags (e.g., @author). The indicator should be around 0.5 +/- 0.2.
- A fake comment contains text that has nothing to do with the code at all. The indicator should be close to 0.0.

Be aware that some developers might get very creative when it comes to hiding fake comments.

Return only the numeric value. Default to -1 if you are unable to evaluate the comment."""
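
The three bands described in the prompt can be turned back into labels once the scores are in. This is a small sketch with a hypothetical helper `label_for`; the "bad" band of 0.3 to 0.7 follows from the prompt's "0.5 +/- 0.2", while treating everything below 0.3 as fake and everything above 0.7 as usable is an assumption about what "close to" means.

```python
def label_for(indicator):
    """Map an indicator to one of the prompt's categories (cutoffs are assumed)."""
    if not 0.0 <= indicator <= 1.0:
        return "not evaluated"  # covers the -1 sentinel
    if indicator < 0.3:
        return "fake"
    if indicator <= 0.7:
        return "bad"
    return "usable"

for value in (0.1, 0.5, 0.9, -1):
    print(value, label_for(value))
```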

## Call the LLM

In [6]:
from tqdm import tqdm

for entry in tqdm(comments):
    indicator = ask_llm("\n\n".join([prompt, entry['comment'], entry['path']]))
    entry['indicator'] = indicator

100%|█████████████████████████████████████████████████████████████████████████████████| 105/105 [00:47<00:00,  2.19it/s]


## Show top 5 findings

In [7]:
import pandas as pd

result = pd.DataFrame(comments)
top5_fakes = result.sort_values(by='indicator').head()
top5_fakes

| | path | comment | type | indicator |
|---|---|---|---|---|
| 36 | `/mnt/c/dev/repos/spring-petclinic_joa_llm/src/...` | `/**\n * Atemlos durch die Nacht\n * Bis ein ne...` | block_comment | 0.1 |
| 14 | `/mnt/c/dev/repos/spring-petclinic_joa_llm/src/...` | `/**\n * Models a {@link Vet Vet's} specialty (...` | block_comment | 0.2 |
| 13 | `/mnt/c/dev/repos/spring-petclinic_joa_llm/src/...` | `/**\n * Simple business object representing a ...` | block_comment | 0.3 |
| 67 | `/mnt/c/dev/repos/spring-petclinic_joa_llm/src/...` | `/**\n * I load nothing, but this will take lon...` | block_comment | 0.3 |
| 70 | `/mnt/c/dev/repos/spring-petclinic_joa_llm/src/...` | `/**\n * Don't hate me. I'm not the designer.\n */` | block_comment | 0.3 |
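
Because the prompt tells the model to answer -1 when it cannot evaluate a comment, an ascending sort would place such entries ahead of the real fakes. The sketch below shows one way to skip them before ranking. It uses plain Python and made-up entries in the same shape as the `comments` list; `lowest_rated` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical scored entries, shaped like the notebook's `comments` list.
scored = [
    {"path": "A.java", "indicator": 0.9},
    {"path": "B.java", "indicator": -1.0},  # model could not evaluate this one
    {"path": "C.java", "indicator": 0.1},
    {"path": "D.java", "indicator": 0.3},
]

def lowest_rated(entries, n=5):
    """Return the n entries with the lowest indicator, skipping the -1 sentinel."""
    evaluated = [e for e in entries if e["indicator"] >= 0.0]
    return sorted(evaluated, key=lambda e: e["indicator"])[:n]

print([e["path"] for e in lowest_rated(scored, n=2)])
```

With the DataFrame from this cell, the same idea would presumably be a boolean filter such as `result[result['indicator'] >= 0]` before the `sort_values` call.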


## List the fake comments

In [8]:
for i, entry in top5_fakes.iterrows():
    print(f"Fake comment in file '{entry['path']}': \n\n {entry['comment']}\n\n\n")

Fake comment in file '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/repository/PetType.java': 

 /**
 * Atemlos durch die Nacht
 * Bis ein neuer Tag erwacht
 * Atemlos einfach raus
 * Deine Augen ziehen mich aus
 * 
 * @author Juergen Hoeller
 *         Can be Cat, Dog, Hamster...
 */



Fake comment in file '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/model/Specialty.java': 

 /**
 * Models a {@link Vet Vet's} specialty (for example, dentistry).
 *
 *
 * A d d N
 * t u i a
 * e r e c
 * m c   h
 * l h   t
 * o
 * s
 *
 * @author Juergen Hoeller
 */



Fake comment in file '/mnt/c/dev/repos/spring-petclinic_joa_llm/src/main/java/org/springframework/samples/petclinic/model/Pet.java': 

 /**
 * Simple business object representing a pet.
 * 
 * Verlieben, verloren, vergessen, verzeih’n!
 *
 * @author Ken Krebs
 * @author Juergen Hoeller
 * @author Sam Brannen
 */
@



Fake comment in file '/m