In [None]:
# Automated Code Smell Detection Using Large Language Models

## Problem Statement

Code smells - patterns in source code that indicate potential design problems - are critical indicators of software quality issues. Traditionally, identifying code smells has relied on:

1. Manual code reviews (time-consuming and subjective)
2. Static analysis tools (limited to predefined patterns)
3. Software metrics (often producing false positives)

These approaches either require significant human effort or lack the contextual understanding needed to identify subtle design issues.

## How Generative AI Solves This Problem

This notebook demonstrates how Large Language Models (LLMs) can transform code smell detection by:

- **Contextual understanding**: Analyzing code with deep semantic understanding rather than just pattern matching
- **Knowledge integration**: Leveraging structured knowledge about different code smell types and refactoring solutions
- **Natural language reasoning**: Providing detailed explanations and actionable recommendations in human-readable format

## What This Notebook Demonstrates

This end-to-end implementation shows how to:

1. Build a knowledge base of code smells using vector embeddings and DeepLake
2. Create a retrieval-augmented generation (RAG) system to provide context about code smells
3. Analyze Java code files to detect multiple types of code smells
4. Generate structured output with smell locations, severity, and refactoring suggestions
5. Evaluate detection quality against manually labeled data

The approach provides a practical showcase of how generative AI can be applied to enhance software development practices through automated code quality assessment.

**Now, let's begin.**


*Install all needed dependencies:*

In [30]:
%pip install -q -U "langchain-google-genai" "deeplake" "langchain" "langchain-text-splitters" "langchain-community" "tiktoken" "google-ai-generativelanguage==0.6.15" "deeplake[enterprise]<4.0.0"
%pip install pillow lz4 python-dotenv sonar-tools mysql-connector-python pandas


Now initialize the Deep Lake vector store and load structured information about code smells into it.

We will use three repositories as data sources:

* pixel-dungeon - an example of software that contains numerous code smells.
* smells - a comprehensive classification of code smells. We will use the structured information about code smells from this repository in our DeepLake database.
* arthas - a Java project, checked out at a pinned commit, that is scanned with SonarQube later in the notebook.

In [31]:
!git clone https://github.com/watabou/pixel-dungeon.git
!git clone https://github.com/Luzkan/smells.git
!git clone https://github.com/alibaba/arthas.git
!ls
!cd ./arthas && git checkout 4fc682265ce9e8db0101f978d32b142af6751493


fatal: destination path 'pixel-dungeon' already exists and is not an empty directory.
fatal: destination path 'smells' already exists and is not an empty directory.
fatal: destination path 'arthas' already exists and is not an empty directory.
arthas             import pytest.json smells
experiment1.ipynb  pixel-dungeon
HEAD is now at 4fc68226 classloader command support jdk.internal.loader.ClassLoaders$AppClassLoade. #2350
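
The `fatal: destination path ... already exists` messages above appear because the clone commands were re-run on an existing checkout. A minimal sketch of an idempotent alternative, assuming `git` is available on the PATH. The helper name `clone_if_missing` and its injectable `run` parameter are hypothetical and introduced only for illustration:

```python
import subprocess
from pathlib import Path


def clone_if_missing(url, dest, commit=None, run=subprocess.run):
    """Clone `url` into `dest` unless it already exists; optionally check out `commit`.

    Returns True if a clone was performed, False if `dest` was already present.
    `run` is injectable so the function can be exercised without network access.
    """
    dest = Path(dest)
    cloned = False
    if not dest.exists():
        run(["git", "clone", url, str(dest)], check=True)
        cloned = True
    if commit is not None:
        # `git -C <dir>` runs the command inside the repository directory.
        run(["git", "-C", str(dest), "checkout", commit], check=True)
    return cloned
```

For example, `clone_if_missing("https://github.com/alibaba/arthas.git", "arthas", commit="4fc682265ce9e8db0101f978d32b142af6751493")` would skip the clone on a second run and only repeat the checkout.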


Import the libraries used to load, index, and query the files with structured info about code smells:

In [32]:
from glob import glob
from IPython.display import Markdown, display

from langchain_community.vectorstores import DeepLake
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import (
    Language,
    RecursiveCharacterTextSplitter,
)
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.chains import RetrievalQA
import deeplake

Set up your API key

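To run the following cell, your API key must be stored in a Kaggle secret named GOOGLE_API_KEY. (As written, the cell below reads the key from the GOOGLE_API_KEY environment variable, after loading a local .env file via python-dotenv if one exists.)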

If you don't already have an API key, you can grab one from AI Studio. You can find detailed instructions in the docs.

To make the key available through Kaggle secrets, choose Secrets from the Add-ons menu and follow the instructions to add your key or enable it for this notebook.

In [33]:
import os
from dotenv import load_dotenv
load_dotenv()

api_key = os.environ["GOOGLE_API_KEY"]
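
Indexing `os.environ` directly raises a bare `KeyError` when the key is missing. A small sketch of a friendlier check; the helper name `require_env` is hypothetical:

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise with a hint."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or environment "
            "before running the notebook."
        )
    return value
```

With this helper, the assignment above could be written as `api_key = require_env("GOOGLE_API_KEY")`.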

In [34]:
#smells_match = "smells/content/smells/**/*.md"
smell_files = [
    "smells/content/smells/duplicated-code.md",
    "smells/content/smells/long-method.md", 
    "smells/content/smells/large-class.md",
    "smells/content/smells/long-parameter-list.md"
]

all_smells = []
for file_pattern in smell_files:
    matches = glob(file_pattern)
    all_smells.extend(matches)

print(f"Smells implemented in Sonarqube are represented by {len(all_smells)} markdown files:")
for file in all_smells:
    print(f"  {file}")

Smells implemented in Sonarqube are represented by 4 markdown files:
  smells/content/smells/duplicated-code.md
  smells/content/smells/long-method.md
  smells/content/smells/large-class.md
  smells/content/smells/long-parameter-list.md
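
The commented-out pattern `smells/content/smells/**/*.md` would only descend into nested directories if `glob` is called with `recursive=True`. A minimal sketch on a temporary directory; the helper name `find_markdown` and the file names are made up for illustration:

```python
import tempfile
from glob import glob
from pathlib import Path


def find_markdown(root):
    """Return all .md files under `root`, at any depth, in sorted order."""
    # Without recursive=True, "**" behaves like a single "*" path segment.
    return sorted(glob(str(Path(root) / "**" / "*.md"), recursive=True))


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "long-method.md").write_text("# Long Method\n")
    (Path(tmp) / "nested" / "large-class.md").write_text("# Large Class\n")
    (Path(tmp) / "notes.txt").write_text("ignored\n")
    found = [Path(p).name for p in find_markdown(tmp)]
    print(found)
```

Both Markdown files are found, including the nested one, while `notes.txt` is skipped.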


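Each file with a matching path will be loaded and split into chunks. Only Markdown files with structured content will be processed. For reference, the next cell lists the separators that RecursiveCharacterTextSplitter uses for Markdown; the actual splitting further below is done by header with MarkdownHeaderTextSplitter.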

In [35]:
# common separators used for Markdown files
RecursiveCharacterTextSplitter.get_separators_for_language(Language.MARKDOWN)

['\n#{1,6} ',
 '```\n',
 '\n\\*\\*\\*+\n',
 '\n---+\n',
 '\n___+\n',
 '\n\n',
 '\n',
 ' ',
 '']
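
The list above is ordered from the coarsest separator to the finest. The idea behind recursive splitting can be sketched as follows. This is a simplified illustration, not LangChain's implementation: it uses plain string separators instead of regular expressions and drops the separators from the output.

```python
def recursive_split(text, separators, max_len):
    """Split `text` into pieces of at most `max_len` characters.

    Tries each separator in order (coarse to fine); a piece that is still too
    long is split again with the remaining, finer separators.
    """
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-size slices.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    if sep == "" or sep not in text:
        return recursive_split(text, rest, max_len)
    pieces = []
    for part in text.split(sep):
        pieces.extend(recursive_split(part, rest, max_len))
    return pieces


sample = "Intro paragraph.\n\nSecond paragraph that is somewhat longer.\n\nEnd."
chunks = recursive_split(sample, ["\n\n", "\n", " ", ""], max_len=20)
```

Here the short paragraphs survive intact and only the long middle paragraph is broken down further, on spaces. The real splitter also merges small neighbouring pieces back together up to the chunk size and supports overlap, which this sketch omits.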

In [36]:
%pip install python-frontmatter
%pip install -qU pypdf


In [37]:
from langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain.schema import Document
import os

# Define headers that match your code smell documentation structure
headers_to_split_on = [
    ("#", "Title"),
    ("##", "Section"),
    ("###", "Subsection")
]

docs = []
for file in all_smells:
    try:
        # First load the file as text
        with open(file, 'r', encoding='utf-8') as f:
            content = f.read()
        
        # Extract the main content (everything after the frontmatter)
        if '---' in content:
            # Find the second occurrence of '---' which ends the frontmatter
            parts = content.split('---', 2)
            if len(parts) >= 3:
                markdown_content = '---' + parts[2]  # Keep the separator for proper markdown parsing
            else:
                markdown_content = content
        else:
            markdown_content = content
            
        # Split by markdown headers
        markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
        header_splits = markdown_splitter.split_text(markdown_content)
        
        # Add source file info to metadata
        for doc in header_splits:
            doc.metadata['source'] = file
        
        # Add to collection
        docs.extend(header_splits)
        
    except Exception as e:
        print(f"Error processing {file}: {e}")

print(f"Created {len(docs)} chunks from {len(all_smells)} files")

Created 19 chunks from 4 files
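
The front matter handling above splits on any `---` in the file, so a horizontal rule in the body of a file without front matter would also be treated as a delimiter. A minimal sketch of a stricter variant that only recognises front matter at the very top of the file. The helper name `split_front_matter` is hypothetical and the sample document is made up for illustration:

```python
def split_front_matter(text):
    """Return (front_matter, body) for a Markdown document.

    Front matter is recognised only when the very first line is '---' and a
    later line closes it with '---'. Otherwise the whole text is the body.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # opening marker without a closing one


doc = "---\nslug: long-method\n---\n## Long Method\n\nText.\n\n---\n\nMore text.\n"
front, body = split_front_matter(doc)
```

The horizontal rule inside the body is left untouched, and a document that does not start with `---` is returned unchanged as the body.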


In [38]:
# define path to database
dataset_path = 'mem://deeplake/smells'

In [39]:
# define the embedding model
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

In [40]:
smell_db = DeepLake.from_documents(docs, embeddings, dataset_path=dataset_path)

Creating 19 embeddings in 1 batches of size 19:: 100%|██████████| 1/1 [00:00<00:00,  1.59it/s]

Dataset(path='mem://deeplake/smells', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (19, 1)     str     None   
 metadata     json      (19, 1)     str     None   
 embedding  embedding  (19, 768)  float32   None   
    id        text      (19, 1)     str     None   





In [41]:
retriever = smell_db.as_retriever()
retriever.search_kwargs['distance_metric'] = 'cos'
retriever.search_kwargs['k'] = 20 # number of documents to return
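
The retriever ranks stored chunks by cosine similarity to the query embedding and returns the top `k`. Since `k` is 20 and the store holds 19 chunks, every chunk is returned here. A minimal sketch of the ranking step, using toy 3-dimensional vectors in place of the 768-dimensional embeddings; the function names are hypothetical:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the indices of the k vectors most similar to `query`, best first."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine_similarity(query, vectors[i]),
                    reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings", made up for illustration.
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
nearest = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

Asking for more results than there are vectors, as the notebook does, simply returns all of them in ranked order.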

In [42]:
# define the chat model
llm = ChatGoogleGenerativeAI(model = "gemini-2.0-flash")

In [43]:
qa = RetrievalQA.from_llm(llm, retriever=retriever)

In [44]:
# a helper function for calling the retrieval chain
def call_qa_chain(prompt):
  response = qa.invoke(prompt)
  display(Markdown(response["result"]))

In [45]:
call_qa_chain("List a recommendations on how to get rid of 'God class' smell, rating from more simple to more sophisticated strategies.")

Here are some refactoring strategies to address the 'Large Class' code smell, ranging from simpler to more complex:

1.  **Extract Class**:
    *   This involves identifying related responsibilities within the large class and moving them into a new, separate class. This reduces the original class's scope and improves cohesion.
2.  **Extract Subclass**:
    *   If the large class has behaviors that vary based on certain states or types, creating subclasses can help organize and isolate these variations.
3.  **Extract Interface**:
    *   When a class implements multiple interfaces, extracting interfaces can help to decouple the class from its dependencies and make it more flexible.
4.  **Extract Domain Object**:
    *   If the class is dealing with complex data structures or business logic, extracting domain objects can encapsulate this logic and improve the overall design.
5.  **Replace Data Value with object**:
    *   If a class has a field that is only used by a subset of methods, it may be better to create a new class to hold the field and those methods.

In [46]:
import os
import enum
from glob import glob
from pathlib import Path

# Get all markdown files in the content/smells directory
smell_files = all_smells

# Create enum class dynamically
def create_code_smell_enum():
    # Process each filename to create enum-friendly names
    enum_entries = {}
    
    for file_path in smell_files:
        # Extract filename without extension
        filename = Path(file_path).stem
        
        # Convert kebab-case to UPPER_SNAKE_CASE for enum names
        enum_name = filename.replace('-', '_').upper()
        
        # Use the original filename (without extension) as the value
        enum_entries[enum_name] = filename
    
    # Create and return the Enum class
    return enum.Enum('CodeSmell', enum_entries)

# Create the enum
CodeSmell = create_code_smell_enum()

# Display the enum members
print(f"Created enum with {len(CodeSmell)} code smells:")
for smell in CodeSmell:
    print(f"{smell.name} = '{smell.value}'")

# Example usage
print("\nExample usage:")
print(f"CodeSmell.LONG_METHOD = '{CodeSmell.LONG_METHOD.value}'")
print(f"CodeSmell.LARGE_CLASS = '{CodeSmell.LARGE_CLASS.value}'")

# You can also look up an enum by value
def get_smell_by_name(name):
    for smell in CodeSmell:
        if smell.value == name:
            return smell
    return None

print("\nLooking up by name:")
print(f"get_smell_by_name('long-method') = {get_smell_by_name('long-method')}")

Created enum with 4 code smells:
DUPLICATED_CODE = 'duplicated-code'
LONG_METHOD = 'long-method'
LARGE_CLASS = 'large-class'
LONG_PARAMETER_LIST = 'long-parameter-list'

Example usage:
CodeSmell.LONG_METHOD = 'long-method'
CodeSmell.LARGE_CLASS = 'large-class'

Looking up by name:
get_smell_by_name('long-method') = CodeSmell.LONG_METHOD
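
Python enums already support lookup by value: calling the enum class with a value returns the matching member or raises `ValueError`. A minimal sketch using a stand-in enum named `Smell` (hypothetical, with two of the four members) built with the same functional API:

```python
import enum

# Stand-in enum built the same way as above, with two of the four members.
Smell = enum.Enum("Smell", {"LONG_METHOD": "long-method", "LARGE_CLASS": "large-class"})


def smell_from_value(value):
    """Return the member whose value equals `value`, or None if there is none."""
    try:
        return Smell(value)  # calling the enum class looks a member up by value
    except ValueError:
        return None
```

Lookup by member name uses indexing instead, for example `Smell["LARGE_CLASS"]`.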


In [47]:
%pip install langchain_community langchain-text-splitters pypdf


In [48]:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field
from typing import List, Optional, Literal
import enum

# Define your structured output schema using Pydantic
class CodeSmellSeverity(str, enum.Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM" 
    LOW = "LOW"

class CodeSmellDetection(BaseModel):
    smell_type: str = Field(description="The type of code smell detected")
    location: str = Field(description="Where in the code the smell was found")
    severity: CodeSmellSeverity = Field(description="How severe the smell is")
    description: str = Field(description="Brief explanation of the issue")
    refactoring_suggestion: str = Field(description="How to fix the code smell")
    code_example: Optional[str] = Field(None, description="Example code showing the fix")

class CodeAnalysisResult(BaseModel):
    analysis_summary: str = Field(description="Overall summary of code quality")
    smells_detected: List[CodeSmellDetection] = Field(description="List of detected code smells")

# Create a parser for the structured output
parser = PydanticOutputParser(pydantic_object=CodeAnalysisResult)

# Create a prompt template that includes formatting instructions
code_analysis_prompt = PromptTemplate(
    template="""
You are an expert code analyst. Analyze the following code for code smells:

```java
{code}
```

{format_instructions}

Only identify code smells from this list: {valid_smells}
""",
    input_variables=["code", "valid_smells"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

In [49]:
def analyze_code_with_structure(code_content):
    # Get list of valid smells for reference
    valid_smells = ", ".join([smell.name for smell in CodeSmell])
    
    # Create a prompt that sends the code directly to the QA system
    analysis_prompt = f"""
    Analyze the following code for code smells:
    
    ```java
    {code_content}
    ```
    
    What code smells can you identify in this code and why? Only consider these code smell types: {valid_smells}.
    
    For each smell found, provide:
    1. The exact smell type (from the list provided)
    2. Location in the code (file, line numbers, method names)
    3. Severity (HIGH, MEDIUM, LOW)
    4. Description of why this is a code smell
    5. Refactoring suggestion to fix the issue
    6. Optional: Example code showing the fix
    
    Format your response as a structured JSON object matching this schema:
    {parser.get_format_instructions()}
    """
    
    # Use the QA system which leverages the smell knowledge base
    try:
        qa_result = qa.invoke(analysis_prompt)
        
        # Try to extract and parse JSON from the result
        output_text = qa_result["result"]
        parsed_output = parser.parse(output_text)
        return parsed_output
    except Exception as e:
        print(f"Error in code smell analysis: {e}")
        print(f"Raw QA output: {qa_result['result'] if 'qa_result' in locals() else 'No output'}")
        
        # Fallback to direct LLM call if QA system parsing fails
        try:
            direct_output = llm.invoke(analysis_prompt)
            parsed_output = parser.parse(direct_output.content)
            return parsed_output
        except Exception as e2:
            print(f"Fallback also failed: {e2}")
            return None
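
Chat models often wrap their JSON in a Markdown fence or surround it with prose, which is one reason parsing can fail and trigger the fallback above. A minimal, standard-library sketch of the kind of extraction and validation involved. It is not LangChain's implementation, and the helper name `extract_analysis` and the sample reply are hypothetical:

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this block readable
ALLOWED_SEVERITIES = {"HIGH", "MEDIUM", "LOW"}


def extract_analysis(raw):
    """Pull a JSON object out of an LLM reply and check the severity values."""
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else raw
    # Fall back to the outermost braces if there is surrounding prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    data = json.loads(candidate[start:end + 1])
    for smell in data.get("smells_detected", []):
        if smell.get("severity") not in ALLOWED_SEVERITIES:
            raise ValueError(f"unexpected severity: {smell.get('severity')!r}")
    return data


reply = (
    "Here is the analysis:\n" + FENCE + "json\n"
    '{"analysis_summary": "ok", "smells_detected": '
    '[{"smell_type": "LONG_METHOD", "severity": "MEDIUM"}]}\n' + FENCE
)
result = extract_analysis(reply)
```

Any reply that contains no JSON object, malformed JSON, or a severity outside the three allowed values raises `ValueError`, which a caller can treat the same way as a parser failure.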

In [50]:
def display_code_analysis(file_path):
    # Read the file content
    with open(file_path, 'r') as f:
        code_content = f.read()

    # Analyze the code
    analysis = analyze_code_with_structure(code_content)

    if not analysis:
        print("Failed to analyze code.")
        return

    # Display results
    print(f"Analysis Summary: {analysis.analysis_summary}\n")
    print(f"Detected {len(analysis.smells_detected)} code smells:")

    for i, smell in enumerate(analysis.smells_detected, 1):
        print(f"\n{i}. {smell.smell_type} ({smell.severity})")
        print(f"   Location: {smell.location}")
        print(f"   Description: {smell.description}")
        print(f"   Refactoring: {smell.refactoring_suggestion}")
        if smell.code_example:
            print(f"\n   Example fix:\n   ```\n{smell.code_example}\n   ```")

In [51]:
# Analyze a Java file from the pixel-dungeon repository
java_file = "pixel-dungeon/src/com/watabou/pixeldungeon/levels/HallsLevel.java"
display_code_analysis(java_file)

Analysis Summary: The code exhibits some code smells related to method length and potential for large class refactoring. Specifically, the `decorate` method and the nested `Stream` class's `update` method could be candidates for refactoring. There's also duplicated code in `tileName` and `tileDesc`.

Detected 3 code smells:

1. LONG_METHOD (CodeSmellSeverity.MEDIUM)
   Location: HallsLevel.java, decorate() method, lines 51-73
   Description: The `decorate` method performs multiple operations: decorating empty tiles, decorating wall tiles, and placing a sign. This violates the single responsibility principle and makes the method harder to understand and maintain.
   Refactoring: Extract each decoration logic (empty tile decoration, wall decoration, sign placement) into separate methods. This will make the `decorate` method shorter and more focused.

2. DUPLICATED_CODE (CodeSmellSeverity.LOW)
   Location: HallsLevel.java, tileName() and tileDesc() methods, lines 76-95 and 98-114
   Descr

In [52]:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any
import enum
import json
from IPython.display import Markdown, display

# 1. Define evaluation schema
class EvaluationScore(str, enum.Enum):
    EXCELLENT = "EXCELLENT"
    GOOD = "GOOD"
    ACCEPTABLE = "ACCEPTABLE"
    POOR = "POOR"
    INCORRECT = "INCORRECT"

class SmellEvaluation(BaseModel):
    detected_smell: str = Field(description="The detected code smell type")
    location: str = Field(description="Where the smell was detected")
    ground_truth_match: Optional[str] = Field(None, description="The matching ground truth smell if any")
    score: EvaluationScore = Field(description="Evaluation score for this detection")
    justification: str = Field(description="Explanation for the rating")

class CodeSmellEvaluationResult(BaseModel):
    overall_score: float = Field(description="Overall evaluation score out of 5")
    precision: float = Field(description="Ratio of correctly identified smells to all detections")
    recall: float = Field(description="Ratio of correctly identified smells to all actual smells")
    evaluations: List[SmellEvaluation] = Field(description="Individual smell evaluations")
    summary: str = Field(description="Summary of evaluation results")

# 2. Create the evaluation prompt template
eval_template = """
# Instruction
You are an expert evaluator specializing in code smell detection. Your task is to evaluate the quality of code smell detections by comparing them with ground truth data.

# Evaluation
## Metric Definition
You will be assessing code smell detection quality, which measures how accurately the system identifies:
1. The correct type of code smell
2. The correct location of the smell (file, line numbers, method)

## Criteria
Smell Type Accuracy: The detected smell type matches the actual smell type in the code.
Location Accuracy: The location identified for the smell (line numbers, method, class) matches where the smell actually exists.
Justification Quality: The explanation provided for the smell makes sense and correctly describes the issue.
Refactoring Relevance: The suggested refactoring is appropriate for the identified smell.

## Rating Rubric
EXCELLENT: Perfect match of smell type and exact location (score: 5).
GOOD: Correct smell type with minor location imprecision (score: 4).
ACCEPTABLE: Partial match (either correct smell type or approximate location) (score: 3).
POOR: Wrong smell type but area of concern correctly identified (score: 2).
INCORRECT: Completely incorrect detection (wrong smell type and location) (score: 1).

## Evaluation Steps
STEP 1: For each detected smell, find any matching ground truth smells.
STEP 2: Evaluate the accuracy of the detected smell type.
STEP 3: Evaluate the precision of the location identified.
STEP 4: Assign a score based on the rating rubric.
STEP 5: Calculate overall precision and recall metrics.

# Input Data
## Ground Truth Smells
{ground_truth}

## Detected Smells
{detected_smells}

{format_instructions}
"""

# 3. Create the evaluation parser and prompt
eval_parser = PydanticOutputParser(pydantic_object=CodeSmellEvaluationResult)
eval_prompt = PromptTemplate(
    template=eval_template,
    input_variables=["ground_truth", "detected_smells"],
    partial_variables={"format_instructions": eval_parser.get_format_instructions()}
)

# 4. Create the evaluation function
def evaluate_smell_detection(ground_truth_data, detected_smells, llm):
    """
    Evaluate the quality of code smell detection by comparing with ground truth.
    
    Args:
        ground_truth_data: Dictionary with ground truth smells
        detected_smells: Dictionary with detected smells
        llm: LangChain LLM instance
    
    Returns:
        Evaluation results with scores and metrics
    """
    # Convert data to JSON strings
    ground_truth_str = json.dumps(ground_truth_data, indent=2)
    detected_str = json.dumps(detected_smells, indent=2)
    
    # Format the prompt
    formatted_prompt = eval_prompt.format(
        ground_truth=ground_truth_str,
        detected_smells=detected_str
    )
    
    # Get response from LLM
    try:
        output = llm.invoke(formatted_prompt)
        parsed_output = eval_parser.parse(output.content)
        return parsed_output
    except Exception as e:
        print(f"Error during evaluation: {e}")
        print(f"Raw output: {output.content if 'output' in locals() else 'No output'}")
        return None

# 5. Helper function to create ground truth dataset
def create_ground_truth(file_path, manual_annotations):
    """Create ground truth data structure"""
    return {
        "file_path": file_path,
        "smells": manual_annotations
    }

# 6. Main evaluation function
def evaluate_code_analysis(file_path, manual_annotations, llm):
    """Run a full evaluation on a code file"""
    # Get the code content
    with open(file_path, 'r') as f:
        code_content = f.read()
    
    # Run code smell detection
    analysis_result = analyze_code_with_structure(code_content)
    
    if not analysis_result:
        print("Failed to analyze code")
        return None
    
    # Format detected smells
    detected_smells = {
        "file_path": file_path,
        "smells": []
    }
    
    for smell in analysis_result.smells_detected:
        detected_smells["smells"].append({
            "smell_type": smell.smell_type,
            "location": smell.location,
            "description": smell.description,
            "severity": str(smell.severity),
            "refactoring": smell.refactoring_suggestion
        })
    
    # Create ground truth
    ground_truth = create_ground_truth(file_path, manual_annotations)
    
    # Run evaluation
    evaluation = evaluate_smell_detection(ground_truth, detected_smells, llm)
    
    # Display results
    if evaluation:
        print(f"Overall Score: {evaluation.overall_score:.2f}/5.0")
        print(f"Precision: {evaluation.precision:.2f}")
        print(f"Recall: {evaluation.recall:.2f}")
        print(f"\nSummary: {evaluation.summary}\n")
        
        print("Individual Evaluations:")
        for i, eval_item in enumerate(evaluation.evaluations, 1):
            print(f"\n{i}. {eval_item.detected_smell} ({eval_item.location})")
            print(f"   Score: {eval_item.score.value}")
            print(f"   Matched with: {eval_item.ground_truth_match if eval_item.ground_truth_match else 'No match'}")
            print(f"   Justification: {eval_item.justification}")
    
    return evaluation
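
The rating rubric in the prompt assigns a numeric score to each label, but the overall score itself is left to the LLM. A minimal sketch of a deterministic alternative that averages the per-detection rubric scores in Python. Averaging is an assumption about how an overall score could be defined, not a description of what the model does, and the names `RUBRIC_SCORES` and `mean_rubric_score` are hypothetical:

```python
RUBRIC_SCORES = {
    "EXCELLENT": 5,
    "GOOD": 4,
    "ACCEPTABLE": 3,
    "POOR": 2,
    "INCORRECT": 1,
}


def mean_rubric_score(labels):
    """Average the rubric scores of the given labels; 0.0 for an empty list."""
    if not labels:
        return 0.0
    return sum(RUBRIC_SCORES[label] for label in labels) / len(labels)
```

For instance, `mean_rubric_score(["ACCEPTABLE", "INCORRECT"])` evaluates to 2.0.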

In [53]:
# Example ground truth data for a file - here we can utilize data from static analyzers such as Sonarqube etc.
evaluation_files = {
    "pixel-dungeon/src/com/watabou/pixeldungeon/levels/HallsLevel.java": [
        {
            "smell_type": "MAGIC_NUMBER",
            "location": "multiple locations: lines 25-26, 37, 52-53, 85, 89",
            "severity": "MEDIUM"
        },
        {
            "smell_type": "LONG_METHOD",
            "location": "decorate() method, lines 74-101",
            "severity": "MEDIUM"
        },
        {
            "smell_type": "CONDITIONAL_COMPLEXITY",
            "location": "decorate() method, lines 76-89",
            "severity": "LOW"
        },
        {
            "smell_type": "DUPLICATED_CODE",
            "location": "tileName() and tileDesc() methods, lines 104-138",
            "severity": "LOW"
        },
        {
            "smell_type": "DEAD_CODE",
            "location": "map[i] == 63 condition in addVisuals method, line 151",
            "severity": "MEDIUM"
        },
        {
            "smell_type": "FEATURE_ENVY",
            "location": "addVisuals method, lines 149-153",
            "severity": "LOW"
        },
        {
            "smell_type": "PRIMITIVE_OBSESSION",
            "location": "throughout class, using boolean arrays and ints for terrain",
            "severity": "LOW"
        }
    ]
}

# Run evaluation on a single file
java_file = "pixel-dungeon/src/com/watabou/pixeldungeon/levels/HallsLevel.java"
evaluation = evaluate_code_analysis(java_file, evaluation_files[java_file], llm)

Overall Score: 3.00/5.0
Precision: 0.50
Recall: 0.14

Summary: The tool correctly identified one LONG_METHOD smell, but its location was slightly off. The second detection was a false positive (LARGE_CLASS). This results in a low recall and moderate precision.

Individual Evaluations:

1. LONG_METHOD (HallsLevel.java, decorate method, lines 40-58)
   Score: ACCEPTABLE
   Matched with: decorate() method, lines 74-101
   Justification: The smell type is correct. The location is imprecise, but the method is correctly identified. The line numbers are off, as the detected lines (40-58) do not align with the ground truth (74-101), but it points to the correct method.

2. LARGE_CLASS (HallsLevel.java, entire class)
   Score: INCORRECT
   Matched with: No match
   Justification: The detected smell is incorrect. There is no ground truth for LARGE_CLASS in the provided data. While the class might be doing a lot, it doesn't clearly violate the single responsibility principle to the point of being
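
The precision and recall reported above are consistent with one matched detection out of two detections, measured against seven annotated smells. A minimal sketch of that arithmetic; the function name is hypothetical:

```python
def precision_recall(true_positives, num_detected, num_ground_truth):
    """Return (precision, recall); each is 0.0 when its denominator is zero."""
    precision = true_positives / num_detected if num_detected else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# 1 matched detection, 2 detections in total, 7 annotated smells.
precision, recall = precision_recall(1, 2, 7)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

This prints `precision=0.50 recall=0.14`. Note that only two of the seven annotated smell types (LONG_METHOD and DUPLICATED_CODE) are in the list the detector is allowed to report, so recall on this file cannot exceed 2/7 with the current setup.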

In [54]:
# Run evaluations and collect results
results = {}
for file_path, annotations in evaluation_files.items():
    print(f"\n=== Evaluating {file_path} ===")
    eval_result = evaluate_code_analysis(file_path, annotations, llm)
    if eval_result:
        results[file_path] = eval_result

# Calculate overall metrics
if results:
    total_score = sum(result.overall_score for result in results.values())
    avg_score = total_score / len(results)
    avg_precision = sum(result.precision for result in results.values()) / len(results)
    avg_recall = sum(result.recall for result in results.values()) / len(results)

    print("\n=== OVERALL EVALUATION ===")
    print(f"Average Score: {avg_score:.2f}/5.0")
    print(f"Average Precision: {avg_precision:.2f}")
    print(f"Average Recall: {avg_recall:.2f}")


=== Evaluating pixel-dungeon/src/com/watabou/pixeldungeon/levels/HallsLevel.java ===
Overall Score: 3.00/5.0
Precision: 0.50
Recall: 0.14

Summary: The code smell detection identified one LONG_METHOD smell with approximate location accuracy and incorrectly identified a LARGE_CLASS smell. The overall precision is 0.5, and the recall is 0.14285714285714285.

Individual Evaluations:

1. LONG_METHOD (HallsLevel.java, decorate() method, lines 50-74)
   Score: ACCEPTABLE
   Matched with: LONG_METHOD
   Justification: The detected smell type is correct (LONG_METHOD), and the location is approximately correct. The ground truth location is lines 74-101, while the detected location is 50-74. There is overlap, but it is not a perfect match.

2. LARGE_CLASS (HallsLevel.java, entire class)
   Score: INCORRECT
   Matched with: No match
   Justification: There is no LARGE_CLASS smell in the ground truth for HallsLevel.java. While the class may have multiple responsibilities, it's not explicitly iden
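
Location accuracy is judged by the LLM here. A deterministic cross-check is possible when both locations contain a line range. The sketch below parses ranges of the form `lines A-B` and computes their intersection over union; the function names are hypothetical and the location strings are taken from the output above:

```python
import re


def parse_line_range(location):
    """Extract (start, end) from text such as 'decorate() method, lines 74-101'."""
    match = re.search(r"lines?\s+(\d+)\s*-\s*(\d+)", location)
    if not match:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    return (start, end) if start <= end else (end, start)


def overlap_ratio(a, b):
    """Intersection over union of two inclusive line ranges."""
    intersection = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if intersection <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - intersection
    return intersection / union


detected = parse_line_range("HallsLevel.java, decorate() method, lines 50-74")
truth = parse_line_range("decorate() method, lines 74-101")
```

For these two ranges the only shared line is 74, so `overlap_ratio(detected, truth)` is 1/52, which quantifies how loose the "approximately correct" match is.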

In [None]:
%%bash
# Read the token securely from environment variable
sonar-scanner \
  -Dsonar.projectKey=arthas \
  -Dsonar.sources=. \
  -Dsonar.host.url=http://localhost:9000 \
  -Dsonar.token=${SONARQUBE_TOKEN} \
  -Dsonar.java.binaries=. \
  -Dsonar.language=java
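
The parser in the next cell expects JSON with `issues`, `rules`, and `components` keys, which matches the shape of a response from SonarQube's `api/issues/search` Web API endpoint. A minimal sketch of building such a request URL, assuming the same local server as above. The helper name `build_issues_url` is hypothetical, and the query parameter names are assumptions that should be checked against the Web API documentation of the server version in use, since they have changed across releases:

```python
from urllib.parse import urlencode


def build_issues_url(base_url, project_key, page=1, page_size=100):
    """Build a URL for SonarQube's issue search endpoint (parameters are assumptions)."""
    params = {
        "componentKeys": project_key,   # project to query; name may differ by version
        "types": "CODE_SMELL",          # restrict results to code smells
        "additionalFields": "rules",    # ask for rule metadata alongside issues
        "p": page,
        "ps": page_size,
    }
    return f"{base_url.rstrip('/')}/api/issues/search?{urlencode(params)}"


url = build_issues_url("http://localhost:9000", "arthas")
```

The resulting URL would still need to be fetched with an authenticated HTTP client, for example by sending the token from `SONARQUBE_TOKEN`.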

In [67]:
import json
from typing import List, Dict, Optional

class CodeSmellLocation:
    def __init__(self, start_line: int, end_line: int, start_offset: int, end_offset: int):
        self.start_line = start_line
        self.end_line = end_line
        self.start_offset = start_offset
        self.end_offset = end_offset
    
    def __str__(self) -> str:
        return f"Lines {self.start_line}-{self.end_line}"
    
    @classmethod
    def from_text_range(cls, text_range: Dict):
        return cls(
            text_range.get("startLine", 0),
            text_range.get("endLine", 0),
            text_range.get("startOffset", 0),
            text_range.get("endOffset", 0)
        )

class Impact:
    def __init__(self, quality: str, severity: str):
        self.quality = quality  # e.g., "SECURITY", "MAINTAINABILITY"
        self.severity = severity  # e.g., "HIGH", "MEDIUM", "LOW"
    
    def __str__(self) -> str:
        return f"{self.quality} ({self.severity})"

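# NOTE: this class reuses the name `CodeSmell` and therefore shadows the CodeSmell enum
# created earlier; re-run the enum cell before calling analyze_code_with_structure() again.
class CodeSmell: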
    def __init__(self, 
                 issue_key: str,
                 rule_key: str,
                 message: str,
                 component: str,
                 location: CodeSmellLocation,
                 impacts: List[Impact],
                 effort: Optional[str] = None,
                 rule_name: Optional[str] = None,
                 file_path: Optional[str] = None,
                 clean_code_attribute: Optional[str] = None,
                 author: Optional[str] = None):
        self.issue_key = issue_key
        self.rule_key = rule_key
        self.rule_name = rule_name
        self.message = message
        self.component = component
        self.file_path = file_path
        self.location = location
        self.impacts = impacts
        self.effort = effort
        self.clean_code_attribute = clean_code_attribute
        self.author = author
    
    def __str__(self) -> str:
        impacts_str = ", ".join(str(impact) for impact in self.impacts)
        return (f"Code Smell: {self.rule_name} ({self.rule_key})\n"
                f"Location: {self.location}, File: {self.file_path or self.component}\n"
                f"Message: {self.message}\n"
                f"Impacts: {impacts_str}\n"
                f"Effort: {self.effort or 'Unknown'}")

def parse_sonarqube_code_smells(json_data: Dict) -> List[CodeSmell]:
    """Parse SonarQube API response and extract code smell information"""
    
    # Create lookup dictionaries for rules and components
    rules_dict = {rule["key"]: rule for rule in json_data.get("rules", [])}
    components_dict = {comp["key"]: comp for comp in json_data.get("components", [])}
    
    code_smells = []
    
    for issue in json_data.get("issues", []):
        # Extract rule information
        rule_key = issue.get("rule")
        rule_info = rules_dict.get(rule_key, {})
        rule_name = rule_info.get("name")
        
        # Extract component information
        component_key = issue.get("component")
        component_info = components_dict.get(component_key, {})
        file_path = component_info.get("path") or component_info.get("longName")
        
        # Extract location
        location = CodeSmellLocation.from_text_range(issue.get("textRange", {
            "startLine": issue.get("line", 0),
            "endLine": issue.get("line", 0),
            "startOffset": 0,
            "endOffset": 0
        }))
        
        # Extract impacts
        impacts = [Impact(impact.get("softwareQuality", ""), impact.get("severity", ""))
                  for impact in issue.get("impacts", [])]
        
        # Create CodeSmell object
        code_smell = CodeSmell(
            issue_key=issue.get("key", ""),
            rule_key=rule_key,
            rule_name=rule_name,
            message=issue.get("message", ""),
            component=component_key,
            file_path=file_path,
            location=location,
            impacts=impacts,
            effort=issue.get("effort"),
            clean_code_attribute=issue.get("cleanCodeAttribute"),
            author=issue.get("author")
        )
        
        code_smells.append(code_smell)
    
    return code_smells

# Example usage:
def process_sonarqube_response(json_string: str):
    data = json.loads(json_string)
    code_smells = parse_sonarqube_code_smells(data)
    
    print(f"Found {len(code_smells)} code smells:")
    for i, smell in enumerate(code_smells, 1):
        print(f"\n{i}. {smell}")
    
    return code_smells

In [75]:
import requests
import json
from enum import Enum

# Create a mapping between code smells and SonarQube rule IDs
class SonarQubeSmellRule(Enum):
    LONG_PARAMETER_LIST = "java:S107"  # Methods should not have too many parameters
    LONG_METHOD = "java:S138"         # Methods should not have too many lines
    COMPLEX_METHOD = "java:S3776"     # Cognitive Complexity of methods should not be too high
    LARGE_CLASS = "java:S1448"        # Classes should not have too many methods
    DUPLICATED_CODE = "java:S1192"    # String literals should not be duplicated

# Helper function to get rule string from smell type(s)
def get_rule_string(smells):
    if isinstance(smells, SonarQubeSmellRule):
        return smells.value
    elif isinstance(smells, list):
        return ", ".join([smell.value for smell in smells])
    else:
        raise ValueError("Smells must be a SonarQubeSmellRule or list of SonarQubeSmellRule")

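# Configuration
import os  # needed only to read the token from the environment
sonar_url = "http://localhost:9000"  # SonarQube server URL
# Read the user token from the environment instead of hard-coding it.
# SONAR_TOKEN is an assumed variable name; set it before running this cell.
sonar_user_token = os.environ.get("SONAR_TOKEN", "")
project_key = "arthas"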

# API endpoint for issues search
url = f"{sonar_url}/api/issues/search"

# Choose which smell to look for
target_smell = SonarQubeSmellRule.LONG_PARAMETER_LIST
# Or pass a list: target_smell = [SonarQubeSmellRule.LONG_PARAMETER_LIST, SonarQubeSmellRule.LONG_METHOD]

# Parameters to get only code smells
params = {
    "componentKeys": project_key,
    "types": "CODE_SMELL",
    "rules": get_rule_string(target_smell),
    "ps": 50  # page size (max 500)
}

# Authentication with token
auth = (sonar_user_token, "")

# Make the API request
response = requests.get(url, params=params, auth=auth, timeout=30)

# Process results
if response.status_code == 200:
    # Decode the body only after the status check; error responses may not be JSON
    code_smells = response.json()
    print(json.dumps(code_smells, indent=2))
    print("Successfully retrieved code smells from SonarQube.")
    # Only the first page is fetched here; paginate if paging.total exceeds "ps"
    parsed_smells = process_sonarqube_response(json.dumps(code_smells))
else:
    print(f"Error retrieving code smells: {response.status_code}")
    print(response.text)


{
  "total": 1,
  "p": 1,
  "ps": 50,
  "paging": {
    "pageIndex": 1,
    "pageSize": 50,
    "total": 1
  },
  "effortTotal": 20,
  "issues": [
    {
      "key": "53ef6fab-71bd-4e51-a221-1f1b25e23003",
      "rule": "java:S107",
      "severity": "MAJOR",
      "component": "arthas:core/src/main/java/com/taobao/arthas/core/advisor/Advice.java",
      "project": "arthas",
      "line": 71,
      "hash": "64e96bf261dd6801924266f9bdc8278a",
      "textRange": {
        "startLine": 71,
        "endLine": 71,
        "startOffset": 12,
        "endOffset": 18
      },
      "flows": [],
      "status": "OPEN",
      "message": "Constructor has 8 parameters, which is greater than 7 authorized.",
      "effort": "20min",
      "debt": "20min",
      "author": "hengyunabc@gmail.com",
      "tags": [
        "brain-overload"
      ],
      "creationDate": "2018-08-31T03:49:48+0000",
      "updateDate": "2025-05-15T22:28:08+0000",
      "type": "CODE_SMELL",
      "scope": "MAIN",
      "qu

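
The request above reads a single page of results. A minimal sketch of paging through `/api/issues/search` follows, assuming the response keeps the `issues` and `paging.total` fields shown in the output. The names `fetch_all_issues`, `fetch_page`, and `sonar_page_fetcher` are hypothetical, and the 10,000-result cap is an assumption about the server that should be checked against your SonarQube version.

```python
from typing import Callable, Dict, List


def fetch_all_issues(fetch_page: Callable[[int, int], Dict],
                     page_size: int = 100,
                     max_results: int = 10_000) -> List[Dict]:
    """Collect issues across pages.

    fetch_page(page_index, page_size) is a hypothetical callable returning one
    decoded /api/issues/search response (a dict with "issues" and "paging").
    max_results is an assumed server-side cap, not a documented constant here.
    """
    issues: List[Dict] = []
    page = 1
    while True:
        data = fetch_page(page, page_size)
        batch = data.get("issues", [])
        issues.extend(batch)
        total = data.get("paging", {}).get("total", len(issues))
        # Stop when everything is collected, a page comes back empty,
        # or the assumed cap is reached.
        if not batch or len(issues) >= min(total, max_results):
            break
        page += 1
    return issues[:max_results]


def sonar_page_fetcher(url: str, base_params: Dict, auth) -> Callable[[int, int], Dict]:
    """Build a fetch_page callable backed by requests (imported lazily)."""
    import requests  # imported here so the pager itself has no network dependency

    def fetch(page: int, page_size: int) -> Dict:
        # "p" is the 1-based page index and "ps" the page size.
        page_params = {**base_params, "p": page, "ps": page_size}
        response = requests.get(url, params=page_params, auth=auth, timeout=30)
        response.raise_for_status()
        return response.json()

    return fetch
```

With the variables from the cell above, `fetch_all_issues(sonar_page_fetcher(url, params, auth))` would return the raw issue dictionaries. Note that `parse_sonarqube_code_smells` also reads the top-level `rules` and `components` lists, which this sketch does not merge.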
In [73]:
import pandas as pd
import mysql.connector

# Connect to MySQL
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="dacos"
)

# Create cursor and run queries
cursor = conn.cursor()
cursor.execute("SELECT id, designite_id, has_smell, is_class, path_to_file, project_name, sample_constraints, smells FROM tagman5.sample WHERE smells IN (2) LIMIT 10")
rows = cursor.fetchall()

column_names = [column[0] for column in cursor.description]

df = pd.DataFrame(rows, columns=column_names)

print(f"Retrieved {len(df)} samples with smells")
display(df)

if len(df) > 0:
    print("\nAccessing first row:")
    print(f"File path: {df.iloc[0]['path_to_file']}")
    print(f"Project name: {df.iloc[0]['project_name']}")
    print(f"Has smell: {'Yes' if df.iloc[0]['has_smell'] else 'No'}")

conn.close()

Retrieved 10 samples with smells


|   | id | designite_id | has_smell | is_class | path_to_file | project_name | sample_constraints | smells |
|---|----|--------------|-----------|----------|--------------|--------------|--------------------|--------|
| 0 | 2386 | 538 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 5 | 2 |
| 1 | 2387 | 542 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 6 | 2 |
| 2 | 2406 | 532 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 9 | 2 |
| 3 | 2407 | 536 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 7 | 2 |
| 4 | 2494 | 552 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 3 | 2 |
| 5 | 2501 | 5129 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 3 | 2 |
| 6 | 2551 | 5487 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 3 | 2 |
| 7 | 2567 | 5480 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 3 | 2 |
| 8 | 2569 | 5479 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 3 | 2 |
| 9 | 2577 | 5441 | 1 | 0 | /codesplit_java_method/Blankj_AndroidUtilCode/... | Blankj_AndroidUtilCode | 3 | 2 |



Accessing first row:
File path: /codesplit_java_method/Blankj_AndroidUtilCode/Blankj_AndroidUtilCode/com.blankj.subutil.util/ContentProvider4SubUtil/query.code
Project name: Blankj_AndroidUtilCode
Has smell: Yes
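
The two database cells embed the smell id, project name, and limit directly in the SQL string. One possible alternative, sketched here under assumptions, is a small helper that builds a parameterized statement with `%s` placeholders, which `cursor.execute(sql, params)` in `mysql.connector` accepts. The helper name `build_sample_query` and the constant `SAMPLE_COLUMNS` are hypothetical; the table and column names are taken from the queries in this notebook.

```python
from typing import Optional, Sequence, Tuple

# Columns selected by the queries in this notebook.
SAMPLE_COLUMNS = (
    "id", "designite_id", "has_smell", "is_class",
    "path_to_file", "project_name", "sample_constraints", "smells",
)


def build_sample_query(smell_ids: Sequence[int],
                       project_name: Optional[str] = None,
                       limit: int = 10) -> Tuple[str, tuple]:
    """Return (sql, params) for selecting samples by smell id (hypothetical helper)."""
    if not smell_ids:
        raise ValueError("smell_ids must not be empty")
    # One %s placeholder per smell id keeps the values out of the SQL text.
    placeholders = ", ".join(["%s"] * len(smell_ids))
    sql = (f"SELECT {', '.join(SAMPLE_COLUMNS)} FROM tagman5.sample "
           f"WHERE smells IN ({placeholders})")
    params = list(smell_ids)
    if project_name is not None:
        sql += " AND project_name = %s"
        params.append(project_name)
    sql += " LIMIT %s"
    params.append(int(limit))
    return sql, tuple(params)
```

Usage would look like `sql, params = build_sample_query([2], project_name="alibaba_arthas")` followed by `cursor.execute(sql, params)`, so the same helper can serve both the unfiltered and the per-project query.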


In [74]:
import os
import pandas as pd
import mysql.connector

# Connect to MySQL
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="dacos"
)

# Create cursor and run queries
cursor = conn.cursor()
cursor.execute(
    "SELECT id, designite_id, has_smell, is_class, path_to_file, project_name, sample_constraints, smells FROM tagman5.sample"
    " WHERE smells IN (2) AND project_name='alibaba_arthas' LIMIT 10"
)
rows = cursor.fetchall()

column_names = [column[0] for column in cursor.description]

df = pd.DataFrame(rows, columns=column_names)

print(f"Retrieved {len(df)} samples with smells")
display(df)

if len(df) > 0:
    print("\nAccessing first row:")
    file_path = df.iloc[0]['path_to_file']
    print(f"File path: {file_path}")
    print(f"Project name: {df.iloc[0]['project_name']}")
    print(f"Has smell: {'Yes' if df.iloc[0]['has_smell'] else 'No'}")
    
    # Construct new file path with .java extension
    dir_name, file_name = os.path.split(file_path)
    print(dir_name, file_name)
    base_name = os.path.splitext(file_name)[0]
    new_file_path = os.path.join(dir_name, base_name + ".java")
    
    print(f"\nNew file path (with .java): {new_file_path}")
    
    # Analyze the code using the function defined in previous cells
    try:
        # First check if the file exists
        if os.path.exists(new_file_path):
            # Use display_code_analysis which is already defined
            print(f"\nAnalyzing code in: {new_file_path}")
            display_code_analysis(new_file_path)
        else:
            print(f"\nFile not found: {new_file_path}")
            print("Please check if the path is correct.")
    except Exception as e:
        print(f"\nError analyzing file: {e}")
        
conn.close()

Retrieved 10 samples with smells


|   | id | designite_id | has_smell | is_class | path_to_file | project_name | sample_constraints | smells |
|---|----|--------------|-----------|----------|--------------|--------------|--------------------|--------|
| 0 | 44635 | 59787 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 1 | 44787 | 63642 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 2 | 44917 | 59836 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 3 | 44951 | 59902 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 4 | 44998 | 59960 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 5 | 45029 | 59926 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 6 | 45044 | 59981 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 7 | 45061 | 59814 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 8 | 45136 | 59195 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |
| 9 | 45155 | 59172 | 1 | 0 | /codesplit_java_method/alibaba_arthas/alibaba_... | alibaba_arthas | 2 | 2 |



Accessing first row:
File path: /codesplit_java_method/alibaba_arthas/alibaba_arthas/com.taobao.arthas.core.command.express/ArthasObjectPropertyAccessor/setPossibleProperty.code
Project name: alibaba_arthas
Has smell: Yes
/codesplit_java_method/alibaba_arthas/alibaba_arthas/com.taobao.arthas.core.command.express/ArthasObjectPropertyAccessor setPossibleProperty.code

New file path (with .java): /codesplit_java_method/alibaba_arthas/alibaba_arthas/com.taobao.arthas.core.command.express/ArthasObjectPropertyAccessor/setPossibleProperty.java

File not found: /codesplit_java_method/alibaba_arthas/alibaba_arthas/com.taobao.arthas.core.command.express/ArthasObjectPropertyAccessor/setPossibleProperty.java
Please check if the path is correct.
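
The "File not found" result may simply mean that `path_to_file` is stored relative to the dataset directory rather than the filesystem root; that is a guess, not something the output confirms. A minimal sketch of resolving such a path against a local dataset copy follows. The function name `resolve_sample_path` and the `dataset_root` argument are hypothetical, and trying both `.code` and `.java` is an assumption about how the files are named on disk.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_sample_path(db_path: str,
                        dataset_root: str,
                        extensions: Sequence[str] = (".code", ".java")) -> Optional[Path]:
    """Map a path_to_file value onto a local dataset copy, or return None."""
    # Stored paths start with "/"; strip it so the join keeps dataset_root
    # instead of treating the stored path as absolute.
    relative = Path(db_path.lstrip("/"))
    for ext in extensions:
        # with_suffix only changes the final component, so dotted package
        # directory names are left untouched.
        candidate = Path(dataset_root) / relative.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None
```

If a path resolves, the result can be passed to `display_code_analysis(str(resolved))`; if it returns `None`, the dataset root or the extension assumption is likely wrong.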
