# Evaluate localization strategies

This notebook compares different strategies for localizing an issue to the packages and files relevant to it. It:
- Defines a base interface for localization strategies
- Implements several localization strategies
- Defines an evaluator that runs a test suite against those strategies
- Collects the evaluation results in a pandas DataFrame
- Uses Milvus as the vector database
- Uses OpenAI's embeddings model
- Uses LangChain's abstractions for document processing

In [1]:
import os
import tempfile
import pandas as pd
import shutil
import yaml

from typing import Dict, List, Tuple, Iterator
from abc import ABC, abstractmethod
from langchain_core.documents import Document
from langchain_milvus import Milvus

from se_agent.localizer import localize_issue
from se_agent.project import Project
from se_agent.project_manager import ProjectManager

## Base interface for localization strategies

In [2]:
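class Strategy(ABC):
    # Each strategy sets a human-readable name; the evaluator uses it to label result columns
    strategy_name: str
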
    @abstractmethod
    def localize(self, issue: Dict[str, str], top_n: int) -> List[Tuple[str, str]]:
        """
        Localizes the issue to a set of relevant packages and files.

        Args:
            issue (Dict[str, str]): A dictionary containing issue details with at least:
                - `title` (str): The title of the issue.
                - `description` (str): The detailed description of the issue.
            top_n (int): The maximum number of localization results to return.

        Returns:
            List[Tuple[str, str]]: A list of tuples representing relevant localization results,
                each containing `package` (str) and `file` (str).
        """
        pass
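Any class that implements `localize` can be plugged into the evaluator. As an illustration only, here is a hypothetical `KeywordOverlapStrategy` baseline (not part of the original notebook) that ranks files by how many issue words appear in the file name. The interface is repeated so the sketch runs on its own, and the `(package, file)` pairs in the usage example are illustrative.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List, Set, Tuple


class Strategy(ABC):
    # Mirrors the interface above so this sketch is self-contained
    strategy_name: str

    @abstractmethod
    def localize(self, issue: Dict[str, str], top_n: int) -> List[Tuple[str, str]]:
        ...


def tokenize(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens; underscores split identifiers like retry_with_backoff."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordOverlapStrategy(Strategy):
    """Hypothetical baseline: scores each file by word overlap between the issue and its file name."""

    def __init__(self, files: List[Tuple[str, str]], strategy_name: str = "Keyword Overlap"):
        self.files = files  # (package, file) pairs, e.g. collected from the source tree
        self.strategy_name = strategy_name

    def localize(self, issue: Dict[str, str], top_n: int) -> List[Tuple[str, str]]:
        query = tokenize(f"{issue['title']} {issue['description']}")
        scored = [(len(query & tokenize(file)), package, file) for package, file in self.files]
        # Highest overlap first; sorted() is stable, so ties keep their original order
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(package, file) for score, package, file in scored[:top_n] if score > 0]


strategy = KeywordOverlapStrategy([
    ("se_agent", "project_info"),
    ("se_agent", "localizer"),
    ("se_agent", "retry_with_backoff"),
    ("se_agent", "api"),
])
issue = {"title": "Retry LLM call on Rate Limit Error",
         "description": "Wrap the API call with a retry and backoff."}
print(strategy.localize(issue, top_n=5))
```

A baseline like this gives a floor against which the embedding and completion-based strategies can be compared.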

## Semantic vector search strategy

This implements a simple semantic vector search strategy, using Milvus as the vector database and OpenAI's embeddings model. The same implementation can back several strategies, depending on which sources it is fed, e.g.:
- **Code file embeddings**: a `source_dir` pointing to the code files embeds the code directly.
- **Code semantics embeddings**: a `source_dir` pointing to semantic descriptions of the code files embeds the code semantics.
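Conceptually, `similarity_search(query, k=top_n)` embeds the query and returns the `k` stored documents whose vectors are closest to it. The sketch below shows only that ranking idea, using cosine similarity and hand-made 3-dimensional vectors as stand-ins for real embeddings; the metric Milvus actually uses depends on how the collection is configured and is not asserted here. The names `cosine` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          index: Dict[Tuple[str, str], Sequence[float]],
          k: int) -> List[Tuple[str, str]]:
    """Return the k (package, file) keys whose vectors are most similar to the query vector."""
    ranked = sorted(index, key=lambda key: cosine(query_vec, index[key]), reverse=True)
    return ranked[:k]


# Toy vectors, not real embeddings: the dimensions loosely mean "API", "retry", "project metadata"
index = {
    ("se_agent", "api"): [0.9, 0.1, 0.0],
    ("se_agent", "retry_with_backoff"): [0.7, 0.6, 0.0],
    ("se_agent", "project"): [0.0, 0.1, 0.9],
}
query = [0.8, 0.5, 0.0]
nearest = top_k(query, index, k=2)
print(nearest)
```

The strategies in this notebook differ only in *what text* is embedded (code, semantic summaries, or both); the ranking step is the same.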

In [3]:
class SemanticVectorSearchStrategy(Strategy):
    def __init__(self, source_dir: str, root_package_name: str, embeddings, strategy_name: str):
        self.strategy_name = strategy_name
        self.vector_store = self.create_vector_store(source_dir, root_package_name, embeddings)

    def create_vector_store(self, folder_path: str, root_package_name: str, embeddings) -> Milvus:
        """Creates a Milvus vector store from the files in the specified folder."""
        documents = self.create_documents(folder_path, root_package_name)
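        # A local .db URI makes langchain_milvus use Milvus Lite; give each strategy its own temporary file
        with tempfile.NamedTemporaryFile(suffix='.db', delete=False) as tmp_file:
            uri = tmp_file.name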
        return Milvus.from_documents(
            documents,
            embeddings,
            collection_name=root_package_name,
            connection_args={"uri": uri},
        )
    
    def create_documents(self, folder_path: str, root_package_name: str) -> List[Document]:
        """Create a list of Document instances from the files in the specified folder."""
        documents = []
        for root, _, files in os.walk(folder_path):
            for file in files:
                file_path = os.path.join(root, file)
                with open(file_path, "r") as f:
                    page_content = f.read()
                if not page_content.strip():
                    continue
                filename = file.split('.')[0]
                relative_path = os.path.relpath(root, folder_path)
                package = (f"{root_package_name}/{relative_path.replace(os.sep, '/')}"
                           if relative_path != "." else root_package_name)
                document = Document(
                    page_content=page_content,
                    metadata={"file": filename, "package": package}
                )
                documents.append(document)
        return documents

    def localize(self, issue: Dict[str, str], top_n: int) -> List[Tuple[str, str]]:
        query_string = f"{issue['title']}: {issue['description']}"
        results = self.vector_store.similarity_search(query_string, k=top_n)
        return [(res.metadata["package"], res.metadata["file"]) for res in results]
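`create_documents` attaches `package` and `file` metadata to every document: the package is `root_package_name` plus the directory relative to `source_dir`, and the file is the name before its first dot. The hypothetical pure function below replicates that rule so it can be checked in isolation; the example paths are illustrative.

```python
import os
from typing import Tuple


def document_key(root: str, file: str, folder_path: str, root_package_name: str) -> Tuple[str, str]:
    """Replicates the (package, file) metadata that create_documents attaches to a Document."""
    filename = file.split('.')[0]  # same rule as create_documents: everything before the first dot
    relative_path = os.path.relpath(root, folder_path)
    package = (f"{root_package_name}/{relative_path.replace(os.sep, '/')}"
               if relative_path != "." else root_package_name)
    return package, filename


print(document_key("/repo/se_agent", "project.py", "/repo/se_agent", "se_agent"))
print(document_key("/repo/se_agent/llm", "api.py", "/repo/se_agent", "se_agent"))
```

Package naming can differ between strategies, but the evaluator compares only the file component, so such differences do not affect scores. Note that a file such as `foo.bar.py` maps to `foo` under this rule.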

## Hierarchical localization strategy

Instead of semantic vector search, this strategy uses the completion API to generate localization results, which requires inlining the context. Inlining every file in the repository would far exceed the token limits of the completion API, so the strategy uses generated semantic summaries of the code files instead. For large repositories, and depending on the model used, even these summaries may exceed the token limits, so higher-level summaries are also generated at the package level. We assume that the aggregated package summaries fit within the token limits. The strategy operates as follows:

- **Package level**: Given an issue, it first identifies the packages relevant to the issue, using the package summaries as inline context.
- **File level**: It then identifies the relevant files within those packages, using the file summaries of the selected packages as inline context.

This strategy is more expensive than semantic vector search: each issue needs at least two completion calls (one per level), each carrying summaries as inline context, whereas vector search needs only one query embedding and a similarity lookup.
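The two-stage flow above can be sketched as follows. This is not the implementation of `localize_issue`, whose prompts and internals are not shown in this notebook. Here a pluggable `select` callable stands in for the completion API call, and `keyword_selector` is a hypothetical word-overlap stub; all summaries are made-up examples.

```python
from typing import Callable, Dict, List, Tuple

# A selector receives the issue text and {name: summary} candidates and returns the chosen
# names, most relevant first. In the real strategy this role is played by a completion call.
Selector = Callable[[str, Dict[str, str]], List[str]]


def hierarchical_localize(issue_text: str,
                          package_summaries: Dict[str, str],
                          file_summaries: Dict[str, Dict[str, str]],
                          select: Selector,
                          top_n: int) -> List[Tuple[str, str]]:
    # Stage 1: only package-level summaries go into the context
    packages = select(issue_text, package_summaries)
    # Stage 2: only the file summaries of the selected packages go into the context
    candidates: Dict[str, str] = {}
    keys: Dict[str, Tuple[str, str]] = {}
    for pkg in packages:
        for name, summary in file_summaries.get(pkg, {}).items():
            label = f"{pkg}/{name}"
            candidates[label] = summary
            keys[label] = (pkg, name)
    chosen = select(issue_text, candidates)
    return [keys[label] for label in chosen[:top_n]]


def keyword_selector(issue_text: str, candidates: Dict[str, str]) -> List[str]:
    """Stub for the LLM: rank by word overlap between issue and summary, dropping zero-overlap items."""
    words = set(issue_text.lower().split())
    scores = {name: len(words & set(summary.lower().split())) for name, summary in candidates.items()}
    return [name for name in sorted(scores, key=scores.get, reverse=True) if scores[name] > 0]


package_summaries = {
    "se_agent": "core agent projects onboarding issue listener",
    "llm": "llm api wrappers with retry and backoff",
}
file_summaries = {
    "se_agent": {"project_info": "project metadata such as github token"},
    "llm": {"api": "calls the llm api",
            "retry_with_backoff": "retry decorator with exponential backoff on rate limit errors"},
}
issue_text = "retry llm call on rate limit error"
result = hierarchical_localize(issue_text, package_summaries, file_summaries, keyword_selector, top_n=5)
print(result)
```

Each call to `select` corresponds to one completion request, which is where the extra cost of this strategy comes from.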

In [4]:
class HierarchicalLocalizationStrategy(Strategy):
    def __init__(self, project: Project, strategy_name: str = "Hierarchical Completion"):
        self.project = project
        self.strategy_name = strategy_name

    def localize(self, issue: Dict[str, str], top_n: int) -> List[Tuple[str, str]]:
        """
        Localizes an issue to specific files by first identifying relevant packages
        and then narrowing down to specific files in those packages.
        """
        # Wrap the issue as a single-turn user conversation, the form passed to localize_issue
        issue_conversation = {
            "title": issue["title"],
            "conversation": [{'role': 'user', 'content': f'Issue: {issue["title"]}\n\nDescription: {issue["description"]}'}]
        }

        # Localize the issue using the hierarchical approach
        localization_suggestions = localize_issue(self.project, issue, issue_conversation)

        if localization_suggestions is None:
            return []  # If localization fails, return an empty list

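        # Format the results as (package, file) tuples, highest confidence first, dropping
        # the file extension so names match the dataset's expected results
        ranked = sorted(localization_suggestions, key=lambda s: s.confidence, reverse=True)
        return [(s.package, os.path.splitext(s.file)[0]) for s in ranked[:top_n]]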

## Dataset

In [5]:
class Issue:
    def __init__(self, id: str, title: str, content: str, expected_results: List[str]):
        self.id = id
        self.title = title
        self.content = content
        self.expected_results = expected_results

    def to_dict(self) -> Dict[str, str]:
        """Returns the issue data as a dictionary for easy access."""
        return {"title": self.title, "description": self.content}

class Dataset:
    def __init__(self, yaml_path: str):
        self.yaml_dir = os.path.dirname(yaml_path)  # Get the directory containing the YAML file
        with open(yaml_path, 'r') as f:
            data = yaml.safe_load(f)
        self.test_cases = data["test_cases"]

    def __iter__(self) -> Iterator[Issue]:
        """Allows iteration over Issue instances created from test cases."""
        for case in self.test_cases:
            # Construct the full path to the markdown file
            full_path = os.path.join(self.yaml_dir, case["filepath"])
            # Load the content from the markdown file
            with open(full_path, 'r') as f:
                content = f.read()
            # Create an Issue instance for each test case
            yield Issue(
                id=case["id"],
                title=case["title"],
                content=content,
                expected_results=case["expected_results"]
            )

    def __len__(self) -> int:
        """Returns the number of test cases in the dataset."""
        return len(self.test_cases)

dataset = Dataset("test/dataset.yaml")
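`Dataset` expects the YAML file to contain a top-level `test_cases` list whose entries have `id`, `title`, `filepath` (an issue markdown file, resolved relative to the YAML file's directory), and `expected_results` (file names without extension). The hypothetical `validate_dataset` helper below checks the structure that `yaml.safe_load` returns before evaluation starts. The sample entry reuses a title and expected results from the results table below; its `id` and `filepath` are placeholders.

```python
import os
from typing import Any, Dict, List

REQUIRED_KEYS = ("id", "title", "filepath", "expected_results")


def validate_dataset(data: Dict[str, Any], yaml_dir: str, check_files: bool = True) -> List[str]:
    """Return the problems found in a parsed dataset; an empty list means it looks usable."""
    cases = data.get("test_cases") if isinstance(data, dict) else None
    if not isinstance(cases, list):
        return ["top-level 'test_cases' must be a list"]
    problems = []
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            problems.append(f"case {i}: expected a mapping")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in case]
        if missing:
            problems.append(f"case {i}: missing keys {missing}")
            continue
        expected = case["expected_results"]
        if not isinstance(expected, list) or not expected:
            # calculate_score divides by len(expected_results), so it must be a non-empty list
            problems.append(f"case {i}: 'expected_results' must be a non-empty list")
        # Issue files are resolved relative to the YAML file's directory, as in Dataset.__iter__
        if check_files and not os.path.isfile(os.path.join(yaml_dir, case["filepath"])):
            problems.append(f"case {i}: issue file not found: {case['filepath']}")
    return problems


# Shape that yaml.safe_load should return; 'id' and 'filepath' values are placeholders
sample = {
    "test_cases": [
        {
            "id": "retry-rate-limit",
            "title": "Retry LLM call on Rate Limit Error",
            "filepath": "issues/retry-rate-limit.md",
            "expected_results": ["retry_with_backoff", "api"],
        }
    ]
}
print(validate_dataset(sample, "test", check_files=False))  # []
```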

## Evaluator

In [6]:
class LocalizationEvaluator:
    def __init__(self, dataset: Dataset, strategies_to_evaluate: List[Strategy]):
        self.dataset = dataset
        self.strategies = strategies_to_evaluate

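    def calculate_score(self, expected_results: List[str], actual_results: List[str]) -> float:
        """Scores actual results against expected ones, with k = len(expected_results).

        The score starts at 1.0. Each expected file missing from the results costs 1/k; each
        expected file found at 0-based position i >= k (outside the top-k) costs 0.2 * (i - k + 1) / k.
        The score is floored at 0.
        """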
        score = 1.0  # Start with a perfect score of 1

        for expected in expected_results:
            if expected in actual_results:
                index = actual_results.index(expected)
                # Check if expected item is within the top-k
                if index >= len(expected_results):
                    # Distance-based partial penalty if it's outside top-k but present in results
                    distance_factor = index - len(expected_results) + 1
                    penalty = (1 / len(expected_results)) * distance_factor * 0.2
                    score -= penalty
            else:
                # Full penalty if expected item is missing altogether
                score -= 1 / len(expected_results)

        return max(score, 0)  # Ensure score doesn't go below 0
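To make the penalty scheme concrete, the standalone copy of `calculate_score` below reproduces two cells from the first row of the results table below ("Project level override for github token", k = 3). It is the same logic as the method, lifted out of the class for experimentation.

```python
from typing import List


def calculate_score(expected_results: List[str], actual_results: List[str]) -> float:
    """Standalone copy of LocalizationEvaluator.calculate_score."""
    k = len(expected_results)
    score = 1.0
    for expected in expected_results:
        if expected in actual_results:
            index = actual_results.index(expected)
            if index >= k:
                # Partial penalty that grows with the distance past the top-k
                score -= (1 / k) * (index - k + 1) * 0.2
        else:
            # Full penalty for an expected file that is missing altogether
            score -= 1 / k
    return max(score, 0)


expected = ['project', 'project_info', 'onboard_agent']
# "Code File Embeddings": 'project' is at index 4, two places past k = 3 -> 1 - (1/3)(2)(0.2)
print(f"{calculate_score(expected, ['onboard_agent', 'project_info', 'listener_core', 'lambda_function', 'project']):.2f}")  # 0.87
# "Hierarchical Localization": 'project' is missing -> 1 - 1/3
print(f"{calculate_score(expected, ['project_info', 'project_manager', 'onboard_agent']):.2f}")  # 0.67
```

Two properties follow from the formula: with a single expected file every extra position costs 0.2 (for example, index 3 yields 0.40), and unexpected files in the results are never penalized, so the score measures recall with a rank-distance penalty rather than precision.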

    def evaluate(self, top_n: int = 5) -> pd.DataFrame:
        """Evaluates each strategy on all test issues and returns a DataFrame with results and scores."""
        result_columns = {s.strategy_name: f"Results ({s.strategy_name})" for s in self.strategies}
        columns = ["Issue Title", "Expected Results"] + list(result_columns.values())

        # Collect rows in a list and build the DataFrame once; concatenating onto an empty
        # DataFrame row by row is slow and can trigger a FutureWarning in recent pandas versions
        rows = []
        # Total score per strategy, averaged at the end
        total_scores = {s.strategy_name: 0.0 for s in self.strategies}

        for issue in self.dataset:
            issue_data = issue.to_dict()  # {"title": ..., "description": ...}
            row_data = {"Issue Title": issue.title, "Expected Results": issue.expected_results}

            for strategy in self.strategies:
                # Only file names are compared; the package of each (package, file) tuple is ignored
                actual_results = [file for _, file in strategy.localize(issue_data, top_n=top_n)]
                score = self.calculate_score(issue.expected_results, actual_results)
                total_scores[strategy.strategy_name] += score

                # Each cell shows the score followed by the returned files
                row_data[result_columns[strategy.strategy_name]] = f"{score:.2f} {actual_results}"

            rows.append(row_data)

        # Final row: mean score per strategy, as a percentage
        total_row = {"Issue Title": "Score", "Expected Results": ""}
        for strategy in self.strategies:
            mean_score = total_scores[strategy.strategy_name] / len(self.dataset)
            total_row[result_columns[strategy.strategy_name]] = f"{mean_score * 100:.2f}%"
        rows.append(total_row)

        return pd.DataFrame(rows, columns=columns)
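The evaluator stores each per-issue result as a single string of the form `"<score> [<file>, ...]"`, which is convenient for display but not for further analysis. A hypothetical helper such as `parse_result_cell` below (not part of the notebook) can split those cells back into a score and a list; it returns `None` for cells that do not follow that format, such as the percentages in the final "Score" row.

```python
import ast
from typing import List, Optional, Tuple


def parse_result_cell(cell: str) -> Optional[Tuple[float, List[str]]]:
    """Split a cell like "0.87 ['a', 'b']" into (0.87, ['a', 'b']); return None if it does not match."""
    score_text, sep, list_text = cell.partition(" ")
    if not sep:
        return None  # e.g. the "Score" row, which holds a single percentage
    try:
        return float(score_text), ast.literal_eval(list_text)
    except (ValueError, SyntaxError):
        return None


print(parse_result_cell("0.87 ['onboard_agent', 'project_info']"))  # (0.87, ['onboard_agent', 'project_info'])
```

Applied to a results column of the returned DataFrame, this yields per-issue scores that can be aggregated or plotted.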

**Test setup**

In [7]:
os.environ["LLM_PROVIDER_NAME"] = "openai"

projects_store = "/Users/pdhoolia/projects-store"
repo_full_name = "pdhoolia/se-agent"
src_dir = "se_agent"

code_dir = os.path.join(projects_store, repo_full_name, "repo", src_dir)
code_semantics_dir = os.path.join(projects_store, repo_full_name, "metadata", "package_details")

project_manager = ProjectManager(projects_store)
project_info = project_manager.get_project(repo_full_name)
project = Project(os.getenv("GITHUB_TOKEN"), projects_store, project_info)
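The setup assumes a projects store laid out as `<projects_store>/<repo_full_name>/repo/<src_dir>` and `<projects_store>/<repo_full_name>/metadata/package_details`, plus credentials in the environment: `GITHUB_TOKEN` is read above, and `OPENAI_API_KEY` is the variable `OpenAIEmbeddings` reads by default (assumed here to also serve the `openai` LLM provider). The hypothetical pre-flight check below reports what is missing before any embeddings are computed.

```python
import os
import tempfile
from typing import Iterable, List, Mapping


def missing_prerequisites(env: Mapping[str, str], env_vars: Iterable[str], dirs: Iterable[str]) -> List[str]:
    """List unset environment variables and missing directories; an empty list means ready."""
    problems = [f"environment variable {name} is not set" for name in env_vars if not env.get(name)]
    problems += [f"directory not found: {path}" for path in dirs if not os.path.isdir(path)]
    return problems


# Demo with a fake environment; in the notebook you would pass os.environ and [code_dir, code_semantics_dir]
with tempfile.TemporaryDirectory() as existing:
    report = missing_prerequisites(
        {"GITHUB_TOKEN": "example"},
        ["GITHUB_TOKEN", "OPENAI_API_KEY"],
        [existing, os.path.join(existing, "nope")],
    )
print(report)
```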

**Create combined semantic summary + code files**

In [8]:
# Create a temporary directory for the combined documents
combined_docs_dir = tempfile.mkdtemp()

# Iterate over the semantic summaries and combine with corresponding code files
for root, _, files in os.walk(code_semantics_dir):
    for file in files:
        if file.endswith(".md"):
            filename_without_extn = file.split('.')[0]
            summary_file_path = os.path.join(root, file)
            # Get corresponding code file path
            relative_path = os.path.relpath(root, code_semantics_dir)
            code_file_path = os.path.join(code_dir, relative_path, f"{filename_without_extn}.py")
            
            # Only proceed if the code file exists
            if os.path.exists(code_file_path):
                # Read content from both summary and code files
                with open(summary_file_path, "r") as summary_file:
                    semantic_summary_content = summary_file.read()
                with open(code_file_path, "r") as code_file:
                    code_content = code_file.read()
                
                # Combine the contents
                combined_content = f"# Semantic summary\n\n{semantic_summary_content}\n\n# Code\n\n```python\n{code_content}\n```"
                
                # Define path for the combined document in the temporary folder
                combined_file_dir = os.path.join(combined_docs_dir, relative_path)
                os.makedirs(combined_file_dir, exist_ok=True)
                combined_file_path = os.path.join(combined_file_dir, f"{filename_without_extn}.md")
                
                # Save the combined content
                with open(combined_file_path, "w") as combined_file:
                    combined_file.write(combined_content)

**Embeddings**

In [9]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

**Strategies**

In [10]:
code_file_embeddings = SemanticVectorSearchStrategy(code_dir, src_dir, embeddings, strategy_name="Code File Embeddings")
code_semantics_embeddings = SemanticVectorSearchStrategy(code_semantics_dir, src_dir, embeddings, strategy_name="Code Semantics Embeddings")
combined_embeddings = SemanticVectorSearchStrategy(combined_docs_dir, src_dir, embeddings, strategy_name="Combined Embeddings")
hierarchical_strategy = HierarchicalLocalizationStrategy(project, strategy_name="Hierarchical Localization")

strategies_to_evaluate = [code_file_embeddings, code_semantics_embeddings, combined_embeddings, hierarchical_strategy]

**Evaluate**

In [11]:
evaluator = LocalizationEvaluator(
    dataset=dataset,
    strategies_to_evaluate=strategies_to_evaluate
)

evaluation_results = evaluator.evaluate()

2024-11-17 13:56:59,317 - se-agent - DEBUG - Relevant Packages: ['se_agent']
2024-11-17 13:57:01,136 - se-agent - DEBUG - File Localization Suggestions: [FileLocalizationSuggestion(package='se_agent', file='project_info.py', confidence=0.9, reason='This file defines the ProjectInfo class, which stores metadata like the GitHub token for projects.'), FileLocalizationSuggestion(package='se_agent', file='project_manager.py', confidence=0.85, reason='Manages project data including loading and saving project information, relevant for handling GitHub tokens.'), FileLocalizationSuggestion(package='se_agent', file='onboard_agent.py', confidence=0.8, reason='Facilitates project onboarding and uses environment variables and ProjectManager, relevant for GitHub token management.')]
2024-11-17 13:57:04,540 - se-agent - DEBUG - Relevant Packages: ['llm']
2024-11-17 13:57:12,753 - se-agent - DEBUG - File Localization Suggestions: [FileLocalizationSuggestion(package='llm', file='api.py', confidence=0.8

**Display results**

In [12]:
# Create a copy of the DataFrame for display purposes
display_df = evaluation_results.copy()

# Set the index to start from 1
display_df.index = display_df.index + 1

# Apply left alignment to all columns, including headers
df_style = display_df.style \
    .set_table_attributes("style='width:100%'") \
    .set_properties(**{'text-align': 'left'}) \
    .set_table_styles([{
        'selector': 'th',
        'props': [('text-align', 'left')]
    }])

df_style

| # | Issue Title | Expected Results | Results (Code File Embeddings) | Results (Code Semantics Embeddings) | Results (Combined Embeddings) | Results (Hierarchical Localization) |
|---|---|---|---|---|---|---|
| 1 | Project level override for github token | `['project', 'project_info', 'onboard_agent']` | 0.87 `['onboard_agent', 'project_info', 'listener_core', 'lambda_function', 'project']` | 0.93 `['listener_core', 'onboard_agent', 'project_info', 'project', 'lambda_function']` | 0.93 `['listener_core', 'onboard_agent', 'project_info', 'project', 'flask_server']` | 0.67 `['project_info', 'project_manager', 'onboard_agent']` |
| 2 | Retry LLM call on Rate Limit Error | `['retry_with_backoff', 'api']` | 0.90 `['api', 'localizer', 'retry_with_backoff', 'change_suggester', 'project']` | 1.00 `['api', 'retry_with_backoff', 'lambda_function', 'change_suggester', 'listener_core']` | 1.00 `['api', 'retry_with_backoff', 'change_suggester', 'localizer', 'lambda_function']` | 1.00 `['api', 'retry_with_backoff']` |
| 3 | Handle issue comments as well | `['listener_core', 'issue_analyzer', 'localizer', 'change_suggester', 'project']` | 1.00 `['listener_core', 'project', 'issue_analyzer', 'localizer', 'change_suggester']` | 0.80 `['listener_core', 'issue_analyzer', 'change_suggester', 'localizer', 'onboard_agent']` | 1.00 `['listener_core', 'issue_analyzer', 'project', 'change_suggester', 'localizer']` | 0.80 `['listener_core', 'issue_analyzer', 'localizer', 'change_suggester', 'api']` |
| 4 | Update semantic understanding on code push to the main branch | `['listener_core', 'project', 'file_analyzer', 'package_summary']` | 0.50 `['project', 'listener_core', 'localizer', 'change_suggester', 'issue_analyzer']` | 0.25 `['listener_core', 'change_suggester', 'localizer', 'onboard_agent', 'issue_analyzer']` | 0.50 `['listener_core', 'localizer', 'change_suggester', 'project', 'issue_analyzer']` | 1.00 `['file_analyzer', 'package_summary', 'listener_core', 'project']` |
| 5 | API based onboarding for a new project | `['listener_core', 'flask_server', 'lambda_function', 'project_manager', 'project']` | 0.80 `['onboard_agent', 'flask_server', 'listener_core', 'lambda_function', 'project_manager']` | 0.80 `['onboard_agent', 'flask_server', 'listener_core', 'lambda_function', 'project']` | 0.80 `['onboard_agent', 'flask_server', 'listener_core', 'lambda_function', 'project']` | 0.60 `['flask_server', 'lambda_function', 'project_info', 'project_manager', 'onboard_agent']` |
| 6 | Move lambda function within the se_agent package structure | `['lambda_function']` | 1.00 `['lambda_function', 'localizer', 'onboard_agent', 'project', '__init__']` | 1.00 `['lambda_function', 'onboard_agent', '__init__', 'listener_core', 'localizer']` | 1.00 `['lambda_function', 'localizer', 'listener_core', 'change_suggester', 'package_summary']` | 1.00 `['lambda_function', '__init__']` |
| 7 | Use structured output for semantic summary generation | `['localizer', 'file_analyzer', 'package_summary', 'project']` | 0.95 `['package_summary', 'change_suggester', 'localizer', 'project', 'file_analyzer']` | 0.75 `['change_suggester', 'package_summary', 'file_analyzer', 'localizer', 'api']` | 0.75 `['package_summary', 'file_analyzer', 'change_suggester', 'localizer', 'api']` | 0.95 `['file_analyzer', 'package_summary', 'localizer', 'listener_core', 'project']` |
| 8 | Tool based (no LLM) code structure name generation | `['package_summary', 'project']` | 0.70 `['package_summary', 'change_suggester', 'localizer', 'file_analyzer', 'project']` | 0.50 `['package_summary', 'change_suggester', 'file_analyzer', 'localizer', 'api']` | 0.50 `['package_summary', 'change_suggester', 'file_analyzer', 'localizer', 'api']` | 0.50 `['package_summary', 'file_analyzer']` |
| 9 | Retrieval based localization | `['localizer', 'project', 'api', 'model_configuration_manager']` | 0.95 `['localizer', 'change_suggester', 'project', 'api', 'model_configuration_manager']` | 0.50 `['localizer', 'change_suggester', 'api', 'listener_core', 'package_summary']` | 0.75 `['localizer', 'change_suggester', 'api', 'project', 'listener_core']` | 0.75 `['localizer', 'api', 'onboard_agent', 'project']` |
| 10 | Checkpoint | `['project']` | 1.00 `['project', 'onboard_agent', 'model_configuration_manager', 'api', 'listener_core']` | 0.40 `['api', 'onboard_agent', 'listener_core', 'project', 'localizer']` | 0.80 `['onboard_agent', 'project', 'api', 'flask_server', 'listener_core']` | 1.00 `['project']` |


**Clean up the temporary directory for combined files**

In [None]:
shutil.rmtree(combined_docs_dir)
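The cell above removes only `combined_docs_dir`. `create_vector_store` also creates one temporary `.db` file per strategy with `delete=False`, and those files are left behind because `SemanticVectorSearchStrategy` does not keep their paths. A minimal sketch, assuming you record each `uri` yourself (for example in a hypothetical `self.db_path` attribute), is a helper that removes directories and files and tolerates paths that are already gone:

```python
import os
import shutil
import tempfile
from typing import Iterable


def remove_paths(paths: Iterable[str]) -> None:
    """Delete each path if it still exists: directories recursively, files individually."""
    for path in paths:
        if os.path.isdir(path):
            shutil.rmtree(path)
        elif os.path.exists(path):
            os.remove(path)


# Demo on throwaway paths; in the notebook the list would be [combined_docs_dir] plus the recorded .db paths
demo_dir = tempfile.mkdtemp()
fd, demo_db = tempfile.mkstemp(suffix=".db")
os.close(fd)
remove_paths([demo_dir, demo_db])
remove_paths([demo_dir, demo_db])  # a second call is a no-op
```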