# Preview from LangChain

## Step by step code

1. get LangSmith API Key from environment
2. set up anthropic key for chat model
3. set up embedding model for embeddings
4. select vector database
5. set up document loader and split up

## Set up API keys


## Tutorials

LangChain RAG tutorial document
Part 1
https://python.langchain.com/docs/tutorials/rag

Part 2
extends the implementation to accommodate conversation-style interactions and
multi-step retrieval processes.
https://python.langchain.com/docs/tutorials/qa_chat_history/

LangChain document loader for GitHub Repo
https://python.langchain.com/docs/integrations/document_loaders/github/

LangChain document loader for Git Repository
https://python.langchain.com/docs/integrations/document_loaders/git/

LangChain document loader for Source Code (e.g. Python)
https://python.langchain.com/docs/integrations/document_loaders/source_code/

LangSmith evaluation for a chatbot
https://docs.smith.langchain.com/evaluation/tutorials/evaluation

LangSmith evaluation for a rag
https://docs.smith.langchain.com/evaluation/tutorials/rag


- [x] set up LangSmith key


In [1]:
import os
%reload_ext dotenv
%dotenv
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY")


- [x] set up Anthropic key


In [261]:
if not os.environ.get("ANTHROPIC_API_KEY"):
    os.environ["ANTHROPIC_API_KEY"] = os.getenv("ANTHROPIC_API_KEY")

from langchain.chat_models import init_chat_model

llm = init_chat_model(
    "claude-3-5-sonnet-latest", model_provider="anthropic", temperature=0.5
)

- [x] set up Google gemini as embedding model


In [3]:
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY")

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

- [x] set up vector database

# Choosing database

Load, chunk, split, embed and vectorize code data and document data into database

## Candidates

1. cassandra
2. open search https://opensearch.org/platform/os-search/vector-database/
3. Pinecone
4. MongoDB
5. PostgreSQL
6. [x] Chroma, locally hosted with sqlite


In [4]:
from langchain_chroma import Chroma

# huggingface_store = Chroma(
#    collection_name="hf_python_tech_credit",
#    embedding_function=embeddings,
#    persist_directory="./chroma_langchain_db",
# )

vector_store = Chroma(
    collection_name="java_tech_credit",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

document_store = Chroma(
    collection_name="document_tech_credit",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",
)

- [ ] Import and load a GitHub Repo as a document


In [5]:
from langchain_community.document_loaders import GithubFileLoader
from langchain_community.document_loaders import JSONLoader
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph

# from code_splitter import Language, TiktokenSplitter

if not os.environ.get("GITHUB_PERSONAL_ACCESS_TOKEN"):
    os.environ["GITHUB_PERSONAL_ACCESS_TOKEN"] = os.getenv(
        "GITHUB_PERSONAL_ACCESS_TOKEN"
    )

# Load and chunk contents of the github repo
loader = GithubFileLoader(
    repo="alexsun2/TC-Examples",  # the repo name
    branch="main",  # the branch name
    # access_token=ACCESS_TOKEN, # delete/comment out this argument if you've set the access token as an env var.
    github_api_url="https://api.github.com",
    # parser=LanguageParser(language=Language.JAVA, parser_threshold=200),
    file_filter=lambda file_path: file_path.endswith(".java"),  # load all java files.
)
documents = loader.load()

- [ ] test documents content


In [6]:
print(documents[7].metadata)

{'path': 'Builder/ComputerExample/Computer.java', 'sha': 'cb2675d568a8a155a59e62f7b38d02a30240fe5c', 'source': 'https://api.github.com/alexsun2/TC-Examples/blob/main/Builder/ComputerExample/Computer.java'}


- [ ] map metadata


In [7]:
import json

# Step 1: Load the JSON metadata from a file (adjust path accordingly)
with open("./repo_metadata.json", "r", encoding="utf-8") as f:
    json_metadata_list = json.load(f)

# The sample JSON will be like a list of dicts, for example:
# [
#   {"path": "src/pybreaker.py", "type": "source", "tech-credit": "Circuit Breaker", "tech_credit_description": "good design"},
#   {"path": "test/unitest_pybreaker.py", "type": "test", "tech-credit": "Circuit Breaker", "tech_credit_description": "good design"}
# ]

# Step 2: Create a dictionary mapping from path to metadata (excluding 'path' key)
metadata_map = {
    entry["path"]: {k: v for k, v in entry.items() if k != "path"}
    for entry in json_metadata_list
}

- [x] use code splitter


In [8]:
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# from code_splitter import Language, TiktokenSplitter
# use code-splitter
# https://pypi.org/project/code-splitter/
# python_splitter = TiktokenSplitter(Language.Python, max_size=200)
java_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.JAVA, chunk_size=200
).from_tiktoken_encoder()

all_splits = [
    Document(
        page_content=(
            # "# ===== code structure =====\n"
            # + "\n".join(f"# {line}" for line in splits.subtree.splitlines())
            # + "\n\n"
            splits
        ),
        metadata=metadata_map.get(doc.metadata.get("path"), {}),
    )
    for doc in documents
    for splits in java_splitter.split_text(doc.page_content)
]

- [ ] print page_content and metadata for a split


In [9]:
print(all_splits[20].page_content)
print(all_splits[20].metadata)

// TC_TYPE: iterator 

package Iterator.JavaExample1;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Aggregate class that holds a collection of books
 * Important to note: an iterator needs to implement the Iterator interface
 * (java.util.Iterator).
 * The iterable needs to implement the Iterable interface and provide an
 * iterator method.
 * If the class does not implement Iterable, it is not an iterable.
 */
public class BookCollection implements Iterable<Book> {

    private List<Book> books = new ArrayList<>();

    public void addBook(Book book) {
        books.add(book);
    }

    @Override
    public Iterator<Book> iterator() {
        return new BookIterator(books);
    }
}
{'type': 'source', 'tech_credit': 'Iterator', 'tech_credit_description': 'Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation. An iterator needs to implement the Iterator interface, which defines me

- [ ] load, split and embed documents into vector database


- [x] index chunks for code vector db, DO NOT LOAD TWICE!


In [10]:
# Index chunks
code_embed_index = vector_store.add_documents(documents=all_splits)

- [ ] load, chunk, split and embed documents about technical credit


In [11]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and chunk contents of the blog
bs4_strainer = bs4.SoupStrainer(
    class_=(
        "article-header__section",
        "article-header__topic-and-issue-section",
        "article-header article-header__title",
        "article-header__subtitle",
        "article-header__meta",
        "article-table-of-contents",
        "article-contents",
        "article-footer",
    )
)

doc_loader = WebBaseLoader(
    web_paths=("https://cacm.acm.org/opinion/technical-credit/",),
    bs_kwargs=dict(parse_only=bs4_strainer),
)

text_documents = doc_loader.load()
# recurisive splitter , 7 , all splits

USER_AGENT environment variable not set, consider setting it to identify your requests.


- [x] test the web page document


In [12]:
assert len(text_documents) == 1
print(f"Total characters: {len(text_documents[0].page_content)}")
print(text_documents[0].page_content[:500])

Total characters: 14484
Opinion

Computing Profession 


Balancing initial investment and long-term results in the software development process.


				By Ian Gorton, Alessio Bucaioni, and Patrizio Pelliccione 

Posted Dec 26 2024 



What Is Technical Credit?
Technical Credit in Practice
A Research Agenda for Technical Credit
Conclusion
References
Footnotes




Technical debt (TD) is an established concept in software engineering encompassing an unavoidable side effect of software development.3 It arises due to tight s


- [ ] split the document


In [13]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
text_splits = text_splitter.split_documents(text_documents)

print(f"Split article into {len(text_splits)} sub-documents.")

Split article into 20 sub-documents.


- [x] embed the documents into vector database, DO NOT LOAD TWICE!


In [14]:
document_ids = document_store.add_documents(documents=text_splits)

# RAG System Part

## Customize Prompt

## Define nodes and graphs in the rag system

1. [x] retrieve code and metadata
2. [x] retrieve academic documents
3. [x] send message to LLM


- [ ] design prompt template.


In [15]:
# Define prompt for question-answering
# N.B. for non-US LangSmith endpoints, you may need to specify
# api_url="https://api.smith.langchain.com" in hub.pull.
from langchain_core.prompts import ChatPromptTemplate
from jinja2 import Environment, BaseLoader
import textwrap

jinja2_prompt = """\
{% for part in parts %}
Here is the No. {{ part.ordinal }} part of a tech credit
Descirption:
{{ part.tech_credit }}

Example code for that tech credit:
{{ part.context_code }}

Here is the code from user:
{{ part.user_code }}
{% endfor %}
"""

user_prompt_template = Environment(loader=BaseLoader).from_string(jinja2_prompt)

prompt = ChatPromptTemplate(
    [
        (
            "system",
            "You are an assistant for identifying technical credit. Use the following pieces \
                of retrieved context to answer the question. If you don't know the answer, just \
                say that you don't know. Please only use the provided technical credit categories. For each code snippet, keep your answer as \
                    concise as possible, while identifying as many technical credits as possible.",
        ),
        (
            "user",
            textwrap.dedent(
                """\
                The list of technical credit categories you must follow:
                {tech_credit_list}
                
                Some documentation about tech credit:
                {context_doc}

                The following are snippets of codes that are most similar to example codes of 
                tech credits.
                {rendered}
                
                Question: {question}
                Answer:
                """
            ),
        ),
    ]
)

- [ ] collect metadata from code


In [16]:
def collect_unique_pairs(documents):
    """
    Collect unique concatenated 'tech_credit: description' strings from document metadata.

    Args:
        documents (list[dict]): A list of Document objects, each with a 'metadata' field.

    Returns:
        list[str]: A list of unique 'tech_credit: description' strings.
    """
    seen = set()

    for doc in documents:
        metadata = doc.metadata
        credit = metadata.get("tech_credit")
        description = metadata.get("tech_credit_description")
        if credit and description:
            combined = f"{credit}: {description}"
            seen.add(combined)

    return list(seen)

In [249]:
from urllib.parse import urlparse
from typing import List


def load_repo(url: str, branch: str, folder: str = "") -> List[Document]:
    """
    Loads all Python files from a GitHub repository using the repository URL.

    Args:
        url (str): The full URL of the GitHub repository (e.g., "https://github.com/org/project").

    Returns:
        List[Document]: A list of loaded document objects (the format depends on GithubFileLoader).

    Raises:
        ValueError: If the URL is not a valid GitHub repo URL.
    """
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError(f"URL is not a github.com repo: {url}")

    # The path is like '/org/project' or '/org/project/'
    path_parts = parsed.path.strip("/").split("/")
    if len(path_parts) < 2:
        raise ValueError(f"Invalid GitHub repository URL: {url}")
    repo_name = "/".join(path_parts[:2])  # Only org/project, ignore any deeper paths

    loader = GithubFileLoader(
        repo=repo_name,
        branch=branch,
        repo_path=folder,
        # access_token=ACCESS_TOKEN,
        github_api_url="https://api.github.com",
        file_filter=lambda file_path: not file_path.startswith("tests/")
        and file_path.endswith(".java")
        and file_path.startswith(folder+"/"),
    )
    documents = loader.load()
    return documents


def split_documents(documents: List[Document]) -> List[str]:
    """
    Splits Python code documents into code snippets and prepends the code structure as a comment header.

    Args:
        documents (List[Document]): List of Document objects containing Python code.

    Returns:
        List[Document]: List of new strings, each with a code structure comment followed by the code snippet.
    """

    all_splits = []
    for doc in documents:
        splits = java_splitter.split_text(doc.page_content)
        for snippet in splits:
            # delete the header for now, only splitting the literal source code
            # header = (
            #    "# ===== code structure =====\n" +
            #    "\n".join(f"# {line}" for line in snippet.subtree.splitlines()) +
            #    "\n\n"
            # )
            all_splits.append(snippet)
    return all_splits

In [250]:
import heapq
from typing import Callable
from statistics import mean, median  # for later usage of different similarity score


def default_min_score_fn(results: list[tuple[Document, float]]) -> float:
    """Default scoring function: returns the minimum score."""
    return min(score for _, score in results)


def top_k_similar_queries(
    queries: list[str],
    vectorstore,
    k: int = 3,
    scoring_fn: Callable[[list[tuple[Document, float]]], float] = default_min_score_fn,
    top_docs_per_query: int = 4,
) -> list[tuple[str, list[Document], float]]:
    """
    Executes similarity_search_with_score for each query and returns top-k queries
    sorted by a user-defined scoring function.

    Args:
        queries: List of query strings.
        vectorstore: A LangChain-compatible vector store.
        k: Number of top results to return.
        scoring_fn: Function that maps a list of (Document, score) to a float score.
        top_docs_per_query: Number of documents to retrieve for each query.

    Returns:
        A list of (query, documents, aggregated_score) sorted by aggregated_score descending.
    """
    heap = []

    for query in queries:
        results = vectorstore.similarity_search_with_score(query, k=top_docs_per_query)
        if not results:
            continue

        agg_score = scoring_fn(results)

        # if agg_score > 0.65:
        #    continue

        docs = [doc for doc, _ in results]

        # Use negative score to simulate a max-heap
        heapq.heappush(heap, (-agg_score, query, docs))
        if len(heap) > k:
            heapq.heappop(heap)

    # Return sorted results: highest score first
    top_k = sorted([(-score, query, docs) for score, query, docs in heap], reverse=True)
    return [(query, docs, score) for score, query, docs in top_k]

In [251]:
from typing_extensions import List, TypedDict


# Define state for application
class State(TypedDict):
    question: str
    context_doc: List[Document]
    # a part that contains example codes, user code, tech credit and index order
    parts: list[dict]
    url: str  # user code
    branch: str
    folder: str
    answer: str


# Define application steps
def retrieve(state: State):
    repo_splits = split_documents(
        load_repo(state["url"], state["branch"], state.get("folder"))
    )
    retrieved_docs = top_k_similar_queries(repo_splits, vector_store)
    parts = [
        {
            "ordinal": i + 1,
            "tech_credit": "\n".join(collect_unique_pairs(context_code)),
            "user_code": user_code,
            "context_code": "\n\n".join(doc.page_content for doc in context_code),
        }
        for i, (user_code, context_code, _) in enumerate(retrieved_docs)
    ]
    return {"parts": parts}


def retrieve_doc(state: State):
    retrieved_doc = document_store.similarity_search(state["question"])
    return {"context_doc": retrieved_doc}


def load_tech_credit_categories() -> list[str]:
    # Define the loader for the JSON file in the GitHub repo
    loader = GithubFileLoader(
        repo="alexsun2/TC-Examples",  # The repo name
        branch="main",  # The branch name
        github_api_url="https://api.github.com",
        file_filter=lambda file_path: file_path.endswith("tech_credit_patterns.json"),
    )

    # Load the JSON document from the repo
    documents = loader.load()

    # Assuming the content is in the first document (if it's only one file)
    tech_credit_patterns_json = json.loads(documents[0].page_content)

    # Extract pattern names
    pattern_names = [entry["pattern_name"] for entry in tech_credit_patterns_json]

    return pattern_names


def generate(state: State):
    doc_content = "\n\n".join(doc.page_content for doc in state["context_doc"])
    user_prompt = user_prompt_template.render(parts=state["parts"])
    java_splitter = RecursiveCharacterTextSplitter(
        Language.JAVA, chunk_size=200
    ).from_tiktoken_encoder()
    tech_credit_categories = load_tech_credit_categories()

    messages = prompt.invoke(
        {
            "question": state["question"],
            "rendered": user_prompt,
            "context_doc": doc_content,
            "tech_credit_list": tech_credit_categories,
        }
    )

    print(messages)

    with open("context_doc_content.txt", "w", encoding="utf-8") as f:
        f.write(messages.to_string())

    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_node(retrieve_doc)
graph_builder.add_edge(START, "retrieve")
graph_builder.add_edge(START, "retrieve_doc")
graph = graph_builder.compile()

# Ask Question Part

## Circuit Breaker

## MVC model

## Iterator pattern


- [ ] Ask questions and test RAG


In [263]:
response = graph.invoke(
    {
        "question": "Tell me what tech credits does the repo possibly use?",
        "url": "https://github.com/alexsun2/cs3500lab9",
        "branch": "main",
        "folder": "src",
    }
)
print(response["answer"])

messages=[SystemMessage(content="You are an assistant for identifying technical credit. Use the following pieces                 of retrieved context to answer the question. If you don't know the answer, just                 say that you don't know. Please only use the provided technical credit categories. For each code snippet, keep your answer as                     concise as possible, while identifying as many technical credits as possible.", additional_kwargs={}, response_metadata={}), HumanMessage(content='The list of technical credit categories you must follow:\n[\'Circuit Breaker\', \'Strategy Pattern\', \'Adapter Pattern\', \'Builder Pattern\', \'Model View Controller\', \'Visitor Pattern\', \'Decorator Pattern\', \'Iterator\', \'Platform Abstraction Layers\', \'Proxy\', \'Chain of responsibility\', \'Template method\', \'Front controller\', \'Command\', \'Interpreter\', \'Mediator\', \'Memento\', \'Observer or Publish/subscribe\', \'Servant\', \'Active Object\', \'Event-based

In [264]:
response = graph.invoke(
    {
        "question": "Tell me what tech credits does the repo possibly use?",
        "url": "https://github.com/iluwatar/java-design-patterns/",
        "branch": "master",
        "folder": "command",
    }
)
print(response["answer"])

messages=[SystemMessage(content="You are an assistant for identifying technical credit. Use the following pieces                 of retrieved context to answer the question. If you don't know the answer, just                 say that you don't know. Please only use the provided technical credit categories. For each code snippet, keep your answer as                     concise as possible, while identifying as many technical credits as possible.", additional_kwargs={}, response_metadata={}), HumanMessage(content='The list of technical credit categories you must follow:\n[\'Circuit Breaker\', \'Strategy Pattern\', \'Adapter Pattern\', \'Builder Pattern\', \'Model View Controller\', \'Visitor Pattern\', \'Decorator Pattern\', \'Iterator\', \'Platform Abstraction Layers\', \'Proxy\', \'Chain of responsibility\', \'Template method\', \'Front controller\', \'Command\', \'Interpreter\', \'Mediator\', \'Memento\', \'Observer or Publish/subscribe\', \'Servant\', \'Active Object\', \'Event-based

In [None]:
response = graph.invoke(
    {
        "question": "Tell me what tech credits does the repo possibly use?",
        "url": "https://github.com/iluwatar/java-design-patterns/",
        "branch": "master",
        "folder": "iterator",
    }
)
print(response["answer"])

messages=[SystemMessage(content="You are an assistant for identifying technical credit. Use the following pieces                 of retrieved context to answer the question. If you don't know the answer, just                 say that you don't know. Please only use the provided technical credit categories. For each code snippet, keep your answer as                     concise as possible, while identifying as many technical credits as possible.", additional_kwargs={}, response_metadata={}), HumanMessage(content='The list of technical credit categories you must follow:\n[\'Circuit Breaker\', \'Strategy Pattern\', \'Adapter Pattern\', \'Builder Pattern\', \'Model View Controller\', \'Visitor Pattern\', \'Decorator Pattern\', \'Iterator\', \'Platform Abstraction Layers\', \'Proxy\', \'Chain of responsibility\', \'Template method\', \'Front controller\', \'Command\', \'Interpreter\', \'Mediator\', \'Memento\', \'Observer or Publish/subscribe\', \'Servant\', \'Active Object\', \'Event-based

In [269]:
response = graph.invoke(
    {
        "question": "What are all of the possible technical credits in this repo?",
        "url": "https://github.com/alexsun2/cs3500hw7",
        "branch": "master",
        "folder": "src"
    }
)
print(response["answer"])

messages=[SystemMessage(content="You are an assistant for identifying technical credit. Use the following pieces                 of retrieved context to answer the question. If you don't know the answer, just                 say that you don't know. Please only use the provided technical credit categories. For each code snippet, keep your answer as                     concise as possible, while identifying as many technical credits as possible.", additional_kwargs={}, response_metadata={}), HumanMessage(content='The list of technical credit categories you must follow:\n[\'Circuit Breaker\', \'Strategy Pattern\', \'Adapter Pattern\', \'Builder Pattern\', \'Model View Controller\', \'Visitor Pattern\', \'Decorator Pattern\', \'Iterator\', \'Platform Abstraction Layers\', \'Proxy\', \'Chain of responsibility\', \'Template method\', \'Front controller\', \'Command\', \'Interpreter\', \'Mediator\', \'Memento\', \'Observer or Publish/subscribe\', \'Servant\', \'Active Object\', \'Event-based

In [270]:
response = graph.invoke(
    {
        "question": "What are all of the possible technical credits in this repo?",
        "url": "https://github.com/dreifusjack/ThreeTrios",
        "branch": "main",
        "folder": "src"
    }
)
print(response["answer"])

messages=[SystemMessage(content="You are an assistant for identifying technical credit. Use the following pieces                 of retrieved context to answer the question. If you don't know the answer, just                 say that you don't know. Please only use the provided technical credit categories. For each code snippet, keep your answer as                     concise as possible, while identifying as many technical credits as possible.", additional_kwargs={}, response_metadata={}), HumanMessage(content='The list of technical credit categories you must follow:\n[\'Circuit Breaker\', \'Strategy Pattern\', \'Adapter Pattern\', \'Builder Pattern\', \'Model View Controller\', \'Visitor Pattern\', \'Decorator Pattern\', \'Iterator\', \'Platform Abstraction Layers\', \'Proxy\', \'Chain of responsibility\', \'Template method\', \'Front controller\', \'Command\', \'Interpreter\', \'Mediator\', \'Memento\', \'Observer or Publish/subscribe\', \'Servant\', \'Active Object\', \'Event-based

In [271]:
with open("repos.json", "r", encoding="utf-8") as f:
    data = json.load(f)
responses = {}
for key in data.keys():
    response = graph.invoke(data[key])
    responses[key] = response["answer"]
with open("responses.json", "w", encoding="utf-8") as f:
    json.dump(responses, f, indent=4)

messages=[SystemMessage(content="You are an assistant for identifying technical credit. Use the following pieces                 of retrieved context to answer the question. If you don't know the answer, just                 say that you don't know. Please only use the provided technical credit categories. For each code snippet, keep your answer as                     concise as possible, while identifying as many technical credits as possible.", additional_kwargs={}, response_metadata={}), HumanMessage(content='The list of technical credit categories you must follow:\n[\'Circuit Breaker\', \'Strategy Pattern\', \'Adapter Pattern\', \'Builder Pattern\', \'Model View Controller\', \'Visitor Pattern\', \'Decorator Pattern\', \'Iterator\', \'Platform Abstraction Layers\', \'Proxy\', \'Chain of responsibility\', \'Template method\', \'Front controller\', \'Command\', \'Interpreter\', \'Mediator\', \'Memento\', \'Observer or Publish/subscribe\', \'Servant\', \'Active Object\', \'Event-based

In [272]:
from langchain_anthropic import ChatAnthropic
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT
from openevals.prompts import HALLUCINATION_PROMPT

correctness_evaluator = create_llm_as_judge(
    prompt=CORRECTNESS_PROMPT,
    continuous=True,
    feedback_key="correctness",
    judge=ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0),
)

hallucination_evaluator = create_llm_as_judge(
    prompt=HALLUCINATION_PROMPT,
    continuous=True,
    feedback_key="hallucination",
    judge=ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0),
)

with open("repos.json", "r", encoding="utf-8") as f:
    inputs = json.load(f)
with open("responses.json", "r", encoding="utf-8") as f:
    outputs = json.load(f)
with open("references.json", "r", encoding="utf-8") as f:
    reference_outputs = json.load(f)
context = load_tech_credit_categories()

c_eval_result = correctness_evaluator(
    inputs=inputs, outputs=outputs, reference_outputs=reference_outputs
)
h_eval_result = hallucination_evaluator(inputs=inputs, outputs=outputs, context=context, reference_outputs=reference_outputs)

print(c_eval_result)
print(h_eval_result)

{'key': 'correctness', 'score': 0.8, 'comment': 'Let me evaluate the response against the reference outputs:\n\n1. Accuracy of Pattern Identification:\n- For most of the iluwatar and rg (RefactoringGuru) examples, the response correctly identified the single primary pattern implemented in each case (e.g., Adapter, Builder, Chain of Responsibility, etc.)\n- However, for the project examples (reversi-proj, aaron-reversi, zane-reversi, three-trios-proj, alex-three-trios), the response missed several patterns:\n  - Failed to identify Adapter pattern in reversi-proj and aaron-reversi\n  - Missed Adapter, Observer, and Decorator patterns in zane-reversi\n  - Missed Adapter and Observer patterns in three-trios-proj\n  - Missed MVC in alex-three-trios\n\n2. Completeness:\n- The response provided detailed explanations for each pattern identified\n- However, it was not comprehensive in identifying all patterns present in the project examples\n- About 80% of the patterns were correctly identified

# Roadmap:

1. Use a text embedding model.
   - [x] gemini text-embedding-004
2. Code splitter may not work properly.
   - [x] change to code-splitter that works better.
3. [x] Switch in memory vector storage to a vector database
   - [x] code vector database with additional metadata for tech credit
   - [x] document vector database with academic context about tech credit
4. [ ] Load more standard example codes for tech credit (~20 more examples)
5. [ ] Load more documents (academic contexts) for technical credit (3~5 related academic paper)
6. [x] Allow user to ask about a repository, instead of small snippets of codes
   - [x] load, chunk, split, embed and vectorize a repo
   - [x] similarity search and filter user code by similarity score compared with example code
   - [ ] batch process to LLM
   - [ ] organize response and answers
7. [ ] validate the results
   - [ ] Test our trained data in TC-Examples
   - [ ] use different embedding and LLM to test five example repos
   - Gemini,
   - claude-3.5-haiku,
8. Jun 6:
   1. [ ] choose five test repos
   2. [ ] two more LLM and one more embeddings models
   3. [ ] expected output:
      1. [ ] human expert
      2. [ ] LLM as judge
