In [12]:
response = {
  "user_query": "what kind of inspection required for this project?",
  "answer": "Special inspections are required for this project. Specifically:\n*   Where fabrication of a structural assembly is being performed on the premises of a fabricator's shop.\n*   Prior to the placement of fill, the special inspector shall determine that the site has been prepared in accordance with the contract documents.",
  "reasoning": "Document-1\nRelevance: Yes\nReasoning: The document specifies several types of inspections required, such as \"special inspections\" for structural assembly fabrication, \"Grade and mill test reports\" for steel elements, and site preparation inspection by a \"special inspector.\"\n\nDocument-2\nRelevance: No\nReasoning: The document discusses the reporting, maintenance, and content requirements for inspection reports, but it does not specify what kind of inspections are required for the project.\n\nDocument-3\nRelevance: No\nReasoning: The document focuses on the documentation and reporting procedures for inspections, including discrepancies and corrections, but it does not specify what kind of inspections are required.",
  "extracted_requirements": [
    {
      "id": None,
      "metadata": {
        "project": "Controlled Testing & Inspections for District Wide Additions & Alterations",
        "company": "Edgemont Union Free School District",
        "file_name": "rfp3"
      },
      "page_content": "e.​ Where fabrication of a structural assembly is being performed on the premises of a\nfabricators shop, special inspections shall be required.​\nf.​ Grade and mill test reports are required for main stress carrying steel elements​\ng.​ Prior to the placement of fill, the special inspector shall determine that the site has\nbeen prepared in accordance with the contract documents.​",
      "type": "Document"
    },
    {
      "id": None,
      "metadata": {
        "company": "Edgemont Union Free School District",
        "project": "Controlled Testing & Inspections for District Wide Additions & Alterations",
        "file_name": "rfp3"
      },
      "page_content": "b.​ Copies of all special inspection reports shall be provided to the Architect and\nOwner’s Representative/Construction Manager on a weekly basis. Any inspection/\nreport that indicates a failure or a deviation from the contract documents shall be\nprovided to the Architect and Owner’s representative immediately within (24\nhours of the inspection).​\nc.​ The testing agency shall be required to maintain a special inspection book. The\nbook shall be a three-ring binder which contains copies of all inspection reports.\nAt the completion of the project three (3) complete copies of the inspection report\nbook shall be turned over to the Owner.​\nd.​ All inspection reports shall include the date of the inspection, the specific location\nwhere the inspection was conducted and that the work inspected was done in\nconformance with the construction documents or shall be corrected, along with all\nother pertinent information.​",
      "type": "Document"
    },
    {
      "id": None,
      "metadata": {
        "company": "Edgemont Union Free School District",
        "project": "Controlled Testing & Inspections for District Wide Additions & Alterations",
        "file_name": "rfp3"
      },
      "page_content": "proposal. Proof of qualifications must be submitted to the Owner as part of the proposal.\nAll inspections must be documented with reports indicating that all work was done in conformance to\napproved construction documents, and be furnished to the Code Enforcement Official, Owner, its\ndesignee, and Architect. Any discrepancies must be documented appropriately and reported, as well as\ncorrections must also be documented and reported to the Code Enforcement Official, Owner, its\ndesignee, and Architect.",
      "type": "Document"
    }
  ]
}

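# Calculate the retrieved relevancy score
Use the `reasoning` output from the system response. The system generated this reasoning by judging how relevant each retrieved document is to the user query, so the share of documents labelled `Relevance: Yes` gives the retrieved relevancy score.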


In [5]:
print(response['reasoning'])

Document-1
Relevance: Yes
Reasoning: The document specifies several types of inspections required, such as "special inspections" for structural assembly fabrication, "Grade and mill test reports" for steel elements, and site preparation inspection by a "special inspector."

Document-2
Relevance: No
Reasoning: The document discusses the reporting, maintenance, and content requirements for inspection reports, but it does not specify what kind of inspections are required for the project.

Document-3
Relevance: No
Reasoning: The document focuses on the documentation and reporting procedures for inspections, including discrepancies and corrections, but it does not specify what kind of inspections are required.
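Counting the phrase only yields a total. A minimal sketch that instead recovers the label for each document, assuming the reasoning keeps the `Document-N` / `Relevance: Yes|No` layout printed above (`parse_relevance_labels` is a hypothetical helper, not part of the system):

```python
import re


def parse_relevance_labels(reasoning: str) -> dict[str, bool]:
    """Map each 'Document-N' block in the reasoning text to True (Yes) or False (No).

    Assumes a 'Document-N' line is immediately followed by a 'Relevance: Yes|No' line.
    """
    pattern = re.compile(
        r"^(Document-\d+)\s*\n\s*Relevance:\s*(Yes|No)\b",
        re.IGNORECASE | re.MULTILINE,
    )
    return {doc: label.lower() == "yes" for doc, label in pattern.findall(reasoning)}


reasoning = (
    "Document-1\nRelevance: Yes\nReasoning: ...\n\n"
    "Document-2\nRelevance: No\nReasoning: ...\n\n"
    "Document-3\nRelevance: No\nReasoning: ..."
)
labels = parse_relevance_labels(reasoning)
print(labels)  # {'Document-1': True, 'Document-2': False, 'Document-3': False}
```

Having one label per document also makes it possible to check that the number of labels matches `len(response['extracted_requirements'])` before computing a percentage.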


In [6]:
total_retrieved_docs = len(response['extracted_requirements'])
total_retrieved_docs

3

In [8]:
# count "relevance: yes" labels in the reasoning text (case-insensitive)
relevant_count = response['reasoning'].lower().count("relevance: yes")
relevant_count

1

In [9]:
relevant_doc_percentage = (relevant_count / total_retrieved_docs) * 100
relevant_doc_percentage

33.33333333333333
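The three cells above can be folded into one function. This sketch (hypothetical name `retrieved_relevancy_score`) adds assumptions of its own: it returns 0.0 when nothing was retrieved instead of dividing by zero, clamps the count to the number of documents, and rounds to two decimals like the combined scores computed later.

```python
def retrieved_relevancy_score(reasoning: str, total_docs: int) -> float:
    """Percentage of retrieved documents labelled 'Relevance: Yes' in the reasoning text."""
    if total_docs <= 0:
        # Nothing retrieved: report 0 instead of raising ZeroDivisionError.
        return 0.0
    relevant = reasoning.lower().count("relevance: yes")
    # Guard against the reasoning repeating a label more often than there are documents.
    relevant = min(relevant, total_docs)
    return round(relevant / total_docs * 100, 2)


reasoning = "Document-1\nRelevance: Yes\n\nDocument-2\nRelevance: No\n\nDocument-3\nRelevance: No"
print(retrieved_relevancy_score(reasoning, 3))  # 33.33
print(retrieved_relevancy_score("", 0))         # 0.0
```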

# Evaluate reasoning quality

In [51]:
eval_reasoning_quality_prompt = """You are an expert in evaluating the quality of reasoning provided by a system \
in response to a user query and relevant retrieved documents.
Given the reasoning text, user query and retrieved documents below, assess its quality based on the following criteria:
1. Clarity: Is the reasoning clearly articulated and easy to understand?
2. Relevance: Does the reasoning directly address the user query and the context provided?
3. Depth: Does the reasoning provide sufficient depth and detail to justify the conclusions drawn?
Provide a score from 1 to 10 for each criterion, where 1 is poor and 10 is excellent. Additionally, provide a brief explanation for each score.

<user_query>
{user_query}
</user_query>

<retrieved_documents>
{retrieved_documents}
</retrieved_documents>

<reasoning_text>
{reasoning_text}
</reasoning_text>

Please provide your evaluation in the following format:
Clarity Score: <score>
Clarity Explanation: <explanation>
Relevance Score: <score>
Relevance Explanation: <explanation>
Depth Score: <score>
Depth Explanation: <explanation>"""
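A malformed closing tag (for example `</user_query?` instead of `</user_query>`) slips through `str.format`, which only looks at `{...}` fields. A small check, sketched here under the assumption that every wrapper tag sits on a line of its own (which keeps inline placeholders such as `<score>` out of the check), lists the format fields and any tag that is opened but never closed; `check_prompt_template` is a hypothetical helper:

```python
import re
from string import Formatter


def check_prompt_template(template: str) -> tuple[set[str], list[str]]:
    """Return the template's {format} fields and any wrapper tag that is never closed.

    Only tags on a line of their own are checked, so inline placeholders
    such as "Clarity Score: <score>" are ignored.
    """
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    opened = re.findall(r"^<([a-z_]+)>[ \t]*$", template, re.MULTILINE)
    closed = set(re.findall(r"^</([a-z_]+)>[ \t]*$", template, re.MULTILINE))
    unclosed = [tag for tag in opened if tag not in closed]
    return fields, unclosed


template = """<user_query>
{user_query}
</user_query?

<retrieved_documents>
{retrieved_documents}
</retrieved_documents>"""

fields, unclosed = check_prompt_template(template)
print(sorted(fields))  # ['retrieved_documents', 'user_query']
print(unclosed)        # ['user_query']
```

Running it on each prompt template before calling the model flags such a tag before it reaches the evaluator.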

In [52]:
from typing import List

def format_retrieved_document(retrieved_docs: List[dict]):
    relevant_document = ""
    for i, doc in enumerate(retrieved_docs):
        relevant_document += f"Document-{i+1}:\n{doc['page_content']}\n\n"
    return relevant_document

formatted_retrieved_doc = format_retrieved_document(response["extracted_requirements"])

In [53]:
import os, getpass



def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("GOOGLE_API_KEY")

In [54]:
response["reasoning"]

'Document-1\nRelevance: Yes\nReasoning: The document specifies several types of inspections required, such as "special inspections" for structural assembly fabrication, "Grade and mill test reports" for steel elements, and site preparation inspection by a "special inspector."\n\nDocument-2\nRelevance: No\nReasoning: The document discusses the reporting, maintenance, and content requirements for inspection reports, but it does not specify what kind of inspections are required for the project.\n\nDocument-3\nRelevance: No\nReasoning: The document focuses on the documentation and reporting procedures for inspections, including discrepancies and corrections, but it does not specify what kind of inspections are required.'

In [55]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage


llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)
# Build the evaluation prompt (sent to the model as a single human message)
message = eval_reasoning_quality_prompt.format(
    user_query=response["user_query"],
    reasoning_text=response["reasoning"],
    retrieved_documents=format_retrieved_document(response["extracted_requirements"]),
)

# Ask the model to grade the reasoning
reasoning_eval_response = llm.invoke([HumanMessage(content=message)])

In [56]:
print(reasoning_eval_response.content)

Clarity Score: 9
Clarity Explanation: The reasoning is very clear and easy to understand. Each document's relevance is explicitly stated ("Yes" or "No"), followed by a concise and direct explanation that justifies the assessment.

Relevance Score: 10
Relevance Explanation: The reasoning directly addresses the user query "what kind of inspection required for this project?". For Document-1, it accurately extracts and lists the specific types of inspections mentioned. For Documents 2 and 3, it correctly identifies that they discuss reporting and documentation procedures rather than the *kind* of inspections, thus appropriately deeming them irrelevant to the specific query.

Depth Score: 9
Depth Explanation: The reasoning provides sufficient depth. For Document-1, it lists the specific types of inspections, which is adequate detail to answer the query. For Documents 2 and 3, it explains *why* they are not relevant by summarizing their content (reporting, documentation, etc.), which provide

In [57]:
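# Naive attempt (returns 0): the lowercased text can never contain the mixed-case
# "Clarity Score: yes", it searches response['reasoning'] rather than the evaluator
# output, and the evaluator gives numeric scores, not yes/no. The regex parser below replaces this.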
relevant_count = response['reasoning'].lower().count("Clarity Score: yes")
relevant_count

0

In [58]:
import re
import json

def extract_scores(text: str):
    """
    Parse '<Name> Score: <n>' / '<Name> Explanation: <text>' pairs into a
    JSON-serializable dict keyed by the lowercased criterion name.
    """
    pattern = r"""
        (?P<name>\w+)\s+Score:\s*(?P<score>\d+)\s*
        (?P<name2>\w+)\s+Explanation:\s*(?P<explanation>.*?)(?=\n\w+\s+Score:|\Z)
    """

    matches = re.finditer(pattern, text, re.DOTALL | re.VERBOSE)

    result = {}

    for m in matches:
        name = m.group("name").strip().lower()  # clarity, relevance, depth
        score = int(m.group("score"))
        explanation = m.group("explanation").strip()

        result[name] = {
            "score": score,
            "explanation": explanation
        }

    return result


# Example input text
text = """
Clarity Score: 9
Clarity Explanation: The reasoning is very clear and easy to understand. Each document's relevance is explicitly stated ("Relevance: Yes/No") followed by a concise explanation.

Relevance Score: 10
Relevance Explanation: The reasoning directly addresses the user query by identifying which documents specify the *kind* of inspections required and which ones do not, explaining why. The distinction between "kind of inspections" and "reporting/procedures" is accurately made.

Depth Score: 9
Depth Explanation: The reasoning provides sufficient detail to justify its conclusions. For Document-1, it lists specific examples of inspections. For Documents 2 and 3, it clearly explains what those documents *do* cover and why that doesn't answer the user's specific question about the *kind* of inspections. The level of detail is appropriate for evaluating document relevance.
"""

# Parse the scores into a dict
output = extract_scores(text)

# Pretty print JSON
print(json.dumps(output, indent=2))


{
  "clarity": {
    "score": 9,
    "explanation": "The reasoning is very clear and easy to understand. Each document's relevance is explicitly stated (\"Relevance: Yes/No\") followed by a concise explanation."
  },
  "relevance": {
    "score": 10,
    "explanation": "The reasoning directly addresses the user query by identifying which documents specify the *kind* of inspections required and which ones do not, explaining why. The distinction between \"kind of inspections\" and \"reporting/procedures\" is accurately made."
  },
  "depth": {
    "score": 9,
    "explanation": "The reasoning provides sufficient detail to justify its conclusions. For Document-1, it lists specific examples of inspections. For Documents 2 and 3, it clearly explains what those documents *do* cover and why that doesn't answer the user's specific question about the *kind* of inspections. The level of detail is appropriate for evaluating document relevance."
  }
}
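`extract_scores` expects the exact plain `Name Score: n` layout; if the model decorates its reply with Markdown (`**Clarity Score:** 9`) or omits a criterion, the later `output["clarity"]["score"]` lookup fails with a `KeyError`. A sketch of a more tolerant parser that validates the scores (hypothetical `extract_validated_scores`; it captures scores only, not explanations):

```python
import re

# Tolerates optional Markdown decoration such as "**Clarity Score:** 9" or "- Depth Score: 9".
SCORE_LINE = re.compile(
    r"^[ \t*#>-]*(?P<name>[A-Za-z]+) Score[*: \t]*(?P<score>\d+)", re.MULTILINE
)


def extract_validated_scores(
    text: str, expected: set[str], low: int = 1, high: int = 10
) -> dict[str, int]:
    """Parse '<Name> Score: <n>' lines; require every expected criterion, within [low, high]."""
    scores = {m["name"].lower(): int(m["score"]) for m in SCORE_LINE.finditer(text)}
    missing = expected - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for name in expected:
        if not low <= scores[name] <= high:
            raise ValueError(f"{name} score {scores[name]} outside {low}-{high}")
    return {name: scores[name] for name in sorted(expected)}


text = "**Clarity Score:** 9\nClarity Explanation: ...\n\n- Relevance Score: 10\n\nDepth Score: 9/10\n"
print(extract_validated_scores(text, {"clarity", "relevance", "depth"}))
# {'clarity': 9, 'depth': 9, 'relevance': 10}
```

Failing loudly with the list of missing criteria makes a malformed evaluator reply easier to diagnose than a `KeyError` several cells later.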


In [59]:
# Extract individual scores
clarity = output["clarity"]["score"]
relevance = output["relevance"]["score"]
depth = output["depth"]["score"]

# Calculate combined score out of 100
total_score = clarity + relevance + depth
combined_score = (total_score / 30) * 100

combined_score = round(combined_score, 2)

print("Combined Score (out of 100):", combined_score)

Combined Score (out of 100): 93.33
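The divisor 30 hard-codes three criteria scored out of 10. A sketch of the same formula generalized to any number of criteria (`combined_score` is a hypothetical helper):

```python
def combined_score(scores: dict[str, int], max_per_criterion: int = 10) -> float:
    """Sum of criterion scores as a percentage of the maximum possible, rounded to 2 decimals."""
    if not scores:
        raise ValueError("no scores to combine")
    max_total = max_per_criterion * len(scores)
    return round(sum(scores.values()) / max_total * 100, 2)


print(combined_score({"clarity": 9, "relevance": 10, "depth": 9}))        # 93.33
print(combined_score({"accuracy": 10, "completeness": 7, "clarity": 10}))  # 90.0
```

Both the reasoning-quality and the answer-relevancy evaluations use this formula, so one helper can serve both without editing the divisor when a criterion is added.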


# Evaluate the RAG answer
Check whether the generated RAG answer is relevant to the user query and supported by the retrieved documents.

In [61]:
answer_relevancy_prompt = """You are an expert in evaluating the relevancy of an answer provided by a system \
in response to a user query and relevant retrieved documents.
Given the answer, user query and retrieved documents below, assess its relevancy based on the following criteria:
1. Accuracy: Does the answer accurately address the user query based on the information from the retrieved documents?
2. Completeness: Does the answer provide a complete response, covering all necessary aspects of the user query?
3. Clarity: Is the answer clearly articulated and easy to understand?
Provide a score from 1 to 10 for each criterion, where 1 is poor and 10 is excellent. Additionally, provide a brief explanation for each score.
<user_query>
{user_query}
</user_query>

<relevant_documents>
{relevant_documents}
</relevant_documents>

<answer_text>
{answer}
</answer_text>

Please provide your evaluation in the following format:
Accuracy Score: <score>
Accuracy Explanation: <explanation>
Completeness Score: <score>
Completeness Explanation: <explanation>
Clarity Score: <score>
Clarity Explanation: <explanation>
"""

In [64]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage


llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)
# Build the evaluation prompt (sent to the model as a single human message)
message = answer_relevancy_prompt.format(
    user_query=response["user_query"],
    relevant_documents=format_retrieved_document(response["extracted_requirements"]),
    answer=response["answer"],
)

# Ask the model to grade the answer
answer_eval_response = llm.invoke([HumanMessage(content=message)])

In [67]:
print(answer_eval_response.content)

Accuracy Score: 10
Accuracy Explanation: The answer accurately extracts information directly from Document-1 regarding the types of special inspections required. All points mentioned in the answer are verifiable and correct based on the provided documents.

Completeness Score: 7
Completeness Explanation: The answer covers two specific types of special inspections mentioned in Document-1. However, it misses one additional type of required report/inspection mentioned in Document-1: "Grade and mill test reports are required for main stress carrying steel elements." Including this would make the answer more complete.

Clarity Score: 10
Clarity Explanation: The answer is very clearly articulated, starting with a general statement and then using bullet points to list the specific requirements. It is easy to read and understand.
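The two evaluation cells repeat the same model setup and call. A sketch of a shared helper, `run_llm_eval` (hypothetical), that fills a template, rejects missing or unused fields, and returns the reply text; `FakeLLM` is an illustrative stand-in so the helper can be exercised without an API key:

```python
from string import Formatter
from types import SimpleNamespace


def run_llm_eval(llm, template: str, **fields: str) -> str:
    """Fill an evaluation prompt, send it to a chat model, and return the reply text.

    `llm` only needs an `.invoke()` method whose result has a `.content` attribute.
    """
    expected = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing, unused = expected - set(fields), set(fields) - expected
    if missing or unused:
        # str.format raises on missing fields but silently ignores extra ones; catch both.
        raise ValueError(f"missing fields: {sorted(missing)}, unused fields: {sorted(unused)}")
    # LangChain chat models accept a plain string and treat it as one human message.
    return llm.invoke(template.format(**fields)).content


class FakeLLM:
    """Offline stand-in for a chat model that records the prompt it receives."""

    def __init__(self, reply: str):
        self.reply = reply
        self.last_prompt = None

    def invoke(self, prompt: str):
        self.last_prompt = prompt
        return SimpleNamespace(content=self.reply)


fake = FakeLLM("Accuracy Score: 10\nAccuracy Explanation: ok")
reply = run_llm_eval(
    fake,
    "<user_query>\n{user_query}\n</user_query>",
    user_query="what kind of inspection required for this project?",
)
print(reply.splitlines()[0])  # Accuracy Score: 10
```

With the real model this would be called as `run_llm_eval(llm, answer_relevancy_prompt, user_query=response["user_query"], relevant_documents=format_retrieved_document(response["extracted_requirements"]), answer=response["answer"])`.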


In [71]:
output = extract_scores(answer_eval_response.content)

In [72]:
# Extract individual scores
clarity = output["clarity"]["score"]
completeness = output["completeness"]["score"]
accuracy = output["accuracy"]["score"]

# Calculate combined score out of 100
total_score = clarity + completeness + accuracy
combined_score = (total_score / 30) * 100

combined_score = round(combined_score, 2)

print("Combined Score (out of 100):", combined_score)

Combined Score (out of 100): 90.0


# Test the packaged evaluation function

In [1]:
ls

eval.ipynb


In [2]:
import sys


sys.path.append("../../../")

In [3]:
response = {
  "user_query": "what kind of inspection required for this project?",
  "answer": "Special inspections are required for this project. Specifically:\n*   Where fabrication of a structural assembly is being performed on the premises of a fabricator's shop.\n*   Prior to the placement of fill, the special inspector shall determine that the site has been prepared in accordance with the contract documents.",
  "reasoning": "Document-1\nRelevance: Yes\nReasoning: The document specifies several types of inspections required, such as \"special inspections\" for structural assembly fabrication, \"Grade and mill test reports\" for steel elements, and site preparation inspection by a \"special inspector.\"\n\nDocument-2\nRelevance: No\nReasoning: The document discusses the reporting, maintenance, and content requirements for inspection reports, but it does not specify what kind of inspections are required for the project.\n\nDocument-3\nRelevance: No\nReasoning: The document focuses on the documentation and reporting procedures for inspections, including discrepancies and corrections, but it does not specify what kind of inspections are required.",
  "extracted_requirements": [
    {
      "id": None,
      "metadata": {
        "project": "Controlled Testing & Inspections for District Wide Additions & Alterations",
        "company": "Edgemont Union Free School District",
        "file_name": "rfp3"
      },
      "page_content": "e.​ Where fabrication of a structural assembly is being performed on the premises of a\nfabricators shop, special inspections shall be required.​\nf.​ Grade and mill test reports are required for main stress carrying steel elements​\ng.​ Prior to the placement of fill, the special inspector shall determine that the site has\nbeen prepared in accordance with the contract documents.​",
      "type": "Document"
    },
    {
      "id": None,
      "metadata": {
        "company": "Edgemont Union Free School District",
        "project": "Controlled Testing & Inspections for District Wide Additions & Alterations",
        "file_name": "rfp3"
      },
      "page_content": "b.​ Copies of all special inspection reports shall be provided to the Architect and\nOwner’s Representative/Construction Manager on a weekly basis. Any inspection/\nreport that indicates a failure or a deviation from the contract documents shall be\nprovided to the Architect and Owner’s representative immediately within (24\nhours of the inspection).​\nc.​ The testing agency shall be required to maintain a special inspection book. The\nbook shall be a three-ring binder which contains copies of all inspection reports.\nAt the completion of the project three (3) complete copies of the inspection report\nbook shall be turned over to the Owner.​\nd.​ All inspection reports shall include the date of the inspection, the specific location\nwhere the inspection was conducted and that the work inspected was done in\nconformance with the construction documents or shall be corrected, along with all\nother pertinent information.​",
      "type": "Document"
    },
    {
      "id": None,
      "metadata": {
        "company": "Edgemont Union Free School District",
        "project": "Controlled Testing & Inspections for District Wide Additions & Alterations",
        "file_name": "rfp3"
      },
      "page_content": "proposal. Proof of qualifications must be submitted to the Owner as part of the proposal.\nAll inspections must be documented with reports indicating that all work was done in conformance to\napproved construction documents, and be furnished to the Code Enforcement Official, Owner, its\ndesignee, and Architect. Any discrepancies must be documented appropriately and reported, as well as\ncorrections must also be documented and reported to the Code Enforcement Official, Owner, its\ndesignee, and Architect.",
      "type": "Document"
    }
  ]
}

In [5]:
from src.eval.eval_rag import evaluate_rag_response


evaluate_rag_response(response)

{'answer_relevancy_score': 33.33,
 'reasoning_quality_score': 93.33,
 'retrieved_relevancy_score': 33.33333333333333}
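The source of `src.eval.eval_rag` is not shown here, so the following is only a hypothetical sketch of how such a report could be assembled from the pieces computed interactively above; it reuses the key names from the printed dictionary and rounds all three scores the same way. Fed the per-criterion scores from the earlier cells, it gives 90.0 for the answer score, whereas the packaged call printed 33.33 for `answer_relevancy_score`, a difference worth checking in the packaged implementation.

```python
def build_rag_report(
    reasoning_scores: dict[str, int],
    answer_scores: dict[str, int],
    relevant_docs: int,
    total_docs: int,
) -> dict[str, float]:
    """Combine the three evaluation metrics into one dict of percentages (2 decimals)."""

    def pct(points: float, maximum: float) -> float:
        # Avoid dividing by zero when there is nothing to score.
        return round(points / maximum * 100, 2) if maximum else 0.0

    return {
        "answer_relevancy_score": pct(sum(answer_scores.values()), 10 * len(answer_scores)),
        "reasoning_quality_score": pct(sum(reasoning_scores.values()), 10 * len(reasoning_scores)),
        "retrieved_relevancy_score": pct(relevant_docs, total_docs),
    }


report = build_rag_report(
    {"clarity": 9, "relevance": 10, "depth": 9},
    {"accuracy": 10, "completeness": 7, "clarity": 10},
    relevant_docs=1,
    total_docs=3,
)
print(report)
# {'answer_relevancy_score': 90.0, 'reasoning_quality_score': 93.33, 'retrieved_relevancy_score': 33.33}
```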