In [23]:
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

In [24]:
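import os

# Send a trace of every chain run in this notebook to LangSmith.
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = 'langchain_api_key'  # placeholder: substitute your own LangSmith key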

In [25]:
os.environ['OPENAI_API_KEY'] = 'model_api_key'  # placeholder: substitute your own OpenAI key
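
Writing keys into a notebook cell makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the keys are already exported in the shell that launched Jupyter; `require_env` is a hypothetical helper, not part of LangChain or the OpenAI client:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early instead of letting the first API call fail with an auth error.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("OPENAI_API_KEY")` and `require_env("LANGCHAIN_API_KEY")` before building the model would then replace the two hard-coded assignments.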

In [41]:
# temperature=0 keeps the grading as repeatable as the model allows.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

In [42]:
prompt_template = PromptTemplate(
    input_variables=["source", "answer"],
    template="""
You are an evaluator for a knowledge model. 
Your task is to assess the COMPLETENESS of the provided ANSWER in relation to the SOURCE text.

SOURCE:
{source}

ANSWER:
{answer}

### Evaluation Criteria:
Determine how well the ANSWER covers all important information from the SOURCE.
Consider whether it includes key facts, context, and relevant details. Give a score from 1 to 5 and briefly justify it.

### Scoring Guidelines:
- **1 – Very Incomplete**: Minimal response, missing most key information from the source.  
- **2 – Somewhat Incomplete**: Contains some relevant information but omits MOST important details, context, or clarity.  
- **3 – Moderately Complete**: Covers a fair portion of the important information but misses a FEW significant points.  
- **4 – Mostly Complete**: Thorough and addresses most aspects, but leaves a small gap or could use minor additional detail.  
- **5 – Fully Complete**: Comprehensive, covering all important aspects of the source with no gaps in explanation.

Respond in JSON format:
{{
"score": <number>,
"justification": "<reason>"
}}
"""
)
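
The doubled braces around the JSON skeleton are escapes: `PromptTemplate` uses Python's `str.format` rules by default, so `{{` and `}}` render as literal braces while `{source}` and `{answer}` are substituted. The sketch below shows the same effect with plain `str.format`, without calling the model; `mini_template` and the sample strings are illustrative stand-ins, not the full prompt above.

```python
# Shortened stand-in for the evaluation prompt (illustrative only).
mini_template = """SOURCE:
{source}

ANSWER:
{answer}

Respond in JSON format:
{{
"score": <number>,
"justification": "<reason>"
}}
"""

# Single braces are filled in; doubled braces collapse to literal ones.
rendered = mini_template.format(source="Cats sleep a lot.", answer="Cats sleep.")
print(rendered)
```

Writing the JSON skeleton with single braces would instead make the formatter treat its contents as a replacement field and raise an error.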

In [43]:
# Pipe the filled-in prompt into the chat model (LangChain Expression Language).
evaluation_chain = prompt_template | llm

In [45]:
source_text = """
It is argued that the sports facilities should be increased in number to improve citizens’ health, while others claim that other initiatives are more essential to be conducted. While I support the idea that installing more sports facilities would help ordinary people to enhance their general health, I am more convinced that other effective measures should be taken. 

On the one hand, people’s general health status could have been improved greatly via exercising. It is proven that working out fastens the amount of oxygen to the brain, helping people be more concentrative and optimistic. Therefore, lack of physical exercise or insufficient physical movements one’s working performance may be impacted and less productive. For example, Hanoi citizens are reported to be healthier than they were because of the availability of exercise equipment right at the local parts. However, I believe that this measure just improves partially not whole the public’s health. 

On the other hand, there is a wide range of conducts to prevents poor health conditions. Improving diet quality is one of the effective measures that should not be neglected. A good physical health is indeed contributed by many elements, and a full nutrient meal makes consumers stronger and strongly resistant to some diseases. In Vietnam, there used to be a program of introducing milk into daily meals to deter malnutrition for children. After 2 years of conducting this campaign, the number of underweight children was minimised noticeably. Therefore, I completely advocate other solutions to implement to warrant the public’s general health. 

In conclusion, although launching more sports facilities would benefit the overall health of citizens, I think that this matter could be addressed better by other methods.
"""

In [46]:
answer_text = """
According to the author, overall health of citizens could be addressed by other methods because health is influenced by many factors, not just exercise. 
"""

# Alternative, more detailed answer; uncomment to compare its score with the short answer above.
# answer_text = """
# According to the author, the overall health of citizens can be addressed by methods other than launching more sports facilities because health depends on multiple factors, not solely on access to exercise. While the author acknowledges that increasing sports facilities can encourage physical activity and improve concentration, optimism, and productivity—as seen in Hanoi, where citizens became healthier due to local exercise equipment—this approach only partially improves public health. 

# """

In [47]:
result = evaluation_chain.invoke({"source": source_text, "answer": answer_text})

In [48]:
print(result.content)

{
  "score": 3,
  "justification": "The ANSWER captures the main idea that health is influenced by multiple factors beyond just exercise, which aligns with the SOURCE's argument. However, it lacks specific details and examples provided in the SOURCE, such as the benefits of exercise, the impact of diet on health, and the specific example of the milk program in Vietnam. This omission of key facts and context results in a moderately complete response."
}
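
`result.content` is a plain string, so the score has to be parsed before it can be aggregated or compared across answers. A minimal sketch of one way to do that with the standard library, assuming the model may wrap the object in a Markdown fence or add surrounding text; `parse_evaluation` and the sample string are hypothetical, not part of LangChain:

```python
import json
import re


def parse_evaluation(raw: str) -> dict:
    """Extract and validate the score/justification object from model text."""
    # Take the outermost {...} span so fences or stray prose are ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))

    score = data.get("score")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    if not isinstance(data.get("justification"), str):
        raise ValueError("justification must be a string")
    return data


# Simulated model output wrapped in a Markdown fence (made-up sample).
fence = "`" * 3
sample = fence + 'json\n{"score": 3, "justification": "Main idea only."}\n' + fence
parsed = parse_evaluation(sample)
print(parsed["score"], parsed["justification"])
```

In the notebook this would be called as `parse_evaluation(result.content)`. Validating the range catches replies that drift outside the 1 to 5 scale defined in the prompt.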
