# Overview
This notebook walks through the current RAG retrieval method used on an Obsidian notes directory containing knowledge on growing Cannabis in Living Soil.



# Load Obsidian Notes
The `load_obsidian_notes` method of the `IngestService` class takes a directory path to a folder within an Obsidian vault. A note within an Obsidian vault can carry a lot of metadata that adds context to the text of the note. LangChain's `ObsidianLoader` class populates the metadata of the nodes with the Obsidian note's frontmatter, tags, dataview fields, and file metadata.
- Frontmatter/tags transfer into the metadata of the nodes.
- The Headers provide natural splitting points for the text.

While testing, the `load_obsidian_notes` method can also take a list of strings, where each string is treated as the contents of a markdown file. This way, documents and nodes can be easily created as needed.
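To make the idea of note metadata concrete, here is a minimal sketch of pulling frontmatter keys and inline tags out of a note's text. The `extract_note_metadata` helper and the `source` frontmatter key are hypothetical and exist only for illustration; this is not how `ObsidianLoader` or `IngestService` is implemented.

```python
import re


def extract_note_metadata(note_text: str) -> dict:
    """Hypothetical helper: collect frontmatter keys and inline #tags from a note."""
    metadata = {}
    body = note_text
    # Frontmatter is a block delimited by '---' lines at the very top of the note.
    fm_match = re.match(r"^---\n(.*?)\n---\n", note_text, re.DOTALL)
    if fm_match:
        for line in fm_match.group(1).splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                metadata[key.strip()] = value.strip()
        body = note_text[fm_match.end():]
    # Inline tags look like #Some_Tag. Markdown headers have a space after '#',
    # so they are not matched.
    tags = re.findall(r"(?<!\S)#([A-Za-z][\w/-]*)", body)
    metadata["tags"] = sorted(set(tags))
    return metadata


note = """---
source: soil_notes
---
#Calcium_additive #raise_ph #Wollastonite
# What is Wollastonite?
Wollastonite is a calcium silicate mineral.
"""
meta = extract_note_metadata(note)
print(meta)
```

The tags and frontmatter end up in one dictionary, which is the same shape of information that gets attached to each node as metadata.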


In [None]:
# Example of an obsidian note:
doc = """#Calcium_additive #raise_ph #Wollastonite #Silicon_additive #buffer_pH #Calcium
Growers  turn to Wollastonite for:
- Its **liming** capability.  Wollastonite's dissolution rate is slower than agricultural lime, offering a buffering effect against rapid pH changes. This makes Wollastonite beneficial in areas with fluctuating acidity levels.
- Adding **Silicon**.
- Adding **Calcium**.
Wollastonite's pH buffering effect and Silicon content contribute to pest control and powdery mildew suppression, although the exact mechanisms are not fully understood.

# What is Wollastonite?

## Formation
Wollastonite is formed when Limestone is subjected to heat and pressure during metamorphism if surrounding silicate minerals are present. It's chemical formula is CaSiO₃.
### Basic Reaction:
Given high pressure and high temperature:
- CaCO3 (Limestone) + SiO2 (silica) → CaSiO3 (Wollastonite) + CO2 (carbon Dioxide)
## Sources
China is the largest producer of Wollastonite. Other areas where Wollastonite is mined include the United States (although it was originally mined in California, the only active mining in the U.S. is now in New York State), India, Mexico, Canada, and Finland.

## Industrial Applications of Wollastonite

|Industry|Application|
|---|---|
|Ceramics|Smoother and more durable ceramics, reinforcement agent|
|Plastics and Rubber|Cost-effective strengthening agent|
|Paints and Coatings|Reinforcement, improved durability and impact resistance|
|Construction|Improved strength and durability of building materials, safe alternative to asbestos|
##  How Wollastonite Provides Plants with Ca and Si

Wollastonite reacts with Water and Carbon Dioxide in the soil to form Calcium Bicarbonate and Silicon Dioxide.
- CaSiO₃ (Wollastonite)+2CO₂ (carbon Dioxide,)+H₂O (Water)→Ca(HCO₃)₂ (Calcium bicarbonate)+SiO₂ (silica)

### Calcium
- Calcium bicarbonate  (Ca(HCO₃)₂) is unstable and fairly easily decomposes to Limestone (CaCO₃):
		- Ca(HCO₃)₂ (Calcium bicarbonate)→CaCO₃ (Limestone)+  CO₂ (carbon Dioxide) + H₂O (Water)

- Soils with a pH below 7 (acidic soils) contain hydrogen ions (H+). These hydrogen ions react with the Limestone (CaCO3) to form Calcium ions (Ca2+), Water (H2O), and Carbon Dioxide (CO2).
	- CaCO3 (Limestone) + 2H+ (hydrogen ions) → Ca2+ (Calcium ions) + H2O (Water) + CO2 (carbon Dioxide)
### Silicon
- Silicon Dioxide slowly breaks down into Silicic Acid, which plants absorb. This process is influenced by soil pH, temperature, and microbial activity.
	- SiO2 (Silicon Dioxide) + 2H2O (Water) → H4SiO4 (Silicic Acid)

- Plants absorb Silicic Acid from the soil solution through their roots.


"""

In [None]:
# --->: Read in the markdown files in the Obsidian vault directory

from src.doc_stats import DocStats
from src.ingest_service import IngestService
# The Directory containing the knowledge documents used by the AI to do the analysis on the soil tests.
# soil_knowledge_directory = r"G:\My Drive\Audios_To_Knowledge\knowledge\AskGrowBuddy\AskGrowBuddy\Knowledge\soil_test_knowlege"
# Load the documents
ingest_service = IngestService()
# loaded_documents = ingest_service.load_obsidian_notes(soil_knowledge_directory)
loaded_documents = ingest_service.load_obsidian_notes([doc])

DocStats.print_llama_index_docs_summary_stats(loaded_documents)

# Split the Obsidian notes into nodes
LlamaIndex's `MarkdownNodeParser` class is used to split the documents into smaller nodes.  This allows for natural splitting of the text using the headers in the markdown files.     
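The sketch below illustrates what "splitting on headers" means, using only the standard library. The `split_on_headers` function is a hypothetical, simplified stand-in; `MarkdownNodeParser` does more than this (for example, it also carries metadata along with each node).

```python
import re


def split_on_headers(markdown_text: str) -> list[dict]:
    """Hypothetical splitter: start a new chunk at every markdown header line."""
    chunks = []
    current = {"header": None, "lines": []}

    def has_content(chunk):
        return chunk["header"] is not None or any(ln.strip() for ln in chunk["lines"])

    for line in markdown_text.splitlines():
        # A header is 1-6 '#' characters followed by whitespace and some text.
        # Tag lines such as '#Calcium' have no space after '#', so they stay in the body.
        if re.match(r"^#{1,6}\s+\S", line):
            if has_content(current):
                chunks.append(current)
            current = {"header": line.lstrip("#").strip(), "lines": []}
        else:
            current["lines"].append(line)
    if has_content(current):
        chunks.append(current)
    return [{"header": c["header"], "text": "\n".join(c["lines"]).strip()} for c in chunks]


sample = """#Wollastonite #Calcium
Growers turn to Wollastonite for its liming capability.

# What is Wollastonite?

## Formation
Wollastonite is formed when Limestone is subjected to heat and pressure.
## Sources
China is the largest producer of Wollastonite.
"""
chunks = split_on_headers(sample)
for chunk in chunks:
    print(repr(chunk["header"]), "->", len(chunk["text"]), "characters")
```

Text that appears before the first header (such as the tag line) becomes its own chunk with no header.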


In [None]:
# --->: Chunk the documents
# LlamaIndex likes to call these nodes.
# limited_docs = loaded_documents[:2]
limited_docs = loaded_documents
nodes = ingest_service.chunk_text(limited_docs)
print(f"Number of documents used for chunking: {len(limited_docs)}")
print(f"Number of nodes created: {len(nodes)}")

# Set up Ollama
We are using Ollama to provide both the embedding model and the LLM.

In [None]:
# --->: Set up the local embedding model and LLM
# Set embedding model
# from llama_index.core import Settings
# from llama_index.llms.ollama import Ollama
# from llama_index.embeddings.ollama import OllamaEmbedding

# Settings.embed_model = OllamaEmbedding(
#     model_name='nomic-embed-text',
#     base_url="http://localhost:11434",
#     ollama_additional_kwargs={"mirostat": 0},
# )
# # Choose your LLM...
# Settings.llm = Ollama(model='mistral', request_timeout=1000.0)


# Set Up Multi-Index Fusion Retrieval
RAG will use three index/retrieval methods:
1. Vector Store Index. This provides a similarity search on vector embeddings of the nodes.
2. BM25 Index. This offers keyword-based retrieval, ranking documents based on term frequency and inverse document frequency.
3. Knowledge Graph Index. This captures relationships between entities and concepts, enabling context-aware and relationship-based retrieval.

The retrieved nodes are then passed through Cohere's reranker, which reorders them by relevance to the query.

By using a combination of these methods, we create a retrieval system that integrates semantic similarity, keyword relevance, and relational context. This approach aims to improve the likelihood of retrieving relevant information compared to using vector similarity search alone.


# Create and Persist the Vector Store Index
Vector indexes convert text into dense vector embeddings, which are numerical representations that capture the semantic meaning of the text. These vector embeddings are then used to compute similarity scores between the text and a query, enabling the identification of semantically similar documents.
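As a rough illustration of how a similarity score is computed, the sketch below ranks two nodes against a query using cosine similarity. The three-dimensional vectors are made-up numbers, not real `nomic-embed-text` embeddings, and cosine similarity is assumed here as the distance measure; the vector store in use may be configured differently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (invented values for illustration only).
query_vec = [0.9, 0.1, 0.0]
node_vecs = {
    "Wollastonite formation": [0.8, 0.2, 0.1],
    "Industrial applications": [0.1, 0.9, 0.3],
}

# Rank node names from most to least similar to the query.
ranked = sorted(
    node_vecs,
    key=lambda name: cosine_similarity(query_vec, node_vecs[name]),
    reverse=True,
)
print(ranked)
```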

Chroma is used to store and query the vector embeddings of the nodes.  Ollama embeddings are used to vectorize the text.
## Simple Testing       
For simple testing, the overhead of using a Chroma db can be avoided.  This code proved useful:
```
from llama_index.core import VectorStoreIndex
vector_index = VectorStoreIndex(nodes)
cache_dir = "./vector_index_cache"
vector_index.storage_context.persist(persist_dir=cache_dir)
```

In [None]:
# Create a vector_index.
from src.ingest_service import IngestService
ingest_service = IngestService()
vector_index = ingest_service.build_vector_index(nodes,embed_model_name='nomic-embed-text', collection_name='soil_test_knowledge', persist_dir="soil_test_knowledge")

In [8]:
# Get an existing vector_index.
from src.ingest_service import IngestService
ingest_service = IngestService()
vector_index = ingest_service.get_vector_index('soil_test_knowledge')


2024-10-09 15:20:47,570 - src.ingest_service - INFO - Attempting to get vector index for collection 'soil_test_knowledge' - c:\Users\happy\Documents\Projects\askgrowbuddy\src\ingest_service.py:123
2024-10-09 15:20:47,935 - src.ingest_service - INFO - Successfully loaded vector index for collection 'soil_test_knowledge' - c:\Users\happy\Documents\Projects\askgrowbuddy\src\ingest_service.py:133


In [None]:
# Query test
query_engine = vector_index.as_query_engine(include_text=True)
response = query_engine.query("What is the chemical formula for Wollastonite?")
print(response)

# Create and Persist a Knowledge Graph Index
Another view into documents is through a knowledge graph. Incorporating knowledge graphs into the retrieval process can help retrieve nodes or documents that might be overlooked by similarity searches.

Knowledge graphs represent entities and the relationships between them, providing a structured and interconnected view of information. For example, the triplet "Wollastonite -> is made of -> Limestone and Silica" captures the composition of Wollastonite and points to its formation through metamorphism.
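A small sketch of the triplet idea, using plain Python data structures. The triplets are hand-written from the example note and `related_facts` is a hypothetical helper; the actual index stores its graph in a database and builds the triplets automatically.

```python
from collections import defaultdict

# Hand-written (subject, relation, object) triplets for illustration.
triplets = [
    ("Wollastonite", "is made of", "Limestone and Silica"),
    ("Wollastonite", "provides", "Calcium"),
    ("Wollastonite", "provides", "Silicon"),
    ("Limestone", "reacts with", "Hydrogen ions"),
]

# Index the triplets by subject so all facts about an entity can be looked up.
graph = defaultdict(list)
for subject, relation, obj in triplets:
    graph[subject].append((relation, obj))


def related_facts(entity: str) -> list[str]:
    """Return every stored triplet whose subject is the given entity."""
    return [f"{entity} -> {relation} -> {obj}" for relation, obj in graph.get(entity, [])]


facts = related_facts("Wollastonite")
for fact in facts:
    print(fact)
```

A query that mentions "Wollastonite" can pull in all three facts at once, even if the text they came from would not score well on a similarity search.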


In [9]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
# Create a knowledge graph index.
from src.ingest_service import IngestService
ingest_service = IngestService()
kg_index = ingest_service.build_knowledge_graph(loaded_documents)

In [10]:
# Get a knowledge graph index.
from src.ingest_service import IngestService
ingest_service = IngestService()
kg_index = ingest_service.get_knowledge_graph()

2024-10-09 15:20:57,989 - src.ingest_service - INFO - Attempting to retrieve existing knowledge graph - c:\Users\happy\Documents\Projects\askgrowbuddy\src\ingest_service.py:180
2024-10-09 15:21:00,111 - neo4j.notifications - INFO - Received notification from DBMS server: {severity: INFORMATION} {code: Neo.ClientNotification.Schema.IndexOrConstraintAlreadyExists} {category: SCHEMA} {title: `CREATE CONSTRAINT IF NOT EXISTS FOR (e:__Node__) REQUIRE (e.id) IS UNIQUE` has no effect.} {description: `CONSTRAINT constraint_ec67c859 FOR (e:__Node__) REQUIRE (e.id) IS UNIQUE` already exists.} {position: None} for query: 'CREATE CONSTRAINT IF NOT EXISTS FOR (n:`__Node__`)\n            REQUIRE n.id IS UNIQUE;' - c:\Users\happy\Documents\Projects\askgrowbuddy\.venv\Lib\site-packages\neo4j\_sync\work\result.py:332
2024-10-09 15:21:00,120 - neo4j.notifications - INFO - Received notification from DBMS server: {severity: INFORMATION} {code: Neo.ClientNotification.Schema.IndexOrConstraintAlreadyExists} 

In [None]:
# Test...
retriever = kg_index.as_retriever(
    include_text=False,  # include source text in returned nodes, default True
)

# Use a separate name so the chunked `nodes` list from earlier is not overwritten.
kg_nodes = retriever.retrieve("What is the chemical formula for Wollastonite?")

for node in kg_nodes:
    print(node.text)

In [None]:
# Test...
query_engine = kg_index.as_query_engine(include_text=True)

response = query_engine.query("What is the chemical formula for Wollastonite?")

print(str(response))

# Create and Persist a bm25_retriever

BM25 (BM stands for "Best Match") is a ranking algorithm that scores each document by how often a query term occurs in it, known as the term frequency (TF). It also considers how many documents in the collection contain the term: the fewer documents that contain it, the higher the term's inverse document frequency (IDF). The algorithm combines these two metrics to retrieve the most relevant documents.

Let's consider two example documents:
Document 1: "Wollastonite is a mineral used in agriculture for its liming capability."
Document 2: "Wollastonite is formed when Limestone is subjected to heat and pressure during metamorphism. It is used in ceramics, plastics, and construction."

BM25 would analyze the term frequency of "Wollastonite" in both documents, as well as the IDF of the term across the entire collection.

## Term Frequency (TF)
The term frequency (TF) of 'Wollastonite' in each document is calculated as follows:
Document 1: The TF for "Wollastonite" is 1/11. There are 11 words in the document, and one of them is "Wollastonite".
Document 2: The TF for "Wollastonite" is 1/21. There are 21 words in the document, and "Wollastonite" occurs once.

## Inverse Document Frequency (IDF)
IDF measures the rarity of a term, in this case 'Wollastonite'. The IDF is calculated as follows:
IDF = log(number of documents/number of documents containing term)
IDF = log(2/2) = 0
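As a result, the simplified IDF of 'Wollastonite' is 0, indicating that it is a common term: it appears in every document, so it does little to distinguish one document from another. BM25 implementations typically use a smoothed IDF that stays slightly positive in this case.

Based on the BM25 ranking algorithm, Document 1 is ranked first. The term "Wollastonite" occurs once in each document, but Document 1 is shorter, so its length-normalized term frequency is higher than Document 2's.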
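The two-document example can be worked through in code. This is a minimal sketch of the Okapi BM25 score for a single query term, assuming the commonly used smoothed IDF and the parameter values `k1=1.5` and `b=0.75`; the retriever library used in this notebook may use different defaults, tokenization, or IDF variant. The `bm25_term_score` and `tokenize` names are hypothetical.

```python
import math


def bm25_term_score(term, doc_tokens, all_docs, k1=1.5, b=0.75):
    """BM25 score of one document for one query term (assumed parameters)."""
    n_docs = len(all_docs)
    docs_with_term = sum(1 for d in all_docs if term in d)
    # Smoothed IDF: the "+ 1" inside the log keeps the value positive even
    # when every document contains the term.
    idf = math.log((n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1)
    tf = doc_tokens.count(term)
    avg_len = sum(len(d) for d in all_docs) / n_docs
    # Longer-than-average documents are penalized, shorter ones are boosted.
    length_norm = 1 - b + b * len(doc_tokens) / avg_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


def tokenize(text):
    """Very simple tokenizer: split on whitespace, strip punctuation, lowercase."""
    return [w.strip(".,").lower() for w in text.split()]


doc1 = tokenize("Wollastonite is a mineral used in agriculture for its liming capability.")
doc2 = tokenize(
    "Wollastonite is formed when Limestone is subjected to heat and pressure "
    "during metamorphism. It is used in ceramics, plastics, and construction."
)
docs = [doc1, doc2]

score1 = bm25_term_score("wollastonite", doc1, docs)
score2 = bm25_term_score("wollastonite", doc2, docs)
print(f"Document 1 ({len(doc1)} words): {score1:.4f}")
print(f"Document 2 ({len(doc2)} words): {score2:.4f}")
```

Both documents contain the term exactly once, so the difference in score comes entirely from the length normalization.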

In [None]:
from src.ingest_service import IngestService

ingest_service = IngestService()
bm25_retriever = ingest_service.build_bm25_retriever(nodes, similarity_top_k=5)

In [11]:
# Retrieve from storage.
from llama_index.retrievers.bm25 import BM25Retriever
bm25_retriever = BM25Retriever.from_persist_dir("bm25_index")

In [None]:
# Test...
# retriever = bm25_retriever.as_retriever(
#     include_text=False,  # include source text in returned nodes, default True
# )

# Use a separate name so the chunked `nodes` list from earlier is not overwritten.
bm25_nodes = bm25_retriever.retrieve("What is the chemical formula for Wollastonite?")

for node in bm25_nodes:
    print(node.text)

# Set up the Retriever
Nodes of text will be retrieved from three different spaces:
- a vector index based on semantic similarity.
- a bm25 index based on keyword matching.
- a knowledge graph index based on entities and their relationships.
Duplicates are removed and then a Cohere rerank is applied.
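The merge, de-duplicate, and rerank flow can be sketched as follows. This is not the code inside `HybridGraphRetriever`: the `merge_unique` and `toy_rerank` functions are hypothetical, results are represented as simple `(node_id, text)` tuples, and a word-overlap score stands in for the Cohere reranker so the sketch runs without an API key.

```python
def merge_unique(*result_lists):
    """Concatenate result lists, keeping the first occurrence of each node id."""
    seen = set()
    unique = []
    for results in result_lists:
        for node_id, text in results:
            if node_id not in seen:
                seen.add(node_id)
                unique.append((node_id, text))
    return unique


def toy_rerank(query, candidates, top_n=2):
    """Stand-in for a reranker: order candidates by words shared with the query."""
    query_words = set(query.lower().split())

    def overlap(item):
        return len(query_words & set(item[1].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_n]


# Invented results from the three retrieval methods.
vector_results = [("n1", "chemical formula is CaSiO3"), ("n2", "liming capability")]
bm25_results = [("n1", "chemical formula is CaSiO3"), ("n3", "formula for silicic acid")]
kg_results = [("n4", "Wollastonite provides Calcium")]

unique = merge_unique(vector_results, bm25_results, kg_results)
top = toy_rerank("chemical formula for Wollastonite", unique)
print(f"All results: {len(vector_results) + len(bm25_results) + len(kg_results)}")
print(f"Unique results: {len(unique)}")
print(f"Reranked results: {[node_id for node_id, _ in top]}")
```

De-duplicating before reranking means the reranker never spends effort scoring the same node twice.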

In [12]:
# Build Retriever
from src.hybrid_graph_retriever import HybridGraphRetriever
retriever = HybridGraphRetriever(
    vector_index=vector_index,
    kg_index=kg_index,
    bm25_retriever=bm25_retriever,
)

# Run the Query and Count Tokens
We can finally ask our question! 

I used the Ollama class to query instead of a QueryEngine class.  The Chat interface of the Ollama class returns more useful information than just the answer.  The response also includes token count information.  It is far easier to get the token information this way than through the callback pattern used by LlamaIndex.  The callback pattern makes it difficult (impossible?) to use Ollama as an LLM provider for token counting.

A prompt template is used to provide a targeted prompt that includes context details, the context nodes, as well as the question.

In [19]:

from llama_index.core.schema import QueryBundle
question = "What is the chemical formula of Wollastonite?"
# Retrieve the nodes
query_bundle = QueryBundle(query_str=question)
retrieved_nodes = retriever.retrieve(query_bundle)
context = "\n".join([node.node.text for node in retrieved_nodes])
prompt = f"""
    Question: {question}
    Context: {context}

    You are an expert soil analyst who specializes in growing Cannabis. Your methods most closely align with the William Albrecht "school" of soil science. A Cannabis grower has come to you with a Question. Given the Context provided, your goal is to provide a 100% factual answer based solely on the Context. Do not elaborate on the answer. Growers appreciate concise, short answers. Make sure to review your answer and think carefully about the response. You are known for your thoughtful and factual answers.
    """

Vector results: 5
BM25 results: 5
KG results: 2
All results: 12
Unique results: 8
Reranked results: 5


In [14]:
from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.llms.ollama import Ollama


# Send a fully formatted prompt to Ollama and return the answer with token counts
def ask_question(query: str, model_name='mistral'):
    # Initialize the Ollama LLM
    # We are directly using the Ollama class in order to get to the tokens.
    ollama_llm = Ollama(model=model_name)
    # Use Ollama's chat method with the formatted prompt
    messages = [ChatMessage(role=MessageRole.USER, content=query)]
    ollama_response = ollama_llm.chat(messages)

    # Token counts reported by Ollama (0 if the field is missing)
    prompt_tokens = ollama_response.raw.get('prompt_eval_count', 0)
    completion_tokens = ollama_response.raw.get('eval_count', 0)

    return {
        "query": query,
        "answer": ollama_response.message.content,
        # NOTE: relies on the notebook-level `retrieved_nodes` set in the retrieval cell above.
        "contexts": [node.node.text for node in retrieved_nodes],
        "token_info": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens
        },
        "other_info": {
            "model": ollama_response.raw.get('model'),
            "total_duration": ollama_response.raw.get('total_duration'),
            "load_duration": ollama_response.raw.get('load_duration'),
            "eval_duration": ollama_response.raw.get('eval_duration')
        }
    }

In [20]:
ask_question(prompt)

{'query': '\n    Question: What is the chemical formula of Wollastonite?\n    Context: Formation\nWollastonite is formed when Limestone is subjected to heat and pressure during metamorphism if surrounding silicate minerals are present. It\'s chemical formula is CaSiO₃.\n#Calcium_additive #raise_ph #Wollastonite #Silicon_additive #buffer_pH #Calcium\nGrowers  turn to Wollastonite for:\n- Its **liming** capability.  Wollastonite\'s dissolution rate is slower than agricultural lime, offering a buffering effect against rapid pH changes. This makes Wollastonite beneficial in areas with fluctuating acidity levels.\n- Adding **Silicon**.\n- Adding **Calcium**.\nWollastonite\'s pH buffering effect and Silicon content contribute to pest control and powdery mildew suppression, although the exact mechanisms are not fully understood.\n\n# What is Wollastonite?\n\n## Formation\nWollastonite is formed when Limestone is subjected to heat and pressure during metamorphism if surrounding silicate minera

# Evaluate Responses
Now that we can ask a Large Language Model (LLM) questions based on our curated knowledge, it's time to evaluate the quality of the answers. The methods we will use to evaluate the responses include:
- Faithfulness: The answer is analyzed against the context text from the curated knowledge to assess its accuracy. If the answer deviates from the context text, it is considered partially or completely a hallucination, indicating that the model has generated information not grounded in the original text.
- Contextual relevance: How relevant the retrieved context is to the original query.
- Answer relevance: How relevant the LLM's response is to the user's query.


In [None]:

import re

class SimpleFaithfulness:
    def __init__(self):
        pass

    def evaluate(self, question, answer, context):
        prompt = f"""
        Question: {question}
        Answer: {answer}
        Context: {context}

        Evaluate the faithfulness of the answer based on the given context. Consider the following:
        1. Does the answer contain information not present in the context?
        2. Does the answer contradict any information in the context?
        3. Is the answer a fair representation of the information in the context?

        Respond with:
        1. A score from 0 to 1, where 0 is completely unfaithful and 1 is completely faithful.
        2. A brief explanation of your scoring.
        3. Any hallucinations or discrepancies found, if any.

        Format your response exactly as follows:
        Score: [Your score here]
        Explanation: [Your explanation here]
        Hallucinations: [List any hallucinations or discrepancies, or 'None' if none found]
        """

        try:
            response = ask_question(prompt)
            return self._parse_response(response)
        except Exception as e:
            print(f"An error occurred during evaluation: {str(e)}")
            return {"faithfulness": 0, "explanation": "Error occurred", "hallucinations": "Unable to evaluate"}

    def _parse_response(self, response):
        score_match = re.search(r'Score:\s*([\d.]+)', response['answer'])
        explanation_match = re.search(r'Explanation:\s*(.+?)(?:\n|$)', response['answer'], re.DOTALL)
        hallucinations_match = re.search(r'Hallucinations:\s*(.+?)(?:\n|$)', response['answer'], re.DOTALL)

        score = float(score_match.group(1)) if score_match else 0
        explanation = explanation_match.group(1).strip() if explanation_match else "No explanation provided"
        hallucinations = hallucinations_match.group(1).strip() if hallucinations_match else "Unable to determine"

        return {
            "faithfulness": score,
            "explanation": explanation,
            "hallucinations": hallucinations
        }

# Usage example
simple_faithfulness = SimpleFaithfulness()
# Example usage of simple_faithfulness.evaluate()
question = "What are the benefits of using wollastonite in agriculture?"
# answer = "Wollastonite is beneficial in agriculture due to its liming capability, silicon content, and calcium content. It can help improve soil pH and provide essential nutrients to plants."
answer = "Wollastonite is a calcium silicate mineral used in industrial applications like ceramics as well as in agriculture for its calcium and silicon content."
context = "Wollastonite is a calcium silicate mineral. It is used in agriculture for its liming capability, silicon content, and calcium content. These properties can help improve soil structure and provide nutrients to plants."

result = simple_faithfulness.evaluate(question, answer, context)
print(f"Faithfulness score: {result['faithfulness']}")
print(f"Explanation: {result['explanation']}")
print(f"Hallucinations: {result['hallucinations']}")


In [None]:

def parse_response(response):
    score_match = re.search(r'Score:\s*([\d.]+)', response['answer'])
    explanation_match = re.search(r'Explanation:\s*(.+?)(?:\n|$)', response['answer'], re.DOTALL)

    score = float(score_match.group(1)) if score_match else 0
    explanation = explanation_match.group(1).strip() if explanation_match else "No explanation provided"

    return {
        "score": score,
        "explanation": explanation
    }

In [None]:
def evaluate_answer(question, ground_truth, generated_answer):
        prompt = f"""
        Question: {question}
        Ground Truth: {ground_truth}
        Generated Answer: {generated_answer}

        Evaluate the relevancy and correctness of the generated answer compared to the ground truth.
        Consider the following:
        1. How well does the generated answer address the question?
        2. How accurate is the generated answer compared to the ground truth?
        3. Are there any missing or extra pieces of information in the generated answer?

        Respond with:
        1. A score from 0 to 1, where 0 is completely irrelevant/incorrect and 1 is perfectly relevant/correct.
        2. A brief explanation of your scoring.

        Format your response exactly as follows:
        Score: [Your score here]
        Explanation: [Your explanation here]
        """

        try:
            response = ask_question(prompt)
            return parse_response(response)
        except Exception as e:
            print(f"An error occurred during evaluation: {str(e)}")
            # Use the same keys as parse_response so callers can always read result['score'].
            return {"score": 0, "explanation": "Error occurred"}


# Example usage
question = "What is wollastonite and how is it used in agriculture?"
ground_truth = "Wollastonite is a calcium silicate mineral. It is used in agriculture for its liming capability, silicon content, and calcium content. These properties help improve soil pH, enhance soil structure, and provide essential nutrients to plants."
generated_answer = "Wollastonite is a mineral used in agriculture for its calcium content and ability to improve soil structure."

result = evaluate_answer(question, ground_truth, generated_answer)
print(f"Question: {question}")
print(f"Ground Truth: {ground_truth}")
print(f"Generated Answer: {generated_answer}")
print(f"Answer Relevancy score: {result['score']}")
print(f"Explanation: {result['explanation']}")

In [None]:
def evaluate_context(query, context):

    prompt = f"""
    Query: {query}
    Context: {context}

    Evaluate if the retrieved context is relevant to the query. Consider the following:
    1. Does the context match the subject matter of the query?
    2. Can the context be used to fully answer the query?

    Respond with:
    1. A score from 0 to 1, where 0 is completely irrelevant and 1 is highly relevant.
    2. A brief explanation of your scoring.

    Format your response exactly as follows:
    Score: [Your score here]
    Explanation: [Your explanation here]
    """

    try:
        response = ask_question(prompt)
        return parse_response(response)
    except Exception as e:
        print(f"An error occurred during evaluation: {str(e)}")
        # Use the same keys as parse_response so callers can always read result['score'].
        return {"score": 0, "explanation": "Error occurred"}



# Usage example

query = "What are the benefits of using wollastonite in agriculture?"
context = "Wollastonite is a calcium silicate mineral. It is used in agriculture for its liming capability, silicon content, and calcium content. These properties can help improve soil structure and provide nutrients to plants."

result = evaluate_context(query, context)
print(f"Query: {query}")
print(f"Context: {context}")
print(f"Score: {result['score']}")
print(f"Explanation: {result['explanation']}")

In [None]:
def evaluate_faithfulness(question: str, answer: str, context: str) -> dict:

    prompt = f"""
    Question: {question}
    Answer: {answer}
    Context: {context}

    Evaluate the faithfulness of the answer based on the given context. Consider the following:
    1. Does the answer contain information not present in the context?
    2. Does the answer contradict any information in the context?
    3. Is the answer a fair representation of the information in the context?

    Respond with:
    1. A score from 0 to 1, where 0 is completely unfaithful and 1 is completely faithful.
    2. A brief explanation of your scoring. Include any hallucinations or discrepancies found.

    Format your response exactly as follows:
    Score: [Your score here]
    Explanation: [Your explanation here]
    """

    try:
        response = ask_question(prompt)
        return parse_response(response)
    except Exception as e:
        print(f"An error occurred during evaluation: {str(e)}")
        # Use the same keys as parse_response so callers can always read result['score'].
        return {"score": 0, "explanation": "Error occurred"}

# Usage example
question = "What is wollastonite?"
answer = "Wollastonite is a calcium silicate mineral used in agriculture for its liming capability and silicon content."
context = "Wollastonite is a calcium silicate mineral. It is used in agriculture for its liming capability, silicon content, and calcium content."

result = evaluate_faithfulness(question, answer, context)
print(f"Faithfulness score: {result['score']}")
print(f"Explanation: {result['explanation']}")

In [None]:
from langchain_ollama.llms import OllamaLLM
from ragas.metrics import Faithfulness
from datasets import Dataset

# NOTE: the dataset column names and the scoring call below depend on the installed
# ragas version. Check them against the ragas documentation before relying on this cell.

def evaluate_single_response(question, answer, contexts):
    # Create an instance of OllamaLLM
    ollama_llm = OllamaLLM(model="mistral")

    # Create the Faithfulness metric here so it is in scope for the scoring call below
    custom_faithfulness = Faithfulness(llm=ollama_llm)

    # Dataset.from_dict expects one list per column, so wrap the single sample in lists
    dataset = Dataset.from_dict({
        "user_input": [question],
        "response": [answer],
        "context": [[contexts]],
        "ground_truths": [""]  # Empty string instead of empty list
    })

    # Use the custom faithfulness metric
    result = custom_faithfulness.score(dataset)

    return result['faithfulness'][0]

# Example usage
question = "What is wollastonite?"
answer = "Wollastonite is a calcium silicate mineral used in agriculture for its liming capability and silicon content."
contexts = "Wollastonite is a calcium silicate mineral. It is used in agriculture for its liming capability, silicon content, and calcium content."

faithfulness_score = evaluate_single_response(question, answer, contexts)
print(f"Faithfulness score: {faithfulness_score}")

In [None]:
evaluation_data = [
    {
        "question": "What is wollastonite and how does it relate to plant nutrition?",
        "ground_truth": "Wollastonite is a calcium silicate mineral used in agriculture for its liming capability, silicon content, and calcium content. It relates to plant nutrition by providing calcium and silicon, buffering soil pH, and potentially contributing to pest control and powdery mildew suppression."
    },
    # Add more question-answer pairs here
]
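One way the question and ground-truth pairs above might be used is sketched below: answer each question, score the answer against its ground truth, and average the scores. The `run_evaluation` function is hypothetical, and the two `fake_*` functions are stand-ins so the sketch runs without an LLM. In the notebook, a function that retrieves context and calls `ask_question` would take the place of `fake_answer`, and `evaluate_answer` would take the place of `fake_score`.

```python
def run_evaluation(evaluation_data, answer_fn, score_fn):
    """Answer and score every item, returning per-item rows and the mean score."""
    rows = []
    for item in evaluation_data:
        generated = answer_fn(item["question"])
        result = score_fn(item["question"], item["ground_truth"], generated)
        rows.append({"question": item["question"], "score": result["score"]})
    mean_score = sum(r["score"] for r in rows) / len(rows) if rows else 0.0
    return rows, mean_score


# Stand-ins for the LLM-backed functions (assumptions, for illustration only).
def fake_answer(question):
    return "Wollastonite is a calcium silicate mineral."


def fake_score(question, ground_truth, generated_answer):
    score = 1.0 if generated_answer in ground_truth else 0.5
    return {"score": score, "explanation": "stub"}


sample_data = [
    {
        "question": "What is wollastonite?",
        "ground_truth": "Wollastonite is a calcium silicate mineral.",
    },
]
rows, mean_score = run_evaluation(sample_data, fake_answer, fake_score)
print(f"Mean score over {len(rows)} question(s): {mean_score}")
```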