# Evaluate with ROUGE

In [1]:
# Define RetrievalQA chain
import utils  # Import the whole module so you can reload it
import importlib

importlib.reload(utils)

from utils import init_qa_chain
qa_chain = init_qa_chain()
print("✅ RetrievalQA chain created")

  embedding = OpenAIEmbeddings(openai_api_key=keys["OPENAI_API_KEY"])


✅ RetrievalQA chain created


  llm = ChatOpenAI(temperature=0, model="gpt-4", openai_api_key=keys["OPENAI_API_KEY"])


In [2]:
import evaluate
from nltk.tokenize import sent_tokenize
import nltk

# Download the Punkt sentence tokenizer data used by sent_tokenize
# (newer NLTK releases may require 'punkt_tab' instead)
nltk.download('punkt')

# Load ROUGE metric
rouge = evaluate.load("rouge")

def compute_rouge(generated_text, reference_text):
    """
    Compute ROUGE scores for a generated answer against a reference answer.

    Returns a dict with the keys rouge1, rouge2, rougeL, and rougeLsum.
    """
    # rougeLsum treats newlines as sentence boundaries, so put one sentence per line
    generated = "\n".join(sent_tokenize(generated_text.strip()))
    reference = "\n".join(sent_tokenize(reference_text.strip()))

    result = rouge.compute(
        predictions=[generated],
        references=[reference],
        use_stemmer=True,
    )
    return result


[nltk_data] Downloading package punkt to /Users/test/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
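
To make the numbers returned by `compute_rouge` easier to interpret, here is a minimal sketch of what ROUGE-N measures: the F1 of overlapping n-grams between the prediction and the reference. This is not the `evaluate` implementation. The helper names `ngram_counts` and `rouge_n_f1` are made up for illustration, and the sketch assumes lowercase whitespace tokenization with no stemming, so its values will generally differ from the scores reported in this notebook.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count the n-grams of a text (assumption: lowercase whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(prediction, reference, n=1):
    """Simplified ROUGE-N: F1 over clipped n-gram overlap."""
    pred = ngram_counts(prediction, n)
    ref = ngram_counts(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_n_f1("the cat sat", "the cat ran", n=1))
print(rouge_n_f1("the cat sat", "the cat ran", n=2))
```

With `n=1` this corresponds to ROUGE-1 and with `n=2` to ROUGE-2; longer n-grams are harder to match, which is why ROUGE-2 is usually the lower of the two.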


In [3]:
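from typing import Optional

def answer_with_rouge(input_text: str, reference_answer: Optional[str] = None):
    """Run the QA chain on a query and, if a reference is given, score the answer with ROUGE."""
    print(f"\n📝 Query: {input_text}")
    # Use .invoke(); calling the chain directly is deprecated in recent LangChain releases
    result = qa_chain.invoke({"query": input_text})
    answer = result["result"]
    # Several retrieved chunks can come from the same file, so this counts unique source files
    sources = list(set(doc.metadata.get("source_file", "Unknown") for doc in result["source_documents"]))
    print(f"🗒️ Retrieved {len(sources)} source documents")

    output = f"{answer}\n\nSources:\n" + "\n".join(sources)

    if reference_answer:
        rouge_scores = compute_rouge(answer, reference_answer)
        # Format each score once and reuse it for both the console and the returned text
        score_lines = [f"{k.upper()}: {v:.4f}" for k, v in rouge_scores.items()]
        print("🔴 ROUGE scores:")
        for line in score_lines:
            print(f"  {line}")
        output += "\n\nROUGE scores:\n" + "\n".join(score_lines)

    return output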


### Testing with references

In [4]:
query = "What is at the center of a black hole?"

reference = "The singularity at the center of a black hole is the ultimate no man's land: a place where matter is compressed down to an infinitely tiny point"
response = answer_with_rouge(query, reference)
print(response)


📝 Query: What is at the center of a black hole?
🗒️ Retrieved 5 source documents
🔴 ROUGE scores:
  ROUGE1: 0.3614
  ROUGE2: 0.1481
  ROUGEL: 0.2651
  ROUGELSUM: 0.2651
The center of a black hole is thought to contain a singularity, a point where gravity is so intense that spacetime curves infinitely and the laws of physics as we know them cease to operate. However, this is still a topic of ongoing research and there is much we don't know about black holes.

Sources:
52 - The Sounds of Space ｜ StarTalk Live! at Guild Hall.en.txt
31 - Neil and a Particle Physicist Discuss Why There’s Something Instead of Nothing.en.txt
06 - Tackling the Biggest Unsolved Problems in Math with 3Blue1Brown.en.txt
12 - Unpacking Einstein’s Greatest Papers, with Janna Levin.en.txt
41 - The Science of Interstellar with Science Advisor, Kip Thorne.en.txt

ROUGE scores:
ROUGE1: 0.3614
ROUGE2: 0.1481
ROUGEL: 0.2651
ROUGELSUM: 0.2651
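
ROUGE-L is lower than ROUGE-1 here because it only credits tokens that appear in the same order in both texts. It is based on the longest common subsequence (LCS). The following is a minimal sketch of that idea, not the `evaluate` implementation: the names `lcs_length` and `rouge_l_f1` are hypothetical, and it assumes lowercase whitespace tokenization with no stemming, so it will not reproduce the exact scores above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """Simplified ROUGE-L: F1 based on the LCS of the two token sequences."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Same words in a different order: unigram overlap is complete, but the LCS is only one token
print(rouge_l_f1("a b", "b a"))
```

Because word order matters, a paraphrase that reuses the reference vocabulary in a different order keeps a high ROUGE-1 but loses ROUGE-L.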


In [8]:
query = "What is Bill Nye Famous for?"

reference = "He is best known as the host of the science education television show Bill Nye the Science Guy (1993–1999) and as a science educator in pop culture"
response = answer_with_rouge(query, reference)
print(response)


📝 Query: What is Bill Nye Famous for?
🗒️ Retrieved 1 source documents
🔴 ROUGE scores:
  ROUGE1: 0.4074
  ROUGE2: 0.1923
  ROUGEL: 0.3333
  ROUGELSUM: 0.3333
Bill Nye is famous for his educational television program, "Bill Nye the Science Guy," where he made science accessible and entertaining for children and adults alike.

Sources:
40 - Neil deGrasse Tyson and Bill Nye Catch Up.en.txt

ROUGE scores:
ROUGE1: 0.4074
ROUGE2: 0.1923
ROUGEL: 0.3333
ROUGELSUM: 0.3333


In [9]:
query = "Who is Sara Imari Walker?"

reference = "Sara Imari Walker is an American theoretical physicist and astrobiologist with research interests in the origins of life, astrobiology, physics of life, emergence, complex and dynamical systems, and artificial life"
response = answer_with_rouge(query, reference)
print(response)


📝 Query: Who is Sara Imari Walker?
🗒️ Retrieved 2 source documents
🔴 ROUGE scores:
  ROUGE1: 0.3333
  ROUGE2: 0.2353
  ROUGEL: 0.3333
  ROUGELSUM: 0.3333
Sara Imari Walker is an astrobiologist.

Sources:
50 - Neil & Sara Imari Walker Discuss New Theories on The Origins of Life in the Universe.en.txt
06 - Tackling the Biggest Unsolved Problems in Math with 3Blue1Brown.en.txt

ROUGE scores:
ROUGE1: 0.3333
ROUGE2: 0.2353
ROUGEL: 0.3333
ROUGELSUM: 0.3333


In [10]:
query = "What is Sara Imari Walker known for?"

reference = "Sara Imari Walker is recognized for developing assembly theory, an informational framework for identifying life based on its complexity. "
response = answer_with_rouge(query, reference)
print(response)


📝 Query: What is Sara Imari Walker known for?
🗒️ Retrieved 3 source documents
🔴 ROUGE scores:
  ROUGE1: 0.3636
  ROUGE2: 0.1290
  ROUGEL: 0.2424
  ROUGELSUM: 0.2424
The text doesn't provide information on what Sarah Imari Walker is known for.

Sources:
33 - The Elements of Marie Curie with Dava Sobel.en.txt
50 - Neil & Sara Imari Walker Discuss New Theories on The Origins of Life in the Universe.en.txt
06 - Tackling the Biggest Unsolved Problems in Math with 3Blue1Brown.en.txt

ROUGE scores:
ROUGE1: 0.3636
ROUGE2: 0.1290
ROUGEL: 0.2424
ROUGELSUM: 0.2424
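
Note that this last answer is a non-answer ("The text doesn't provide information..."), yet its ROUGE-1 of 0.3636 is slightly higher than the 0.3614 of the black hole answer, likely because it shares surface tokens such as the name with the reference. One way to avoid over-reading such scores is to flag non-answers before interpreting ROUGE. The sketch below is only an illustration: `NON_ANSWER_MARKERS` and `is_non_answer` are hypothetical names, and the phrase list is an assumption that would need tuning to the model's actual refusal wording.

```python
# Hypothetical phrase list; extend it to match the refusals your model produces
NON_ANSWER_MARKERS = (
    "doesn't provide information",
    "does not provide information",
    "i don't know",
    "i do not know",
)


def is_non_answer(answer):
    """Return True if the answer looks like a refusal rather than a real answer."""
    # Normalize case and curly apostrophes before substring matching
    normalized = answer.lower().replace("\u2019", "'")
    return any(marker in normalized for marker in NON_ANSWER_MARKERS)


answer = "The text doesn't provide information on what Sarah Imari Walker is known for."
print(is_non_answer(answer))
```

Answers flagged this way can be reported separately, so that lexical overlap is not mistaken for correctness.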
