# Overview of metrics for text generation
## Lexical-based similarity (n-gram-based comparison)
1. ROUGE: [Recall-Oriented Understudy for Gisting Evaluation](https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499)
2. BLEU: [BiLingual Evaluation Understudy](https://www.geeksforgeeks.org/nlp-bleu-score-for-evaluating-neural-machine-translation-python/)

In [12]:
reference = ["The sun set behind the hills"]
candidate = ["The moon set behind the hills"]

In [13]:
import evaluate

### ROUGE: [Recall-Oriented Understudy for Gisting Evaluation](https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499)

In [27]:
rouge_metric = evaluate.load("rouge")

# ROUGE expects plain text inputs
rouge_results = rouge_metric.compute(predictions=candidate, references=reference)

# Access ROUGE scores (no need for indexing into the result)
print(f'"{reference[0]}" <<-->> "{candidate[0]}"')
print(f"--> ROUGE-1 F1 Score: {rouge_results['rouge1']:.2f}")
print(f"--> ROUGE-2 F1 Score: {rouge_results['rouge2']:.2f}")
print(f"--> ROUGE-L F1 Score: {rouge_results['rougeL']:.2f}")

"The sun set behind the hills" <<-->> "The moon set behind the hills"
--> ROUGE-1 F1 Score: 0.83
--> ROUGE-2 F1 Score: 0.60
--> ROUGE-L F1 Score: 0.83
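
To see where the ROUGE-1 and ROUGE-2 numbers above come from, the n-gram overlap can be counted by hand. The following is a minimal sketch, not the library's implementation: `ngrams` and `rouge_n_f1` are hypothetical helper names, and the text is simply lowercased and split on whitespace, which is an assumption that happens to be enough for this sentence pair. ROUGE-L (longest common subsequence) is not covered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """F1 over n-gram overlap between one reference and one candidate."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    ref_counts = ngrams(reference.lower().split(), n)
    cand_counts = ngrams(candidate.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())  # share of candidate n-grams found in the reference
    recall = overlap / sum(ref_counts.values())      # share of reference n-grams found in the candidate
    return 2 * precision * recall / (precision + recall)


ref_text = "The sun set behind the hills"
cand_text = "The moon set behind the hills"

rouge1 = rouge_n_f1(ref_text, cand_text, n=1)
rouge2 = rouge_n_f1(ref_text, cand_text, n=2)
print(f"ROUGE-1 F1: {rouge1:.2f}")  # 5 of 6 unigrams match -> 0.83
print(f"ROUGE-2 F1: {rouge2:.2f}")  # 3 of 5 bigrams match -> 0.60
```

Replacing a single word costs one unigram but two bigrams, which is why ROUGE-2 drops faster than ROUGE-1.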


### BLEU: [BiLingual Evaluation Understudy](https://www.geeksforgeeks.org/nlp-bleu-score-for-evaluating-neural-machine-translation-python/)

In [26]:
bleu_metric = evaluate.load("bleu")

# BLEU expects plain text inputs
bleu_results = bleu_metric.compute(predictions=candidate, references=reference)
print(f'"{reference[0]}" <<-->> "{candidate[0]}"')
print(f"--> BLEU Score: {bleu_results['bleu'] * 100:.2f}")

"The sun set behind the hills" <<-->> "The moon set behind the hills"
--> BLEU Score: 53.73
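
The BLEU score above can be reproduced by hand for this pair. The sketch below assumes a single reference, whitespace tokenization, and no smoothing; `ngram_counts` and `simple_bleu` are hypothetical names, not part of `evaluate`. It combines the clipped n-gram precisions for n = 1 to 4 with a geometric mean and applies the brevity penalty.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    # Assumption: whitespace split stands in for the metric's tokenizer.
    ref_tokens = reference.split()
    cand_tokens = candidate.split()

    log_precisions = []
    for n in range(1, max_n + 1):
        ref_counts = ngram_counts(ref_tokens, n)
        cand_counts = ngram_counts(cand_tokens, n)
        # Clipping: a candidate n-gram is credited at most as often as it occurs in the reference.
        clipped = sum((cand_counts & ref_counts).values())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: only candidates shorter than the reference are penalized.
    if len(cand_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


bleu = simple_bleu("The sun set behind the hills", "The moon set behind the hills")
print(f"BLEU: {bleu * 100:.2f}")  # precisions 5/6, 3/5, 2/4, 1/3 -> 53.73
```

Because the score is a geometric mean, short sentences with no matching 4-gram score 0 unless smoothing is applied.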


## Semantic-based similarity (embedding-based comparison)
1. [BERTScore](https://medium.com/@abonia/bertscore-explained-in-5-minutes-0b98553bfb71)
2. Cosine similarity

### BERTScore

In [25]:
bertscore = evaluate.load("bertscore")

# BERTScore returns per-sentence lists of precision, recall, and F1, so index the first (and only) pair
bert_result = bertscore.compute(predictions=candidate, references=reference, model_type="bert-base-uncased")
print(f'"{reference[0]}" <<-->> "{candidate[0]}"')
print(f"--> BERTScore (F1): {bert_result['f1'][0]:.2f}")

"The sun set behind the hills" <<-->> "The moon set behind the hills"
--> BERTScore (F1): 0.92
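
The idea behind BERTScore is greedy matching between token embeddings: each token is paired with its most similar token on the other side, and the best similarities are averaged into precision and recall. The sketch below illustrates only that matching step. The 2-D vectors are made-up, unit-length stand-ins for contextual BERT embeddings, `greedy_match_scores` is a hypothetical name, and optional refinements such as IDF weighting and baseline rescaling are left out.

```python
def dot(u, v):
    """Inner product; equals cosine similarity when both vectors have unit length."""
    return sum(a * b for a, b in zip(u, v))


def greedy_match_scores(ref_vectors, cand_vectors):
    """Precision, recall, and F1 from greedy token-level matching."""
    # Recall: every reference token picks its most similar candidate token.
    recall = sum(max(dot(r, c) for c in cand_vectors) for r in ref_vectors) / len(ref_vectors)
    # Precision: every candidate token picks its most similar reference token.
    precision = sum(max(dot(c, r) for r in ref_vectors) for c in cand_vectors) / len(cand_vectors)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy unit vectors (assumption): one row per token.
toy_reference = [[1.0, 0.0], [0.0, 1.0]]
toy_candidate = [[1.0, 0.0], [0.6, 0.8]]

precision, recall, f1 = greedy_match_scores(toy_reference, toy_candidate)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Unlike n-gram overlap, a substituted word still earns partial credit when its embedding is close to the word it replaced.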


### Cosine similarity 

In [29]:
from sentence_transformers import SentenceTransformer

In [30]:
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

In [None]:
reference_embedding = embedding_model.encode(reference)
candidate_embedding = embedding_model.encode(candidate)

similarity = embedding_model.similarity(reference_embedding, candidate_embedding)
print(f'"{reference[0]}" <<-->> "{candidate[0]}"')
print(f"--> Cosine Similarity: {similarity[0][0]:.2f}")

"The sun set behind the hills" <<-->> "The moon set behind the hills"
--> Cosine Similarity: 0.71
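
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on the angle between them and not on their magnitude. A minimal pure-Python sketch on toy vectors follows; `cosine_similarity` is a hypothetical helper written for illustration, not the `sentence_transformers` function used above.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|)"""
    dot_product = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot_product / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # scaled copy of the same vector
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # nothing in common
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # pointing the other way

print(f"same direction: {same_direction:.2f}")
print(f"orthogonal:     {orthogonal:.2f}")
print(f"opposite:       {opposite:.2f}")
```

The values range from -1 to 1; sentence embeddings of related texts typically land well above 0.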


#### Information retrieval based on cosine similarity

In [39]:
query = "How many people live in London?"

In [40]:
docs = [
    "London is known for its financial district",
    "London has 9,787,426 inhabitants at the 2011 census",
    "The United Kingdom is the fourth largest exporter of goods in the world",
]

In [42]:
query_embedding = embedding_model.encode(query)
doc_embeddings = embedding_model.encode(docs)

# Compute the cosine similarity between the query and every document
scores = embedding_model.similarity(query_embedding, doc_embeddings).squeeze()

# Rank documents from most to least similar
ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)

for doc, score in ranked:
    print(f"[Score: {score.item():.2f}] {doc}")

[Score: 0.75] London has 9,787,426 inhabitants at the 2011 census
[Score: 0.46] London is known for its financial district
[Score: 0.26] The United Kingdom is the fourth largest exporter of goods in the world
