# Cosine Similarity using embeddings

In [1]:
from IPython.display import display, Markdown

When an application generates text, we often want more flexibility in how we evaluate the quality of that text.

While we can use "statistical" metrics such as BLEU and ROUGE, they are not the only option. Because they compare overlapping words and n-grams rather than meaning, they are often not the best option either.
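To illustrate why word-overlap metrics can mislead, here is a minimal sketch. Note that `unigram_f1` is a hypothetical helper written for this example: it is a simplified set-based word-overlap F1, not a real BLEU or ROUGE implementation, and the three sentences are made up.

```python
import re


def unigram_f1(reference: str, candidate: str) -> float:
    """Simplified word-overlap F1 over unique lowercase tokens (NOT real ROUGE)."""
    ref_tokens = set(re.findall(r"[a-z0-9']+", reference.lower()))
    cand_tokens = set(re.findall(r"[a-z0-9']+", candidate.lower()))
    overlap = len(ref_tokens & cand_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "The cat sat on the mat."
paraphrase = "A feline rested upon the rug."       # same meaning, different words
contradiction = "The cat never sat on the mat."   # opposite meaning, same words

print(unigram_f1(reference, paraphrase))
print(unigram_f1(reference, contradiction))
```

In this toy case the contradiction scores far higher than the paraphrase, because the score only counts shared words.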

In this lesson we will use cosine similarity to evaluate the quality of the generated text. As you may know, cosine similarity is a metric that measures the similarity between two vectors.

Thanks to the embedding models, we can represent the text as a vector, and then use cosine similarity to measure the similarity between the generated text and the "golden" reference.
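As a quick refresher, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on direction and not on magnitude. The sketch below uses NumPy and tiny hand-made vectors rather than real embeddings; the `cosine` helper is an illustrative name, not part of any library.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors: dot(a, b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Same direction, different length: similarity close to 1
same_direction = cosine(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
# Perpendicular vectors: similarity 0
orthogonal = cosine(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
# Opposite direction: similarity close to -1
opposite = cosine(np.array([1.0, 2.0]), np.array([-1.0, -2.0]))

print(same_direction, orthogonal, opposite)
```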

For example, imagine we have the following text:

> Policy Lab has experimented with Artificial Intelligence (AI) in policy development with teams across government, and beyond, for a number of years. In 2019 we worked with the Department for Transport's data science team to consider the role that AI could play in improving the efficiency and effectiveness of the policy consultation process. In 2022 we used AI to create a vision for the future of Hounslow with the local authority. In 2023, we commissioned the creation of the Ecological Intelligence Agency, a speculative artefact to help experience the role AI might have in future decision-making in environmental policy.

We have asked the original writer to provide a summary of the text. **This is our golden reference**:

> Policy Lab has explored the use of AI in policy development across various government projects, including improving policy consultation processes, envisioning future urban planning, and investigating AI's potential role in environmental policy decision-making.

### What we want to evaluate

We have also asked different language models to summarize the same text, using a variety of prompts. These are the summaries we want to evaluate against the golden reference.


In [2]:
reference_summary = "Policy Lab has explored the use of AI in policy development across various government projects, including improving policy consultation processes, envisioning future urban planning, and investigating AI's potential role in environmental policy decision-making."
summary_1 = "Policy Lab has explored the application of Artificial Intelligence in diverse government policy initiatives, including enhancing policy consultation, envisioning local community futures, and examining AI's potential role in environmental policy decisions."
summary_2 = "Policy Lab has explored AI applications in governmental policy creation, collaborating with various agencies to enhance consultation procedures, generate urban forecasts, and envision futuristic environmental decision-making tools."
summary_3 = "Policy Lab has been exploring the application of Artificial Intelligence in various government policy initiatives for several years, including improving policy consultations, envisioning the future of local communities, and speculating on the role of AI in environmental decision-making."
summary_4 = "Policy Lab has been using Artificial Intelligence to replace human decision-making in government policy development, creating automated systems to make key decisions without input from policymakers or the public."

display(Markdown(f"""
## Reference summary\n{reference_summary}\n
## Model 1\n{summary_1}\n
## Model 2\n{summary_2}\n
## Model 3\n{summary_3}\n
## Model 4 (worst)\n{summary_4}\n
"""))


## Reference summary
Policy Lab has explored the use of AI in policy development across various government projects, including improving policy consultation processes, envisioning future urban planning, and investigating AI's potential role in environmental policy decision-making.

## Model 1
Policy Lab has explored the application of Artificial Intelligence in diverse government policy initiatives, including enhancing policy consultation, envisioning local community futures, and examining AI's potential role in environmental policy decisions.

## Model 2
Policy Lab has explored AI applications in governmental policy creation, collaborating with various agencies to enhance consultation procedures, generate urban forecasts, and envision futuristic environmental decision-making tools.

## Model 3
Policy Lab has been exploring the application of Artificial Intelligence in various government policy initiatives for several years, including improving policy consultations, envisioning the future of local communities, and speculating on the role of AI in environmental decision-making.

## Model 4 (worst)
Policy Lab has been using Artificial Intelligence to replace human decision-making in government policy development, creating automated systems to make key decisions without input from policymakers or the public.



## Generate embeddings

The first step for this evaluation is to generate the embeddings for the reference and the generated summaries.

We will use the `sentence-transformers` library to generate the embeddings because it runs locally and is essentially free. There are other ways to generate embeddings, including hosted services such as the OpenAI embeddings API.


In [3]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')



In [4]:
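# reshape(1, -1) turns each 1-D embedding into a single-row 2-D array,
# which is the shape scikit-learn's cosine_similarity expects
reference_embedding = model.encode(reference_summary).reshape(1, -1)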
summary_1_embedding = model.encode(summary_1).reshape(1,-1)
summary_2_embedding = model.encode(summary_2).reshape(1,-1)
summary_3_embedding = model.encode(summary_3).reshape(1,-1)
summary_4_embedding = model.encode(summary_4).reshape(1,-1)

In [5]:
# cosine_similarity comes from scikit-learn; without this import the cell raises a NameError
from sklearn.metrics.pairwise import cosine_similarity

# cosine_similarity returns a 2-D array, so [0][0] extracts the single score
similarity_1 = cosine_similarity(reference_embedding, summary_1_embedding)[0][0]
similarity_2 = cosine_similarity(reference_embedding, summary_2_embedding)[0][0]
similarity_3 = cosine_similarity(reference_embedding, summary_3_embedding)[0][0]
similarity_4 = cosine_similarity(reference_embedding, summary_4_embedding)[0][0]

print(similarity_1, similarity_2, similarity_3, similarity_4)
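Comparing each summary against the reference one at a time can also be done in a single vectorised step. The sketch below is an illustration under assumptions: `score_candidates` is a hypothetical helper, and the small hand-made vectors stand in for real embeddings so the example runs without downloading a model.

```python
import numpy as np


def score_candidates(reference: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Cosine similarity between one reference vector and each row of `candidates`."""
    ref_unit = reference / np.linalg.norm(reference)
    cand_units = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    # The dot product of unit vectors is the cosine similarity
    return cand_units @ ref_unit


# Toy stand-ins for embeddings (real ones have hundreds of dimensions)
toy_reference = np.array([1.0, 1.0, 0.0])
toy_candidates = np.array([
    [2.0, 2.0, 0.0],  # same direction as the reference
    [1.0, 0.0, 0.0],  # partially aligned
    [0.0, 0.0, 3.0],  # orthogonal to the reference
])

scores = score_candidates(toy_reference, toy_candidates)
ranking = np.argsort(-scores)  # candidate indices, best match first

print(scores)
print(ranking)
```

With real data, passing a list of strings to `model.encode` should return one row per string, which could then be used as the `candidates` matrix here.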