# Semantic Search

This notebook shows how to use pre-trained Transformer models for document embeddings.
These models are trained using *supervised* or *unsupervised* approaches to encode text at the sentence, paragraph, or even whole-document level.
They can be applied to many different tasks:
- Semantic similarity
- Paraphrase detection
- Natural Language Inference
- Question Answering
- Information retrieval
- Clustering

Most of the material is based on the tutorial and examples from [Sentence Transformers](https://www.sbert.net/docs/).

## Prepare environment

!pip install -U sentence-transformers

## Sentence transformer models

https://www.sbert.net/examples/applications/semantic-search/README.html

https://www.sbert.net/examples/applications/retrieve_rerank/README.html

In [None]:
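from sentence_transformers import SentenceTransformer, util
import torch
# Assumption: the original does not name a model; 'all-MiniLM-L6-v2' is one commonly used pre-trained option
embedder = SentenceTransformer('all-MiniLM-L6-v2')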

#### Embedding text

In [None]:
corpus = [
    'A man is eating food.',
    'A man is eating a piece of bread.',
    'The girl is carrying a baby.',
    'A man is riding a horse.',
    'A woman is playing violin.',
    'Two men pushed carts through the woods.',
    'A man is riding a white horse on an enclosed ground.',
    'A monkey is playing drums.',
    'A cheetah is running behind its prey.'
]

In [None]:
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)
corpus_embeddings.size()

Speed up the search by normalising the embeddings to unit length, so that the cheaper dot product can be used as the score function.

In [None]:
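# Use the GPU when one is available, otherwise stay on the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

corpus_embeddings = corpus_embeddings.to(device)
corpus_embeddings = util.normalize_embeddings(corpus_embeddings)

# `queries` is the list of query sentences defined in the "Semantic search" section
query_embeddings = embedder.encode(queries, convert_to_tensor=True).to(device)
query_embeddings = util.normalize_embeddings(query_embeddings)
hits = util.semantic_search(query_embeddings, corpus_embeddings, score_function=util.dot_score)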
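
Normalising works as a speed-up because, once every vector has unit length, cosine similarity reduces to a plain dot product and the division by the norms can be skipped at search time. The following is a minimal pure-Python sketch of that identity on two made-up 3-dimensional vectors; the helpers `dot`, `normalize`, and `cosine` are illustrative names, not part of sentence-transformers.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def cosine(u, v):
    """Cosine similarity computed from scratch, norms included."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Toy vectors standing in for real embeddings (made up for illustration)
toy_query = [1.0, 2.0, 2.0]
toy_doc = [2.0, 0.0, 1.0]

cos_score = cosine(toy_query, toy_doc)
dot_score = dot(normalize(toy_query), normalize(toy_doc))

# The two scores agree up to floating-point error
print(math.isclose(cos_score, dot_score))
```

The norms are computed once when the corpus is embedded, rather than again for every query.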

#### Visualising the embedding space

#### Cosine similarity between embeddings

We can compute the pairwise similarity of all embeddings in our corpus.

In [None]:
# cos_sim already returns the full (n_sentences x n_sentences) matrix of pairwise scores
similarity_matrix = util.cos_sim(corpus_embeddings, corpus_embeddings)
similarity_matrix.size()

### Cross-encoder

https://www.sbert.net/examples/applications/cross-encoder/README.html

## Semantic search

### Load data

### Computing document-query similarity

In [None]:
queries = [
    'A man is eating pasta.', 
    'Someone in a gorilla costume is playing a set of drums.', 
    'A cheetah chases prey on across a field.'
]

Find the 5 closest corpus sentences for each query sentence, based on cosine similarity:

In [None]:

top_k = min(5, len(corpus))
for query in queries:
    query_embedding = embedder.encode(query, convert_to_tensor=True)

    # We use cosine-similarity and torch.topk to find the highest 5 scores
    cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top_results = torch.topk(cos_scores, k=top_k)

    print("\n\n======================\n\n")
    print("Query:", query)
    print("\nTop 5 most similar sentences in corpus:")

    for score, idx in zip(top_results[0], top_results[1]):
        print(corpus[idx], "(Score: {:.4f})".format(score))

#### Cross-encoder

### Approximate Nearest Neighbor search

When we use a bi-encoder model, we can pre-compute the embeddings of our data set and index them to speed up the search.
Approximate Nearest Neighbor (ANN) techniques index the embedding space, for example with clustering, trees, or graphs, so that a query is compared against only a small part of the corpus.

Note that this is not applicable to cross-encoder models, which encode document and query together.
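
As a rough illustration of the clustering idea, the sketch below builds a toy index in pure Python: each vector is assigned to its most similar centroid, and a query is scored only against the vectors in the closest bucket(s). Everything here is an assumption made for illustration: the function names `build_index` and `ann_search`, the `n_probe` parameter, the 2-D vectors, and the hand-picked centroids (real libraries learn their index structure from the data, and HNSWLIB in particular uses a graph rather than clusters).

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def build_index(vectors, centroids):
    """Assign each vector id to the bucket of its most similar centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))
        buckets[best].append(idx)
    return buckets

def ann_search(query, vectors, centroids, buckets, top_k=1, n_probe=1):
    """Score only the vectors stored in the n_probe buckets closest to the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [i for c in ranked[:n_probe] for i in buckets[c]]
    scored = sorted(((dot(query, vectors[i]), i) for i in candidates), reverse=True)
    return scored[:top_k]

# Toy 2-D unit vectors and two hand-picked centroids (made up for illustration)
vectors = [normalize(v) for v in [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]]
centroids = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
buckets = build_index(vectors, centroids)

toy_query = normalize([0.8, 0.3])
# Only the two vectors in the nearest bucket are scored, not all four
print(ann_search(toy_query, vectors, centroids, buckets, top_k=2))
```

The search is approximate because a true nearest neighbour that landed in a bucket that is not probed is never scored; raising `n_probe` trades speed for recall.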

Sentence transformers support different libraries for ANN:
- [HNSWLIB](https://github.com/nmslib/hnswlib/)
- [Annoy](https://github.com/spotify/annoy)
- [FAISS](https://github.com/facebookresearch/faiss)

Let's try indexing with **HNSWLIB** and compare search time

https://www.sbert.net/examples/applications/semantic-search/README.html#approximate-nearest-neighbor

### Re-ranking

Cross-encoder models empirically yield better results, but are slow at inference.
Bi-encoder models, on the other hand, are less precise but faster at inference.

We can take advantage of both: first search with a bi-encoder, then re-rank the top-$k$ results with a cross-encoder.
We call this approach *retrieve and re-rank*.
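
The two-stage idea can be sketched independently of any particular model. In the snippet below, `retrieve_and_rerank` is a hypothetical helper, and the two word-overlap scorers are toy stand-ins for a bi-encoder and a cross-encoder; they are assumptions for illustration only and say nothing about how the real models score text.

```python
def retrieve_and_rerank(query, corpus, fast_score, slow_score, top_k=3):
    """Retrieve top_k candidates with a cheap scorer, then re-order them with a costly one."""
    # Stage 1: the cheap scorer sees the whole corpus
    candidates = sorted(corpus, key=lambda doc: fast_score(query, doc), reverse=True)[:top_k]
    # Stage 2: the expensive scorer sees only the short candidate list
    return sorted(candidates, key=lambda doc: slow_score(query, doc), reverse=True)

def tokens(text):
    """Lower-cased set of words, with full stops removed."""
    return set(text.lower().replace('.', '').split())

def overlap_score(query, doc):
    """Toy stand-in for a bi-encoder: number of shared words."""
    return len(tokens(query) & tokens(doc))

def jaccard_score(query, doc):
    """Toy stand-in for a cross-encoder: shared words relative to all words."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d)

toy_corpus = [
    'A man is eating a piece of bread.',
    'A man is eating food.',
    'A man is riding a horse.',
    'A monkey is playing drums.',
]

results = retrieve_and_rerank('A man is eating pasta.', toy_corpus,
                              overlap_score, jaccard_score, top_k=2)
print(results)
```

The expensive scorer runs on only `top_k` documents per query, which is what keeps the slower model affordable.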

https://www.sbert.net/examples/applications/retrieve_rerank/README.html

## Question answering

### QA data

#### Load

#### Prepare

### Training

### Search response

## Retrieval-based chatbots

### Dialogue data

#### Load 

#### Index

### Search for response

We can approach the problem of searching for a response in two ways:
1. Search for a similar context (i.e., last message in dialogue history) and return the associated response.
2. Search for a response as the most similar message to the context.
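
The two strategies differ only in which side of a stored (context, response) pair is compared with the incoming message. A minimal sketch, assuming a toy word-overlap `similarity` in place of embedding similarity and a small made-up list of `pairs` in place of an indexed dialogue corpus (both hypothetical, as are the function names):

```python
import re

def similarity(a, b):
    """Toy stand-in for embedding similarity: Jaccard overlap of word sets."""
    ta = set(re.findall(r"[a-z']+", a.lower()))
    tb = set(re.findall(r"[a-z']+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical (context, response) pairs standing in for real dialogue data
pairs = [
    ("Hello, how are you?", "I am fine, thanks."),
    ("What is your favourite food?", "I love pasta."),
    ("Do you like music?", "Yes, I play the violin."),
]

def respond_by_context(message, pairs):
    """Strategy 1: find the most similar stored context and return its response."""
    _, response = max(pairs, key=lambda p: similarity(message, p[0]))
    return response

def respond_by_response(message, pairs):
    """Strategy 2: return the stored response most similar to the message itself."""
    _, response = max(pairs, key=lambda p: similarity(message, p[1]))
    return response

print(respond_by_context("Hi, how are you today?", pairs))
print(respond_by_response("Do you play the violin?", pairs))
```

With real embeddings, the second strategy generally assumes a model that places a message and a suitable reply close together, which plain word overlap does not capture.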

Let's try chatting with our retrieval system.

In [None]:
print("Press [Ctrl-C] to stop")
# Initialise dialogue history
dialogue_history = ["Hello, how are you?"]
print(f"Chatbot: {dialogue_history[0]}")
# Keep talking until stop
running = True
while running:
    try:
        # Read user message
        message = input("User: ")
        # Append message to dialogue history
        dialogue_history.append(message)
        # Search for a chatbot response
        response = ...
        # Append chatbot response to dialogue history
        dialogue_history.append(response)
        # Print chatbot response
        print(f"Chatbot: {response}")
    except KeyboardInterrupt:
        running = False