# Semantic Search and Retrieval-Augmented Generation



# 1. Applications of Semantic Search 

**Semantic search** retrieves results that have a similar **meaning** to our query.

Contrast this with *keyword search*, where we retrieve results that have the same **words** as our query.

As an example, say our query is "dog".
A keyword search might retrieve results related to dogs, hot dogs, and dogging,
whereas a semantic search would retrieve results related to dogs, chihuahuas, dachshunds and puppies.
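
To make the contrast concrete, here is a minimal sketch of keyword search as plain substring matching. The function name `keyword_search` and the example documents are made up for illustration; real keyword search engines typically use more sophisticated scoring.

```python
def keyword_search(query, documents):
    """Return the documents that contain the query string (case-insensitive)."""
    q = query.lower()
    return [doc for doc in documents if q in doc.lower()]


# Hypothetical corpus for illustration only.
documents = [
    "How to train your dog",
    "Best hot dog stands in town",
    "Caring for a chihuahua puppy",
]

matches = keyword_search("dog", documents)
print(matches)
```

The hot dog document matches because it contains the word, while the chihuahua document is missed even though it is about a dog. Semantic search aims to fix both problems.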

Semantic search is helpful in the contexts of:

- **Dense retrieval** - where we want to retrieve a small number of relevant documents from a large corpus based on the nearest neighbours in embedding space.
- **Re-ranking** - where we already have a list of results from another step in the pipeline (e.g. keyword search) and want to re-order them by relevance.
- **Retrieval-Augmented Generation (RAG)** - where a user's prompt to an LLM is used to retrieve related documents, and those documents are fed to the LLM to generate an improved response.
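
As a rough illustration of the RAG idea, the sketch below shows only the "feed the documents to the LLM" step: it combines already-retrieved documents with the user's question into a single prompt string. The helper `build_rag_prompt`, the prompt wording, and the example documents are assumptions for illustration; the retrieval step and the LLM call are left out.

```python
def build_rag_prompt(question, retrieved_docs):
    """Combine retrieved documents and the user's question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Pretend these came back from a retrieval step (hypothetical data).
retrieved_docs = [
    "Dachshunds were bred to hunt badgers.",
    "Dachshunds have short legs.",
]

prompt = build_rag_prompt("Why do dachshunds have short legs?", retrieved_docs)
print(prompt)
```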



# 2. Dense Retrieval

When we create an embedding for a document, we are essentially creating a vector that in some way encodes its meaning.
Each dimension captures some learned aspect of the document, although individual dimensions are rarely interpretable on their own. We can think of N-dimensional embedding vectors as points in N-dimensional space.

Points which are close together are similar in some way, and points which are far apart are different.

This is the premise behind semantic search systems. We **embed the query**, then find the **documents which are the nearest neighbours** to the query in the embedding space.
These nearest neighbours are our search results, since they should be similar in meaning to the query.

![Query embedding](query_embedding.png)
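
A minimal sketch of the nearest-neighbour lookup, assuming cosine similarity as the closeness measure (other distance metrics are also common). The 3-dimensional vectors are hand-written stand-ins for illustration; real embeddings come from an embedding model and have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(query_vec, doc_vecs, k=2):
    """Return the k (document, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]


# Made-up embeddings for illustration only.
doc_embeddings = {
    "puppy care guide": [0.9, 0.1, 0.0],
    "dachshund breed profile": [0.8, 0.2, 0.1],
    "hot dog recipe": [0.1, 0.9, 0.2],
}
query_embedding = [1.0, 0.0, 0.0]  # pretend embedding of the query "dog"

top = nearest_neighbours(query_embedding, doc_embeddings, k=2)
print(top)
```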

Some points worth noting on this nearest neighbours approach:

- We may want to apply a minimum threshold to the similarity score to account for cases when there are no relevant documents in the corpus.
- Queries and their answers aren't always semantically similar. This is why embedding models used for retrieval need to be trained on question-answer pairs, so that a question lands near its answer in embedding space.
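
The thresholding idea from the first point can be sketched as a simple filter over scored results. The cut-off of `0.5`, the function name `filter_by_threshold`, and the example scores are arbitrary assumptions; a sensible threshold depends on the embedding model and the corpus.

```python
def filter_by_threshold(scored_results, min_score=0.5):
    """Keep only (document, score) pairs whose similarity reaches min_score."""
    return [(doc, score) for doc, score in scored_results if score >= min_score]


# Hypothetical similarity scores for the query "dog".
scored_results = [("puppy care guide", 0.99), ("hot dog recipe", 0.11)]
relevant = filter_by_threshold(scored_results, min_score=0.5)

# When nothing in the corpus is relevant, we return no results
# instead of the "least bad" nearest neighbour.
no_match = filter_by_threshold([("tax return guide", 0.08)], min_score=0.5)

print(relevant)
print(no_match)
```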

The steps to create a dense retrieval system are:

1. Collect and pre-process the text data.
2. **Chunk** it into *documents*.
3. Create **embeddings** for each document.
4. Build a search index.
5. Search for results (nearest neighbours).
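
The steps above can be sketched end to end in plain Python. This is a toy under loud assumptions: `embed` is a normalised word-count vector standing in for a real embedding model (so it behaves more like keyword matching than true semantic search), the "index" is just a list searched by brute force, and all function names and the sample text are made up for illustration. A real system would typically use a trained embedding model and a dedicated vector index.

```python
import math
import re


def chunk(text):
    """Step 2: split the text into sentence-level documents."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vocabulary):
    """Step 3 (toy stand-in): unit-length word-count vector over the vocabulary."""
    tokens = tokenize(text)
    vec = [float(tokens.count(word)) for word in vocabulary]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(documents, vocabulary):
    """Step 4: the 'index' here is simply a list of (document, vector) pairs."""
    return [(doc, embed(doc, vocabulary)) for doc in documents]


def search(query, index, vocabulary, k=1):
    """Step 5: brute-force nearest neighbours via dot product of unit vectors."""
    q = embed(query, vocabulary)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]


# Step 1: collected text (hypothetical sample).
text = (
    "Dachshunds are small dogs. "
    "Paris is the capital of France. "
    "Puppies need regular walks."
)
documents = chunk(text)
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})
index = build_index(documents, vocabulary)

results = search("What is the capital of France?", index, vocabulary, k=1)
print(results)
```

Swapping `embed` for a real embedding model and `build_index`/`search` for a vector index library would keep the same overall shape.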


# TODO: from page 323

```python
import faiss
```

# References

- Chapter 8 of *Hands-On Large Language Models* by Jay Alammar & Maarten Grootendorst
