# Adding a semantic search engine

## 📌 **Overview**
A **semantic search engine** retrieves documents **based on meaning, not just keywords**.  
It ranks results **by relevance** using **text embeddings** and **cosine similarity**.

---

## 🛠️ **Steps**
   - Documents are transformed into numerical vectors (embeddings) using **SBERT** or another embedding model.  
   - The query is **converted into an embedding** with the same model, so both live in the same vector space.  
   - We compute **cosine similarity** between the query embedding and every document embedding.
   - Documents are sorted by score, and those with the **highest similarity scores** are displayed.  
   - Because the comparison is made on meaning rather than on words, a document can still be retrieved even if it contains none of the exact words from the query.
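
The scoring and ranking steps above can be sketched in a few lines of NumPy. This is only an illustration: the helper names `cosine_similarity` and `rank_by_similarity` and the 3-dimensional toy vectors are assumptions made for the example, not part of `embedding_models`, and real SBERT embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return (document index, score) pairs, highest score first."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    order = np.argsort(scores)[::-1][:top_k]  # indices sorted by descending score
    return [(int(i), scores[i]) for i in order]

# Toy "embeddings", made up for illustration only.
doc_vecs = np.array([
    [1.0, 0.0, 0.0],   # document 0
    [0.0, 1.0, 0.0],   # document 1
    [1.0, 1.0, 0.0],   # document 2
])
query_vec = np.array([1.0, 0.1, 0.0])

ranking = rank_by_similarity(query_vec, doc_vecs, top_k=2)
for idx, score in ranking:
    print(f"document {idx}: similarity {score:.2f}")
```

Document 0 points in almost the same direction as the query, so it ranks first; document 2 shares only part of that direction and comes second.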


In [1]:
import importlib

# Reload the local module so that edits to embedding_models.py
# are picked up without restarting the kernel.
import embedding_models
importlib.reload(embedding_models)
from embedding_models import EmbeddingModel

texts = [
    "Artificial intelligence is revolutionizing finance",
    "Stock markets drop after FED announcement",
    "Tesla unveils new battery with record range",
    "Apple launches iPhone 16 with new features",
    "The job market in France is seeing an increase in hiring",
    "AI and cybersecurity: A new era of data protection",
    "Artificial intelligence is revolutionizing economy",
]


In [2]:
model = EmbeddingModel(method="sbert")
model.fit(texts)

In [3]:
# Index the texts' embeddings so I can look them up
model.index_texts(texts)

Indexed 7 texts.
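
For readers curious about what an indexing step like this might involve, here is a minimal sketch. It is not the `EmbeddingModel` implementation: the `TinyIndex` class, its `search` method, and the `toy_embed` word-count function are all hypothetical names invented for this example. The idea shown is to normalise every embedding once at index time, so that each later search reduces to a single matrix-vector product.

```python
import numpy as np

class TinyIndex:
    """Hypothetical in-memory index; not the EmbeddingModel implementation."""

    def __init__(self, embed):
        self.embed = embed    # callable: str -> 1-D numpy array
        self.texts = []
        self.matrix = None    # one unit-length row per indexed text

    def index_texts(self, texts):
        vectors = np.array([self.embed(t) for t in texts], dtype=float)
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        # Normalise once here so searches do not have to repeat the work.
        self.matrix = vectors / np.clip(norms, 1e-12, None)
        self.texts = list(texts)

    def search(self, query, top_k=3):
        q = np.asarray(self.embed(query), dtype=float)
        q = q / max(np.linalg.norm(q), 1e-12)
        scores = self.matrix @ q  # dot product of unit vectors = cosine similarity
        order = np.argsort(scores)[::-1][:top_k]
        return [(self.texts[i], float(scores[i])) for i in order]

# Stand-in embedder: counts of a few hand-picked words.
# It is NOT semantic; a real setup would call an SBERT model here.
VOCAB = ["ai", "finance", "battery", "market"]

def toy_embed(text):
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

index = TinyIndex(toy_embed)
index.index_texts(["ai in finance", "new battery", "job market"])
hits = index.search("ai finance", top_k=2)
print(hits[0])
```

Swapping `toy_embed` for a real sentence-embedding function would leave the indexing and search logic unchanged.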


In [4]:
# Semantic search: retrieve the 3 indexed texts closest in meaning to the query
query = "How is AI changing the finance industry?"
results = model.search_similar(query, how_much_results=3)

In [5]:
print("\n🔍 Search results:\n")
for text, score in results:
    print(f"{text} (Similarity: {score:.2f})")


🔍 Search results:

Artificial intelligence is revolutionizing finance (Similarity: 0.69)
Artificial intelligence is revolutionizing economy (Similarity: 0.54)
AI and cybersecurity: A new era of data protection (Similarity: 0.39)


In [6]:
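# The ranking makes sense: the two headlines about AI transforming finance
# and the economy score highest, and the AI/cybersecurity headline comes third.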