# 03 – Semantic Search Demo

This notebook:
- Loads the FAISS index created in the Step 2 notebook.
- Defines helper functions for semantic paper search.
- Allows interactive querying to inspect retrieval quality.
- Optionally filters results by year or category.

In [1]:
import os
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
assert OPENAI_API_KEY, "Please set your OPENAI_API_KEY in a .env file"

FAISS_PATH = "../data/faiss_index"
EMBEDDING_MODEL = "text-embedding-3-small"

Load FAISS Index

In [2]:
embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL, api_key=OPENAI_API_KEY)

# Load stored FAISS index
vectorstore = FAISS.load_local(
    FAISS_PATH, 
    embeddings, 
    allow_dangerous_deserialization=True
)

print("FAISS index loaded successfully.")

FAISS index loaded successfully.
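
If the index folder is incomplete, `load_local` fails with a low-level file error. Below is a minimal pre-flight check, assuming the index was saved in Step 2 with `save_local` under its default name `index`, so that the folder holds `index.faiss` and `index.pkl`. The helper `index_files_present` is hypothetical and not part of LangChain.

```python
from pathlib import Path


def index_files_present(folder: str, index_name: str = "index") -> bool:
    """Return True if both files written by FAISS.save_local exist in `folder`.

    Assumption: the index was saved under the default name "index".
    """
    base = Path(folder)
    return all((base / f"{index_name}{ext}").is_file() for ext in (".faiss", ".pkl"))


# Path taken from the configuration cell above.
print(index_files_present("../data/faiss_index"))
```

The `.pkl` file is unpickled on load, which is why `allow_dangerous_deserialization=True` is required. Only load an index that you created yourself or that comes from a trusted source.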


Search Function Definition

In [3]:
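from typing import Optional

def search_papers(query: str, k: int = 5, min_year: Optional[int] = None, category_filter: Optional[str] = None):
    """
    Perform semantic search and optionally filter by publication year or category.

    Documents that lack the relevant metadata field are not filtered out.
    """
    # Over-fetch candidates, then filter in Python; fetch at least k so a large k still works.
    results = vectorstore.similarity_search(query, k=max(k, 50))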
    
    filtered = []
    for doc in results:
        meta = doc.metadata
        if min_year and meta.get("year"):
            if int(meta["year"]) < min_year:
                continue
        if category_filter and meta.get("category_code"):
            if category_filter.lower() not in meta["category_code"].lower():
                continue
        filtered.append(doc)
        if len(filtered) >= k:
            break

    return filtered
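
The year filter in `search_papers` only applies when a document has a `year` value, so documents without one pass through. The sketch below reproduces that logic on plain metadata dictionaries so it runs without an index. The `strict` flag and the name `filter_hits` are illustrative additions and not part of the notebook.

```python
from typing import Any, Dict, List, Optional


def filter_hits(
    hits: List[Dict[str, Any]],
    k: int = 5,
    min_year: Optional[int] = None,
    strict: bool = False,
) -> List[Dict[str, Any]]:
    """Keep the first k hits that pass the year filter.

    strict=False mirrors search_papers: hits without a year are kept.
    strict=True (hypothetical variant) drops hits without a year.
    """
    kept = []
    for meta in hits:
        year = meta.get("year")
        if min_year is not None:
            if year is None:
                if strict:
                    continue
            elif int(year) < min_year:
                continue
        kept.append(meta)
        if len(kept) >= k:
            break
    return kept


hits = [
    {"title": "A", "year": 2019},
    {"title": "B", "year": 2021},
    {"title": "C"},                  # no year recorded
    {"title": "D", "year": "2023"},  # year stored as a string
]

lenient_titles = [m["title"] for m in filter_hits(hits, min_year=2020)]
strict_titles = [m["title"] for m in filter_hits(hits, min_year=2020, strict=True)]
print(lenient_titles)  # ['B', 'C', 'D']
print(strict_titles)   # ['B', 'D']
```

Because filtering happens after retrieval, a restrictive filter can leave fewer than `k` results even when more matching papers exist further down the ranking.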

Example Queries to Check That the Function Works

In [4]:
examples = ["recent advances in recommender systems"]

for q in examples:
    print(f"\n🔍 Query: {q}")
    results = search_papers(q, k=3)
    for doc in results:
        print("• TITLE:", doc.metadata["title"])
        print("  CATEGORY:", doc.metadata["category_code"])
        print("  YEAR:", doc.metadata["year"])
        print("  SNIPPET:", doc.page_content[:200].replace("\n", " "), "...")
        print("-" * 100)


🔍 Query: recent advances in recommender systems
• TITLE: Recommender Systems: A Primer
  CATEGORY: cs.IR
  YEAR: 2023
  SNIPPET: Recommender Systems: A Primer. Personalized recommendations have become a common feature of modern online services, including most major e-commerce sites, media platforms and social networks. Today, d ...
----------------------------------------------------------------------------------------------------
• TITLE: A Troubling Analysis of Reproducibility and Progress in Recommender
  Systems Research
  CATEGORY: cs.IR
  YEAR: 2019
  SNIPPET: A Troubling Analysis of Reproducibility and Progress in Recommender   Systems Research. The design of algorithms that generate personalized ranked item lists is a central topic of research in the fiel ...
----------------------------------------------------------------------------------------------------
• TITLE: Sequence-aware item recommendations for multiply repeated user-item
  interactions
  CATEGORY: cs.IR
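
Some titles in the output wrap onto a second line because the stored `title` metadata contains an embedded newline followed by spaces. A small sketch for collapsing that whitespace before printing follows; `clean_title` is a hypothetical helper name.

```python
def clean_title(title: str) -> str:
    """Collapse newlines and runs of spaces into single spaces."""
    return " ".join(title.split())


# Raw value as it appears to be stored, inferred from the output above.
raw = "A Troubling Analysis of Reproducibility and Progress in Recommender\n  Systems Research"
print(clean_title(raw))
```

In the display loop this would replace `doc.metadata["title"]` with `clean_title(doc.metadata["title"])`.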
 

In [5]:
query = "deep learning for computer vision"
results = search_papers(query, k=5, min_year=2020)

print(f"\n{len(results)} recent papers found for query: '{query}' (year ≥ 2020)\n")

for doc in results:
    print("•", doc.metadata["title"], f"({doc.metadata['year']})")


5 recent papers found for query: 'deep learning for computer vision' (year ≥ 2020)

• Deep Reinforcement Learning in Computer Vision: A Comprehensive Survey (2021)
• Deep Learning Computer Vision Algorithms for Real-time UAVs On-board
  Camera Image Processing (2022)
• Classic versus deep learning approaches to address computer vision
  challenges (2021)
• Universal Object Detection with Large Vision Model (2022)
• On The State of Data In Computer Vision: Human Annotations Remain
  Indispensable for Developing Deep Learning Models (2021)


In [6]:
query = "natural language processing"
results = search_papers(query, k=5, category_filter="cs.CL")

print(f"\n{len(results)} NLP papers (category=cs.CL):\n")
for doc in results:
    print("•", doc.metadata["title"], f"– {doc.metadata['category_code']}")


5 NLP papers (category=cs.CL):

• Natural Language Processing: State of The Art, Current Trends and
  Challenges – cs.CL
• New Approaches for Natural Language Understanding based on the Idea that
  Natural Language encodes both Information and its Processing Procedures – cs.CL
• Unnatural Language Processing: Bridging the Gap Between Synthetic and
  Natural Language Data – cs.CL
• Evolution of Natural Language Processing Technology: Not Just Language
  Processing Towards General Purpose AI – cs.CL
• Natural Language Processing for Dialects of a Language: A Survey – cs.CL
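
The category filter in `search_papers` is a case-insensitive substring test, so a partial code such as `cs.C` also matches `cs.CL`. If exact matching is preferred, one possible sketch is shown below. It assumes `category_code` holds one or more whitespace-separated codes, which the notebook does not confirm, and `category_matches` is a hypothetical helper.

```python
def category_matches(category_code: str, wanted: str) -> bool:
    """Exact, case-insensitive match against whitespace-separated category codes."""
    codes = {code.lower() for code in category_code.split()}
    return wanted.lower() in codes


# Substring test, as used in search_papers: a partial code matches.
print("cs.c" in "cs.CL".lower())                 # True
# Exact token test: a partial code no longer matches.
print(category_matches("cs.CL", "cs.C"))         # False
print(category_matches("cs.CL cs.AI", "cs.ai"))  # True
```

The substring behaviour can be useful on purpose, for example passing `"cs."` to keep every computer science category, so which test fits depends on how the filter is meant to be used.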


In [7]:
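# Inspect raw scores using similarity_search_with_score.
# With LangChain's default FAISS distance strategy the score is an L2 distance, so lower means more similar.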
query = "recommender systems"
docs_scores = vectorstore.similarity_search_with_score(query, k=5)

for doc, score in docs_scores:
    print(f"{doc.metadata['title'][:70]}...  →  similarity score = {score:.4f}\n")

Recommender Systems: A Primer...  →  similarity score = 0.7242

A Survey of Recommender System Techniques and the Ecommerce Domain...  →  similarity score = 0.8289

A Machine-Learning Item Recommendation System for Video Games...  →  similarity score = 0.8503

An Artificial Immune System as a Recommender System for Web Sites...  →  similarity score = 0.8516

Recommending with Recommendations...  →  similarity score = 0.8521
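
The scores above increase down the ranked list, which is consistent with a distance rather than a similarity: the best match has the smallest value. The sketch below converts such a score to cosine similarity under two assumptions that the notebook does not state: the index is a flat L2 index reporting squared Euclidean distance, and the embeddings are unit-length. For unit vectors, squared distance d² and cosine similarity are related by cos = 1 − d²/2. The function name `cosine_from_squared_l2` is illustrative.

```python
import math


def cosine_from_squared_l2(d2: float) -> float:
    """Cosine similarity of two unit vectors given their squared L2 distance."""
    return 1.0 - d2 / 2.0


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Check the identity on two arbitrary unit vectors.
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])
d2 = sum((x - y) ** 2 for x, y in zip(a, b))
cos = sum(x * y for x, y in zip(a, b))
assert math.isclose(cosine_from_squared_l2(d2), cos)

# Applied to the top score from the output above (only valid under the stated assumptions).
print(round(cosine_from_squared_l2(0.7242), 4))  # 0.6379
```

If the index from Step 2 was built with a different distance strategy, this conversion does not apply and the raw scores should be interpreted according to that strategy.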

