PDF → Text → Chunking → Embedding → Indexing → Q&A → Model Save/Load

In [1]:
# Extract text from PDF
import pymupdf

def extract_text_from_pdf(pdf_path):
    doc = pymupdf.open(pdf_path)
    return "\n".join([page.get_text() for page in doc])

text = extract_text_from_pdf("the-illusion-of-thinking.pdf")

In [2]:
# Chunk text

def chunk_text(text, chunk_size = 500000):
    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_text(text=text)

In [3]:
# Generate Embeddings

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large-v2")

# IMPORTANT: e5 models expect "passage: ..." or "query: ..." prefixes
embeddings = model.encode(["passage: " + chunk for chunk in chunks], show_progress_bar=True)

  from .autonotebook import tqdm as notebook_tqdm
Batches: 100%|██████████| 1/1 [00:00<00:00,  5.23it/s]


In [4]:
import numpy as np

In [5]:
import faiss
import numpy as np
import pickle

# Determine dimensionality from the first vector
dimension = embeddings[0].shape[0]

# Create FAISS index
index = faiss.IndexFlatL2(dimension)

# Convert to correct format
embedding_matrix = np.array(embeddings).astype("float32")
index.add(embedding_matrix)

# Save index and chunks
faiss.write_index(index, "faiss.index")
with open("doc_chunks.pkl", "wb") as f:
    pickle.dump(chunks, f)


In [6]:
# Questioning Answering over Retrieved Context
import numpy as np

query_embedding = model.encode("query: What is the architecture described?")
query_embedding = np.array([query_embedding]).astype("float32")

D, I = index.search(query_embedding, k=1)  # D = distances, I = indices
context = chunks[I[0][0]]

In [7]:
from sentence_transformers import SentenceTransformer
import numpy as np

query = "What is the architecture described?"
query_embedding = model.encode("query: " + query)
query_embedding = np.array([query_embedding]).astype("float32")

D, I = index.search(query_embedding, k=1)
context = chunks[I[0][0]]

print("Most Relevant Context:\n", context)

Most Relevant Context:
 The Illusion of Thinking:
Understanding the Strengths and Limitations of Reasoning Models
via the Lens of Problem Complexity
Parshin Shojaee∗†
Iman Mirzadeh∗
Keivan Alizadeh
Maxwell Horton
Samy Bengio
Mehrdad Farajtabar
Apple
Abstract
Recent generations of frontier language models have introduced Large Reasoning Models
(LRMs) that generate detailed thinking processes before providing answers. While these models
demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scal-
ing properties, and limitations remain insufficiently understood. Current evaluations primarily fo-
cus on established mathematical and coding benchmarks, emphasizing final answer accuracy. How-
ever, this evaluation paradigm often suffers from data contamination and does not provide insights
into the reasoning traces’ structure and quality. In this work, we systematically investigate these
gaps with the help of controllable puzzle environments that allow preci

In [8]:
query = "Who are the authors?"
query_embedding = model.encode("query: " + query)
query_embedding = np.array([query_embedding]).astype("float32")

D, I = index.search(query_embedding, k=1)
context = chunks[I[0][0]]

print("Most Relevant Context:\n", context)

Most Relevant Context:
 The Illusion of Thinking:
Understanding the Strengths and Limitations of Reasoning Models
via the Lens of Problem Complexity
Parshin Shojaee∗†
Iman Mirzadeh∗
Keivan Alizadeh
Maxwell Horton
Samy Bengio
Mehrdad Farajtabar
Apple
Abstract
Recent generations of frontier language models have introduced Large Reasoning Models
(LRMs) that generate detailed thinking processes before providing answers. While these models
demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scal-
ing properties, and limitations remain insufficiently understood. Current evaluations primarily fo-
cus on established mathematical and coding benchmarks, emphasizing final answer accuracy. How-
ever, this evaluation paradigm often suffers from data contamination and does not provide insights
into the reasoning traces’ structure and quality. In this work, we systematically investigate these
gaps with the help of controllable puzzle environments that allow preci