# 📚 Ask This Book: Kafka’s *The Metamorphosis*

This notebook demonstrates a LangChain-based RAG (Retrieval-Augmented Generation) system applied to a classic novel.

🧠 It loads *The Metamorphosis* by Franz Kafka, splits it into overlapping text chunks, embeds them into a FAISS vector index, and lets users ask questions using a local LLM (`flan-t5-large`), so no API key is required.
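Here is a toy, pure-Python sketch of the retrieve-then-prompt flow described above. It does not download any models. It rests on loud assumptions: word counts stand in for the `all-MiniLM-L6-v2` embeddings, a brute-force cosine ranking stands in for the FAISS index, the three chunks are short paraphrases rather than text from the book, and the helper names (`embed`, `cosine`, `retrieve`, `stuff_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a sentence-transformer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force top-k search over all chunks (stand-in for the FAISS index)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def stuff_prompt(question: str, context: list[str]) -> str:
    """Paste the retrieved chunks into a single prompt, 'stuff'-chain style."""
    return "Context:\n" + "\n\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"


# Illustrative paraphrases, not quotations from the translation used in this notebook
chunks = [
    "Gregor Samsa woke from troubled dreams transformed into a horrible vermin.",
    "Gregor had a job as a travelling salesman.",
    "His sister Grete brought him food and milk.",
]
top = retrieve("What job did Gregor have?", chunks, k=1)
prompt = stuff_prompt("What job did Gregor have?", top)
print(prompt)
```

The real notebook swaps each toy piece for a learned component, but the data flow (question → top-k chunks → one stuffed prompt → LLM) is the same.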

---

## ⚙️ How to Use This Notebook with Other Books

To adapt this notebook to a different `.txt` book:

1. Replace the uploaded `.txt` file with your new book.
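2. Update the file path in **Step 2** like this:
   ```python
   file_path = "/content/YourNewBook.txt"
   loader = TextLoader(file_path)
   ```
3. Re-run Steps 3 to 5 so the chunks, the FAISS index (saved as `faiss_kafka_index`), and the QA chain are rebuilt from the new text, and replace the Kafka-specific example questions.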

---

## 💡 How to Ask Questions to This Notebook

This system uses a lightweight LLM (`flan-t5-large`) and retrieves context chunks from the book *The Metamorphosis* using FAISS and LangChain.

Because `flan-t5-large` is a small model with a 512-token input limit, and every answer has to come from a handful of short retrieved chunks, follow these tips:

### ✅ Ask Like This
- "What happens to Gregor Samsa?"
- "Describe Gregor’s transformation."
- "What job did Gregor have before he changed?"

### ❌ Avoid Asking
- Why-questions (e.g., "Why did he transform?")
- Symbolic or interpretive prompts (e.g., "What does his transformation represent?")
- List formats (e.g., "List 3 events...")
- Long compound queries

> 📏 Try to keep questions short and fact-based. For deeper literary analysis, consider a larger model instead (for example GPT-4 through an API, or a larger open model such as Mistral).
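The tips above can be turned into a rough pre-check that warns before a question is sent to the model. This is a minimal heuristic sketch, not part of the original notebook: the function name `question_warnings`, the 15-word limit, and the keyword lists are all hypothetical choices you would tune for your own book.

```python
import re

# Hypothetical thresholds and patterns; adjust for your book and model
MAX_WORDS = 15
WHY_PATTERN = re.compile(r"^\s*why\b", re.IGNORECASE)
LIST_PATTERN = re.compile(r"^\s*list\b", re.IGNORECASE)
INTERPRETIVE_WORDS = ("represent", "symbolize", "symbolise", "meaning of")


def question_warnings(question: str) -> list[str]:
    """Return reasons a question may be a poor fit for flan-t5-large with short chunks."""
    warnings = []
    if WHY_PATTERN.search(question):
        warnings.append("why-question")
    if any(word in question.lower() for word in INTERPRETIVE_WORDS):
        warnings.append("interpretive")
    if LIST_PATTERN.search(question):
        warnings.append("list format")
    # Long questions or several questions at once
    if len(question.split()) > MAX_WORDS or question.count("?") > 1:
        warnings.append("long or compound")
    return warnings


for q in ["What job did Gregor have before he changed?", "Why did he transform?"]:
    print(q, "->", question_warnings(q) or "looks fine")
```

A keyword heuristic like this will have false positives and misses; it only encodes the "Avoid Asking" list, it does not predict answer quality.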

---

In [8]:
# Step 1: Install LangChain, FAISS, and Hugging Face tools
# (RetrievalQA is a legacy chain; if a newer LangChain release no longer provides
#  langchain.chains, pin an older version, e.g. "langchain<1.0")
!pip install --quiet langchain langchain-community faiss-cpu sentence-transformers

# Step 1b: Import core components
# The old langchain.embeddings / langchain.vectorstores / langchain.llms paths are deprecated;
# these classes now live in langchain_community.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline


In [3]:
# Step 2: Load the Metamorphosis text file
file_path = "/content/Metamorphosis by Franz Kafka.txt"
loader = TextLoader(file_path)
documents = loader.load()

# Preview the start of the book
print(documents[0].page_content[:500])

Metamorphosis
by Franz Kafka
Translated by David Wyllie
I
One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of him, waved abou


In [4]:
# Step 3: Split the text into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Preview how many chunks we have and show one
print(f"✅ Total chunks: {len(chunks)}")
print("\n📄 Sample chunk:\n")
print(chunks[0].page_content)

✅ Total chunks: 319

📄 Sample chunk:

Metamorphosis
by Franz Kafka
Translated by David Wyllie
I
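The effect of `chunk_size=500` and `chunk_overlap=100` can be seen with a plain fixed-size character window, sketched below. This is a simplification, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on `"\n\n"`, then `"\n"`, then `" "`, then `""`, and merges pieces up to `chunk_size`, so its chunks end at paragraph or line boundaries. That boundary preference is likely why the sample chunk above holds only the title block. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Fixed-size character windows; consecutive windows share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this many characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


sample = "".join(chr(ord("a") + i % 26) for i in range(1200))
pieces = sliding_chunks(sample, chunk_size=500, chunk_overlap=100)
print([len(p) for p in pieces])  # [500, 500, 400]
```

The overlap means a sentence cut at a window edge still appears whole in one of the neighbouring chunks, at the cost of storing some text twice.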


In [5]:
# Step 4: Convert book chunks to embeddings and store in FAISS
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create the vector store
vectorstore = FAISS.from_documents(chunks, embedding_model)

# Save the index locally (so I can reload it later)
vectorstore.save_local("faiss_kafka_index")

print("✅ Book embedded and FAISS index saved!")


✅ Book embedded and FAISS index saved!
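To my knowledge, `FAISS.from_documents` builds an exact (flat) index that ranks chunks by Euclidean (L2) distance by default, and `all-MiniLM-L6-v2` outputs unit-length vectors; treat both as assumptions to verify for your versions. For unit vectors, squared L2 distance equals 2 − 2·cosine, so L2 ranking and cosine ranking agree. The brute-force sketch below checks this on toy 3-dimensional vectors; `l2_top_k` and `cosine_top_k` are hypothetical helpers, not FAISS APIs.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def l2_top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exact nearest neighbours by squared Euclidean distance (what a flat L2 index computes)."""
    dist = [sum((q - x) ** 2 for q, x in zip(query, v)) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: dist[i])[:k]


def cosine_top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exact nearest neighbours by dot product (cosine similarity for unit vectors)."""
    sim = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: -sim[i])[:k]


vectors = [normalize(v) for v in ([1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.5, 0.5, 0.7])]
query = normalize([1, 0.2, 0])
print(l2_top_k(query, vectors, 4), cosine_top_k(query, vectors, 4))
```

In the notebook itself, `vectorstore.similarity_search_with_score(question, k=5)` returns each retrieved chunk together with its distance score, which is a quick way to see how close the top matches really are.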


In [17]:
# Step 5: RetrievalQA using flan-t5-large
# (embedding_model comes from Step 4)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain_community.vectorstores import FAISS
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Reload the saved index as a retriever that returns the top 5 chunks.
# allow_dangerous_deserialization=True is required because the index metadata is pickled;
# only enable it for index files you created yourself.
retriever = FAISS.load_local(
    "faiss_kafka_index",
    embedding_model,
    allow_dangerous_deserialization=True
).as_retriever(search_kwargs={"k": 5})

# Load local model
model_name = "google/flan-t5-large"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Build a deterministic (greedy) generation pipeline
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=False
)

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# Build QA chain (the default "stuff" chain type pastes all retrieved chunks into one prompt)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

# Ask questions
questions = [
    "What happens to Gregor Samsa in The Metamorphosis?",
    "Who is Gregor Samsa?",
    "Describe Gregor's physical transformation.",
    "What job did Gregor have before he changed?",
    "What does Gregor try to do when he wakes up transformed?",
    "How does Gregor’s sister initially react to his condition?",
]

# Run QA (.run() is deprecated; .invoke() takes {"query": ...} and returns the answer under "result")
for q in questions:
    answer = qa_chain.invoke({"query": q})["result"]
    print("❓ Question:", q)
    print("🤖 Answer:", answer)
    print("-" * 60)

Device set to use cpu
Token indices sequence length is longer than the specified maximum sequence length for this model (683 > 512). Running this sequence through the model will result in indexing errors


❓ Question: What happens to Gregor Samsa in The Metamorphosis?
🤖 Answer: he finds himself transformed in his bed into a horrible vermin
------------------------------------------------------------
❓ Question: Who is Gregor Samsa?
🤖 Answer: Gregor Samsa is a vermin
------------------------------------------------------------
❓ Question: Describe Gregor's physical transformation.
🤖 Answer: Gregor's only concern at that time had been to arrange things so that they could all forget
------------------------------------------------------------
❓ Question: What job did Gregor have before he changed?
🤖 Answer: junior salesman
------------------------------------------------------------
❓ Question: What does Gregor try to do when he wakes up transformed?
🤖 Answer: a little bit longer and forget all this nonsense
------------------------------------------------------------
❓ Question: How does Gregor’s sister initially react to his condition?
🤖 Answer: She tried as far as possible to pretend the

In [13]:
# Step 5 (alternate run): same RetrievalQA chain with max_new_tokens=256
# (embedding_model comes from Step 4)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain_community.vectorstores import FAISS
from langchain_community.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# Load retriever (top 5 chunks)
retriever = FAISS.load_local(
    "faiss_kafka_index",
    embedding_model,
    allow_dangerous_deserialization=True
).as_retriever(search_kwargs={"k": 5})

# Load LLM with a shorter generation limit
model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)

# Create QA chain
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

# Test questions
questions = [
    "What happens to Gregor Samsa in The Metamorphosis?",
    "What job did Gregor have before his transformation?",
    "What does Gregor try to do when he wakes up transformed?",
    "How does Gregor’s sister initially react to his condition?"
]

# Run all questions (.invoke() replaces the deprecated .run())
for q in questions:
    answer = qa_chain.invoke({"query": q})["result"]
    print(f"\n❓ Question: {q}")
    print(f"🤖 Answer: {answer}")

Device set to use cpu
Token indices sequence length is longer than the specified maximum sequence length for this model (683 > 512). Running this sequence through the model will result in indexing errors



❓ Question: What happens to Gregor Samsa in The Metamorphosis?
🤖 Answer: he finds himself transformed in his bed into a horrible vermin

❓ Question: What job did Gregor have before his transformation?
🤖 Answer: junior salesman

❓ Question: What does Gregor try to do when he wakes up transformed?
🤖 Answer: a little bit longer and forget all this nonsense

❓ Question: How does Gregor’s sister initially react to his condition?
🤖 Answer: She tried as far as possible to pretend there was nothing burdensome about it
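Because the two Step 5 runs use greedy decoding, their answers can be compared directly. A small keyword check makes that comparison repeatable when you change `k`, `chunk_size`, or the model. This is a hypothetical sketch: `keyword_check` is not a LangChain function, the answers are copied from the outputs above, and the expected keywords are my own assumptions.

```python
def keyword_check(answers: dict[str, str], expected: dict[str, list[str]]) -> dict[str, bool]:
    """True for a question if every expected keyword appears (case-insensitively) in its answer."""
    return {
        q: all(kw.lower() in answers.get(q, "").lower() for kw in kws)
        for q, kws in expected.items()
    }


# Answers recorded in the run above
answers = {
    "What job did Gregor have before his transformation?": "junior salesman",
    "What happens to Gregor Samsa in The Metamorphosis?": "he finds himself transformed in his bed into a horrible vermin",
}
# Hypothetical expectations; a question with no recorded answer counts as a failure
expected = {
    "What job did Gregor have before his transformation?": ["salesman"],
    "What happens to Gregor Samsa in The Metamorphosis?": ["vermin"],
    "Who is Grete?": ["sister"],
}
print(keyword_check(answers, expected))
```

In the notebook you could fill `answers` with `{q: qa_chain.invoke({"query": q})["result"] for q in questions}` and rerun the check after each configuration change.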


In [16]:
# Step 6: Ask Your Own Questions (Interactive Loop)
# Reuses `retriever` and `llm` from Step 5, but builds its own prompt instead of using qa_chain.
print("📖 Ask questions about the book. Type 'exit' to stop.\n")

while True:
    user_question = input("❓ Your question: ").strip()
    if user_question.lower() in ["exit", "quit"]:
        print("👋 Exiting. Thanks for reading with KafkaBot!")
        break
    if not user_question:
        continue  # ignore empty input

    # Get relevant chunks (.invoke() replaces the deprecated get_relevant_documents())
    docs = retriever.invoke(user_question)
    combined_context = "\n\n".join(doc.page_content for doc in docs)

    # Build prompt
    prompt = f"""Use the context below to answer the question. If unsure, say so.
Context:
{combined_context}

Question: {user_question}
Answer:"""

    # Get answer
    answer = llm.invoke(prompt)
    print(f"🤖 Answer: {answer}\n")

📖 Ask questions about the book. Type 'exit' to stop.

❓ Your question: What happens to Gregor Samsa in The Metamorphosis?
🤖 Answer: transformed into a horrible vermin

❓ Your question: What job did Gregor have before his transformation?
🤖 Answer: junior salesman

❓ Your question: What does Gregor try to do when he wakes up transformed?
🤖 Answer: (iii)

❓ Your question: How does Gregor’s sister initially react to his condition?
🤖 Answer: tried as far as possible to pretend there was nothing burdensome about it

❓ Your question: What food does Gregor like after the transformation?
🤖 Answer: a dry roll and some bread spread with butter and salt

❓ Your question: What is Gregor's room like?
🤖 Answer: a cave

❓ Your question: How does Gregor die?
🤖 Answer: moving about in that way left him sad and tired to death

❓ Your question: What does Gregor’s boss think of him?
🤖 Answer: he isn’t well

❓ Your question: exit
👋 Exiting. Thanks for reading with KafkaBot!
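The interactive loop mixes input handling, prompt building, and model calls in one block, which makes it hard to test. A minimal sketch of pulling the pure parts out into functions follows; `is_exit`, `build_prompt`, and `EXIT_COMMANDS` are hypothetical names, and the template is copied from Step 6.

```python
EXIT_COMMANDS = {"exit", "quit"}

PROMPT_TEMPLATE = """Use the context below to answer the question. If unsure, say so.
Context:
{context}

Question: {question}
Answer:"""


def is_exit(user_input: str) -> bool:
    """True if the user typed an exit command (ignoring case and surrounding spaces)."""
    return user_input.strip().lower() in EXIT_COMMANDS


def build_prompt(question: str, contexts: list[str]) -> str:
    """Fill the Step 6 template with the retrieved chunk texts and the question."""
    return PROMPT_TEMPLATE.format(context="\n\n".join(contexts), question=question.strip())


print(build_prompt("Who is Grete? ", ["chunk one", "chunk two"]))
```

Inside the loop, the model-dependent part then shrinks to `docs = retriever.invoke(q)` followed by `llm.invoke(build_prompt(q, [d.page_content for d in docs]))`, while the template and exit handling can be checked without loading `flan-t5-large`.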
