In [3]:
!git clone https://github.com/kartickkt/dl-teaching-agent.git
%cd dl-teaching-agent
!pip install -r requirements.txt


Cloning into 'dl-teaching-agent'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 20 (delta 1), reused 7 (delta 1), pack-reused 13 (from 1)
Receiving objects: 100% (20/20), 41.91 MiB | 21.14 MiB/s, done.
Resolving deltas: 100% (1/1), done.
/content/dl-teaching-agent/dl-teaching-agent


In [6]:
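from huggingface_hub import login
from transformers import AutoModelForCausalLM

login()  # Then enter your token

# NOTE: "mistralai/Mistral-7B-Instruct-v0" is not a complete repo id (the version
# suffix is missing, e.g. "-v0.3"), so this call fails with the OSError shown below.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0",
    token=True,  # reuse the saved login token; or token="your_token_here"
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,
)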





OSError: mistralai/Mistral-7B-Instruct-v0 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `hf auth login` or by passing `token=<your_token>`
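
The message above suggests passing `token=<your_token>`. A minimal sketch of one way to do that without pasting the secret into a cell, assuming the token has been exported in an environment variable beforehand. `HF_TOKEN` is the variable name that `huggingface_hub` itself reads; `resolve_hf_token` is a hypothetical helper, not part of any library.

```python
import os
from typing import Optional


def resolve_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `env_var`, or None if it is unset or blank.

    Hypothetical helper: it only reads the environment and never prints
    or stores the secret anywhere else.
    """
    value = os.environ.get(env_var, "").strip()
    return value or None


# Usage: look the token up once and pass it on as `token=...`.
token = resolve_hf_token()
print("token found" if token else "no token in environment")
```

The returned value can be passed as the `token` argument of `from_pretrained`; when it is `None`, the library falls back to whatever credentials were saved at login.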

In [9]:
!hf auth login



    A token is already saved on your machine. Run `hf auth whoami` to get more information or `hf auth logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
The token `col

In [12]:
!pip install pypdf sentence-transformers faiss-cpu transformers bitsandbytes --quiet

import os
import faiss
import pickle
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
from typing import List
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


class PDFTeachingAgent:
    def __init__(self,
                 embed_model_name="all-MiniLM-L6-v2",
                 index_path="faiss_index.bin",
                 chunks_path="chunks.pkl"):
        self.embed_model_name = embed_model_name
        self.index_path = index_path
        self.chunks_path = chunks_path
        self.embed_model = SentenceTransformer(embed_model_name)
        self.faiss_index = None
        self.chunks = None
        self.model = None
        self.tokenizer = None

    def extract_text_from_pdf(self, pdf_path: str) -> str:
        reader = PdfReader(pdf_path)
        text = ""
        for page in reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
        return text

    def chunk_text(self, text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
        words = text.split()
        chunks = []
        start = 0
        while start < len(words):
            end = min(start + chunk_size, len(words))
            chunk = " ".join(words[start:end])
            chunks.append(chunk)
            start += chunk_size - overlap
        return chunks

    def build_faiss_index(self, chunks: List[str], save=True):
        embeddings = self.embed_model.encode(chunks, batch_size=32, show_progress_bar=True)
        dimension = embeddings.shape[1]
        index = faiss.IndexFlatL2(dimension)
        index.add(embeddings)
        self.faiss_index = index
        self.chunks = chunks
        if save:
            self.save_index()

    def save_index(self):
        faiss.write_index(self.faiss_index, self.index_path)
        with open(self.chunks_path, "wb") as f:
            pickle.dump(self.chunks, f)

    def load_index(self):
        self.faiss_index = faiss.read_index(self.index_path)
        with open(self.chunks_path, "rb") as f:
            self.chunks = pickle.load(f)

    def load_llm(self, model_name="mistralai/Mistral-7B-Instruct-v0.3", token=None):
        print(f"Loading model {model_name} in 4-bit quantized mode...")
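        # `token` replaces the deprecated `use_auth_token` argument.
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            # Deprecated flag: newer transformers releases expect
            # quantization_config=BitsAndBytesConfig(load_in_4bit=True) instead.
            load_in_4bit=True,
            device_map="auto",
            trust_remote_code=True,
            token=token,
        )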
        self.model.eval()
        print("Model loaded and ready.")


    def query(self, query_text: str, k=3, max_new_tokens=256):
        if self.faiss_index is None or self.chunks is None:
            raise ValueError("FAISS index or chunks not loaded. Build or load index first.")
        if self.model is None or self.tokenizer is None:
            raise ValueError("LLM model/tokenizer not loaded. Call load_llm() first.")

        query_embedding = self.embed_model.encode(query_text)
        D, I = self.faiss_index.search(np.array([query_embedding]), k=k)
        retrieved_chunks = [self.chunks[i] for i in I[0]]

        context = "\n\n".join(retrieved_chunks)
        prompt = f"Use the following context to answer the question.\n\nContext:\n{context}\n\nQuestion: {query_text}\nAnswer:"

        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        outputs = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
        answer = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return answer


# ========== Example usage ==========

agent = PDFTeachingAgent()

# 1. Try to load saved FAISS index, otherwise build it
if os.path.exists(agent.index_path) and os.path.exists(agent.chunks_path):
    print("🔹 Loading FAISS index from disk...")
    agent.load_index()
else:
    print("🔹 Building FAISS index from scratch...")
    pdf_path = "/content/dl-teaching-agent/data/d2l.pdf"
    pdf_text = agent.extract_text_from_pdf(pdf_path)
    chunks = agent.chunk_text(pdf_text)
    agent.build_faiss_index(chunks)  # this also saves automatically

# 2. Load the quantized LLM
agent.load_llm(
    model_name="mistralai/Mistral-7B-Instruct-v0.3",
    token=os.environ.get("HF_TOKEN"),  # read the token from the environment; never hard-code secrets
)

# 3. Query your agent
query = "Summarize the main concepts in chapter 2."
answer = agent.query(query)
print("\n--- ANSWER ---\n", answer)
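
The `chunk_text` method in the cell above slides a fixed-size window over the words with a stride of `chunk_size - overlap`. Below is a minimal standalone sketch of the same idea with two guards that are not in the notebook: it rejects an `overlap` that is not smaller than `chunk_size` (which would stop the window from advancing), and it stops once a chunk reaches the last word, so no trailing chunk is emitted that only repeats the tail of the previous one. The name `chunk_words` is hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split `text` into word chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. Hypothetical variant of the
    notebook's `chunk_text`, with argument validation added.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each iteration
    chunks: List[str] = []
    for start in range(0, len(words), step):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break  # the last word is covered; avoid a redundant tail chunk
    return chunks


# Ten "words" (0..9), windows of 4 words sharing 1 word with the previous one.
demo = chunk_words(" ".join(str(i) for i in range(10)), chunk_size=4, overlap=1)
print(demo)  # → ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Word-based windows are only a rough proxy for token counts, so a chunk of 500 words can exceed the embedding model's input limit and be truncated during encoding.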



Batches:   0%|          | 0/23 [00:00<?, ?it/s]

Loading model mistralai/Mistral-7B-Instruct-v0.3 in 4-bit quantized mode...



The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Model loaded and ready.
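
The answer printed by this cell starts with the prompt itself. For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, and `query` decodes the whole sequence. A minimal sketch of the usual remedy, dropping the first `prompt_length` tokens before decoding, is shown here on plain Python lists so that it runs without a model. The helper name `strip_prompt_tokens` and the toy token ids are hypothetical.

```python
from typing import List, Sequence


def strip_prompt_tokens(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the tokens that follow the first `prompt_length` tokens."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids standing in for real tokenizer output (hypothetical values).
prompt_ids = [101, 7, 8, 9]
generated = prompt_ids + [42, 43, 2]  # prompt followed by new tokens
new_tokens = strip_prompt_tokens(generated, len(prompt_ids))
print(new_tokens)  # → [42, 43, 2]
```

In the notebook's `query` method the equivalent slice would be `outputs[0][inputs["input_ids"].shape[1]:]`, passed to `self.tokenizer.decode(..., skip_special_tokens=True)`, so that only the model's continuation is returned.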

--- ANSWER ---
 Use the following context to answer the question.

Context:
Whileoursystemisnotperfect,thesechoicesstrikeacompromise amongthecompetingconcerns. Webelievethat DiveintoDeepLearning mightbethefirst book published using such an integrated workflow. Learning by Doing Many textbooks present concepts in succession, covering each in exhaustive detail. For example, the excellent textbook of Bishop (2006), teaches each topic so thoroughly that getting to the chapter on linear regression requires a nontrivial amount of work. While expertslovethisbookpreciselyforitsthoroughness, fortruebeginners, thispropertylimits its usefulness as an introductory text. In this book, we teach most conceptsjust in time. In other words, you will learn concepts at the very moment that they are needed to accomplish some practical end. While we take some time at the outset to teach fundamental preliminaries, like linear algebra and probability,wewantyoutotastethesatisfactionoft