# Extracting PDF structure

In [11]:
from pypdf import PdfReader

# Experiment 1: split Page → Paragraph (the most basic approach)
def extract_paragraphs(pdf_path: str):
    reader = PdfReader(pdf_path)
    results = []

    for page_idx, page in enumerate(reader.pages):
        text = page.extract_text()
        if not text:
            continue

        paragraphs = [
            p.strip()
            for p in text.split("\n")
            if len(p.strip()) > 5
        ]

        for p in paragraphs:
            results.append({
                "doc_id": pdf_path,
                "page": page_idx + 1,
                "block_type": "paragraph",
                "section": None,
                "text": p
            })

    return results
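
`extract_paragraphs` treats every extracted line as its own "paragraph", so a sentence that wraps across three lines becomes three rows. The sketch below shows one possible alternative. It assumes the extracted text keeps a blank line between paragraphs, which pypdf does not guarantee for every PDF. `group_lines_into_paragraphs` is a hypothetical helper written for this illustration, not part of pypdf.

```python
import re

def group_lines_into_paragraphs(text: str, min_chars: int = 5) -> list[str]:
    """Split text on blank lines and rejoin wrapped lines inside each paragraph."""
    paragraphs = []
    # One or more blank (whitespace-only) lines separate paragraphs.
    for chunk in re.split(r"\n\s*\n", text):
        # Collapse the line breaks inside a paragraph into single spaces.
        joined = " ".join(
            line.strip() for line in chunk.split("\n") if line.strip()
        )
        # Same length filter as extract_paragraphs: drop very short fragments.
        if len(joined) > min_chars:
            paragraphs.append(joined)
    return paragraphs

sample = "First paragraph that wraps\nonto a second line.\n\nSecond paragraph here."
print(group_lines_into_paragraphs(sample))
```

Thai does not separate words with spaces, so for Thai text joining wrapped lines with an empty string instead of `" "` may give cleaner results.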


In [12]:
import re
from pypdf import PdfReader

def extract_with_sections(pdf_path: str):
    reader = PdfReader(pdf_path)
    results = []
    current_section = "Unknown"

    for page_idx, page in enumerate(reader.pages):
        text = page.extract_text()
        if not text:
            continue

        lines = text.split("\n")

        for line in lines:
            clean = line.strip()

            # heuristic: a heading is all uppercase or starts with a number and a dot
            if re.match(r"^[0-9]+\.", clean) or clean.isupper():
                current_section = clean
                results.append({
                    "doc_id": pdf_path,
                    "page": page_idx + 1,
                    "block_type": "heading",
                    "section": current_section,
                    "text": clean
                })
                continue

            if len(clean) > 5:
                results.append({
                    "doc_id": pdf_path,
                    "page": page_idx + 1,
                    "block_type": "paragraph",
                    "section": current_section,
                    "text": clean
                })

    return results
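
The heading test in `extract_with_sections` behaves differently on Thai and Latin text, which may explain why the first rows of the DataFrame preview in this notebook all carry the section `Unknown`. The sketch below isolates the same two conditions in a hypothetical helper, `looks_like_heading`, and runs it on a few sample lines. The samples are illustrative, not taken from the PDF.

```python
import re

NUMBERED = re.compile(r"^[0-9]+\.")

def looks_like_heading(line: str) -> bool:
    """Same test as extract_with_sections: leading 'N.' or an all-uppercase line."""
    clean = line.strip()
    return bool(NUMBERED.match(clean)) or clean.isupper()

samples = [
    "1. ข้อมูลทั่วไป",           # numbered Thai heading
    "GENERAL INFORMATION",      # uppercase Latin heading
    "หลักสูตรวิทยาศาสตรบัณฑิต",  # Thai title without a number
    "3.5 หน่วยกิต",              # body text that starts with a decimal
]

for s in samples:
    print(looks_like_heading(s), s)
```

Thai has no upper or lower case, so `str.isupper()` returns `False` for a Thai-only line and unnumbered Thai titles are never detected. The pattern `^[0-9]+\.` also accepts any line that starts with a decimal such as `3.5`, so some body lines are promoted to headings.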


In [13]:
import pdfplumber

def extract_tables(pdf_path: str):
    results = []

    with pdfplumber.open(pdf_path) as pdf:
        for page_idx, page in enumerate(pdf.pages):
            tables = page.extract_tables() or []

            for t_idx, table in enumerate(tables):
                flat_rows = []

                for row in table:
                    if not row:
                        continue

                   
                    clean_cells = [
                        str(cell).strip()
                        for cell in row
                        if cell is not None and str(cell).strip() != ""
                    ]

                    if clean_cells:
                        flat_rows.append(" | ".join(clean_cells))

                if flat_rows:
                    results.append({
                        "doc_id": pdf_path,
                        "page": page_idx + 1,
                        "block_type": "table",
                        "section": None,
                        "text": " ; ".join(flat_rows)
                    })

    return results
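
To see what the flattening in `extract_tables` produces without opening a PDF, the sketch below applies the same cell and row filtering to an in-memory table. `flatten_table` is a hypothetical helper for this illustration and the table contents are made up.

```python
def flatten_table(table):
    """Join non-empty cells with ' | ' and non-empty rows with ' ; '."""
    flat_rows = []
    for row in table:
        if not row:
            continue
        # Drop None and blank cells, exactly as extract_tables does.
        cells = [
            str(cell).strip()
            for cell in row
            if cell is not None and str(cell).strip() != ""
        ]
        if cells:
            flat_rows.append(" | ".join(cells))
    return " ; ".join(flat_rows)

table = [
    ["Code", "Course", "Credits"],
    ["DS101", None, "3"],   # the course name is missing
    [None, "", None],       # a fully empty row is dropped
]
print(flatten_table(table))  # Code | Course | Credits ; DS101 | 3
```

Because empty cells are dropped rather than kept as placeholders, the `3` in the second row moves into the second position and no longer lines up with the `Credits` header. Keeping an empty string for missing cells would preserve column alignment at the cost of noisier text.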


In [14]:
import pandas as pd

pdf_path = "dsi2566.pdf"

paragraphs = extract_with_sections(pdf_path)
tables = extract_tables(pdf_path)

structured = paragraphs + tables
df_str = pd.DataFrame(structured)

print(df_str.head())


        doc_id  page block_type  section                                  text
0  dsi2566.pdf     1  paragraph  Unknown              หลักสูตรวิทยาศาสตรบัณฑิต
1  dsi2566.pdf     1  paragraph  Unknown  สาขาวิชาวิทยาศาสตร์และนวัตกรรมข้อมูล
2  dsi2566.pdf     1  paragraph  Unknown          (หลักสูตรปรับปรุง พ.ศ. 2566)
3  dsi2566.pdf     1  paragraph  Unknown                   หลักสูตรพหุวิทยาการ
4  dsi2566.pdf     1  paragraph  Unknown            หลักสูตรความร่วมมือระหว่าง


In [15]:
df_str

Unnamed: 0,doc_id,page,block_type,section,text
0,dsi2566.pdf,1,paragraph,Unknown,หลักสูตรวิทยาศาสตรบัณฑิต
1,dsi2566.pdf,1,paragraph,Unknown,สาขาวิชาวิทยาศาสตร์และนวัตกรรมข้อมูล
2,dsi2566.pdf,1,paragraph,Unknown,(หลักสูตรปรับปรุง พ.ศ. 2566)
3,dsi2566.pdf,1,paragraph,Unknown,หลักสูตรพหุวิทยาการ
4,dsi2566.pdf,1,paragraph,Unknown,หลักสูตรความร่วมมือระหว่าง
...,...,...,...,...,...
2748,dsi2566.pdf,73,table,,หมวดวิชา | รหัสวิชา | รายวิชา | ผลลัพธ์การเรีย...
2749,dsi2566.pdf,74,table,,หมวดวิชา | รหัสวิชา | รายวิชา | ผลลัพธ์การเรีย...
2750,dsi2566.pdf,75,table,,หมวดวิชา | รหัสวิชา | รายวิชา | ผลลัพธ์การเรีย...
2751,dsi2566.pdf,76,table,,หมวดวิชา | รหัสวิชา | รายวิชา | ผลลัพธ์การเรีย...


In [50]:
# Save as CSV
df_str.to_csv("dsi2566_structured.csv", index=False, encoding="utf-8-sig")
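
The `encoding="utf-8-sig"` argument prefixes the file with a UTF-8 byte-order mark (BOM), which spreadsheet tools commonly rely on to display Thai text correctly. The sketch below demonstrates the effect with the standard-library `csv` module instead of pandas so that it runs without extra dependencies. The file name `demo.csv` and the single sample row are placeholders.

```python
import csv
import os
import tempfile

rows = [["doc_id", "text"], ["dsi2566.pdf", "หลักสูตรวิทยาศาสตรบัณฑิต"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")

    # Writing with utf-8-sig puts the three BOM bytes at the start of the file.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)

    with open(path, "rb") as f:
        has_bom = f.read(3) == b"\xef\xbb\xbf"

    # Reading with utf-8-sig strips the BOM again, so the data round-trips.
    with open(path, newline="", encoding="utf-8-sig") as f:
        round_trip = list(csv.reader(f))

print(has_bom, round_trip == rows)
```

When such a file is read back with plain `utf-8`, the BOM stays attached to the first header name, so reading with `utf-8-sig` as well is the safer choice.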

# Experimenting with DSPy + LLM

In [None]:
# ============================================================
# Curriculum STRUCTURE-AWARE QA Agent
# DSPy + Ollama (FINAL + PROGRESS BAR)
# ============================================================

import json
import requests
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm
import dspy

# ============================================================
# 1) CONFIG
# ============================================================

EMBEDDING_API = "http://localhost:11434/api/embeddings"
EMBED_MODEL = "bge-m3:567m"

LLM_URL = "http://localhost:11434"
MODEL_NAME = "gpt-oss:20b"

TOP_K = 6

# ============================================================
# 2) EMBEDDING (WITH PROGRESS BAR)
# ============================================================

def embed_texts(texts: list[str]) -> np.ndarray:
    vectors = []

    for t in tqdm(
        texts,
        desc="Creating embeddings",
        unit="chunk"
    ):
        r = requests.post(
            EMBEDDING_API,
            json={
                "model": EMBED_MODEL,
                "prompt": t
            },
            timeout=120
        )
        r.raise_for_status()
        vectors.append(r.json()["embedding"])

    return np.array(vectors)

# ============================================================
# 3) BUILD CORPUS FROM DF (WITH PROGRESS BAR)
# ============================================================

def build_corpus(df: pd.DataFrame):
    texts, meta = [], []

    for _, r in tqdm(
        df.iterrows(),
        total=len(df),
        desc="Building retrieval corpus",
        unit="block"
    ):
        blob = f"""
[SECTION] {r['section']}
[TYPE] {r['block_type']}
[TEXT]
{r['text']}
""".strip()

        texts.append(blob)
        meta.append({
            "section": r["section"],
            "page": r["page"],
            "block_type": r["block_type"]
        })

    return texts, meta

# ============================================================
# 4) RETRIEVER
# ============================================================

class Retriever:
    def __init__(self, texts, meta, embeddings):
        self.texts = texts
        self.meta = meta
        self.embeddings = embeddings

    def search(self, query: str, k: int = TOP_K):
        q_emb = embed_texts([query])
        sims = cosine_similarity(q_emb, self.embeddings)[0]
        idx = sims.argsort()[-k:][::-1]

        return [
            {
                "text": self.texts[i],
                "meta": self.meta[i],
                "score": float(sims[i])
            }
            for i in idx
        ]

# ============================================================
# 5) PASS 1: STRUCTURE EXTRACTION AGENT
# ============================================================

class CurriculumStructureSignature(dspy.Signature):
    """
    Extract curriculum structure as JSON.
    JSON ONLY. No explanation.
    """
    context = dspy.InputField()
    structure = dspy.OutputField()


class CurriculumStructureAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(CurriculumStructureSignature)

    def forward(self, context: str):
        return self.predict(context=context)

# ============================================================
# 6) BUILD GLOBAL CONTEXT (WITH PROGRESS BAR)
# ============================================================

def build_global_context(df: pd.DataFrame, max_chars=12000):
    chunks = []

    for _, r in tqdm(
        df.iterrows(),
        total=len(df),
        desc="Building global curriculum context",
        unit="block"
    ):
        chunks.append(
            f"[{r['block_type']} | {r['section']}]\n{r['text']}"
        )

    return "\n\n".join(chunks)[:max_chars]

# ============================================================
# 7) PASS 2: QA SIGNATURE
# ============================================================

# The docstring below is the Thai-language prompt sent to the model. In short:
# the model acts as a "Digital Librarian Agent" that searches, filters, cites,
# and summarizes the supplied documents; it must answer only from CONTEXT,
# must not use outside knowledge or guess, must reply "ไม่พบข้อมูลในเอกสาร"
# ("not found in the document") when nothing relevant is available, and must
# answer in Thai in a structured, neutral, precise style.
class CurriculumQASignature(dspy.Signature):
    """
   คุณคือ “บรรณารักษ์ดิจิทัล (Digital Librarian Agent)”
หน้าที่ของคุณคือ:
- ค้นหา
- คัดกรอง
- อ้างอิง
- และสรุปข้อมูล
จากเอกสารที่ให้มาอย่างเป็นระบบและตรวจสอบได้

กฎการตอบ:
1. ตอบคำถามโดยใช้ข้อมูลจาก CONTEXT ที่ให้มาเท่านั้น
2. ห้ามใช้ความรู้ภายนอก หรือคาดเดาเพิ่มเติม
3. หากไม่พบข้อมูลที่เกี่ยวข้องในเอกสาร ให้ตอบว่า:
   "ไม่พบข้อมูลในเอกสาร"
4. ตอบเป็นภาษาไทย
5. จัดคำตอบอย่างเป็นระบบ ชัดเจน และเป็นกลางในเชิงวิชาการ
6. ใช้ภาษาที่สุภาพ แม่นยำ และมีลักษณะเชิงสารสนเทศแบบบรรณารักษ์

รูปแบบคำตอบที่แนะนำ:
- หัวข้อ / ประเด็นหลัก
- รายละเอียดจากเอกสาร
- (ถ้ามี) การจัดกลุ่มหรือเรียงลำดับตามโครงสร้างเอกสาร
    """
    question = dspy.InputField()
    context = dspy.InputField()
    answer = dspy.OutputField()

# ============================================================
# 8) STRUCTURE-AWARE QA AGENT (ReAct)
# ============================================================

class CurriculumQAAgent(dspy.Module):
    def __init__(self, retriever, curriculum_structure: str):
        super().__init__()
        self.retriever = retriever
        self.curriculum_structure = curriculum_structure

        self.react = dspy.ReAct(
            CurriculumQASignature,
            tools=[]
        )

    def forward(self, question: str):
        docs = self.retriever.search(question)

        retrieved = "\n\n".join(
            f"[{d['meta']['section']} | page {d['meta']['page']}]\n{d['text']}"
            for d in docs
        )

        context = f"""
### CURRICULUM STRUCTURE (GROUND TRUTH)
{self.curriculum_structure}

### RELEVANT DOCUMENTS
{retrieved}
""".strip()

        return self.react(
            question=question,
            context=context
        )

# ============================================================
# 9) MAIN PIPELINE (WITH PROGRESS BAR)
# ============================================================

def main(df: pd.DataFrame):
    # -----------------------------
    # Init LLM
    # -----------------------------
    print("Initializing LLM...")
    lm = dspy.LM(
        f"ollama/{MODEL_NAME}",
        api_base=LLM_URL,
        cache=False
    )
    dspy.configure(lm=lm)

    # -----------------------------
    # PASS 1: Extract Structure
    # -----------------------------
    print("\nExtracting curriculum structure...")
    global_context = build_global_context(df)

    structure_agent = CurriculumStructureAgent()
    with tqdm(total=1, desc="Curriculum structure reasoning") as pbar:
        structure_result = structure_agent(context=global_context)
        pbar.update(1)

    curriculum_structure = structure_result.structure

    print("\nCURRICULUM STRUCTURE (JSON):")
    print(curriculum_structure)

    # -----------------------------
    # Build Retriever
    # -----------------------------
    print("\nBuilding retrieval index...")
    corpus_texts, corpus_meta = build_corpus(df)
    embeddings = embed_texts(corpus_texts)

    retriever = Retriever(
        corpus_texts,
        corpus_meta,
        embeddings
    )

    # -----------------------------
    # PASS 2: QA Agent
    # -----------------------------
    agent = CurriculumQAAgent(
        retriever,
        curriculum_structure
    )

    # Thai UI strings: "... is ready" / "Type a question about the curriculum (exit to quit)"
    print("\nStructure-aware Curriculum QA Agent พร้อมใช้งาน")
    print("พิมพ์คำถามเกี่ยวกับหลักสูตร (exit เพื่อออก)\n")

    while True:
        q = input("คำถาม: ").strip()  # "Question: "
        if q.lower() in ["exit", "quit"]:
            break

        result = agent(question=q)
        print("\nคำตอบ:")  # "Answer:"
        print(result.answer)
        print("-" * 60)

# ============================================================
# 10) ENTRY
# ============================================================

if __name__ == "__main__":
    # df must contain these columns:
    # - text
    # - section
    # - page
    # - block_type

    df = df_str   # put your DataFrame here
    main(df)
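
`build_global_context` joins all blocks and then slices the result at `max_chars`, so the final block can be cut in the middle of a sentence or table row. With 2,753 blocks and a 12,000-character budget, only the beginning of the document is likely to reach the structure-extraction pass. The sketch below shows one way to stop at a block boundary instead. `join_blocks_within_budget` is a hypothetical helper and the sample blocks are made up.

```python
def join_blocks_within_budget(blocks, max_chars=12000, sep="\n\n"):
    """Join whole blocks in order, stopping before the budget would be exceeded."""
    kept = []
    used = 0
    for block in blocks:
        # The separator only costs characters once there is a previous block.
        cost = len(block) + (len(sep) if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(block)
        used += cost
    return sep.join(kept)

blocks = ["aaaa", "bbbb", "cccc"]

# Plain slicing keeps a fragment of the third block.
print(repr("\n\n".join(blocks)[:13]))

# Budgeted joining keeps only complete blocks.
print(repr(join_blocks_within_budget(blocks, max_chars=13)))
```

This keeps the context well-formed but does not address coverage. Sampling headings from the whole document, rather than taking the first blocks, would be a separate design choice.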


Initializing LLM...

Extracting curriculum structure...

📐 Building global curriculum context: 100%|██████████| 2753/2753 [00:00<00:00, 27631.81block/s]
Curriculum structure reasoning: 100%|██████████| 1/1 [00:58<00:00, 58.69s/it]



CURRICULUM STRUCTURE (JSON):
{
  "title": "หลักสูตรวิทยาศาสตรบัณฑิต",
  "institution": "มหาวิทยาลัยธรรมศาสตร์",
  "faculty": "วิทยาลัยสหวิทยาการ",
  "sections": {
    "1": {
      "title": "ข้อมูลทั่วไป",
      "subsections": {
        "1.1": {
          "title": "รหัสและชื่อหลักสูตร",
          "content": [
            {
              "รหัสหลักสูตร": "20182067117526"
            },
            {
              "ชื่อหลักสูตร (ภาษาไทย)": "หลักสูตรวิทยาศาสตรบัณฑิต สาขาวิชาวิทยาศาสตร์และนวัตกรรมข้อมูล"
            },
            {
              "ชื่อหลักสูตร (ภาษาอังกฤษ)": "Bachelor

Building retrieval corpus: 100%|██████████| 2753/2753 [00:00<00:00, 21800.87block/s]
Creating embeddings: 100%|██████████| 2753/2753 [16:52<00:00,  2.72chunk/s]


Structure-aware Curriculum QA Agent พร้อมใช้งาน
พิมพ์คำถามเกี่ยวกับหลักสูตร (exit เพื่อออก)


คำถาม:  หลักสูตรนี้ชื่ออะไร และเปิดสอนโดยหน่วยงานใด


Creating embeddings: 100%|██████████| 1/1 [00:00<00:00,  5.65chunk/s]



คำตอบ:
**ชื่อหลักสูตร**  
หลักสูตรวิทยาศาสตรบัณฑิต สาขาวิชาวิทยาศาสตร์และนวัตกรรมข้อมูล  

**หน่วยงานที่เปิดสอน**  
มหาวิทยาลัยธรรมศาสตร์ – วิทยาลัยสหวิทยาการ
------------------------------------------------------------


คำถาม:  ปี 1 เรียนอะไรบ้าง


Creating embeddings: 100%|██████████| 1/1 [00:00<00:00,  6.40chunk/s]



คำตอบ:
**ปี 1 – รายวิชาที่เรียน**

| ภาคเรียน | รหัส | ชื่อวิชา | หน่วยกิต |
|-----------|------|----------|----------|
| **ภาคเรียนที่ 1** | มธ.100 | พลเมืองกับการลงมือแก้ปัญหา | 3 |
| | มธ.106 | ความคิดสร้างสรรค์และการสื่อสาร | 3 |
| | มธ.107 | ทักษะดิจิทัลกับการแก้ปัญหา | 3 |
| | มธ.108 | การพัฒนาและจัดการตนเอง | 3 |
| | สก.200 | วิทยาศาสตร์ข้อมูลสำหรับชีวิตประจำวัน | 3 |
| | วสห.104 | การเขียนโปรแกรมเพื่อวิเคราะห์ข้อมูล | 3 |
| **‡∏†‡∏≤‡∏Ñ‡πÄ‡∏£‡∏µ‡∏¢‡∏ô‡∏ó‡∏µ‡πà‚ÄØ2** | ‡∏™‡∏©.105 | ‡∏