<a href="https://colab.research.google.com/github/secret-coder-pro/jaip/blob/main/Untitled0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [50]:
%pip install llama-index
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-huggingface



In [51]:
!mkdir -p DATA


In [52]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./DATA").load_data()
print(f"Loaded {len(documents)} questions")

Failed to load file /content/DATA/pyq_questions.csv with error: 'utf-8' codec can't decode byte 0xb9 in position 396: invalid start byte. Skipping...
Loaded 0 questions
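The decode error above suggests the CSV was saved in a single-byte encoding rather than UTF-8. A minimal sketch, using only the byte value reported in the error message, of why `0xb9` is rejected by UTF-8 but accepted by Latin-1:

```python
# The byte reported in the error message above.
raw = b"\xb9"

# 0xb9 has the bit pattern of a UTF-8 continuation byte,
# so it cannot appear as the first byte of a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps every possible byte to a code point, so decoding never raises.
latin1_char = raw.decode("latin-1")

print(utf8_ok)
print(repr(latin1_char))
```

Because Latin-1 never raises, it can also silently produce wrong characters if the file was actually written in a different encoding, so it is worth spot-checking a few decoded rows.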


In [66]:
import csv
from llama_index.core import Document

documents = []

# The file is not valid UTF-8 (see the error above), so read it as Latin-1.
with open("./DATA/pyq_questions.csv", newline="", encoding="latin-1") as f:
    reader = csv.DictReader(f)

    for row in reader:
        doc = Document(
            text=row["question"],   # ONLY the question text
            metadata={
                "exam": row["exam"],
                "year": row["year"],
                "subject": row["subject"],
                "topics": row["topics"],
            }
        )
        documents.append(doc)

In [67]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)


In [68]:
%pip install -U bitsandbytes
import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig # Import for quantization


quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",
    context_window=4096,
    max_new_tokens=128,
    generate_kwargs={"do_sample": False},  # greedy decoding; temperature only applies when sampling
    device_map="auto",
    # Pass the quantization config to model_kwargs
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": quantization_config,
    },
)




In [69]:
from llama_index.core import Settings

Settings.embed_model = embed_model
Settings.llm = llm
Settings.chunk_size = 256      # in tokens; questions shorter than this stay in one chunk
Settings.chunk_overlap = 0

# The service_context variable is no longer needed with Settings
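Because `chunk_size` is measured in tokens, a question longer than the limit would be split across more than one node. A rough sketch for spotting such questions, assuming whitespace-separated words as a crude stand-in for tokens (real tokenizers usually produce more tokens than words). `find_long_texts` and the sample strings are hypothetical, not part of llama-index or the dataset:

```python
def find_long_texts(texts, limit=256):
    """Return (index, word_count) pairs for texts longer than `limit` words."""
    long_ones = []
    for i, text in enumerate(texts):
        n_words = len(text.split())  # crude proxy for the token count
        if n_words > limit:
            long_ones.append((i, n_words))
    return long_ones

# Hypothetical sample data, not taken from the CSV.
sample = ["short question", "word " * 300]
print(find_long_texts(sample))
```

In this notebook the same check could be run as `find_long_texts([d.text for d in documents])`.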

In [70]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents
)

In [71]:
retriever = index.as_retriever(
    similarity_top_k=10   # number of questions to retrieve
)
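Conceptually, `similarity_top_k=10` asks for the ten stored vectors most similar to the query embedding. A minimal pure-Python sketch of that idea, assuming cosine similarity and toy 2-D vectors; `cosine` and `top_k` are illustrative helpers, not llama-index functions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, vectors, k):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: s[0], reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy embeddings standing in for embedded questions.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], vectors, k=2))
```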


In [72]:
def get_pyqs(topic: str):
    """Print the retrieved previous-year questions for a topic."""
    results = retriever.retrieve(topic)

    print(f"\nTopic: {topic}\n")
    for i, res in enumerate(results, 1):
        print(f"{i}. {res.node.text}\n")


In [73]:
def get_pyqs_question_year(topic: str):
    """Print the retrieved questions for a topic, each prefixed with its year."""
    results = retriever.retrieve(topic)

    for i, res in enumerate(results, 1):
        question_text = res.node.text
        year = res.node.metadata.get("year", "unknown")

        print(f"{i}. ({year}) {question_text}\n")
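Since each retrieved node carries the CSV metadata, results can also be narrowed after retrieval. A small sketch, assuming result objects expose `.node.metadata` as in the function above. `filter_by_year` is a hypothetical helper, and `SimpleNamespace` objects stand in for real retrieval results. Values read through `csv.DictReader` are strings, so the comparison is done on strings:

```python
from types import SimpleNamespace

def filter_by_year(results, year):
    """Keep only results whose 'year' metadata matches the given year."""
    wanted = str(year)
    return [r for r in results if str(r.node.metadata.get("year", "")) == wanted]

def fake(text, year):
    """Build a stand-in for a retrieval result (illustration only)."""
    return SimpleNamespace(node=SimpleNamespace(text=text, metadata={"year": year}))

results = [fake("Q1", "2024"), fake("Q2", "2020"), fake("Q3", "2024")]
kept = filter_by_year(results, 2024)
print([r.node.text for r in kept])
```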




In [75]:
get_pyqs_question_year("limit")


1. (2024) Let f(x) be a continuously differentiable function on the interval (0, ∞) such that f(1) = 2 and lim (t → x) [(t¹⁰ f(x) − x¹⁰ f(t)) / (t⁹ − x⁹)] = 1 for each x > 0. Then, for all x > 0, f(x) is equal to: (A) 31/(11x) − (9/11)x¹⁰ (B) 9/(11x) + (13/11)x¹⁰ (C) −9/(11x) + (31/11)x¹⁰ (D) 13/(11x) + (9/11)x¹⁰.

2. (2024) A student appears for a quiz consisting of only true-false type questions and answers all the questions. The student knows the answers of some questions and guesses the answers for the remaining questions. Whenever the student knows the answer of a question, he gives the correct answer. Assume that the probability of the student giving the correct answer for a question, given that he has guessed it, is 1/2. Also assume that the probability of the answer for a question being guessed, given that the student's answer is correct, is 1/6. Then the probability that the student knows the answer of a randomly chosen question is: (A) 1/12 (B) 1/7 (C) 5/7 (D) 5/12.

3. (2020