# MCQ Generation - Inference with a QLoRA Fine-tuned Model
**Load base model + LoRA adapters from Hugging Face and generate MCQs from custom text**

This notebook:
- Loads the Qwen2.5-3B-Instruct base model with 4-bit quantization
- Loads your fine-tuned LoRA adapters from the Hugging Face Hub
- Generates multiple-choice questions (MCQs) from your custom text
- Validates the output and returns it as structured JSON

## 1. Install Dependencies

In [1]:
%%capture
# Install required packages
!pip install -q transformers accelerate peft bitsandbytes torch pydantic

## 2. Load Base Model and LoRA Adapters

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Model names
BASE_MODEL = "unsloth/Qwen2.5-3B-Instruct"
LORA_ADAPTER = "mohamedashraff22/qwen2.5-3b-mcq-lora"  # Your HF adapter

print("Loading base model...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Configure 4-bit quantization for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model in 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

print("Loading LoRA adapters from HuggingFace...")

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER)

print("✅ Model loaded successfully!")
print(f"   Base Model: {BASE_MODEL}")
print(f"   LoRA Adapter: {LORA_ADAPTER}")

Loading base model...



Loading LoRA adapters from HuggingFace...



✅ Model loaded successfully!
   Base Model: unsloth/Qwen2.5-3B-Instruct
   LoRA Adapter: mohamedashraff22/qwen2.5-3b-mcq-lora
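The base checkpoint is downloaded as two shards of 3.97 GB and 2.20 GB. A quick calculation from those numbers shows why loading in 4-bit matters. The sketch makes three assumptions: the checkpoint stores 2-byte (bf16) weights, the sizes are decimal GB, and NF4 uses 4 bits per quantized weight. It ignores quantization constants, any layers kept in 16-bit, activations, and the KV cache, so read the NF4 figure as a rough lower bound, not a measurement.

```python
# Rough weight-memory estimate. Assumptions: bf16 checkpoint, decimal GB,
# NF4 = 4 bits per weight. Not a measurement of actual GPU usage.
CHECKPOINT_BYTES = 3.97e9 + 2.20e9   # two shard sizes, treated as decimal GB
BYTES_PER_BF16_PARAM = 2             # assumption: checkpoint stored in bf16
BYTES_PER_NF4_PARAM = 0.5            # 4 bits per weight after NF4 quantization


def estimate_weight_memory(checkpoint_bytes: float) -> dict:
    """Estimate parameter count and weight memory in bf16 vs NF4 (lower bound)."""
    num_params = checkpoint_bytes / BYTES_PER_BF16_PARAM
    return {
        "num_params": num_params,
        "bf16_gb": num_params * BYTES_PER_BF16_PARAM / 1e9,
        # Ignores quantization constants, unquantized layers, activations, KV cache
        "nf4_gb_lower_bound": num_params * BYTES_PER_NF4_PARAM / 1e9,
    }


est = estimate_weight_memory(CHECKPOINT_BYTES)
print(f"~{est['num_params'] / 1e9:.2f}B parameters")
print(f"bf16 weights: ~{est['bf16_gb']:.2f} GB")
print(f"NF4 weights:  at least ~{est['nf4_gb_lower_bound']:.2f} GB")
```

The result, roughly 3.1B parameters and about a quarter of the bf16 weight memory, is why a 3B model plus a LoRA adapter fits comfortably on a single Colab GPU.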


## 3. Define System Prompt and Helper Functions

In [3]:
import json
import re
from pydantic import BaseModel, Field
from typing import List

# System prompt used during training
SYSTEM_PROMPT = """You are an expert MCQ question generator. Given a text passage, generate multiple choice questions.

Output Format:
<questions>
[
  {
    "question": "Question text here?",
    "option_a": "First option",
    "option_b": "Second option",
    "option_c": "Third option",
    "option_d": "Fourth option",
    "correct_answer": "A"
  }
]
</questions>
"""

# Pydantic models for validation
class MCQQuestion(BaseModel):
    """Single Multiple Choice Question"""
    question: str = Field(description="The question text")
    option_a: str = Field(description="Option A")
    option_b: str = Field(description="Option B")
    option_c: str = Field(description="Option C")
    option_d: str = Field(description="Option D")
    correct_answer: str = Field(description="Correct answer: A, B, C, or D")

class MCQList(BaseModel):
    """List of MCQ Questions"""
    questions: List[MCQQuestion] = Field(description="List of generated questions")

# Helper functions
def extract_json_content(text: str) -> str:
    """Extract JSON from <questions> tags"""
    match = re.search(r'<questions>\s*(.+?)\s*</questions>', text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return ""

def parse_mcq_output(output_text: str) -> dict:
    """Parse and validate MCQ output"""
    json_content = extract_json_content(output_text)

    try:
        parsed = json.loads(json_content)
        mcq_list = MCQList(questions=[MCQQuestion(**q) for q in parsed])

        return {
            "success": True,
            "data": mcq_list.model_dump(),
            "num_questions": len(mcq_list.questions)
        }
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "raw_output": output_text[:500]
        }

print("✅ Helper functions defined!")

✅ Helper functions defined!
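`parse_mcq_output` has two weaknesses. It accepts any string as `correct_answer`, and when the `<questions>` block is missing it surfaces only a generic JSON error. Below is a minimal standard-library sketch of a stricter variant. It reports which check failed, normalizes the answer letter, and rejects duplicate options. The name `parse_mcq_output_strict` and the specific checks are illustrative assumptions, not part of the model's training format. The return shape mirrors `parse_mcq_output`.

```python
import json
import re

REQUIRED_KEYS = ("question", "option_a", "option_b", "option_c", "option_d", "correct_answer")
VALID_ANSWERS = {"A", "B", "C", "D"}


def parse_mcq_output_strict(output_text: str) -> dict:
    """Parse <questions>...</questions> JSON and check each item's fields and answer letter."""
    match = re.search(r"<questions>\s*(.+?)\s*</questions>", output_text, re.DOTALL)
    if not match:
        return {"success": False, "error": "no <questions>...</questions> block found"}
    try:
        items = json.loads(match.group(1))
    except json.JSONDecodeError as e:
        return {"success": False, "error": f"invalid JSON: {e}"}
    if not isinstance(items, list):
        return {"success": False, "error": "expected a JSON list of questions"}

    questions = []
    for i, item in enumerate(items, 1):
        if not isinstance(item, dict):
            return {"success": False, "error": f"question {i}: expected an object"}
        missing = [k for k in REQUIRED_KEYS if k not in item]
        if missing:
            return {"success": False, "error": f"question {i}: missing {missing}"}
        # Tolerate lowercase or padded letters such as "b" or " B "
        answer = str(item["correct_answer"]).strip().upper()
        if answer not in VALID_ANSWERS:
            return {"success": False, "error": f"question {i}: bad answer {item['correct_answer']!r}"}
        # REQUIRED_KEYS[1:5] are the four option fields
        options = [str(item[k]).strip() for k in REQUIRED_KEYS[1:5]]
        if len(set(options)) < 4:
            return {"success": False, "error": f"question {i}: duplicate options"}
        cleaned = {k: str(item[k]).strip() for k in REQUIRED_KEYS[:5]}
        cleaned["correct_answer"] = answer
        questions.append(cleaned)

    return {"success": True, "data": {"questions": questions}, "num_questions": len(questions)}


sample = ('<questions>[{"question": "Q?", "option_a": "1", "option_b": "2", '
          '"option_c": "3", "option_d": "4", "correct_answer": "b"}]</questions>')
result = parse_mcq_output_strict(sample)
print(result["success"], result["data"]["questions"][0]["correct_answer"])  # True B
```

This version stops at the first invalid item. Collecting all errors instead would make it easier to see how often each failure mode occurs across many generations.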


## 4. MCQ Generation Function

In [4]:
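from typing import Optional

def generate_mcqs(text: str, num_questions: Optional[int] = None, temperature: float = 0.7, max_new_tokens: int = 1024):
    """
    Generate MCQ questions from an input text passage

    Args:
        text: Input text passage
        num_questions: Optional number of questions to request (None lets the model decide)
        temperature: Sampling temperature; must be > 0 because sampling is enabled
                     (higher values give more varied output)
        max_new_tokens: Maximum number of tokens to generate

    Returns:
        Dictionary with a "success" flag, the parsed questions or an error message,
        and the raw model output
    """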
    # Build user message
    user_content = f"Generate MCQ questions from this text:\n\n{text}"
    if num_questions:
        user_content += f"\n\nGenerate exactly {num_questions} questions."

    # Format prompt with chat template
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate
    print("üîÑ Generating MCQs...")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens; slicing off the prompt avoids
    # cutting the response at a stray "assistant" inside the generated text
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

    # Parse and validate
    result = parse_mcq_output(response)
    result["raw_output"] = response

    return result

print("✅ Generation function ready!")

✅ Generation function ready!
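`tokenizer.apply_chat_template(..., add_generation_prompt=True)` renders the messages with the model's own chat template. For Qwen2.5 this is a ChatML-style layout built from `<|im_start|>` and `<|im_end|>` markers. The function below is a simplified, hand-written approximation for illustration only. The real template can add content of its own, so inference should always use the tokenizer's template.

The example shows two things. First, the prompt ends with an open assistant turn. Second, once `skip_special_tokens=True` strips the markers, searching the decoded text for the word "assistant" can cut into a generated question that happens to contain that word. The generated string is hypothetical, and the character slice stands in for the token-level slice the notebook would use, e.g. `outputs[0][inputs["input_ids"].shape[1]:]`.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Approximate ChatML rendering (illustrative only; not Qwen's exact template)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave an assistant turn open so the model continues as the assistant
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def strip_special(text: str) -> str:
    """Mimic skip_special_tokens=True by dropping the ChatML markers."""
    return text.replace("<|im_start|>", "").replace("<|im_end|>", "")


messages = [
    {"role": "system", "content": "You are an expert MCQ question generator."},
    {"role": "user", "content": "Generate MCQ questions from this text:\n\nVoice assistants use speech recognition."},
]
prompt = render_chatml(messages)

# Hypothetical model continuation that happens to contain the word "assistant"
generated = '<questions>[{"question": "What does a voice assistant use?"}]</questions>'
decoded = strip_special(prompt) + generated

naive = decoded.split("assistant")[-1].strip()          # cuts inside the generated question
sliced = decoded[len(strip_special(prompt)):].strip()   # character slice as a stand-in for token slicing

print(prompt)
print("naive: ", naive)
print("sliced:", sliced)
```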


## 5. Test with Your Custom Text

In [5]:
# YOUR CUSTOM TEXT - Replace this with your own text
test_text = """
Artificial Intelligence (AI) is revolutionizing the way we live and work.
Machine learning, a subset of AI, enables computers to learn from data without
being explicitly programmed. Deep learning, an advanced form of machine learning,
uses neural networks with multiple layers to process complex patterns. These
technologies are being applied in various fields including healthcare, finance,
autonomous vehicles, and natural language processing. AI systems can now perform
tasks that traditionally required human intelligence, such as visual perception,
speech recognition, and decision-making.
"""

# Generate MCQs
result = generate_mcqs(
    text=test_text,
    num_questions=4,  # Generate 4 questions
    temperature=0.7
)

# Display results
print("\n" + "="*80)
print("GENERATION RESULTS")
print("="*80)

if result["success"]:
    print(f"✅ Successfully generated {result['num_questions']} questions!\n")

    # Display questions in readable format
    for i, q in enumerate(result["data"]["questions"], 1):
        print(f"\n{'='*60}")
        print(f"Question {i}: {q['question']}")
        print(f"{'='*60}")
        print(f"A) {q['option_a']}")
        print(f"B) {q['option_b']}")
        print(f"C) {q['option_c']}")
        print(f"D) {q['option_d']}")
        print(f"\n✓ Correct Answer: {q['correct_answer']}")

    # Display JSON output
    print("\n" + "="*80)
    print("JSON OUTPUT")
    print("="*80)
    print(json.dumps(result["data"], indent=2))

else:
    print("❌ Failed to generate questions")
    print(f"Error: {result['error']}")
    print(f"\nRaw output:\n{result.get('raw_output', 'N/A')}")

🔄 Generating MCQs...

GENERATION RESULTS
✅ Successfully generated 4 questions!


Question 1: Which of the following is a subset of Artificial Intelligence?
A) Deep learning
B) Neural networks
C) Machine learning
D) Artificial Neural Networks

✓ Correct Answer: C

Question 2: What does deep learning use to process complex patterns?
A) Single layer neural networks
B) Multiple layer neural networks
C) Rule-based systems
D) Genetic algorithms

✓ Correct Answer: B

Question 3: In which field is artificial intelligence NOT currently being applied?
A) Healthcare
B) Finance
C) Autonomous vehicles
D) Space exploration

✓ Correct Answer: D

Question 4: What kind of tasks can AI systems now perform?
A) Only simple calculations
B) Only basic problem-solving
C) Most cognitive functions
D) Only repetitive tasks

✓ Correct Answer: C

JSON OUTPUT
{
  "questions": [
    {
      "question": "Which of the following is a subset of Artificial Intelligence?",
      "option_a": "Deep learning",


## 6. Save Output to JSON File

In [6]:
# Save to JSON file
if result["success"]:
    output_file = "generated_mcqs.json"
    with open(output_file, 'w') as f:
        json.dump(result["data"], f, indent=2)

    print(f"✅ Saved MCQs to {output_file}")

    # Download file
    from google.colab import files
    files.download(output_file)
    print("📥 File downloaded to your computer")
else:
    print("⚠️ No valid output to save")

✅ Saved MCQs to generated_mcqs.json



📥 File downloaded to your computer
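`from google.colab import files` only exists inside Google Colab, so the cell above raises `ModuleNotFoundError` elsewhere, such as on a local Jupyter server. One way to keep the save step portable is to make the browser download optional, as in this sketch. The helper name `save_mcqs_json` is hypothetical. `ensure_ascii=False` is an added choice that keeps characters such as "μs" readable in the file.

```python
import json
from pathlib import Path


def save_mcqs_json(data, output_file: str = "generated_mcqs.json") -> Path:
    """Write MCQs to JSON and trigger a browser download only when running in Colab."""
    path = Path(output_file)
    # ensure_ascii=False writes non-ASCII characters as-is instead of \u escapes
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    print(f"Saved MCQs to {path.resolve()}")

    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print("Not running in Colab; skipping browser download.")
    else:
        files.download(str(path))
    return path


saved = save_mcqs_json({"questions": []}, "example_mcqs.json")
```

In the notebook this would be called as `save_mcqs_json(result["data"])` inside the `if result["success"]:` branch.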


## 7. Batch Processing - Multiple Texts

In [7]:
# Process multiple texts at once
texts = [
    """Python is a high-level programming language known for its simplicity and readability.
    It supports multiple programming paradigms including object-oriented, functional, and procedural.""",

    """The Internet of Things (IoT) refers to the network of physical devices embedded with sensors,
    software, and connectivity that enables them to collect and exchange data over the internet.""",
]

batch_results = []

for idx, text in enumerate(texts, 1):
    print(f"\n{'='*80}")
    print(f"Processing text {idx}/{len(texts)}...")
    print(f"{'='*80}")

    result = generate_mcqs(text, num_questions=3)

    if result["success"]:
        print(f"✅ Generated {result['num_questions']} questions")
        batch_results.append({
            "text_id": idx,
            "text_preview": text[:100] + "...",
            "questions": result["data"]["questions"]
        })
    else:
        print(f"❌ Failed: {result['error']}")

# Save batch results
with open('batch_mcqs.json', 'w') as f:
    json.dump(batch_results, f, indent=2)

print(f"\n✅ Batch processing complete! {len(batch_results)} texts processed")
print("📥 Downloading batch results...")

from google.colab import files
files.download('batch_mcqs.json')


Processing text 1/2...
🔄 Generating MCQs...
✅ Generated 3 questions

Processing text 2/2...
🔄 Generating MCQs...
✅ Generated 3 questions

✅ Batch processing complete! 2 texts processed
📥 Downloading batch results...
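Because generation uses sampling (`do_sample=True`), a text whose output fails to parse on one run may succeed on the next. The batch loop above drops such texts after a single failure. A small retry wrapper is one option. The sketch below takes the generation function as an argument; in this notebook that would be `generate_mcqs`. The helper name `generate_with_retries`, the default of 3 attempts, and the stand-in `flaky_generate` are assumptions for illustration.

```python
from typing import Callable


def generate_with_retries(generate_fn: Callable[..., dict], text: str,
                          max_attempts: int = 3, **kwargs) -> dict:
    """Call generate_fn until it reports success or max_attempts is reached."""
    result = {"success": False, "error": "no attempts made"}
    for attempt in range(1, max_attempts + 1):
        result = generate_fn(text, **kwargs)
        if result.get("success"):
            result["attempts"] = attempt
            return result
        print(f"Attempt {attempt}/{max_attempts} failed: {result.get('error')}")
    result["attempts"] = max_attempts
    return result


# Stand-in generator (hypothetical) that fails once, then succeeds
calls = {"n": 0}


def flaky_generate(text, num_questions=3):
    calls["n"] += 1
    if calls["n"] == 1:
        return {"success": False, "error": "invalid JSON"}
    return {"success": True, "data": {"questions": []}, "num_questions": 0}


demo_result = generate_with_retries(flaky_generate, "some text", num_questions=3)
print(demo_result["success"], demo_result["attempts"])  # True 2
```

In the batch loop, `result = generate_with_retries(generate_mcqs, text, num_questions=3)` would replace the direct call. Each retry costs a full generation, so a small cap on attempts keeps the worst case bounded.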



## 8. Interactive Testing Cell

In [8]:
# Interactive cell - paste your text here and run

YOUR_TEXT = """
2.3.1. Bluetooth Architecture
Bluetooth defines two types of networks:
Piconet
Scatternet
Piconet
A small, ad-hoc network where one device is the master and controls communication.
It consists of one master (primary) device and up to seven active slave (secondary) devices.
The master coordinates by polling slaves. Slaves can only communicate directly with the master, not with each other.
It's the fundamental building block (e.g., connecting a phone to a headset).
Scatternet
A multi-hop network that combines several piconets.
It's created using "bridge" nodes that act as a master and/or slave in different piconets.
These bridges relay data between piconets on a time-division basis.
2.3.2. Bluetooth TDMA
Bluetooth uses a form of Time Division Multiple Access (TDMA) called TDD-TDMA (time-division duplex TDMA).
Time is divided into slots of 625 μs (microseconds).
The master uses even-numbered slots (0, 2, 4, ...).
The slave(s) use odd-numbered slots (1, 3, 5, ...).
This allows the primary and secondary to communicate in half-duplex mode.
2.3.3. Bluetooth Layers
The architecture is split into the Controller and Host stacks.
Controller Stack
Manages the physical and link-level communication.
Radio: Handles transmission and reception of radio waves, including frequency hopping and modulation.
Baseband: Defines packet formats, timing, power control, and the addressing scheme.
Link Manager Protocol (LMP): Manages links, handling tasks like establishing, maintaining, and terminating connections.
It manages both:
connection-oriented (SCO): Used for audio and video.
connection-less (ACL): Used for data.
Host Controller Interface (HCI): A standardized interface allowing the Host stack to send commands to the Controller
and receive events.
Host Stack
Handles higher-level data protocols and services.
Logical Link Control and Adaptation Protocol (L2CAP): Packages data from upper layers for the lower layers, allowing
for multiplexing and segmentation.
Service Discovery Protocol (SDP): Allows a device to discover the services (e.g., file transfer) offered by another
device.
Radio Frequency Communication (RFCOMM): Provides a serial port emulation, often used to replace wired serial
connections.
"""

# Generate MCQs
result = generate_mcqs(YOUR_TEXT, num_questions=5)

# Display
if result["success"]:
    print(json.dumps(result["data"], indent=2))
else:
    print(f"Error: {result['error']}")

🔄 Generating MCQs...
{
  "questions": [
    {
      "question": "What is the fundamental building block in Bluetooth networks?",
      "option_a": "Piconet",
      "option_b": "Scatternet",
      "option_c": "Radio",
      "option_d": "Baseband",
      "correct_answer": "A"
    },
    {
      "question": "How many active slave devices can be connected to a single master in a Piconet?",
      "option_a": "Up to 8",
      "option_b": "Up to 7",
      "option_c": "Up to 6",
      "option_d": "Up to 5",
      "correct_answer": "B"
    },
    {
      "question": "Which layer manages the physical and link-level communication in Bluetooth?",
      "option_a": "Controller Stack",
      "option_b": "Host Stack",
      "option_c": "Radio Stack",
      "option_d": "Layer 2 Stack",
      "correct_answer": "A"
    },
    {
      "question": "What type of communication mode does the Host Controller Interface (HCI) allow?",
      "option_a": "Half-duplex",
      "option_b": "Full-duplex",
      "o
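Schema validation does not confirm that the model did what was asked. This cell requested five questions, and a parsed result can still come back with a different count or with a repeated question. A lightweight post-check like the sketch below can flag these cases before the JSON is saved. The function name `check_mcq_batch` and its specific checks are illustrative assumptions. These checks cannot tell whether the marked answer is actually correct for the passage; that still needs a human review.

```python
from typing import Optional


def check_mcq_batch(data: dict, expected_count: Optional[int] = None) -> list:
    """Return human-readable warnings for a parsed {"questions": [...]} dict."""
    warnings = []
    questions = data.get("questions", [])
    if expected_count is not None and len(questions) != expected_count:
        warnings.append(f"expected {expected_count} questions, got {len(questions)}")

    seen_stems = set()
    for i, q in enumerate(questions, 1):
        # Compare question text case-insensitively, ignoring surrounding spaces
        stem = q["question"].strip().lower()
        if stem in seen_stems:
            warnings.append(f"question {i} repeats an earlier question")
        seen_stems.add(stem)
        # Map the answer letter to its option field and check it has text
        answer_key = f"option_{q['correct_answer'].strip().lower()}"
        if not str(q.get(answer_key, "")).strip():
            warnings.append(f"question {i}: answer {q['correct_answer']!r} has no option text")
    return warnings


sample = {"questions": [
    {"question": "Q1?", "option_a": "x", "option_b": "y", "option_c": "z",
     "option_d": "w", "correct_answer": "A"},
    {"question": "q1? ", "option_a": "x", "option_b": "y", "option_c": "z",
     "option_d": "w", "correct_answer": "E"},
]}
for w in check_mcq_batch(sample, expected_count=5):
    print("warning:", w)
```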

## Summary

This notebook provides:
- ✅ Load base model + LoRA adapters from Hugging Face
- ✅ Generate MCQs from custom text
- ✅ Parse and validate JSON output
- ✅ Display formatted questions
- ✅ Save to JSON files
- ✅ Batch processing capability

**Usage:**
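1. Replace `test_text` (section 5) or `YOUR_TEXT` (section 8) with your own content
2. Run the cell to generate MCQs
3. Inspect the validated, structured JSON output
4. Save the JSON file and download it to your computer (the download step uses `google.colab`, so it only works in Colab)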