<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/Qwen3_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Qwen/Qwen3-8B" # Or any other Qwen3 model like "Qwen/Qwen3-30B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto", # or torch.bfloat16 for bfloat16 models
    device_map="auto"
)

prompt = "Explain the concept of quantum entanglement."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

In [2]:
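inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Unpacking the full encoding passes attention_mask along with input_ids
output = model.generate(**inputs, max_new_tokens=512)
# output[0] holds prompt + completion, so the decoded text echoes the prompt
print(tokenizer.decode(output[0], skip_special_tokens=True))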

user
Explain the concept of quantum entanglement.
assistant
<think>
Okay, I need to explain quantum entanglement. Let me start by recalling what I know. Entanglement is a phenomenon in quantum mechanics where particles become interconnected, right? But I should make sure I get the details right.

First, maybe I should mention that entangled particles are in a superposition of states. So, when you measure one, the other's state is instantly determined, no matter the distance. Wait, but how does that work? Einstein called it "spooky action at a distance," which he didn't like. But I think that's been proven through experiments, like Bell's theorem. 

I should explain the basics. Let's say you have two particles, like electrons, that are entangled. Their quantum states are linked. If you measure the spin of one, the other's spin is instantly known, even if they're light-years apart. But it's not that information is transmitted faster than light, right? Because the outcome is random, so no
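
The output above stops inside an unclosed `<think>` block because generation hit the 512-token limit before the model finished reasoning. A minimal sketch of separating the reasoning from the final answer in decoded text follows; `split_thinking` is a hypothetical helper, not part of `transformers`, and it assumes the `<think>` / `</think>` tags survive decoding as they do in the output shown here.

```python
import re

def split_thinking(decoded: str) -> tuple[str, str]:
    """Split decoded model text into (thinking, answer).

    Handles a closed <think>...</think> block as well as an unclosed
    <think> block left behind when generation hits max_new_tokens.
    """
    closed = re.search(r"<think>(.*?)</think>", decoded, flags=re.DOTALL)
    if closed:
        thinking = closed.group(1).strip()
        answer = decoded[closed.end():].strip()
        return thinking, answer
    start = decoded.find("<think>")
    if start != -1:
        # Unclosed block: everything after the tag is reasoning, no answer yet.
        return decoded[start + len("<think>"):].strip(), ""
    # No think block at all: treat the whole text as the answer.
    return "", decoded.strip()

thinking, answer = split_thinking("<think>\nplan the reply\n</think>\n\nFinal answer.")
print(repr(thinking), repr(answer))
```

An empty `answer` is a signal that `max_new_tokens` was too small for the model to finish thinking, which is the situation in the cell output above.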

## AGENT
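
The agent in this section follows an analyze, plan, generate, assemble flow. As a rough, model-free sketch of that flow, the snippet below swaps the Qwen3 call for a stub so the control flow can be run without a GPU. The names `KEYWORDS`, `analyze`, `plan`, `explain`, and `fake_generate` are illustrative assumptions and do not appear in the notebook's class.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical miniature knowledge base; the notebook's agent uses a larger one.
KEYWORDS: Dict[str, List[str]] = {
    "flight planning": ["route optimization", "fuel calculation"],
}

def analyze(query: str) -> Optional[str]:
    """Return the first known concept mentioned in the query, else None."""
    q = query.lower()
    return next((c for c in KEYWORDS if c in q), None)

def plan(concept: str) -> Dict[str, str]:
    """Map each section name to the prompt that will generate it."""
    return {s: f"Explain the {s} of {concept}." for s in KEYWORDS[concept]}

def explain(query: str, generate: Callable[[str], str]) -> str:
    """Run analyze -> plan -> generate -> assemble with a pluggable generator."""
    concept = analyze(query)
    if concept is None:
        return generate(f"Explain '{query}' in detail.")
    parts = [f"Here's an explanation of **{concept.title()}**:"]
    for section, prompt in plan(concept).items():
        parts.append(f"\n### {section.title()}\n{generate(prompt)}")
    return "\n".join(parts)

def fake_generate(prompt: str) -> str:
    """Stand-in for the model call; echoes the prompt it received."""
    return f"[stub answer to: {prompt}]"

result = explain("Explain flight planning.", fake_generate)
print(result)
```

Keying the plan by section name keeps the generation step and the assembly step in agreement about which text belongs under which heading.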

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import json
import os
import re # Import regex module

class Qwen3ExplanationAgent:
    """
    An AI agent designed to explain concepts using the Qwen3 model,
    emulating a structured thought process (analyze, plan, generate, refine).
    This agent directly integrates the Qwen3 model for content generation.
    """

    def __init__(self, model_name="Qwen/Qwen3-8B", name="Qwen3Agent"): # Added 'name' parameter with default
        """
        Initializes the Qwen3 model and tokenizer.
        Args:
            model_name (str): The name of the Qwen3 model to load.
            name (str): The name of the agent.
        """
        self.model_name = model_name
        self.name = name # Initialize the 'name' attribute
        self.tokenizer = None
        self.model = None
        self._load_model()
        # Keyword map for the flight planning domain: concept -> sections to explain
        self.knowledge_base_keywords = {
            "flight planning": ["route optimization", "fuel calculation", "weather considerations", "air traffic control", "regulations"],
            "route optimization": ["great circle route", "wind impact", "restricted airspace", "waypoints"],
            "fuel calculation": ["payload", "distance", "altitude", "reserve fuel", "fuel consumption rate"],
            "weather considerations": ["wind", "turbulence", "icing", "thunderstorms", "visibility"],
            "air traffic control": ["airspace classes", "flight rules", "communication procedures"],
            "aviation regulations": ["flight rules", "licensing", "aircraft maintenance"]
        }
        print(f"[{self.name} - Init] Agent initialized with model: {self.model_name}")

    def _load_model(self):
        """
        Loads the Qwen3 model and tokenizer. This can be resource-intensive.
        """
        print(f"[{self.name} - Model Loading] Loading Qwen3 model '{self.model_name}'...") # Used self.name
        try:
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name,
                torch_dtype="auto", # Automatically selects best dtype (e.g., bfloat16, float16, float32)
                device_map="auto"   # Automatically maps model layers to available devices (GPU/CPU)
            )
            print(f"[{self.name} - Model Loading] Model loaded successfully on device: {self.model.device}") # Used self.name
        except Exception as e:
            print(f"[{self.name} - Model Loading Error] Failed to load model: {e}") # Used self.name
            print("Please ensure 'transformers' and 'torch' are installed and you have sufficient resources (e.g., GPU memory).")
            self.tokenizer = None # Set to None to indicate failure
            self.model = None    # Set to None to indicate failure

    def _analyze_query(self, query):
        """
        Simulates analyzing the user's query to identify the core concept.
        Args:
            query (str): The user's input query.
        Returns:
            str: The identified concept key (e.g., "flight planning") or None.
        """
        # print(f"[{self.name} - Step 1: Analyzing Query] User asked: '{query}'") # Removed log
        query_lower = query.lower()
        for concept in self.knowledge_base_keywords:
            if concept in query_lower:
                # print(f"[{self.name} - Analysis] Identified core concept: '{concept}'") # Removed log
                return concept
        # print(f"[{self.name} - Analysis] Could not identify a specific concept. Will attempt a general explanation.") # Removed log
        return None

    def _plan_explanation(self, concept_key):
        """
        Simulates planning the structure of the explanation based on the concept.
        Args:
            concept_key (str): The identified concept.
        Returns:
            list: A list of sub-prompts for each section of the explanation.
        """
        # print(f"[{self.name} - Step 2: Planning Explanation] Structuring explanation for '{concept_key}'...") # Removed log
        if concept_key and concept_key in self.knowledge_base_keywords:
            sections = self.knowledge_base_keywords[concept_key]
            plan = [f"Explain the {section.replace('_', ' ')} of {concept_key}." for section in sections]
            # print(f"[{self.name} - Plan] Generated plan: {json.dumps(plan, indent=2)}") # Removed log
            return plan
        else:
            # print(f"[{self.name} - Plan] No specific plan for '{concept_key}'. Will provide a general response.") # Removed log
            return [f"Explain '{concept_key}' in detail."] if concept_key else ["Provide a general explanation for the query."]

    def _generate_section_content(self, section_prompt):
        """
        Uses the Qwen3 model to generate content for a specific section.
        Args:
            section_prompt (str): The specific prompt for generating a section (e.g., "Explain the route optimization of flight planning.").
        Returns:
            str: The generated text for that section.
        """
        if not self.model or not self.tokenizer:
            return "Error: Qwen3 model not loaded. Cannot generate content."

        # print(f"[{self.name} - Step 3: Generating Content] Generating for: '{section_prompt}'") # Removed log
        messages = [{"role": "user", "content": section_prompt}]
        text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

        # Get both input_ids and attention_mask
        encoded_input = self.tokenizer(text, return_tensors="pt")
        input_ids = encoded_input.input_ids.to(self.model.device)
        attention_mask = encoded_input.attention_mask.to(self.model.device) # Explicitly get attention_mask

        try:
            # Pass attention_mask to the generate method and increase max_new_tokens
            output = self.model.generate(
                input_ids,
                attention_mask=attention_mask, # Pass attention_mask
                max_new_tokens=4096,           # Increased max_new_tokens
                temperature=0.7,
                do_sample=True
            )
            raw_generated_text = self.tokenizer.decode(output[0], skip_special_tokens=True)

            # --- DEBUG PRINT: Raw Generated Text ---
            print(f"\n--- DEBUG: Section Prompt ---\n{section_prompt}\n------------------------------")
            print(f"\n--- DEBUG: Raw Generated Text ---\n{raw_generated_text}\n-----------------------------------\n")

            # --- CRITICAL FIX: Robustly extract assistant's response and then clean <think> tags ---
            assistant_response = ""
            # Pattern to find the first 'assistant' marker and capture everything after it.
            # re.search returns the leftmost match, so this assumes the prompt itself
            # does not contain the word "assistant".
            # re.DOTALL lets '.' match across newlines; re.IGNORECASE also covers 'ASSISTANT:'.
            match = re.search(r'assistant:?\s*(.*)', raw_generated_text, re.DOTALL | re.IGNORECASE)
            if match:
                assistant_response = match.group(1).strip()
            else:
                # Fallback: if no clear assistant marker is found after generation.
                # This might happen if the model's output deviates significantly.
                # In this case, we'll try to remove the initial user prompt part.
                # The decoded text has special tokens stripped, so match on the plain
                # section prompt (optionally preceded by the 'user' role label)
                # rather than on the templated 'text', which still contains them.
                user_prompt_pattern = r"^\s*(?:user\s*)?" + re.escape(section_prompt.strip()) + r"\s*"
                assistant_response = re.sub(user_prompt_pattern, "", raw_generated_text, flags=re.DOTALL).strip()
                # If after removing the prompt, it's still empty, just take the raw text and hope for the best
                if not assistant_response:
                    assistant_response = raw_generated_text.strip()

            # --- DEBUG PRINT: Assistant Response (before cleaning) ---
            #print(f"\n--- DEBUG: Assistant Response (before cleaning) ---\n{assistant_response}\n-----------------------------------\n")

            # Now, apply cleaning only to the extracted assistant's response
            # Step 1: Remove complete <think>...</think> blocks
            cleaned_text = re.sub(r'<think>.*?</think>', '', assistant_response, flags=re.DOTALL)
            # Step 2: Remove any remaining <think> tags and everything after them (for unclosed tags)
            # This regex will remove '<think>' and everything that follows it until the end of the string.
            cleaned_text = re.sub(r'<think>.*', '', cleaned_text, flags=re.DOTALL)

            final_text = cleaned_text.strip() # Ensure leading/trailing whitespace is removed after cleaning

            # --- DEBUG PRINT: Final Cleaned Text ---
            print(f"\n--- DEBUG: Final Cleaned Text ---\n{final_text}\n-----------------------------------\n")

            # print(f"[{self.name} - Generation] Generated {len(final_text.split())} words for section.") # Removed log
            return final_text
        except Exception as e:
            print(f"[{self.name} - Generation Error] Failed to generate content: {e}") # Used self.name
            return f"Failed to generate content for '{section_prompt}' due to an error."

    def _assemble_and_refine_response(self, concept_key, generated_sections):
        """
        Simulates assembling and refining the generated content into a final response.
        Args:
            concept_key (str): The identified concept.
            generated_sections (dict): Dictionary of generated content for each section.
        Returns:
            str: The final, polished explanation.
        """
        # print(f"[{self.name} - Step 4: Assembling & Refining] Combining sections and polishing...") # Removed log
        final_response_parts = []

        if concept_key:
            final_response_parts.append(f"Here's an explanation of **{concept_key.replace('_', ' ').title()}**:")

            # Order the sections based on the predefined plan or a logical flow
            ordered_sections = self.knowledge_base_keywords.get(concept_key, [])
            for section_type in ordered_sections:
                if section_type in generated_sections and generated_sections[section_type]:
                    title = section_type.replace('_', ' ').title()
                    final_response_parts.append(f"\n### {title}\n{generated_sections[section_type]}")
        else:
            # If no specific concept, just present the general explanation
            if "general_explanation" in generated_sections:
                final_response_parts.append(generated_sections["general_explanation"])
            else:
                final_response_parts.append("I couldn't generate a specific explanation for your query.")


        # Add a concluding remark
        final_response_parts.append("\n\nI hope this detailed explanation is helpful!")

        final_response = "\n".join(final_response_parts)
        # print(f"[{self.name} - Refinement] Final response assembled.") # Removed log
        return final_response

    def explain_concept(self, query):
        """
        Orchestrates the agent's full thought process to explain a concept.
        Args:
            query (str): The user's query.
        Returns:
            str: The agent's final explanation.
        """
        print("\n--- Agentic Process Start ---") # Keep this to show start
        if not self.model or not self.tokenizer:
            print(f"[{self.name} - Error] Model not ready. Cannot proceed with explanation.")
            return "I am unable to process your request as the underlying model could not be loaded. Please check your environment."

        concept_key = self._analyze_query(query)
        plan_sections = self._plan_explanation(concept_key)

        generated_content = {}
        if concept_key:
            # Generate content for each planned section using the Qwen3 model
            # Pair each planned prompt with its section name. The plan is built from
            # this same list in the same order, and using the names unchanged keeps
            # the dict keys identical to those looked up in _assemble_and_refine_response.
            section_names = self.knowledge_base_keywords[concept_key]
            for section_type, section_prompt in zip(section_names, plan_sections):
                generated_content[section_type] = self._generate_section_content(section_prompt)
        else:
            # If no specific concept, generate a general explanation
            general_prompt = f"Explain '{query}' in detail."
            generated_content["general_explanation"] = self._generate_section_content(general_prompt)


        final_explanation = self._assemble_and_refine_response(concept_key, generated_content)
        print("--- Agentic Process End ---\n") # Keep this to show end
        return final_explanation

def main():
    """
    Main function to run the interactive Qwen3 explanation agent.
    """
    # Initialize the agent. This will load the Qwen3 model.
    # Ensure you have 'transformers' and 'torch' installed.
    # This step can take a while and consume significant memory.
    agent = Qwen3ExplanationAgent()

    if not agent.model:
        print("Agent could not be initialized. Exiting.")
        return

    print("\nStarting Qwen3 Agentic Explanation System in fully automatic mode.")
    # Define the query to be processed automatically for flight planning
    automatic_query = "Explain flight planning." # Changed to flight planning

    response = agent.explain_concept(automatic_query)
    print(f"\nAgent's Automatic Explanation for '{automatic_query}':\n{response}")

if __name__ == "__main__":
    main()


[Qwen3Agent - Model Loading] Loading Qwen3 model 'Qwen/Qwen3-8B'...


Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]

[Qwen3Agent - Model Loading] Model loaded successfully on device: cuda:0
[Qwen3Agent - Init] Agent initialized with model: Qwen/Qwen3-8B

Starting Qwen3 Agentic Explanation System in fully automatic mode.

--- Agentic Process Start ---

--- DEBUG: Section Prompt ---
Explain the route optimization of flight planning.
------------------------------

--- DEBUG: Raw Generated Text ---
user
Explain the route optimization of flight planning.
assistant
<think>
Okay, I need to explain route optimization in flight planning. Let me start by recalling what I know. Flight planning involves determining the most efficient path for an aircraft from departure to destination. Route optimization would be about finding the best route considering various factors. 

First, I should think about the main factors that influence flight routes. Weather is a big one—storms, wind patterns, turbulence. Then there's air traffic control, as flights need to follow certain corridors and avoid congestion. Fuel efficien