
In [4]:
from openai import OpenAI
from google.colab import userdata

OPENROUTER_API_KEY = userdata.get('OPENROUTER_API_KEY')
if not OPENROUTER_API_KEY:
    raise ValueError("OPENROUTER_API_KEY is not set")

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=OPENROUTER_API_KEY,
)
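The cell above only runs inside Colab, because `google.colab.userdata` is not importable anywhere else. A minimal sketch, assuming that outside Colab you export the same key as an `OPENROUTER_API_KEY` environment variable; `load_openrouter_key` is a hypothetical helper, not part of the notebook or of any library:

```python
import os


def load_openrouter_key(name: str = "OPENROUTER_API_KEY") -> str:
    """Return the API key from Colab Secrets when available, else from the environment.

    Hypothetical helper: falls back to os.environ when google.colab is missing
    or the secret cannot be read.
    """
    key = None
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
    except Exception:
        # ImportError outside Colab; inside Colab, userdata.get may also raise when
        # the secret is missing or access was not granted (all treated alike here)
        key = None
    key = key or os.environ.get(name)
    if not key:
        raise ValueError(f"{name} is not set in Colab Secrets or the environment")
    return key
```

With this helper the client can be created the same way in both environments, e.g. `OpenAI(base_url="https://openrouter.ai/api/v1", api_key=load_openrouter_key())`.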

## Instruct

In [5]:
completion = client.chat.completions.create(
  extra_body={},
  model="qwen/qwen3-vl-8b-instruct",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ]
)
print(completion.choices[0].message.content)

This image depicts a serene and picturesque natural landscape. Here's a breakdown of what is visible:

- **A Wooden Boardwalk:** The central focus is a weathered wooden boardwalk or path that stretches from the foreground into the distance, leading the viewer's eye toward the horizon. It cuts through the tall grass.

- **Lush Green Grassland:** The boardwalk is flanked on both sides by tall, vibrant green grasses, suggesting a meadow, marsh, or wetland environment. The grass appears healthy and sunlit.

- **Distant Treeline:** In the background, a line of trees and shrubs marks the edge of the open field. Some of the vegetation on the right side has a golden or autumnal hue, contrasting with the green.

- **Vast Blue Sky:** The sky takes up the upper half of the image and is a brilliant blue, filled with wispy, streaky white clouds. The lighting suggests either early morning or late afternoon, casting a warm glow on the scene.

Overall, the image evokes a sense of peace, tranquility, a
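The Instruct call sends a public image URL. For a local file, the same `image_url` content part also accepts a base64 `data:` URL, which OpenRouter supports for vision models. A minimal sketch, assuming an image file on disk whose type can be inferred from its extension; `to_data_url` and `build_vision_message` are hypothetical helpers, not part of the `openai` package:

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any, Dict, List


def to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (data:<mime>;base64,...)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_vision_message(prompt: str, image_url: str) -> List[Dict[str, Any]]:
    """Build a single-turn `messages` list with one text part and one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]
```

Usage, with a hypothetical local file: `client.chat.completions.create(model="qwen/qwen3-vl-8b-instruct", messages=build_vision_message("What is in this image?", to_data_url("boardwalk.jpg")))`.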

## Thinking

In [6]:
completion = client.chat.completions.create(
  extra_body={},
  model="qwen/qwen3-vl-8b-thinking",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
      ]
    }
  ]
)
print(completion.choices[0].message.content)

The image depicts a serene natural landscape featuring a **wooden boardwalk** that stretches into the distance, flanked by tall, lush green grasses. The boardwalk is surrounded by a vibrant, grassy field that extends toward the horizon. In the background, there are clusters of trees and shrubs, some displaying hints of autumnal colors. Above, the sky is a bright, clear blue with wispy, scattered clouds, suggesting a pleasant, sunny day. The scene conveys a peaceful, open-air environment, likely a wetland, meadow, or nature reserve, emphasizing tranquility and the beauty of nature.
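The Thinking model's printed `content` holds only the final answer. Depending on the provider, a reasoning trace may arrive inline as `<think>...</think>` tags in `content`, or (as OpenRouter does when it returns reasoning) in a separate `reasoning` field on the message. A minimal sketch that handles both shapes on a plain dict, for example `completion.choices[0].message.model_dump()`; `split_reasoning` is a hypothetical helper:

```python
import re
from typing import Any, Dict, Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: Dict[str, Any]) -> Tuple[str, str]:
    """Return (reasoning, answer) from a chat message dict.

    Prefers a separate `reasoning` field; otherwise falls back to inline
    <think>...</think> tags, which are stripped from the answer either way.
    """
    content = message.get("content") or ""
    reasoning = message.get("reasoning") or ""
    match = THINK_RE.search(content)
    if match:
        if not reasoning:
            reasoning = match.group(1)
        # Remove the tagged block so only the final answer remains
        content = THINK_RE.sub("", content)
    return reasoning.strip(), content.strip()
```

Usage: `reasoning, answer = split_reasoning(completion.choices[0].message.model_dump())`. An empty `reasoning` string means the provider returned no trace for that call.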


## Agentic

In [1]:
import time
import re
from typing import Any, Dict

from openai import OpenAI
from google.colab import userdata

# --- Configuration ---
# Set up the OpenRouter client using the API key from Colab secrets
OPENROUTER_API_KEY = userdata.get('OPENROUTER_API_KEY')
if not OPENROUTER_API_KEY:
    raise ValueError("OPENROUTER_API_KEY is not set. Please set it in Colab Secrets.")

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
)

# --- Define Models to Compare ---
MODELS = {
    "Instruct": "qwen/qwen3-vl-8b-instruct",
    "Thinking": "qwen/qwen3-vl-8b-thinking",
}

# --- Complex Multimodal Reasoning Task ---
# The task requires: (1) object counting, (2) spatial reasoning from the sun's position, (3) an inferential conclusion.

CHART_URL = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

COMPLEX_TASK_PROMPT = (
    "Analyze the image of the boardwalk: "
    "1. Estimate the number of distinct vertical posts visible on the left side of the path from foreground to background. "
    "2. Determine if the path is primarily heading north or south, based *only* on the apparent sun direction (assume late afternoon). "
    "3. Conclude by stating the compass direction the photographer is facing. "
    "**You must break down your steps and reasoning for the sun's position before giving the final answer.**"
)

# --- Agentic Execution Function ---

def run_multimodal_test(model_name: str, model_id: str, prompt: str, image_url: str) -> Dict[str, Any]:
    """Runs a single test and collects performance metrics."""
    print(f"--- Running Test for: {model_name} ({model_id}) ---")
    start_time = time.time()

    # Construct content list using the image URL
    content_list = [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": image_url}}
    ]

    try:
        completion = client.chat.completions.create(
            extra_body={},
            model=model_id,
            messages=[{"role": "user", "content": content_list}],
        )

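        end_time = time.time()
        message = completion.choices[0].message
        # content can be None (e.g. if the model spends its whole budget on reasoning)
        output_content = message.content or ""
        latency = end_time - start_time

        # Metric 1: Check for an explicit Chain-of-Thought (CoT) trace.
        # OpenRouter may return reasoning in a separate `reasoning` field on the
        # message instead of inline <think> tags, so check both places.
        reasoning_text = getattr(message, "reasoning", None) or ""
        cot_match = re.search(r'<think>(.*?)</think>', output_content, re.DOTALL)
        if cot_match:
            reasoning_text = cot_match.group(1)
        has_cot = bool(reasoning_text.strip())
        # Approximate the reasoning length by whitespace-separated words (not true tokens)
        cot_token_length = len(reasoning_text.split())

        # Metric 2: Attempt to extract the final compass direction,
        # tolerating markdown emphasis such as "facing **East**"
        final_answer_match = re.search(
            r'photographer is facing\W*(north|south|east|west)\b',
            output_content,
            re.IGNORECASE,
        )
        final_answer = final_answer_match.group(1).capitalize() if final_answer_match else "N/A"

        # Metric 3: Total token count reported by the API (prompt + completion)
        total_tokens = completion.usage.total_tokens if completion.usage else None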

        return {
            "model": model_name,
            "latency_s": round(latency, 2),
            "total_tokens": total_tokens,
            "has_cot": has_cot,
            "cot_token_length": cot_token_length,
            "final_answer_extracted": final_answer,
            "raw_output": output_content,
            "success": True
        }

    except Exception as e:
        print(f"An error occurred: {e}")
        return {"model": model_name, "success": False, "error": str(e)}

# --- Main Benchmark Execution ---

if __name__ == '__main__':

    results = []

    # Run tests for each model
    for name, model_id in MODELS.items():
        result = run_multimodal_test(name, model_id, COMPLEX_TASK_PROMPT, CHART_URL)
        results.append(result)
        print(f"Status: {result.get('success', False)}, Latency: {result.get('latency_s')}s")
        print("-" * 50)

    # --- Print Comparison Results ---

    print("\n" + "=" * 60)
    print("           Qwen3 VL 8B Instruct vs. Thinking COMPARISON")
    print("=" * 60)

    for r in results:
        print(f"\nModel: {r['model']}")
        if r['success']:
            print(f"  Total Latency: {r['latency_s']} seconds ⏱️")
            print(f"  Total Tokens: {r['total_tokens']}")
            print(f"  Explicit CoT Found: {r['has_cot']}")
            print(f"  CoT Token Length (Approx): {r['cot_token_length']} tokens")
            print(f"  Final Answer Attempt (Direction): {r['final_answer_extracted']}")

            # Key difference insight
            if r['model'] == 'Thinking' and r['has_cot']:
                print("  **Insight: Thinking model engaged deep reasoning.**")
            elif r['model'] == 'Instruct' and not r['has_cot']:
                print("  **Insight: Instruct model prioritized direct answer.**")

            # Optional: Print truncated output to see CoT structure
            print("\n--- RAW OUTPUT (First 500 characters) ---")
            print(r['raw_output'][:500] + "...")
            print("----------------------------\n")
        else:
            print(f"  Test Failed: {r['error']}")

--- Running Test for: Instruct (qwen/qwen3-vl-8b-instruct) ---
Status: True, Latency: 10.87s
--------------------------------------------------
--- Running Test for: Thinking (qwen/qwen3-vl-8b-thinking) ---
Status: True, Latency: 262.55s
--------------------------------------------------

           Qwen3 VL 8B Instruct vs. Thinking COMPARISON

Model: Instruct
  Total Latency: 10.87 seconds ⏱️
  Total Tokens: 2720
  Explicit CoT Found: False
  CoT Token Length (Approx): 0 tokens
  Final Answer Attempt (Direction): East
  **Insight: Instruct model prioritized direct answer.**

--- RAW OUTPUT (First 500 characters) ---
Let’s break this down step by step as requested.

---

**1. Estimate the number of distinct vertical posts visible on the left side of the path from foreground to background.**

Looking at the left side of the wooden boardwalk (the viewer’s left, which is the path’s left as it extends into the distance), we can see vertical posts that appear to be supporting the boardwalk 
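Reading the comparison off the printed log gets tedious as more models are added to `MODELS`. A minimal sketch that turns the `results` list produced by `run_multimodal_test` into a Markdown table with latency relative to the Instruct run; `summarize_results` is a hypothetical helper and skips failed runs:

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]], baseline: str = "Instruct") -> str:
    """Render successful runs as a Markdown table, with latency relative to `baseline`."""
    ok = [r for r in results if r.get("success")]
    base = next((r for r in ok if r["model"] == baseline), None)
    lines = [
        "| Model | Latency (s) | Total tokens | CoT found | Direction | Latency vs. baseline |",
        "|---|---|---|---|---|---|",
    ]
    for r in ok:
        # The ratio is only meaningful when the baseline run succeeded with non-zero latency
        if base and base["latency_s"]:
            ratio = f"{r['latency_s'] / base['latency_s']:.1f}x"
        else:
            ratio = "n/a"
        lines.append(
            f"| {r['model']} | {r['latency_s']} | {r['total_tokens']} | "
            f"{r['has_cot']} | {r['final_answer_extracted']} | {ratio} |"
        )
    return "\n".join(lines)
```

Usage: `print(summarize_results(results))`. With the latencies logged above (10.87 s vs. 262.55 s), the Thinking row would report roughly 24.2x the Instruct latency.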