Classify this scientific image into one of the following categories:

TABLES/CHARTS:
- Data table
- Statistical table
- Comparison table

GRAPHS/PLOTS:
- Line graph
- Bar chart
- Scatter plot
- Box plot
- Histogram
- Pie chart
- Heat map
- Network graph
- Time series

MICROSCOPY:
- Light microscopy
- Electron microscopy
- Fluorescence microscopy
- Confocal microscopy
- Super-resolution microscopy

SPECTROSCOPY:
- Mass spectroscopy
- NMR spectroscopy
- IR spectroscopy
- UV-vis spectroscopy
- X-ray spectroscopy

MEDICAL/BIOLOGICAL IMAGING:
- X-ray
- CT scan
- MRI scan
- Ultrasound
- PET scan
- Histology slide
- Western blot
- Gel electrophoresis

DIAGRAMS/ILLUSTRATIONS:
- Chemical structure
- Molecular diagram
- Anatomical illustration
- Flowchart
- Schematic diagram
- Circuit diagram
- Mechanical diagram
- Process flow diagram

MAPS/GEOGRAPHICAL:
- Geographic map
- GIS visualization
- Satellite image
- Terrain model

MATHEMATICAL:
- Equation
- Mathematical model
- Geometric figure
- Mathematical plot

COMPUTER-GENERATED:
- 3D rendering
- Simulation visualization
- Computer model

MISCELLANEOUS:
- Field photograph
- Sample photograph
- Experimental setup
- Equipment photograph
- Screenshot
- Logo/Institutional insignia
- Cover art
- Author photograph
- Infographic
- Conceptual illustration



In [1]:
IMAGE_CLASSIFICATION_SYSTEM_PROMPT = """
You are an expert image classifier specializing in scientific literature. Your task is to classify images from academic papers and research documents into precise numbered categories. Analyze each image carefully, considering its visual characteristics, content, and typical usage in scientific literature.

Return your analysis as a valid JSON object with exactly this format:
{
  "class": int,
  "confidence": float
}

Where "class" is the integer corresponding to the image category, and "confidence" is a number between 0.0 and 1.0 representing your confidence in the classification. Do not include any other fields in your response - only these two fields in this exact format.
"""
IMAGE_CLASSIFICATION_USER_PROMPT = """
Classify this scientific image into exactly one of the following numbered categories:

1. Data table
2. Statistical table
3. Line graph
4. Bar chart
5. Scatter plot
6. Box plot
7. Histogram
8. Pie chart
9. Heat map
10. Network graph
11. Time series plot
12. Light microscopy
13. Electron microscopy
14. Fluorescence microscopy
15. Confocal microscopy
16. Mass spectroscopy
17. NMR spectroscopy
18. IR spectroscopy
19. UV-vis spectroscopy
20. X-ray spectroscopy
21. X-ray (medical)
22. CT scan
23. MRI scan
24. Ultrasound
25. PET scan
26. Histology slide
27. Western blot
28. Gel electrophoresis
29. Chemical structure
30. Molecular diagram
31. Anatomical illustration
32. Flowchart
33. Schematic diagram
34. Circuit diagram
35. Mechanical diagram
36. Process flow diagram
37. Geographic map
38. GIS visualization
39. Satellite image
40. Equation/Mathematical expression
41. Geometric figure
42. 3D rendering
43. Computer simulation
44. Field photograph
45. Sample photograph
46. Experimental setup
47. Equipment photograph
48. Screenshot
49. Logo/Institutional insignia
50. Infographic
51. other

Return only a JSON with the class number (integer) and your confidence (float between 0.0 and 1.0) in this exact format:
{
  "class": int,
  "confidence": float
}
"""

In [5]:
from openai import AzureOpenAI, OpenAI
from azureml.rag.utils.connections import get_connection_by_id_v2
from azureml.rag.utils.logging import get_logger, safe_mlflow_start_run, track_activity
import os

logger = get_logger("document_analyzer")

def setup_openai_client(connection_id: str, client_type: str = "openai"):
    """Create an OpenAI-compatible client from a workspace connection.

    With client_type="openai" (the default) a plain OpenAI client is pointed at
    the connection's endpoint; any other value creates an AzureOpenAI client.
    """
    try:
        # Get connection details
        connection = get_connection_by_id_v2(connection_id)

        # Read endpoint and api_key from the top level, falling back to the nested properties
        endpoint = connection.get("endpoint") or connection.get("properties", {}).get("metadata", {}).get("endpoint")
        api_key = connection.get("api_key") or connection.get("properties", {}).get("credentials", {}).get("keys", {}).get("api_key")

        if not endpoint or not api_key:
            raise ValueError("Missing endpoint or api_key in connection details.")

        if client_type == "openai":
            # Create a client for a generic OpenAI-compatible endpoint
            client = OpenAI(
                api_key=api_key,
                base_url=endpoint
            )
        else:
            # Create an Azure OpenAI client (used for the vision deployment)
            client = AzureOpenAI(
                api_key=api_key,
                api_version="2025-01-01-preview",
                azure_endpoint=endpoint
            )

        return client

    except Exception as e:
        logger.error(f"Failed to set up the OpenAI client: {e}")
        raise

In [6]:
# Connect to the Azure ML workspace described by the local config.json
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Get workspace
ml_client = MLClient.from_config(
    credential=DefaultAzureCredential()
)

Found the config file in: /config.json


In [7]:
# aoai_connection_name = "open_ai_connection"
aoai_connection_name = "deepseek"
acs_connection_name = "acs-connection"
data_set_name = "papers"
asset_name = "aoai_acs_mlindex"
doc_intelligence_connection_name = "doc-intelligence-connection"
vision_deploy_name = "gpt-4"
aoai_embedding_model_name = "text-embedding-3-large"

acs_config = {
    "index_name": "qknows-embedding",
}

experiment_name = "sample-acs-embedding"

aoai_connection_id = ml_client.connections.get(aoai_connection_name).id

aoai_cleint = setup_openai_client(aoai_connection_id)

# Define your deployment name for the vision model (e.g., "gpt-4v")
vision_deployment_name = "gpt-4v"

In [8]:
url = aoai_cleint.base_url
str(url)

'https://deepseek-r1-ilhid.westus3.models.ai.azure.com'

In [11]:
chat_completion = aoai_cleint.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="deepseek-r1",
)

print(chat_completion)

ChatCompletion(id='a70d405895644827962dde9033e4acf0', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='<think>\nOkay, the user said "Say this is a test". Let me make sure I understand what they\'re asking for. They probably want me to respond by repeating the phrase "this is a test". Maybe they\'re checking if the system is working or just trying out how the bot responds. Let me keep it simple and just say exactly that. Sometimes users want a straightforward reply without any extra information. I\'ll go with "This is a test." as the response. That should cover it. If they need more, they\'ll ask again.\n</think>\n\nThis is a test.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, reasoning_content=None))], created=1741116879, model='deepseek-r1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=120, prompt_tokens=8, total_tokens=128, completi

In [13]:
print(chat_completion.choices[0].message.content)

<think>
Okay, the user said "Say this is a test". Let me make sure I understand what they're asking for. They probably want me to respond by repeating the phrase "this is a test". Maybe they're checking if the system is working or just trying out how the bot responds. Let me keep it simple and just say exactly that. Sometimes users want a straightforward reply without any extra information. I'll go with "This is a test." as the response. That should cover it. If they need more, they'll ask again.
</think>

This is a test.
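The reply above carries the model's reasoning inside a `<think>` block ahead of the actual answer. Code that expects a bare answer, or a bare JSON object as the classification prompts request, would need to drop that block first. A minimal sketch, assuming the reasoning is always wrapped in `<think>...</think>` exactly as in the output shown; `strip_reasoning` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in the output above.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(content: str) -> str:
    """Return the reply with any <think>...</think> block removed."""
    return _THINK_BLOCK.sub("", content).strip()


sample = '<think>\nOkay, the user said "Say this is a test".\n</think>\n\nThis is a test.'
print(strip_reasoning(sample))  # This is a test.
```

In this notebook it could be applied as `strip_reasoning(chat_completion.choices[0].message.content)`.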


In [26]:
# pip install azure-ai-inference
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
connection = get_connection_by_id_v2(aoai_connection_id)
api_key = connection.get("properties", {}).get("credentials", {}).get("keys", {}).get("api_key")
if not api_key:
    raise ValueError("A key should be provided to invoke the endpoint")

client = ChatCompletionsClient(
    endpoint='https://DeepSeek-R1-ilhid.westus3.models.ai.azure.com',
    credential=AzureKeyCredential(api_key)
)

model_info = client.get_model_info()
print("Model name:", model_info.model_name)
print("Model type:", model_info.model_type)
print("Model provider name:", model_info.model_provider_name)

payload = {
  "messages": [
    {
      "role": "user",
      "content": "I am going to Paris, what should I see?"
    },
    {
      "role": "assistant",
      "content": "Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world."
    },
    {
      "role": "user",
      "content": "What is so great about #1?"
    }
  ],
  "max_tokens": 2048
}
response = client.complete(payload)

print("Response:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print("\tPrompt tokens:", response.usage.prompt_tokens)
print("\tTotal tokens:", response.usage.total_tokens)
print("\tCompletion tokens:", response.usage.completion_tokens)

Model name: deepseek-r1
Model type: chat-completion
Model provider name: DeepSeek
Response: <think>
Okay, the user asked, "What is so great about #1?" referring to the Eiffel Tower from my previous list. I need to elaborate on why the Eiffel Tower is a must-see. First, I should consider their context: they're planning a trip to Paris and looking for highlights. They probably want to know specifics that make it stand out. 

They might be curious about its history, unique features, or experiences offered. Maybe they're wondering if it's worth visiting despite being so touristy. I should highlight its historical significance, architectural marvel, the views, and maybe some tips like visiting at different times or dining options. Also, mention its cultural impact as a symbol of Paris and romance.

Should I structure the answer with clear sections? Maybe bullet points again, but in a more detailed way. Also, check facts: when was it built, who designed it, any interesting facts like initial

In [7]:
import json  # needed for json.loads and json.JSONDecodeError below


class ImageClassification(DocumentProcessor):
    def __init__(self, input_folder, output_folder, openai_client, vision_deployment_name):
        super().__init__(input_folder, output_folder, openai_client, vision_deployment_name)
        self.vision_client = openai_client
        self.vision_deployment_name = vision_deployment_name
    
    def get_image_classification(self, image_path):
        try:
            base64_image = self.encode_image(image_path)
            if not base64_image:
                print(f"Skipping classification for invalid image: {image_path}")
                return None
            
            max_retries = 3
            for attempt in range(max_retries):
                try:
                    # Method 1: Try with response_format as JSON object
                    # This is the format for newer API versions
                    try:
                        response = self.vision_client.chat.completions.create(
                            model=self.vision_deployment_name,
                            response_format={"type": "json_object"},
                            messages=[
                                {
                                    "role": "system",
                                    "content": IMAGE_CLASSIFICATION_SYSTEM_PROMPT
                                },
                                {
                                    "role": "user",
                                    "content": [
                                        {"type": "text", "text": IMAGE_CLASSIFICATION_USER_PROMPT},
                                        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
                                    ]
                                }
                            ],
                            temperature=0.0,
                            max_tokens=100
                        )
                        return json.loads(response.choices[0].message.content)
                    
                    except Exception as e:
                        if "extra fields not permitted" in str(e):
                            # Method 2: Try without response_format parameter
                            # This is for older API versions that don't support response_format
                            response = self.vision_client.chat.completions.create(
                                model=self.vision_deployment_name,
                                messages=[
                                    {
                                        "role": "system",
                                        "content": IMAGE_CLASSIFICATION_SYSTEM_PROMPT + "\nYour response must be valid JSON."
                                    },
                                    {
                                        "role": "user",
                                        "content": [
                                            {"type": "text", "text": IMAGE_CLASSIFICATION_USER_PROMPT},
                                            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
                                        ]
                                    }
                                ],
                                temperature=0.0,
                                max_tokens=100
                            )
                            return json.loads(response.choices[0].message.content)
                        else:
                            raise  # re-raise unchanged so the outer handlers see the original traceback
                        
                except json.JSONDecodeError:
                    # If the response isn't valid JSON, try to extract JSON from the text
                    content = response.choices[0].message.content
                    try:
                        # Try to find JSON within the response
                        json_start = content.find('{')
                        json_end = content.rfind('}') + 1
                        if json_start >= 0 and json_end > json_start:
                            json_str = content[json_start:json_end]
                            return json.loads(json_str)
                        else:
                            print(f"Could not extract JSON from response: {content}")
                            return None
                    except Exception as json_extract_err:
                        print(f"Failed to parse JSON response: {str(json_extract_err)}")
                        return None
                
                except Exception as e:
                    if attempt < max_retries - 1:
                        print(f"Image classification attempt {attempt + 1} failed: {str(e)}")
                        continue
                    else:
                        print(f"All image classification attempts failed for {image_path}")
                        return None
                        
        except Exception as e:
            print(f"Image classification failed: {str(e)}")
            return None
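Both requests in `get_image_classification` label the payload as `image/jpeg`, while the image classified later in this notebook is a `.png` file. One possible way to derive the MIME type from the file name is sketched below. `to_data_url` is a hypothetical helper, not part of `DocumentProcessor`, and it takes raw bytes so that it makes no assumption about what `encode_image` returns.

```python
import base64
import mimetypes


def to_data_url(filename: str, data: bytes, default_mime: str = "image/jpeg") -> str:
    """Build a base64 data URL whose MIME type is guessed from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    # Fall back to the default when the extension is unknown or not an image type
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Only the first four bytes of a PNG signature, enough to show the URL shape
print(to_data_url("indexer_pipeline-next-step.png", b"\x89PNG"))
```

The returned string could stand in for the hard-coded `f"data:image/jpeg;base64,{base64_image}"` value passed as `image_url`.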

In [8]:
# Initialize the image classification service
classifier = ImageClassification(
    input_folder="/home/azureuser/academic-document-analyzer/docs/",
    output_folder=".",
    openai_client=aoai_cleint,
    vision_deployment_name=vision_deployment_name
)

# Path to the image you want to classify
image_path = "/home/azureuser/academic-document-analyzer/docs/indexer_pipeline-next-step.png"

# Classify the image
result = classifier.get_image_classification(image_path)

# Process the result
if result:
    class_id = result["class"]
    confidence = result["confidence"]
    
    # Map the class ID to its name; the IDs mirror the numbered list in IMAGE_CLASSIFICATION_USER_PROMPT
    class_names = {
        1: "Data table",
        2: "Statistical table",
        3: "Line graph",
        4: "Bar chart",
        5: "Scatter plot",
        6: "Box plot",
        7: "Histogram",
        8: "Pie chart",
        9: "Heat map",
        10: "Network graph",                
        11: "Time series plot",
        12: "Light microscopy",
        13: "Electron microscopy",
        14: "Fluorescence microscopy",
        15: "Confocal microscopy",
        16: "Mass spectroscopy",
        17: "NMR spectroscopy",
        18: "IR spectroscopy",
        19: "UV-vis spectroscopy",
        20: "X-ray spectroscopy",
        21: "X-ray (medical)",
        22: "CT scan",
        23: "MRI scan",
        24: "Ultrasound",
        25: "PET scan",
        26: "Histology slide",
        27: "Western blot",
        28: "Gel electrophoresis",
        29: "Chemical structure",
        30: "Molecular diagram",
        31: "Anatomical illustration",
        32: "Flowchart",
        33: "Schematic diagram",
        34: "Circuit diagram",
        35: "Mechanical diagram",
        36: "Process flow diagram",
        37: "Geographic map",
        38: "GIS visualization",    
        39: "Satellite image",
        40: "Equation/Mathematical expression",
        41: "Geometric figure",
        42: "3D rendering",     
        43: "Computer simulation",
        44: "Field photograph",
        45: "Sample photograph",
        46: "Experimental setup",
        47: "Equipment photograph",
        48: "Screenshot",
        49: "Logo/Institutional insignia",
        50: "Infographic",
        51: "other"
    }
    
    class_name = class_names.get(class_id, "Unknown class")
    
    print(f"Image classified as: {class_name} (ID: {class_id})")
    print(f"Confidence: {confidence:.2f}")
else:
    print("Classification failed")

Image classified as: Flowchart (ID: 32)
Confidence: 0.95
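The `class_names` dictionary above repeats the numbered list from the user prompt by hand, so the two can drift apart whenever a category is added or renumbered. One possible alternative is to parse the mapping from the prompt text itself. The sketch below uses a short excerpt in the same `N. Name` format; `parse_class_names` and `prompt_excerpt` are hypothetical names.

```python
import re

# A short excerpt in the same "N. Name" format as the user prompt
prompt_excerpt = """
Classify this scientific image into exactly one of the following numbered categories:

32. Flowchart
33. Schematic diagram
51. other

Return only a JSON with the class number (integer) and your confidence (float between 0.0 and 1.0)
"""

# A line that starts with digits, a dot, and whitespace, followed by the category name
_ENTRY = re.compile(r"^(\d+)\.[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_class_names(prompt: str) -> dict:
    """Map each class number in the prompt to its category name."""
    return {int(num): name for num, name in _ENTRY.findall(prompt)}


class_names = parse_class_names(prompt_excerpt)
print(class_names[32])  # Flowchart
```

In this notebook the call would be `parse_class_names(IMAGE_CLASSIFICATION_USER_PROMPT)`. Anchoring the pattern at the start of a line keeps numbers inside the instructions, such as `0.0` and `1.0`, from being read as categories.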
