# Isaac-0.1 FiftyOne Integration Example

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harpreetsahota204/isaac0_1/blob/main/isaac_0_1_example.ipynb)

This notebook demonstrates how to use Isaac-0.1 by Perceptron AI with FiftyOne for computer vision tasks including visual question answering, object detection, keypoint detection, classification, and OCR.

## About Isaac-0.1

Isaac-0.1 is an open-source, 2B-parameter perceptive-language model designed for real-world visual understanding tasks. Perceptron AI reports that it delivers capabilities comparable to models 50x larger while remaining efficient enough for practical applications.


## Installation

First, let's install the required dependencies:


In [None]:
%pip install -q fiftyone
%pip install -q perceptron
%pip install -q transformers
%pip install -q torch torchvision
%pip install -q huggingface-hub


## Setup

Let's import the necessary libraries and suppress warnings for cleaner output:


In [None]:
import warnings
from transformers import logging

# Silence transformers log messages and all Python warnings for cleaner output
logging.set_verbosity_error()
warnings.filterwarnings("ignore")

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.huggingface as fouh


## Register and Load Isaac-0.1 Model

Register the Isaac-0.1 model zoo source with FiftyOne, then load the model:


In [None]:
# Register the Isaac-0.1 model zoo source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/isaac0_1",
    overwrite=True
)

# Load the Isaac-0.1 model
print("Loading Isaac-0.1 model...")
model = foz.load_zoo_model("PerceptronAI/Isaac-0.1")
print("Model loaded successfully!")


## Part 1: Testing on Generic Images

Let's load a sample dataset and test various operations on generic images:


In [None]:
# Load a sample dataset from Hugging Face
dataset = fouh.load_from_hub(
    "Voxel51/GQA-Scene-Graph",
    max_samples=50,  # Using fewer samples for demo
)

print(f"Loaded {len(dataset)} samples")
print(f"First sample fields: {dataset.first().field_names}")


In [None]:
# Collect the unique ground-truth object labels for each sample
sample_objects = dataset.values("detections.detections.label")
sample_level_objects = [list(set(obj)) if obj else [] for obj in sample_objects]
dataset.set_values("sample_level_objects", sample_level_objects)

# Display a sample of objects found
print("Sample objects found in first image:", sample_level_objects[0][:10] if sample_level_objects[0] else "None")
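
To make the per-sample deduplication concrete, here is a minimal sketch on plain Python lists. The `per_sample_labels` data and the `unique_labels_per_sample` helper are made up for illustration; in the notebook the input comes from `dataset.values(...)`, which can yield `None` for samples without detections. Unlike the `set()` call above, this variant uses `dict.fromkeys` so the labels keep their first-seen order.

```python
def unique_labels_per_sample(per_sample_labels):
    """Return the unique labels for each sample, preserving first-seen order."""
    result = []
    for labels in per_sample_labels:
        # Samples with no detections may come back as None or an empty list
        if not labels:
            result.append([])
            continue
        # dict.fromkeys keeps insertion order, unlike set()
        result.append(list(dict.fromkeys(labels)))
    return result


# Hypothetical labels for three samples
per_sample_labels = [["car", "tree", "car"], None, ["person"]]
unique = unique_labels_per_sample(per_sample_labels)
print(unique)  # [['car', 'tree'], [], ['person']]
```

A stable order can make the per-sample prompts reproducible between runs, which is the only reason to prefer this over `set()`.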


### 1.1 Visual Question Answering (VQA)

Let's use Isaac-0.1 to answer questions about the images:


In [None]:
# Set model to VQA mode
model.operation = "vqa"
print(f"System prompt for VQA:\n{model.system_prompt}\n")

# Set the question prompt
model.prompt = "Provide a short description of the spatial relationships between the objects in this scene"

# Apply the model to every sample in the dataset
print("Running VQA on dataset...")
dataset.apply_model(model, label_field="vqa_description")

# Display results
first_sample = dataset.first()
print(f"VQA Result for first image:\n{first_sample.vqa_description}")


### 1.2 Object Detection

Now let's detect objects in the images:


In [None]:
# Set model to detection mode
model.operation = "detect"
print(f"System prompt for detection:\n{model.system_prompt[:200]}...\n")

# Use sample-level prompts for detection (objects from the ground truth)
print("Running object detection using sample-level prompts...")
dataset.apply_model(
    model, 
    label_field="isaac_detections", 
    prompt_field="sample_level_objects"
)

# Display detection results
first_sample = dataset.first()
if first_sample.isaac_detections and first_sample.isaac_detections.detections:
    print(f"Detected {len(first_sample.isaac_detections.detections)} objects in first image:")
    for det in first_sample.isaac_detections.detections[:5]:  # Show first 5
        print(f"  - {det.label}")
else:
    print("No detections in first image")
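
The cell above prints only labels. FiftyOne stores each detection's `bounding_box` as relative `[top-left-x, top-left-y, width, height]` values in `[0, 1]`, so converting to pixel coordinates requires the image size. The sketch below shows that arithmetic on plain numbers; `to_pixel_box` is a hypothetical helper, and the box and image dimensions are invented for illustration.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a relative [x, y, w, h] box to absolute (x1, y1, x2, y2) pixels."""
    x, y, w, h = bounding_box
    x1 = round(x * image_width)
    y1 = round(y * image_height)
    x2 = round((x + w) * image_width)
    y2 = round((y + h) * image_height)
    return x1, y1, x2, y2


# Hypothetical box covering the lower-middle part of a 640x480 image
pixel_box = to_pixel_box([0.25, 0.5, 0.5, 0.25], image_width=640, image_height=480)
print(pixel_box)  # (160, 240, 480, 360)
```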


### 1.3 Keypoint Detection

Let's identify key points in the images:

In [None]:
# Set model to keypoint detection mode
model.operation = "point"
print(f"System prompt for keypoints:\n{model.system_prompt[:200]}...\n")

# Apply keypoint detection to the first 10 samples using sample-level prompts
print("Running keypoint detection...")
dataset.limit(10).apply_model(
    model, 
    label_field="isaac_keypoints", 
    prompt_field="sample_level_objects"
)

# Display keypoint results
first_sample = dataset.first()
if first_sample.isaac_keypoints and first_sample.isaac_keypoints.keypoints:
    print(f"Detected {len(first_sample.isaac_keypoints.keypoints)} keypoints in first image:")
    for kp in first_sample.isaac_keypoints.keypoints[:5]:  # Show first 5
        print(f"  - {kp.label}")
else:
    print("No keypoints detected in first image")
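
One possible sanity check is to see whether a predicted point falls inside the box predicted for the same object. FiftyOne keypoints hold relative `(x, y)` points in `[0, 1]`, the same coordinate system as detection boxes, so the comparison needs no image size. This is a sketch only: `point_in_box` is a hypothetical helper and the coordinates are invented.

```python
def point_in_box(point, bounding_box):
    """Return True if a relative (x, y) point lies inside a relative [x, y, w, h] box."""
    px, py = point
    x, y, w, h = bounding_box
    return x <= px <= x + w and y <= py <= y + h


# Hypothetical box and two candidate points
box = [0.25, 0.5, 0.5, 0.25]
inside = point_in_box((0.5, 0.6), box)
outside = point_in_box((0.9, 0.6), box)
print(inside, outside)  # True False
```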


### 1.4 Image Classification

Let's classify the weather/environment in the images:

In [None]:
# Set model to classification mode
model.operation = "classify"
print(f"System prompt for classification:\n{model.system_prompt[:200]}...\n")

# Set classification prompt
model.prompt = "Classify the weather/environment in this scene into exactly one of the following: sunny, rainy, snowy, cloudy, indoor"

# Apply classification to the first 10 samples
print("Running classification...")
dataset.limit(10).apply_model(model, label_field="weather_classification")

# Display classification results
first_sample = dataset.first()
if first_sample.weather_classification and first_sample.weather_classification.classifications:
    print("Weather classification for first image:")
    for cls in first_sample.weather_classification.classifications:
        print(f"  - {cls.label}")
else:
    print("No classification for first image")
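
Because the class list lives only in the prompt text, the model may return labels with different casing, trailing punctuation, or extra words. A minimal post-processing sketch is shown below, assuming the labels arrive as plain strings; `normalize_label` and `ALLOWED_LABELS` are hypothetical names, not part of the Isaac-0.1 integration.

```python
ALLOWED_LABELS = ("sunny", "rainy", "snowy", "cloudy", "indoor")


def normalize_label(raw_label, allowed=ALLOWED_LABELS):
    """Map a raw model label onto one of the allowed classes, or None."""
    cleaned = raw_label.strip().lower().rstrip(".")
    if cleaned in allowed:
        return cleaned
    # Fall back to the first allowed class mentioned anywhere in the text
    for label in allowed:
        if label in cleaned:
            return label
    return None


print(normalize_label(" Sunny. "))             # sunny
print(normalize_label("The scene is indoor"))  # indoor
print(normalize_label("foggy"))                # None
```

Returning `None` for unmatched labels keeps out-of-vocabulary answers visible instead of silently forcing them into a class.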


## Part 2: Testing on Text Images (OCR)

Now let's test Isaac-0.1's OCR capabilities on images containing text:


In [None]:
# Load a dataset with text images
print("Loading text dataset...")
text_dataset = fouh.load_from_hub(
    "Voxel51/Total-Text-Dataset",
    max_samples=20  # Using fewer samples for demo
)

print(f"Loaded {len(text_dataset)} text samples")


### 2.1 OCR Text Extraction

Extract text content from images:


In [None]:
# Set model to OCR mode for text extraction
model.operation = "ocr"
print(f"System prompt for OCR:\n{model.system_prompt[:200]}...\n")

# Set OCR prompt
model.prompt = "Report all text visible in this image"

# Apply OCR to extract text from the first 10 samples
print("Running OCR text extraction...")
text_dataset.limit(10).apply_model(model, label_field="extracted_text")

# Display results
first_text_sample = text_dataset.first()
if first_text_sample.extracted_text:
    print(f"Extracted text from first image:\n{first_text_sample.extracted_text[:200]}...")
else:
    print("No text extracted from first image")
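
The preview above always appends `...`, even when the extracted text is shorter than 200 characters. A small sketch of a preview helper that collapses whitespace and adds the ellipsis only when it truncates is shown below, assuming the OCR result is a plain string; `preview` is a hypothetical helper.

```python
def preview(text, limit=200):
    """Collapse whitespace and truncate to `limit` characters, marking truncation."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


short = preview("STOP\n  AHEAD")
long = preview("a" * 250, limit=10)
print(short)  # STOP AHEAD
print(long)   # aaaaaaaaaa...
```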


### 2.2 OCR Text Detection

Detect text regions with bounding boxes:

In [None]:
# Set model to OCR detection mode
model.operation = "ocr_detection"
print(f"System prompt for OCR detection:\n{model.system_prompt[:200]}...\n")

# Set OCR detection prompt
model.prompt = "Detect all text regions in this image"

# Apply OCR detection to the first 10 samples
print("Running OCR text detection...")
text_dataset.limit(10).apply_model(model, label_field="text_regions")

# Display detection results
first_text_sample = text_dataset.first()
if first_text_sample.text_regions and first_text_sample.text_regions.detections:
    print(f"Detected {len(first_text_sample.text_regions.detections)} text regions in first image:")
    for det in first_text_sample.text_regions.detections[:5]:  # Show first 5
        print(f"  - '{det.label}'")
else:
    print("No text regions detected in first image")
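
Detected text regions are not guaranteed to come back in reading order. Since each region carries a relative `[x, y, w, h]` box, a rough ordering can be sketched by sorting on the top-left corner. The `reading_order` helper and the sample regions are hypothetical, and the approach is deliberately naive: it groups regions into a line only when their `y` values are exactly equal.

```python
def reading_order(regions):
    """Sort (text, [x, y, w, h]) pairs top-to-bottom, then left-to-right."""
    return sorted(regions, key=lambda region: (region[1][1], region[1][0]))


# Hypothetical regions, listed out of reading order
regions = [
    ("AHEAD", [0.55, 0.10, 0.30, 0.08]),
    ("EXIT", [0.20, 0.60, 0.25, 0.10]),
    ("STOP", [0.10, 0.10, 0.30, 0.08]),
]
ordered_text = [text for text, _ in reading_order(regions)]
print(ordered_text)  # ['STOP', 'AHEAD', 'EXIT']
```

For real images, bucketing `y` values with a tolerance would likely be needed before sorting, since text on the same line rarely shares an identical top coordinate.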


## Summary

In this notebook, we demonstrated how to use Isaac-0.1 with FiftyOne for various computer vision tasks:

1. **Visual Question Answering (VQA)** - Generated descriptions of spatial relationships
2. **Object Detection** - Detected objects with bounding boxes
3. **Keypoint Detection** - Identified key points in images
4. **Classification** - Classified weather/environment conditions
5. **OCR Text Extraction** - Extracted text content from images
6. **OCR Text Detection** - Detected text regions with bounding boxes
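
For quick reference, the mapping between each `model.operation` value and the label field this notebook wrote its results to can be captured as a plain dictionary. The `OPERATIONS` name is introduced here for illustration only; the operation strings and field names are the ones used in the cells above.

```python
# operation value -> label field used in this notebook
OPERATIONS = {
    "vqa": "vqa_description",
    "detect": "isaac_detections",
    "point": "isaac_keypoints",
    "classify": "weather_classification",
    "ocr": "extracted_text",
    "ocr_detection": "text_regions",
}

for operation, label_field in OPERATIONS.items():
    print(f"{operation:<14} -> {label_field}")
```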

Isaac-0.1 handles all of these tasks with a single 2B-parameter model: switching tasks only requires setting `model.operation` and, where needed, `model.prompt` or a per-sample `prompt_field`.

## Resources

- [Isaac-0.1 on Hugging Face](https://huggingface.co/PerceptronAI/Isaac-0.1)
- [Isaac-0.1 FiftyOne Integration](https://github.com/harpreetsahota204/isaac0_1)
- [Perceptron AI GitHub](https://github.com/perceptron-ai-inc/perceptron)
- [FiftyOne Documentation](https://docs.voxel51.com/)

## License

- **Code**: Apache 2.0 License
- **Model Weights**: Creative Commons Attribution-NonCommercial 4.0 International License
