# MinerU 2.5 with FiftyOne - Document Parsing Example

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harpreetsahota204/mineru_2_5/blob/main/mineru_fiftyone_example.ipynb)

This notebook demonstrates how to use the MinerU 2.5 model with FiftyOne for document parsing tasks.


## Installation

Install required packages:


In [None]:
%pip install fiftyone
%pip install "mineru-vl-utils[transformers]"
%pip install transformers

# Optional: install the Caption Viewer plugin for easier viewing of extracted text
# The leading `!` runs this as a shell command from the notebook; drop it when using a terminal
!fiftyone plugins download https://github.com/harpreetsahota204/caption_viewer


## Load Dataset

We'll load five samples of the NutriGreen dataset from the Hugging Face Hub:


In [None]:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

# Load the dataset
dataset = load_from_hub(
    "harpreetsahota/NutriGreen",
    overwrite=True,
    persistent=True,
    name="NutriGreen_MinerU25_Outputs",
    max_samples=5
)


## Register and Load MinerU 2.5 Model

Register the model source and load the model:


In [None]:
import fiftyone.zoo as foz

# Register the model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/mineru_2_5",
    overwrite=True
)

# Load the model
model = foz.load_zoo_model("opendatalab/MinerU2.5-2509-1.2B")


## Apply OCR Detection (Structured Extraction)

Extract document structure with bounding boxes:


In [None]:
# Set operation to OCR detection
model.operation = "ocr_detection"

# Apply model to dataset
dataset.apply_model(model, label_field="text_detections")

print(f"Applied OCR detection to {len(dataset)} samples")


## Apply OCR (Plain Text Extraction)

Extract plain text content:


In [None]:
# Set operation to OCR
model.operation = "ocr"

# Apply model to dataset
dataset.apply_model(model, label_field="text_extraction")

print(f"Applied OCR extraction to {len(dataset)} samples")


## Visualize Results

**Tip**: For the best viewing experience of extracted text, use the [Caption Viewer plugin](https://github.com/harpreetsahota204/caption_viewer). It provides intelligent formatting for OCR outputs with proper line breaks, table rendering, and JSON formatting.

Launch the FiftyOne App to visualize the results:


In [None]:
# Launch the FiftyOne App
session = fo.launch_app(dataset)


## Inspect Sample Results

Let's look at the results from one sample:


In [None]:
# Get the first sample
sample = dataset.first()

# Print detection results
print("=== Detections ===")
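if sample.text_detections:
    for detection in sample.text_detections.detections[:3]:  # Show first 3
        # `text` is not a built-in Detection field; fall back to "" if it is missing
        text = getattr(detection, "text", None) or ""
        print(f"Type: {detection.label}")
        print(f"BBox: {detection.bounding_box}")  # relative [x, y, width, height]
        print(f"Text: {text[:100]}...")  # First 100 chars
        print("---")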

# Print OCR text
print("\n=== OCR Text (first 500 chars) ===")
if sample.text_extraction:
    print(sample.text_extraction[:500])
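
FiftyOne stores each `bounding_box` as `[x, y, width, height]` in relative coordinates in `[0, 1]`, where `(x, y)` is the top-left corner. To crop a region or compare against pixel-based annotations, a conversion along these lines may help. `to_pixel_box` is a hypothetical helper (not part of FiftyOne or MinerU), and the image size below is made up for illustration.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a relative [x, y, w, h] box to absolute pixel (x1, y1, x2, y2)."""
    x, y, w, h = bounding_box
    x1 = round(x * image_width)
    y1 = round(y * image_height)
    x2 = round((x + w) * image_width)
    y2 = round((y + h) * image_height)
    return x1, y1, x2, y2


# Illustrative values only: a box covering the top-left quarter of a 1000x800 image
print(to_pixel_box([0.0, 0.0, 0.5, 0.5], 1000, 800))
```

In the notebook, the image size can be read from `sample.metadata.width` and `sample.metadata.height` after calling `dataset.compute_metadata()`.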


## Using Caption Viewer Plugin

After launching the FiftyOne App above, follow these steps to view formatted text outputs:

1. Click on any sample to open the modal view
2. Click the `+` button to add a panel
3. Select **"Caption Viewer"** from the panel list
4. In the panel menu (☰), select the field you want to view:
   - `text_extraction` for plain OCR text
   - Or any other text field
5. Navigate through samples to see the formatted text

The Caption Viewer plugin will automatically:
- Render line breaks properly
- Convert HTML tables to markdown
- Pretty-print JSON content
- Show character counts
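
To give a rough sense of what JSON pretty-printing and character counting involve, here is a minimal sketch using only the Python standard library. It is not the plugin's implementation; `format_caption` is a hypothetical helper and the input string is invented for illustration.

```python
import json


def format_caption(text):
    """Pretty-print text if it parses as JSON; otherwise return it unchanged.

    Returns the display string and the character count of the original text.
    """
    try:
        display = json.dumps(json.loads(text), indent=2, ensure_ascii=False)
    except ValueError:
        # Not valid JSON (json.JSONDecodeError subclasses ValueError): show as-is
        display = text
    return display, len(text)


# Illustrative input only, not real model output
display, n_chars = format_caption('{"product": "example", "kcal": 120}')
print(display)
print(f"{n_chars} characters")
```

The same pattern could be applied outside the App, for example to the `text_extraction` string of a sample, when you want readable output in a notebook cell.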
