A web extension for capturing and annotating web page snapshots for labeling and review. Create labeled datasets from web pages with high-fidelity snapshots.
- High-Fidelity Snapshots: Captures complete web pages as self-contained HTML files using MHTML capture
- Inert Snapshots: All interactive elements (links, forms, scripts) are disabled to preserve the exact state
- Text Annotations: Highlight and label text content with different annotation types:
  - Relevant: Mark content that is relevant to a query
  - Answer: Mark content that contains the answer
  - No Content: Mark when no relevant content exists
- Q&A Labeling: Create question-answer pairs for each snapshot with:
  - Query text
  - Expected answer
  - Annotation links to highlighted text
- Evaluation Metrics: Rate each Q&A pair with:
  - Answer correctness (correct/incorrect/partial)
  - Answer in page (yes/no/unclear)
  - Page content quality (good/broken)
- Review Workflow: Approve or decline snapshots with optional review notes
- Export/Import: Backup and restore your labeled data as JSON files
- Local Storage: All data stored locally using Chrome storage API (portable and exportable)
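As a rough illustration of the Q&A labeling and evaluation features listed above, the data for one question might be modeled as follows. This is a minimal sketch: the type and field names (`Evaluation`, `annotationIds`, `isQuestionComplete`, and so on) are assumptions made for illustration and may not match the types actually defined in the project.

```typescript
// Hypothetical shapes; names are illustrative, not the project's actual types.
type AnnotationType = 'relevant' | 'answer' | 'no_content';

interface TextAnnotation {
  id: string;
  type: AnnotationType;
  text: string; // the highlighted text
}

// One rating per evaluation metric in the feature list.
interface Evaluation {
  answerCorrectness: 'correct' | 'incorrect' | 'partial';
  answerInPage: 'yes' | 'no' | 'unclear';
  pageQuality: 'good' | 'broken';
}

interface Question {
  id: string;
  query: string;
  expectedAnswer: string;
  annotationIds: string[]; // links to highlighted text
  evaluation?: Evaluation; // absent until the pair has been rated
}

// True once a question has a query, an expected answer, and an evaluation.
function isQuestionComplete(q: Question): boolean {
  return (
    q.query.trim() !== '' &&
    q.expectedAnswer.trim() !== '' &&
    q.evaluation !== undefined
  );
}
```

Keeping the evaluation optional lets a question be saved before it has been rated, which fits a workflow where labeling and rating happen in separate passes.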
- Install dependencies: `npm install`
- Build the extension: `npm run build`
- For development with auto-rebuild: `npm run dev`
- Open Chrome and go to chrome://extensions/
- Enable "Developer mode" (toggle in top right)
- Click "Load unpacked"
- Select the dist folder from this project
- Navigate to any web page you want to capture
- Click the refine.page extension icon
- Click "Capture Page"
- The page will be saved as an inert snapshot
- Click "View Snapshots" or click on a recent snapshot
- In the viewer:
  - Select an annotation tool (Relevant, Answer, or No Content)
  - Select text in the page preview to create annotations
  - The selected text will be highlighted with the annotation type's color
- Click "+ Add Question" in the right panel
- Enter the query and expected answer
- Create annotations that are linked to this question
- Fill out the evaluation metrics
- Use the filter tabs to see pending or completed snapshots
- Review annotations and evaluations
- Click "Approve" or "Decline" to update the status
- Add review notes if needed
- Click "Export" in the popup to download all snapshots as JSON
- Click "Import" to restore from a backup file
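The review and export/import steps above can be sketched as pure functions over snapshot records. This is an illustrative sketch only: `SnapshotLike` is a reduced stand-in for the full snapshot structure, and the function names and the `version` field in the backup file are assumptions, not the extension's actual export format.

```typescript
type Status = 'pending' | 'approved' | 'declined' | 'needs_revision';
const STATUSES: readonly string[] = ['pending', 'approved', 'declined', 'needs_revision'];

// Reduced stand-in for a snapshot; only the fields this sketch needs.
interface SnapshotLike {
  id: string;
  status: Status;
  reviewNotes?: string;
  updatedAt: string;
}

// Return a copy of the snapshot with the review decision applied.
function reviewSnapshot(
  s: SnapshotLike,
  decision: 'approved' | 'declined',
  notes?: string,
): SnapshotLike {
  const reviewed: SnapshotLike = {
    ...s,
    status: decision,
    updatedAt: new Date().toISOString(),
  };
  if (notes !== undefined && notes.trim() !== '') {
    reviewed.reviewNotes = notes;
  }
  return reviewed;
}

// Serialize snapshots to a JSON string suitable for a backup file.
// The wrapper object and its `version` field are assumptions.
function exportSnapshots(snapshots: SnapshotLike[]): string {
  return JSON.stringify({ version: 1, snapshots }, null, 2);
}

// Parse a backup file, rejecting input that is not a list of snapshots.
function importSnapshots(json: string): SnapshotLike[] {
  const data: unknown = JSON.parse(json);
  if (typeof data !== 'object' || data === null) {
    throw new Error('Backup file must contain a JSON object');
  }
  const list = (data as { snapshots?: unknown }).snapshots;
  if (!Array.isArray(list)) {
    throw new Error('Backup file has no snapshots array');
  }
  for (const item of list) {
    const ok =
      typeof item === 'object' &&
      item !== null &&
      typeof (item as { id?: unknown }).id === 'string' &&
      STATUSES.includes(String((item as { status?: unknown }).status));
    if (!ok) {
      throw new Error('Backup file contains an invalid snapshot');
    }
  }
  return list as SnapshotLike[];
}
```

Validating on import means a damaged or unrelated JSON file is rejected before it can overwrite existing labeled data.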
refine-page/
├── src/
│ ├── types/ # TypeScript type definitions
│ ├── background/ # Service worker for message handling
│ ├── content/ # Content script for page capture
│ ├── popup/ # Extension popup UI
│ ├── viewer/ # Full-page annotation viewer
│ └── snapshot/ # Snapshot preview page
├── icons/ # Extension icons
├── scripts/ # Build scripts
└── dist/ # Built extension (generated)
- @annotorious/annotorious: Image/region annotation (for future use)
- @recogito/text-annotator: Text annotation (for future use)
Snapshots are stored with the following structure:
interface Snapshot {
  id: string;
  url: string;
  title: string;
  html: string; // Complete self-contained HTML
  viewport: { width: number; height: number };
  annotations: {
    text: TextAnnotation[];
    region: RegionAnnotation[];
  };
  questions: Question[];
  status: 'pending' | 'approved' | 'declined' | 'needs_revision';
  reviewNotes?: string;
  capturedAt: string;
  updatedAt: string;
  tags: string[];
}

Apache-2.0