ARCHAI™ — Sovereign Semantic Heritage Infrastructure

"Museums are not silent repositories of Memory; they are living, thinking organisms, where imagination and knowledge, tradition and innovation meet." — Gayane Umerova, UNESCO, 2025

Version: 10.8
Author: Rob Graham · FAMTEC (Fine Art Media Tech) / RMIT University
Status: Working prototype (multi-institution semantic search, LLM object chat, NFC visitor pages)
Target: ISEA2026 Dubai, 6th Summit on New Media Art Archiving (April 11–12)
Paper: docs/ARCHAI_ISEA2026_Rob_Graham.pdf
Licence: MPL-2.0 (code) · CC BY 4.0 (MV data) · CC0 (Met data) · V&A Open Access. See NOTICE for IP and trademark details.
Trademark: ARCHAI™ is a trademark of Rob Graham / FAMTEC. See NOTICE for usage terms.


What's Working Right Now

✅ Multi-Collection Semantic Search

Three museum collections in Qdrant, searchable simultaneously:

Collection      | Source                              | Objects          | Licence          | Status
archai_pilot    | Museums Victoria                    | ~194             | CC BY 4.0        | ✅ Live
archai_met      | The Metropolitan Museum of Art, NYC | ~100             | CC0              | ✅ Live
archai_va       | Victoria and Albert Museum, London  | ~80              | V&A Open Access  | ✅ Live
archai_curator  | All of the above + comments         | Built on demand  | Mixed            | ✅ Live
  • Query → embedded via nomic-embed-text → vector searched across all 3 collections → results merged by cosine similarity
  • Results colour-tagged: MV (teal), Met (gold), V&A (purple)
  • Text-search fallback when Ollama is offline
  • Sort by: name, date, discipline, source
  • Filter: with images (default), all, MV/Met/V&A only
  • Deduplicated by canonical_id across collections
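The merge and dedup steps above can be sketched in JavaScript. This is a minimal sketch, assuming each per-collection hit has the shape of a Qdrant search result (`{ id, score, payload }`) with a `canonical_id` in its payload; the function name `mergeResults` and the added `source` field are hypothetical, not taken from the ARCHAI code.

```javascript
// Hypothetical sketch: merge hits from several Qdrant collections into one
// ranked list, keeping only the best-scoring hit per canonical_id.
// Assumes a higher score means more similar (as with Qdrant's Cosine distance).
function mergeResults(hitsByCollection) {
  const best = new Map();
  for (const [collection, hits] of Object.entries(hitsByCollection)) {
    for (const hit of hits) {
      // Fall back to a collection-qualified point id if canonical_id is missing.
      const key = hit.payload?.canonical_id ?? `${collection}:${hit.id}`;
      const current = best.get(key);
      if (!current || hit.score > current.score) {
        best.set(key, { ...hit, source: collection });
      }
    }
  }
  // Highest cosine similarity first.
  return [...best.values()].sort((a, b) => b.score - a.score);
}

// Usage with made-up scores:
const merged = mergeResults({
  archai_pilot: [{ id: 1, score: 0.81, payload: { canonical_id: "mv-1" } }],
  archai_met: [
    { id: 7, score: 0.92, payload: { canonical_id: "met-7" } },
    { id: 8, score: 0.6, payload: { canonical_id: "mv-1" } },
  ],
});
console.log(merged.map((h) => h.payload.canonical_id)); // [ 'met-7', 'mv-1' ]
```

Keeping the highest-scoring copy means a duplicate object is ranked by whichever collection describes it best for that query; the `source` tag is what the colour-coding would key on.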

✅ Object-as-Speaker LLM Chat

Each object speaks in first person via llama3, grounded in verified metadata:

  • System prompt built from ALL metadata fields
  • Dynamic institution name per object
  • Hallucination guard: questions the record can't answer get "That's not in my record"
  • Metadata-only fallback when Ollama is offline (no LLM required)
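A minimal sketch of assembling a first-person, metadata-grounded system prompt, assuming objects are flat key/value records. The function name `buildSystemPrompt`, the field names (`title`, `institution`) and the prompt wording are illustrative assumptions; only the refusal phrase comes from this README.

```javascript
// Hypothetical sketch: every non-empty metadata field becomes a line of
// grounding context, and the model is told to refuse anything not listed.
function buildSystemPrompt(object) {
  const institution = object.institution || "this museum";
  const facts = Object.entries(object)
    .filter(([, value]) => value !== null && value !== undefined && String(value).trim() !== "")
    .map(([key, value]) => `- ${key}: ${Array.isArray(value) ? value.join(", ") : value}`)
    .join("\n");
  return [
    `You are "${object.title}", an object in the collection of ${institution}.`,
    "Speak in the first person, as the object itself.",
    "Use ONLY the facts in your record below.",
    `If a question cannot be answered from the record, reply: "That's not in my record."`,
    "",
    "Your record:",
    facts,
  ].join("\n");
}

const prompt = buildSystemPrompt({
  title: "Teapot",
  institution: "Museums Victoria",
  date: "c. 1880",
  medium: "", // empty fields are skipped
  keywords: ["ceramics", "tea"],
});
console.log(prompt);
```

The resulting string would be sent as the `system` message in a request to Ollama's chat endpoint, with the visitor's question as the `user` message.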

✅ Object Detail Panel

  • Full metadata, image, curatorial description
  • Live llama3 chat with question chips
  • Semantically related objects across all collections
  • Source-specific links: "View on The Met →", "View on V&A →"
  • Visitor comment thread — all comments (including flagged) with approve/remove/reply actions for curators

✅ AI-Moderated Visitor Comments

Comments submitted by visitors are AI-screened in real time:

  • Ollama classifies each comment as safe / suspicious / harmful
  • Safe comments visible immediately on the object page
  • Suspicious/harmful comments hidden — sent to curator review queue
  • Human curator has final say — approve or remove
  • Threaded replies supported (staff can respond to visitors)
  • Stored in SQLite — becomes part of the object's collection record
  • Comments included in curator vector collection for semantic search
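The screening flow above maps onto a small routing function. A minimal sketch, assuming the moderation model is asked to reply with a single word (`safe`, `suspicious` or `harmful`); the function names and status values (`visible`, `pending_review`) are hypothetical. Treating any unparseable reply as `suspicious` is a design choice here, so a model failure sends the comment to a curator instead of publishing it.

```javascript
// Hypothetical sketch of comment moderation routing.
const LABELS = ["safe", "suspicious", "harmful"];

// Read the first word of the model's reply; anything unrecognised is
// treated as "suspicious" so a human curator reviews it.
function parseModerationLabel(modelOutput) {
  const firstWord = String(modelOutput).trim().toLowerCase().split(/\s+/)[0] ?? "";
  const word = firstWord.replace(/[^a-z]/g, ""); // strip punctuation like "Safe."
  return LABELS.includes(word) ? word : "suspicious";
}

// Safe comments go live immediately; everything else waits in the review queue.
function routeComment(comment, modelOutput) {
  const label = parseModerationLabel(modelOutput);
  return {
    ...comment,
    moderation: label,
    status: label === "safe" ? "visible" : "pending_review",
  };
}

console.log(routeComment({ id: 1, text: "Lovely glaze!" }, "Safe.").status); // visible
console.log(routeComment({ id: 2, text: "..." }, "I cannot decide").status); // pending_review
```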

✅ Backend Proxy (Public Hosting Safety)

Safe proxy layer for exposing ARCHAI publicly via Cloudflare Tunnel:

  • Rate limiting per IP (15 chat/min, 30 search/min)
  • Prompt injection pattern blocking (regex filter)
  • Safety wrapper prepended to all LLM system prompts
  • Token and prompt length caps (512 tokens, 500 chars)
  • All frontend fetch calls route through qdrantFetch()/ollamaFetch() wrappers
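The rate limits and input checks above could look roughly like this. A minimal sketch, assuming a fixed one-minute window per IP and per route type; the limits (15 chat and 30 search per minute, 500 characters) come from this README, but the regex patterns, function names and window strategy are illustrative assumptions, not the contents of `middleware/rateLimit.js` or `routes/proxy.js`.

```javascript
// Hypothetical sketch: fixed-window per-IP rate limiting plus prompt checks.
const LIMITS = { chat: 15, search: 30 }; // requests per minute
const WINDOW_MS = 60_000;
const MAX_PROMPT_CHARS = 500;

// Illustrative injection patterns only; a real filter needs a broader list.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior|above) instructions/i,
  /you are now/i,
  /system prompt/i,
];

const windows = new Map(); // "kind:ip" -> { start, count }

function allowRequest(ip, kind, now = Date.now()) {
  const key = `${kind}:${ip}`;
  const entry = windows.get(key);
  if (!entry || now - entry.start >= WINDOW_MS) {
    windows.set(key, { start: now, count: 1 }); // start a new window
    return true;
  }
  if (entry.count >= LIMITS[kind]) return false;
  entry.count += 1;
  return true;
}

function checkPrompt(prompt) {
  if (typeof prompt !== "string" || prompt.trim() === "") return { ok: false, reason: "empty" };
  if (prompt.length > MAX_PROMPT_CHARS) return { ok: false, reason: "too_long" };
  if (INJECTION_PATTERNS.some((re) => re.test(prompt))) return { ok: false, reason: "blocked" };
  return { ok: true };
}
```

A fixed window is the simplest option but allows short bursts at window boundaries; a sliding window or token bucket would smooth these out. The in-memory `Map` also resets on restart and is not shared between processes.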

✅ Curator Vector Collection

Enriched archai_curator Qdrant collection combining:

  • All object metadata from all 3 source collections
  • Visitor comments attached to each object
  • Rebuilt on demand via POST /api/proxy/curator/build
  • Semantic search across everything via POST /api/proxy/curator/search
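A minimal sketch of how one object's metadata and comments might be flattened into a single text before embedding into `archai_curator`. The function name, the `objectId`/`status` comment fields, and the choice to include only comments with `status === "visible"` are assumptions for illustration; the README does not say whether hidden or pending comments are embedded.

```javascript
// Hypothetical sketch: build the text embedded for one curator-collection point.
function buildCuratorDocument(object, comments) {
  const metadataLines = Object.entries(object)
    .filter(([, v]) => v !== null && v !== undefined && String(v).trim() !== "")
    .map(([k, v]) => `${k}: ${Array.isArray(v) ? v.join(", ") : v}`);
  // Assumption: only comments that passed moderation are embedded.
  const commentLines = comments
    .filter((c) => c.objectId === object.canonical_id && c.status === "visible")
    .map((c) => `Visitor comment: ${c.text}`);
  return [...metadataLines, ...commentLines].join("\n");
}

const doc = buildCuratorDocument(
  { canonical_id: "mv-1", title: "Teapot", medium: "Stoneware" },
  [
    { objectId: "mv-1", status: "visible", text: "Great glaze" },
    { objectId: "mv-1", status: "pending_review", text: "spam" },
    { objectId: "met-7", status: "visible", text: "Other object" },
  ]
);
console.log(doc);
```

The resulting text would then be embedded with nomic-embed-text through Ollama and upserted into Qdrant alongside the original payload.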

✅ NFC Visitor Pages (Mobile)

194 standalone HTML pages from all 3 collections:

  • Object image, metadata, description, LLM chat over LAN or via proxy
  • Share: native iOS sheet, email, copy link, X/Twitter
  • Comment submission with AI moderation (falls back to localStorage when offline)
  • Related objects with cross-collection links
  • Captive portal for exhibition WiFi
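One way a page generator could fill the mobile template is placeholder substitution with HTML escaping. A sketch, assuming `{{field}}` placeholders; the placeholder syntax and the names `renderNfcPage` and `escapeHtml` are hypothetical, not taken from `generate-nfc-pages.js` or `nfc-visitor-template.html`.

```javascript
// Hypothetical sketch: fill {{placeholders}} in an HTML template, escaping
// values so object metadata cannot inject markup into the visitor page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderNfcPage(template, object) {
  // Unknown placeholders become empty strings rather than leaking "{{...}}".
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, key) =>
    object[key] === undefined || object[key] === null ? "" : escapeHtml(object[key])
  );
}

const html = renderNfcPage(
  "<h1>{{title}}</h1><p>{{institution}}</p><p>{{missing}}</p>",
  { title: "Cup & saucer <Royal Doulton>", institution: "V&A" }
);
console.log(html);
// <h1>Cup &amp; saucer &lt;Royal Doulton&gt;</h1><p>V&amp;A</p><p></p>
```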

✅ NFC Management Panel

  • Tags from objects with images, mixed across MV/Met/V&A
  • 3-column layout: tag list → editor → phone preview
  • Search, filter, publish/unpublish

✅ Role Switcher

Role         | Access
Admin        | All tabs
Curator      | Curator, Nodel, NFC, Vocab, Visitor, FAMTEC
Collections  | Curator, NFC, Vocab, Visitor, FAMTEC
Technician   | Nodel, Visitor, FAMTEC
Volunteer    | Curator, NFC, Visitor, FAMTEC
Visitor      | Visitor only
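The role table can be encoded as a lookup for the tab switcher. A minimal sketch; the tab names mirror the table, but `ROLE_TABS` and `canAccess` are hypothetical names, and Admin is modelled as a wildcard so "All tabs" also covers tabs not listed here.

```javascript
// Hypothetical encoding of the role table. "*" means every tab.
const ROLE_TABS = {
  Admin: ["*"],
  Curator: ["Curator", "Nodel", "NFC", "Vocab", "Visitor", "FAMTEC"],
  Collections: ["Curator", "NFC", "Vocab", "Visitor", "FAMTEC"],
  Technician: ["Nodel", "Visitor", "FAMTEC"],
  Volunteer: ["Curator", "NFC", "Visitor", "FAMTEC"],
  Visitor: ["Visitor"],
};

function canAccess(role, tab) {
  const tabs = ROLE_TABS[role];
  if (!tabs) return false; // unknown roles get no tabs
  return tabs.includes("*") || tabs.includes(tab);
}

console.log(canAccess("Technician", "Vocab")); // false
console.log(canAccess("Volunteer", "NFC")); // true
```

A client-side check like this only controls which tabs are shown; anything that must be protected (such as approving or removing comments) would also need a server-side check.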

✅ Curator Toolbar

  • Select All / Export CSV / Batch Tag — fully wired
  • CSV export with all metadata fields, scoped to selection or full collection
  • Batch tagging applies keywords to selected objects and rebuilds vocab index
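CSV export needs quoting for fields that contain commas, quotes or line breaks, as described in RFC 4180. A sketch, assuming objects are flat records; deriving the columns from the union of keys and the names `toCsv`/`csvField` are assumptions, not the toolbar's actual code.

```javascript
// Hypothetical sketch of CSV export with RFC 4180-style quoting.
function csvField(value) {
  if (value === null || value === undefined) return "";
  const text = Array.isArray(value) ? value.join("; ") : String(value);
  // Quote fields containing a comma, quote or line break; double inner quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(objects) {
  // Columns are the union of all keys, in first-seen order.
  const columns = [];
  for (const obj of objects) {
    for (const key of Object.keys(obj)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const header = columns.map(csvField).join(",");
  const rows = objects.map((obj) => columns.map((c) => csvField(obj[c])).join(","));
  return [header, ...rows].join("\r\n");
}

const csv = toCsv([
  { id: "mv-1", title: 'Teapot, "Brown Betty"' },
  { id: "met-7", title: "Vase", keywords: ["ceramics", "blue"] },
]);
console.log(csv);
```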

✅ FAMTEC Exchange

  • Test space for interaction design, workflow feel, and interface prototyping inside ARCHAI
  • Placeholder institution names are used to simulate exchange activity and help evaluate the app experience
  • Feed includes loan, rental, skills, and crew-availability scenarios
  • Post listings (hardware, skills, requests), send enquiries, view details
  • Enquiries route to institution chat threads where available
  • Chip filters and institution chat are functional prototype interactions
  • This is not the final FAMTEC platform: production development will be handled separately by FAMTEC outside the PhD work, with potential later integration into ARCHAI once developed

✅ Nodel Panel

  • Gallery cards with status indicators
  • Node table, fault log, schedule
  • Refresh status polling, emergency stop with confirmation
  • Direct links to Nodel web UI and Directus admin

✅ Vocabulary & Thesaurus (CHIN-aligned)

  • Live vocabulary index built from Qdrant payloads across all 3 collections (9 facets: discipline, category, object type, classifications, collecting areas, keywords, culture, period, medium)
  • CHIN/AAT reference terms — 26 curated terms from Getty AAT with scope notes, broader/narrower hierarchies, and AAT IDs
  • DOCAM Glossaurus — media art preservation terminology (emulation, migration, variable media, documentation strategies)
  • Nomenclature for Museum Cataloging — Parks Canada/CHIN object naming and classification
  • CHIN Discipline Authority List (2006) — bilingual EN/FR discipline headings
  • Term search across all sources with scope notes, provider badges, and language tags
  • Term detail panel: path, scope note, broader/narrower terms, related terms, collection usage with example objects
  • Apply to Search (jumps to Curator with search), Add Local Mapping (creates institution-specific terms)
  • Indigenous protocol layer with governance notice
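The live vocabulary index can be built by counting terms per facet across point payloads. A minimal sketch, assuming payload keys named after the nine facets (for example `object_type`, `collecting_areas`) holding strings or arrays; those key names are guesses, not the harvesters' actual schema. Payloads would come from Qdrant's scroll endpoint, which is not shown here.

```javascript
// Hypothetical payload keys for the nine facets listed above.
const FACETS = [
  "discipline", "category", "object_type", "classifications",
  "collecting_areas", "keywords", "culture", "period", "medium",
];

// Returns { facet: Map(term -> { count, examples: [point ids] }) }.
function buildVocabIndex(points, maxExamples = 3) {
  const index = Object.fromEntries(FACETS.map((f) => [f, new Map()]));
  for (const { id, payload = {} } of points) {
    for (const facet of FACETS) {
      const raw = payload[facet];
      const values = Array.isArray(raw) ? raw : raw ? [raw] : [];
      for (const value of values) {
        // Lower-cased so "Porcelain" and "porcelain" count as one term.
        const term = String(value).trim().toLowerCase();
        if (!term) continue;
        const entry = index[facet].get(term) ?? { count: 0, examples: [] };
        entry.count += 1;
        if (entry.examples.length < maxExamples) entry.examples.push(id);
        index[facet].set(term, entry);
      }
    }
  }
  return index;
}

const vocab = buildVocabIndex([
  { id: 1, payload: { medium: "Porcelain", keywords: ["tea", "Ceramics"] } },
  { id: 2, payload: { medium: "porcelain" } },
]);
console.log(vocab.medium.get("porcelain")); // { count: 2, examples: [ 1, 2 ] }
```

The `examples` list is what a term detail panel could use to show collection usage with example objects.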

Screenshots

Desktop views:

[Images: Curator collections search · Exhibitions live dashboard · NFC management with visitor preview · Vocabulary and thesaurus tools · Visitor view object page · FAMTEC exchange · Object detail view · Curator object conversation]

Mobile views:

[Images: Mobile NFC index · Mobile object hero · Mobile object detail · Mobile object chat · Mobile object response · Mobile related objects and footer]


What's Not Working Yet

🔲 Cross-Collection LLM Intelligence (RAG)

The LLM currently sees only one object's metadata. This needs RAG: embed the user's question → search Qdrant → inject related objects into the LLM context → synthesise connections. Curators would get full cross-collection access; visitors would get bounded single-object responses.
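A sketch of the planned context-assembly step, assuming retrieved hits shaped like Qdrant results and a simple role flag. The function and field names are hypothetical; the sketch only illustrates the intended split, where visitors see the current object and curators also see related objects.

```javascript
// Hypothetical sketch of RAG context assembly for the planned feature.
function buildRagContext({ role, currentObject, relatedHits, maxRelated = 5 }) {
  const describe = (obj) =>
    `${obj.title ?? "Untitled"} (${obj.institution ?? "unknown institution"}` +
    `${obj.date ? ", " + obj.date : ""})`;
  const sections = [`CURRENT OBJECT: ${describe(currentObject)}`];
  if (role === "curator") {
    // Curators: add top related objects, skipping the current object itself.
    const related = relatedHits
      .filter((hit) => hit.payload?.canonical_id !== currentObject.canonical_id)
      .slice(0, maxRelated)
      .map((hit, i) => `${i + 1}. ${describe(hit.payload)} [score ${hit.score.toFixed(2)}]`);
    if (related.length > 0) sections.push("RELATED OBJECTS:\n" + related.join("\n"));
  }
  return sections.join("\n\n");
}

const current = { canonical_id: "mv-1", title: "Teapot", institution: "Museums Victoria", date: "c. 1880" };
const hits = [
  { score: 0.9, payload: current },
  { score: 0.8123, payload: { canonical_id: "met-7", title: "Vase", institution: "The Met", date: "1850" } },
];
console.log(buildRagContext({ role: "curator", currentObject: current, relatedHits: hits }));
```

The returned text would be appended to the object's system prompt before calling llama3; keeping the role check in this one function makes the visitor boundary easy to audit.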

🔲 LLM Image Analysis

Planned: use llava to extract colours, text and depicted objects from images and store them as searchable metadata.

🔲 External Vocabulary APIs

AAT, LCSH, TGN, and ULAN are listed but inactive — currently using curated reference terms rather than live API lookups. Getty AAT LOD endpoint integration is architecturally ready.

🔲 FAMTEC Persistence

Current in-app FAMTEC Exchange uses prototype data and in-memory arrays only. The production FAMTEC Exchange platform will be developed separately by FAMTEC outside the PhD work, with potential later integration into ARCHAI once developed.

🔲 Directus Integration

Directus is health-checked only. NFC saves attempt a backend sync but fall back to local confirmation.

🔲 Nodel API

The Nodel panel uses static prototype data and needs a WebSocket connection to a real Nodel instance. UI links and emergency stop are wired.

🔲 Harvester Improvements

Planned: date extraction from titles, better Met filtering and incremental harvesting. The Run Harvesters button currently verifies collection counts and triggers a reload.
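Date extraction from titles could start with a pair of regular expressions. A minimal sketch, assuming titles such as "Teapot, 1850-1860", "Bowl, 1850-60" or "Vase, c. 1720"; the function name and the `{ start, end, circa }` shape are hypothetical. It only recognises years 1000–2099 and will misread other four-digit numbers (such as object numbers) as dates.

```javascript
// Hypothetical sketch: pull a year or year range out of a title string.
const CIRCA_BEFORE = /\b(?:c\.|ca\.|circa)\s*$/i;

function extractDateFromTitle(title) {
  const text = String(title);
  // Range: four-digit start, then "-", "–" or "to", then a 4- or 2-digit end.
  const range = text.match(/\b(1\d{3}|20\d{2})\s*(?:-|–|to)\s*(\d{4}|\d{2})\b/);
  if (range) {
    const start = Number(range[1]);
    // Expand short ends such as "1850-60" to 1860.
    const end = range[2].length === 2
      ? Math.floor(start / 100) * 100 + Number(range[2])
      : Number(range[2]);
    if (end >= start) {
      return { start, end, circa: CIRCA_BEFORE.test(text.slice(0, range.index)) };
    }
  }
  const single = text.match(/\b(1\d{3}|20\d{2})\b/);
  if (single) {
    const year = Number(single[1]);
    return { start: year, end: year, circa: CIRCA_BEFORE.test(text.slice(0, single.index)) };
  }
  return null; // no recognisable date
}

console.log(extractDateFromTitle("Vase, c. 1720")); // { start: 1720, end: 1720, circa: true }
```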


Architecture

┌──────────────────────────────────────────────────────────┐
│                    ARCHAI Frontend                      │
│                 (ARCHAI_v10_8.html · browser)           │
│                                                         │
│  Search ──→ Ollama embed ──→ Qdrant (4 collections)     │
│  Chat   ──→ Ollama llama3 ──→ grounded response         │
│  NFC    ──→ Ollama llama3 ──→ chat over LAN / proxy     │
│  Sort   ──→ client-side on loaded objects               │
│  Comments ──→ Backend API ──→ AI moderation ──→ SQLite  │
└────────┬──────────────┬──────────────┬──────────────────┘
         │              │              │
    localhost:6333  localhost:11434  localhost:8787
      Qdrant          Ollama        Backend API
                                   ├── Safe proxy (rate limit + injection block)
                                   ├── Comments (AI moderation → SQLite)
                                   ├── Curator vectors (build + search)
                                   └── Directus bridge (optional)

  Public access (Cloudflare Tunnel):
  Visitor phone ──→ tunnel ──→ Backend proxy ──→ Ollama/Qdrant
                                    └──→ Comments API (AI screened)

Project Structure

archai/
├── ARCHAI_v10_8.html              ← Main frontend (single-file app)
├── README.md                      ← This file
├── ARCHAI_OPERATIONS_GUIDE.md     ← Full ops guide (startup, testing, APIs, adding objects)
├── REMOTE_TESTING_GUIDE.md        ← Tailscale setup for iPad/iPhone testing
├── start-archai.sh                ← One-command startup + health checks
├── backend-archai/
│   ├── src/
│   │   ├── server.js              ← Express entry point
│   │   ├── data/db.js             ← SQLite database (comments)
│   │   ├── middleware/rateLimit.js ← Rate limiter
│   │   ├── routes/
│   │   │   ├── proxy.js           ← Safe Qdrant/Ollama/curator proxy
│   │   │   ├── comments.js        ← AI-moderated threaded comments
│   │   │   └── ...                ← Other route modules
│   │   └── services/
│   │       ├── moderation.js      ← Ollama comment screening
│   │       └── curator-vectors.js ← Curator collection builder
│   ├── scripts/
│   │   ├── met-harvester.js       ← Met NYC → Qdrant
│   │   └── va-harvester.js        ← V&A London → Qdrant
│   └── data/archai.db             ← SQLite (created at runtime)
├── nfc-pages/
│   ├── generate-nfc-pages.js      ← All collections → HTML per tag
│   ├── nfc-visitor-template.html  ← Mobile template
│   ├── captive-portal.html
│   └── v/                         ← Generated pages (~194, not in git)
├── docs/
│   └── ARCHAI_ISEA2026_Rob_Graham.pdf
└── docker-compose.yml

Quick Start

cd ~/Desktop/APPS/ARCHAI\ APP
./start-archai.sh

Starts Docker, Qdrant, Ollama (with LAN access and CORS enabled), the backend API and the frontend server. Runs 7 health checks, shows loaded models, Qdrant collections and comment count, and prints all URLs.

Main app: http://localhost:8000/ARCHAI_v10_8.html
NFC index: http://localhost:8000/nfc-pages/v/index.html
Backend API: http://localhost:8787/api/health
Tailscale: http://100.109.26.39:8000/ARCHAI_v10_8.html

See ARCHAI_OPERATIONS_GUIDE.md for full setup, testing, and API reference.
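A sketch of three of the startup checks written as a Node script (Node 18+ for the global `fetch`). It calls Qdrant's `GET /collections` and Ollama's `GET /api/tags`, plus the backend's `/api/health` URL listed above. The check list, function names and report format are hypothetical; this is not what `start-archai.sh` actually runs.

```javascript
// Hypothetical Node 18+ sketch of three startup health checks.
const CHECKS = [
  {
    name: "Qdrant",
    url: "http://localhost:6333/collections",
    // Qdrant responds with { result: { collections: [{ name }] } }.
    detail: async (res) => (await res.json()).result.collections.map((c) => c.name).join(", "),
  },
  {
    name: "Ollama",
    url: "http://localhost:11434/api/tags",
    // Ollama responds with { models: [{ name }] }.
    detail: async (res) => (await res.json()).models.map((m) => m.name).join(", "),
  },
  { name: "Backend", url: "http://localhost:8787/api/health", detail: async () => "ok" },
];

async function runCheck(check, timeoutMs = 3000) {
  try {
    const res = await fetch(check.url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!res.ok) return { name: check.name, ok: false, detail: `HTTP ${res.status}` };
    return { name: check.name, ok: true, detail: await check.detail(res) };
  } catch (err) {
    // Connection refused, timeout or unexpected response body.
    return { name: check.name, ok: false, detail: err.message };
  }
}

function formatReport(results) {
  return results.map((r) => `${r.ok ? "✅" : "❌"} ${r.name}: ${r.detail}`).join("\n");
}

async function main() {
  const results = await Promise.all(CHECKS.map((c) => runCheck(c)));
  console.log(formatReport(results));
  return results.every((r) => r.ok);
}

main();
```

Running the checks in parallel keeps the total wait close to the slowest service rather than the sum of all timeouts.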


Hardware

Mac Studio M2 Max · 64GB · 1TB. Base institutional deployment: ~$3,500–5,000 USD one-time. No subscriptions, no cloud dependency.


Version History

Version | Changes
v6      | Initial prototype, mock objects
v7      | Role switcher, FAMTEC, Nodel, NFC, vocabulary
v10.4   | MV-only, live Qdrant + Ollama, LLM chat
v10.5   | Restored all panels, NFC page generator
v10.6   | Multi-collection (MV+Met+V&A), sort/filter, dedup, harvesters, NFC share+comments, 200 pages, dynamic institutions
v10.7   | Live CHIN-aligned thesaurus (AAT+DOCAM+Nomenclature+CHIN Disciplines), all buttons wired, responsive thumbnail scaling, vocab search with scope notes and provider badges
v10.8   | Backend proxy for safe public hosting (rate limiting, prompt injection blocking), AI-moderated threaded comments (Ollama screening → curator review queue), curator vector collection (all metadata + comments searchable), SQLite persistence, NFC pages wired to backend API, object detail comment thread with approve/remove/reply, startup script with health checks, operations guide

Rob Graham · FAMTEC / RMIT · rob@fineartmedia.tech
GitHub: github.com/rob-e-graham/archai


ARCHAI™ is a trademark of Rob Graham / FAMTEC (Fine Art Media Tech). Use of the source code under MPL-2.0 does not grant trademark rights. See NOTICE for details.
