143 changes: 101 additions & 42 deletions .claude/CLAUDE.md
@@ -11,73 +11,132 @@ The following files in `.machine_readable/` contain structured project metadata:

---

# CLAUDE.md - AI Assistant Instructions
# CLAUDE.md - Bofig Project Instructions

## Current Phase: Phase 2 — Lithoglyph Migration

**ADR-006 (2026-03-13):** Lithoglyph replaces ArangoDB as primary data store.

### Pipeline
```
Docudactyl (extraction) → Lithoglyph (storage + provenance) ← bofig (queries + visualisation)
```

### Migration Status
- Evidence reads/writes: **migrating** from ArangoDB → Lithoglyph GQL
- Entities: **migrating** from ArangoDB → Lithoglyph
- Claims: **pending** migration to Lithoglyph
- Relationships (graph edges): **ArangoDB** (kept until Lithoglyph Factor GQL gets graph traversals)
- User auth: **PostgreSQL** (permanent, phx.gen.auth)

### Critical Architecture Rules
1. **New domain data features MUST target Lithoglyph**, not ArangoDB
2. **ArangoDB is deprecated for domain data** — only `relationships` edge collection remains
3. **PostgreSQL is for user auth ONLY** — never store domain data there
4. **PROMPT scores** have exactly 6 dimensions (Provenance, Replicability, Objective, Methodology, Publication, Transparency)
5. **Audience types** are: researcher, policymaker, skeptic, activist, affected_person, journalist
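
Rules 4 and 5 lend themselves to a mechanical check. The following is a minimal sketch, assuming a hypothetical `PromptRules` module and atom keys for dimensions and audience types; neither the module name nor the key format is taken from the project.

```elixir
defmodule PromptRules do
  # Hypothetical helper for illustration; not part of the bofig codebase.
  # The six PROMPT dimensions and six audience types come from the rules above.
  @dimensions ~w(provenance replicability objective methodology publication transparency)a
  @audiences ~w(researcher policymaker skeptic activist affected_person journalist)a

  # True only when the map has exactly the six PROMPT dimensions as keys.
  def valid_scores?(scores) when is_map(scores) do
    scores |> Map.keys() |> Enum.sort() == Enum.sort(@dimensions)
  end

  def valid_scores?(_other), do: false

  # True only for one of the six recognised audience types.
  def valid_audience?(audience), do: audience in @audiences
end
```

A score map with a missing or extra dimension is rejected, which is one way to keep the "exactly 6" rule from drifting.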

## Build & Test

```bash
# Compile (0 warnings required)
mix compile --warnings-as-errors

# Tests (require PostgreSQL + ArangoDB running)
mix test # 257 tests

# NER extractor tests only (no DB needed)
MIX_ENV=test mix run --no-start -e '
ExUnit.start(autorun: false)
Code.require_file("test/evidence_graph/lithoglyph/ner_extractor_test.exs")
ExUnit.run()
'

# Credo lint
mix credo --strict

# Start dev server
mix phx.server
```

## Key Modules

| Module | Purpose | DB |
|--------|---------|-----|
| `EvidenceGraph.Lithoglyph.Client` | Req HTTP client for Lithoglyph API | Lithoglyph |
| `EvidenceGraph.Lithoglyph.Importer` | GenServer batch import with NER | Lithoglyph → ArangoDB |
| `EvidenceGraph.Lithoglyph.NERExtractor` | Regex NER extraction from content | None (pure) |
| `EvidenceGraph.Entities` | Entity resolution, fuzzy match, merge | ArangoDB (migrating) |
| `EvidenceGraph.Claims` | Claim CRUD + PROMPT scoring | ArangoDB (migrating) |
| `EvidenceGraph.Evidence` | Evidence CRUD + metadata | ArangoDB (migrating) |
| `EvidenceGraph.Relationships` | Graph edges, traversals, contradictions | ArangoDB (kept Phase 2) |
| `EvidenceGraph.ArangoDB` | ArangoDB driver wrapper | ArangoDB |

## ArangoDB Query Patterns

```elixir
# Read query (no write transaction)
ArangoDB.query_read(aql, %{bind_var: value})

# Write query (transactional)
ArangoDB.query(aql, %{bind_var: value})

# Insert document
ArangoDB.insert("collection_name", %{field: value})

# Edge document format
%{
  _from: "evidence/evidence_123",
  _to: "entities/entity_456",
  relationship_type: "mentions",
  weight: 1.0,
  confidence: 0.9
}
```
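
The edge document format above can be assembled by a small builder. This is a sketch only: `EdgeSketch.build/4` is a hypothetical helper, and the default `weight` and `confidence` of `1.0` are assumptions rather than project conventions.

```elixir
defmodule EdgeSketch do
  # Hypothetical helper for illustration; field names mirror the edge
  # document format shown above. Defaults are assumptions.
  def build({from_coll, from_key}, {to_coll, to_key}, type, opts \\ []) do
    %{
      # ArangoDB document handles take the form "collection/key".
      _from: "#{from_coll}/#{from_key}",
      _to: "#{to_coll}/#{to_key}",
      relationship_type: type,
      weight: Keyword.get(opts, :weight, 1.0),
      confidence: Keyword.get(opts, :confidence, 1.0)
    }
  end
end

edge =
  EdgeSketch.build(
    {"evidence", "evidence_123"},
    {"entities", "entity_456"},
    "mentions",
    confidence: 0.9
  )
```

The resulting map could then be passed to `ArangoDB.insert/2` with the `relationships` collection, as in the insert pattern above.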

## Lithoglyph Client Patterns

```elixir
# Query evidence from Lithoglyph
LithClient.query("SELECT * FROM evidence WHERE investigation_id = @id", %{id: inv_id})

# Insert with provenance (mandatory)
LithClient.insert("evidence", document, actor: "user:123", rationale: "Import from Docudactyl")

# Dedup check
LithClient.exists_by_hash?("evidence", sha256_hash)
```
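
The dedup check and the provenance-carrying insert are typically used together. The sketch below shows that flow against a hypothetical in-memory `FakeClient`, so that it runs without a Lithoglyph server; the real `LithClient` functions take no store argument, and the document shape (`content`, `sha256`) is an assumption.

```elixir
defmodule DedupSketch do
  # Hypothetical in-memory stand-in for the Lithoglyph client, for
  # illustration only. It keys documents by {collection, sha256}.
  defmodule FakeClient do
    def start, do: Agent.start_link(fn -> %{} end)

    def exists_by_hash?(store, collection, hash) do
      Agent.get(store, &Map.has_key?(&1, {collection, hash}))
    end

    # Provenance is mandatory: fetch!/2 raises if :actor or :rationale is missing.
    def insert(store, collection, doc, opts) do
      actor = Keyword.fetch!(opts, :actor)
      rationale = Keyword.fetch!(opts, :rationale)
      Agent.update(store, &Map.put(&1, {collection, doc.sha256}, {doc, actor, rationale}))
    end
  end

  # Lowercase hex SHA-256 digest of the content.
  def sha256(content) do
    :crypto.hash(:sha256, content) |> Base.encode16(case: :lower)
  end

  # Insert the content once; a second call with the same bytes is skipped.
  def import_once(store, content, opts) do
    hash = sha256(content)

    if FakeClient.exists_by_hash?(store, "evidence", hash) do
      {:skipped, hash}
    else
      :ok = FakeClient.insert(store, "evidence", %{content: content, sha256: hash}, opts)
      {:inserted, hash}
    end
  end
end
```

Hashing before the existence check keeps the import idempotent: re-running a Docudactyl batch would skip evidence that is already stored rather than duplicate it.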

## Language Policy (Hyperpolymath Standard)

### ALLOWED Languages & Tools

| Language/Tool | Use Case | Notes |
|---------------|----------|-------|
| **Elixir** | This project's primary language | Phoenix, LiveView, Absinthe |
| **ReScript** | Primary application code | Compiles to JS, type-safe |
| **Deno** | Runtime & package management | Replaces Node/npm/bun |
| **Rust** | Performance-critical, systems, WASM | Preferred for CLI tools |
| **Tauri 2.0+** | Mobile apps (iOS/Android) | Rust backend + web UI |
| **Dioxus** | Mobile apps (native UI) | Pure Rust, React-like |
| **Gleam** | Backend services | Runs on BEAM or compiles to JS |
| **Bash/POSIX Shell** | Scripts, automation | Keep minimal |
| **JavaScript** | Only where ReScript cannot | MCP protocol glue, Deno APIs |
| **Nickel** | Configuration language | For complex configs |
| **JavaScript** | Only where ReScript cannot | D3.js hooks in this project |
| **Guile Scheme** | State/meta files | STATE.scm, META.scm, ECOSYSTEM.scm |
| **Julia** | Batch scripts, data processing | Per RSR |
| **OCaml** | AffineScript compiler | Language-specific |
| **Ada** | Safety-critical systems | Where required |

### BANNED - Do Not Use

| Banned | Replacement |
|--------|-------------|
| TypeScript | ReScript |
| Node.js | Deno |
| npm | Deno |
| Bun | Deno |
| pnpm/yarn | Deno |
| npm/Bun/pnpm/yarn | Deno |
| Go | Rust |
| Python | Julia/Rust/ReScript |
| Java/Kotlin | Rust/Tauri/Dioxus |
| Swift | Tauri/Dioxus |
| React Native | Tauri/Dioxus |
| Flutter/Dart | Tauri/Dioxus |

### Mobile Development

**No exceptions for Kotlin/Swift** - use Rust-first approach:

1. **Tauri 2.0+** - Web UI (ReScript) + Rust backend, MIT/Apache-2.0
2. **Dioxus** - Pure Rust native UI, MIT/Apache-2.0

Both are FOSS with independent governance (no Big Tech).

### Enforcement Rules

1. **No new TypeScript files** - Convert existing TS to ReScript
2. **No package.json for runtime deps** - Use deno.json imports
3. **No node_modules in production** - Deno caches deps automatically
4. **No Go code** - Use Rust instead
5. **No Python anywhere** - Use Julia for data/batch, Rust for systems, ReScript for apps
6. **No Kotlin/Swift for mobile** - Use Tauri 2.0+ or Dioxus

### Package Management

- **Primary**: Guix (guix.scm)
- **Fallback**: Nix (flake.nix)
- **JS deps**: Deno (deno.json imports)
| Java/Kotlin | Rust |

### Security Requirements

- No MD5/SHA1 for security (use SHA256+)
- HTTPS only (no HTTP URLs)
- No hardcoded secrets
- SHA-pinned dependencies
- SPDX license headers on all files

- SHA-pinned dependencies in workflows
- SPDX license headers on all files (`PMPL-1.0-or-later`)
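
Two of these requirements can be expressed as small guards. This is a minimal sketch, assuming a hypothetical `SecurityChecks` module; the allowed-digest list is an illustrative reading of "SHA256+", not a project definition.

```elixir
defmodule SecurityChecks do
  # Hypothetical helper for illustration; not part of the bofig codebase.
  # Assumed interpretation of "SHA256+": SHA-2 digests of 256 bits or more.
  @allowed_digests [:sha256, :sha384, :sha512]

  # True only for URLs with the https scheme and a non-empty host.
  def https_url?(url) when is_binary(url) do
    match?(
      %URI{scheme: "https", host: host} when is_binary(host) and host != "",
      URI.parse(url)
    )
  end

  # Lowercase hex digest; :md5 and :sha (SHA-1) match no clause and raise.
  def digest(algorithm, data) when algorithm in @allowed_digests do
    :crypto.hash(algorithm, data) |> Base.encode16(case: :lower)
  end
end
```

Restricting the algorithm in the function head means a call with `:md5` fails fast with a `FunctionClauseError` instead of silently producing a weak hash.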
19 changes: 19 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,19 @@
# SPDX-License-Identifier: PMPL-1.0-or-later
# CODEOWNERS - Evidence Graph for Investigative Journalism (bofig)

# Default owner for all files
* @hyperpolymath

# Core domain logic
/lib/evidence_graph/ @hyperpolymath

# Lithoglyph integration (Phase 2)
/lib/evidence_graph/lithoglyph/ @hyperpolymath

# Security-sensitive files
/config/ @hyperpolymath
/.github/workflows/ @hyperpolymath
/SECURITY.md @hyperpolymath

# Machine-readable state
/.machine_readable/ @hyperpolymath
43 changes: 30 additions & 13 deletions .machine_readable/ECOSYSTEM.scm
@@ -1,10 +1,10 @@
; SPDX-License-Identifier: PMPL-1.0-or-later
; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <jonathan.jewell@open.ac.uk>
; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>

(ecosystem
(metadata
(version "1.0.0")
(last-updated "2026-02-21")
(version "1.1.0")
(last-updated "2026-03-13")
(format "ECOSYSTEM.scm v1"))

(project
@@ -17,18 +17,35 @@
(position-in-ecosystem
(domain "investigative-journalism-tools")
(novelty "First system combining PROMPT scoring, boundary objects, and i-docs navigation")
(academic-context "PhD thesis: practical infrastructure for pragmatic epistemology"))
(academic-context "PhD thesis: practical infrastructure for pragmatic epistemology")
(pipeline-position "Docudactyl (extraction) -> Lithoglyph (storage+provenance) -> bofig (navigation+visualisation)"))

(related-projects
(project "formdb-debugger" (relationship "sibling-standard"))
(project "formbase" (relationship "sibling-standard"))
(project "hypothesis" (relationship "inspiration") (url "https://hypothes.is/"))
(project "zotero" (relationship "integration-target") (url "https://www.zotero.org/")))
(project "lithoglyph"
(relationship "primary-dependency")
(role "Evidence store, provenance layer, GQL/GQL-DT query engine")
(notes "Phase 2: migrating all domain data from ArangoDB to Lithoglyph"))
(project "docudactyl"
(relationship "upstream-dependency")
(role "Multi-format HPC document extraction, feeds evidence into Lithoglyph"))
(project "verisimdb"
(relationship "sibling-standard")
(role "Octad database with VQL, shares GQL patterns with Lithoglyph"))
(project "zotero"
(relationship "integration-target")
(url "https://www.zotero.org/")
(role "Reference management, two-way sync for evidence import/export"))
(project "hypothesis"
(relationship "inspiration")
(url "https://hypothes.is/")
(role "Web annotation model")))

(technology-stack
(runtime "BEAM/OTP 26+")
(language "Elixir 1.16+")
(framework "Phoenix 1.7+ with LiveView")
(database "ArangoDB 3.11+")
(api "Absinthe GraphQL")
(runtime "BEAM/OTP 27")
(language "Elixir 1.18+")
(framework "Phoenix 1.8+ with LiveView")
(evidence-store "Lithoglyph (GQL/GQL-DT, Phase 2+)")
(graph-db "ArangoDB 3.11+ (relationships only, Phase 2; removed Phase 3)")
(auth-db "PostgreSQL 16")
(api "Absinthe GraphQL + REST")
(visualization "D3.js v7")))
24 changes: 17 additions & 7 deletions .machine_readable/META.scm
@@ -1,18 +1,19 @@
; SPDX-License-Identifier: PMPL-1.0-or-later
; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <jonathan.jewell@open.ac.uk>
; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>

(meta
(metadata
(version "1.0.0")
(last-updated "2026-02-21")
(version "1.1.0")
(last-updated "2026-03-13")
(format "META.scm v1"))

(architecture-decisions
(adr "001"
(title "ArangoDB for primary data store")
(status "accepted")
(status "superseded")
(superseded-by "006")
(decision "Use ArangoDB 3.11+ with Arangox Elixir driver via MintClient")
(rationale "Production-proven, multi-model (document + graph), managed hosting available"))
(rationale "Production-proven, multi-model (document + graph), managed hosting available. Superseded: Lithoglyph now provides these capabilities with additional provenance and type safety."))

(adr "002"
(title "Phoenix LiveView over React SPA")
@@ -36,9 +37,18 @@
(title "Ecto without SQL for domain models")
(status "accepted")
(decision "Use Ecto schemas and changesets for validation only, not persistence")
(rationale "Leverage Ecto's validation without coupling to SQL")))
(rationale "Leverage Ecto's validation without coupling to SQL"))

(adr "006"
(title "Lithoglyph replaces ArangoDB as primary data store")
(status "accepted")
(date "2026-03-13")
(supersedes "001")
(decision "Migrate domain data (evidence, claims, entities) from ArangoDB to Lithoglyph. ArangoDB retained for graph edges only during Phase 2, fully removed in Phase 3.")
(rationale "Eliminates data duplication (Docudactyl→Lithoglyph→ArangoDB was a copy pipeline). Lithoglyph provides mandatory provenance, WAL audit trail, GQL-DT dependent types for PROMPT scores, and compile-time verification. ArangoDB was correct for Phase 1 prototyping but duplicates what Lithoglyph already stores.")))

(design-rationale
(principle "Infrastructure for pragmatic epistemology")
(principle "Navigation over narration")
(principle "Coordination without consensus")))
(principle "Coordination without consensus")
(principle "Query the source, don't copy the data")))