
new: first ARK version #7

Merged
gabestein merged 11 commits into main from gs/ark-first-implementation
May 5, 2026


@gabestein gabestein commented May 4, 2026

ARK Server

Context

Underlay needs persistent, citeable identifiers for collections and records, which is standard academic/archival infrastructure. We're implementing ARK (Archival Resource Key) identifiers: free, open persistent identifiers that support inflections (metadata queries via ?info and ?json) and resolve through standard HTTP redirects.

The implementation follows ARK conventions and best practices by giving each org a unique "shoulder" (ul{counter}{digit}), minting opaque betanumeric IDs for collections, and resolving ARK URLs to collection overviews, specific versions, or individual records. If enabled, record-level ARKs redirect to a URL field within the record's data.

In this way, Underlay can be used not just for data storage, but also as a persistent identifier and metadata retrieval service for both collections and individual records they reference on remote servers.


Architecture Overview

URL format: https://underlay.org/ark:NAAN/ul{counter}{digit}{collectionHash}
Examples:

  • Collection: ark:12345/ulb0xkm3nq8p
  • Version: ark:12345/ulb0xkm3nq8p.v2
  • Record: ark:12345/ulb0xkm3nq8p/Person/author-001
  • Record + version: ark:12345/ulb0xkm3nq8p.v2/Person/author-001

Parsing shoulder from name: Shoulders use the ARK "primordinal" convention, meaning they end at the first digit. Since shoulder = ul{consonants}{digit}, parsing scans for the first digit after ul. E.g. ulb0xkm3nq8p → shoulder=ulb0, collectionArkId=xkm3nq8p. To ensure unambiguous parsing, collection hash characters must start with a consonant (guaranteed by the betanumeric encoding: if the first character is a digit, prepend a consonant pad character).

Betanumeric alphabet: bcdfghjkmnpqrstvwxz0123456789 (the 19 lowercase consonants excluding 'l', with 'y' treated as a vowel, plus the 10 digits: 29 chars)
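A minimal sketch of the primordinal split and the full path parse, assuming the shoulder counter uses only the 19 betanumeric consonants, the version suffix is `.v` followed by decimal digits, and everything after the record type is the record ID. The return shape mirrors `parseArkPath` from Step 2, but this is illustrative, not the merged implementation.

```typescript
// Sketch of parseArkPath: splits e.g. "ulb0xkm3nq8p.v2/Person/author-001".
export interface ParsedArk {
  shoulder: string;
  collectionArkId: string;
  version?: number;
  recordType?: string;
  recordId?: string;
}

// "ul" + counter consonants + one digit (the shoulder ends at the first digit),
// then the betanumeric collection ID, then an optional ".vN" version suffix.
const HEAD = /^(ul[bcdfghjkmnpqrstvwxz]+[0-9])([bcdfghjkmnpqrstvwxz0-9]+)(?:\.v([0-9]+))?$/;

export function parseArkPath(pathAfterNaan: string): ParsedArk | null {
  const [head, recordType, ...rest] = pathAfterNaan.split("/");
  const m = HEAD.exec(head);
  if (!m) return null;

  const parsed: ParsedArk = { shoulder: m[1], collectionArkId: m[2] };
  if (m[3] !== undefined) parsed.version = Number(m[3]);

  if (recordType !== undefined) {
    // Assumption: the record ID is the rest of the path, so IDs may contain "/".
    const recordId = rest.join("/");
    if (!recordType || !recordId) return null; // a record ARK needs both type and id
    parsed.recordType = recordType;
    parsed.recordId = recordId;
  }
  return parsed;
}
```

Because the consonant class and the digit class are disjoint, the first digit always closes the shoulder, so `ulbb7xkm3nq8p` splits into `ulbb7` and `xkm3nq8p` without any lookup.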

Routing: ARK URLs are user-facing (/ark:NAAN/...) and are handled by Astro middleware, which intercepts them before page routing, calls the Fastify API internally to resolve the ARK, and then returns a redirect or a metadata response.

Storage: Minimize writes. We store:

  • ark_shoulders — one row per org (created on first ARK use)
  • ark_collections — one row per collection (ARK ID, enabled flag, custom redirect URL)
  • ark_record_types — one row per (collection, record type) where ARKs are enabled (stores which URL field to redirect to)
  • No per-record rows — record ARKs resolved dynamically by fetching the record at request time

Step 1: Database Schema

File: src/db/schema.ts

Add to accounts table: arkNaan: text("ark_naan") (nullable; if set, overrides default NAAN for all ARKs)

Add three new tables:

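// Drizzle column/table helpers (skip if schema.ts already imports them);
// accounts and collections are the existing tables defined in this file.
import { boolean, pgTable, primaryKey, text, timestamp, uuid } from "drizzle-orm/pg-core";

// ark_shoulders — one per org, assigned sequentially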
export const arkShoulders = pgTable("ark_shoulders", {
  id: uuid("id").defaultRandom().primaryKey(),
  accountId: uuid("account_id").notNull().unique().references(() => accounts.id, { onDelete: "cascade" }),
  shoulder: text("shoulder").notNull().unique(), // e.g. "ulb0", "ulc7"
  createdAt: timestamp("created_at", { withTimezone: true }).defaultNow().notNull(),
});

// ark_collections — one per collection with ARKs enabled
export const arkCollections = pgTable("ark_collections", {
  collectionId: uuid("collection_id").notNull().primaryKey().references(() => collections.id, { onDelete: "cascade" }),
  arkId: text("ark_id").notNull().unique(),   // opaque betanumeric, e.g. "xkm3nq8p"
  enabled: boolean("enabled").notNull().default(true),
  customUrl: text("custom_url"),              // optional override redirect
  createdAt: timestamp("created_at", { withTimezone: true }).defaultNow().notNull(),
});

// ark_record_types — per (collection, recordType) when schema ARKs enabled
export const arkRecordTypes = pgTable(
  "ark_record_types",
  {
    collectionId: uuid("collection_id").notNull().references(() => collections.id, { onDelete: "cascade" }),
    recordType: text("record_type").notNull(),
    redirectUrlField: text("redirect_url_field").notNull(), // field name in record data
  },
  (t) => [primaryKey({ columns: [t.collectionId, t.recordType] })],
);

Migration file: src/db/migrations/0004_{name}.sql


Step 2: ARK Utility Library

New file: src/lib/ark.ts

Functions:

  • BETANUMERIC = "bcdfghjkmnpqrstvwxz0123456789" (29 chars)
  • BETANUMERIC_CONSONANTS = "bcdfghjkmnpqrstvwxz" (19 chars — used for shoulder counter)
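  • collectionToArkId(collectionId: string): string: SHA-256 of the UUID, take the first 48 bits, and base-29 encode into an 8-char betanumeric string (deterministic, no DB write needed for the hash). Note that 8 base-29 characters hold only about 38.9 bits (29^8 ≈ 5.0 × 10^11, while 2^48 ≈ 2.8 × 10^14), so the 48-bit value must be reduced (e.g. modulo 29^8) or the ID lengthened to 10 characters.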
  • nextShoulderCounter(existingCount: number): string — converts integer index to consonant-alphabet string (0→"b", 1→"c", ..., 18→"z", 19→"bb", ...)
  • mintShoulder(accountId: string): Promise<string> — atomic: count existing shoulders, compute next counter, append random digit 0-9, insert into ark_shoulders, return full shoulder string like "ulb0"
  • getOrMintShoulder(accountId: string): Promise<string> — get existing or mint new
  • parseArkPath(pathAfterNaan: string): { shoulder, collectionArkId, version?, recordType?, recordId? } — parses ulb0xkm3nq8p.v2/Person/author-001
  • buildArkUrl(naan, shoulder, collectionArkId, version?, recordType?, recordId?): string
  • getCollectionArk(collectionId: string, naan: string): Promise<string | null> — returns full ARK URL for a collection if enabled
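A sketch of the two pure helpers, under stated assumptions: the 48-bit prefix is reduced modulo 29^8 so it fits in 8 characters, digits are taken with `b` (index 0) as the zero symbol, and `b` is used as the consonant pad when the ID would start with a digit (the pad character is not specified above). The counter is bijective base 19, which is what produces 0→"b", 18→"z", 19→"bb".

```typescript
import { createHash } from "node:crypto";

export const BETANUMERIC = "bcdfghjkmnpqrstvwxz0123456789"; // 29 chars
export const BETANUMERIC_CONSONANTS = "bcdfghjkmnpqrstvwxz"; // 19 chars

// Bijective base-19 over the consonants: 0→"b", 18→"z", 19→"bb", 20→"bc", ...
export function nextShoulderCounter(existingCount: number): string {
  let n = existingCount + 1;
  let out = "";
  while (n > 0) {
    n -= 1;
    out = BETANUMERIC_CONSONANTS[n % 19] + out;
    n = Math.floor(n / 19);
  }
  return out;
}

const ID_LENGTH = 8;
const ID_SPACE = 29n ** 8n; // about 5.0e11 values (~38.9 bits); needs an ES2020 target

// SHA-256(UUID) → first 48 bits → mod 29^8 → 8 base-29 characters.
export function collectionToArkId(collectionId: string): string {
  const digest = createHash("sha256").update(collectionId).digest();
  let n = BigInt("0x" + digest.subarray(0, 6).toString("hex")) % ID_SPACE;
  let id = "";
  for (let i = 0; i < ID_LENGTH; i++) {
    id = BETANUMERIC[Number(n % 29n)] + id; // least significant digit first
    n /= 29n;
  }
  // Keep the first character a consonant; "b" as the pad is an assumption.
  return /^[0-9]/.test(id) ? "b" + id : id;
}
```

With only ~39 bits, two collections can collide (roughly a 50% chance somewhere around 7 × 10^5 collections), so the unique constraint on `ark_collections.ark_id` remains the real guard.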

Step 3: ARK API Routes

New file: src/api/routes/ark.ts, registered in src/api/server.ts with prefix: "/api"

Routes:

GET /api/ark/resolve — main resolution endpoint called by Astro middleware

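  • Query param: path (the full ARK path with the leading slash stripped, including the ark: label, e.g. ark:12345/ulb0xkm3nq8p.v2/Person/author-001, as sent by the middleware)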
  • Parse: extract NAAN from path start, then shoulder, collectionArkId, optional version, optional record type/id
  • Lookup: ark_shoulders by shoulder value → get accountId
  • Lookup: ark_collections by arkId → get collectionId, enabled flag, customUrl
  • If the shoulder or ARK ID is unknown, or the collection's ARK is disabled → 404
  • Always resolve fully and build metadata object (see Step 5 for shape by object type)
  • If no record suffix: resolve to collection URL or version URL
    • customUrl → redirect URL
    • With version: redirect to /{ownerSlug}/{collectionSlug}/v/{number}, metadata includes version details (semver, message, pushedBy, createdAt)
    • Without version: redirect to /{ownerSlug}/{collectionSlug}, metadata reflects latest version
  • If record suffix: lookup ark_record_types → fetch record from correct version → extract URL field → redirect URL; metadata includes recordType, recordId, data (public fields only), schema (the schema for that type)
  • Returns: { type: 'redirect'|'not_found', url?, metadata: { type, ... } }
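The redirect decision can be isolated from the database lookups. The sketch below assumes the lookups have already run and passes their results in; `ArkCollectionRow` (ark_collections joined with owner/collection slugs) and `resolveTarget` are hypothetical names. Two further assumptions: customUrl overrides both the collection and version ARKs, and only absolute http(s) values from record data are followed.

```typescript
// Hypothetical row: ark_collections joined with collections/accounts for slugs.
export interface ArkCollectionRow {
  enabled: boolean;
  customUrl: string | null;
  ownerSlug: string;
  collectionSlug: string;
}

export interface ArkSuffix {
  version?: number;
  recordType?: string;
  recordId?: string;
}

export type ResolveResult = { type: "redirect"; url: string } | { type: "not_found" };

export function resolveTarget(
  row: ArkCollectionRow | undefined, // undefined when the shoulder or arkId lookup missed
  suffix: ArkSuffix,
  redirectUrlField?: string, // from ark_record_types for suffix.recordType, if enabled
  recordData?: Record<string, unknown>, // the record fetched from the resolved version
): ResolveResult {
  if (!row || !row.enabled) return { type: "not_found" };

  if (suffix.recordType === undefined) {
    // Collection or version ARK: customUrl wins, otherwise the app page.
    if (row.customUrl) return { type: "redirect", url: row.customUrl };
    const base = `/${row.ownerSlug}/${row.collectionSlug}`;
    return { type: "redirect", url: suffix.version === undefined ? base : `${base}/v/${suffix.version}` };
  }

  // Record ARK: the type must be enabled and the record must exist in that version.
  if (!redirectUrlField || !recordData) return { type: "not_found" };
  const target = recordData[redirectUrlField];
  // Only follow absolute http(s) URLs stored in record data.
  return typeof target === "string" && /^https?:\/\//i.test(target)
    ? { type: "redirect", url: target }
    : { type: "not_found" };
}
```

Keeping this pure makes the 404 cases (unknown ARK, disabled ARK, record type not enabled, missing or non-URL field) easy to unit-test without a database.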

GET /api/collections/:owner/:slug/ark — get ARK settings for collection (owner auth required)

  • Returns: { enabled, customUrl, arkUrl, shoulder, arkId }

PATCH /api/collections/:owner/:slug/ark — update ARK settings (owner/admin auth)

  • Body: { enabled?: boolean, customUrl?: string | null }

GET /api/collections/:owner/:slug/ark/record-types — get schema ARK settings

  • Returns list of { recordType, redirectUrlField } for enabled record types

PATCH /api/collections/:owner/:slug/ark/record-types — enable/disable ARK for a record type

  • Body: { recordType, redirectUrlField: string | null } (null to disable)

PATCH /api/accounts/:slug/ark — update org NAAN (admin auth)

  • Body: { naan: string | null }
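For the PATCH /api/collections/:owner/:slug/ark body, a minimal validation sketch follows. It assumes an empty string should clear the override (matching "empty = default collection page" in the settings UI) and that only http(s) custom URLs are accepted; `parseArkSettingsPatch` is a hypothetical helper, not an existing API.

```typescript
type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

export interface ArkSettingsPatch {
  enabled?: boolean;
  customUrl?: string | null;
}

function isHttpUrl(s: string): boolean {
  try {
    const { protocol } = new URL(s);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not an absolute URL
  }
}

export function parseArkSettingsPatch(body: unknown): Parsed<ArkSettingsPatch> {
  if (typeof body !== "object" || body === null) return { ok: false, error: "body must be a JSON object" };
  const { enabled, customUrl } = body as Record<string, unknown>;
  const value: ArkSettingsPatch = {};

  if (enabled !== undefined) {
    if (typeof enabled !== "boolean") return { ok: false, error: "enabled must be a boolean" };
    value.enabled = enabled;
  }
  if (customUrl !== undefined) {
    if (customUrl === null || customUrl === "") {
      value.customUrl = null; // clear the override: redirect to the collection page again
    } else if (typeof customUrl === "string" && isHttpUrl(customUrl)) {
      value.customUrl = customUrl;
    } else {
      return { ok: false, error: "customUrl must be an http(s) URL or null" };
    }
  }
  return { ok: true, value };
}
```

Rejecting non-http schemes here keeps a persistent identifier from ever redirecting to something like a `javascript:` URL.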

Hook collection creation (src/api/routes/collections.ts):

After db.insert(schema.collections), auto-mint ARK:

  1. getOrMintShoulder(account.id)
  2. Compute collectionToArkId(id)
  3. Insert into ark_collections with enabled: true
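The hook can be sketched against a hypothetical `ArkStore` interface standing in for the Drizzle queries. Note that counting rows and then inserting is not atomic on its own. This sketch assumes the insert reports a unique-constraint conflict (e.g. via INSERT ... ON CONFLICT DO NOTHING) and re-reads on conflict instead of failing collection creation. The counter and ID functions are injected so the sketch stays independent of `src/lib/ark.ts`.

```typescript
// Hypothetical storage interface; in practice, Drizzle queries on ark_shoulders / ark_collections.
export interface ArkStore {
  findShoulder(accountId: string): Promise<string | undefined>;
  countShoulders(): Promise<number>;
  // Resolves false when account_id or shoulder already exists (no row inserted).
  insertShoulder(accountId: string, shoulder: string): Promise<boolean>;
  insertArkCollection(row: { collectionId: string; arkId: string; enabled: boolean }): Promise<void>;
}

export interface ArkDeps {
  counterFor: (existingCount: number) => string; // e.g. nextShoulderCounter
  arkIdFor: (collectionId: string) => string; // e.g. collectionToArkId
  randomDigit?: () => number; // 0-9
}

export async function getOrMintShoulder(store: ArkStore, accountId: string, deps: ArkDeps, maxAttempts = 5): Promise<string> {
  const digit = deps.randomDigit ?? (() => Math.floor(Math.random() * 10));
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const existing = await store.findShoulder(accountId);
    if (existing) return existing;
    const shoulder = `ul${deps.counterFor(await store.countShoulders())}${digit()}`;
    // A concurrent request may have minted for this account or taken this shoulder;
    // on conflict, loop and re-read instead of failing.
    if (await store.insertShoulder(accountId, shoulder)) return shoulder;
  }
  throw new Error(`could not mint an ARK shoulder for account ${accountId}`);
}

// Steps 1-3 above: shoulder, deterministic ARK ID, enabled ark_collections row.
export async function mintCollectionArk(store: ArkStore, accountId: string, collectionId: string, deps: ArkDeps): Promise<string> {
  const shoulder = await getOrMintShoulder(store, accountId, deps);
  await store.insertArkCollection({ collectionId, arkId: deps.arkIdFor(collectionId), enabled: true });
  return shoulder;
}
```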

Update collection/version API responses:

  • GET /collections/:owner/:slug → add ark?: string field (the ARK URL, if enabled)
  • GET /collections/:owner/:slug/versions → add ark?: string per version (with .vN suffix)
  • GET /collections/:owner/:slug/versions/:n → add ark?: string for the version
  • GET /collections/:owner/:slug/versions/:n/records → add ark?: string per record (if record type ARK enabled)
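The per-version and per-record `ark` fields are suffixes on the collection ARK. A sketch of `buildArkUrl` with the Step 2 parameter order follows. The trailing `base` parameter is a hypothetical addition, and record IDs are inserted literally (as in the design decisions), which assumes they contain no `/`, `?`, or `#`.

```typescript
export function buildArkUrl(
  naan: string,
  shoulder: string,
  collectionArkId: string,
  version?: number,
  recordType?: string,
  recordId?: string,
  base = "https://underlay.org", // hypothetical parameter, not in the plan's signature
): string {
  let url = `${base}/ark:${naan}/${shoulder}${collectionArkId}`;
  if (version !== undefined) url += `.v${version}`;
  // Record IDs are used literally, so they must already be URL-safe.
  if (recordType !== undefined && recordId !== undefined) url += `/${recordType}/${recordId}`;
  return url;
}
```

For example, `buildArkUrl("12345", "ulb0", "xkm3nq8p", 2)` yields `https://underlay.org/ark:12345/ulb0xkm3nq8p.v2` for a version list row.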

Step 4: ARK Root Handler

In the Astro middleware (or as part of the ARK resolver logic), handle requests to /ark:NAAN/ (path ends immediately after the NAAN with a trailing slash and nothing more) as a special case:

Return text/plain content describing the naming authority:

The Underlay assigns identifiers within the ARK domain {NAAN} with the following principles:
...

(Full policy text TBD during implementation; should state that Underlay maintains persistent redirects for collections and records, that ARKs are not reassigned, and reference the erc-support info.)

The erc-support.where field in all ERC responses points to this URL: https://underlay.org/ark:{NAAN}/
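Detecting the root case only needs an exact match on `/ark:NAAN/` with nothing after the slash. A sketch, assuming NAANs are betanumeric and the policy text is supplied by the caller, since the full wording is still TBD. `arkRootResponse` is a hypothetical helper that could run in the middleware before the resolver call.

```typescript
// Matches "/ark:12345/" exactly: a NAAN, one trailing slash, nothing after it.
const ARK_ROOT = /^\/ark:([0-9bcdfghjkmnpqrstvwxz]+)\/$/;

export function arkRootResponse(pathname: string, policyText: (naan: string) => string): Response | null {
  const match = ARK_ROOT.exec(pathname);
  if (!match) return null; // not the root; fall through to normal ARK resolution
  return new Response(policyText(match[1]), {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```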


Step 5: Astro Middleware — ARK Resolver

File: src/middleware.ts

Intercept requests where pathname.startsWith("/ark:"):

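import { defineMiddleware } from "astro:middleware"

// Internal Fastify base URL (hardcoded in this plan)
const API_BASE = "http://localhost:3000"

export const onRequest = defineMiddleware(async (context, next) => {
  if (!context.url.pathname.startsWith("/ark:")) return next()

  const fullPath = context.url.pathname.slice(1) // strip leading /
  // fullPath looks like "ark:12345/ulb0xkm3nq8p.v2/Person/author-001"

  const params = new URLSearchParams({ path: fullPath })
  const res = await fetch(`${API_BASE}/api/ark/resolve?${params}`)
  // The resolver answers unknown/disabled ARKs with 404; any other failure is an upstream error
  if (res.status === 404) return new Response('ARK not found', { status: 404 })
  if (!res.ok) return new Response('ARK resolver error', { status: 502 })
  const data = await res.json()

  if (data.type === 'not_found') return new Response('ARK not found', { status: 404 })

  const searchStr = context.url.search // "?info", "?json", "??"

  if (searchStr === '?info' || searchStr === '??') {
    // ERC record in ANVL, served as text/plain (buildERC is described below)
    return new Response(buildERC(data.metadata), {
      headers: { 'Content-Type': 'text/plain; charset=utf-8' }
    })
  }
  if (searchStr === '?json') {
    return new Response(JSON.stringify(data.metadata), {
      headers: { 'Content-Type': 'application/json' }
    })
  }

  // Response.redirect() needs an absolute URL; resolve paths like "/owner/slug" against the request
  return Response.redirect(new URL(data.url, context.url), 302)
})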

/api/ark/resolve returns a metadata object whose shape depends on the resolved object type:

  • Collection (no version): { type: 'collection', collectionName, ownerName, createdAt, arkUrl }
  • Version: { type: 'version', collectionName, ownerName, versionNumber, semver, createdAt, message, pushedBy, arkUrl }
  • Record: { type: 'record', collectionName, ownerName, versionNumber, recordType, recordId, schema, data (filtered), createdAt, arkUrl }

buildERC(metadata) — builds ANVL-format ERC response tailored to object type:

For a collection:

erc:
who: {ownerName}
what: {collectionName}
when: {createdAt in YYYYMMDD}
where: {arkUrl}

erc-support:
who: Underlay
what: Underlay ARK Service
when: 20260504
where: https://underlay.org/ark:{NAAN}/

For a version:

erc:
who: {ownerName}
what: {collectionName} v{semver}
when: {version createdAt in YYYYMMDD}
where: {arkUrl with .vN}

erc-support:
who: Underlay
what: Underlay ARK Service
when: 20260504
where: https://underlay.org/ark:{NAAN}/

For a record:

erc:
who: {ownerName}
what: {recordType} {recordId} in {collectionName}
when: {version createdAt in YYYYMMDD}
where: {arkUrl with type/id suffix}

erc-support:
who: Underlay
what: Underlay ARK Service
when: 20260504
where: https://underlay.org/ark:{NAAN}/

For ?json, return the full metadata object as JSON (including schema for records, version provenance fields like semver, message, pushedBy).
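A sketch of `buildERC` that produces the three ANVL layouts above from the metadata shapes listed earlier. Two assumptions are worth flagging: the metadata objects carry no NAAN field, so the NAAN for `erc-support.where` is read back out of `arkUrl`, and embedded newlines in names are folded to spaces so each ANVL element stays on one line.

```typescript
type ArkMetadata =
  | { type: "collection"; collectionName: string; ownerName: string; createdAt: string; arkUrl: string }
  | {
      type: "version"; collectionName: string; ownerName: string; versionNumber: number;
      semver: string; createdAt: string; message?: string; pushedBy?: string; arkUrl: string;
    }
  | {
      type: "record"; collectionName: string; ownerName: string; versionNumber: number;
      recordType: string; recordId: string; schema?: unknown; data?: Record<string, unknown>;
      createdAt: string; arkUrl: string;
    };

// ISO timestamp → YYYYMMDD (UTC date).
const ercDate = (iso: string): string => new Date(iso).toISOString().slice(0, 10).replace(/-/g, "");

// Fold line breaks so a value cannot start a new ANVL element.
const oneLine = (v: string): string => v.replace(/\s*\n\s*/g, " ");

function naanFromArkUrl(arkUrl: string): string {
  const m = new URL(arkUrl).pathname.match(/^\/ark:\/?([^/]+)\//);
  if (!m) throw new Error(`not an ARK URL: ${arkUrl}`);
  return m[1];
}

export function buildERC(md: ArkMetadata): string {
  const what =
    md.type === "collection" ? md.collectionName
    : md.type === "version" ? `${md.collectionName} v${md.semver}`
    : `${md.recordType} ${md.recordId} in ${md.collectionName}`;
  return [
    "erc:",
    `who: ${oneLine(md.ownerName)}`,
    `what: ${oneLine(what)}`,
    `when: ${ercDate(md.createdAt)}`,
    `where: ${md.arkUrl}`,
    "",
    "erc-support:",
    "who: Underlay",
    "what: Underlay ARK Service",
    "when: 20260504",
    `where: https://underlay.org/ark:${naanFromArkUrl(md.arkUrl)}/`,
    "",
  ].join("\n");
}
```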


Step 6: Organization Settings — ARK NAAN

File: src/pages/[owner]/settings.astro

Add new section below API Keys (before Danger Zone):

  • Heading: "ARK Identifiers"
  • Show current shoulder if minted (read-only)
  • NAAN field: input showing current orgData.arkNaan ?? 'Default (12345)' with save button
  • action = "update-ark" form handler that calls PATCH /api/accounts/${owner}/ark

Add PATCH /accounts/:slug/ark handler in src/api/routes/accounts.ts:

  • Validate that the NAAN is exactly 5 digits (or null to clear the override)
  • Update accounts.arkNaan
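A sketch of the NAAN check and the fallback used when building URLs, assuming surrounding whitespace is trimmed and that null or an empty string clears the org override so the default NAAN applies again. `normalizeNaan` and `effectiveNaan` are hypothetical helper names.

```typescript
type NaanResult = { ok: true; naan: string | null } | { ok: false; error: string };

// Validates the `naan` field of PATCH /api/accounts/:slug/ark.
export function normalizeNaan(input: unknown): NaanResult {
  if (input === null || input === "") return { ok: true, naan: null }; // clear the org override
  if (typeof input !== "string") return { ok: false, error: "naan must be a string or null" };
  const naan = input.trim();
  return /^[0-9]{5}$/.test(naan) ? { ok: true, naan } : { ok: false, error: "NAAN must be exactly 5 digits" };
}

// The NAAN actually used for an org's ARKs: its override, else the deployment default.
export function effectiveNaan(arkNaan: string | null, defaultNaan: string): string {
  return arkNaan ?? defaultNaan;
}
```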

Step 7: Collection Settings — ARK Section

File: src/pages/[owner]/[collection]/settings.astro

Add section below Export:

  • Fetch ARK settings from GET /api/collections/${owner}/${collection}/ark
  • Show current ARK URL (copyable code block) if enabled
  • Toggle checkbox: "Enable ARK identifier"
  • Conditional field: "Custom redirect URL" (shown when enabled, empty = default collection page)
  • Save button → action = "update-ark" → calls PATCH /api/collections/${owner}/${collection}/ark
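The "update-ark" action has to turn checkbox and text-input semantics into the PATCH body. A sketch, assuming hypothetical form field names `enabled` and `customUrl`; in Astro the `FormData` would come from `await Astro.request.formData()`.

```typescript
export function arkFormToPatch(form: FormData): { enabled: boolean; customUrl: string | null } {
  // Browsers omit unchecked checkboxes; a checked box without a value attribute submits "on".
  const enabled = form.get("enabled") === "on";
  const raw = form.get("customUrl");
  // A blank input means "redirect to the default collection page".
  const customUrl = typeof raw === "string" && raw.trim() !== "" ? raw.trim() : null;
  return { enabled, customUrl };
}
```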

Step 8: Collection Schema Settings — Per-Type ARK Minting

File: src/pages/[owner]/[collection]/schemas.astro

For each schema type that has at least one field with type: "string" and format: "uri" or "url":

  • Add a small section below the field table: "ARK identifiers for this type"
  • Dropdown: "Redirect URL field" (shows all URL-type fields, + "Disabled")
  • On save → PATCH /api/collections/${owner}/${collection}/ark/record-types with { recordType, redirectUrlField: field | null }
  • When saving with a field selected: the API simply stores the mapping (no bulk pre-generation needed)

Note: "mint ARKs" in the spec means enabling the ARK infrastructure for that type. ARKs are resolved dynamically at request time, not pre-stored per record.
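The dropdown's options come from a simple filter over the type's fields. This sketch assumes a JSON-Schema-like shape with a `properties` map; the actual Underlay schema representation may differ, so `TypeSchema` and `urlFieldsOf` are hypothetical.

```typescript
interface FieldSchema { type?: string; format?: string; }
interface TypeSchema { properties?: Record<string, FieldSchema>; }

// Fields eligible as the ARK redirect target: string fields with format "uri" or "url".
export function urlFieldsOf(schema: TypeSchema): string[] {
  return Object.entries(schema.properties ?? {})
    .filter(([, f]) => f.type === "string" && (f.format === "uri" || f.format === "url"))
    .map(([name]) => name);
}
```

When this returns an empty list, the "ARK identifiers for this type" section can be hidden entirely.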


Step 9: UI — Add ARK Display

Collection overview (src/pages/[owner]/[collection]/index.astro):

  • Add "ARK" section in sidebar (below Subscribe), showing copyable ARK URL if enabled
  • Copy button using navigator.clipboard.writeText()

Versions page (src/pages/[owner]/[collection]/versions.astro):

  • Add ARK URL (with .vN suffix) as a small copyable <code> element per row

Version detail page (src/pages/[owner]/[collection]/v/[n].astro):

  • Add ARK URL with version in the version info bar

Record table in version detail:

  • If record type has ARKs enabled, add ARK column to record rows

Step 10: Environment Variable

File: src/lib/ark.ts (or src/lib/page-utils.ts)

  • export const DEFAULT_NAAN = process.env.ARK_DEFAULT_NAAN ?? "12345"

Add ARK_DEFAULT_NAAN=12345 to .env.test


Critical Files

| File | Change |
|---|---|
| src/db/schema.ts | Add arkNaan to accounts, add 3 new tables |
| src/db/migrations/0004_*.sql | Migration |
| src/lib/ark.ts | New: betanumeric utils, shoulder minting, ARK building/parsing |
| src/api/server.ts | Register arkRoutes |
| src/api/routes/ark.ts | New: resolution + settings endpoints |
| src/api/routes/accounts.ts | PATCH /accounts/:slug/ark (org NAAN) |
| src/api/routes/collections.ts | Auto-mint on create; return ark in responses |
| src/api/routes/versions.ts | Return ark with .vN suffix in version responses |
| src/middleware.ts | Intercept /ark:*, call resolve, return redirect/ERC/JSON |
| src/pages/[owner]/settings.astro | ARK NAAN section |
| src/pages/[owner]/[collection]/settings.astro | Enable/disable ARK, custom URL |
| src/pages/[owner]/[collection]/schemas.astro | Per-type ARK URL field selector |
| src/pages/[owner]/[collection]/index.astro | ARK in sidebar |
| src/pages/[owner]/[collection]/versions.astro | ARK per version row |
| src/pages/[owner]/[collection]/v/[n].astro | ARK in version detail |

Design Decisions

No per-record storage: Record ARKs are resolved dynamically (look up collection→type→field→fetch record). Only 3 new tables regardless of collection size.

Deterministic collection ARK IDs: SHA-256(collectionUUID) → base-29 encode → 8 betanumeric chars. No lookup table needed for collection IDs — just store once in ark_collections.

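Shoulder uniqueness: mintShoulder counts existing rows in ark_shoulders atomically and passes that count to nextShoulderCounter to assign the next sequential counter. A random digit (0-9) is appended for character diversity per ARK best practices; it also terminates the shoulder under the primordinal convention.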

Enabled by default: New collections get ark_collections row with enabled: true. Users can disable in settings.

Record IDs in ARK URLs are literal: using the actual recordId text (not a hash) avoids a massive reverse-lookup table. Record IDs in Underlay are already non-sequential (not integer counters).

URL field detection for schema ARKs: At save time, the UI only offers fields where type === "string" and (format === "uri" or format === "url"). No runtime schema validation in the resolver — we trust the stored field name.


Verification

  1. Start the dev server: npm run dev
  2. Create a new org and collection — confirm ark_shoulders and ark_collections rows created
  3. Navigate to /ark:12345/ulb0xkm3nq8p — confirm redirect to collection overview
  4. Navigate to /ark:12345/ulb0xkm3nq8p.v1 — confirm redirect to version 1 page
  5. Hit ?info inflection: curl "http://localhost:4321/ark:12345/ulb0xkm3nq8p?info" — confirm ERC text/plain response with who/what/when/where + erc-support block
  6. Hit ?json inflection — confirm JSON response with collection metadata
  7. Enable ARK for a record type with a URL field, confirm /ark:12345/ulb0xkm3nq8p/Person/author-001 redirects to URL in that record's data
  8. Set custom URL in collection settings — confirm ARK redirects to custom URL
  9. Set org NAAN — confirm new collections use new NAAN
  10. Check collection overview sidebar shows copyable ARK URL
  11. Check versions page shows per-version ARKs

@gabestein gabestein requested a review from isTravis May 4, 2026 17:12
@gabestein gabestein requested a review from isTravis May 4, 2026 21:08
@gabestein gabestein merged commit 643cc3f into main May 5, 2026
@gabestein gabestein deleted the gs/ark-first-implementation branch May 5, 2026 02:00

2 participants