OSA Python SDK

The developer toolkit for the Open Science Archive — define metadata schemas, write validation hooks, build ingesters, and deploy conventions.

Pre-release — APIs will change without notice.


Install

pip install osa

Quickstart

Define a schema, write a hook, register a convention:

from osa import Schema, Field, Record, Reject, hook, convention

class Crystal(Schema, id="crystal-structure"):
    pdb_id: str
    resolution: float = Field(unit="Å")
    method: str

@hook
def resolution_check(record: Record[Crystal]) -> None:
    if record.metadata.resolution > 3.5:
        raise Reject("Resolution too low for inclusion")

convention(
    title="Crystal Structures",
    schema=Crystal,
    hooks=[resolution_check],
    files={"extensions": [".cif"], "min_count": 1},
)

Testing

Test hooks in-process without Docker:

from osa.testing import run_hook

run_hook(resolution_check, meta={"pdb_id": "1ABC", "resolution": 2.1, "method": "X-ray"})
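
The call above exercises the accepting path. To cover the rejecting path and the boundary value too, the rule can be checked in isolation. The sketch below is deliberately self-contained and does not import osa: `Reject`, `CrystalMeta`, `check_resolution`, and `is_rejected` are local stand-ins that mirror the quickstart, not SDK names. How `run_hook` itself reports a rejection is not documented in this README, so treat this as an illustration of the rule rather than of the harness.

```python
from dataclasses import dataclass


class Reject(Exception):
    """Local stand-in for osa.Reject (the quickstart raises it, so it is an exception)."""


@dataclass
class CrystalMeta:
    """Plain-dataclass stand-in for the Crystal schema from the quickstart."""
    pdb_id: str
    resolution: float  # in Å
    method: str


MAX_RESOLUTION = 3.5  # same threshold as the quickstart hook


def check_resolution(meta: CrystalMeta) -> None:
    """Mirror of resolution_check: reject when resolution exceeds the threshold."""
    if meta.resolution > MAX_RESOLUTION:
        raise Reject("Resolution too low for inclusion")


def is_rejected(meta: CrystalMeta) -> bool:
    """Return True if the check raises Reject, False if it passes."""
    try:
        check_resolution(meta)
    except Reject:
        return True
    return False
```

Because the comparison is a strict `>`, a record at exactly 3.5 Å passes; a test at the boundary makes that choice explicit.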

Deploy

osa link --server https://my-archive.org
osa login
osa deploy

CLI reference

Command                   Description
osa link --server <url>   Link project to an archive server
osa login                 Authenticate via device flow
osa logout                Remove stored credentials
osa deploy                Build OCI images and register conventions
osa meta                  Print the convention manifest
osa ingestion start       Trigger an ingestion run

Concepts

Schema — a Pydantic model defining typed metadata fields for a convention.

Hook — a pure function decorated with @hook that receives a Record[T] and returns structured results. Hooks run as OCI containers with a filesystem I/O contract.

Convention — a bundle of a schema, hooks, file requirements, and an optional ingester.

Ingester — an async generator that pulls records from external systems into the archive on a schedule.

Record[T] — generic container binding a schema type to its metadata, files, and SRN.
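
The ingester concept is described above as an async generator. This README does not show the SDK's ingester API, so the following is only a plain-Python sketch of the async-generator pattern. `FAKE_UPSTREAM`, `fetch_page`, `ingest`, and `collect` are hypothetical names, and the in-memory list stands in for a real external system.

```python
import asyncio
from collections.abc import AsyncIterator

# Hypothetical in-memory "external system"; a real ingester would call an API.
FAKE_UPSTREAM = [
    {"pdb_id": "1ABC", "resolution": 2.1, "method": "X-ray"},
    {"pdb_id": "2XYZ", "resolution": 1.8, "method": "Cryo-EM"},
]


async def fetch_page(offset: int, limit: int) -> list[dict]:
    """Pretend to fetch one page of records from the upstream source."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return FAKE_UPSTREAM[offset:offset + limit]


async def ingest(page_size: int = 1) -> AsyncIterator[dict]:
    """Async generator: yield metadata dicts one at a time, page by page."""
    offset = 0
    while True:
        page = await fetch_page(offset, page_size)
        if not page:
            break  # upstream exhausted
        for item in page:
            yield item
        offset += len(page)


async def collect() -> list[dict]:
    """Drain the generator, as a scheduled run might do."""
    return [item async for item in ingest()]
```

Yielding records one at a time means the consumer can validate each record as it arrives without holding the whole upstream dataset in memory.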

Project structure

osa/
├── __init__.py              # Public API
├── authoring/               # @hook, convention(), Reject, Ingester
├── types/                   # Schema, Record, Field, File
├── runtime/                 # OCI entrypoints (osa-run-hook, osa-run-ingester)
├── testing/                 # run_hook() test harness
└── cli/                     # osa command (login, deploy, link, ...)

License

Apache 2.0
