v0.4.0
Added
Pipeline-wide integration of the protocol layer
- `bead.labels` is the single canonical home for the
  `[[label]]` / `[[label:text]]` / `[[label|transform]]` syntax.
  `parse_label_refs`, `find_label_names`, and `replace_label_refs`
  replace the three independent regex implementations that previously
  lived in `bead.protocol.drift`, `bead.deployment.jspsych.trials`,
  and `bead.items.span_labeling`.
- `bead.config.protocol.ProtocolConfig` plugs into `BeadConfig.protocol`
  with declarative TOML/YAML configuration: anchor specs, drift
  settings, realization strategies (template / contextual / lm), and
  family composition. `ProtocolConfig.build(lm_client=..., cache=...)`
  materializes a live `AnnotationProtocol`.
- `bead.protocol.items` provides the canonical
  `QuestionRealization → Item` and protocol-wide
  `family_to_item_template` / `protocol_to_item_templates` /
  `realize_protocol_to_items` bridges, plus `scale_type_to_task_type`
  as the single canonical mapping from `ScaleType` to `TaskType`.
- `bead.active_learning.models.registry` exposes
  `MODEL_CLASSES` / `CONFIG_CLASSES` and
  `model_class_for_task_type` / `config_class_for_task_type` /
  `model_class_for_encoding` / `config_class_for_encoding` as the
  single canonical task-type → model-class / config-class registry.
  `bead.cli.models` and `bead.cli.training` consume the registry
  directly, replacing two parallel string-keyed dicts and a dynamic
  `_import_class` helper.
- `bead.deployment.protocol_trials.protocol_to_jspsych_trials` is the
  canonical end-to-end bridge from an `AnnotationProtocol` and a
  sequence of `ProtocolContext` records to a flat list of jsPsych
  trial dicts.
- `bead.data_collection.jatos_results_to_annotation_records` converts
  raw JATOS results into `AnnotationRecord` instances, the input
  shape consumed by `annotator_reliability` and
  `InterAnnotatorMetrics`.
- `bead protocol` CLI subcommand: `bead protocol validate`,
  `bead protocol realize`, and `bead protocol items` drive the
  configured protocol from the shell.
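
For readers unfamiliar with the label-reference syntax named in the first entry, here is a minimal, standalone sketch of how the three forms could be recognized with a single regex. It is an illustration only and is not the `bead.labels` implementation: `LabelRefSketch`, `parse_refs_sketch`, and the pattern itself are hypothetical, and the sketch assumes `:text` and `|transform` are mutually exclusive.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern: [[label]], [[label:text]], or [[label|transform]].
_REF_SKETCH = re.compile(
    r"\[\[(?P<label>[^\[\]:|]+)"
    r"(?::(?P<text>[^\[\]|]+)|\|(?P<transform>[^\[\]:]+))?"
    r"\]\]"
)


@dataclass(frozen=True)
class LabelRefSketch:
    """Illustrative stand-in for a parsed reference (not bead's LabelRef)."""

    label: str
    text: Optional[str] = None
    transform: Optional[str] = None


def parse_refs_sketch(prompt: str) -> List[LabelRefSketch]:
    """Return every label reference found in the prompt, in order."""
    return [
        LabelRefSketch(m["label"], m["text"], m["transform"])
        for m in _REF_SKETCH.finditer(prompt)
    ]


refs = parse_refs_sketch(
    "Did [[agent]] cause [[event:the change]] in [[patient|upper]]?"
)
for ref in refs:
    print(ref)
```

Keeping one parser in one module, as the entry describes, means the deployment, drift, and span-labeling code cannot disagree about what counts as a reference.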
Changed
- `LMRealization` accepts a `ModelOutputCache` (the bead-wide
  content-addressable cache) via its required `cache` keyword and a
  required `model_name` keyword for cache-key isolation. The internal
  FIFO dict and the `cache` / `max_cache_size` / `clear_cache` /
  `cache_size` parameters and methods are removed; the
  `ModelOutputCache` is the single canonical caching surface.
- `bead.cli.models` no longer maintains `TASK_TYPE_MODELS` /
  `TASK_TYPE_CONFIGS` string-path dicts or the `_import_class`
  helper; they are replaced by direct calls into
  `bead.active_learning.models.registry`. `bead.cli.training` follows
  the same pattern.
- `bead.deployment.jspsych.trials._parse_prompt_references`,
  `_SpanReference`, `_SPAN_REF_PATTERN`, and the duplicated
  `_SPAN_REF_PATTERN` in `bead.items.span_labeling` are removed in
  favor of `bead.labels.parse_label_refs` / `LabelRef`.
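
To make "content-addressable" and "cache-key isolation" concrete, the following sketch derives a key by hashing the model name together with the prompt and generation parameters, so two models never share an entry. It assumes nothing about how `ModelOutputCache` actually builds its keys: `cache_key_sketch`, `realize_cached_sketch`, and the plain `dict` store are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key_sketch(model_name: str, prompt: str, **params) -> str:
    """Hash the full request; sort_keys makes parameter order irrelevant."""
    payload = json.dumps(
        {"model": model_name, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def realize_cached_sketch(
    store: Dict[str, str],
    model_name: str,
    prompt: str,
    generate: Callable[[str], str],
) -> str:
    """Return a cached completion, calling the backend only on a miss."""
    key = cache_key_sketch(model_name, prompt)
    if key not in store:
        text = generate(prompt)
        if not text.strip():
            # Refuse to cache an empty or whitespace-only completion.
            raise RuntimeError("backend returned an empty response")
        store[key] = text
    return store[key]


store: Dict[str, str] = {}
first = realize_cached_sketch(store, "model-a", "Rephrase: did it move?", str.upper)
second = realize_cached_sketch(store, "model-a", "Rephrase: did it move?", str.lower)
print(first == second)  # the second call is served from the store
```

Because the model name is part of the hashed payload, switching models changes every key, which is the isolation the required `model_name` keyword is described as providing.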
bead.protocol: annotation protocol primitives
A new top-level package providing a type-theoretic stack for defining
annotation protocols: anchors as types, contexts as dependent
indices, realization strategies as computational content, and drift
guards as type-checkers.
- `bead.protocol.anchor` defines `SemanticAnchor` (the type-level
  spec of a question, with required span labels, required keywords,
  optional embedding center and `max_drift`) and `ResponseSpace` /
  `SemanticPoles`.
- `bead.protocol.context` defines a generic `ProtocolContext` and
  `ContextItem` plus a module-level predicate registry
  (`register_context_predicate`, `get_context_predicate`,
  `list_context_predicates`) for callers to register named context
  predicates at import time.
- `bead.protocol.realization` provides `RealizationStrategy`
  (`typing.Protocol`), `TemplateRealization`,
  `ContextualTemplateRealization` (rule-based selection from ranked
  variants), and `LMRealization` (with caching and FIFO eviction)
  plus an `LMClientProtocol` with explicit
  `temperature` / `max_tokens` keyword parameters.
- `bead.protocol.drift` defines `DriftScore`, the `DriftValidator`
  Protocol, and three concrete validators
  (`StructuralDriftValidator`, `EmbeddingDriftValidator`,
  `PerplexityDriftValidator`) plus a composite `DriftGuard`. The
  embedding and perplexity validators consume narrow
  `EmbeddingAdapter` / `PerplexityAdapter` Protocols, so any object
  exposing the right method (including bead's
  `bead.items.adapters.ModelAdapter`) conforms.
- `bead.protocol.family` defines `QuestionFamily` (with explicit
  `depends_on` for conditional dependencies) and `AnnotationProtocol`
  (the iterated dependent product), with `realize_all` threading
  responses through the context. `AnnotationProtocol` rejects
  duplicate anchor names, self-dependencies, and forward / unknown
  `depends_on` references at construction and on `append`.
- `bead.protocol.encoding` defines `ScaleType`
  (`StrEnum`: binary / ordinal / nominal) and `ResponseEncoding` (with
  invariant validators for `n_levels == len(labels)`, label
  uniqueness, and `BINARY` having exactly 2 levels), plus
  `encode_response_space` as the bridge from `ResponseSpace`.
- `bead.protocol.diagnostics` defines `DiagnosticLevel`,
  `DiagnosticRecord`, `DatasetReport` (immutable, with `with_*`
  mutators), `ConditionalObservationValidator` (which operates on
  `AnnotationProtocol.depends_on`), and the `RecordLike` Protocol
  for the structural record shape consumed by the validator.
- `LMRealization` raises `RuntimeError` on backend failures and on
  empty / whitespace-only responses (instead of caching an empty
  string).
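
The construction-time checks listed for `AnnotationProtocol` (duplicate names, self-dependencies, forward or unknown `depends_on` references) amount to a single ordered pass over the families. The sketch below shows that pass under stated assumptions: `FamilySketch` and `check_dependencies_sketch` are hypothetical stand-ins, and the error type and messages are not taken from bead.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class FamilySketch:
    """Illustrative stand-in for a question family (not bead's class)."""

    name: str
    depends_on: Tuple[str, ...] = ()


def check_dependencies_sketch(families: Iterable[FamilySketch]) -> None:
    """Raise ValueError unless every dependency names an earlier family."""
    seen: Set[str] = set()
    for fam in families:
        if fam.name in seen:
            raise ValueError(f"duplicate anchor name: {fam.name!r}")
        for dep in fam.depends_on:
            if dep == fam.name:
                raise ValueError(f"{fam.name!r} depends on itself")
            if dep not in seen:
                # Covers both unknown names and names defined later.
                raise ValueError(
                    f"{fam.name!r} depends on {dep!r}, "
                    "which is unknown or defined later"
                )
        seen.add(fam.name)


# A valid ordering: each question depends only on earlier ones.
check_dependencies_sketch(
    [
        FamilySketch("did_change"),
        FamilySketch("how_much", depends_on=("did_change",)),
    ]
)
```

Requiring dependencies to point backward keeps the protocol order a valid evaluation order, so responses can be threaded through the context one family at a time.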
bead.evaluation.reliability: per-annotator reliability
- `AnnotationRecord` is a `BeadBaseModel` with the canonical
  `(annotator_id, item_id, question_name, response_label)` shape.
- `annotator_reliability(records, encodings=...)` returns
  per-annotator response distributions and Shannon entropy in bits,
  optionally filtering unrecognized labels.
- `low_entropy_annotators(profiles, threshold=...)` flags annotators
  who collapse the response space.
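
As a worked illustration of the entropy measure mentioned above, Shannon entropy in bits is H = -Σ p·log2(p) over an annotator's observed label frequencies: an annotator who always gives the same answer scores 0, and one who splits evenly between two labels scores 1. The sketch below is a standalone calculation, not bead's implementation; `entropy_bits_sketch`, `low_entropy_sketch`, and the example threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def entropy_bits_sketch(labels: Iterable[str]) -> float:
    """Shannon entropy (base 2) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(
        -(c / total) * math.log2(c / total) for c in counts.values()
    )


def low_entropy_sketch(
    responses: Dict[str, List[str]], threshold: float
) -> List[str]:
    """Return annotator ids whose entropy falls below the threshold."""
    return sorted(
        annotator
        for annotator, labels in responses.items()
        if entropy_bits_sketch(labels) < threshold
    )


responses = {
    "ann_1": ["yes", "no", "yes", "no"],    # uses both labels evenly
    "ann_2": ["yes", "yes", "yes", "yes"],  # collapses to one label
}
print(low_entropy_sketch(responses, threshold=0.5))
```

A low score is only a flag for review: an annotator may legitimately give one answer throughout if the items they saw really were homogeneous.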
Documentation
- `docs/api/protocol.md` and `docs/api/evaluation.md` updates expose
  the new modules through `mkdocstrings`.
- `docs/user-guide/protocols.md` walks through anchors, contexts
  (including the predicate registry and per-dependent attributes),
  the three realization strategies, drift validation (with the named
  `EmbeddingAdapter` and `PerplexityAdapter` Protocols), protocol
  composition, the structural construction-time invariants, the
  `encode_response_space` bridge to modeling, conditional-observation
  diagnostics (including the `RecordLike` Protocol), and reliability.
- The protocol layer is cross-linked from
  `docs/user-guide/concepts.md`, `docs/user-guide/index.md`,
  `docs/index.md`, the project `README.md`, and a new "Protocol layer"
  paragraph in `docs/developer-guide/architecture.md` that places it
  as a cross-cutting layer feeding into the existing 6-stage pipeline.