mind-runtime is an experimental, cross-platform runtime for audio-native embodied-agent research. It treats audio as the primary substrate, uses learned token and latent models throughout the runtime graph, and exposes inspection tools for debugging internal state.
Audio In -> AudioTokenizer -> TemporalBinder -> WorkspaceCore
-> {FastLoop, DeliberativeLoop, SalienceExecutive, MemoryRetriever}
-> IntentionState -> SpeechDecoder -> Audio Out
flowchart TD
audio[Audio In] --> tokenizer[AudioTokenizer]
tokenizer --> binder[TemporalBinder]
binder --> workspace[WorkspaceCore]
workspace --> world[AudioWorldModel]
workspace --> memoryQuery[MemoryRetriever]
workspace --> goalModel[GoalStore / AgentModelBank]
binder --> fastLoop[FastLoop]
world --> deliberative[DeliberativeLoop]
fastLoop --> executive[SalienceExecutive]
deliberative --> executive
memoryQuery --> executive
goalModel --> executive
executive --> intention[IntentionState]
intention --> selfModel[SelfModel]
intention --> speech[SpeechDecoder]
speech --> audioOut[Audio Out]
workspace --> stores[Persistent Side Stores]
stores --> runtime[Runtime Engine]
Persistent side stores:
- SelfModel
- AgentModelBank
- GoalStore
- EpisodicMemory
- SemanticMemory
- SkillMemory
- CalibrationStore
- Experimental research codebase, not production software.
- Trainable PyTorch implementations are included for the tokenizer, binder, workspace, world model, executive, self model, goal manager, agent-state model, semantic/skill distillers, calibration model, memory query path, and speech decoder.
- No pretrained checkpoints are bundled with the repository.
- By default, the runtime requires a trained checkpoint bundle and fails fast if one is not available.
- Streaming audio ingestion with a 40 ms analysis window and 20 ms hop
- Dual synchronized token streams:
  - acoustic stream at 50 Hz with 256-d embeddings and 4 residual codebooks
  - semantic/event stream at 12.5 Hz with 512-d embeddings and 2 residual codebooks
- TemporalBinder for event boundaries, turn structure, speaker/source continuity, and local causal lag estimates
- WorkspaceCore with explicit entity, event, hypothesis, and background-thread slots
- AudioWorldModel with 200 ms, 1 s, and 4 s horizon-aware prediction heads and rollout features
- Learned GoalManagerModel, AgentStateModel, SemanticMemoryDistiller, SkillMemoryDistiller, and CalibrationModel components that feed runtime side stores without hand-seeded skill routines or runtime calibration reweighting
- SalienceExecutive / ArbitrationExecutive covering the full action space:
  - continue
  - continue_with_more_compute
  - monitor_background
  - defer
  - partial_interrupt
  - full_interrupt
  - checkpoint_and_suspend
  - terminate
  - retrieve_memory
  - resume_prior_thread
  - spawn_new_thread
- SpeechDecoder that generates semantic and acoustic logits for every codebook plus waveform output
- Inspectable silent-state, inner-audio, and thread views through a FastAPI server and browser dashboard
- Human-operated curriculum management plus concrete stage trainers via mind-curriculum and mind-train
- Stage-aware training data flow with horizon-aligned audio targets for Stage 2 and trajectory / synthetic-scenario support for Stages 4 and 5
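The timing figures in the list above fit together arithmetically. The following is a minimal sketch of that arithmetic only, not code from the repository: the constant and function names are hypothetical, and it assumes the acoustic stream emits one frame per 20 ms hop.

```python
from fractions import Fraction

# Values quoted in the feature list; the names are illustrative only.
WINDOW_MS = 40
HOP_MS = 20
ACOUSTIC_HZ = Fraction(1000, HOP_MS)   # one frame per hop -> 50 Hz
SEMANTIC_HZ = Fraction(25, 2)          # 12.5 Hz
HORIZONS_MS = (200, 1000, 4000)        # world-model prediction horizons


def frames_in(duration_ms, rate_hz):
    """Exact number of frames (possibly fractional) spanned by a duration."""
    return Fraction(duration_ms, 1000) * rate_hz


def full_windows(num_ms, window_ms=WINDOW_MS, hop_ms=HOP_MS):
    """Count complete analysis windows in num_ms of audio, with no padding."""
    if num_ms < window_ms:
        return 0
    return 1 + (num_ms - window_ms) // hop_ms


if __name__ == "__main__":
    print("acoustic frames per semantic frame:", ACOUSTIC_HZ / SEMANTIC_HZ)
    for horizon in HORIZONS_MS:
        print(horizon, "ms:",
              frames_in(horizon, ACOUSTIC_HZ), "acoustic frames,",
              frames_in(horizon, SEMANTIC_HZ), "semantic frames")
```

Under these assumptions there are four acoustic frames per semantic frame, and the 200 ms horizon spans 10 acoustic frames but only 2.5 semantic frames, so it does not fall on a semantic frame boundary.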
Python 3.11+ is required.
Create and activate a Python 3.11+ virtual environment first. On a fresh checkout, verify the active interpreter before installing:
python --version

Install the base package:

python -m pip install -e .

Install optional extras for live audio, local ANN storage, and development tooling:

python -m pip install -e ".[audio,memory,dev]"

mind-curriculum init
mind-curriculum status
mind-curriculum stage-brief stage0_tokenizer_pretraining

Use the templates and workflow documented in docs/training-curriculum.md:

mind-curriculum register-dataset /path/to/dataset_manifest.yaml
mind-curriculum create-run stage0_tokenizer_pretraining "tokenizer-pretrain-v1" --owner "you" --dataset your_dataset_id

mind-train execute-run <run_id> --root ops/curriculum

mind-server --host 127.0.0.1 --port 8000 --checkpoint-dir /path/to/promoted/bundle

Then open:

- http://127.0.0.1:8000/
- http://127.0.0.1:8000/inspect/full
The server exposes the main inspection surfaces used by the dashboard:
- GET /inspect/full
- GET /inspect/state
- GET /inspect/inner-audio
- GET /inspect/threads
- GET /inspect/stores
- POST /inject/audio
- POST /threads/resume
- POST /threads/terminate
- POST /sleep/consolidate
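The GET inspection routes listed above can be polled from a script as well as from the dashboard. This is a hedged sketch using only the standard library. The helper names are hypothetical, the base URL is the one from the mind-server example, and it assumes the endpoints return JSON, which this README does not state.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000/"
# The GET inspection views named in this README.
INSPECT_VIEWS = ("full", "state", "inner-audio", "threads", "stores")


def inspect_url(view, base_url=BASE_URL):
    """Build the URL for one inspection view, rejecting unknown names."""
    if view not in INSPECT_VIEWS:
        raise ValueError(f"unknown inspection view: {view!r}")
    return urljoin(base_url, f"inspect/{view}")


def fetch_inspect(view, base_url=BASE_URL, timeout=5.0):
    """GET one inspection view and decode the body.

    Assumption: the response body is JSON. The response schema is not
    documented here, so the result is returned without interpretation.
    """
    request = Request(inspect_url(view, base_url),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `fetch_inspect("threads")` would return whatever the thread view reports. The POST routes take request bodies that are not described here, so the sketch leaves them out.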
The repository includes a YAML-backed, human-operated training workflow for staged curriculum execution.
Notable contract details in the current trainer stack:
- Stage 2 consumes distinct horizon-aligned future targets instead of reusing one future clip for every head.
- Stage 3 supervises all semantic and acoustic codebooks and includes loopback/self-other losses.
- Stage 4 uses trajectory rollouts with runtime-consistent fast-loop, deliberative, goal, agent, and calibration features.
- Stage 5 now has a real non-zero training path for goal promotion, persistence, retirement, G0 alignment, memory distillation, and agent-state learning.
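To illustrate the Stage 2 point about distinct horizon-aligned targets, here is a minimal sketch of giving each prediction head its own future target. It is not the repository's dataset code. The function name is hypothetical, and it assumes a 20 ms frame hop, the 200 ms / 1 s / 4 s horizons from the feature list, and one target frame per head (a real trainer may use clips or windows instead).

```python
def horizon_targets(frames, t, horizons_ms=(200, 1000, 4000), hop_ms=20):
    """Return {horizon_ms: future frame} for the frame at index t.

    Each head gets the frame located exactly one horizon ahead of t,
    rather than every head sharing a single future clip. Horizons that
    run past the end of the sequence map to None so the caller can mask
    the corresponding loss term.
    """
    targets = {}
    for horizon in horizons_ms:
        offset = horizon // hop_ms          # horizon expressed in frames
        index = t + offset
        targets[horizon] = frames[index] if index < len(frames) else None
    return targets


if __name__ == "__main__":
    frames = list(range(300))               # stand-in for 6 s of frames
    print(horizon_targets(frames, t=5))
```

Mapping an out-of-range horizon to None is a design choice of this sketch: it keeps the short-horizon heads trainable near the end of a clip instead of dropping the whole example.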
Main commands:
mind-curriculum init
mind-curriculum status
mind-curriculum next-stage
mind-curriculum stage-brief stage0_tokenizer_pretraining
mind-train execute-run <run_id> --root ops/curriculum

See docs/training-curriculum.md for dataset manifest expectations, stage ordering, checkpoint logging, evaluation, and promotion flow.
This repository intentionally does not hide missing training behind heuristic runtime fallbacks.
- The default runtime path expects trained checkpoints.
- If config.model.strict_checkpoint_loading is enabled, booting without a bundle raises an error.
- mind-server --allow-random-init and config.model.allow_random_init_for_training = True are available for debugging and test coverage only. They are not substitutes for trained or promoted checkpoints.
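The fail-fast policy described above can be pictured as a small decision function. This is only a sketch: the ModelConfig class and resolve_init_mode helper are hypothetical stand-ins, the two field names mirror the config keys quoted in this section, and the precedence between the flags is an assumption, not the runtime's documented behavior.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelConfig:
    # Hypothetical stand-in for config.model; defaults are assumptions.
    strict_checkpoint_loading: bool = True
    allow_random_init_for_training: bool = False


def resolve_init_mode(cfg, checkpoint_dir):
    """Decide how models would be initialized, failing fast when unsure."""
    if checkpoint_dir is not None and Path(checkpoint_dir).is_dir():
        return "checkpoint"
    if cfg.allow_random_init_for_training:
        # Debugging and test coverage only; not a substitute for a bundle.
        return "random_init"
    if cfg.strict_checkpoint_loading:
        raise FileNotFoundError(
            f"no checkpoint bundle at {checkpoint_dir!r}; refusing to boot"
        )
    # The README does not say what happens with both flags off. This
    # sketch stays conservative and refuses as well.
    raise FileNotFoundError(
        f"no checkpoint bundle at {checkpoint_dir!r} and random init disabled"
    )
```

The sketch only checks that the bundle directory exists. Validating its contents is outside what this section describes.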
- src/mind/models/v1.py: trainable v1 model bundle
- src/mind/runtime_engine/: runtime orchestration package and subsystem side-store integration
- src/mind/api/server.py: inspection API and dashboard server
- src/mind/audio_io.py: optional live audio I/O
- src/mind/training/: curriculum admin, stage-aware datasets, trainers, and runner
- docs/training-curriculum.md: operator runbook
- tests/: unit tests
- Apple Silicon: prefer a native arm64 Python environment; PyTorch can run on CPU or mps.
- Linux: install PortAudio if you want live audio via sounddevice.
- Windows: sounddevice can use ASIO if SD_ENABLE_ASIO=1 is set before import and the installed PortAudio build supports it.
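The Windows note depends on ordering: the environment variable has to be set before sounddevice is imported. A minimal sketch of that ordering follows. The helper name is hypothetical, and the import is guarded because sounddevice is an optional extra and needs a PortAudio library at import time.

```python
import os
import sys


def enable_asio_if_windows(environ=os.environ, platform=sys.platform):
    """Set SD_ENABLE_ASIO=1 on Windows unless the user already chose a value.

    Returns True when running on Windows, False otherwise.
    """
    if platform == "win32":
        environ.setdefault("SD_ENABLE_ASIO", "1")
        return True
    return False


# Must run before the import below for the setting to take effect.
enable_asio_if_windows()

try:
    import sounddevice as sd
except (ImportError, OSError):
    # ImportError: the audio extra is not installed.
    # OSError: sounddevice could not find a PortAudio library.
    sd = None
```

Whether ASIO host APIs then appear still depends on the installed PortAudio build, as noted above.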
Run the test suite:
python -m unittest discover -s tests

Use the python from the active Python 3.11+ environment. If you want pytest and ruff, install the development extra:
python -m pip install -e ".[dev]"
python -m ruff check .

Useful Python extras and libraries used in this project include: