ALIENS — Adaptive Literacy Instruction in English and Spanish

ALIENS is a hybrid deterministic-LLM system for adaptive reading instruction. It takes a single canonical source passage, generates learner-appropriate variants that preserve the same protected meaning units and lesson-critical vocabulary, and diagnoses each learner's reading outcome to guide the next cycle. It operates in English and Spanish and includes a dedicated extension for learners with dyslexia.

Overview

A classroom using ALIENS works like this:

A curriculum author writes one source passage at the level the subject demands.
ALIENS extracts the passage's protected meaning units, causal sequence, and required vocabulary — the canonical representation — in a single LLM call.
For each learner, ALIENS generates a family of candidate variants tuned to that learner's band, scaffold profile, and need dimensions.
The deterministic engine scores every candidate against the canonical constraints, selects the best one, and delivers it.
The learner reads their variant and completes a six-item diagnostic assessment.
ALIENS scores the assessment, constructs a signal profile, and classifies the outcome into one of seven diagnostic labels.
The learner state is updated deterministically. The next cycle adapts.

Every learner reads about the same events, encounters the same required vocabulary, and can participate in the same class discussion. What changes is the surface complexity of the text through which they access that shared content.

Repository structure

aliens/
├── alien_system.py              # Core engine — English        (2,349 lines)
├── alien_system_es.py           # Spanish edition              (2,867 lines)
├── alien_dyslexia.py            # Dyslexia extension module      (897 lines)
├── test_alien.py                # English test suite  — 389 tests
├── test_alien_es.py             # Spanish test suite  — 184 tests
├── test_alien_dyslexia.py       # Dyslexia test suite — 132 tests
├── test_fixes.py                # Fix regression suite  — 86 tests
├── test_system.py               # System integration test — 167 tests
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
├── SECURITY.md
├── ALIEN_API_INTEGRATION_GUIDE.md
├── PRODUCTION_READINESS_REPORT.md
├── DESIGN_JUSTIFICATION.md
└── LICENSE

Total: 11,253 lines across eight Python files. 958 tests, all passing. Standard library only — no external dependencies.

How it works

Phase 1 — Prepare

canonicalize_passage()   → CanonicalPassage          [1 LLM call]
build_candidate_plan()   → list[dict]                [deterministic]
generate_candidates()    → list[CandidatePassage]    [1 LLM call]
score_candidate()        → DeterministicScores       [deterministic, per candidate]
estimate_fit()           → dict[str, FitEstimate]    [1 LLM call]
select_candidate()       → CandidatePassage          [deterministic]
generate_assessment()    → AssessmentPackage         [1 LLM call]

Phase 2 — Complete

score_mcq()              → float                     [deterministic]
score_retell()           → dict                      [1 LLM call, deterministic fallback]
build_reading_signals()  → ReadingSignals            [deterministic]
diagnose_outcome()       → DiagnosisLabel            [1 LLM call, deterministic fallback]
update_learner_state()   → LearnerState              [deterministic]

6 LLM calls per cycle. Every hard constraint — meaning coverage, sequence validation, vocabulary coverage, candidate selection, MCQ scoring, learner state updates — is deterministic. LLM calls handle open-ended generation and qualitative judgment; the engine enforces correctness.

Canonical invariants

The deterministic engine blocks any candidate that:

Drops a required meaning unit (meaning coverage below the scaled threshold)
Presents meaning units out of the protected sequence
Omits a required vocabulary term
Fails the LLM's own self-audit (meaning_preserved = False)

Candidates with only surface-measure warnings (FK outside tolerance, length deviation) are eligible for DEGRADED selection — the best available candidate is used and a differentiated log message is written, distinguishing FK-only degradation (expected for academic vocabulary at lower bands) from structural degradation (LLM generation failure requiring quality review).

Quick start

Requirements

Python 3.11+
Standard library only (re, json, logging, dataclasses, uuid, time)
An LLM backend implementing the LLMBackend protocol

Installation

git clone https://github.com/jordankingAI/aliens.git
cd aliens
python test_alien.py           # 389 tests
python test_alien_es.py        # 184 tests
python test_alien_dyslexia.py  # 132 tests
python test_fixes.py           #  86 tests
python test_system.py          # 167 tests — full system integration

Minimal usage — English

from alien_system import AdaptiveReadingSystem, LearnerState, Level, ReadingTelemetry

class MyLLM:
    def complete_json(self, system_prompt: str, user_prompt: str) -> dict:
        # call your LLM API here and return parsed JSON
        ...

system  = AdaptiveReadingSystem(llm=MyLLM())
learner = LearnerState(
    learner_id="student_001",
    current_band=5.0,
    vocabulary_need=Level.HIGH,
    syntax_need=Level.MEDIUM,
)

# Prepare cycle — 4 LLM calls
prep = system.prepare_cycle(
    source_text="Your source passage here...",
    passage_id="passage_001",
    instructional_objective="Understand the main theme.",
    learner=learner,
)
# prep.selected_candidate.text  → deliver this to the learner
# prep.assessment.items         → present these questions

# Complete cycle — 2 LLM calls
outcome = system.complete_cycle(
    learner=prep.prepared_learner,  # journey-initialised state from prepare_cycle
    prep=prep,
    learner_answers={
        "Q1": "B", "Q2": "C", "Q3": "A", "Q4": "B",
        "Q5": "Learner's retell text here...",
        "Q6": 3,
    },
    telemetry=ReadingTelemetry(
        fluency_score=0.75,
        hint_use_rate=0.15,
        reread_count=2,
        completion=True,
    ),
)

print(outcome.diagnosis)        # e.g. DiagnosisLabel.WELL_CALIBRATED
print(outcome.updated_learner)  # updated LearnerState — store for next cycle

Dyslexia-aware usage

from alien_dyslexia import (
    DyslexiaAwareSystem, DyslexicReadingTelemetry, seed_dyslexic_learner
)

# CRITICAL: seed from LISTENING COMPREHENSION, not a decoding or silent-reading score.
# Seeding from a decoding score is the most damaging deployment error.
learner = seed_dyslexic_learner(
    learner_id="student_002",
    comprehension_band=7.5,
    vocabulary_need=Level.MEDIUM,
)

system = DyslexiaAwareSystem(llm=MyLLM())

telemetry = DyslexicReadingTelemetry(
    fluency_score=0.29,     # slow decoding — not comprehension difficulty
    hint_use_rate=0.51,     # pronunciation lookups — not meaning scaffolding
    reread_count=10,
    completion=False,       # timing constraint — not cognitive overload
    oral_retell_text="The learner's spoken retell transcribed here...",
)

Learner model

@dataclass(frozen=True)
class LearnerState:
    learner_id:             str
    current_band:           float          # Flesch-Kincaid grade equivalent
    vocabulary_need:        Level          # LOW / MEDIUM / HIGH
    syntax_need:            Level
    cohesion_need:          Level
    support_dependence:     Level
    readiness_to_increase:  Level
    recent_outcomes:        tuple[DiagnosisLabel, ...]  # rolling window (default 3)
    target_band:            float | None   # source FK of current passage
    entry_band:             float | None   # band when this passage began
    cycles_on_passage:      int

Level is an ordered three-value enum: LOW < MEDIUM < HIGH. Each field exposes .up() and .down() methods capped at the enum boundaries. Learner state updates are fully deterministic and apply the same rules on every cycle with no judgment calls.

Diagnosis labels

Label	Primary trigger	Band	Key need update
`underchallenged`	comp ≥ 0.85 · fluency ≥ 0.75 · hints ≤ 0.10 · retell ≥ 0.75	+1 step	readiness → HIGH
`well_calibrated`	adequate completion, no barrier	—	readiness UP
`successful_but_support_dependent`	comp ≥ 0.70, hints ≥ 0.30	—	support_dependence UP
`cohesion_inference_barrier`	comp ≥ 0.70, inference < 0.55	—	cohesion_need UP
`vocabulary_barrier`	comp < 0.70, vocab_need ≥ syntax_need	—	vocabulary_need UP
`syntax_barrier`	comp < 0.70, syntax_need > vocab_need	−1 step if severe	syntax_need UP
`overloaded`	comp < 0.50, OR incomplete, OR (hints ≥ 0.30 AND retell < 0.50)	−1 step	all needs UP

Diagnosis is LLM-primary with a deterministic fallback. If the LLM call fails or returns an invalid label, the engine applies the rule table above without interrupting the cycle.

Dyslexia extension

alien_dyslexia.py is a zero-modification wrapper around alien_system.py — no changes to the core engine. It adds three layers:

Data types — DyslexicLearnerState, DyslexicReadingTelemetry (with oral_retell_text), DyslexicReadingSignals (raw/adjusted signal pairs).

Engine — DyslexiaAwareDeterministicEngine adds a decoding_support slot to the candidate plan and checks for decoding_barrier before applying standard diagnostic labels.

Orchestrator — DyslexiaAwareSystem applies three signal adjustments when decoding_disability = True AND comprehension ≥ 0.70:

Signal	Adjustment	Rationale
`fluency_score`	`max(raw, comprehension × 0.9)`	Slow decoding ≠ poor understanding
`hint_use_rate`	`raw × 0.5`	Pronunciation lookups ≠ meaning scaffolding
`completion`	`True`	Timing constraint ≠ cognitive overload

Without these corrections, a dyslexic learner with grade-level comprehension and slow decoding receives the same raw signal profile as a learner who is genuinely overloaded. The system would diagnose overloaded, drop the band, and consign the learner to material well below their intellectual level. The adjustments prevent this misclassification.

The oral_retell_text field in DyslexicReadingTelemetry allows a transcribed spoken retell to be scored in place of the written Q5 response, which dyslexia affects directly.

Spanish edition

alien_system_es.py is a self-contained Spanish port sharing no runtime dependencies with the English module except ALIENError.

Feature	English	Spanish
Readability	Flesch-Kincaid Grade Level	Szigriszt-Pazos Perspicuity Index (1992)
Syllable counting	English rules	Spanish phonological rules (vowel nuclei)
Stemmer	English inflection rules	Spanish inflection rules (gender, number, verb morphology)
Stopword list	76 words	176 words including gendered variants
Negation set	`no, not, never, none…`	`no, nunca, jamás, nada, nadie, ningún…`
DEGRADED log language	English	Spanish
`ALIENError`	Defined in this module	`from alien_system import ALIENError`

alien_system_es.ALIENError is alien_system.ALIENError evaluates to True. A single except ALIENError handler catches errors from both modules — essential for any language-routing pipeline.

Engine configuration

@dataclass(frozen=True)
class EngineConfig:
    band_step:                      float = 0.8
    # FK tolerance — see domain guidance below
    fk_tolerance:                   float = 1.2
    overall_meaning_threshold:      float = 0.75  # scales down with passage distance
    vocabulary_threshold:           float = 0.85
    length_deviation_threshold:     float = 0.40
    severe_comprehension_threshold: float = 0.50
    low_hint_use_threshold:         float = 0.10
    high_hint_use_threshold:        float = 0.30
    history_limit:                  int   = 3

FK tolerance by domain

Domain	Recommended `fk_tolerance`
General / narrative fiction	1.2 – 2.0 (default covers most cases)
Academic / technical content	2.5 – 4.0
Highly specialised content	4.0 – 6.0 (treat FK as advisory only)

For academically dense passages, required domain vocabulary is inherently polysyllabic regardless of sentence simplicity. Rewrites for lower-band learners will consistently score FK above the target band. This is a property of the domain vocabulary, not a generation failure. When all candidates are DEGRADED solely due to FK/length warnings, prepare_cycle logs a specific "DEGRADED (FK/length surface warnings only)" message to distinguish it from structural DEGRADED (missing meaning units or broken sequence).

Testing

python test_alien.py           # 389 tests — full English engine
python test_alien_es.py        # 184 tests — full Spanish engine
python test_alien_dyslexia.py  # 132 tests — dyslexia extension
python test_fixes.py           #  86 tests — fix regression suite
python test_system.py          # 167 tests — system integration (EN + ES pipelines end-to-end)

# Total: 958 tests. Standard library only. No test runner required.

The fix regression suite covers five specific repairs validated against the bugs discovered during end-to-end simulation:

Fix	What it tests
1	Multi-word anchors split word-by-word (15 anchor phrases, EN + ES)
2	`sentence_unit_match_score` returns `0.0` not `1.0` for empty `unit_toks`
3	`ALIENError` shared identity across EN and ES modules
4	DEGRADED log differentiates FK-only from structural failures
5	`split_sentences` handles terminal punctuation inside quotation marks

LLM backend

class LLMBackend(Protocol):
    def complete_json(self, system_prompt: str, user_prompt: str) -> dict[str, Any]:
        """Return parsed JSON matching the requested output contract."""
        ...

The TaskRoutingMockLLM class is provided for testing. It routes by the task key in the JSON user prompt and supports error injection for testing fallback paths:

from alien_system import TaskRoutingMockLLM

mock = TaskRoutingMockLLM(
    responses={
        "canonicalize_passage": {...},
        "generate_candidates":  {...},
        "estimate_fit":         {...},
        "generate_assessment":  {...},
        "score_retell":         {...},
        "diagnose_outcome":     {...},
    },
    error_on_tasks={"score_retell"},  # test deterministic fallback
)

All six task types have system prompts in PromptLibrary. Output contracts are validated before parsing; a ValidationError is raised on contract violations.

Deployment guidance

Minimum viable integration

Implement LLMBackend.complete_json() for your provider.
Seed each learner with LearnerState (standard) or seed_dyslexic_learner() (dyslexic). For dyslexic learners, comprehension_band must come from a listening comprehension instrument, not a silent reading or decoding score.
Store outcome.updated_learner between sessions. All persistence is the caller's responsibility — ALIENS holds no state.
Set fk_tolerance in EngineConfig based on your content domain.

What ALIENS does not provide

A normed reading assessment. current_band is an instructional working estimate updated by performance signals, not a validated Lexile or DRA score.
A dyslexia screening or diagnostic tool. decoding_disability = True must be set by a qualified practitioner based on a formal assessment. ALIENS uses the flag to prevent misclassification; it does not detect dyslexia.
A teacher dashboard or reporting interface. ALIENS is a backend engine.
Factual accuracy validation of generated content. The system validates structural contracts — meaning units, sequence, vocabulary. Human review of generated passages is strongly recommended before classroom deployment.

Data and privacy

ALIENS stores no learner data. The LearnerState is a frozen dataclass passed in and returned by value. No network calls are made outside the LLMBackend interface the caller provides.

Pedagogical basis

ALIENS implements five evidence-based instructional design principles:

Zone of Proximal Development (Vygotsky, 1978) — the band system, candidate plan, and push/safety-net slot structure.
Comprehensible Input / i+1 (Krashen, 1985) — the five scaffold dimensions and candidate generator prompt contracts.
Scaffolding and Gradual Release (Wood, Bruner & Ross, 1976; Pearson & Gallagher, 1983) — support_dependence tracking and the withdrawal pathway through well_calibrated cycles.
Formative Assessment for Learning (Black & Wiliam, 1998; Wiliam, 2011) — the six-item diagnostic, seven diagnosis labels, and immediate deterministic state update.
Curriculum Coherence / Shared Content Principle (Schmidt et al., 2005; Porter, 2002) — the canonical passage contract ensures every learner variant covers the same meaning units, in the same sequence, with the same required vocabulary.

For the full theoretical justification, empirical basis, and mapping of each principle to specific system components, see DESIGN_JUSTIFICATION.md.

Contributing

See CONTRIBUTING.md.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ALIENS — Adaptive Literacy Instruction in English and Spanish

Contents

Overview

Repository structure

How it works

Phase 1 — Prepare

Phase 2 — Complete

Canonical invariants

Quick start

Requirements

Installation

Minimal usage — English

Dyslexia-aware usage

Learner model

Diagnosis labels

Dyslexia extension

Spanish edition

Engine configuration

FK tolerance by domain

Testing

LLM backend

Deployment guidance

Minimum viable integration

What ALIENS does not provide

Data and privacy

Pedagogical basis

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github		.github
.gitignore		.gitignore
ALIEN_API_INTEGRATION_GUIDE.md		ALIEN_API_INTEGRATION_GUIDE.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
DESIGN_JUSTIFICATION.md		DESIGN_JUSTIFICATION.md
LICENSE		LICENSE
PRODUCTION_READINESS_REPORT.md		PRODUCTION_READINESS_REPORT.md
README.md		README.md
SECURITY.md		SECURITY.md
alien_dyslexia.py		alien_dyslexia.py
alien_system.py		alien_system.py
alien_system_es.py		alien_system_es.py
test_alien.py		test_alien.py
test_alien_dyslexia.py		test_alien_dyslexia.py
test_alien_es.py		test_alien_es.py
test_fixes.py		test_fixes.py
test_system.py		test_system.py

Folders and files

Latest commit

History

Repository files navigation

ALIENS — Adaptive Literacy Instruction in English and Spanish

Contents

Overview

Repository structure

How it works

Phase 1 — Prepare

Phase 2 — Complete

Canonical invariants

Quick start

Requirements

Installation

Minimal usage — English

Dyslexia-aware usage

Learner model

Diagnosis labels

Dyslexia extension

Spanish edition

Engine configuration

FK tolerance by domain

Testing

LLM backend

Deployment guidance

Minimum viable integration

What ALIENS does not provide

Data and privacy

Pedagogical basis

Contributing

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages