Website: https://capa-language.com/
A complete front-end for the Capa programming language: lexer, parser, semantic analyzer, Python transpiler and runtime, all hand-written in Python.
For the first time, Capa programs run.
What would Capa have caught? A walkthrough of the event-stream npm supply-chain attack from 2018, showing concretely how the language structurally prevents that class of attack.
$ python -m capa --run examples/grades.capa
=== Roster ===
Ana: 17.5 (Excellent)
Bruno: 13.0 (Pass)
Carla: 8.5 (Fail)
Diogo: 15.5 (Good)
Eva: 11.0 (Pass)
Filipe: 19.0 (Excellent)
Statistics:
Average: 14.083333333333334
Passed: 5
Failed: 1

Capa/
├── capa/ # Python package implementing the compiler
│ ├── __init__.py # public package exports
│ ├── __main__.py # enables `python -m capa ...`
│ ├── cli.py # command-line utility
│ ├── tokens.py # TokenKind, Token, Pos, KEYWORDS
│ ├── errors.py # LexerError with pedagogical formatting
│ ├── lexer.py # the lexer implementation
│ ├── capa_ast.py # AST nodes + dump pretty-printer
│ ├── parser.py # recursive-descent parser
│ ├── typesys.py # internal type representation
│ ├── analyzer.py # name resolution + type checking + capabilities
│ ├── transpiler.py # codegen for Python 3.10+
│ └── runtime/
│ └── __init__.py # Result, Option, Stdio, Fs, ..., Unsafe, py_import
├── tests/ # 536 unit + end-to-end tests
│ ├── test_lexer.py
│ ├── test_parser.py
│ ├── test_analyzer.py
│ └── test_transpiler.py # transpile and execute Capa programs
├── examples/ # 21 .capa files demonstrating the language
│ ├── hello.capa # hello world
│ ├── basics.capa # several constructs
│ ├── tasks.capa # canonical EBNF example
│ ├── grades.capa # non-trivial program (~110 lines)
│ ├── io.capa # exercises Result and the ? operator
│ ├── demo_event_stream.capa # supply-chain attack walkthrough (see docs/)
│ ├── net_attenuation.capa # capability attenuation (WhitePaper §4.3)
│ ├── user_capabilities.capa # user-defined capabilities (WhitePaper §4.6)
│ ├── python_interop.capa # Python boundary under the Unsafe capability
│ └── errors.capa # test fixture with semantic errors
├── docs/ # tutorial, reference, stdlib, getting-started
├── Capa-EBNF.md # formal grammar of the language
├── Capa-WhitePaper.md # technical rationale + roadmap
├── pyproject.toml # package metadata
├── LICENSE # MIT
└── README.md
Note on module names: capa_ast.py (instead of ast.py) and
typesys.py (instead of types.py) avoid colliding with Python stdlib
modules; such collisions cause subtle circular-import errors when the
package is invoked via python -m capa.
.capa source
  ↓ Lexer        tokens (significant indentation)
  ↓ Parser       AST
  ↓ Analyzer     name resolution + types
  ↓ Transpiler   Python 3.10+ code
  ↓ Runtime      imports Result, Stdio, Fs, ...
  ↓ python       the Capa program runs
Download the binary for your platform from the latest release:
| Platform | File |
|---|---|
| Linux x86_64 | capa-linux-x86_64 |
| macOS Apple Silicon (M1 and later) | capa-macos-arm64 |
| Windows x86_64 | capa-windows-x86_64.exe |
No pre-built binary ships for Intel Macs; install from source (see below).
# Linux / macOS
curl -L https://github.com/nelsonduarte/capa/releases/latest/download/capa-linux-x86_64 -o capa
chmod +x capa
./capa --run hello.capa

The binary bundles a Python interpreter and the Capa runtime via
PyInstaller, so end users do not need Python installed. Each release
also ships a .sha256 checksum file alongside each binary; verify
the download before running:
sha256sum -c capa-linux-x86_64.sha256

From the project root (Capa/):
pip install -e .  # or just use `python -m capa` directly

To build a binary yourself:
pip install pyinstaller>=6.0
pyinstaller deploy/capa.spec
./dist/capa --run examples/hello.capa

A VSCode extension providing syntax highlighting lives in
vscode/. It is not on the Marketplace yet; install it
manually with a symlink (`ln -s "$(pwd)/vscode" ~/.vscode/extensions/capa-language`
on macOS/Linux, `New-Item -ItemType Junction ...` on Windows) and
reload VSCode. A full LSP is on the roadmap.
A five-page static site lives in docs/:
- docs/index.html, landing page
- docs/why.html, the case for the language
- docs/tour.html, guided language tour
- docs/start.html, install + first program + CLI reference
- docs/roadmap.html, honest status and roadmap
No JavaScript, no framework, no external fonts; one stylesheet (docs/style.css).
Preview locally with cd docs && python -m http.server and open
http://localhost:8000/. Once GitHub Pages is enabled for the docs/
directory it will be the public-facing site for the project.
# Tokenize
python -m capa examples/hello.capa
# Parse (AST)
python -m capa --parse examples/tasks.capa
# Analyze (lex + parse + semantic check)
python -m capa --check examples/io.capa
# Transpile to Python (prints to stdout)
python -m capa --transpile examples/grades.capa
# Run the Capa program (transpile + execute)
python -m capa --run examples/grades.capa
# Emit a JSON capability manifest (per-function caps + attributes)
python -m capa --manifest examples/manifest_demo.capa
# Emit a CycloneDX 1.5 SBOM with capability metadata embedded as properties
python -m capa --cyclonedx examples/manifest_demo.capa
# Emit a self-contained HTML doc page built from /// doc comments
python -m capa --doc examples/documented_demo.capa > demo.html

from capa import Lexer, Parser, analyze, transpile
source = open("program.capa", encoding="utf-8").read()
tokens = Lexer(source, filename="program.capa").lex()
module = Parser(tokens, source=source, filename="program.capa").parse_module()
result = analyze(module, source=source, filename="program.capa")
if not result.ok:
    for e in result.errors:
        print(e.format())
else:
    code = transpile(module, filename="program.capa")
    print(code)

python -m unittest discover tests

536 tests (lexer + parser + analyzer + transpiler). The transpiler suite actually executes the generated Python and checks stdout: the only honest way to test a transpiler.
| Capa | Generated Python |
|---|---|
| `fun f(x: T) -> R` | `def f(x):` (no annotations) |
| `let x = e` / `var x = e` | `x = e` |
| `if`/`while`/`for` | the same |
| `match` | `match`/`case` (Python 3.10+) |
| `type T { ... }` (struct) | `@dataclass class T:` |
| `type T = A \| B(P)` (sum) | classes + alias `T = A \| B` |
| `trait T` | `class _Trait_T:` (informative) |
| `impl T` | methods attached to `T` |
| `obj.meth(x)` | `obj.meth(x)` |
| `Some(x)` / `Ok(x)` / `Err(e)` | the same (runtime classes) |
| `e?` (try) | `_capa_try(e)` + `@_capa_wrap` decorator |
| `"hi ${name}"` | `f"hi {name}"` |
`if`/`while`/`for` are statements. The ternary `if cond then a else b` is the only expression form of `if`.
- String interpolation is recognised but not recursively tokenised in
  the lexer; the transpiler handles it afterwards. It works for simple
  cases (`${ident}`, `${a.b}`, `${a + b}`), but the content between
  `${...}` is interpreted as direct Python code, not pure Capa.
- Raw strings (`r"..."`) are not supported.
- **Capability discipline** (three layers):

  **Structural layer (v1):** capabilities (`Stdio`, `Fs`, `Net`, `Env`,
  `Proc`, `Clock`, `Random`, `Db`, `Unsafe`) can only appear in function
  parameters. The analyzer rejects:
  - capabilities as struct fields
  - capabilities as variant payloads
  - capabilities as function return types
  - capabilities as constant types
  - capabilities bound in local `let`/`var`
  - capabilities inside generic types (`List<Stdio>`, `Option<Fs>`, ...) or tuples

  **Flow layer (v2):**
  - Non-aliasing in calls: the same capability cannot appear as
    multiple arguments of the same call, nor simultaneously as receiver
    and argument. `f(stdio, stdio)` is an error.
  - Must-use: a capability declared as a parameter must be used at
    least once in the function body. Convention: prefix the name with
    `_` to silence the warning (idiomatic in Rust and Haskell).

  **Linear layer (v3), the `consume` keyword:**
  - Optional move semantics: marking a parameter as `consume cap: Cap`
    indicates the function consumes (takes ownership of) the passed
    capability. After the call, the caller can no longer use that
    capability.
  - Flow analysis with fork/merge: in `if`/`elif`/`else` and `match`, we
    snapshot the consumed set before each branch and take a conservative
    union afterwards; if any branch consumes, the cap is considered
    consumed from that point on. This is the rule used by Rust, and it
    prevents use after a potential consume.
  - Loops with dry-run + redo: for `while` and `for`, we perform a
    silent pass over the body to discover which caps will be consumed;
    we then pre-mark those caps as consumed and do the real pass. This
    catches "consume in iteration 1, use in iteration 2", the linearity
    failure we used to miss in loops.
  - By default, parameters are borrows: the caller can keep using the
    cap. Typical pattern: several borrows followed by a final consume.

  Together, the three layers give full linearity for common usage:
  capabilities cannot be duplicated structurally, no aliasing per call,
  must-use, and consume is strictly verified with flow analysis
  including branches and loops.
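The fork/merge rule above can be sketched in a few lines of Python. This is an illustrative sketch only; the function and variable names are assumptions, and the real analyzer in capa/analyzer.py is richer.

```python
# Minimal sketch of the fork/merge rule for `consume` tracking.
# Hypothetical names; illustrative only.

def merge_branches(consumed_before: set, branch_results: list) -> set:
    """Conservative union: if ANY branch consumes a cap, it counts as
    consumed after the if/match (the same rule Rust uses)."""
    merged = set(consumed_before)
    for consumed_after_branch in branch_results:
        merged |= consumed_after_branch
    return merged

# Example: `fs` is consumed only in the then-branch...
before = set()
then_branch = {"fs"}   # this branch called a `consume` function
else_branch = set()    # this branch left `fs` alone
after = merge_branches(before, [then_branch, else_branch])
assert "fs" in after   # ...but it still counts as consumed afterwards
```

The conservative union is what makes `use after potential consume` an error even when only one branch actually consumes.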
- **Local generic inference.** The checker infers type arguments in three contexts:
  - Variant constructors with a payload: `Ok(42)` produces
    `Result<Int, ?>`, `Some("hi")` produces `Option<String>`. Type
    params not inferable from the payload (like `E` in `Result<T, E>`)
    remain `TyUnknown`.
  - Function calls on generic functions: given
    `fun first<T>(xs: List<T>) -> T`, the call `first([1, 2, 3])`
    infers `T = Int` and returns `Int`.
  - Generic struct literals: `Pair { first: 1, second: "x" }` produces
    `Pair<Int, String>` if `Pair<A, B>` was declared.

  Inference is local (each call is an isolated problem, no
  let-polymorphic generalisation). It is implemented via simple
  unification with substitutions, no constraint-set solving. Good
  enough for common cases; where inference fails, it returns
  `TyUnknown`, which is compatible with anything.
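Unification with substitutions, as described above, fits in a handful of lines. A minimal sketch (the type representation and names here are invented for illustration and do not match capa/typesys.py):

```python
# Toy unification: a type is either a type-variable name ("T") or a
# tuple like ("List", inner). Illustrative only.

def unify(param, arg, subst):
    """Unify a parameter type against an argument type, filling `subst`
    with bindings for type variables."""
    if isinstance(param, str) and param.isupper():      # type variable
        subst.setdefault(param, arg)                    # first binding wins
        return subst[param] == arg                      # later uses must agree
    if isinstance(param, tuple) and isinstance(arg, tuple):
        return (param[0] == arg[0]
                and all(unify(p, a, subst)
                        for p, a in zip(param[1:], arg[1:])))
    return param == arg

# fun first<T>(xs: List<T>) -> T, called as first([1, 2, 3]):
subst = {}
assert unify(("List", "T"), ("List", "Int"), subst)
assert subst == {"T": "Int"}    # T = Int, so the call returns Int
```

Because each call starts with a fresh substitution, inference stays local, exactly the trade-off the text describes.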
- **Closures (lambdas) v2.** Syntax: `fun (params) -> Ret => body`,
  where `body` is either a single expression OR an indented block:

  ```
  // Single-expression
  let double = fun (x: Int) -> Int => x * 2

  // Block-body with explicit return
  let log = fun (x: Int) -> Int =>
      stdio.println("got ${x}")
      return x * 10
  ```

  Function types as annotations: `Fun(Int, Int) -> Int`. This enables
  higher-order functions:

  ```
  fun apply(f: Fun(Int) -> Int, x: Int) -> Int
      return f(x)

  fun compose(f: Fun(Int) -> Int, g: Fun(Int) -> Int) -> Fun(Int) -> Int
      return fun (x: Int) -> Int => g(f(x))
  ```

  Linearity in closures:
  - Captures are borrows: a closure can capture a capability from the
    enclosing scope and use it for borrows (calls that do not consume).
  - Captures cannot be consumed: trying to consume a captured cap is
    rejected, because the closure can be invoked multiple times but the
    cap can only be consumed once. This is the distinction between `Fn`
    and `FnOnce` in Rust, resolved by capture analysis. The rule also
    applies to block-body lambdas.
  - Capabilities as parameters of the closure itself can be consumed:
    each invocation receives its own, without sharing.

  Current limitations:
  - Block-body lambdas cannot appear directly inside `(...)`. Capa
    relies on NEWLINE/INDENT/DEDENT to delimit the block, and the lexer
    suppresses those tokens inside parentheses for implicit line
    continuation. Writing `f(fun (x) => <block>, 5)` is therefore
    rejected with a targeted parser error pointing at the recommended
    workaround: bind the lambda to a `let` first, then pass the binding
    (`let body = fun (x) => <block>; f(body, 5)`), or use a
    single-expression body. This is a deliberate restriction, not a
    bug; the same root cause applies to indent-form `match` inside
    parens.
  - No let-polymorphic generalisation.
- **Standard library: builtin methods on `List<T>`.** `length`, `push`,
  `contains`, `map`, `filter`, `fold`, fully verified by the checker.
  Polymorphic types: `map<U>(Fun(T) -> U) -> List<U>` (`T` from the
  receiver), `filter(Fun(T) -> Bool) -> List<T>`,
  `fold<U>(U, Fun(U, T) -> U) -> U`.

  ```
  let xs = [1, 2, 3, 4, 5]
  let evens = xs.filter(fun (x: Int) -> Bool => x % 2 == 0)
  let doubled = xs.map(fun (x: Int) -> Int => x * 2)
  let total = xs.fold(0, fun (acc: Int, x: Int) -> Int => acc + x)
  ```

- **Multi-line method chaining.** When a line starts with `.`, the
  lexer suppresses the NEWLINE/INDENT, allowing idiomatic chaining:

  ```
  let total = xs
      .filter(fun (x: Int) -> Bool => x > 0)
      .map(fun (x: Int) -> Int => x * x)
      .fold(0, fun (acc: Int, x: Int) -> Int => acc + x)
  ```

  `//` comments between chain steps are tolerated. It also works with
  multi-line field access. List literals are transpiled to
  `CapaList(...)`, a subclass of Python's `list`.
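A `CapaList`-style runtime type is straightforward to picture: a `list` subclass whose methods mirror the Capa builtins so that chains transpile one-to-one. The sketch below is an assumption drawn from the description above, not the actual code in capa/runtime/__init__.py:

```python
# Sketch of a CapaList-like runtime type (illustrative only).
from functools import reduce

class CapaList(list):
    def length(self):
        return len(self)

    def map(self, f):
        return CapaList(f(x) for x in self)       # stays chainable

    def filter(self, pred):
        return CapaList(x for x in self if pred(x))

    def fold(self, init, f):
        return reduce(f, self, init)

# Mirrors the Capa chaining example above:
xs = CapaList([1, 2, 3, 4, 5])
total = (xs.filter(lambda x: x > 0)
           .map(lambda x: x * x)
           .fold(0, lambda acc, x: acc + x))
assert total == 55
```

Returning `CapaList` (not plain `list`) from `map`/`filter` is what keeps multi-step chains working after the first call.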
- **Standard library: builtin methods on `String`.** `length`, `trim`,
  `to_upper`, `to_lower`, `contains`, `starts_with`, `ends_with`,
  `split`, `replace`, fully verified by the checker. Types:
  - `length() -> Int`, `to_upper() -> String`, `trim() -> String`
  - `contains(s: String) -> Bool`, `starts_with(s: String) -> Bool`
  - `split(sep: String) -> List<String>`, `replace(old: String, new: String) -> String`

  ```
  let normalised = " Hello World ".trim().to_lower()
  let parts = "one,two,three".split(",")
  ```

  Implementation: the transpiler does type-aware dispatch. It receives
  the type mapping from the analyzer and, for receivers of type
  `String`, maps Capa methods (`length`, `to_upper`, etc.) to their
  Python equivalents (`len`, `.upper()`, etc.). For user-defined types
  and `List<T>` (whose methods have the same names), it emits direct
  Python method calls.
- **Interpolated strings as real expressions.** `"${expr}"` is parsed
  into `InterpolatedString(parts: list[str | Expr])` with each `${...}`
  as a full Capa expression. Type-checking works inside interpolations;
  method calls in interpolations dispatch correctly: `"${s.length()}"`
  emits `f"{len(s)}"`. `$$` is the escape for `$`.
- **Standard library: `Map<K, V>` and `Set<T>`.** Data structures with
  builtin methods verified by the checker.
  - `Map`: `length`, `get` (returns `Option<V>`), `set`, `contains_key`,
    `keys` (returns `List<K>`), `values` (returns `List<V>`).
  - `Set`: `length`, `add`, `remove`, `contains`, `to_list`.

  Construction: builtin functions `new_map()` and `new_set()` return
  empty instances. To pin the type params, annotate the `let`:

  ```
  let counts: Map<String, Int> = new_map()
  counts.set("ana", 30)
  match counts.get("ana")
      Some(n) -> stdio.println("age = ${n}")
      None -> stdio.println("not found")

  let unique: Set<String> = new_set()
  unique.add("a")
  unique.add("b")
  unique.add("a")   // ignored, sets are unique
  ```

  Implementation: `Map` uses a Python `dict`, `Set` uses a Python
  `set`. The transpiler does type-aware dispatch: `m.get(k)` becomes a
  ternary expression, `Some(m[k]) if k in m else None_`.
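The transpiled form of `m.get(k)` can be sketched with a minimal Option runtime. The class shapes below are assumptions based on the description (`Some`, and a `None_` sentinel, since `None` is taken in Python); the real runtime lives in capa/runtime/__init__.py:

```python
# Minimal Option runtime sketch (illustrative only).
from dataclasses import dataclass

@dataclass
class Some:
    value: object

class _NoneType:
    def __repr__(self):
        return "None_"

None_ = _NoneType()   # sentinel: `None` is already taken in Python

# Transpiled shape of `counts.get("ana")`:
m = {"ana": 30}
hit = Some(m["ana"]) if "ana" in m else None_
miss = Some(m["bob"]) if "bob" in m else None_   # lookup short-circuits
assert hit == Some(30)
assert miss is None_
```

The ternary short-circuits, so the `m[k]` indexing never runs on a missing key: the `KeyError` case simply cannot happen.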
- **Typed capabilities** (`Stdio`, `Fs`, `Env`, `Clock`, `Random`,
  `Net`): all methods have precise types in the checker, instead of the
  old permissive `TyUnknown` fallback.

  | Capability | Method | Return |
  |---|---|---|
  | `Stdio` | `print(s: String)`, `println(s: String)`, `eprintln(s: String)` | `()` |
  | `Stdio` | `read_line()` | `Result<String, IoError>` |
  | `Fs` | `read(p: String)` | `Result<String, IoError>` |
  | `Fs` | `write(p: String, c: String)` | `Result<(), IoError>` |
  | `Fs` | `exists(p: String)` | `Bool` |
  | `Fs` | `restrict_to(prefix: String)` | `Fs` (attenuated) |
  | `Fs` | `allows(path: String)` | `Bool` |
  | `Env` | `get(name: String)` | `Option<String>` |
  | `Env` | `args()` | `List<String>` |
  | `Env` | `restrict_to_keys(keys: List<String>)` | `Env` (attenuated) |
  | `Env` | `allows(name: String)` | `Bool` |
  | `Clock` | `now_secs()`, `now_monotonic()` | `Float` |
  | `Clock` | `sleep(seconds: Float)` | `()` |
  | `Clock` | `restrict_to_after(t: Float)` | `Clock` (attenuated) |
  | `Clock` | `allows()` | `Bool` |
  | `Random` | `int_range(low: Int, high: Int)` | `Int` |
  | `Random` | `float_unit()` | `Float` |
  | `Random` | `with_seed(seed: Int)` | `Random` (deterministic) |
  | `Net` | `restrict_to(host: String)` | `Net` (attenuated) |
  | `Net` | `allows(host: String)` | `Bool` |
  | `Net` | `get(url: String)` | `Result<String, IoError>` |

  Consequence: `clock.sleep(1)` is now an error (expects `Float`).
  `fs.read(42)` is an error (expects `String`). And
  `match fs.exists(p) Ok(_) -> ...` is an error because `exists`
  returns `Bool`, not `Result`.
- **First-class capability attenuation** (`Net.restrict_to`). The
  result of attenuation is a fresh capability, bindable in `let` (the
  structural "no caps in locals" rule is relaxed specifically for
  method-call results, since those cannot be aliases). The restriction
  is monotonic: chaining two `restrict_to` calls intersects their
  allowed-host sets, never widens. The runtime check fires before any
  system call, so a blocked host never touches the network. See
  `examples/net_attenuation.capa`.
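Monotonic attenuation is a small invariant to maintain: the new allowed set is always an intersection with the old one. A sketch of the idea, with invented field names (the real `Net` runtime is in capa/runtime/):

```python
# Sketch of monotonic host attenuation (illustrative only).

class Net:
    def __init__(self, allowed=None):
        self._allowed = allowed            # None means "all hosts"

    def restrict_to(self, host):
        new = {host}
        if self._allowed is not None:
            new &= self._allowed           # intersect, never widen
        return Net(new)                    # fresh capability, old one untouched

    def allows(self, host):
        return self._allowed is None or host in self._allowed

net = Net()
api = net.restrict_to("api.example.com")
assert api.allows("api.example.com")
assert not api.allows("evil.example.com")

# Chaining restricts further: the sets intersect to nothing here.
narrower = api.restrict_to("other.example.com")
assert not narrower.allows("other.example.com")
```

Because `restrict_to` returns a fresh object and never mutates the receiver, an attenuated cap can be handed to untrusted code while the caller keeps its broader one.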
- **User-defined capabilities** (`capability X { ... }`, WhitePaper
  §4.6). Libraries can declare their own capabilities, `SendEmail`,
  `QueryDB`, `PublishMessage`, and types that implement them are
  treated as capabilities by the discipline (no aliasing, no storing in
  plain locals, no leaking through generic args). The structural rules
  are relaxed in two surgical ways: a type that implements a
  user-defined capability may hold built-in caps as struct fields
  (encapsulation), and a regular function may return a user-defined cap
  (factory pattern). Built-in caps still cannot be returned, so the
  chain of authority stays explicit at every link. See
  `examples/user_capabilities.capa`.

- **Extraction helpers on `JsonValue`:** `is_null() -> Bool`,
  `as_bool() -> Option<Bool>`, `as_num() -> Option<Float>`,
  `as_string() -> Option<String>`, `as_array() -> Option<List<JsonValue>>`,
  `as_object() -> Option<Map<String, JsonValue>>`. Each returns `None`
  if the variant does not match, eliminating boilerplate matches.

- **Methods on `Option<T>`:** `is_some() -> Bool`, `is_none() -> Bool`,
  `unwrap_or(default: T) -> T`, `map<U>(Fun(T) -> U) -> Option<U>`,
  `and_then<U>(Fun(T) -> Option<U>) -> Option<U>`,
  `ok_or<E>(err: E) -> Result<T, E>`.

  Same on `Result<T, E>`: `is_ok`, `is_err`, `unwrap_or`,
  `map<U>(Fun(T) -> U) -> Result<U, E>`,
  `and_then<U>(Fun(T) -> Result<U, E>) -> Result<U, E>`,
  `map_err<F>(Fun(E) -> F) -> Result<T, F>`.

  Together they enable idiomatic functional code without chained matches:

  ```
  let n = parse_int(s).map(fun (x: Int) -> Int => x * 2).unwrap_or(0)

  // ok_or converts Option into Result
  let r: Result<Int, String> = parse_int(s).ok_or("invalid input")

  // map_err converts the error type
  let r2 = fs.read(p).map_err(fun (e: IoError) -> Int => 1)
  ```
- **if-expression (ternary):** `if cond then e1 else e2` as an inline
  expression. The `then` keyword is required in this form; without
  `then`, `if` is interpreted as a statement in the appropriate context.

  ```
  let cat = if n > 0 then "+" else if n < 0 then "-" else "0"

  // Useful in single-line closures
  let abs_xs = xs.map(fun (x: Int) -> Int => if x < 0 then 0 - x else x)

  // And in monadic pipelines
  let par = parse_int(s).and_then(fun (x: Int) -> Option<Int> =>
      if x > 0 then Some(x * 2) else None
  )
  ```

  The condition must be `Bool`, and the branches must have compatible
  types. Precise errors: `if 42 then "a" else "b"` reports
  `condition must be Bool, got Int`.
- **Collection helpers:**
  - `List<T>`: `is_empty() -> Bool`, `first() -> Option<T>`,
    `last() -> Option<T>`, `get(i: Int) -> Option<T>` (safe access)
  - `String`: `is_empty() -> Bool`
  - `Map<K, V>`: `is_empty() -> Bool`
  - `Set<T>`: `is_empty() -> Bool`
- **JSON in the standard library.** A built-in `JsonValue` type with 6
  recursive variants: `JNull`, `JBool(Bool)`, `JNum(Float)`,
  `JStr(String)`, `JArr(List<JsonValue>)`, `JObj(Map<String, JsonValue>)`.
  Functions `parse_json(s: String) -> Result<JsonValue, String>` and
  `to_json(v: JsonValue) -> String`.

  ```
  match parse_json(input)
      Err(msg) -> stdio.eprintln("error: ${msg}")
      Ok(config) -> match config
          JObj(m) -> match m.get("name")
              Some(JStr(name)) -> stdio.println("name: ${name}")
              _ -> stdio.println("name missing")
          _ -> stdio.println("not an object")

  // Build and serialise
  let resp: Map<String, JsonValue> = new_map()
  resp.set("status", JStr("ok"))
  resp.set("count", JNum(42.0))
  let s = to_json(JObj(resp))
  ```

  Exhaustiveness works naturally: a `match` on a `j: JsonValue`
  requires coverage of all 6 variants or a catch-all `_`.
- **Builtin conversion functions:**
  - `parse_int(s: String) -> Option<Int>`, returns `None` on invalid input
  - `parse_float(s: String) -> Option<Float>`, same for floats

  Idiomatic interactive program:

  ```
  fun ask(stdio: Stdio, prompt: String) -> Option<String>
      stdio.print(prompt)
      return match stdio.read_line()
          Ok(line) -> Some(line.trim())
          Err(_) -> None

  fun main(stdio: Stdio)
      match ask(stdio, "Age: ")
          Some(s) -> match parse_int(s)
              Some(n) -> stdio.println("You are ${n}.")
              None -> stdio.println("Invalid age.")
          None -> stdio.println("EOF.")
  ```
- **Pattern matching with type-param substitution.** `match m.get(k)`
  where `m: Map<String, Int>` infers `Some(n)` with `n: Int`, not
  `n: T`. Substitution of owner type params by the scrutinee's concrete
  type args is applied automatically.
- **Advanced pattern matching:**

  Tuples: deconstruction in `let`, `var`, `for`, and `match`:

  ```
  let (name, age) = person()
  match pair
      (1, s) -> stdio.println(s)
      (n, _) -> stdio.println("${n}")
  ```

  Nested patterns: `(Some(n), label)` matches a tuple whose first
  element is `Some(...)`. The pattern's arity is checked against the
  scrutinee's type.

  String literals in match: `"help" -> ...`, `"quit" -> ...`, useful
  for command dispatchers.

  Or-patterns: `A | B -> ...` matches if any alternative matches.
  Useful for arms that share a body:

  ```
  match cmd
      "h" | "help" | "?" -> stdio.println("...")
      "q" | "quit" | "exit" -> stdio.println("goodbye")
      _ -> stdio.println("unknown")

  match c
      Red | Yellow -> "warm"
      Blue | Green -> "cold"
  ```

  Each alternative in an or-pattern counts toward exhaustiveness checking.

  Or-patterns with bindings: each alternative may bind variables, as
  long as all of them bind the same set of names with compatible types:

  ```
  type Op =
      Add(Int)
      Sub(Int)
      Mul(Int)

  fun value(o: Op) -> Int
      return match o
          Add(n) | Sub(n) | Mul(n) -> n   // n: Int in all
  ```

  Inconsistencies are caught: `Add(n) | NoOp` is an error because
  `NoOp` does not bind `n`; `AsInt(x) | AsStr(x)` is an error because
  `x` has `Int` in one and `String` in the other.
- **Cross-statement inference:** types are refined by later uses via a
  shared TyVar. `let xs = []` produces `List<TyVar(_t)>`. The first
  `xs.push(42)` pins `_t = Int`, and later uses are checked against
  `List<Int>`:

  ```
  let xs = []
  xs.push(42)       // OK, infers Int
  xs.push("oops")   // error: expects Int, got String

  let ys = xs       // shares the same TyVar
  ys.push("oops")   // also an error: it's already List<Int>
  ```

  It works transitively: passing `xs` to a function that expects
  `List<Int>` without an intermediate annotation also works.
Index on
`List<T>`: `[i]` returns `T`. Other indexable types (Map, etc.) still return `TyUnknown`.
- **Real method dispatch.** Methods defined in `impl` blocks are
  registered on the target type's Symbol. Calls `obj.method(args)` are
  checked:
  - The method must exist on the receiver's type
  - Arity must match
  - Argument types must be compatible (with type-param substitution
    from the type: `c.put(x)` on a `Box<Int>` expects `x: Int`)
  - `Self` in the return type is resolved to the receiver's type
  - Additional inference applies to the method's type params

  Exception: capabilities (`Stdio`, `Fs`, etc.) have no impls in Capa
  code; their methods are still typed as `TyUnknown` and resolved at
  runtime against the Python runtime implementation.
- **Match exhaustiveness for sum types and `Bool`:**
  - Sum types: every variant must be covered by some arm without a
    guard. A wildcard (`_`) or a simple binding (`other`) without a
    guard acts as a catch-all. Guarded arms do not count toward
    coverage because the guard may fail.
  - Bool: requires coverage of `true` and `false` (or `_`). Messages
    like `non-exhaustive match on Bool: missing false`.
  - For non-sum types with high cardinality (Int, structs with
    heterogeneous fields), the checker does not yet require
    exhaustiveness: intractable for Int (2^64 values) and irrelevant
    for structs.
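The coverage rule above is essentially set subtraction over variant names. A hedged sketch (toy pattern representation, not the checker's actual data structures):

```python
# Toy sum-type exhaustiveness check (illustrative only).
# arms: list of (pattern_name, has_guard); "_" is a catch-all.

def missing_variants(variants, arms):
    covered = set()
    for pattern, has_guard in arms:
        if has_guard:
            continue            # a guard may fail: no coverage credit
        if pattern == "_":
            return []           # wildcard covers everything remaining
        covered.add(pattern)
    return sorted(set(variants) - covered)

# Covered via wildcard:
assert missing_variants(["Red", "Green", "Blue"],
                        [("Red", False), ("_", False)]) == []

# Guarded arm gives no credit, so Green is still reported missing:
assert missing_variants(["Red", "Green", "Blue"],
                        [("Red", False), ("Green", True)]) == ["Blue", "Green"]
```

The returned list doubles as the error payload, which is how messages like `missing false` for `Bool` fall out of the same mechanism.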
- The output is readable but not idiomatic Python; it mirrors the Capa
  structure closely.
- The `?` operator uses an internal exception for propagation. It works
  and is correct, but it is not as efficient as an expanded imperative
  version would be.
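The exception-based propagation pairs the `_capa_try(e)` call with the `@_capa_wrap` decorator from the lowering table. A hedged sketch of how that pair could fit together (class and helper bodies here are assumptions; only the names come from the table above):

```python
# Sketch of `?` via an internal exception (illustrative only).
import functools
from dataclasses import dataclass

@dataclass
class Ok:
    value: object

@dataclass
class Err:
    error: object

class _CapaErr(Exception):
    def __init__(self, err):
        self.err = err

def _capa_try(result):
    """Unwrap Ok, or abort the enclosing function by raising."""
    if isinstance(result, Ok):
        return result.value
    raise _CapaErr(result)

def _capa_wrap(fn):
    """Catch the internal exception at the function boundary and
    surface it as a plain Err return value."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except _CapaErr as e:
            return e.err
    return wrapper

@_capa_wrap
def double_parsed(r):        # Capa shape: let n = r?; return Ok(n * 2)
    n = _capa_try(r)
    return Ok(n * 2)

assert double_parsed(Ok(21)) == Ok(42)
assert double_parsed(Err("bad")) == Err("bad")
```

The decorator is also why the AST detector mentioned below is needed: any function containing a `?` must be wrapped, or the internal exception escapes to the top of the program.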
During development, the transpiler + runtime surfaced bugs in the design and in canonical examples that the checker had missed:

- `match` as a statement with expression arms silently discarded the
  value; `tasks.capa` produced `None` instead of strings. Detected by
  running the program.
- Builtin variants (`Ok`, `Err`, `Some`) were not marked as having a
  payload in the analyzer, so `Ok(n)` gave a checking error. Fixed.
- Builtin generic types had no `type_params`; `Result` appeared without
  args, incompatible with `Result<Int, IoError>`. Fixed.
- Functions that use `?` need a decorator that catches the internal
  exception; without it, `Err` propagates to the top of the program. We
  added an AST detector that applies the decorator automatically.

This was exactly the reason to build the transpiler before linearity:
running real code is the harshest test of a design.
- **Full trait verification.** When you declare an
  `impl Trait for Type`, the checker verifies:
  - Presence of every method declared in the trait
  - Signature compatibility: every implemented method must have the
    same arity, parameter types, and return type. `Self` in the trait
    signature is resolved to the concrete impl type.
  - Extra methods (helpers) are allowed.
In decreasing order of impact:
- Interactive REPL or watch mode for quick experimentation.
- Match-as-expression on one line with inline syntax (e.g., `{ }`
  delimiters à la Rust).
- More aggressive cross-statement inference: combine with
  `let xs = []` followed by iteration over a concrete type.
- Complete documentation (language reference, tutorial, examples).
- Phase 4 of the original roadmap: native LLVM backend.