v0.17.17
Fixed
- alef-e2e: remove hardcoded project-name special-casing across codegen. Alef is a generic generator, so literal consumer-repo or product names in emitted source/output strings are bugs. Sweep across `alef-e2e` codegen: (1) `elixir.rs` derives the per-project Elixir module via `module_path` instead of literal `Kreuzcrawl`; (2) `streaming_assertions.rs` adds `accessor_with_module_qualifier` so the C# crawl-event branch uses the namespace qualifier and the Rust branch uses the cargo crate name; (3) `php.rs` drops the unreachable fallback that hardcoded `\HtmlToMarkdown\ConversionOptions`; (4) `c.rs` replaces the hardcoded `LiterllmDefaultClientChatStreamStreamHandle` with `{pascal_prefix}DefaultClientChatStreamStreamHandle`; (5) `java.rs` templates the `FormatMetadataDisplay` import via `{java_group_id}` and removes `default_java_nested_types()`; (6) `csharp.rs` removes `default_csharp_nested_types()`; (7) `rust/http.rs` doc-comment rewording only, since emitted code already used the `dep_name` parameter correctly. Downstream impact: kreuzberg/kreuzcrawl `alef.toml` must now declare `nested_types` explicitly under their `[crates.e2e.calls.*]` sections.
- alef-backend-go: exclude struct types from unresolved-type fallback. Wave 7 Go fallback emitted `*json.RawMessage` for all Named types not in enum_names or data_enum_names, but legitimate struct types in the same generated package were incorrectly treated as external/unresolved. This caused type mismatches when struct types like `OcrConfig`, `ChunkingConfig` appeared in fields of other structs (e.g., `ExtractionConfig.Ocr` was emitted as `*json.RawMessage` instead of `*OcrConfig`), breaking Go compilation with "cannot use &v as *json.RawMessage" errors. Added a `struct_names` HashSet to track all non-opaque struct type names, passed to `gen_struct_type`, and included in the unresolved check so real struct types fall through to correct typed fields. Unblocks go binding compilation. (`crates/alef-backend-go/src/gen_bindings/mod.rs`, `crates/alef-backend-go/src/gen_bindings/types.rs`)
- alef-e2e/java: derive streaming DTO imports from declared adapters instead of hardcoding kreuzcrawl types. The Java test-file generator unconditionally emitted `import <pkg>.CrawlEvent; import <pkg>.CrawlStreamRequest; import <pkg>.BatchCrawlStreamRequest;` for every project with a streaming fixture, leaking kreuzcrawl-specific type names into the generic polyglot generator. Other consumers (liter-llm, kreuzberg) ship streaming with different DTOs (e.g., `ChatCompletionChunk` + `ChatCompletionRequest`) and never declare any `Crawl*` types, so the maven build failed with three `cannot find symbol` errors on every streaming-touched test class. The fix iterates `adapters`, filters to `AdapterPattern::Streaming`, and imports each adapter's `item_type` (skipping `ChatCompletionChunk`, which is already emitted above) plus `request_type` (stripped of its Rust path prefix). For kreuzcrawl this produces the same three imports as before; for liter-llm it produces none extra. (`crates/alef-e2e/src/codegen/java.rs`)
- alef-e2e/elixir: handle binary/list return types in min/max_length assertions and zero-arg functions. Elixir e2e tests for binary-returning functions like `render_pdf_page_to_png` failed with `String.length/1` type errors because the assertion logic didn't account for binaries (which use `byte_size/1`) or lists (byte arrays from Rustler). The min/max_length assertion now pattern-matches all three cases: `is_binary → byte_size`, `is_list → length`, `else → String.length`. Additionally, zero-argument functions were incorrectly receiving the harness' internal `setup` dictionary from the fixture input; the argument builder now filters out the `setup` key when no explicit args are configured so functions with no parameters are called with no arguments. Added `alef_e2e_format_to_string/1` helper for FormatMetadata struct display conversion (pattern-matches on nested fields like `metadata.image.format` to extract displayable strings, falling back to struct inspection). Fixes Elixir e2e: pdf_test, smoke_image_png, extractors_list, register_document_extractor. (`crates/alef-e2e/src/codegen/elixir.rs`)
- alef-backend-csharp: emit `gen_adapter_wrapper` for streaming adapters. Adds the `IAsyncEnumerable<ItemType>` public method derived from `AdapterConfig`. Wires `&config.adapters` through `gen_bindings::emit`. (`crates/alef-backend-csharp/src/gen_bindings/methods.rs`, `crates/alef-backend-csharp/src/gen_bindings/mod.rs`)
- alef-backend-rustler: restore `use_keyword_opts = trailing_keyword_count >= 2` threshold. A single trailing optional param (e.g. `config: Option<T>`) now stays positional with `\\ nil` instead of collapsing to `opts \\ []`. Aligns with the common config-parameter pattern where a single JSON string or `nil` is passed positionally. (`crates/alef-backend-rustler/src/gen_bindings/mod.rs`)
- alef-backend-go: emit `*json.RawMessage` for unresolved external-crate Named types. Go struct fields with unresolved `Named` types (e.g., types from external crates that alef cannot resolve to a struct definition) now fall back to `*json.RawMessage` instead of attempting to reference a non-existent Go struct. This allows JSON round-tripping of opaque external types without requiring the binding generator to understand their shape. Fields are emitted with an `omitempty` tag to handle null/absent values gracefully. Prevents compile errors when polyglot repos embed external types via re-exports. (`crates/alef-backend-go/src/gen_bindings/types.rs`)
- alef-e2e/dart: required string args use positional syntax to match hand-written facades. Commit `2572cb72` reverted the prior `required → positional, optional → named` heuristic in favour of "always emit named", citing FRB v2's all-named convention. The revert was correct for liter-llm (whose `chat`/`embed` calls go through the `from_json` path that emits `req:` named, and the `client_factory` path that hardcodes its own arg shape), but broke every polyglot repo whose Dart surface is a hand-written facade wrapping the FRB bridge: `H2mBridge.convert(String html, {ConversionOptions? options})`, `TreeSitterLanguagePackBridge.process(String source, ProcessConfig config)`, `KreuzbergBridge.extractBytes(Uint8List content, String mimeType, [ExtractionConfig? config])`. All three failed Dart compilation with `Too few positional arguments` because the codegen emitted `process(source: 'x', _config)` against a positional signature. Restored the `required → positional, optional → named` policy (matching the `json_object` handler at line 743 and the `bytes`/`file_path` handlers above), which mirrors the Rust idiom every facade follows. Unblocks tslp, h2m, and kreuzberg Dart e2e compilation. (`crates/alef-e2e/src/codegen/dart.rs`)
- alef-backend-pyo3: emit `_rust.CrawlStreamRequest` and `_rust.BatchCrawlStreamRequest` in adapter wrappers. The generated `api.py` streaming adapter wrappers constructed the facade dataclass `CrawlStreamRequest(url=url)` and passed it to `engine.crawl_stream(req)`, but the PyO3 native method enforces strict type identity against `_rust.CrawlStreamRequest`. Tests failed with `TypeError: 'CrawlStreamRequest' object is not an instance of 'CrawlStreamRequest'` (same name, different class object). The wrapper now constructs the native pyclass via `_rust.CrawlStreamRequest(url=url)` (and `_rust.BatchCrawlStreamRequest(urls=urls)`), matching the type the underlying method expects. Unblocks ~8 python streaming e2e tests. (`crates/alef-backend-pyo3/src/gen_bindings/functions.rs`)
- alef-backend-php: walk raw `typ.fields` (not `binding_fields`) for trait-bridge withers. Commit `4e18a6a5` added a `with_<field>` wither emission for trait-bridge / opaque-handle optional fields (so PHP callers can attach a visitor after constructing options), but iterated `binding_fields(&typ.fields)`, which filters out `binding_excluded` entries, and trait-bridge fields like `ConversionOptions.visitor` are flagged binding-excluded so they don't appear in the generated `__construct`/`from_json` parameter list. Net result: the wither method was never emitted, and every html-to-markdown PHP visitor e2e (54 tests) errored with `Call to undefined method ConversionOptions::with_visitor()`. Switched the wither loop to walk `typ.fields.iter()` directly so trait-bridge fields are reached. The wither remains gated on `Option<Named>` where `Named ∈ opaque_types ∪ bridge_type_aliases`, so non-bridge excluded fields are still skipped. (`crates/alef-backend-php/src/gen_bindings/types.rs`)
- alef-e2e/zig: omit fixtures whose target language is outside `[crates.zig].languages`. The Zig binding statically compiles a subset of tree-sitter grammars (it does not currently dynamically load parsers at runtime), but the e2e generator emitted tests for every fixture regardless of the fixture's `input.language`. Fixtures like `smoke_bibtex` therefore generated tests that failed to load their parser. Adds a new `languages: Vec<String>` field to `ZigConfig` and a filter in the Zig codegen that consults both `input.language` and `input.config.language` (mirroring the WASM filter from `f9e0ff50`). When the list is set and non-empty, fixtures whose target grammar is not in the list are omitted entirely from the generated test file rather than emitted as `it.skip()` placeholders. Defaults to empty (all fixtures included), preserving prior behaviour for hosts that haven't opted in. tree-sitter-language-pack Zig e2e: every non-static-set fixture (e.g. `smoke_actionscript`, `smoke_bibtex`) auto-excluded. (`crates/alef-core/src/config/languages.rs`, `crates/alef-e2e/src/codegen/zig.rs`)
- alef-backend-wasm: generate camelCase-aware Input DTOs for config-like function parameters. WASM function parameters of types like `ProcessConfig` (ending with "Config"/"Options"/"Settings"/"Params") failed to deserialize camelCase JSON from JS because `serde_wasm_bindgen::from_value` expected snake_case field names. The JS test suite passed `{chunkMaxSize: 50}` but the Rust `ProcessConfig { chunk_max_size: ... }` field wasn't found, silently defaulting to `None`, so chunking silently failed with empty chunks arrays. The fix generates an Input DTO struct (e.g., `ProcessConfigInput`) with `#[serde(default, rename_all = "camelCase")]` before each function that takes such a parameter, deserializes the JS value into the DTO, then converts to the core type via a generated `From` impl. All config-like parameters now correctly round-trip camelCase JSON from JavaScript. Tree-sitter-language-pack wasm e2e: chunks tests now passing. (`crates/alef-backend-wasm/src/gen_bindings/functions.rs`, `crates/alef-backend-wasm/src/template_env.rs`, new templates: `gen_input_dto.jinja`, `serde_config_required.jinja`, `serde_config_optional.jinja`)
- alef-backend-php: emit `with_visitor` wither on `ConversionOptions` for trait-bridge opaque types. E2e PHP tests call `$options->with_visitor($visitorHandle)` to set trait-bridge visitor handles on options objects, but the generator only checked IR-level opaque types and skipped bridge type aliases like `VisitorHandle`. When a struct has an `Option<NamedType>` field where the named type is a trait-bridge alias, the generator now emits a wither method `with_field_name(value: NamedType) -> Self` that accepts the unwrapped type and wraps it in `Some()` before assignment. Collects both IR opaque types and `config.trait_bridges[*].type_alias` into an `all_opaque_types` set before field iteration. Fixes html-to-markdown PHP e2e visitor tests from 208/262 to 262/262 (54 visitor tests now passing). (`crates/alef-backend-php/src/gen_bindings/types.rs`)
- alef-backend-csharp: emit primitive option fields as nullable to preserve Rust defaults. Option-config types (with `typ.has_default`) were emitting primitive/string fields with C# default values (`false`, `0`, `""`), which, when serialized with `WhenWritingNull`, caused issues if the user explicitly set a field to a value matching its Rust default but differing from its C# default. The fix makes all non-optional primitive fields in option-config types nullable in C#, defaulting to `null`. With `WhenWritingNull` serialization, unset fields stay null and get stripped from JSON, letting Rust apply its own defaults; explicitly set fields (even if matching Rust defaults) flow through as JSON, preserving user intent. Combined with the wrapper template change to use `WhenWritingNull` for FFI input serialization, this ensures that explicit user values always reach Rust while unset fields let Rust apply its defaults. Fixes html-to-markdown C# e2e from 175/262 to 262/262 (all tests passing). (`crates/alef-backend-csharp/src/gen_bindings/types.rs`, `templates/wrapper_class_header.jinja`)
- alef-backend-csharp: restore `[JsonConverter]` attribute on generated enums. Commit `342a0f0c` (fix alef-backend-java enum serialization) accidentally re-introduced a conditional `if needs_custom_converter` check in the C# enum template and enums.rs converter generation, which disabled `[JsonConverter(typeof(EnumNameJsonConverter))]` emission on standard snake_case enums. Without the converter, System.Text.Json serializes enum variants as numeric values (`0`, `1`) instead of string names (`"function"`, `"method"`), breaking all assertions that depend on serialized JSON containing variant names. The fix restores unconditional converter generation and attribute emission for all enums (not just those with non-standard naming), ensuring enum-to-string conversion works consistently. Fixes 18 tree-sitter-language-pack C# e2e test failures (410/410 now passing). (`crates/alef-backend-csharp/src/gen_bindings/enums.rs`, `crates/alef-backend-csharp/templates/enum_header.jinja`)
- alef-e2e/swift: aggregate every stringy accessor when `contains` asserts against a `Vec<DTO>` field. `XCTAssertTrue(result.imports().map { $0.source().toString() }.contains("os"))` previously relied on `result_field_accessor` naming a single "primary" accessor (e.g. `imports → source`, `structure → kind`), which fails whenever the asserted value lives on a sibling field (`ImportInfo.items` for `from pathlib import Path`, `StructureItem.name` for `MyConfig`). The codegen now walks the element type's IR fields, classifies every `String`/`Option<String>`/`Vec<String>`/serde-enum field as a "stringy" accessor, and emits a `contains(where: { item in … })` closure that gathers every text-bearing value into a `[String]` before substring-matching the expected value, mirroring python's `_alef_e2e_item_texts`. `Vec<String>` accessors are flattened via `.map { $0.as_str().toString() }` (swift-bridge wraps borrowed Rust `String` elements as `RustStringRef`, which exposes `as_str()` from `SwiftBridgeCore.swift`, not `toString()`). The aggregator only fires when the element type carries ≥2 stringy fields, leaving the existing single-accessor path untouched for trivial cases. Unblocks 2 process tests (`testProcessPythonImportsDetail`, `testProcessRustStructureName`) in tree-sitter-language-pack swift e2e. (`crates/alef-e2e/src/codegen/swift.rs`, `crates/alef-e2e/src/field_access.rs`)
- alef-e2e/java: gate `FormatMetadataDisplay.java` emission on presence of `FormatMetadata` in Java `assert_enum_fields`. The helper was previously emitted unconditionally for every Java e2e harness and imports `dev.kreuzberg.FormatMetadata`, a sealed interface that only exists in the kreuzberg binding crate. Other polyglot repos (e.g. tree-sitter-language-pack) without that type failed Java compilation with `cannot find symbol: class FormatMetadata`. The generator now walks the resolved Java call overrides (`call` plus all named `calls`) and emits the helper only when at least one `assert_enum_fields` entry maps to `"FormatMetadata"`. tree-sitter-language-pack Java e2e: 0 errors → 410/410 tests passing. (`crates/alef-e2e/src/codegen/java.rs`)
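
The gating rule in the last entry above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the generator's actual code: `CallOverride` and `needs_format_metadata_helper` are hypothetical names, and `assert_enum_fields` is modelled as a map from field path to enum type name, which is a guess at its shape.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a resolved Java call override.
/// Assumption: `assert_enum_fields` maps a field path to an enum type name.
#[derive(Default)]
pub struct CallOverride {
    pub assert_enum_fields: HashMap<String, String>,
}

/// Returns true when any override (the default `call` plus every named
/// entry in `calls`) maps at least one field to `"FormatMetadata"`.
pub fn needs_format_metadata_helper(
    call: &CallOverride,
    calls: &HashMap<String, CallOverride>,
) -> bool {
    std::iter::once(call)
        .chain(calls.values())
        .any(|ov| {
            ov.assert_enum_fields
                .values()
                .any(|ty| ty.as_str() == "FormatMetadata")
        })
}
```

With a check of this shape, a project that never mentions `FormatMetadata` in any override gets no helper file, so the import of a type it does not ship is never generated.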
Added
- alef-e2e/c: emit visitor test category for C FFI bindings. The C e2e generator previously filtered out all visitor fixtures and panicked if any reached `render_test_file`. It now collects visitor fixtures into a separate list, generates `test_visitor.c` with per-fixture static callbacks and the full `HTMHtmVisitorCallbacks` + `htm_visitor_create` + `htm_options_set_visitor_handle` + `htm_convert` + JSON assertion pattern, adds visitor forward declarations to `test_runner.h`, wires all visitor tests into `main.c`, and includes `test_visitor.c` in the Makefile `SRCS`. Adds ~54 visitor tests to the C e2e suite (bringing html-to-markdown C e2e from 208 to 262 total). (`crates/alef-e2e/src/codegen/c.rs`)
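
The "collect visitor fixtures into a separate list" step above is essentially a partition followed by per-fixture rendering. The sketch below is illustrative only: the `Fixture` struct, the rule that a fixture is a visitor fixture when its `visitor` field is set, and the `test_visitor_<id>` naming are all assumptions made for the example, not details taken from the generator.

```rust
/// Hypothetical, simplified fixture model; the real type has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Fixture {
    pub id: String,
    /// Assumption: a fixture counts as a visitor fixture when this is set.
    pub visitor: Option<String>,
}

/// Splits fixtures into (regular, visitor) lists, preserving input order,
/// so each list can be rendered into its own C source file.
pub fn split_visitor_fixtures(fixtures: Vec<Fixture>) -> (Vec<Fixture>, Vec<Fixture>) {
    fixtures.into_iter().partition(|f| f.visitor.is_none())
}

/// Renders forward declarations of the kind a test-runner header needs.
/// The `test_visitor_<id>` naming is illustrative only.
pub fn visitor_forward_decls(visitor_fixtures: &[Fixture]) -> Vec<String> {
    visitor_fixtures
        .iter()
        .map(|f| format!("void test_visitor_{}(void);", f.id))
        .collect()
}
```

Keeping the two lists separate, rather than filtering and panicking, lets the regular test file stay unchanged while the visitor list feeds its own source file, header declarations, and runner wiring.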
Fixed
-
alef-e2e/r: four codegen fixes to close the kreuzberg R e2e gap. (1)
build_args_stringnow returns an empty argument string wheneverargs = []is declared, regardless of fixtureinputshape — the previous fall-through emitted positionallist(...)from harness metadata (e.g.setup.lazy_init_requiredfor Go's eager-init shim), producingunused argumenterrors on no-arg wrappers likelist_document_extractors(). (2) EmptyVec<String>args (element_type = "String") now emitcharacter(0)instead ofc();c()isNULLin R and extendr rejects it withExpected Strings got NullforVec<String>Rust signatures. (3) Per-callextra_argsis now honoured for R (mirroring Ruby/Zig/Swift) — appended verbatim after declared args, sorender_pdf_page_to_pngcan fill in extendr-required positionals (dpi,password) withNULLwhen the fixture omits them. (4) Terminalmetadata.formataccessors are wrapped with a new.alef_format_value()helper emitted intosetup-fixtures.Rthat collapses the internally-taggedFormatMetadataenum ({image: {format: "PNG", ...}, excel: NULL, ...}undersimplifyVector = FALSE) down to the inner format string, matching the assertion expectation. The codegen also threadsresult_is_bytesintorender_assertionsomin_length/max_lengthassertions on raw-byte returns uselength()instead ofnchar()(raw vectors element-wise onnchar, breaking the scalarexpect_truecontract). Combined, the kreuzberg R e2e suite goes from 153/158 to 159/160 with only the env-dependent tesseract-not-registered failure remaining. (crates/alef-e2e/src/codegen/r.rs) -
alef-backend-csharp: strip nulls (not defaults) when serializing config objects to FFI. Commit
980e6f10introducedJsonSerializationOptions(noDefaultIgnoreCondition) for FFI-input serialization so explicitfalse/0weren't silently elided, but it then includednullfor every C# nullable field (PreprocessingPreset? Preset = null, etc.). When the Rust source declares the corresponding field as non-Option(e.g.PreprocessingOptions { preset: PreprocessingPreset }), serde deserialisation chokes on"preset": nulland the whole options object is dropped — regressed html-to-markdown C# e2e from 7 failures to 87. SwitchedJsonSerializationOptionstoDefaultIgnoreCondition.WhenWritingNull, which drops null-valued nullable fields (so required Rust fields fall back to Rust defaults) while still serialising explicitfalse/0(so the original WhenWritingDefault regression stays fixed). (crates/alef-backend-csharp/src/gen_bindings/types.rs) -
alef-backend-jni: treat empty-string complex-param payload as
Nonefor optional params. The Kotlin/Java JNI client emitsoptions?.let { mapper.writeValueAsString(it) } ?: ""forOption<DTO>parameters — an empty string is the legacy host-language sentinel for "caller passed null". The Rust JNI shim previously fed that empty string intoserde_json::from_str::<DTO>unconditionally, which fails withEOF while parsing a value at line 1 column 0and throwsRuntimeExceptionfrom every call that omits options (e.g.HtmlToMarkdownRs.convert("…", null)). For optional complex params the shim now checksis_empty()first and yieldsNonewithout invoking serde, leaving the existing non-empty parse path intact for genuine payloads. Required params remain strict (empty payload still raises). Fixes every kotlin_androidnulloptions call in html-to-markdown e2e. (crates/alef-backend-jni/src/gen_shims.rs) -
alef-backend-kotlin-android: drop trait-bridge
type_aliasfield when itsparam_nameis inkotlin_android.exclude_functions. When a host configured[crates.kotlin_android].exclude_functions = ["visitor"]to suppress the bridge function (e.g. because the JNI trait-handle bridge isn't implemented yet inalef-backend-jni), the visitor function was filtered out of the module facade butConversionOptions.visitor: VisitorHandle?(andConversionOptionsUpdate.visitor) was still emitted on the data classes. SinceVisitorHandleitself has no Kotlin representation in this configuration, every Kotlin file referencing the data class failed to compile withUnresolved reference 'VisitorHandle'. The fix collects aneffective_excluded_typesset ingen_bindings::emitthat includes anytrait_bridge.type_aliaswhoseparam_namematches akotlin_android.exclude_functionsentry (or whoseexclude_languageslistskotlin_android), then drops both the alias type itself and any field whoseTypeRefreferences it before the data-class emission. Mirrors the existingexclude_types-driven filter inalef-backend-kotlin-android/src/lib.rsfor the case where the user opts out of the bridge function alone. Fixes html-to-markdown kotlin_android e2e compilation. (crates/alef-backend-kotlin-android/src/gen_bindings.rs) -
alef-e2e/kotlin_android: emit
.orEmpty()forMap.get(key)assertions in kotlin_android style. Kotlin'sMap<K, V>.get(key)returnsV?, so calling.trim()directly on the result fails kotlin_android compilation withOnly safe (?.) or non-null asserted (!!.) calls are allowed on a nullable receiver of type 'String?'. The assertion emitter previously short-circuitedfield_is_optionaltofalsefor any field path withhas_map_access, regardless of target. The branch now returnskotlin_android_styleinstead, so kotlin_android assertions on map-access paths coalesce the nullable receiver via.orEmpty()before invoking.trim()/.contains(). The kotlin/JVM target keeps its legacy behaviour to avoid churning unrelated snapshots — Java records' platform types make the missing.orEmpty()harmless there. Resolves 10 MetadataTest compile errors in html-to-markdown kotlin_android e2e (testOgBasicTags,testOgMultipleTags,testTwitterCardTags). (crates/alef-e2e/src/codegen/kotlin.rs) -
alef-backend-kotlin: fully-qualify
kotlin.collections.List/Mapinside sealed-class data variants whose siblings shadow the stdlib name. When a sealed enum variant carries the simple nameList(orMap), Kotlin resolves bareList<T>inside the sealed body to the nested data class rather than tokotlin.collections.List. TheMetadataBlockvariant ofNodeContentdeclaredval entries: List<String>, which the compiler rejected withNo type arguments expected for 'data class List : NodeContent'.render_type_ref_disambiguatednow consultsvariant_names: when"List"(resp."Map") is present as a sibling variant, generic emissions usekotlin.collections.List<…>/kotlin.collections.Map<…, …>so the stdlib type wins over the nested shadow. Fixes html-to-markdown kotlin_android NodeContent.kt compilation. (crates/alef-backend-kotlin/src/gen_bindings/object_wrapper.rs) -
alef-e2e/ruby: skip all empty-string config values, not just marked enum fields. When a fixture's config object contained a key with an empty string (e.g.,
embedding_model: ""), the Ruby codegen only skipped it if that key was registered in the call'senum_fieldsmap. For enum-typed fields discovered during fixture rendering but not pre-declared in alef.toml, the empty string was rendered as a literal'', causing deserialization errors like "Unknown embedding preset: ". The fix: skip empty-string values unconditionally in the config builder loop — all empty strings are invalid for enum fields, regardless of whether they were pre-declared. Resolves 2 Ruby e2e failures:embed_texts_async_preset_switchandembed_texts_batch. Ruby: 88→90/91. -
alef-e2e/swift: fix two visitor-method codegen bugs that left every visitor test silently inert. (1)
swift_visitor_paramsemitted_ ctx: Stringfor every callback, but the swift backend declares the protocol method with_ ctx: NodeContext(a typealias toRustBridge.NodeContext). Swift overload resolution treats the mismatched signature as a brand-new method, so the local visitor class never overrode the protocol's default implementation and every callback silently returned.continue— fixtures using.custom/.skipactions produced unchanged output. Now emits_ ctx: NodeContext, matching the protocol declaration exactly so overrides take effect. (2)swift_action_bodyinterpolated optionalString?parameters (e.g.visit_video'ssrc) directly via\(src), which Swift renders asOptional("tutorial.mp4")for fixtures whose template is[VIDEO: {src}]— comparing against[VIDEO: tutorial.mp4]always failed. Added aswift_visitor_param_is_optionaltable mirroring the?suffix inswift_visitor_paramsso optional-typed placeholders emit\(src ?? "")and unwrap to the underlying string. Combined, these two fixes take html-to-markdown swift_e2e from "compiles but 72/262 visitor assertions fail" to 262/262 passing. (crates/alef-e2e/src/codegen/swift_visitors.rs) -
alef-e2e/swift: emit visitor callback actions with correct case naming and tuple-variant label. Fixture-driven callback action codegen (
swift_action_bodyinswift_visitors.rs) emitted.continue_and.custom("payload"), both inconsistent with how the swift backend declaresVisitResult: theContinueunit variant iscase `continue`(backtick-escaped becausecontinueis a Swift keyword), and theCustom(String)tuple variant iscase custom(field0: String)— swift-bridge synthesisesfield0:labels for single-field tuple variants. Without the corrections, every fixture using a custom or continue action produced'VisitResult' has no member 'continue_'ormissing argument label 'field0:' in call. Now emits.`continue`and.custom(field0: "payload"). (crates/alef-e2e/src/codegen/swift_visitors.rs) -
alef-backend-swift, alef-e2e/swift: unbreak Swift e2e visitor compilation across three fronts. Three independent codegen bugs combined to keep html-to-markdown's swift_e2e suite at hundreds of compile errors. (1) The trait-bridge protocol default extension emitted
return .continue_(Rust-side trailing-underscore escape style) for every visitor method, but the actual enum cases are emitted viaswift_case_identwhich uses Swift-idiomatic backtick escapes (case `continue`) — producing'VisitResult' has no member 'continue_'at every callback site. The default extension now derives the return literal from the first unit (no-field) variant of the result enum (swift_case_ident(&variant.name.to_lower_camel_case())), so the generatedreturn .literal matches whatever the enum declaration emits. (2) The trait-bridge{options_type}FromJsonWith{Field}shim (e.g.conversionOptionsFromJsonWithVisitor) was only emitted as a swift-bridgeextern "Rust"function in theRustBridgemodule — there was no top-level forwarder in the user-facing module, so e2e tests callingHtmlToMarkdown.conversionOptionsFromJsonWithVisitor(json, handle)sawmodule 'HtmlToMarkdown' has no member named 'conversionOptionsFromJsonWithVisitor'. The swift backend now emits a public top-level wrapper alongside themake{Trait}Handlefactory, forwarding intoRustBridge.{options_fn}. (3) Every e2e test file imported both the user-facing module andRustBridge, but each opaqueextern "Rust" { type T; }declaration produces apublic class TinRustBridgethat collides with the first-class Swift Codable enum/struct of the same name —VisitResultambiguity was the most prominent, blocking every visitor callback signature. With the new top-level forwarder in place, the e2e codegen no longer needsimport RustBridge(test files only reference public-module symbols), so the line is dropped fromrender_test_file. Combined, these three fixes take html-to-markdown swift_e2e from "fails at module compile" to a runnable test suite. (crates/alef-backend-swift/src/gen_bindings.rs,crates/alef-e2e/src/codegen/swift.rs) -
alef-e2e/typescript: handle FormatMetadata assertions with display helper function. TypeScript e2e codegen for optional FormatMetadata fields (e.g.,
metadata.format) was applyingString(...)to the tagged-enum object, which returns[object Object]instead of the format string. Now emits a_alefE2eFormatMetadataDisplay()helper that pattern-matches the FormatMetadata tagged-enum variant and extracts the format field if present. Resolves Node e2e smoke_image_png test assertion failure. (crates/alef-e2e/templates/typescript/helpers.jinja,crates/alef-e2e/templates/typescript/assertion.jinja,crates/alef-e2e/src/codegen/typescript/assertions.rs) -
alef-backend-java: emit Builders for all serializable types (has_serde=true) in Auto mode, even without has_default. When a Rust type has
#[derive(Serialize, Deserialize)]but implements Default manually (not via#[derive(Default)]), the alef extractor markshas_default=false, causing the Java backend to skip Builder emission. Without a Builder, Jackson deserializes nested fields likePreprocessingOptionsusing the record constructor directly, applying Java defaults (all false/0) instead of Rust defaults. This causes serialized options sent to Rust to override preset-level settings — e.g.,{"preset":"Aggressive"}deserializes withremoveNavigation=false, silencing the preset's intent. The fix: inshould_emit_builder()Auto mode, force Builder emission for any type wherehas_serde=true, regardless ofhas_default. All serde types benefit from a Builder to ensure nested deserialization respects Rust defaults. Additionally fixed import generation to checkwill_emit_builder(instead of onlytyp.has_default) when deciding whether to importjava.util.Optional,java.util.List,java.util.Map,@JsonProperty, and@JsonPOJOBuilderso Builders get required imports even when emitted for has_serde-only types. Fixes 2 failing html-to-markdown Java e2e tests:testOptionsPreprocessingAggressiveandtestOptionsPreprocessingRemoveForms(all 262 Java e2e tests now pass). (crates/alef-backend-java/src/gen_bindings/types.rs) -
alef-e2e/zig: skip
chunks_have_heading_contextsynthetic-field assertion instead of emitting a derived predicate.heading_contextonTextChunkisOption<HeadingContext>with#[serde(skip_serializing_if = "Option::is_none")], so chunks without a heading context produce no JSON key at all. The previous codegen emitted a Zig predicate that calledc.object.get("heading_context")and required the value to be non-null for every chunk, which spuriously failed on extraction results where some chunks legitimately have no heading. Matching the Ruby codegen's behaviour, the assertion is now emitted as a// skipped:comment. Fixes kreuzberg'sconfig_chunking_prepend_heading_contextzig e2e (final blocker getting kreuzberg zig to 88/88 passing). (crates/alef-e2e/src/codegen/zig.rs) -
alef-backend-csharp: re-apply separate JsonSerializationOptions for FFI parameter serialization. The C# generator used a single
JsonSerializerOptionswithDefaultIgnoreCondition.WhenWritingDefaultfor both deserializing FFI responses and serializing input parameters (likeConversionOptions) to pass to Rust. When a test explicitly set an option to false/0/null, the serializer skipped writing that field — Rust received incomplete JSON and applied defaults, overwriting the caller's intent. This fix was originally landed but accidentally removed in a refactoring. Restored: added a secondJsonSerializationOptions(withoutWhenWritingDefault) used when serializing Named parameters and config objects in wrappers, streaming methods, and record-level methods; deserialization continues to use the originalJsonOptionsfor sparse response handling. Fixes 7 failing html-to-markdown C# e2e tests:Test_FormSelectOptions,Test_FormInputElements,Test_OptionsPreprocessingEnabledFalseSkipsCleanup,Test_OptionsCompactTablesTrue,Test_OptionsPreprocessingRemoveNavigationFalseKeepsNav. (crates/alef-backend-csharp/src/gen_bindings/types.rs,methods.rs, and templates) -
alef-backend-zig: de-duplicate
VisitorHandle(trait-bridgetype_alias) emission. The zig backend emitted trait-bridgetype_aliastypes twice: once at the top of the file aspub const VisitorHandle = *anyopaque;(the correct form, referenced by struct fields likevisitor: ?VisitorHandleand by the bridge factoryhtml_visitor_handle_from_vtable), and again later as a struct wrapperpub const VisitorHandle = struct { _handle: *anyopaque, ... }through the generic opaque-handle emission loop. Zig rejects the duplicate declaration with aduplicate struct membererror at file scope, failing every zig e2e test compile. The opaque-handle loop now filters out any type whose name matches a configured[[trait_bridges]].type_alias(respecting the bridge'sexclude_languages = ["zig"]setting), so the trait-bridge contract — a raw*anyopaquepointer — is preserved as the single emission. Fixes html-to-markdown zig e2e compilation across all 9 test files. (crates/alef-backend-zig/src/gen_bindings/mod.rs) -
alef-e2e/brew: route subcommand based on fixture tags (crawl → "crawl", map → "map", else "scrape"). Brew e2e codegen was rendering every fixture with the default subcommand (hardcoded "scrape"), so fixtures tagged "crawl" or "map" were invoked as `kreuzcrawl scrape URL` instead of `kreuzcrawl crawl URL` or `kreuzcrawl map URL`. This caused fixture failures because the CLI flags and output shape differ across subcommands. Now `render_test_function` inspects the fixture's tags and calls `determine_subcommand()` to route: if tags contain "crawl" use "crawl", if tags contain "map" use "map", else use the default. Fixes brew e2e test routing for all crawl/map fixtures. (`crates/alef-e2e/src/codegen/brew.rs`) -
alef-e2e/python, ruby, php: emit dict/hash literals for tagged-enum arrays in `build_args_and_setup`. Python/Ruby/PHP e2e codegen was emitting constructor calls with kwargs (e.g., `PageAction(selector="#open", type="click")`) for array arguments with tagged-enum element types like `PageAction`, but the bindings (PyO3, Magnus, ext-php-rs) expect dict/hash/array literals (e.g., `{"selector": "#open", "type": "click"}`). Fixed two code paths: (1) in the `options_via == "dict"` branch of `build_args_and_setup`, when `element_type` is set and the value is an array of objects, emit dict literals instead of constructor calls; (2) in the `json_object && element_type` branch, when `element_type` is not `BatchBytesItem`/`BatchFileItem` and the value is an array, emit an array of dict/hash literals. Resolves ~36 Python, Ruby, and PHP e2e failures in interaction tests (`interact_click_element`, `interact_type_field`, `interact_fill_form`, `interact_action_sequence`, etc.). (`crates/alef-e2e/src/codegen/python/test_function.rs:769-788`, `crates/alef-e2e/src/codegen/ruby.rs:1435-1453`, `crates/alef-e2e/src/codegen/php.rs:1475-1492`) -
alef-e2e/swift: stop skipping `json_object` args with scalar `element_type`. Swift e2e codegen flagged every `json_object` arg without `options_via` as unresolvable and emitted `XCTSkipIf(true, ...)` stubs. Args with a scalar `element_type` (`String`, `bool`, `i*`/`u*`, `f32`/`f64`) describe `Vec<T>` Rust parameters that the swift-bridge surface exposes as native Swift `[T]` arrays; these construct cleanly from array literals and never needed the opaque-options path. The unresolvable-arg check now excludes scalar-element json_objects, so tslp's `download_languages` call (`args = [{ name = "names", field = "languages", type = "json_object", element_type = "String" }]`) emits real `download(names: [...])` invocations instead of skip placeholders. Resolves 4 skipped tslp Swift e2e download tests (`download_empty_list`, `download_invalid_language`, `download_multiple_languages`, `download_single_language`). (`crates/alef-e2e/src/codegen/swift.rs`) -
alef-scaffold/elixir: drop nonexistent `lib/` and `checksum-*.exs` from `mix.exs` `files:` list. The scaffolded Elixir `mix.exs` unconditionally advertised `~w(lib native .formatter.exs mix.exs README* checksum-*.exs)`, but `lib/` is only written when at least one non-OptionsField trait bridge GenServer is emitted, and `checksum-*.exs` is only produced by `mix rustler_precompiled.download`, which alef does not wire into the publish workflow. `mix hex.publish` validates every entry on disk before contacting the registry and aborted with `Missing files: lib, checksum-*.exs`. The scaffold now omits `lib` unless a bridge populates it, drops `checksum-*.exs` entirely (consumers fall back to building from source via plain `rustler`), and, when `crates.output.elixir` points outside `packages/elixir/lib/`, appends the same relative `*.ex` glob already used for `elixirc_paths` so the externally-located source actually ships in the Hex tarball. Re-applies the fix originally landed as `0d987874`, which was lost in a later force-push to `main`. Surfaced on spikard Hex publish (Goldziher/spikard runs 26215742325 and 26222483530). (`crates/alef-scaffold/src/languages/elixir.rs`, `crates/alef-scaffold/src/tests.rs`) -
alef-backend-swift: emit custom `init(from decoder:)` for first-class Codable structs whose Rust source has `#[derive(Default)]` or `impl Default`. Swift's auto-synthesised Codable decoder rejects JSON that omits any non-Optional declared property, so JSON produced by Rust serializers using `#[serde(default)]` or `#[serde(skip_serializing_if = "...")]` (e.g. `{"language":"python"}` decoding into `ProcessConfig`, or empty-`Vec` fields on `ProcessResult` round-tripped via `serde_json`) failed with `keyNotFound`. The Swift backend now emits a custom decoder whenever `TypeDef.has_default` is true; each field uses `decodeIfPresent + ?? <fallback>` with the per-field literal from `FieldDef.typed_default` (`BoolLiteral`/`IntLiteral`/`FloatLiteral`/`StringLiteral`) or a type-based default (`[]`/`[:]`/`false`/`0`/`""`/`nil`). Fields with no safe Swift fallback (e.g. nested `Named` structs) decode without `??` and rely on the nested type's own decoder. `CodingKeys` is force-emitted alongside the custom decoder. Reduces tslp Swift e2e failures from 355 → <10. (`crates/alef-backend-swift/src/gen_bindings.rs`) -
alef-e2e/java: handle FormatMetadata assertions with sealed-interface pattern matching. Java e2e codegen for optional FormatMetadata fields (e.g., `metadata.format`) was incorrectly applying the enum coercion `.map(v -> v.getValue()).orElse("")`, which fails because FormatMetadata is a sealed interface, not a Java enum. The codegen now registers FormatMetadata types via `assert_enum_fields` in call overrides, passes the type map to `render_assertion`, and applies `FormatMetadataDisplay.toDisplayString()` for all FormatMetadata fields instead of enum-specific handling. Resolves Java e2e smoke test compilation errors on `metadata.format` assertions. (`crates/alef-e2e/src/codegen/java.rs`, `kreuzberg/alef.toml`) -
alef-e2e/wasm: omit (not auto-skip) fixtures for languages outside static-compiled set. When `[crates.wasm].languages` was set to a curated list (e.g., 31 languages), the codegen logic set `base_include = false` for fixtures with `input.language` not in that list, but then emitted them as `it.skip()` tests instead of dropping them entirely. This produced 281 skip placeholders in tslp wasm e2e with the message `"language X not in WASM's static-compiled set"`. The fix replaces the auto-skip branch with a simple `continue`, so fixtures for unsupported languages are omitted from the test file entirely, with no skip clutter and no spurious test count. Resolves wasm e2e fixture emission. (`crates/alef-e2e/src/codegen/wasm.rs`) -
alef-backend-rustler: emit default-arg signatures for params with `| nil` in typespec, and use keyword-opts collapsing for any trailing optionals. When a function had params marked optional in the `@spec` (type includes `| nil`) but not in the Rust IR (param not `Option<T>`), the Elixir wrapper counted trailing optionals only via `p.optional`, missing those with `| nil` typespecs. This caused two failures: (1) when emitting arity variants, no defaults were added (e.g., `def embed_texts_async(texts, config)` for config: `String.t() | nil`), breaking intermediate-arity calls like `embed_texts_async(["text"])`; (2) when deciding whether to use keyword-opts collapsing, only 2+ trailing optionals triggered the `opts \\ []` form, leaving 1-optional functions with positional signatures that conflicted with e2e codegen's keyword emission. Fix: (1) when counting trailing optionals, check `p.optional || type_str.contains("| nil")` so params with Option Rust types are counted even if the IR doesn't mark them `.optional`; (2) lower the keyword-opts threshold from `>= 2` to `>= 1` so e2e tests calling with keyword syntax (e.g., `extract_file_sync(path, config: "...")`) map correctly to the `opts \\ []` form. Fixes 46 Elixir e2e failures across smoke, extraction, and embedding categories. (`crates/alef-backend-rustler/src/gen_bindings/mod.rs`) -
alef-backend-csharp: fully-qualify all Marshal calls with `global::System.Runtime.InteropServices`. C# codegen was emitting bare `Marshal.*` references in multiple locations: `gen_visitor.rs` (`PtrToStringUTF8` conversions), `trait_bridge.rs` (`StringToCoTaskMemUTF8` callbacks), and `gen_bindings/methods.rs` (JSON deserialization). Template-based codegen in `named_param_handle_from_json.jinja` and multiple other templates also had bare Marshal references. Without the `global::` qualifier, these break compilation with `CS0103: name 'Marshal' does not exist`. Now all emissions use `global::System.Runtime.InteropServices.Marshal.*` consistently. Resolves C# e2e compilation errors in `ExtractionResult.cs` and related wrapper classes. (`crates/alef-backend-csharp/src/gen_visitor.rs`, `crates/alef-backend-csharp/src/trait_bridge.rs`, `crates/alef-backend-csharp/src/gen_bindings/methods.rs`, `crates/alef-backend-csharp/templates/*`)
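
For illustration, the tag-based routing described in the alef-e2e/brew entry above can be sketched roughly as follows. This is a minimal sketch, not the code in `crates/alef-e2e/src/codegen/brew.rs`: the signature of `determine_subcommand`, the representation of tags as string slices, and the precedence of "crawl" over "map" when a fixture carries both tags are all assumptions made for the example.

```rust
/// Hypothetical stand-in for the helper named in the brew entry.
/// The real signature is not given in the changelog; this one is assumed.
fn determine_subcommand<'a>(tags: &[&str], default: &'a str) -> &'a str {
    if tags.contains(&"crawl") {
        // Assumption: "crawl" wins when both "crawl" and "map" are present.
        "crawl"
    } else if tags.contains(&"map") {
        "map"
    } else {
        // No routing tag: fall back to the configured default subcommand.
        default
    }
}

fn main() {
    // Fixture tags are illustrative only.
    let fixtures: [&[&str]; 3] = [&["crawl", "smoke"], &["map"], &["smoke"]];
    for tags in fixtures {
        println!("{:?} -> {}", tags, determine_subcommand(tags, "scrape"));
    }
}
```

Passing the default as a parameter keeps the helper free of any hardcoded subcommand beyond the two tag names it routes on.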