
feat(xmldsig): parse full x509data sub-elements #51

Merged
polaz merged 4 commits into main from feat/p2-002-x509data-parsing
Apr 9, 2026

Conversation

@polaz
Member

@polaz polaz commented Apr 9, 2026

Summary

  • implement P2-002 by parsing full X509Data sub-elements into structured X509DataInfo
  • replace count-only tracking with parsed values for certificates, subject names, issuer-serial tuples, SKI, CRL, and X509Digest entries
  • add strict validation for malformed X509IssuerSerial and missing X509Digest Algorithm
  • add regression tests for malformed issuer serial, missing digest algorithm, and invalid certificate base64

Validation

  • cargo check --workspace
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo nextest run --workspace
  • cargo test --doc --workspace
  • cargo build --workspace

Closes #50

- parse X509Certificate/X509SubjectName/X509IssuerSerial/X509SKI/X509CRL/X509Digest into structured X509DataInfo

- validate issuer-serial shape and X509Digest Algorithm presence

- decode and validate X509 base64 payloads with bounded text handling

- extend parse regressions for malformed issuer serial, digest attrs, and cert base64
@coderabbitai

coderabbitai bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 081b6df2-98bb-467e-8113-6912e496e1bb

📥 Commits

Reviewing files that changed from the base of the PR and between adf6147 and 1099631.

📒 Files selected for processing (2)
  • README.md
  • src/xmldsig/parse.rs

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • X509Data parsing now provides structured decoded certificate data, including certificates, subject names, issuer information, key identifiers, and digest information instead of just element counts.
    • Added validation to enforce maximum limits on X509 entries and decoded data size.
  • Documentation

    • Added status and version badges to README.

Walkthrough

X509Data parsing now materializes parsed collections (decoded certificates/SKIs/CRLs, subject names, issuer-serial tuples, dsig11 digests with Algorithm URIs). The dispatcher enforces structural rules, element-order requirements, per-entry and aggregate binary-size budgets, and uses a bounded base64 decoder; unit tests updated for new validations.
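The bounded base64 handling described above can be sketched as a small normalizer in plain Rust. This is an illustrative sketch, not the crate's actual decoder: it strips only the four XML whitespace characters (space, tab, CR, LF), fails closed on any other ASCII whitespace, and enforces a caller-supplied pre-decode length cap. The function name and error strings are assumptions.

```rust
/// Normalize XML base64 text with a hard length budget (illustrative sketch).
/// Only XML whitespace (0x20, 0x09, 0x0D, 0x0A) is stripped; other ASCII
/// whitespace (e.g. form feed 0x0C) is rejected, and the normalized text may
/// not exceed `max_len` characters.
fn normalize_bounded_base64(raw: &str, max_len: usize) -> Result<String, String> {
    let mut out = String::with_capacity(raw.len().min(max_len));
    for ch in raw.chars() {
        match ch {
            ' ' | '\t' | '\r' | '\n' => continue, // line-wrapped base64 is fine
            c if c.is_ascii_whitespace() => {
                return Err(format!("invalid XML whitespace U+{:04X}", c as u32));
            }
            c => {
                if out.len() >= max_len {
                    // Reject before any decode work happens on oversized input.
                    return Err("base64 exceeds maximum allowed length".to_string());
                }
                out.push(c);
            }
        }
    }
    Ok(out)
}

fn main() {
    assert_eq!(normalize_bounded_base64("QUJD\nREVG", 8).unwrap(), "QUJDREVG");
    assert!(normalize_bounded_base64("QUJD\u{0C}EF", 16).is_err()); // form feed
    assert!(normalize_bounded_base64("QUJDREVGQUJD", 8).is_err()); // over budget
    println!("ok");
}
```

The cap would typically be derived from the expected decoded size (e.g. `expected.div_ceil(3) * 4` for a digest), so oversized input is rejected before decoding.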

Changes

Cohort / File(s) — Summary

X509Data parsing & tests (src/xmldsig/parse.rs)
  Replaced *_count fields on X509DataInfo with parsed collections: certificates: Vec<Vec<u8>>, subject_names: Vec<String>, issuer_serials: Vec<(String,String)>, skis: Vec<Vec<u8>>, crls: Vec<Vec<u8>>, digests: Vec<(String,Vec<u8>)>. Added a bounded base64 decoder, parse_x509_issuer_serial, and stricter parse_x509_data_dispatch validation (leaf-only checks, required Algorithm on dsig11:X509Digest, per-entry and aggregate decoded-size limits). Updated and added unit tests for decoded outputs and rejection cases (malformed issuer-serial, missing digest Algorithm, invalid base64, exceeding entry/size budgets).

Repository metadata (README.md)
  Added repository badges (crates.io, docs.rs, CI status, MSRV indicator, license link); no code behavior changes.
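The reworked X509DataInfo shape from the summary above can be sketched as follows. Field names mirror the walkthrough, but this is an illustrative reconstruction, not a copy of the actual definition in src/xmldsig/parse.rs.

```rust
/// Illustrative sketch of the parsed-collections X509DataInfo described in the
/// walkthrough; field names follow the PR summary and are assumptions.
#[derive(Debug, Default, PartialEq)]
struct X509DataInfo {
    certificates: Vec<Vec<u8>>,            // decoded X509Certificate payloads
    subject_names: Vec<String>,            // X509SubjectName text values
    issuer_serials: Vec<(String, String)>, // (X509IssuerName, X509SerialNumber)
    skis: Vec<Vec<u8>>,                    // decoded X509SKI payloads
    crls: Vec<Vec<u8>>,                    // decoded X509CRL payloads
    digests: Vec<(String, Vec<u8>)>,       // (Algorithm URI, decoded dsig11:X509Digest)
}

fn main() {
    let mut info = X509DataInfo::default();
    info.subject_names.push("CN=Example,O=Test".to_string());
    info.issuer_serials.push(("CN=Test CA".to_string(), "1234567890".to_string()));
    // The old *_count fields become redundant: counts fall out of the Vec lengths.
    assert_eq!(info.subject_names.len(), 1);
    assert_eq!(info.issuer_serials[0].1, "1234567890");
    println!("{info:?}");
}
```

Replacing counters with parsed values lets downstream verification consume the certificate and digest material directly instead of re-walking the XML.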

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,220,255,0.5)
    participant Client
    end
    rect rgba(200,255,200,0.5)
    participant XMLParser
    end
    rect rgba(255,230,200,0.5)
    participant Dispatcher
    end
    rect rgba(255,200,200,0.5)
    participant Base64Decoder
    end
    rect rgba(220,200,255,0.5)
    participant Storage
    end

    Client->>XMLParser: supply KeyInfo/X509Data XML
    XMLParser->>Dispatcher: iterate X509Data children
    Dispatcher->>Dispatcher: validate element type, structure, budgets
    alt Certificate / SKI / CRL / Digest (binary leaf)
        Dispatcher->>Base64Decoder: normalize & decode base64 (limits)
        Base64Decoder-->>Dispatcher: bytes or error
        Dispatcher->>Storage: push decoded bytes (with Algorithm for digest)
    else SubjectName (text)
        Dispatcher->>Dispatcher: extract bounded text
        Dispatcher->>Storage: push subject name
    else X509IssuerSerial
        Dispatcher->>Dispatcher: require IssuerName then SerialNumber (ordered)
        Dispatcher->>Storage: push (issuer, serial)
    end
    Dispatcher-->>XMLParser: success or parse error
    XMLParser-->>Client: X509DataInfo or error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — docstring coverage is 73.68%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check ✅ Passed — the PR title 'feat(xmldsig): parse full x509data sub-elements' accurately describes the main change: implementing full X509Data sub-element parsing with structured data instead of just counts.
  • Description check ✅ Passed — the PR description clearly relates to the changeset, covering the implementation of P2-002 X509Data parsing, validation, testing, and build validation commands.
  • Linked Issues check ✅ Passed — the changeset fully implements issue #50 requirements: X509DataInfo now holds parsed values (certificates, subject names, issuer-serial tuples, SKI, CRL, digests) with base64 decoding, structural validation (malformed issuer-serial rejection, missing Algorithm attribute detection), invalid base64 handling, and regression test coverage.
  • Out of Scope Changes check ✅ Passed — the README.md shield badges addition is a minor documentation enhancement unrelated to issue #50 requirements; however, it is a reasonable housekeeping change that does not conflict with the core objectives.



@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/xmldsig/parse.rs (1)

490-530: ⚠️ Potential issue | 🟠 Major

Add aggregate bounds for <X509Data> accumulation to avoid memory-DoS.

Line 495 iterates over untrusted children without a per-X509Data item/byte budget, and Lines 499, 515, 519, and 529 append decoded payloads directly. A crafted document with many large entries can cause unbounded memory growth during parse.

Proposed hardening pattern
+const MAX_X509_DATA_ENTRY_COUNT: usize = 64;
+const MAX_X509_DATA_TOTAL_BINARY_LEN: usize = 1_048_576; // 1 MiB aggregate per X509Data

 fn parse_x509_data_dispatch(node: Node) -> Result<X509DataInfo, ParseError> {
     verify_ds_element(node, "X509Data")?;
     ensure_no_non_whitespace_text(node, "X509Data")?;

     let mut info = X509DataInfo::default();
+    let mut total_binary_len = 0usize;
     for child in element_children(node) {
         match (child.tag_name().namespace(), child.tag_name().name()) {
             (Some(XMLDSIG_NS), "X509Certificate") => {
                 ensure_no_element_children(child, "X509Certificate")?;
-                info.certificates
-                    .push(decode_x509_base64(child, "X509Certificate")?);
+                let cert = decode_x509_base64(child, "X509Certificate")?;
+                if info.certificates.len() >= MAX_X509_DATA_ENTRY_COUNT {
+                    return Err(ParseError::InvalidStructure(
+                        "X509Data contains too many X509Certificate entries".into(),
+                    ));
+                }
+                total_binary_len = total_binary_len.checked_add(cert.len()).ok_or_else(|| {
+                    ParseError::InvalidStructure(
+                        "X509Data exceeds maximum allowed total binary length".into(),
+                    )
+                })?;
+                if total_binary_len > MAX_X509_DATA_TOTAL_BINARY_LEN {
+                    return Err(ParseError::InvalidStructure(
+                        "X509Data exceeds maximum allowed total binary length".into(),
+                    ));
+                }
+                info.certificates.push(cert);
             }
             // Apply the same entry-count and aggregate-byte checks to SKI/CRL/X509Digest payload pushes.

As per coding guidelines: Tier 1 - Flag logic bugs: missing validation (unchecked index, unvalidated input from network/disk/external source).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/xmldsig/parse.rs` around lines 490 - 530, In parse_x509_data_dispatch,
add aggregate bounds to prevent unbounded accumulation by enforcing a maximum
number of X509Data items and a maximum total byte budget before pushing into
info.certificates, info.subject_names, info.issuer_serials, info.skis,
info.crls, and info.digests; introduce constants like MAX_X509DATA_ITEMS and
MAX_X509DATA_TOTAL_BYTES, track a running count and running_total_bytes
(counting decoded byte lengths for certificates/skis/crls/digests and text
length for subject_names/issuer_serial fields) and check those limits prior to
appending—if exceeded return a ParseError::InvalidStructure with a clear
message—ensure these checks wrap every push site in parse_x509_data_dispatch
(and any helper called like decode_x509_base64/collect_text_content_bounded
usage) so a crafted document cannot cause unbounded memory growth.
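The per-entry and aggregate budget check the review asks for can be sketched as a small helper. The constant and function names (MAX_X509DATA_ITEMS, add_x509_data_usage) are taken from the review prompt and are assumptions, not the project's actual identifiers.

```rust
// Illustrative budget-enforcement helper for X509Data accumulation.
// Limits are assumptions matching the values proposed in the review diff.
const MAX_X509DATA_ITEMS: usize = 64;
const MAX_X509DATA_TOTAL_BYTES: usize = 1_048_576; // 1 MiB aggregate per X509Data

/// Charge one entry of `entry_len` decoded bytes against the running budgets,
/// failing closed when either the item count or the byte total is exceeded.
fn add_x509_data_usage(
    entry_count: &mut usize,
    total_bytes: &mut usize,
    entry_len: usize,
) -> Result<(), String> {
    *entry_count += 1;
    if *entry_count > MAX_X509DATA_ITEMS {
        return Err("X509Data contains too many entries".to_string());
    }
    *total_bytes = total_bytes
        .checked_add(entry_len) // overflow-safe accumulation
        .filter(|t| *t <= MAX_X509DATA_TOTAL_BYTES)
        .ok_or_else(|| "X509Data exceeds maximum allowed total binary length".to_string())?;
    Ok(())
}

fn main() {
    let (mut count, mut total) = (0usize, 0usize);
    assert!(add_x509_data_usage(&mut count, &mut total, 1_000).is_ok());
    // A single oversized entry blows the aggregate budget and fails closed.
    assert!(add_x509_data_usage(&mut count, &mut total, MAX_X509DATA_TOTAL_BYTES).is_err());
    println!("budget checks ok");
}
```

Calling this before every push into the parsed collections bounds attacker-controlled memory growth to a fixed ceiling per X509Data element.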

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: af317b17-04dd-475c-8384-cdd64f494cd0

📥 Commits

Reviewing files that changed from the base of the PR and between adf6147 and cd327dd.

📒 Files selected for processing (1)
  • src/xmldsig/parse.rs

Contributor

Copilot AI left a comment


Pull request overview

Implements ROADMAP task P2-002 / issue #50 by extending XMLDSig <KeyInfo>/<X509Data> parsing to capture full X509Data sub-elements into a structured X509DataInfo, with added validation and regression tests.

Changes:

  • Replace X509Data child element counters with parsed/stored values (certificates, subject names, issuer-serial tuples, SKI, CRL, and dsig11 X509Digest).
  • Add parsing helpers for X.509 base64 payloads and strict structural validation for malformed X509IssuerSerial and missing X509Digest Algorithm.
  • Update and add tests for new parsing behavior and rejection cases (malformed issuer serial, missing digest algorithm, invalid certificate base64).

- enforce per-entry and aggregate X509Data limits before accumulation
- reuse required_algorithm_attr for dsig11:X509Digest
- reject whitespace-only X509IssuerName/X509SerialNumber
- add regression tests for budget and fail-closed branches

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/xmldsig/parse.rs`:
- Around line 509-527: The code incorrectly charges text-only X509Data entries
against the binary-length budget: in the X509SubjectName and X509IssuerSerial
branches (which use collect_text_content_bounded and parse_x509_issuer_serial
and push into info.subject_names / info.issuer_serials) remove the calls to
add_x509_data_usage that update total_binary_len/MAX_X509_DATA_TOTAL_BINARY_LEN;
only binary payloads (e.g., X509Certificate, X509CRL) should call
add_x509_data_usage. Keep ensure_x509_data_entry_budget and the text
parsing/collection but do not count their UTF-8 lengths toward the binary
budget. Ensure other places that represent actual binary data still call
add_x509_data_usage.
- Around line 646-699: The current loop in the X509IssuerSerial parsing accepts
any ordering or extra element children; change it to explicitly validate the
element-child sequence is exactly two elements in spec order: first an element
with namespace XMLDSIG_NS and local name "X509IssuerName", then an element with
namespace XMLDSIG_NS and local name "X509SerialNumber", rejecting anything else
(wrong order, wrong namespace, extra element children) with
ParseError::InvalidStructure; implement this by collecting
element_children(node) into a Vec, ensure vec.len() == 2, check the first/second
tag_name() values before calling
ensure_no_element_children/collect_text_content_bounded (reuse those functions),
and return clear InvalidStructure errors when the checks fail.
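The exact-sequence check described above can be sketched over a list of (namespace, local-name) pairs. The function name is illustrative; only the rule itself — exactly X509IssuerName then X509SerialNumber, both in the XMLDSig namespace — comes from the review comment.

```rust
// Illustrative strict-shape check for <X509IssuerSerial> element children.
const XMLDSIG_NS: &str = "http://www.w3.org/2000/09/xmldsig#";

/// Accept only the exact spec-order pair; reject wrong order, wrong
/// namespace, and extra element children.
fn validate_issuer_serial_shape(children: &[(&str, &str)]) -> Result<(), String> {
    match children {
        [(ns_a, "X509IssuerName"), (ns_b, "X509SerialNumber")]
            if *ns_a == XMLDSIG_NS && *ns_b == XMLDSIG_NS =>
        {
            Ok(())
        }
        _ => Err(
            "X509IssuerSerial must contain exactly X509IssuerName then X509SerialNumber"
                .to_string(),
        ),
    }
}

fn main() {
    let ok = [(XMLDSIG_NS, "X509IssuerName"), (XMLDSIG_NS, "X509SerialNumber")];
    assert!(validate_issuer_serial_shape(&ok).is_ok());
    // Wrong order, missing children, and foreign namespaces are all rejected.
    let swapped = [(XMLDSIG_NS, "X509SerialNumber"), (XMLDSIG_NS, "X509IssuerName")];
    assert!(validate_issuer_serial_shape(&swapped).is_err());
    assert!(validate_issuer_serial_shape(&ok[..1]).is_err());
    println!("shape checks ok");
}
```

A fixed slice pattern like this makes the "exactly two, in order" rule explicit rather than emergent from loop bookkeeping.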

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 802c5055-6eff-4bc4-8a02-35b2df73de45

📥 Commits

Reviewing files that changed from the base of the PR and between a2a5fdd and 9183103.

📒 Files selected for processing (1)
  • src/xmldsig/parse.rs

- count only decoded binary payloads in X509Data aggregate byte budget
- enforce exact X509IssuerSerial child sequence and reject extras
- add regression tests for text-only budget and malformed issuer-serial shapes
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/xmldsig/parse.rs:700

  • parse_x509_issuer_serial validates non-empty values using trim(), but it returns the untrimmed issuer_name / serial_number. For pretty-printed XML (newlines/indentation inside the elements), this will store leading/trailing whitespace in issuer_serials, which is especially problematic for X509SerialNumber (xsd:integer whitespace is collapsed). Consider trimming (and ideally normalizing) the values before storing/returning them, and adding a regression test with surrounding whitespace that expects the trimmed values.
    }

    Ok((issuer_name, serial_number))
}

/// Base64-decode a digest value string, stripping whitespace.
///
/// XMLDSig allows whitespace within base64 content (line-wrapped encodings).
fn base64_decode_digest(b64: &str, digest_method: DigestAlgorithm) -> Result<Vec<u8>, ParseError> {
    use base64::Engine;
    use base64::engine::general_purpose::STANDARD;

    let expected = digest_method.output_len();
    let max_base64_len = expected.div_ceil(3) * 4;
    let mut cleaned = String::with_capacity(b64.len().min(max_base64_len));
    normalize_xml_base64_text(b64, &mut cleaned).map_err(|err| {
        ParseError::Base64(format!(
            "invalid XML whitespace U+{:04X} in DigestValue",

@polaz
Member Author

polaz commented Apr 9, 2026

@coderabbitai re-review

@coderabbitai

coderabbitai bot commented Apr 9, 2026

@polaz Sure, I'll re-review the PR now!


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/xml-sec PR: 51
File: src/xmldsig/parse.rs:646-699
Timestamp: 2026-04-09T07:54:16.163Z
Learning: In `src/xmldsig/parse.rs` (xml-sec Rust project), `parse_x509_issuer_serial` enforces an exact two-child sequence in XMLDSig spec order: first a `<ds:X509IssuerName>` element, then a `<ds:X509SerialNumber>` element. Wrong order, wrong namespace, or extra element children are all rejected with `ParseError::InvalidStructure`. Do not flag out-of-order or extra-child acceptance as a bug — the strict sequence enforcement is intentional.

Learnt from: polaz
Repo: structured-world/xml-sec PR: 0
File: :0-0
Timestamp: 2026-03-28T18:51:49.919Z
Learning: In `src/xmldsig/parse.rs` (xml-sec Rust project), the private function `n` (formerly `base64_decode_digest`) strips only the four XML whitespace characters (space 0x20, tab 0x09, CR 0x0D, LF 0x0A) from `DigestValue` base64 text before decoding, and returns `ParseError::Base64(format!("invalid XML whitespace U+{:04X} ...", ...))` for any other `is_ascii_whitespace()` character (e.g. form feed 0x0C). Regression tests `n_accepts_xml_whitespace_chars` and `n_rejects_non_xml_ascii_whitespace` cover these cases.

Learnt from: polaz
Repo: structured-world/xml-sec PR: 47
File: src/xmldsig/parse.rs:543-556
Timestamp: 2026-04-08T23:10:56.420Z
Learning: In `src/xmldsig/parse.rs` (xml-sec Rust project), `parse_reference` delegates `<DigestValue>` decoding to `decode_digest_value_children`, which streams over `node.children()`, rejects element children with `ParseError::InvalidStructure`, calls `normalize_xml_base64_text` on each text chunk into a single buffer pre-allocated with `digest_method.output_len().div_ceil(3) * 4`, and enforces the algorithm-derived base64 cap after each append — returning `ParseError::Base64("DigestValue exceeds maximum allowed base64 length")` immediately if exceeded. No full intermediate `digest_b64` string is built before the cap check. The existing `base64_decode_digest` function is still present (and tested) but is no longer the call site for `DigestValue` text collection from XML nodes.

Learnt from: polaz
Repo: structured-world/xml-sec PR: 0
File: :0-0
Timestamp: 2026-04-08T20:07:42.404Z
Learning: In `src/xmldsig/parse.rs` (xml-sec Rust project), `parse_key_value_dispatch` accepts `ECKeyValue` only when its namespace is `XMLDSIG11_NS` (XMLDSig 1.1). A `ds:`-namespaced `ECKeyValue` (i.e., in the base XMLDSig namespace) falls through to `KeyValueInfo::Unsupported`. The regression test `parse_key_info_marks_ds_namespace_ec_keyvalue_as_unsupported` covers this boundary. Do not flag `ds:ECKeyValue` → Unsupported as a bug; it is the correct fail-safe behavior per the XMLDSig 1.1 spec.

Learnt from: polaz
Repo: structured-world/xml-sec PR: 41
File: tests/donor_full_verification_suite.rs:165-178
Timestamp: 2026-04-04T06:14:34.735Z
Learning: In `tests/donor_full_verification_suite.rs` (xml-sec Rust project), the `Expectation::Skip` arm must exercise skipped vectors through `VerifyContext::new().verify(xml)` (no injected PEM key) so that the real key-resolution path runs, and must assert a concrete blocker outcome per case: `Invalid(KeyNotFound)` for vectors with unresolved KeyValue/X.509 key material, and `ParseSignedInfo(UnsupportedAlgorithm)` for DSA-signature vectors. Accepting any `Err` or non-Valid result as a valid skip is insufficient and can mask stale skip reasons or unrelated failures.

Learnt from: polaz
Repo: structured-world/xml-sec PR: 0
File: :0-0
Timestamp: 2026-04-08T21:56:49.678Z
Learning: In `src/xmldsig/parse.rs` (xml-sec Rust project), `base64_decode_digest` enforces a pre-decode normalized base64 length cap derived from `DigestAlgorithm::output_len()` and returns `ParseError::Base64("DigestValue exceeds maximum allowed base64 length")` if the normalized base64 text exceeds that cap before passing it to the decoder. The regression test `base64_decode_digest_rejects_oversized_base64_before_decode` covers this boundary. Do not flag the absence of a pre-decode length check in `base64_decode_digest` — it is present and algorithm-specific.
✅ Actions performed

Full review triggered.

@polaz polaz merged commit 6b0698d into main Apr 9, 2026
9 checks passed
@polaz polaz deleted the feat/p2-002-x509data-parsing branch April 9, 2026 11:33

Successfully merging this pull request may close these issues.

P2-002: parse full X509Data sub-elements in KeyInfo
