feat(xmldsig): parse full x509data sub-elements #51
Conversation
- parse X509Certificate/X509SubjectName/X509IssuerSerial/X509SKI/X509CRL/X509Digest into structured X509DataInfo
- validate issuer-serial shape and X509Digest Algorithm presence
- decode and validate X509 base64 payloads with bounded text handling
- extend parse regressions for malformed issuer serial, digest attrs, and cert base64
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough

Summary by CodeRabbit

Walkthrough

X509Data parsing now materializes parsed collections (decoded certificates/SKIs/CRLs, subject names, issuer-serial tuples, dsig11 digests with Algorithm URIs). The dispatcher enforces structural rules, element-order requirements, per-entry and aggregate binary-size budgets, and uses a bounded base64 decoder; unit tests were updated for the new validations.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(200,220,255,0.5)
    participant Client
    end
    rect rgba(200,255,200,0.5)
    participant XMLParser
    end
    rect rgba(255,230,200,0.5)
    participant Dispatcher
    end
    rect rgba(255,200,200,0.5)
    participant Base64Decoder
    end
    rect rgba(220,200,255,0.5)
    participant Storage
    end
    Client->>XMLParser: supply KeyInfo/X509Data XML
    XMLParser->>Dispatcher: iterate X509Data children
    Dispatcher->>Dispatcher: validate element type, structure, budgets
    alt Certificate / SKI / CRL / Digest (binary leaf)
        Dispatcher->>Base64Decoder: normalize & decode base64 (limits)
        Base64Decoder-->>Dispatcher: bytes or error
        Dispatcher->>Storage: push decoded bytes (with Algorithm for digest)
    else SubjectName (text)
        Dispatcher->>Dispatcher: extract bounded text
        Dispatcher->>Storage: push subject name
    else X509IssuerSerial
        Dispatcher->>Dispatcher: require IssuerName then SerialNumber (ordered)
        Dispatcher->>Storage: push (issuer, serial)
    end
    Dispatcher-->>XMLParser: success or parse error
    XMLParser-->>Client: X509DataInfo or error
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/xmldsig/parse.rs (1)
490-530: ⚠️ Potential issue | 🟠 Major

Add aggregate bounds for `<X509Data>` accumulation to avoid memory-DoS.

Line 495 iterates over untrusted children without a per-`X509Data` item/byte budget, and Lines 499, 515, 519, and 529 append decoded payloads directly. A crafted document with many large entries can cause unbounded memory growth during parse.

Proposed hardening pattern
```diff
+const MAX_X509_DATA_ENTRY_COUNT: usize = 64;
+const MAX_X509_DATA_TOTAL_BINARY_LEN: usize = 1_048_576; // 1 MiB aggregate per X509Data
 fn parse_x509_data_dispatch(node: Node) -> Result<X509DataInfo, ParseError> {
     verify_ds_element(node, "X509Data")?;
     ensure_no_non_whitespace_text(node, "X509Data")?;
     let mut info = X509DataInfo::default();
+    let mut total_binary_len = 0usize;
     for child in element_children(node) {
         match (child.tag_name().namespace(), child.tag_name().name()) {
             (Some(XMLDSIG_NS), "X509Certificate") => {
                 ensure_no_element_children(child, "X509Certificate")?;
-                info.certificates
-                    .push(decode_x509_base64(child, "X509Certificate")?);
+                let cert = decode_x509_base64(child, "X509Certificate")?;
+                if info.certificates.len() >= MAX_X509_DATA_ENTRY_COUNT {
+                    return Err(ParseError::InvalidStructure(
+                        "X509Data contains too many X509Certificate entries".into(),
+                    ));
+                }
+                total_binary_len = total_binary_len.checked_add(cert.len()).ok_or_else(|| {
+                    ParseError::InvalidStructure(
+                        "X509Data exceeds maximum allowed total binary length".into(),
+                    )
+                })?;
+                if total_binary_len > MAX_X509_DATA_TOTAL_BINARY_LEN {
+                    return Err(ParseError::InvalidStructure(
+                        "X509Data exceeds maximum allowed total binary length".into(),
+                    ));
+                }
+                info.certificates.push(cert);
             }
+            // Apply the same entry-count and aggregate-byte checks to SKI/CRL/X509Digest payload pushes.
```

As per coding guidelines:
Tier 1 - Flag logic bugs: missing validation (unchecked index, unvalidated input from network/disk/external source).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/xmldsig/parse.rs` around lines 490 - 530, In parse_x509_data_dispatch, add aggregate bounds to prevent unbounded accumulation by enforcing a maximum number of X509Data items and a maximum total byte budget before pushing into info.certificates, info.subject_names, info.issuer_serials, info.skis, info.crls, and info.digests; introduce constants like MAX_X509DATA_ITEMS and MAX_X509DATA_TOTAL_BYTES, track a running count and running_total_bytes (counting decoded byte lengths for certificates/skis/crls/digests and text length for subject_names/issuer_serial fields) and check those limits prior to appending—if exceeded return a ParseError::InvalidStructure with a clear message—ensure these checks wrap every push site in parse_x509_data_dispatch (and any helper called like decode_x509_base64/collect_text_content_bounded usage) so a crafted document cannot cause unbounded memory growth.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: af317b17-04dd-475c-8384-cdd64f494cd0
📒 Files selected for processing (1)
src/xmldsig/parse.rs
Pull request overview
Implements ROADMAP task P2-002 / issue #50 by extending XMLDSig <KeyInfo>/<X509Data> parsing to capture full X509Data sub-elements into a structured X509DataInfo, with added validation and regression tests.
Changes:
- Replace X509Data child element counters with parsed/stored values (certificates, subject names, issuer-serial tuples, SKI, CRL, and dsig11 X509Digest).
- Add parsing helpers for X.509 base64 payloads and strict structural validation for malformed `X509IssuerSerial` and missing `X509Digest` Algorithm.
- Update and add tests for new parsing behavior and rejection cases (malformed issuer serial, missing digest algorithm, invalid certificate base64).
- enforce per-entry and aggregate X509Data limits before accumulation
- reuse required_algorithm_attr for dsig11:X509Digest
- reject whitespace-only X509IssuerName/X509SerialNumber
- add regression tests for budget and fail-closed branches
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/xmldsig/parse.rs`:
- Around line 509-527: The code incorrectly charges text-only X509Data entries
against the binary-length budget: in the X509SubjectName and X509IssuerSerial
branches (which use collect_text_content_bounded and parse_x509_issuer_serial
and push into info.subject_names / info.issuer_serials) remove the calls to
add_x509_data_usage that update total_binary_len/MAX_X509_DATA_TOTAL_BINARY_LEN;
only binary payloads (e.g., X509Certificate, X509CRL) should call
add_x509_data_usage. Keep ensure_x509_data_entry_budget and the text
parsing/collection but do not count their UTF-8 lengths toward the binary
budget. Ensure other places that represent actual binary data still call
add_x509_data_usage.
- Around line 646-699: The current loop in the X509IssuerSerial parsing accepts
any ordering or extra element children; change it to explicitly validate the
element-child sequence is exactly two elements in spec order: first an element
with namespace XMLDSIG_NS and local name "X509IssuerName", then an element with
namespace XMLDSIG_NS and local name "X509SerialNumber", rejecting anything else
(wrong order, wrong namespace, extra element children) with
ParseError::InvalidStructure; implement this by collecting
element_children(node) into a Vec, ensure vec.len() == 2, check the first/second
tag_name() values before calling
ensure_no_element_children/collect_text_content_bounded (reuse those functions),
and return clear InvalidStructure errors when the checks fail.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 802c5055-6eff-4bc4-8a02-35b2df73de45
📒 Files selected for processing (1)
src/xmldsig/parse.rs
- count only decoded binary payloads in X509Data aggregate byte budget
- enforce exact X509IssuerSerial child sequence and reject extras
- add regression tests for text-only budget and malformed issuer-serial shapes
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/xmldsig/parse.rs:700
`parse_x509_issuer_serial` validates non-empty values using `trim()`, but it returns the untrimmed `issuer_name`/`serial_number`. For pretty-printed XML (newlines/indentation inside the elements), this will store leading/trailing whitespace in `issuer_serials`, which is especially problematic for `X509SerialNumber` (xsd:integer whitespace is collapsed). Consider trimming (and ideally normalizing) the values before storing/returning them, and adding a regression test with surrounding whitespace that expects the trimmed values.
```rust
    }
    Ok((issuer_name, serial_number))
}

/// Base64-decode a digest value string, stripping whitespace.
///
/// XMLDSig allows whitespace within base64 content (line-wrapped encodings).
fn base64_decode_digest(b64: &str, digest_method: DigestAlgorithm) -> Result<Vec<u8>, ParseError> {
    use base64::Engine;
    use base64::engine::general_purpose::STANDARD;
    let expected = digest_method.output_len();
    let max_base64_len = expected.div_ceil(3) * 4;
    let mut cleaned = String::with_capacity(b64.len().min(max_base64_len));
    normalize_xml_base64_text(b64, &mut cleaned).map_err(|err| {
        ParseError::Base64(format!(
            "invalid XML whitespace U+{:04X} in DigestValue",
```
@coderabbitai re-review

🧠 Learnings used

✅ Actions performed

Full review triggered.
Summary
Validation
Closes #50