[pkg/ottl] Enhance ParseXML function with intuitive parse format and add MarshalXML function to convert parsed output back to XML #35210
Description:
Current implementation of ParseXML
The current implementation of ParseXML produces output that is difficult for users to manipulate, particularly when the XML is complicated. The current design loses the sense of key:value pairs, which are helpful for understanding the data, and, more importantly, always parses the XML into arrays, which are difficult to manipulate in OTTL and don't provide a stable path for accessing the data.
This change proposes an additional parsing version for the ParseXML function that would parse XML into a nested map, where the keys are the paths to the data, and the values are the data itself, or in the case of nodes with both values and attributes, using the special keys `xml_attributes` and `xml_value`. It still maintains the order of nodes using the special key `xml_ordering`. Additionally, the proposed implementation can optionally "flatten" arrays, such as the `EventData` node in the following XML, into a single map. The arrays must have a single, common attribute, and no other children (in this case, `EventData` has only `Data` children). This would allow users to access the data more intuitively and would allow for easier manipulation of the data in OTTL.

This proposed change also includes a new function, `MarshalXML`, to reconstruct the XML from the parsed map. This would allow users to manipulate the data in OTTL, and then convert it back to XML, typically for ingestion into another system.

The proposed implementation would be backward compatible with the current implementation, allowing the user to specify the new implementation as an optional `version` parameter to the ParseXML function.

Consider the following XML from a Windows event log, a common telemetry source:
The current `ParseXML` function renders this output, in a flattened view, using dot-notation:

This view is extremely opaque, and difficult to manipulate in OTTL. The current design also doesn't provide a stable path that can be used to access the data. For example, the path to the `EventID` is `children.0.children.1["content"]`, which is not intuitive.

Proposed implementation:
The proposed implementation would parse the XML into nested maps. In the case where a node has multiple children with the same name, the children would be stored in an array. If a node has no attributes or children, its value would be stored as a string. Otherwise, it would be stored in the special key `xml_value`. The `xml_attributes` key would store the attributes of the node, and the special key `xml_ordering` would store the order of the children of the node, which is important for marshaling the nodes in the same order back into XML. For example, the schema definition for `<EventData>` in a Windows Event log specifies that the `<Data>` nodes are a sequence (`<xs:sequence>`), so the order of the nodes should be preserved:

We prefix the special keys with `xml_` to avoid conflicts with the actual data. It is a limitation that if the original XML uses tags `xml_value`, `xml_attributes`, `xml_ordering`, or `xml_flattened_array`, it will not parse correctly. A future enhancement could be to add an optional parameter to the function to allow users to specify a different prefix.

Additionally, the proposed implementation can optionally "flatten" arrays, such as the `EventData` node in the XML, into a single map. The arrays must have a single, common attribute, and no other children (in this case, `EventData` has only `Data` children). This would allow users to access the data in a more intuitive way, and would allow for easier manipulation of the data in OTTL.

Flattened Parse
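The flattening rule described above can be sketched in Go roughly as follows. This is an illustration only, not the code in this PR: `flattenByAttribute` is a hypothetical helper, the input literal is hand-written in the `xml_attributes`/`xml_value` shape described earlier, the `Name` attribute and the sample values are made up, and the actual flattened output of ParseXML may differ (for example, it may carry additional bookkeeping keys).

```go
package main

import "fmt"

// flattenByAttribute is a hypothetical helper, not the PR's implementation.
// It collapses a slice of parsed nodes, each shaped like
// {"xml_attributes": {attr: name}, "xml_value": value}, into one map keyed by
// the value of the shared attribute. It reports false when any node does not
// fit that shape, in which case the caller would keep the array unchanged.
func flattenByAttribute(nodes []any, attr string) (map[string]any, bool) {
	flat := make(map[string]any, len(nodes))
	for _, n := range nodes {
		node, ok := n.(map[string]any)
		if !ok {
			return nil, false
		}
		// Each node must carry exactly one attribute: the common one.
		attrs, ok := node["xml_attributes"].(map[string]any)
		if !ok || len(attrs) != 1 {
			return nil, false
		}
		key, ok := attrs[attr].(string)
		if !ok {
			return nil, false
		}
		// Duplicate keys would lose data, so refuse to flatten.
		if _, dup := flat[key]; dup {
			return nil, false
		}
		flat[key] = node["xml_value"]
	}
	return flat, true
}

func main() {
	// Made-up sample in the nested-map shape; not taken from the PR's XML.
	data := []any{
		map[string]any{
			"xml_attributes": map[string]any{"Name": "SubjectUserName"},
			"xml_value":      "alice",
		},
		map[string]any{
			"xml_attributes": map[string]any{"Name": "LogonType"},
			"xml_value":      "3",
		},
	}
	flat, ok := flattenByAttribute(data, "Name")
	fmt.Println(ok, flat["SubjectUserName"], flat["LogonType"])
}
```

With a flattened map, a value can be addressed by a stable key such as `SubjectUserName` instead of by its position in an array.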
MarshalXML
The MarshalXML function is able to reliably reconstruct the XML, with the limitation imposed by the Go XML library, which doesn't support self-closing tags (in this example, tags like `<Security />`). The unit tests for the MarshalXML function use the `ParseXML` function to parse the XML, and then compare the output of the `MarshalXML` function to the original XML. The test XML purposely omits self-closing tags.

Link to tracking Issue: #35076
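As an aside on the self-closing tag limitation noted for MarshalXML above, the behavior can be reproduced with Go's standard `encoding/xml` package alone. The sketch below is independent of this PR's code: the `System` struct and the `roundTrip` helper are hypothetical and exist only to show that an element read as `<Security />` is written back with separate open and close tags.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// System is a made-up, minimal stand-in for an event fragment.
type System struct {
	XMLName  xml.Name `xml:"System"`
	Security string   `xml:"Security"`
}

// roundTrip decodes the input with encoding/xml and encodes it again.
// It is a hypothetical helper used only for this demonstration.
func roundTrip(in string) (string, error) {
	var s System
	if err := xml.Unmarshal([]byte(in), &s); err != nil {
		return "", err
	}
	out, err := xml.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := roundTrip("<System><Security /></System>")
	if err != nil {
		panic(err)
	}
	// The empty element comes back as <Security></Security>.
	fmt.Println(out)
}
```

The two forms are equivalent XML, but a byte-for-byte comparison against the original document would fail, which is consistent with the test XML avoiding self-closing tags.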
Testing:
Added unit tests for the new version; previous tests still pass for version 1.
Documentation:
Added documentation for the new parse version to pkg/ottl/ottlfuncs/README.md