Provide ability to ignore namespaces when parsing #130
Comments
The intended way to achieve this is to implement your own subtype of the policy (lines 136 to 147 in 5de8121).
But I'll give you that this is not the cleanest way to solve your particular issue (where you just want to provide the name to use given a particular input and context). I'll keep this open to consider a nicer way to handle that case. Of course you could use a filtering XmlReader instead that would just do the mapping outside of serialization. |
I couldn't get it to ignore or change the namespace on this PR, which gets a versioned namespace: 1, 5, or 7. I settled for
|
For this particular case I would go with a filter before getting to serialization. Something along the lines of /examples/DYNAMIC_TAG_NAMES.md (perhaps you only need the reader here, not the other "magic" that makes it work transparently). This would allow you to filter the XML input/output to remove namespaces. Using the fallback can work too, but would be more complex and less efficient. |
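To illustrate the "filter before serialization" idea, here is a minimal sketch that uses the JDK's StAX `StreamReaderDelegate` as a stand-in. It is not the xmlutil `XmlReader` API discussed in this thread; the class name `NamespaceIgnoringReader` and the helper `localNamesIgnoringNamespaces` are hypothetical names chosen for the example. The wrapper reports an empty namespace and prefix for every element, so downstream code only ever sees local names.

```kotlin
import java.io.StringReader
import javax.xml.namespace.QName
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants
import javax.xml.stream.XMLStreamReader
import javax.xml.stream.util.StreamReaderDelegate

// Hypothetical filter: hides element namespaces from whatever consumes the reader.
// This is a JDK StAX analog, not the xmlutil reader from the thread.
class NamespaceIgnoringReader(reader: XMLStreamReader) : StreamReaderDelegate(reader) {
    override fun getNamespaceURI(): String = ""
    override fun getPrefix(): String = ""
    override fun getName(): QName = QName(localName)
}

// Collects "{namespace}localName" for each start element as seen through the filter.
fun localNamesIgnoringNamespaces(xml: String): List<String> {
    val raw = XMLInputFactory.newInstance().createXMLStreamReader(StringReader(xml))
    val reader = NamespaceIgnoringReader(raw)
    val names = mutableListOf<String>()
    while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT) {
            names += "{${reader.namespaceURI}}${reader.localName}"
        }
    }
    return names
}
```

With this wrapper, an input that declares namespaces on its elements is reported as if every element had none. Attribute namespaces are not touched in this sketch and would need the same treatment if they matter.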
I can't justify that, so I'll stick with a search and replace. Thanks for the example. |
Actually the filter can be fairly simple, much simpler than that example. You just handle tags and "rename" them. It's effectively a structural search and replace. Most of the complexity in https://github.com/pdvrieze/xmlutil/blob/master/examples/src/main/kotlin/net/devrieze/serialization/examples/dynamictagnames/DynamicTagReader.kt has to do with the dynamic introduction of attributes, and there is a lot of mess there. In your case you could just override the
I'll take a look when I get some time. Thanks for providing the Ktor XML implementation, it's saved me twice now. |
You're welcome. By the way, for Ktor, when going to version 2, use the binding provided by Ktor. The module in my project is now officially deprecated (left around only for those still using older versions). |
The problem I'm seeing with the approach in #130 (comment) is that it doesn't seem to propagate the overridden namespace to children elements - I'd have to do that manually. That is, with a simple delegating reader like:
the override happens successfully for
I guess I could do this by maintaining a stack of namespaces, pushing and popping at the start and end of each element, respectively. But that feels more complicated than the find-and-replace I have now. |
The way you would do it is to have a mapping from qname to qname (probably only the namespace). Mapping from the local name only leads to all kinds of issues. The parser will/should present the correct namespace even for child types. If you want to handle "triggers" you'll have to do that based upon the depth of the reader (and reset it on an end tag at the initial depth). |
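A minimal sketch of such a qname-to-qname mapping, keyed on the original namespace URI rather than on the local name. It uses the JDK's `javax.xml.namespace.QName`; the class name `QNameMapper` is a hypothetical helper for illustration, not part of xmlutil.

```kotlin
import javax.xml.namespace.QName

// Hypothetical helper: rewrites only the namespace part of a qualified name.
// Names whose namespace is not in the map are returned unchanged.
class QNameMapper(private val namespaceMap: Map<String, String>) {
    fun map(name: QName): QName {
        val target = namespaceMap[name.namespaceURI] ?: return name
        return QName(target, name.localPart)
    }
}
```

Because the lookup is by namespace, two elements that share a local name but live in different namespaces are still mapped independently, which is the consistency the comment above asks for.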
You're referring to |
I mean to override namespaceURI (as well as name) to return whatever you want it to (or an empty string). But you do this by matching the original namespaceURI, not merely the local name. But you need to be consistent. |
I'm not 100% sure I follow, but I think what you're essentially saying is that we need to override namespaceURI for the child elements, not just the root elements where the namespace is specified in the XML - by matching by the "invalid" namespace in namespaceURI's getter, rather than just using the localName for the impacted root elements. If so, I think this falls apart in my case because there are multiple hierarchies, e.g. the root tag has expected namespace A, but it has two children, one with expected namespace B and one with expected namespace C. And one of the scenarios that I'm trying to handle is that the namespaces are just omitted from the XML entirely. So if I just map from namespaceURI, and it is blank, I can't know whether to return B or C unless I also track the namespace I returned most recently from a parent, which means I have to maintain a stack of namespaces. |
Basically the filter works at quite a low level. It doesn't retain any scope. So if the namespace to use is variable you must handle that in each place that namespace is returned (potentially even on attributes). Tracking the namespace wouldn't be too difficult (store it in relation to depth and remove it when the depth is lower than the recorded depth of initialisation). An alternative is to just unify the namespaces (which effectively ignores it) by (for example) always returning the same namespace. If you want none at all (effectively ignoring namespaces) you would just return the empty string for all namespaces and all prefixes. |
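The depth-based tracking described above can be sketched without any XML library. The class name `NamespaceTracker`, its `triggers` map and the `fallback` value are assumptions made for this example; the sketch also assumes the reader reports a depth that grows by one for each nested element.

```kotlin
// Hypothetical tracker: remembers which namespace was introduced at which depth.
class NamespaceTracker(
    private val triggers: Map<String, String>, // local name -> namespace it introduces
    private val fallback: String = "",         // returned while no trigger is in scope
) {
    // Pairs of (namespace, depth at which it was introduced).
    private val stack = ArrayDeque<Pair<String, Int>>()

    fun namespaceFor(localName: String, depth: Int): String {
        // Entries recorded at this depth or deeper belong to elements that have closed.
        while (stack.isNotEmpty() && stack.last().second >= depth) {
            stack.removeLast()
        }
        triggers[localName]?.let { stack.addLast(it to depth) }
        return stack.lastOrNull()?.first ?: fallback
    }
}
```

Calling `namespaceFor` repeatedly for the same element gives the same answer, since the entry for that element is dropped and pushed again. Siblings at the same depth do not inherit each other's namespace, because an entry is discarded as soon as another element is seen at its depth.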
I think the alternative you suggest doesn't work because then the data classes used for serialization would need to not declare any namespaces either. I want to be accepting (ignore namespaces) when parsing, but strict (provide valid namespaces) when writing (per the robustness principle).
This appears to be working, per your first suggestion. I'm probably making some simplifying assumptions based on the XML I expect to see. I'm not sure if there's a simpler way, or if there's a gap here I'm not seeing:

    private class NamespaceNormalizingReader(reader: XmlReader) : XmlDelegatingReader(reader) {
        // Pairs of (namespace, depth at which that namespace was introduced).
        private val namespaceDepthStack = ArrayDeque<Pair<String, Int>>()

        override val namespaceURI: String
            get() {
                // Drop namespaces that were introduced deeper than the current depth.
                while ((namespaceDepthStack.lastOrNull()?.second ?: 0) > depth) {
                    namespaceDepthStack.removeLast()
                }
                // Known tags introduce the namespace expected for them and their children.
                val newNamespace = when (localName) {
                    "crossword-compiler-applet" -> CCA_NS
                    "crossword-compiler" -> CC_NS
                    "rectangular-puzzle" -> PUZZLE_NS
                    else -> null
                }
                newNamespace?.let {
                    namespaceDepthStack.addLast(it to depth)
                }
                // Assumes a known tag has already been seen; last() throws on an empty stack.
                return namespaceDepthStack.last().first
            }
    }

I'm on the fence as to whether this is better than the simple find-and-replace I had before, but it's probably a bit cleaner and more robust. On the other hand, it's not as simple as the ideal API: either a simple boolean "ignore namespaces when parsing" or a way to override namespaces on a per-tag basis and have that propagate to child tags/attributes unless a new namespace appears. |
For ignoring the namespace when parsing, you could use a different policy that doesn't give namespaces in any case (for reading only). I'm not sure whether that would also fit the particular problem (if namespaces are actually needed). Dealing with broken XML is always a mess. |
@jpd236 thank you for the code snippet. Just today I ran into the same problem. |
I'm dealing with parsing XML in the wild that is occasionally inconsistent about specifying the correct/expected namespace for certain tags, or for that matter, any namespace at all. While I'd like serialization to include the correct namespace, when deserializing, I really only want to look at the tag name and can safely ignore the namespace value altogether. (Most other applications parsing this particular type of XML file already do so).
I'm not seeing an easy way to accomplish this; I don't think an unknown child handler can work here because the child isn't treated as unknown; it matches the expected @Serializable class with that tag name but then fails because the namespace doesn't match the expected one for that class, resulting in this error.
Did I miss an API for this? If not, this would be a helpful feature request. In the meantime, I've resorted to manually doing find-and-replace tweaks to the raw content to try to normalize the namespaces before parsing.
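For reference, the find-and-replace workaround could look roughly like the sketch below. The function name `normalizeNamespaces` and the `canonical` map are hypothetical; the regex only rewrites the values of `xmlns` declarations it finds in the raw text, so it does not help when a declaration is missing entirely, and it does not distinguish markup from CDATA or comments.

```kotlin
// Hypothetical pre-processing step: rewrites known variant namespace URIs
// to the canonical URI before the text is handed to the parser.
fun normalizeNamespaces(xml: String, canonical: Map<String, String>): String {
    // Group 1: the declaration up to the quote, group 2: the quote character, group 3: the URI.
    val declaration = Regex("""(\sxmlns(?::[\w.-]+)?\s*=\s*)(["'])(.*?)\2""")
    return declaration.replace(xml) { match ->
        val (prefix, quote, uri) = match.destructured
        prefix + quote + (canonical[uri] ?: uri) + quote
    }
}
```

URIs that are not in the map are left as they are, so the step is safe to run on input that is already correct.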